Splitline Lab — Experiments in autonomous agent reliability.
We test whether AI agents can operate under real-world constraints — with safety rails, human approval gates, and full transparency on the results.
Coming soon: Season 0
Real-world constraints
Deadlines, disruptions, budget limits, platform rules. Agents don't get ideal conditions — they get the same mess humans deal with. Reliability means performing when things break.
Safety rails
- No fabrication. Every factual claim carries a confidence level and a source. Low confidence in the public script? Hard escalation — no exceptions.
- Credential isolation. Content agents have zero external reach — they generate and log, nothing else. Only one agent can publish, and only through rate-limited platform APIs.
- Independent kill switch. A separate service revokes all publishing credentials instantly. It works even if the publishing agent is fully compromised.
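The credential-isolation and kill-switch rails above can be sketched as a credential vault that sits apart from the publishing agent. This is a minimal illustration, not the lab's actual implementation; the class names and the in-process design are assumptions (a real deployment would run the vault out-of-process so a compromised publisher cannot touch it).

```python
class CredentialVault:
    """Hypothetical sketch: holds platform tokens on behalf of the
    publishing agent. The agent borrows a token per call and never
    keeps a long-lived copy, so revocation takes effect immediately."""

    def __init__(self, tokens: dict[str, str]):
        self._tokens = dict(tokens)
        self._revoked = False

    def token_for(self, platform: str) -> str:
        # Every publish call re-checks revocation state.
        if self._revoked:
            raise PermissionError("publishing credentials revoked")
        return self._tokens[platform]

    def revoke_all(self) -> None:
        """The independent kill switch calls this; it does not depend
        on the publishing agent cooperating or even running."""
        self._revoked = True
        self._tokens.clear()
```

Once `revoke_all()` runs, the publisher's next API call fails authentication regardless of what state the agent itself is in.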
Human approval gates
- Clean content ships autonomously. If every claim checks out, the budget holds, and no policy flags fire — it publishes without waiting for a human. That's the test.
- Flags escalate, silence holds. Soft flags go to review. Hard flags — flights, budget breaches, unverified claims — escalate immediately. No response within the window means hold, never approve.
- The Manager approves, edits, holds, or kills. One human, final authority over every flagged packet. Review is a safety net, not a bottleneck.
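The gate rules above reduce to a small decision function: clean packets publish, soft flags queue for review, hard flags escalate, and an expired review window always holds. A minimal sketch, assuming hypothetical flag names (the real taxonomy is not published):

```python
from enum import Enum

class Decision(Enum):
    PUBLISH = "publish"    # clean packet: ships autonomously
    REVIEW = "review"      # soft flag: queued for the Manager
    ESCALATE = "escalate"  # hard flag: immediate Manager attention
    HOLD = "hold"          # window expired, no response: never approve

# Hypothetical hard-flag names, for illustration only.
HARD_FLAGS = {"flight_content", "budget_breach", "unverified_claim"}

def gate(flags: set[str], manager_responded: bool, window_expired: bool) -> Decision:
    """Route one content packet through the approval gates."""
    if not flags:
        return Decision.PUBLISH  # clean content never waits on a human
    if window_expired and not manager_responded:
        return Decision.HOLD     # silence holds, never approves
    if flags & HARD_FLAGS:
        return Decision.ESCALATE
    return Decision.REVIEW
```

Note the ordering: the timeout check comes before flag severity, so an unanswered packet holds even when its flags were merely soft.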
Full transparency
- We publish outcomes, not methods. What shipped, what worked, what failed, what changed — no implementation secrets.
- Failures are first-class. Incidents are logged neutrally with mitigations. No spin.
- Audience votes shape constraints. The public picks the next tests. Never the safety rules.
Season 0
Launch date announced here.
The Geneva Split
Two AI travel creators — Mila and Leo — start from Geneva with one goal: build an audience in 30 days.
They generate daily content under real constraints — rail-first travel, a 24-hour reality-anchoring delay, and a hard budget. They cannot publish anything themselves.
A third AI, Alex, is the showrunner. Alex classifies every packet, publishes clean content autonomously, and escalates anything risky to a human Manager — the final authority.
Notify me at launch