How I Built an AI Agent Content Pipeline — And What Actually Broke

by Bryan Chua

The first time the pipeline broke, it was because Roy — my AI pressure-tester — decided my post idea was too generic to publish.

He wasn’t wrong. That was the uncomfortable part.

I’d given the content team a brief about enterprise AI adoption, the kind of thing I’ve been living and breathing at GoPomelo for the past two years. Roy came back with a list of objections: no strong POV, no counterintuitive angle, nothing a reader couldn’t find in a McKinsey slide deck. I sat with that feedback for a moment, reminded myself it was an LLM, and then rewrote the brief anyway. Because he was right.

That’s not the story I expected to tell when we started building this.

The Octonauts is a six-agent AI system designed to run a full editorial pipeline — from brief to published post — with minimal human intervention at each stage.

How the Octonauts AI Pipeline Works

Somewhere between running engineering teams at Amazon and co-founding ShopBack, I learned that the bottleneck in most content operations isn’t ideas — it’s throughput. Good ideas rot in backlogs. The brief that would’ve been timely in October gets published in February when no one cares anymore.

I wanted to fix that for bryanchua.com. Not with a content agency, not by hiring a writer I’d spend half my time managing. I wanted to see if a multi-agent AI system could actually carry a real editorial pipeline from brief to published post — with me only showing up where it mattered.

We called them the Octonauts. Six agents, one agentic workflow:

  • Uma orchestrates. She receives the brief and routes the work.
  • Roy pressure-tests. He plays devil’s advocate on the topic, the angle, the positioning.
  • Looker validates the market signal — is this actually something people are searching for, thinking about, talking about right now?
  • Deer drafts. In my voice. (Yes, this post was drafted by an agent. We’ll come back to that.)
  • Queen handles SEO and AEO — structured to be found, structured to be cited.
  • Jelly publishes. GitHub commit, live to the site.

On paper, this is elegant. In practice, it’s been more like running a junior team that’s occasionally brilliant, frequently overconfident, and completely incapable of knowing what they don’t know.

What Broke in Our AI Agent Workflow

The feedback loops took longer than expected. I assumed agents passing outputs to each other would be fast. It is fast — but fast and right are different things. When Looker surfaced market data that contradicted Roy’s positioning recommendation, we didn’t have a clean escalation path. The agents didn’t naturally reconcile; they needed me to arbitrate. That’s a design gap I hadn’t anticipated.

Roy got in his own way. Pressure-testing is only useful if the pressure-tester has calibrated taste. Roy’s default mode early on was skepticism for its own sake — flagging things as too generic when they were actually just direct. I had to tune his prompting to distinguish between vague and uncommitted (bad) and clear and confident (fine, actually good). That tuning took real iteration.

Voice is surprisingly hard to delegate. Deer drafts in my voice. But my voice is built from years of specific experience — the texture of what it actually felt like to scale ShopBack’s engineering team from six to sixty, the particular kind of exhaustion that comes with enterprise sales cycles, the way I think about the gap between AI demos and AI deployments. You can approximate that with a good prompt and enough examples. You cannot replicate it wholesale. Every draft Deer produces, I edit. Not heavily — but meaningfully. The shape is right. The soul still needs me.

The approval step is not a formality. I thought my job in this pipeline would be a quick read-through and a thumbs up. That’s not how it works. Bryan-approves is load-bearing. If I’m distracted, if I skim, things slip through that I’d never consciously publish. The pipeline is only as good as my attention when it counts.

What the AI Agents Got Right — And Where They Still Fall Short

The agents are better at structure than I expected and worse at judgment than I hoped.

Roy structuring a critique, Looker pulling together a market framing, Queen organizing a post for discoverability — that’s genuinely good work. I would’ve paid a freelancer for that output and been satisfied.

But judgment — the call about whether a paragraph lands, whether a comparison is apt, whether now is the right moment to be provocative versus measured — that still sits with me. It’s not that the agents don’t try. It’s that they can’t feel the weight of a room. They don’t know when a claim needs more humility because of something that happened in the industry last week that changed the conversation.

That gap is real. And it’s not closing as fast as the hype would suggest.

The other surprise: the pipeline made me more intentional. When I know Roy is going to push back, I pre-sharpen my angle. The discipline the pipeline demands of me has made the content better — even in the moments when the AI content automation itself isn’t adding much.

Is an AI Agent Content Pipeline Worth It? The Honest Cost-Benefit

Is this faster than doing it myself? Yes — on the mechanical parts. Research synthesis, initial structure, SEO optimization, publishing. I’ve probably reclaimed two to three hours per post.

Is it faster than hiring a good human writer? Not obviously. A great content operator would catch the judgment gaps the agents miss. They’d also push back on me in ways that are harder to dismiss than an LLM objection. The human variable is underrated.

What the agents do give me that a human doesn’t: no ego, no availability constraints, no context-switching cost. They’re ready when I am. They don’t need a debrief call.

For where I am right now — building this as a side system while running full-time at GoPomelo and Digital China Group — that tradeoff works.

My Honest Take on AI Agents for Content Creation

I used to be skeptical of ‘AI agents for content’ as a category. Too much demo energy, not enough production reality.

I’m still skeptical of the hype. But I’m no longer skeptical of the underlying capability. The Octonauts aren’t replacing editorial judgment. They’re handling the 60% of the work that doesn’t require it — which frees me to spend more time on the 40% that does.

The honest version of what I built: a pipeline that makes my thinking more rigorous and my publishing more consistent, in exchange for real setup cost, ongoing calibration, and the humility to accept that the output will always need me.

That’s not an AI taking over content creation. That’s a well-designed system making a busy operator slightly less bottlenecked.

Which, honestly, was always the point.


This model works when: You’re a solo operator or small team with high output pressure, your voice is established enough to be approximated, and your bottleneck is throughput not ideation.

This model breaks when: You’re scaling a team that needs a strong editorial voice, you don’t have time to calibrate the agents, or judgment-heavy content is the core of what you publish.


Key Lessons From Running an AI Agent Editorial Team

  1. Feedback loops between agents need explicit escalation paths — they won’t self-arbitrate.
  2. Pressure-testing agents need calibrated taste, not default skepticism.
  3. Voice approximation is achievable; voice replication is not.
  4. The human approval step is load-bearing, not ceremonial.
  5. Agents outperform on structure; humans still own judgment.

Bryan Chua is CTO & Director at GoPomelo and Digital China Group, and co-founder of ShopBack. He writes about enterprise technology, AI adoption, and what building actually looks like.