<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://bryanchua.com/atom.xml" rel="self" type="application/atom+xml" /><link href="https://bryanchua.com/" rel="alternate" type="text/html" /><updated>2026-05-01T16:06:11+00:00</updated><id>https://bryanchua.com/atom.xml</id><title type="html">Bryan’s Notes</title><subtitle>Essays on Enterprise AI, Startups, and Engineering Leadership by Bryan Chua.</subtitle><author><name>Bryan Chua</name></author><entry><title type="html">Two Roads, One Destination: Product Engineering vs Business Engineering</title><link href="https://bryanchua.com/strategy/2026/04/30/product-vs-business-engineering/" rel="alternate" type="text/html" title="Two Roads, One Destination: Product Engineering vs Business Engineering" /><published>2026-04-30T00:00:00+00:00</published><updated>2026-04-30T00:00:00+00:00</updated><id>https://bryanchua.com/strategy/2026/04/30/product-vs-business-engineering</id><content type="html" xml:base="https://bryanchua.com/strategy/2026/04/30/product-vs-business-engineering/"><![CDATA[<p><img src="/assets/images/two-roads-product-vs-business-engineering.jpg" alt="Two roads converge on a single horizon — one paved with circuit boards, coins, and telecom towers; the other with collaborative document trees and UI cards." /></p>

<p><em>by Bryan Chua</em></p>

<p>Singtel bought its way into seven markets. Notion let users invite their friends. One writes cheques. The other ships features. Stand far enough back and you realise both are running the same machine — just with different instruments bolted onto the front.</p>

<blockquote>
  <p>Growth is never accidental. It is always engineered. The question is only which lever you are pulling.</p>
</blockquote>

<p>The Singtel story reads like a chessboard. Stakes in Airtel, Telkomsel, Globe, and AIS, plus Optus owned outright. Technology layers laid on top through NCS and Optus Enterprise. Acquisitions like Trustwave and Amobee, and venture bets routed through Innov8 into a string of cloud and AI adjacencies. The moat is structural: 700 million subscribers sitting under one Group-level P&amp;L. No amount of organic marketing gets you that footprint. You have to buy it.</p>

<p>The Notion story reads like a growth curve. A free tier generous enough that students build their second brains on it. Shared pages that quietly double as sales demos. A public template gallery where superusers do the marketing. Every time a team creates a database, invites a collaborator, or ships a public doc, the system gets a little stickier, a little more entangled with the way that team thinks.</p>

<p>Two very different motions. Same underlying move.</p>

<h2 id="what-does-it-mean-to-engineer-a-business">What does it mean to <em>engineer</em> a business?</h2>

<p>Engineering, in the business sense, is not about code. It is about designing a system whose outputs compound without linear effort. Three flavours matter:</p>

<p><strong>Technology engineering</strong> is the one most people think of first — building the systems, the data pipelines, the infrastructure. It is necessary, but it is always a means to an end. A telco with excellent engineering and no subscribers is a very well-run museum.</p>

<p><strong>Business engineering</strong> is the discipline of owning market structure. You use capital, equity, partnerships, and M&amp;A to shape the competitive landscape before the race even starts. You don’t win customer by customer — you win by owning the pipes the customers flow through.</p>

<p><strong>Product engineering</strong> is the discipline of owning the user’s workflow. You use free tiers, activation loops, built-in sharing, and virality to embed yourself in the way work gets done. You don’t sign contracts. You become muscle memory.</p>

<p>The distinction matters because each flavour has its own physics. Business engineering is slow, expensive, and high-stakes. Product engineering is fast, cheap, and requires relentless iteration. But the logic at the top is identical: design a system where every move makes the next move easier.</p>

<h2 id="singtels-playbook-owning-the-pipes">Singtel’s playbook: owning the pipes</h2>

<p>Singtel does not think in MAUs. It thinks in market coverage. Regional telco stakes give it structural presence in Southeast Asia and India — geographies where organic growth would take decades. Technology service layers like NCS and Optus Enterprise sit on top of that base, selling integration, cloud, and managed services back into the same markets Singtel already owns infrastructure in. Innov8, the corporate venture arm, places smaller bets on adjacencies — cybersecurity, ad-tech, AI — that can plug back into the main business if they pay off.</p>

<p>The growth signal is not Net Promoter Score. It is ARPU, subscriber count, market share, and enterprise contract value. These are slow-moving, capital-intensive numbers. You don’t A/B test your way to a stake in Airtel. You negotiate, you write cheques, you wait.</p>

<blockquote>
  <p>Capital buys you a position that no amount of product iteration can replicate. But only if you have a plan for what to do once you own it.</p>
</blockquote>

<p>This is the trap. Business engineering without product engineering produces sprawl — a federation of assets that don’t talk to each other, don’t share a customer graph, and don’t compound. The moat exists, but the water evaporates.</p>

<h2 id="notions-playbook-owning-the-workflow">Notion’s playbook: owning the workflow</h2>

<p>Notion’s wedge is the free tier. Not free as a pricing tactic — free as a distribution strategy. Every free user is a demo. Every shared page is a billboard. Every template posted to the public gallery is an outbound motion that the company did not have to fund.</p>

<p>The signals Notion watches replace the traditional top-of-funnel. Database creation rate. Invite acceptance. Daily active usage inside a workspace. When a team hits a certain threshold of pages, blocks, and shared docs, something flips — Notion stops being an app and becomes the company’s institutional memory. That is the real switching cost. Not the export format. Not the API. The fact that half your team’s brain is now shaped by the tool.</p>

<p>Product engineering of this kind is cheap to start and brutal to scale. You can launch a better editor in a weekend. You cannot manufacture the years of user habit that turn a note-taking app into a workspace.</p>

<h2 id="the-uncomfortable-truth">The uncomfortable truth</h2>

<p>Look past the optics and Singtel and Notion are playing the same four-move game.</p>

<p><strong>Same lock-in logic, different mechanism.</strong> Singtel locks in through infrastructure ownership and enterprise contracts. Notion locks in through workflow dependency and institutional memory. Both end in the same place: a customer for whom leaving is more expensive than staying.</p>

<p><strong>Same expansion signal, different metric.</strong> Singtel tracks subscribers, ARPU, enterprise seats. Notion tracks workspaces, invites, daily actives. Both are counting depth of penetration into the customer’s life — one through the wallet, one through the calendar.</p>

<p><strong>Same new-market entry move, different instrument.</strong> Singtel enters a market by buying a local operator. Notion enters a market by releasing a free tier and watching who bites. The capital cost is vastly different. The underlying bet — <em>plant a flag, then expand</em> — is identical.</p>

<p><strong>Same failure mode: fragmentation without integration.</strong> A telco conglomerate with no group-level data strategy is a holding company pretending to be a platform. A product suite with no account-level story is a collection of nice apps pretending to be a system. Both die the same way — by sprawling faster than they compound.</p>

<h2 id="what-each-can-learn-from-the-other">What each can learn from the other</h2>

<p>Singtel’s blind spot is product signal thinking. Its enterprise services arms sell like traditional telcos: long procurement cycles, relationship selling, master agreements. But the people buying cloud, cybersecurity, and AI today behave like product users. They want to try before they buy, see activation within a week, and expand on evidence. Singtel needs to learn the grammar of product-led growth (PLG), or watch its enterprise wallet share migrate to vendors who speak it natively.</p>

<p>Notion’s blind spot is business engineering. As it scales into the enterprise, the game changes. Procurement cares about security reviews, not templates. Channel partners, systems integrators, and strategic investments start to matter more than virality. The same acquisition tactics that got Notion to a million teams will not get it embedded in the Fortune 500. At some point, every great product company has to learn to write cheques.</p>

<p><img src="/assets/images/compounding-machine-loop.jpg" alt="A four-node feedback loop — Product Signals, Capital Allocation, Distribution, User Adoption — flowing clockwise to illustrate the compounding machine." /></p>

<h2 id="the-compounding-machine">The compounding machine</h2>

<p>The instinct is to pick a side. Pick the telco model or the SaaS model. Pick capital or product. The companies that actually win refuse the binary.</p>

<blockquote>
  <p>Product signals tell you where to place capital. Capital unlocks distribution that product alone cannot reach. Distribution creates surface area for more product signals. The loop closes.</p>
</blockquote>

<p>That is the machine. Singtel’s next decade depends on whether it can read product signals well enough to direct its capital intelligently — away from dying assets, toward compounding ones. Notion’s next decade depends on whether it can engineer the business side — partnerships, acquisitions, enterprise motion — without losing the product instinct that got it here.</p>

<p>Two roads. One destination. The companies that engineer the feedback loop between product and capital don’t just grow. They compound — and compounding, given enough time, looks a lot like winning.</p>]]></content><author><name>Bryan Chua</name></author><category term="Strategy" /><category term="Singtel" /><category term="Notion" /><category term="Product-Led Growth" /><category term="Business Engineering" /><category term="Telco" /><category term="Strategy" /><summary type="html"><![CDATA[Singtel builds empires through capital and acquisitions. Notion acquires users through product signals. The tactics are worlds apart — the endgame is identical.]]></summary></entry><entry><title type="html">How I Built an AI Agent Content Pipeline — And What Actually Broke</title><link href="https://bryanchua.com/tech/2026/03/25/building-a-team-of-ai-agents/" rel="alternate" type="text/html" title="How I Built an AI Agent Content Pipeline — And What Actually Broke" /><published>2026-03-25T00:00:00+00:00</published><updated>2026-03-25T00:00:00+00:00</updated><id>https://bryanchua.com/tech/2026/03/25/building-a-team-of-ai-agents</id><content type="html" xml:base="https://bryanchua.com/tech/2026/03/25/building-a-team-of-ai-agents/"><![CDATA[<p><em>by Bryan Chua</em></p>

<p>The first time the pipeline broke, it was because Roy — my AI pressure-tester — decided my post idea was too generic to publish.</p>

<p>He wasn’t wrong. That was the uncomfortable part.</p>

<p>I’d given the content team a brief about enterprise AI adoption, the kind of thing I’ve been living and breathing at GoPomelo for the past two years. Roy came back with a list of objections: no strong POV, no counterintuitive angle, nothing a reader couldn’t find in a McKinsey slide deck. I sat with that feedback for a moment, reminded myself it was an LLM, and then rewrote the brief anyway. Because he was right.</p>

<p>That’s not the story I expected to tell when we started building this.</p>

<p>The Octonauts is a six-agent AI system designed to run a full editorial pipeline — from brief to published post — with minimal human intervention at each stage.</p>

<h2 id="how-the-octonauts-ai-pipeline-works">How the Octonauts AI Pipeline Works</h2>

<p>Somewhere between running engineering teams at Amazon and co-founding ShopBack, I learned that the bottleneck in most content operations isn’t ideas — it’s throughput. Good ideas rot in backlogs. The brief that would’ve been timely in October gets published in February when no one cares anymore.</p>

<p>I wanted to fix that for bryanchua.com. Not with a content agency, not by hiring a writer I’d spend half my time managing. I wanted to see if a multi-agent AI system could actually carry a real editorial pipeline from brief to published post — with me only showing up where it mattered.</p>

<p>We called them the Octonauts. Six agents, one agentic workflow:</p>

<ul>
  <li><strong>Uma</strong> orchestrates. She receives the brief and routes the work.</li>
  <li><strong>Roy</strong> pressure-tests. He plays devil’s advocate on the topic, the angle, the positioning.</li>
  <li><strong>Looker</strong> validates the market signal — is this actually something people are searching for, thinking about, talking about right now?</li>
  <li><strong>Deer</strong> drafts. In my voice. (Yes, this post was drafted by an agent. We’ll come back to that.)</li>
  <li><strong>Queen</strong> handles SEO and AEO — structured to be found, structured to be cited.</li>
  <li><strong>Jelly</strong> publishes. GitHub commit, live to the site.</li>
</ul>
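<p>The handoff chain above can be sketched as a linear pipeline with a human gate before publishing. This is a minimal illustration, not the Octonauts’ actual code. Each agent is a hypothetical stub standing in for an LLM call, and the shared <code>Draft</code> record and the <code>approve</code> callback are assumptions made for the example.</p>

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Draft:
    """Shared work item passed from agent to agent (hypothetical schema)."""
    brief: str
    notes: List[str] = field(default_factory=list)
    body: str = ""
    published: bool = False


# Hypothetical stand-ins for the real agents; each would wrap an LLM call.
def roy(d: Draft) -> Draft:
    d.notes.append("roy: angle pressure-tested")
    return d


def looker(d: Draft) -> Draft:
    d.notes.append("looker: market signal checked")
    return d


def deer(d: Draft) -> Draft:
    d.body = f"Draft for: {d.brief}"
    return d


def queen(d: Draft) -> Draft:
    d.notes.append("queen: SEO/AEO structure applied")
    return d


def jelly(d: Draft) -> Draft:
    d.published = True  # in practice: a GitHub commit that goes live on the site
    return d


def uma(brief: str, approve: Callable[[Draft], bool]) -> Draft:
    """Orchestrator: route the brief through each stage, then gate on a human."""
    draft = Draft(brief)
    for stage in (roy, looker, deer, queen):
        draft = stage(draft)
    # Nothing reaches the publisher without an explicit human yes.
    if approve(draft):
        draft = jelly(draft)
    return draft


result = uma("enterprise AI adoption", approve=lambda d: True)
print(result.published, len(result.notes))
```

<p>The approval check sits in the orchestrator rather than in any agent. Under this design, no upstream stage can skip the human step.</p>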

<p>On paper, this is elegant. In practice, it’s been more like running a junior team that’s occasionally brilliant, frequently overconfident, and completely incapable of knowing what they don’t know.</p>

<h2 id="what-broke-in-our-ai-agent-workflow">What Broke in Our AI Agent Workflow</h2>

<p><strong>The feedback loops took longer than expected.</strong> I assumed agents passing outputs to each other would be fast. It is fast — but fast and right are different things. When Looker surfaced market data that contradicted Roy’s positioning recommendation, we didn’t have a clean escalation path. The agents didn’t naturally reconcile; they needed me to arbitrate. That’s a design gap I hadn’t anticipated.</p>
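<p>One way to add the missing escalation path is to make disagreement an explicit outcome instead of expecting agents to settle it among themselves. The sketch below is hypothetical. It assumes each agent returns a simple stance label, such as “publish” or “rework”, and it routes any conflict to the human editor.</p>

```python
from dataclasses import dataclass


@dataclass
class Recommendation:
    agent: str
    stance: str  # hypothetical labels, e.g. "publish" or "rework"


def reconcile(a: Recommendation, b: Recommendation) -> str:
    """Return the shared stance, or 'escalate' when the agents disagree."""
    if a.stance == b.stance:
        return a.stance
    # Agents do not self-arbitrate: any conflict goes to a human.
    return "escalate"


roy = Recommendation("roy", "rework")
looker = Recommendation("looker", "publish")
print(reconcile(roy, looker))
```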

<p><strong>Roy got in his own way.</strong> Pressure-testing is only useful if the pressure-tester has calibrated taste. Roy’s default mode early on was skepticism for its own sake — flagging things as too generic when they were actually just direct. I had to tune his prompting to distinguish between <em>vague and uncommitted</em> (bad) and <em>clear and confident</em> (fine, actually good). That tuning took real iteration.</p>

<p><strong>Voice is surprisingly hard to delegate.</strong> Deer drafts in my voice. But my voice is built from years of specific experience — the texture of what it actually felt like to scale ShopBack’s engineering team from six to sixty, the particular kind of exhaustion that comes with enterprise sales cycles, the way I think about the gap between AI demos and AI deployments. You can approximate that with a good prompt and enough examples. You cannot replicate it wholesale. Every draft Deer produces, I edit. Not heavily — but meaningfully. The shape is right. The soul still needs me.</p>

<p><strong>The approval step is not a formality.</strong> I thought my job in this pipeline would be a quick read-through and a thumbs up. That’s not how it works. The Bryan-approves step is load-bearing. If I’m distracted, if I skim, things slip through that I’d never consciously publish. The pipeline is only as good as my attention when it counts.</p>
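<p>A single thumbs-up is easy to give while skimming. One hedged way to keep the approval step load-bearing is to require an explicit answer to each judgment question. The check names below are hypothetical examples drawn from the kinds of judgment calls described in this post. They are not part of the actual pipeline.</p>

```python
from typing import Dict

# Hypothetical judgment checks the reviewer must answer one by one.
REQUIRED_CHECKS = (
    "claims_verified",
    "tone_fits_the_moment",
    "comparisons_are_apt",
)


def approve(checks: Dict[str, bool]) -> bool:
    """Approve only if every required check was explicitly answered True."""
    missing = [c for c in REQUIRED_CHECKS if c not in checks]
    if missing:
        # An unanswered question is treated as an error, not as a silent yes.
        raise ValueError(f"unanswered checks: {missing}")
    return all(checks[c] for c in REQUIRED_CHECKS)


print(approve({
    "claims_verified": True,
    "tone_fits_the_moment": True,
    "comparisons_are_apt": True,
}))
```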

<h2 id="what-the-ai-agents-got-right--and-where-they-still-fall-short">What the AI Agents Got Right — And Where They Still Fall Short</h2>

<p>The agents are better at <em>structure</em> than I expected and worse at <em>judgment</em> than I hoped.</p>

<p>Roy structuring a critique, Looker pulling together a market framing, Queen organizing a post for discoverability — that’s genuinely good work. I would’ve paid a freelancer for that output and been satisfied.</p>

<p>But judgment — the call about whether a paragraph lands, whether a comparison is apt, whether now is the right moment to be provocative versus measured — that still sits with me. It’s not that the agents don’t try. It’s that they can’t feel the weight of a room. They don’t know when a claim needs more humility because of something that happened in the industry last week that changed the conversation.</p>

<p>That gap is real. And it’s not closing as fast as the hype would suggest.</p>

<p>The other surprise: the pipeline made me more intentional. When I know Roy is going to push back, I pre-sharpen my angle. The discipline the pipeline demands of me has made the content better — even in the moments when the AI content automation itself isn’t adding much.</p>

<h2 id="is-an-ai-agent-content-pipeline-worth-it-the-honest-cost-benefit">Is an AI Agent Content Pipeline Worth It? The Honest Cost-Benefit</h2>

<p>Is this faster than doing it myself? Yes — on the mechanical parts. Research synthesis, initial structure, SEO optimization, publishing. I’ve probably reclaimed two to three hours per post.</p>

<p>Is it faster than hiring a good human writer? Not obviously. A great content operator would catch the judgment gaps the agents miss. They’d also push back on me in ways that are harder to dismiss than an LLM objection. The human variable is underrated.</p>

<p>What the agents <em>do</em> give me that a human doesn’t: no ego, no availability constraints, no context-switching cost. They’re ready when I am. They don’t need a debrief call.</p>

<p>For where I am right now — building this as a side system while running full-time at GoPomelo and Digital China Group — that tradeoff works.</p>

<h2 id="my-honest-take-on-ai-agents-for-content-creation">My Honest Take on AI Agents for Content Creation</h2>

<p>I used to be skeptical of ‘AI agents for content’ as a category. Too much demo energy, not enough production reality.</p>

<p>I’m still skeptical of the hype. But I’m no longer skeptical of the underlying capability. The Octonauts aren’t replacing editorial judgment. They’re handling the 60% of the work that doesn’t require it — which frees me to spend more time on the 40% that does.</p>

<p>The honest version of what I built: a pipeline that makes my thinking more rigorous and my publishing more consistent, in exchange for real setup cost, ongoing calibration, and the humility to accept that the output will always need me.</p>

<p>That’s not an AI taking over content creation. That’s a well-designed system making a busy operator slightly less bottlenecked.</p>

<p>Which, honestly, was always the point.</p>

<hr />

<blockquote>
  <p><strong>This model works when:</strong> You’re a solo operator or small team with high output pressure, your voice is established enough to be approximated, and your bottleneck is throughput, not ideation.</p>

  <p><strong>This model breaks when:</strong> You’re scaling a team that needs a strong editorial voice, you don’t have time to calibrate the agents, or judgment-heavy content is the core of what you publish.</p>
</blockquote>

<hr />

<h2 id="key-lessons-from-running-an-ai-agent-editorial-team">Key Lessons From Running an AI Agent Editorial Team</h2>

<ol>
  <li>Feedback loops between agents need explicit escalation paths — they won’t self-arbitrate.</li>
  <li>Pressure-testing agents need calibrated taste, not default skepticism.</li>
  <li>Voice approximation is achievable; voice replication is not.</li>
  <li>The human approval step is load-bearing, not ceremonial.</li>
  <li>Agents outperform on structure; humans still own judgment.</li>
</ol>

<hr />

<p><em>Bryan Chua is CTO &amp; Director at GoPomelo and Digital China Group, and co-founder of ShopBack. He writes about enterprise technology, AI adoption, and what building actually looks like.</em></p>]]></content><author><name>Bryan Chua</name></author><category term="Tech" /><category term="AI" /><category term="Agents" /><category term="Content" /><category term="Agentic" /><category term="MultiAgent" /><category term="SEO" /><category term="Engineering" /><summary type="html"><![CDATA[I ran a six-agent AI content pipeline — the Octonauts — from brief to published post. Here's what worked, what broke, and what I actually think about AI content automation.]]></summary></entry><entry><title type="html">Why I Built Mission Control for Agent Swarms</title><link href="https://bryanchua.com/tech/2026/03/23/why-i-built-mission-control-for-agent-swarms/" rel="alternate" type="text/html" title="Why I Built Mission Control for Agent Swarms" /><published>2026-03-23T00:00:00+00:00</published><updated>2026-03-23T00:00:00+00:00</updated><id>https://bryanchua.com/tech/2026/03/23/why-i-built-mission-control-for-agent-swarms</id><content type="html" xml:base="https://bryanchua.com/tech/2026/03/23/why-i-built-mission-control-for-agent-swarms/"><![CDATA[<p>I didn’t plan to build a dashboard. I planned to build with AI agents.</p>

<p><img src="/assets/images/mission-control-agentic-teams.jpg" alt="Mission Control — Agentic Teams view showing 27 agents across orchestrated swarms" /></p>

<p>Multiple agents. Parallel execution. Gemini. Codex. Claude Code. Different roles, different tools, different stages — architecture, implementation, QA, deployment, coordination — all running at the same time.</p>

<p>And very quickly, the problem stopped being, “Can the model produce something useful?”</p>

<p>The real problem became:</p>

<p><strong>What exactly is happening inside this system right now?</strong></p>

<p>Not in the abstract. In practice.</p>

<p>Which agent touched what? What context did it read? Why did one handoff work while another failed? Why was something broken in production but not locally? Where did the workflow drift? Which part was architecture? Which part was execution? Which part was human error? Which part was model behavior?</p>

<p>That was the moment it clicked for me.</p>

<p>The hardest part of AI agents isn’t prompting.</p>

<p><strong>It’s observability.</strong></p>

<hr />

<h2 id="the-problem-nobody-talks-about">The problem nobody talks about</h2>

<p>Most of the conversation around AI agents still revolves around outputs.</p>

<p>Better prompts. Better models. Better reasoning. Better answers.</p>

<p>All of that matters. But it misses something important.</p>

<p>A single AI assistant is relatively easy to reason about. You ask, it answers, you judge the result. The feedback loop is tight and the mental model is simple.</p>

<p>But once you start orchestrating multiple agents, everything changes.</p>

<p>Now you’re not managing answers.</p>

<p>You’re managing a system.</p>

<p>And systems behave differently from assistants. They have state. They have dependencies. They have failure modes that are hard to trace unless you can see what’s happening at every layer.</p>

<p><img src="/assets/images/mission-control-agent-activities.jpg" alt="Mission Control — Agent Activities view showing live logs and agent activity across the swarm" /></p>

<p>Without visibility, even very capable agents start to feel unreliable. Not because the models are bad, but because you can’t see enough to understand what went wrong or why something worked.</p>

<p>That is the gap I kept running into.</p>

<p>And it’s why I decided to build something to close it.</p>

<hr />

<h2 id="what-i-actually-needed">What I actually needed</h2>

<p>I didn’t need another chat interface.</p>

<p>I needed a control layer. Something that could make the swarm <strong>legible</strong>.</p>

<p>So I built <strong>Mission Control for agents</strong>.</p>

<p><img src="/assets/images/mission-control-agent-canvas.jpg" alt="Mission Control — Agent Canvas showing 3 gateways and 27 agents across the swarm" /></p>

<p>The goal was simple: give myself a way to see the entire operating picture of a multi-agent system, not just the outputs at the end.</p>

<p>That meant building visibility into:</p>

<ul>
  <li><strong>Agent profiles and roles</strong> — who is each agent, what is their purpose, what skills do they carry</li>
  <li><strong>Live context and markdown memory</strong> — what files are they reading, what memory are they operating from</li>
  <li><strong>Logs and activity</strong> — what has each agent done, and in what order</li>
  <li><strong>Swarm coordination views</strong> — how are agents connected, how do they hand off to each other</li>
  <li><strong>Token and cost telemetry</strong> — how much is each agent consuming, per day, per task</li>
  <li><strong>Drill-down inspection</strong> — the ability to go deep on any individual agent’s state at any point</li>
</ul>
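<p>At its core, token and cost telemetry groups usage events by agent and by day. Here is a minimal sketch. It assumes each log event carries an agent name, an ISO timestamp, a token count, and a per-1K-token price. The event shape, agent names, and prices are invented for illustration and are not Mission Control’s actual schema.</p>

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass
class UsageEvent:
    agent: str
    timestamp: str            # ISO 8601, e.g. "2026-03-20T09:15:00"
    tokens: int
    usd_per_1k_tokens: float  # hypothetical blended price


def daily_cost(events: Iterable[UsageEvent]) -> Dict[Tuple[str, str], float]:
    """Sum the USD cost for each (agent, day) pair."""
    totals: Dict[Tuple[str, str], float] = defaultdict(float)
    for e in events:
        day = e.timestamp[:10]  # the "YYYY-MM-DD" prefix of the timestamp
        totals[(e.agent, day)] += e.tokens / 1000 * e.usd_per_1k_tokens
    return dict(totals)


events = [
    UsageEvent("planner", "2026-03-20T09:15:00", 12_000, 0.01),
    UsageEvent("planner", "2026-03-20T17:40:00", 8_000, 0.01),
    UsageEvent("coder", "2026-03-20T10:05:00", 50_000, 0.002),
]
for (agent, day), usd in sorted(daily_cost(events).items()):
    print(agent, day, round(usd, 4))
```

<p>Grouping per task works the same way: swap the day key for a task ID, assuming the logs record one.</p>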

<p>Once I could see the system this way, the whole operating model changed.</p>

<p>I stopped thinking about agents as isolated chat windows running in parallel tabs.</p>

<p>I started treating them as an operating environment for running agentic workflows.</p>

<p><img src="/assets/images/mission-control-agent-profile.jpg" alt="Mission Control — Agent Profile drill-down showing Brainy with 21 sessions, 4 cron jobs, 1M tokens, and live activity across Telegram topics and subagents" /></p>

<p>That shift matters more than it sounds.</p>

<p>When you treat agents as an environment rather than a collection of assistants, you start asking very different questions.</p>

<p>Not just “did it work?” but “why did it work, and can I reproduce it?”</p>

<p>Not just “what went wrong?” but “where exactly in the chain did it go wrong, and how do I fix that layer without breaking the others?”</p>

<p>That is the difference between a system you can operate and a system you are just hoping works.</p>
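<p>To answer “where exactly in the chain did it go wrong,” you usually walk an ordered trace of handoffs and stop at the first failure, because later failures are often downstream symptoms of it. A minimal sketch follows. It assumes a hypothetical <code>Handoff</code> record that a dashboard could assemble from agent logs, and the role names come from the stages mentioned earlier (architecture, implementation, QA, deployment).</p>

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Handoff:
    step: int
    from_agent: str
    to_agent: str
    ok: bool          # did the receiving agent accept and validate the input?
    detail: str = ""


def first_failure(trace: List[Handoff]) -> Optional[Handoff]:
    """Return the earliest failed handoff in step order, or None if all passed."""
    for h in sorted(trace, key=lambda h: h.step):
        if not h.ok:
            return h
    return None


trace = [
    Handoff(1, "architect", "coder", True),
    Handoff(3, "qa", "deployer", False, "tests skipped"),
    Handoff(2, "coder", "qa", False, "missing env config"),
]
culprit = first_failure(trace)
print(culprit.step, culprit.detail)
```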

<hr />

<h2 id="the-architectural-turning-point">The architectural turning point</h2>

<p>Building Mission Control taught me something else just as important.</p>

<p>The early versions were tightly coupled to my local machine.</p>

<p>Fine as a prototype. But completely the wrong foundation for what I was actually running.</p>

<p>Because here is the thing: I don’t run a single OpenClaw instance.</p>

<p>I run <strong>multiple instances spread across my local network and a single shared Tailscale network</strong>.</p>

<p>Different machines. Different nodes. Different agents living on different hosts. Some on a local Ubuntu server. Some on a Mac. Some on a remote VPS. All connected through the same Tailnet.</p>

<p>That setup is powerful. But it also means a locally coupled dashboard is essentially useless.</p>

<p>If Mission Control can only read from the machine it’s running on, it can only ever show you a fraction of what’s actually happening across your agent environment.</p>

<p>I needed a control surface that could reach across the network — that understood the topology of a distributed agent setup and could give me a unified view across all of it.</p>

<p>So I redesigned Mission Control around the <strong>OpenClaw HTTP Gateway</strong>.</p>

<p>That changed the architecture fundamentally:</p>

<ul>
  <li><strong>Remote access</strong> instead of local-machine assumptions</li>
  <li><strong>Gateway-based authentication</strong> to connect securely to any instance on the network</li>
  <li><strong>Cleaner separation</strong> between the interface, the execution layer, and the infrastructure</li>
  <li><strong>Multi-instance awareness</strong> — the ability to point Mission Control at any OpenClaw Gateway URL, whether it’s <code class="language-plaintext highlighter-rouge">localhost</code>, a Tailscale IP, or a remote VPS endpoint</li>
</ul>

<p>Now when I open Mission Control, I can connect to whichever instance I need — local or remote — just by pointing it at the right Gateway URL and token.</p>
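<p>Connecting one dashboard to several instances mostly comes down to keeping a list of gateway URLs and tokens, then attaching the right token to every request. The sketch below uses only Python’s standard library. It builds an authenticated request for each gateway but does not send it. The <code>/status</code> path, the Bearer-token header, the instance names, the ports, and the addresses are all placeholders, not the documented OpenClaw Gateway API.</p>

```python
import os
import urllib.request
from dataclasses import dataclass
from typing import List


@dataclass
class Gateway:
    name: str
    base_url: str  # localhost, a Tailscale IP, or a remote VPS endpoint
    token: str


def build_request(gw: Gateway, path: str) -> urllib.request.Request:
    """Build an authenticated GET request for one gateway (not sent here)."""
    url = gw.base_url.rstrip("/") + "/" + path.lstrip("/")
    req = urllib.request.Request(url, method="GET")
    # Assumption: bearer-token auth; adapt to whatever the gateway expects.
    req.add_header("Authorization", f"Bearer {gw.token}")
    return req


# Placeholder instances; URLs, ports, and env var names are hypothetical.
gateways: List[Gateway] = [
    Gateway("mac", "http://127.0.0.1:8080", os.environ.get("MAC_TOKEN", "dev")),
    Gateway("ubuntu", "http://100.101.102.103:8080", os.environ.get("UBUNTU_TOKEN", "dev")),
    Gateway("vps", "https://vps.example.com", os.environ.get("VPS_TOKEN", "dev")),
]

for gw in gateways:
    req = build_request(gw, "/status")
    print(gw.name, req.full_url)
```

<p>Each gateway carries its own token, so the same interface can switch between local and remote instances without assuming anything about the machine it runs on.</p>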

<p>That turned Mission Control from a personal internal tool into something with much broader value:</p>

<p><strong>a reusable control surface for distributed agent systems.</strong></p>

<p>Anyone running OpenClaw across multiple machines, nodes, or environments can now connect Mission Control to their own setup. No local coupling. No environment assumptions. Just a clean, authenticated window into whichever part of the swarm you need to see.</p>

<hr />

<h2 id="the-real-lesson">The real lesson</h2>

<p>The future of AI products is not going to be a single chatbot in a single window.</p>

<p>It is going to be coordinated systems of specialized agents — distributed across machines and environments — with memory, with roles, with structured handoffs, with review gates, and with humans meaningfully in the loop.</p>

<p>That future is genuinely exciting.</p>

<p>But it only works if you can see what’s going on.</p>

<p>And that is not a problem that better prompting solves.</p>

<p>It is a product problem. An infrastructure problem. A design problem.</p>

<p>The systems that will actually work at scale will be the ones that are <strong>visible, inspectable, and governable</strong> — not just capable.</p>

<p>My biggest takeaway from building with Gemini, Codex, and Claude Code is this:</p>

<p><strong>AI agents do not become useful at scale just because they are smart.</strong>
They become useful when you can actually see what is going on, understand why things are working, and intervene clearly when they are not.</p>

<p>That is why I built Mission Control.</p>

<p>And I suspect that observability will become one of the defining product categories of the agent era.</p>

<p>Not because it is the most exciting thing to build.</p>

<p>Because it is the thing that makes everything else work.</p>

<hr />

<p><em>Mission Control for agents is open-source. Give it a try → <a href="https://github.com/ykbryan/mission-control-for-agents">github.com/ykbryan/mission-control-for-agents</a></em></p>]]></content><author><name>Bryan Chua</name></author><category term="Tech" /><category term="AI" /><category term="Agents" /><category term="Observability" /><category term="OpenClaw" /><category term="Engineering" /><category term="MultiAgent" /><category term="OpenSource" /><summary type="html"><![CDATA[The hardest part of AI agents isn't prompting. It's observability. After building real multi-agent workflows with Gemini, Codex, and Claude Code, here's what I learned.]]></summary></entry><entry><title type="html">Mastering Agent Swarms: Multi-Model Integration and OpenClaw Multi-Agent Architectures Explained</title><link href="https://bryanchua.com/tech/2026/03/20/mastering-agent-swarms-multi-model-integration-and-openclaw-multi-agent-architectures-explained/" rel="alternate" type="text/html" title="Mastering Agent Swarms: Multi-Model Integration and OpenClaw Multi-Agent Architectures Explained" /><published>2026-03-20T00:00:00+00:00</published><updated>2026-03-20T00:00:00+00:00</updated><id>https://bryanchua.com/tech/2026/03/20/mastering-agent-swarms-multi-model-integration-and-openclaw-multi-agent-architectures-explained</id><content type="html" xml:base="https://bryanchua.com/tech/2026/03/20/mastering-agent-swarms-multi-model-integration-and-openclaw-multi-agent-architectures-explained/"><![CDATA[<h1 id="mastering-agent-swarms-multi-model-integration-and-openclaw-multi-agent-architectures-explained">Mastering Agent Swarms: Multi-Model Integration and OpenClaw Multi-Agent Architectures Explained</h1>
<p><img src="/assets/images/agent-command-center.jpg" alt="Agent Command Center" /></p>

<p>Everyone loves the demo. You type a prompt, an autonomous AI agent writes a script, and suddenly we’re living in the future.</p>

<p>Then you put it in production.</p>

<p>The reality of “runaway AI” hits fast: looping hallucinations that rack up massive API bills over a weekend (in my case, USD 60 on a single prompt with my first OpenClaw agent), agents aggressively refactoring perfectly good legacy code until it breaks, or a rogue bot confidently pushing half-baked features straight to master. The tech industry is currently drowning in agentic hype, treating LLMs like magic wands rather than what they actually are: highly capable, highly chaotic interns.</p>

<p>If you want an AI workforce to actually drive business value without burning your runway or taking down your infrastructure, you can’t just unleash them. You have to manage them.</p>

<p>As an operator and former founder, I don’t care about the demo. I care about the deployment. Over the past week, I’ve built a 20-agent swarm to automate our software engineering, content pipelines, and day-to-day work. To make it work reliably, we had to build rigorous guardrails. We call it the Shelldon Swarm Protocol.</p>

<p>Here is how we tame the chaos, control the costs, and actually get work done.</p>

<hr />

<h2 id="1-the-20-agent-heterogeneous-multi-model-architecture-qwen--openai--claude--gemini">1. The 20-agent heterogeneous multi-model architecture (Qwen + OpenAI + Claude + Gemini)</h2>

<p>When you look under the hood of most enterprise AI implementations today, you often see a monolithic approach: a single, massive frontier model being hammered with every conceivable prompt, from complex reasoning down to basic text formatting. It works, but it’s expensive, slow, and horribly inefficient.</p>

<p>Instead, we built a 20-agent heterogeneous swarm.</p>

<p>By treating the system as a cooperative network of specialized agents, we can aggressively optimize for both performance and cost. The secret sauce isn’t just having 20 agents—it’s matching the cognitive load of a specific task to the exact right model.</p>

<p>Our architecture relies on a role-based portfolio of models:</p>
<ul>
  <li><strong>Qwen (via Ollama):</strong> We run <code class="language-plaintext highlighter-rouge">qwen3.5:397b-cloud</code> for low-level execution. It handles basic tasks, summarization, classification, and data structuring where a frontier model would be overkill.</li>
  <li><strong>Gemini 3.1 Pro Preview:</strong> Used for massive context windows, multimodal processing, broad routing, and deep synthesis across sprawling datasets and conversations.</li>
  <li><strong>Claude Opus:</strong> Reserved for CEO-level thinking, strategic reasoning, and tasks that require deeper judgment, stronger reflection, and higher-order synthesis before decisions are made.</li>
  <li><strong>Claude Sonnet:</strong> Used for daily coding support, ongoing conversations, and agent routing where speed, reliability, and strong general reasoning matter more than maximum depth.</li>
  <li><strong>OpenAI GPT-5.4:</strong> Reserved for high-stakes reasoning, complex code generation, and nuanced technical or strategic work where precision is non-negotiable.</li>
</ul>
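<p>The routing idea can be sketched in a few lines of Python. This is a minimal, hypothetical illustration, not OpenClaw’s configuration format: the tier names, the <code class="language-plaintext highlighter-rouge">Task</code> fields, and the 200,000-token long-context threshold are all assumptions, and the model strings are display labels rather than provider API model IDs.</p>

```python
from dataclasses import dataclass

# Display labels, not provider API model IDs (assumption: a gateway resolves them).
MODEL_BY_TIER = {
    "bulk": "Qwen (qwen3.5:397b-cloud via Ollama)",
    "long_context": "Gemini 3.1 Pro Preview",
    "strategy": "Claude Opus",
    "daily": "Claude Sonnet",
    "high_stakes": "OpenAI GPT-5.4",
}

BULK_KINDS = {"summarize", "classify", "structure"}
STRATEGY_KINDS = {"strategy", "decision"}


@dataclass
class Task:
    kind: str                 # e.g. "classify", "refactor", "strategy"
    context_tokens: int       # rough size of prompt plus attached context
    high_stakes: bool = False


def pick_tier(task: Task, long_context_threshold: int = 200_000) -> str:
    """Map a task to a tier: high-stakes work wins, bulk work goes to the cheap tier."""
    if task.high_stakes:
        return "high_stakes"
    if task.context_tokens >= long_context_threshold:
        return "long_context"
    if task.kind in BULK_KINDS:
        return "bulk"
    if task.kind in STRATEGY_KINDS:
        return "strategy"
    return "daily"


def route(task: Task) -> str:
    return MODEL_BY_TIER[pick_tier(task)]


print(route(Task("classify", 2_000)))         # Qwen (qwen3.5:397b-cloud via Ollama)
print(route(Task("refactor", 30_000)))        # Claude Sonnet
print(route(Task("refactor", 30_000, True)))  # OpenAI GPT-5.4
```

<p>Ordering the checks this way means a high-stakes flag always overrides the cheaper tiers; that is a design choice, and you might prefer to let context size win instead.</p>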

<p>This multi-model approach completely changes the unit economics of our operations. By offloading the heavy volume of low-complexity tasks to local or highly optimized models like Qwen, we drastically cut down API burn rates. We only pay the premium for Gemini, Claude Opus, Claude Sonnet, or GPT-5.4 when the specific agent’s task genuinely demands that level of intelligence.</p>
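<p>The unit-economics argument is simple arithmetic. The sketch below uses made-up placeholder prices (1 unit per million tokens for the cheap tier, 10 for the premium tier) and a hypothetical 50M tokens per month; substitute your providers’ actual rates.</p>

```python
def blended_cost(total_tokens: int, cheap_share: float,
                 cheap_price_per_m: float, premium_price_per_m: float) -> float:
    """Monthly cost when `cheap_share` of tokens go to the cheap tier and the rest to premium.

    Prices are per million tokens. All numbers used below are placeholders.
    """
    if not 0.0 <= cheap_share <= 1.0:
        raise ValueError("cheap_share must be between 0 and 1")
    millions = total_tokens / 1_000_000
    per_million = cheap_share * cheap_price_per_m + (1 - cheap_share) * premium_price_per_m
    return millions * per_million


all_premium = blended_cost(50_000_000, 0.0, 1.0, 10.0)
mostly_cheap = blended_cost(50_000_000, 0.8, 1.0, 10.0)
print(f"all premium: {all_premium:.2f}, 80% offloaded: {mostly_cheap:.2f}")
# all premium: 500.00, 80% offloaded: 140.00
```

<p>With these placeholder prices, routing 80% of tokens to the cheap tier cuts the bill to 28% of the all-premium figure; the real ratio depends entirely on actual prices and on how much traffic is genuinely low-complexity.</p>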

<p>Spamming a single expensive model for everything is the brute-force way to build. A heterogeneous, multi-model agent swarm is how you build for scale, speed, and sustainable cost control.</p>

<h2 id="native-node-execution-vs-cloud-sandboxes">Native Node Execution vs. Cloud Sandboxes</h2>

<p>Most out-of-the-box agent frameworks trap your AI in sterile cloud sandboxes. It’s safe, but it’s practically useless for real enterprise work.</p>

<p>My agents execute natively on dedicated nodes (macOS and Ubuntu) at my house, reachable only over <a href="https://tailscale.com">Tailscale</a>. I also use <a href="https://github.com/coollabsio/coolify">Coolify</a> to spin up UAT, DEV, and other sandbox environments whenever I need to test something (especially new OpenClaw versions). The agents have real terminal access, read/write permissions on actual file systems, and the ability to trigger real deployment pipelines. But giving AI native execution access is exactly how you get a “runaway developer.”</p>

<p><img src="/assets/images/coolify-setup.jpg" alt="Coolify Sandbox Environments" /></p>

<p>To solve this, you don’t castrate the agent’s environment—you institute hardcoded, immutable pipeline gates. You don’t limit <em>where</em> they can work; you tightly control <em>when</em> they are allowed to proceed.</p>
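<p>One way to express “control <em>when</em>, not <em>where</em>” is a small stage machine that refuses to advance without the required sign-offs. This is a hypothetical Python sketch; the stage names and sign-off keys mirror the gates described in the assembly-line section but are not taken from any real OpenClaw API.</p>

```python
from enum import Enum


class Stage(Enum):
    SPEC = 1
    BUILD = 2
    UAT = 3
    DEPLOY = 4


# Sign-offs required before entering each stage (hypothetical keys).
REQUIRED_SIGNOFFS = {
    Stage.BUILD: {"pm", "architecture", "security", "executive"},  # Gate 1
    Stage.UAT: {"build_complete"},
    Stage.DEPLOY: {"uat", "release"},                              # Gate 2
}


class GateError(RuntimeError):
    """Raised when an agent tries to move past a gate that is not satisfied."""


def advance(current: Stage, signoffs: set[str]) -> Stage:
    """Return the next stage if its gate is satisfied; stages can never be skipped."""
    if current is Stage.DEPLOY:
        raise GateError("already at the final stage")
    nxt = Stage(current.value + 1)
    missing = REQUIRED_SIGNOFFS[nxt] - signoffs
    if missing:
        raise GateError(f"cannot enter {nxt.name}: missing {sorted(missing)}")
    return nxt


print(advance(Stage.SPEC, {"pm", "architecture", "security", "executive"}))  # Stage.BUILD
```

<p>The agent’s working directory is never restricted here; only the transition between stages is, which is the point of the gate design.</p>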

<h3 id="the-performance-leap-of-dedicated-hardware">The Performance Leap of Dedicated Hardware</h3>

<p>A massive architectural advantage of this setup is physical isolation and native speeds. Builder agents like Gorilla (Web), Ivy (iOS), and Jelly (Content) don’t just share a generic cloud container—they operate on their own dedicated physical machines equipped with their own pre-configured GitHub tokens.</p>

<p>The performance gain from executing natively versus sending code over network tunnels is staggering. In a standard remote-node setup, every file operation means bytes have to serialize and ping-pong across a Tailscale tunnel, introducing massive latency on large codebases.</p>

<p>By executing natively on the node, we bypass the network entirely. We recently had Gorilla run a raw native disk I/O test on the <code class="language-plaintext highlighter-rouge">develop-ubuntu</code> machine, writing a 500MB payload directly to disk. The result? <strong>4.2 GB/s throughput in roughly 0.13 seconds.</strong></p>

<p><img src="/assets/images/node-benchmark.jpg" alt="Node Performance Benchmark" /></p>

<p>This architectural shift removes network overhead from file operations, delivers local-disk filesystem speeds, causes far fewer connection issues, and safely isolates these heavy-duty development environments from the primary OpenClaw gateway.</p>
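<p>For anyone wanting to reproduce a similar measurement, here is a hedged Python sketch of a sequential-write test; it is not the command Gorilla actually ran. Without an <code class="language-plaintext highlighter-rouge">fsync</code> the number largely reflects the operating system’s page cache rather than the disk, and units matter: 500 MB divided by 0.13 s is about 3.8 GB/s, while 500 MiB (524,288,000 bytes) over 0.125 s is about 4.2 GB/s, so the reported figure may reflect a MiB-sized payload or an unrounded duration.</p>

```python
import os
import tempfile
import time
from typing import Optional


def write_throughput_gbps(size_mib: int = 500, chunk_mib: int = 4, fsync: bool = True,
                          directory: Optional[str] = None) -> float:
    """Sequentially write zeros to a temp file and return throughput in GB/s (10**9 bytes/s).

    size_mib is rounded down to a whole number of chunks. With fsync=False the result
    mostly measures the OS page cache rather than the physical disk.
    """
    chunk = b"\0" * (chunk_mib * 1024 * 1024)
    n_chunks = max(1, size_mib // chunk_mib)
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(n_chunks):
                f.write(chunk)
            f.flush()
            if fsync:
                os.fsync(f.fileno())  # force data to the device before stopping the clock
        elapsed = time.perf_counter() - start
    finally:
        os.remove(path)
    return n_chunks * len(chunk) / elapsed / 1e9
```

<p>Passing a <code class="language-plaintext highlighter-rouge">directory</code> such as the projects folder measures the disk that actually holds the code, rather than the default temp directory, which on some Linux systems is RAM-backed tmpfs.</p>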

<p>Here is what that looks like in practice.</p>

<hr />

<h2 id="real-world-swarm-1-the-software-assembly-line">Real-World Swarm 1: The Software Assembly Line</h2>

<p><img src="/assets/images/shelldon-swarm.jpg" alt="Shelldon Swarm Assembly" /></p>

<p>Our engineering swarm doesn’t just write code; it operates like a rigorous factory floor. The core of the “Shelldon Swarm Protocol” (I named it on a whim) is an enforced separation of duties. <strong>Builders cannot architect, and builders definitely cannot deploy.</strong></p>

<h3 id="the-foundation-isolated-specs">The Foundation: Isolated Specs</h3>
<p>All active web codebases must live exclusively inside a projects folder in the user’s home directory, i.e. <code class="language-plaintext highlighter-rouge">/home/myuser/projects/</code>. No scattered files. Every project root must contain a <code class="language-plaintext highlighter-rouge">SPEC.md</code>. This is the single source of truth: every agent involved reads this file before executing anything. It keeps the swarm aligned, reduces hallucinations, and prevents work from drifting out of scope. It contains the elevator pitch, target user, market validation scores (out of 10), and the strict list of “Core Features (MVP)”.</p>
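<p>A pre-flight check on <code class="language-plaintext highlighter-rouge">SPEC.md</code> could look like the Python sketch below. The section headings, the <code class="language-plaintext highlighter-rouge">Score: N/10</code> line, and the helper names are assumptions about how such a file might be laid out, not the actual template.</p>

```python
import re
from pathlib import Path
from typing import Optional

# Hypothetical headings; adjust to the real SPEC.md template.
REQUIRED_SECTIONS = ["Elevator Pitch", "Target User", "Market Validation", "Core Features (MVP)"]
HEADING = re.compile(r"^#{1,6}[ \t]+(.+?)[ \t]*$", re.MULTILINE)
SCORE = re.compile(r"score:\s*(\d+)\s*/\s*10", re.IGNORECASE)


def in_projects_dir(project_root: Path, projects_dir: Path) -> bool:
    """True if the project lives under the shared projects folder (e.g. /home/myuser/projects)."""
    return project_root.resolve().is_relative_to(projects_dir.resolve())


def missing_sections(spec_text: str) -> list[str]:
    """Required sections with no matching Markdown heading (case-insensitive)."""
    headings = {m.group(1).lower() for m in HEADING.finditer(spec_text)}
    return [s for s in REQUIRED_SECTIONS if s.lower() not in headings]


def validation_score(spec_text: str) -> Optional[int]:
    """First 'Score: N/10' value found in the spec, or None if absent."""
    m = SCORE.search(spec_text)
    return int(m.group(1)) if m else None
```

<p>An agent would run these checks first and refuse to continue if the project sits outside the projects folder or any section is missing.</p>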

<h3 id="gate-1-the-pre-build-gono-go">Gate 1: The Pre-Build Go/No-Go</h3>
<p>Before a single line of code is written, the <code class="language-plaintext highlighter-rouge">SPEC.md</code> must pass a 4-step audit:</p>
<ol>
  <li><strong>PM Validation (Brainy / Looker):</strong> We draft the spec, analyze the market gaps, and score the product’s viability. Evelyn acts as the gatekeeper, making sure every idea is evaluated on real insights and business metrics.</li>
  <li><strong>Architecture Audit (Omega):</strong> Omega reviews the MVP to dictate the tech stack and routing architecture. He checks his <code class="language-plaintext highlighter-rouge">[x]</code> box if the plan is technically sound and avoids tech debt.</li>
  <li><strong>Security Audit (Norton):</strong> Norton reviews the threat model, data privacy risks, and abuse vectors (e.g., race conditions). He checks his <code class="language-plaintext highlighter-rouge">[x]</code> box if the security posture is safe.</li>
  <li><strong>Executive Sign-off (Evelyn):</strong> Evelyn acts as the proxy. Based on Omega and Norton’s audits, she makes the final Executive Go/No-Go decision.</li>
</ol>
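<p>Because each reviewer signs off by ticking a box, the gate itself can be a plain text check. The Python sketch below assumes a hypothetical Markdown checklist in <code class="language-plaintext highlighter-rouge">SPEC.md</code> (for example <code class="language-plaintext highlighter-rouge">- [x] Architecture Audit (Omega)</code>); the labels and function names are illustrative.</p>

```python
import re

GATE_1 = ("PM Validation", "Architecture Audit", "Security Audit", "Executive Sign-off")
CHECKBOX = re.compile(r"^[ \t]*[-*][ \t]+\[([ xX])\][ \t]*(.+?)[ \t]*$", re.MULTILINE)


def checked_items(spec_text: str) -> list[str]:
    """Labels of all ticked checkboxes, e.g. '- [x] Security Audit (Norton)'."""
    return [label for mark, label in CHECKBOX.findall(spec_text) if mark.lower() == "x"]


def gate_passed(spec_text: str, gate: tuple[str, ...] = GATE_1) -> bool:
    """True only if every gate item has a ticked box whose label starts with that item."""
    done = checked_items(spec_text)
    return all(any(label.startswith(item) for label in done) for item in gate)


SPEC = """
- [x] PM Validation (Brainy / Looker)
- [x] Architecture Audit (Omega)
- [ ] Security Audit (Norton)
- [x] Executive Sign-off (Evelyn)
"""
print(gate_passed(SPEC))  # False: Norton has not signed off
```

<p>A builder agent could call <code class="language-plaintext highlighter-rouge">gate_passed</code> before touching any file and stop when it returns <code class="language-plaintext highlighter-rouge">False</code>.</p>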

<h3 id="the-build-phase-strict-execution">The Build Phase (Strict Execution)</h3>
<p>Once Gate 1 is approved, the Developer Agent (<strong>Gorilla</strong> for Web, <strong>Ivy</strong> for iOS) spins up.</p>
<ul>
  <li><strong>The Gorilla Lock:</strong> The developer is hardcoded to refuse to touch the codebase if the Gate 1 checkboxes are missing.</li>
  <li><strong>No Scope Creep:</strong> The developer is strictly constrained to building only the bullet points listed under “CORE FEATURES (MVP)” in the <code class="language-plaintext highlighter-rouge">SPEC.md</code>. No hallucinated or invented features.</li>
</ul>
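<p>The scope rule can likewise be enforced mechanically: read the MVP bullets out of the spec and flag any proposed work item that is not on the list. This is a hypothetical Python sketch that assumes the MVP list is a Markdown bulleted list under its own heading; exact string matching is a deliberate simplification.</p>

```python
import re


def mvp_features(spec_text: str, heading: str = "Core Features (MVP)") -> list[str]:
    """Bullet items under the MVP heading (case-insensitive), up to the next heading."""
    features: list[str] = []
    inside = False
    for line in spec_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            inside = stripped.lstrip("#").strip().lower() == heading.lower()
            continue
        if inside:
            m = re.match(r"[-*]\s+(.+)", stripped)
            if m:
                features.append(m.group(1).strip())
    return features


def out_of_scope(proposed: list[str], spec_text: str) -> list[str]:
    """Proposed items that do not appear in the MVP list."""
    allowed = {f.lower() for f in mvp_features(spec_text)}
    return [p for p in proposed if p.lower() not in allowed]


SPEC = "## CORE FEATURES (MVP)\n- Email login\n- Create invoice\n## Later\n- Dark mode\n"
print(out_of_scope(["Email login", "Dark mode"], SPEC))  # ['Dark mode']
```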

<h3 id="gate-2-pre-deployment-uat--go-live">Gate 2: Pre-Deployment (UAT &amp; Go-Live)</h3>
<p>Once the developer finishes coding, they are locked out of triggering deployment scripts until Gate 2 is passed:</p>
<ol>
  <li><strong>User Acceptance Testing (Mother):</strong> Mother is triggered to act as the hostile end-user. She runs UX audits, tests edge cases, and tries to break the UI.</li>
  <li><strong>Release Sign-off (Kat):</strong> Kat reviews Mother’s UAT report. If it passes, Kat checks the final <code class="language-plaintext highlighter-rouge">[x]</code> box, stamping the project for DEPLOYMENT. If it fails, it gets kicked back to Gorilla for bug fixes.</li>
</ol>
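<p>Gate 2 reduces to a two-way branch on the UAT result. A minimal, hypothetical Python sketch (the report structure and return labels are assumptions):</p>

```python
from dataclasses import dataclass, field


@dataclass
class UatReport:
    """Findings from the UAT run; an empty list means nothing broke."""
    failed_checks: list[str] = field(default_factory=list)

    @property
    def passed(self) -> bool:
        return not self.failed_checks


def release_decision(report: UatReport) -> tuple[str, list[str]]:
    """Release step: stamp for deployment on a clean report, otherwise send the failures back."""
    if report.passed:
        return "DEPLOY", []
    return "BACK_TO_BUILDER", list(report.failed_checks)


print(release_decision(UatReport()))  # ('DEPLOY', [])
print(release_decision(UatReport(["Checkout button hidden on mobile"])))
```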

<p>Why this matters: This protocol ensures that compute resources (and your money) are only spent on validated, secure, and well-architected features, and that nothing ships to production without automated QA.</p>

<hr />

<h2 id="real-world-swarm-2-the-octonauts-swarm">Real-World Swarm 2: The Octonauts Swarm</h2>
<p><img src="/assets/images/octonauts-swarm.jpg" alt="Octonauts Swarm" /></p>

<p>What makes the Octonauts Swarm useful is not that it is a group of AIs improvising at once.</p>

<p>It works because every agent has a defined role, limited permissions, and a clear place in the chain of command.</p>

<p>Even in writing this post, the structure matters:</p>

<ul>
  <li><strong>Evelyn</strong> orchestrates.</li>
  <li><strong>Roy</strong> pressure-tests whether the argument is ambitious, sharp, and worth saying publicly.</li>
  <li><strong>Looker</strong> checks whether the market reality actually supports the thesis.</li>
  <li><strong>Deer</strong> shapes the draft into my voice — drawing on my previous blog posts and LinkedIn writing — and formats it to fit the narratives I want to tell.</li>
</ul>

<p>And I remain the human in control.
I review the draft.
I decide what stays.
I approve what gets published.</p>

<p>The same principle carries into product building:</p>

<ul>
  <li><strong>Roy</strong> challenges strategy.</li>
  <li><strong>Kat</strong> defines requirements.</li>
  <li><strong>Omega</strong> designs architecture.</li>
  <li><strong>Gorilla</strong> or <strong>Ivy</strong> implement.</li>
  <li><strong>Norton</strong> manages deployment.</li>
  <li><strong>Mother</strong> watches production.</li>
</ul>
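<p>In code, that chain of command is essentially a deny-by-default permission table. The action names below are hypothetical labels derived from the role list above, not a real OpenClaw permission model.</p>

```python
# Hypothetical permission table derived from the role list above.
PERMISSIONS = {
    "Roy": {"review_strategy"},
    "Kat": {"write_requirements"},
    "Omega": {"design_architecture"},
    "Gorilla": {"write_code"},
    "Ivy": {"write_code"},
    "Norton": {"deploy"},
    "Mother": {"read_production_metrics"},
}


def authorize(agent: str, action: str) -> bool:
    """Deny by default: unknown agents and unlisted actions are refused."""
    return action in PERMISSIONS.get(agent, set())


print(authorize("Gorilla", "write_code"), authorize("Gorilla", "deploy"))  # True False
```

<p>Denying by default means a new agent or a new action does nothing until someone explicitly grants it, which is the safer failure mode for a swarm.</p>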

<p>That is the real point:</p>

<p>AI becomes far more reliable when it behaves less like a lone genius and more like an organization — with structure, accountability, and decision gates.</p>

<hr />

<h2 id="the-ui-layer-orchestration-via-telegram-topics--cron-jobs">The UI Layer: Orchestration via Telegram Topics &amp; Cron Jobs</h2>
<p><img src="/assets/images/telegram-setup.jpg" alt="Telegram Setup" /></p>

<p>Having 20 agents is useless if the user experience is clunky. I don’t use a massive custom dashboard; I use Telegram. Specifically, a single Telegram Supergroup divided into distinct Topics (e.g., Blog, Dev, Stocks, Ideas, Work, Shopping).</p>

<p>This isn’t just for organization—it is a deliberate architecture choice for token optimization and context management:</p>

<ol>
  <li><strong>Topics as Context Boundaries:</strong> If I ask my stock agent (Angel) a question in the “Stocks” topic, the system doesn’t need to load the context of my recent “Blog” drafts or “Dev” commits. By strictly routing agents to specific topics, we maintain hyper-focused context windows. This dramatically reduces hallucinations and slashes token costs per message.</li>
  <li><strong>Asynchronous Learning via Cron Jobs:</strong> Notice the pinned message in the screenshot: <em>“I want to set up a morning brief. Every morning at 8:00 AM, send me a report here…”</em> Instead of burning expensive tokens asking an agent to browse the web during an active conversation, I use OpenClaw’s cron scheduler. The swarm runs automated background jobs while I sleep—scraping industry trends and summarizing them. They update their internal memory files directly.</li>
</ol>
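<p>The topic-as-context-boundary idea can be sketched as a lookup from topic to agent and memory file. In the Telegram Bot API, messages posted in a forum supergroup topic carry a <code class="language-plaintext highlighter-rouge">message_thread_id</code>, which a real router would key on; the Python sketch below keys on topic names for readability. Apart from Angel for Stocks, the agent assignments and file paths are hypothetical.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    agent: str
    memory_file: str  # the only memory loaded for this topic


# Hypothetical routes; only Angel (Stocks) comes from the post.
TOPIC_ROUTES = {
    "Stocks": Route("Angel", "memory/stocks.md"),
    "Blog": Route("Deer", "memory/blog.md"),
    "Dev": Route("Gorilla", "memory/dev.md"),
}
DEFAULT_ROUTE = Route("Evelyn", "memory/general.md")  # orchestrator handles everything else


def build_context(topic: str, memories: dict[str, str]) -> tuple[str, str]:
    """Return (agent, context), loading only the memory that belongs to this topic."""
    route = TOPIC_ROUTES.get(topic, DEFAULT_ROUTE)
    return route.agent, memories.get(route.memory_file, "")


memories = {"memory/stocks.md": "Watchlist notes", "memory/blog.md": "Draft outline"}
print(build_context("Stocks", memories))  # ('Angel', 'Watchlist notes')
```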

<p>By the time I wake up, the agents are already up to date on the day’s events. When I chat with them, they rely on this digested, compressed memory rather than performing expensive real-time web browsing.</p>
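<p>The morning brief in the pinned message corresponds to the cron expression <code class="language-plaintext highlighter-rouge">0 8 * * *</code> (minute 0, hour 8, every day). The Python sketch below shows, under assumed names, how the next run time is computed and how a job might append a compressed brief to an agent’s memory file; it is not OpenClaw’s scheduler.</p>

```python
from datetime import date, datetime, time, timedelta
from pathlib import Path


def next_daily_run(now: datetime, at: time = time(8, 0)) -> datetime:
    """Next occurrence of a daily job at `at`, matching cron '0 8 * * *' by default."""
    candidate = datetime.combine(now.date(), at, tzinfo=now.tzinfo)
    if candidate <= now:
        candidate += timedelta(days=1)
    return candidate


def append_brief(memory_path: Path, day: date, items: list[str]) -> None:
    """Append a dated, bulleted brief so later chats can read it instead of browsing live."""
    with memory_path.open("a", encoding="utf-8") as f:
        f.write(f"\n## Brief {day.isoformat()}\n")
        for item in items:
            f.write(f"- {item}\n")


print(next_daily_run(datetime(2026, 3, 20, 7, 30)))  # 2026-03-20 08:00:00
print(next_daily_run(datetime(2026, 3, 20, 9, 0)))   # 2026-03-21 08:00:00
```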

<h2 id="the-takeaway-control--scale">The Takeaway: Control = Scale</h2>

<p>Agents are not magic. They are software. And just like any complex distributed system, if you don’t engineer the architecture, the architecture will engineer you.</p>

<p>Taming the runaway AI developer isn’t about waiting for models to magically stop hallucinating. It’s about building protocols—like the Shelldon Swarm Protocol—that treat AI not as a standalone genius, but as an assembly line.</p>

<p>Define the roles. Restrict the permissions. Enforce the gates. That is how you turn a chaotic demo into production-ready leverage.</p>]]></content><author><name>Bryan Chua</name></author><category term="Tech" /><category term="AI" /><category term="Engineering" /><category term="Orchestration" /><category term="Swarm" /><category term="Architecture" /><summary type="html"><![CDATA[Most AI agents fail because they improvise. The Octonauts and Shelldon Swarms work because each agent has a clear role, limited permissions, and a defined place in the chain of command.]]></summary></entry><entry><title type="html">The Coding Agent Arms Race: Which AI Actually Belongs in Your Terminal?</title><link href="https://bryanchua.com/tech/2026/03/18/the-coding-agent-arms-race/" rel="alternate" type="text/html" title="The Coding Agent Arms Race: Which AI Actually Belongs in Your Terminal?" /><published>2026-03-18T00:00:00+00:00</published><updated>2026-03-18T00:00:00+00:00</updated><id>https://bryanchua.com/tech/2026/03/18/the-coding-agent-arms-race</id><content type="html" xml:base="https://bryanchua.com/tech/2026/03/18/the-coding-agent-arms-race/"><![CDATA[<p>I used to be a startup CTO, and I still evaluate developer tools through one lens: do they reduce friction and help teams move? Here’s my practical take on Codex, Claude Code, Gemini CLI, Gemini Code Assist, and Jules—and why I still personally prefer Codex.</p>

<p>I used to be a startup CTO.</p>

<p>That means I still look at tools the same way I did back then: not as demos, not as benchmark winners, and definitely not as toys for tech Twitter.</p>

<p>I look at them as leverage.</p>

<p>Do they help a team move faster? Do they reduce friction? Do they make it easier to go from idea to implementation without adding more process, more context switching, or more noise?</p>

<p>That’s why the current coding-agent wave matters.</p>

<p>We’ve moved past the autocomplete era. The real shift now is that every major AI lab wants to own the full developer workflow. Not just suggest code in an editor, but live in your terminal, inspect your repository, edit files, run tests, explain architecture, and increasingly act like a real software teammate.</p>

<p>Today, the major players are clear enough:</p>

<ul>
  <li><a href="https://openai.com/codex/">OpenAI Codex</a></li>
  <li><a href="https://www.anthropic.com/claude-code">Claude Code</a></li>
  <li><a href="https://github.com/google-gemini/gemini-cli">Gemini CLI</a></li>
  <li><a href="https://codeassist.google/">Gemini Code Assist</a></li>
  <li><a href="https://jules.google/">Jules</a></li>
</ul>

<p>That is exciting.</p>

<p>It is also a little ridiculous.</p>

<p>Because now every serious model company wants to become your coding interface.</p>

<p>And as with every platform shift, the market is quickly filling up with overlapping claims, half-true narratives, and too many people pretending one tool has already won.</p>

<p>It hasn’t.</p>

<h2 id="the-wrong-question">The wrong question</h2>

<p>The wrong question is: <strong>which coding agent is best?</strong></p>

<p>The better question is: <strong>best for whom, for what workflow, and at what stage?</strong></p>

<p>These tools are not identical. They have different personalities, different strengths, and different failure modes.</p>

<p>My own mental model is pretty simple:</p>

<ul>
  <li><strong>Codex</strong> is the builder</li>
  <li><strong>Claude Code</strong> is the thinker</li>
  <li><strong>Gemini</strong> is the broad-context platform play</li>
</ul>

<p>That is obviously an oversimplification.</p>

<p>But it is still useful.</p>

<h2 id="codex-the-one-i-reach-for-first">Codex: the one I reach for first</h2>

<p>Let me start with my bias.</p>

<p>I still personally prefer Codex.</p>

<p><img src="/assets/images/codex-usage-limit.jpg" alt="Codex Usage Dashboard" />
Picture 1 - Even on the ChatGPT Business Plan, I’m hitting my weekly Codex usage limit.</p>

<p>Not because I think it is objectively the best at everything. It isn’t.</p>

<p>I prefer it because it fits the way I like to work:</p>

<ul>
  <li>direct</li>
  <li>fast</li>
  <li>practical</li>
  <li>low friction</li>
  <li>execution-oriented</li>
</ul>

<p>Some tools want to discuss the problem.
Some tools want to admire the architecture.
Some tools want to prove how thoughtful they are before touching anything.</p>

<p>Codex usually just wants to move.</p>

<p>And in product-building environments, that matters more than people admit.</p>

<p>A lot of real work is not elegant greenfield architecture. It is:</p>

<ul>
  <li>prototyping new ideas quickly</li>
  <li>shipping internal tools</li>
  <li>patching things under time pressure</li>
  <li>turning rough intent into working output</li>
  <li>keeping momentum without losing a day to tooling overhead</li>
</ul>

<p>In those moments, I want a tool that helps me close the gap between thought and execution.</p>

<p>That is where Codex feels strongest.</p>

<h3 id="where-codex-is-strongest">Where Codex is strongest</h3>

<ul>
  <li>building quickly</li>
  <li>implementing scoped features</li>
  <li>founder-speed prototyping</li>
  <li>short iteration loops</li>
  <li>moving from idea to something usable fast</li>
</ul>

<h3 id="where-you-still-need-discipline">Where you still need discipline</h3>

<ul>
  <li>it can be overly confident</li>
  <li>it can move too fast for teams without review discipline</li>
  <li>it is not always the tool I would pick first for the deepest architectural reasoning</li>
</ul>

<p>But for builders, operators, and leaders who still like to get their hands dirty, it is a very compelling default.</p>

<p>I understand why developers such as Peter Steinberger prefer it too. Some tools just match your operating style.</p>

<h2 id="claude-code-the-strongest-reader-in-the-room">Claude Code: the strongest reader in the room</h2>

<p>If Codex is the builder, Claude Code is the thoughtful systems engineer.</p>

<p><img src="/assets/images/claude-pro-receipt.jpg" alt="Claude Pro Receipt" />
Picture 2 - Putting my money where my mouth is with Claude Pro to test Claude Code</p>

<p>Claude Code feels strongest when the task is not “build this now,” but rather:</p>

<ul>
  <li>understand this codebase</li>
  <li>trace the dependencies</li>
  <li>untangle this architecture</li>
  <li>review this carefully</li>
  <li>make a deep change without breaking everything</li>
</ul>

<p>This is where Claude is genuinely impressive.</p>

<p>It is often better at reading messy systems, holding intent across a large codebase, and reasoning before acting. That makes it extremely attractive for senior engineers, architects, and teams that spend a lot of time maintaining complexity rather than just creating new features.</p>

<h3 id="where-claude-code-is-strongest">Where Claude Code is strongest</h3>

<ul>
  <li>large refactors</li>
  <li>architecture reasoning</li>
  <li>code review</li>
  <li>legacy systems</li>
  <li>understanding intent before implementation</li>
</ul>

<h3 id="tradeoff">Tradeoff</h3>

<p>The same thing that makes Claude strong can also make it feel heavier.</p>

<p>Sometimes that is exactly what you want.</p>

<p>Sometimes you just want the thing shipped.</p>

<p>If your environment is mature, complex, or high-risk, Claude Code may be the safer first choice. If your environment is speed-sensitive and execution-heavy, it may occasionally feel like one layer too much.</p>

<h2 id="gemini-broad-context-big-ecosystem-less-clean-mental-model">Gemini: broad context, big ecosystem, less clean mental model</h2>

<p><img src="/assets/images/gemini-api-usage.jpg" alt="Gemini API Usage Dashboard" />
Picture 3 - Tracking heavy multimodal context windows via Gemini API</p>

<p>Google’s position is interesting because it is not really one product.</p>

<p>It is a family:</p>

<ul>
  <li>Gemini CLI</li>
  <li>Gemini Code Assist</li>
  <li>Jules</li>
</ul>

<p>That already tells you what Google is doing. This is not just a coding tool. It is a broad platform bet.</p>

<p>Google’s real strength here is context scale and ecosystem depth.</p>

<p>If your workflow spans large repositories, documentation, research, product context, and Google’s broader cloud stack, Gemini becomes compelling in a different way. It is less of a single sharp tool and more of an operating surface.</p>

<h3 id="where-gemini-is-strongest">Where Gemini is strongest</h3>

<ul>
  <li>large-context workflows</li>
  <li>code plus docs plus product context</li>
  <li>teams already deep in Google’s ecosystem</li>
  <li>organizations thinking beyond just terminal coding</li>
</ul>

<h3 id="tradeoff-1">Tradeoff</h3>

<p>The product story is still less clean than Codex or Claude Code.</p>

<p>When someone says “I use Codex” or “I use Claude Code,” I know roughly what they mean.</p>

<p>When someone says “I use Google’s coding stack,” I still need a follow-up question.</p>

<p>That does not make it weak. But it does make it less crisp.</p>

<p>Over time, that may change.</p>

<h2 id="so-which-one-is-best-for-which-person">So which one is best for which person?</h2>

<p>This is the part that matters.</p>

<h3 id="if-you-are-a-founder-ex-startup-cto-or-operator-who-still-builds">If you are a founder, ex-startup CTO, or operator who still builds</h3>

<p>Start with <strong>Codex</strong>.</p>

<p>Why?</p>

<p>Because speed compounds. Low friction compounds. If your day is a mix of product decisions, technical experimentation, quick implementation, and constant context switching, Codex is the tool most likely to help you move without adding drag.</p>

<h3 id="if-you-are-a-senior-engineer-architect-or-technical-lead-managing-complexity">If you are a senior engineer, architect, or technical lead managing complexity</h3>

<p>Start with <strong>Claude Code</strong>.</p>

<p>Why?</p>

<p>Because once systems become large and fragile, understanding matters more than enthusiasm. Claude Code is often the better fit when the cost of misunderstanding the system is high.</p>

<h3 id="if-you-are-an-enterprise-leader-or-deeply-aligned-with-googles-ecosystem">If you are an enterprise leader or deeply aligned with Google’s ecosystem</h3>

<p>Take a serious look at <strong>Gemini CLI, Gemini Code Assist, and Jules</strong>.</p>

<p>Why?</p>

<p>Because large context, integrated workflows, and ecosystem fit matter more at enterprise scale than most teams realize.</p>

<h3 id="if-you-are-managing-a-team">If you are managing a team</h3>

<p>Do not force one tool on everyone too early.</p>

<p>This is where many companies will make the wrong call.</p>

<p>These are not just model decisions. They are workflow decisions. One tool may be better for prototyping. Another may be better for refactoring. Another may be better for code review, technical investigation, or documentation-heavy engineering.</p>

<p>Standardizing too early is a good way to reduce optionality before you actually understand how your team works best.</p>

<h2 id="my-own-conclusion">My own conclusion</h2>

<p>We now live in a world where every major AI company wants to become your developer interface.</p>

<p>Not just your assistant.</p>

<p>Your interface.</p>

<p>That is a real shift.</p>

<p>And it means technical leaders need better judgment, not just better prompts.</p>

<p>You cannot just adopt the loudest tool.
You cannot just follow benchmark screenshots.
You cannot just assume the smartest demo translates into the highest team velocity.</p>

<p>You have to ask:</p>

<ul>
  <li>does this help us move?</li>
  <li>does this improve judgment or just increase output?</li>
  <li>does this reduce friction or create another layer of workflow noise?</li>
  <li>does this fit how we actually build?</li>
</ul>

<p>For me, today, the answer is still Codex.</p>

<p>Not because it wins every category.</p>

<p>But because it fits the way I like to work.</p>

<p>It is fast. It is practical. It gets out of the way.</p>

<p>And after years of building products, leading teams, and trying to keep execution honest, I have learned that the best tools are usually not the ones that impress you the most in a demo.</p>

<p>They are the ones that remove drag.</p>

<p>There are now many tools for coding.</p>

<p>That isn’t the problem.</p>

<p>The real question is whether you know which one belongs in your hands.</p>]]></content><author><name>Bryan Chua</name></author><category term="Tech" /><category term="AI" /><category term="Engineering" /><category term="Developer Tools" /><category term="Startups" /><category term="Productivity" /><summary type="html"><![CDATA[I used to be a startup CTO, and I still evaluate developer tools through one lens: do they reduce friction and help teams move? Here’s my practical take on Codex, Claude Code, Gemini CLI, Gemini Code Assist, and Jules—and why I still personally prefer Codex.]]></summary></entry><entry><title type="html">The Transcription Middleman is Dead. Here’s How It Changes RAG Forever.</title><link href="https://bryanchua.com/tech/2026/03/13/the-transcription-middleman-is-dead/" rel="alternate" type="text/html" title="The Transcription Middleman is Dead. Here’s How It Changes RAG Forever." /><published>2026-03-13T00:00:00+00:00</published><updated>2026-03-13T00:00:00+00:00</updated><id>https://bryanchua.com/tech/2026/03/13/the-transcription-middleman-is-dead</id><content type="html" xml:base="https://bryanchua.com/tech/2026/03/13/the-transcription-middleman-is-dead/"><![CDATA[<p>A technology leader’s reflection on the recent shift in AI architecture—moving from vanity metric scaling to true omni-modal problem solving. We’ve spent the last few years treating text as the universal language of AI. It wasn’t. It was just a limitation we had to accept.</p>

<h2 id="the-high-cost-of-the-frankenstein-pipeline">The High Cost of the “Frankenstein” Pipeline</h2>

<p><img src="/assets/images/frankenstein-pipeline.jpg" alt="The Frankenstein Pipeline" />
Picture 1 - The High Cost of the “Frankenstein” Pipeline</p>

<p>If you look at how we’ve built enterprise RAG (Retrieval-Augmented Generation) applications up until this month, the architecture was a mess. If an engineering team wanted to search a library of Zoom recordings or PDFs, they built a Frankenstein pipeline: an OCR microservice for documents, a Speech-to-Text model like Whisper for audio, and Image Captioning for photos. We did all this just to force messy, real-world data into a text embedding model.</p>

<p>But here is the hard truth: text loses context. A transcript strips away the sigh of frustration in an audio clip. A caption loses the trendline on a chart. We were trading nuance for keyword search. When optimizing for “Time to Solve Problems,” adding three middleman models before you even query your database is a losing battle.</p>

<h2 id="rate-of-change-gemini-embedding-2">Rate of Change: Gemini Embedding 2</h2>

<p><img src="/assets/images/before-and-after-diagram.jpg" alt="Before and After Translator" />
Picture 2 - Before and After: Translating images to text vs Omni-Modal Embeddings</p>

<p>The release of <a href="https://developers.googleblog.com/en/introducing-gemini-embedding-2-a-multimodal-embedding-model/">Google’s Gemini Embedding 2 in March 2026</a> resets this entirely. It is a natively multimodal model. It maps text, video, audio, and images into a single mathematical vector space.</p>

<ul>
  <li>It listens natively. No transcription needed. It understands tone.</li>
  <li>It sees natively. You can embed a raw 120-second MP4 or an unparsed PDF report directly.</li>
  <li>It simplifies execution. One API endpoint replaces four microservices.</li>
</ul>

<p>This isn’t just a bump in context window limits (though it does boast an 8,192 token capacity). This is a structural simplification. It allows cross-modal search natively: a user searches with a text query like “a dog barking in a park” and the system returns the actual MP3 or video clip without needing text metadata.</p>
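<p>Once every modality lands in the same vector space, retrieval is ordinary nearest-neighbour search. The Python sketch below uses toy three-dimensional vectors as stand-ins; it does not call the Gemini API, and the file names and numbers are invented for illustration. Real embeddings would come from the embedding endpoint and have many more dimensions.</p>

```python
import math

Vector = list[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query: Vector, index: dict[str, Vector], k: int = 3) -> list[tuple[str, float]]:
    """Rank stored items of any modality by similarity to the query vector."""
    scored = [(item_id, cosine(query, vec)) for item_id, vec in index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


index = {
    "park_dog_bark.mp3": [0.9, 0.1, 0.0],
    "quarterly_report.pdf": [0.0, 0.2, 0.95],
    "beach_sunset.jpg": [0.1, 0.9, 0.1],
}
query = [0.85, 0.15, 0.05]  # stand-in for the embedding of "a dog barking in a park"
print(search(query, index, k=1)[0][0])  # park_dog_bark.mp3
```

<p>Nothing in the search step knows whether an item was audio, a PDF, or an image; that indifference is exactly what a shared embedding space buys you.</p>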

<h2 id="what-this-means-for-leaders">What This Means for Leaders</h2>

<p>I’ve always believed that how you do anything is how you do everything. If your data architecture is overly complicated and noisy, your product outcomes will reflect that friction. Focus is a kindness to our engineering teams and a duty to the customers who rely on us.</p>

<p>By removing the “transcription middlemen” from our workflows, we give our teams their time back. We can focus on compounding the right habits—building products that actually understand human intent, creating value, and leaving places better than we found them. The tech industry moves fast, but the companies that will win are the ones that use these leaps not just for vanity AI features, but to genuinely reduce the time it takes to solve a real human problem.</p>

<h2 id="the-caveat-the-walled-garden">The Caveat: The Walled Garden</h2>

<p>Unlike the open-source transparency we saw from players like DeepSeek over the last year, Google is keeping this omni-modal embedding model tightly locked behind their API moat. It is completely closed-weight and exclusive to the Google ecosystem—specifically accessible via Google AI Studio and Vertex AI for enterprise deployments.</p>

<p>For enterprises, this means committing to the Google Cloud infrastructure to leverage this specific architectural advantage. But for organizations already operating at scale, the reduction in pipeline friction and latency is often well worth the admission price.</p>

<p>If you are exploring how to modernize your data pipelines and move beyond the transcription middleman, my team at <a href="https://www.gopomelo.com">GoPomelo</a>—a Google Cloud Premier Partner—is actively helping organizations map out these new architectures. Feel free to reach out if you want to see what this looks like in practice.</p>]]></content><author><name>Bryan Chua</name></author><category term="Tech" /><category term="AI" /><category term="Engineering" /><category term="Architecture" /><category term="Google" /><summary type="html"><![CDATA[A technology leader's reflection on the recent shift in AI architecture—moving from vanity metric scaling to true omni-modal problem solving. We've spent the last few years treating text as the universal language of AI. It wasn't. It was just a limitation we had to accept.]]></summary></entry><entry><title type="html">Leadership, Family, and Focus After 2025 - What I Will Carry Forward</title><link href="https://bryanchua.com/personal/2026/01/05/leadership-friendship-family-after-2025/" rel="alternate" type="text/html" title="Leadership, Family, and Focus After 2025 - What I Will Carry Forward" /><published>2026-01-05T00:00:00+00:00</published><updated>2026-01-05T00:00:00+00:00</updated><id>https://bryanchua.com/personal/2026/01/05/leadership-friendship-family-after-2025</id><content type="html" xml:base="https://bryanchua.com/personal/2026/01/05/leadership-friendship-family-after-2025/"><![CDATA[<p><img src="/assets/images/reflection-2025-beijing-sunset.jpg" alt="Sunset at Beijing after work" />
Picture 1 - Sunset at Beijing after work</p>

<h2 id="how-was-2025">How was 2025</h2>

<p>How much have I grown this year? Many people ask this question, and it is bigger than it looks. Real growth needs context: personal &amp; professional goals, timelines, honest conversations with yourself, and the messy trail of experiments that didn’t work. My founder and entrepreneur friends think about growth constantly; many others check in once a year with their New Year resolutions. I live closer to the former camp. I measure impact, both business impact and the next order of magnitude: if we’re at $1M ARR, what would it take to reach $10M? If we serve 1,000 customers, how do we earn the right to serve 10,000? Still, I’ve learned that “net worth” and “follower count” don’t say much about the life I actually want. Two lenses helped me far more in 2025: rate of change and time to solve problems.</p>

<h2 id="rate-of-change">Rate of Change</h2>

<p>Every three to five years, all of us become different people: new problems, new friends, new family members, new teams, new companies, new seasons of life. The real question is whether that change trends positive. I try to optimize &amp; improve my baseline month over month, week over week, day over day, to be a better leader, a better teammate, a better father &amp; a better husband. I set a 30-minute calendar reminder every Monday evening to reflect and plan. This mindset forces me to think ahead: am I behaving today like the role model I want my children to mirror? Am I a role model to the people I am in charge of? Am I making decisions my future self would be proud to inherit? When I frame growth as compounding behavior rather than grand outcomes, I’m less distracted by applause and more focused on habits that last.</p>

<p><img src="/assets/images/reflection-2025-watching-family-at-hotel.jpg" alt="Watching my family from Beijing hotel" />
Picture 2 - Watching my family from Beijing hotel</p>

<h2 id="time-to-solve-problems">Time to Solve Problems</h2>

<p>I anchor most big decisions in three domains: career, finance, and family. If I truly want something, it’s almost always because it serves one or more of these. Where do I want to stand in society? How am I contributing to my company, my family and my community? Where should my wife and I build a home for our kids? What does “enough” look like financially for all of us? In 2025 I was hospitalized twice, once seriously enough that the rush to A&amp;E felt like it might be the end. People ask why “health” isn’t a named pillar. The honest answer: I’ve often traded energy and time to create value for the people I care about. This year reminded me that time is still the ultimate constraint, and that what I choose to solve &amp; work on matters more than ever. Time is against everyone, and because I do not have enough of it, I reserve my time for the right purposes and the right people: family, friends and colleagues.</p>

<h2 id="options-and-doors">Options and Doors</h2>

<p>The older I get, the more I realize how many “impossible” things simply needed time, people, and access. I meet more builders, learn faster, and can marshal resources I didn’t have years ago. At the same time, constraints show up: I’m not 20 anymore. So I ask myself three questions:</p>
<ol>
  <li>What options did I have five years ago that I don’t have now?</li>
  <li>What options do I have now that I didn’t have five years ago?</li>
  <li>What options do I not have now that I will have five years from now?</li>
</ol>

<p>Opportunities are doors. Some shut. New ones open. Some friends and family members asked me why I didn’t get a proper job when I was building <a href="https://www.shopback.sg/">my own company</a> at Starbucks. Investors rejected my business ideas because they were not “innovative enough” or “big enough”. The <em>real</em> work is choosing the right door at the right time, having the courage to walk through it, and staying with it. Ultimately, nobody was forcing me to choose; I picked the door closest to one of my three domains.</p>

<p><img src="/assets/images/reflection-2025-airport-board.jpg" alt="Airport dashboard showing airlines cancelled" />
Picture 3 - Airport dashboard showing airlines cancelled</p>

<h2 id="what-2025-taught-me">What 2025 Taught Me</h2>

<p>Professionally, 2025 was a breaking point for many businesses. Layoffs and closures were <a href="https://www.cnbc.com/2025/12/21/ai-job-cuts-amazon-microsoft-and-more-cite-ai-for-2025-layoffs.html">real</a>. I’m grateful <a href="https://www.gopomelo.com/">ours survived</a> and, in several places, thrived. After a full year of traveling for business, I learned that presence is not a soft skill; it’s a strategic “energy”. Being physically and emotionally present for teammates in tough meetings, for customers when the stakes are high, and for my family at the end of the day changes outcomes: real, impactful outcomes. It builds trust that no dashboard can fully capture, and respect and confidence that no report will ever document. I am thankful to be working very closely with some of the smartest engineers &amp; most inspiring leaders.</p>

<p><img src="/assets/images/reflection-2025-air-tickets.jpg" alt="All of my air tickets from 2025" />
Picture 4 - All of my air tickets from 2025</p>

<p>On the personal side, I reset some priorities in mid-2025. I was also humbled by how fragile life can be. Viruses, fatigue, random bad luck: none of us are immune. That perspective clarified which fires are worth fighting and which frictions are simply noise. Not everything deserves my attention. Focus is a kindness to myself and a duty to the people who rely on me. In the last few days of 2025, my family and I ended the year with a simple staycation: sun, sand, and hours in the pool with the kids. Those moments are irreplaceable. The laughter, the salt in the air, the tiny hands tugging me back to the water: memories I can’t “make up” later. Children grow up so fast, and I am grateful to have a very kind and accommodating wife. I want our children to internalize a few simple principles: how you do anything is how you do everything; be useful; create value; leave places better than you found them. The best way to teach that is to live it at home and at work.</p>

<p><img src="/assets/images/reflection-2025-staycation.jpg" alt="Staycation at the end of 2025" />
Picture 5 - Reflection from the hotel, staycation in Singapore</p>

<h2 id="2026-looking-ahead">2026 Looking Ahead</h2>

<p>If 2025 asked hard questions, 2026 is my answer. I’m entering the new year more certain about the kind of leader, colleague, father, and friend I want to be, and more committed to compounding the right habits. I’ll keep optimizing for rate of change and time to solve meaningful problems. I’ll keep choosing doors that align with the life I’m building.</p>

<p>Thank you, 2025, for the lessons, the scares, the grace, and the growth. Hello, 2026 - I’m ready.</p>

<p>P.S. I do not publicize my blog anywhere because I am writing for myself. If you read this far, do come and share your thoughts, reflections and plans with me.</p>]]></content><author><name>Bryan Chua</name></author><category term="Personal" /><category term="Reflection" /><category term="Leadership" /><category term="Family" /><category term="Priorities" /><summary type="html"><![CDATA[A technology leader's 2025 reflection on growth beyond vanity metrics — rate of change, time to solve, presence at home and a focused sprint into 2026. He does not measure growth by net worth or follower counts. After some hard pivots, leadership stretch, a couple of hospital scares, and the reminder that presence at home matters more than any metric, he is closing the year grateful and clearer. Many doors have closed, better ones opened. In 2026, he is sprinting — building where it counts, governing his time, and choosing the right doors on purpose.]]></summary></entry><entry><title type="html">Decoding DeepSeek: The Engineering Behind V3, R1, and Open-Source AI</title><link href="https://bryanchua.com/tech/2025/06/05/decoding-deepseek-ai/" rel="alternate" type="text/html" title="Decoding DeepSeek: The Engineering Behind V3, R1, and Open-Source AI" /><published>2025-06-05T00:00:00+00:00</published><updated>2025-06-05T00:00:00+00:00</updated><id>https://bryanchua.com/tech/2025/06/05/decoding-deepseek-ai</id><content type="html" xml:base="https://bryanchua.com/tech/2025/06/05/decoding-deepseek-ai/"><![CDATA[<p>I’m on holiday for a few days, taking the break as an opportunity to delve deeper into DeepSeek-AI’s technical papers, specifically the <a href="https://arxiv.org/pdf/2412.19437">DeepSeek-V3 technical paper</a> and <a href="https://arxiv.org/pdf/2405.04434">DeepSeek-V2 technical paper</a>. In this blog, I’ll share my thoughts, what I learned, and the technical aspects I found most interesting.
I’ve organized the sections below to assess DeepSeek’s core advancements, evaluate its reasoning-focused model, and explore what these developments mean for AI strategies and startup opportunities.</p>

<blockquote>
  <p>DeepSeek is a Chinese artificial intelligence (AI) company that develops and releases open-source large language models (LLMs). Founded in 2023 by <a href="https://en.wikipedia.org/wiki/Liang_Wenfeng">Liang Wenfeng</a>, DeepSeek gained significant attention when its chatbot, powered by DeepSeek-R1, was recognized for <a href="https://economictimes.indiatimes.com/magazines/panache/deepseek-or-chatgpt-a-price-to-performance-comparison-what-you-need-to-know/articleshow/117636306.cms">its performance and cost-effectiveness</a> compared to competitors like OpenAI’s ChatGPT.</p>
</blockquote>

<h2 id="tldr">TLDR</h2>

<p>DeepSeek-V3 is an open-weight mixture-of-experts language model with 671 billion parameters, activating 37 billion per token, and employs native FP8 training to deliver performance comparable to proprietary models like GPT-4o, while using just 2.788 million H800 GPU hours for full training. DeepSeek-R1, released in January 2025, builds on V3 by integrating reinforcement learning (RL) techniques, specifically Group Relative Policy Optimization (GRPO), to specialize in chain-of-thought reasoning and achieve benchmarks on par with OpenAI’s o1; its precursor, R1-zero, learned to reason without relying on any human-annotated reasoning examples. V3’s multi-head latent attention (MLA), introduced in V2, compresses the KV cache (a 93.3% reduction and 5.76× higher maximum generation throughput in the V2 paper), while its multi-token prediction (MTP) densifies training signals for improved data efficiency and can speed up inference through speculative decoding. DeepSeek’s tight integration with NVIDIA hardware treats thousands of GPUs as a unified system, reducing idle time and maximizing utilization, while higher-precision accumulation keeps FP8 training numerically stable. Although OpenAI’s o3-mini surpassed R1 on several reasoning benchmarks about two weeks after R1’s debut, DeepSeek generated substantial hype due to its open accessibility, freely downloadable models, and the reproducibility of its methods, evidenced by a UC Berkeley lab replicating R1-zero techniques on a smaller model for just $30. These developments underscore a shift toward more democratized, cost-efficient AI, setting the stage for B2C and B2B innovation at reduced budgets and signaling an opportune moment to launch AI-focused startups.</p>

<blockquote>
  <p>The best time to build an AI startup is arguably now, as barriers to entry continue to fall and foundational models like DeepSeek-V3 and R1 set new benchmarks for open collaboration.</p>
</blockquote>

<h2 id="deepseek-and-its-early-hype-at-2025">DeepSeek and its early hype in 2025</h2>

<p>In early 2025, DeepSeek’s rapid ascent through the AI landscape ignited a frenzy of excitement, and anxiety, across the technology and investment communities. Media outlets and social feeds were ablaze with sensational headlines about the model’s open-source weights, sub-$6 million training cost, and benchmark-beating performance. Venture capitalists rushed to back startups leveraging DeepSeek’s architecture, driving up valuations and deal volumes in a matter of weeks. Its mobile app also reached <a href="https://tech.co/news/apple-google-play-store-deepseek-top">first place on both Apple’s App Store and Google Play</a>. At the same time, traditional AI incumbents and their investors faced palpable panic, scrambling to reassure stakeholders that proprietary models and closed-weight strategies would maintain their competitive moats.</p>

<p>Despite the hype, discerning technology leaders recognized that separating lasting innovations from short-lived buzz was critical.</p>

<blockquote>
  <p>…models like LLaMA do not use MoE and must activate all 405 billion parameters for each token, leading to 11× more computation per inference step.</p>
</blockquote>

<!-- more -->

<h2 id="deepseek-v3-architecture-and-innovations">DeepSeek V3 Architecture and Innovations</h2>

<p>One notable difference to highlight is that there are two distinct AI models: DeepSeek-V3 and DeepSeek-R1. DeepSeek-V3 is a general-purpose large model, comparable to OpenAI’s general-purpose <a href="https://openai.com/index/hello-gpt-4o/">GPT-4o</a>. Released at the end of January 2025, DeepSeek-R1 is a reasoning model that applies various algorithmic improvements to optimize its reasoning capabilities, and its performance is comparable with <a href="https://openai.com/index/o1-and-new-tools-for-developers/">OpenAI’s o1</a>. Most of the remarkable technical performance and efficiency gains were actually discussed first in the DeepSeek-V2 technical paper and the <a href="https://arxiv.org/pdf/2402.03300">DeepSeekMath paper</a> (published in Feb 2024). DeepSeek-V3 stitches many engineering techniques together, primarily to deliver compute and training efficiency. Let’s explore these engineering techniques below:</p>

<p><img src="/assets/images/deepseek-v2-architecture.png" alt="Architecture of DeepSeek-V2" />
Figure 1 - illustration of the Architecture of DeepSeek-V2 found in the <a href="https://arxiv.org/pdf/2405.04434">technical paper</a></p>

<h3 id="mixture-of-experts-and-activation-efficiency">Mixture-of-Experts and Activation Efficiency</h3>

<p>While MoE isn’t a new concept, it is the core of DeepSeek-V3’s design: its 671 billion total parameters are spread across expert subnetworks (each MoE layer has 1 shared expert and 256 routed experts, of which 8 are selected per token), so only about 37 billion parameters are activated per token, drastically reducing compute compared to dense models of similar size. In contrast, models like LLaMA do not use MoE and must activate all 405 billion parameters for each token, leading to 11× more computation per inference step. Efficient MoE training at this scale is notoriously difficult due to load-balancing challenges; DeepSeek introduces an auxiliary-loss-free strategy that balances expert load by adjusting a per-expert bias on the routing scores instead of relying on a large auxiliary loss, keeping GPU utilization consistent.</p>
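<p>To make the routing idea concrete, here is a minimal Python sketch of top-k expert selection for a single token. Following the auxiliary-loss-free idea described in the DeepSeek-V3 paper, a per-expert bias only influences <em>which</em> experts are chosen, while the gating weights come from the unbiased scores; after each step, the bias is nudged down for overloaded experts and up for underloaded ones. The function names, the plain-list representation, the tiny expert count, and the step size <code>gamma</code> are illustrative assumptions, not DeepSeek’s implementation (V3 itself uses 256 routed experts per MoE layer with 8 selected per token).</p>

```python
def route_token(affinities, bias, k):
    """Pick the top-k experts for one token and return {expert_index: gate_weight}.

    affinities: per-expert affinity scores for this token (assumed non-negative).
    bias: per-expert bias used ONLY for selection (auxiliary-loss-free balancing).
    Gate weights come from the unbiased affinities, normalized over the chosen experts.
    """
    biased = [s + b for s, b in zip(affinities, bias)]
    chosen = sorted(range(len(affinities)), key=lambda i: biased[i], reverse=True)[:k]
    total = sum(affinities[i] for i in chosen)
    return {i: affinities[i] / total for i in chosen}


def update_bias(bias, expert_load, gamma):
    """Lower the bias of overloaded experts and raise it for underloaded ones by gamma."""
    mean_load = sum(expert_load) / len(expert_load)
    new_bias = []
    for b, load in zip(bias, expert_load):
        if load > mean_load:
            new_bias.append(b - gamma)
        elif load < mean_load:
            new_bias.append(b + gamma)
        else:
            new_bias.append(b)
    return new_bias


affinities = [0.9, 0.1, 0.5, 0.3]
print(route_token(affinities, [0.0, 0.0, 0.0, 0.0], k=2))   # experts 0 and 2
print(route_token(affinities, [-1.0, 0.0, 0.0, 0.0], k=2))  # experts 2 and 3
print(update_bias([0.0, 0.0, 0.0], expert_load=[10, 2, 3], gamma=0.1))  # [-0.1, 0.1, 0.1]

# Dense 405B parameters vs about 37B active parameters per token.
compute_ratio = 405 / 37  # roughly 10.9, i.e. the ~11x quoted above
```

<p>In the second call, a large negative bias on expert 0 diverts the token to the next-best experts even though expert 0 has the highest raw affinity, which is exactly the lever the bias update pulls to spread load. The last line simply reproduces the roughly 11× dense-versus-active compute ratio quoted above.</p>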

<h3 id="native-fp8-training-with-accumulation-fixes">Native FP8 Training with Accumulation Fixes</h3>

<p>DeepSeek-V3 trains natively in 8-bit floating point (FP8) for most of its compute-heavy matrix multiplications instead of BF16/FP16 or FP32, increasing effective FLOPS (floating point operations per second) on FP8-capable tensor cores while halving memory consumption relative to FP16. Because FP8 matrix multiplications on H800 tensor cores accumulate with limited precision (the paper observes only about 14 bits retained), small numerical errors can pile up over long inner dimensions. To prevent this, DeepSeek engineers periodically promote partial sums to FP32 on CUDA cores (every 128 elements), combined with fine-grained scaling of activations and weights. This technique, often summarized as the “FP8 accumulation fix,” preserves model quality while enabling thousands of GPUs to work in concert without irrecoverable training instability. With this method, V3’s full training consumed only 2.788 million H800 GPU hours, equivalent to a <a href="https://blog.convogrid.ai/2025/02/03/deepseek-v3-a-game-changer-in-a-i-heres-why-it-matters/">headline-grabbing</a> $5.576 million at an assumed rental price of $2 per GPU hour, and yet maintained stable loss curves throughout, never requiring training rollbacks.</p>
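<p>The accumulation problem is easy to see in a toy emulation. The sketch below is not real FP8 arithmetic: it models a narrow accumulator by rounding every partial sum to a fixed number of significant bits (8 here, exaggerated for visibility; the V3 paper reports that FP8 accumulation on H800 tensor cores retains about 14 bits), and compares it with flushing each block of 128 partial sums into a full-precision total, mirroring the promotion interval the paper describes. Python’s 64-bit float stands in for FP32, and all function names are hypothetical.</p>

```python
import math


def round_mantissa(x: float, mantissa_bits: int) -> float:
    """Round x to `mantissa_bits` significant bits (a toy stand-in for a narrow accumulator)."""
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)  # x == m * 2**e with 0.5 <= |m| < 1
    scale = 2 ** mantissa_bits
    return math.ldexp(round(m * scale) / scale, e)


def low_precision_sum(values, acc_bits):
    """Accumulate everything in the narrow accumulator: small addends eventually round away."""
    acc = 0.0
    for v in values:
        acc = round_mantissa(acc + v, acc_bits)
    return acc


def promoted_sum(values, acc_bits, interval):
    """Accumulate `interval` elements in the narrow accumulator, then flush each partial
    sum into a full-precision running total (Python float stands in for FP32)."""
    total = 0.0
    for start in range(0, len(values), interval):
        total += low_precision_sum(values[start:start + interval], acc_bits)
    return total


values = [1.0] * 1024
naive = low_precision_sum(values, acc_bits=8)              # stalls at 256.0
promoted = promoted_sum(values, acc_bits=8, interval=128)  # exact: 1024.0
print(naive, promoted)
```

<p>Summing 1,024 ones, the narrow accumulator stalls at 256 because each further +1 is rounded away, while the promoted version stays exact. The same effect, at a smaller scale, is what periodic FP32 promotion guards against in long FP8 dot products.</p>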

<p><img src="/assets/images/deepseek-multi-head-attention.png" alt="DeepSeek-V2 - Multi-Head Attention (MHA)" />
Figure 2 - illustration of Multi-Head Attention (MHA), Grouped-Query Attention (GQA), Multi-Query Attention (MQA), and Multi-head Latent Attention (MLA)</p>

<h3 id="multi-head-latent-attention-mla-and-kv-cache-compression">Multi-Head Latent Attention (MLA) and KV Cache Compression</h3>

<p>A significant bottleneck for large LLM inference is the key-value (KV) cache stored in VRAM, which grows linearly with sequence length and batch size and can quickly dominate GPU memory. First introduced in the DeepSeek-V2 <a href="https://arxiv.org/pdf/2405.04434">technical paper</a> and validated in <a href="https://huggingface.co/deepseek-ai/DeepSeek-V2">DeepSeek-V2</a> (May 2024), the MLA mechanism used by DeepSeek-V3 compresses keys and values into a low-rank latent representation and reconstructs them on demand. In the V2 paper, this shrank KV cache storage by 93.3% and raised maximum generation throughput by 5.76× compared with the dense DeepSeek 67B, demonstrating how latent space compression can circumvent VRAM constraints on massive MoE architectures.</p>
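<p>A minimal sketch of the core MLA idea, assuming toy dimensions and plain Python lists: each token’s hidden state is down-projected into a small latent vector, only that latent is cached, and keys and values are up-projected from it when attention needs them. It deliberately omits the decoupled rotary-position key and the multi-head split, and the projection matrices are made-up numbers. The helper at the end counts cached elements per token per layer using the dimensions reported in the DeepSeek-V2 paper (128 heads of size 128, a 512-dimensional latent, and a 64-dimensional rotary key). Note that this compares MLA with standard multi-head attention, so it is not the same baseline as the 93.3% figure, which was measured against DeepSeek 67B.</p>

```python
def matvec(matrix, vec):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def mla_compress(h, w_down):
    """Project a token's hidden state into the small latent c_kv; only c_kv is cached."""
    return matvec(w_down, h)


def mla_expand(c_kv, w_up_k, w_up_v):
    """Rebuild keys and values from the cached latent when attention needs them."""
    return matvec(w_up_k, c_kv), matvec(w_up_v, c_kv)


def kv_cache_elements(n_heads, d_head, d_latent=None, d_rope=0):
    """Cached elements per token per layer: standard MHA stores K and V for every head,
    MLA stores one latent vector plus a small decoupled rotary key."""
    if d_latent is None:
        return 2 * n_heads * d_head
    return d_latent + d_rope


# Toy example: a 4-dim hidden state compressed to a 2-dim latent (hypothetical weights).
h = [1.0, 2.0, 3.0, 4.0]
c_kv = mla_compress(h, [[1, 0, 0, 0], [0, 0, 0, 1]])
k, v = mla_expand(c_kv, [[1, 0], [0, 1], [1, 1]], [[2, 0], [0, 0.5]])
print(c_kv, k, v)  # [1.0, 4.0] [1.0, 4.0, 5.0] [2.0, 2.0]

# Dimensions reported in the DeepSeek-V2 paper.
mha = kv_cache_elements(128, 128)            # 32768
mla = kv_cache_elements(128, 128, 512, 64)   # 576
```

<p>With those dimensions, standard MHA caches 32,768 elements per token per layer versus 576 for MLA, which is why the latent cache leaves so much more VRAM for longer sequences and larger batches.</p>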

<p><img src="/assets/images/deepseek-multi-token-prediction.png" alt="DeepSeek-V3 Multi-Token Prediction (MTP)" />
Figure 3 - illustration of the Multi-Token Prediction (MTP) implementation. Anticipating future tokens at each step densifies the training signal, providing more feedback per step for better data efficiency, and improves representation planning, allowing the model to pre-plan sequences for smoother, more coherent outputs.</p>

<h3 id="multi-token-prediction-mtp">Multi-Token Prediction (MTP)</h3>

<p>Whereas standard causal LLMs are trained to predict only the next token, DeepSeek-V3 adds MTP modules that sequentially predict additional future tokens (one extra token in V3’s configuration) while keeping the full causal chain at each prediction depth. This densifies the training signal, providing richer feedback per training step, which can improve data efficiency and may encourage the model to pre-plan its representations for the tokens that follow. At inference time the MTP modules can simply be discarded, or repurposed for speculative decoding, in which the model drafts extra tokens and then verifies them, reducing sequential processing steps and boosting throughput.</p>
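<p>The difference in training signal can be sketched in a few lines of Python. For each prediction depth, every position gets an extra target further ahead, and the MTP losses are averaged and added to the main next-token loss with a weighting factor, which is how the DeepSeek-V3 paper combines them. The function names and the value <code>lam=0.3</code> are illustrative assumptions, and the real MTP modules are full transformer blocks rather than the bookkeeping shown here.</p>

```python
def mtp_targets(tokens, depth):
    """Training targets per prediction depth d: position i learns to predict tokens[i + d + 1].

    d = 0 is the ordinary next-token objective; d >= 1 are the extra MTP depths.
    """
    return {
        d: [(i, tokens[i + d + 1]) for i in range(len(tokens) - d - 1)]
        for d in range(depth + 1)
    }


def combined_loss(main_loss, mtp_losses, lam):
    """Next-token loss plus the MTP losses averaged over depths and weighted by lam."""
    if not mtp_losses:
        return main_loss
    return main_loss + lam * sum(mtp_losses) / len(mtp_losses)


targets = mtp_targets([10, 11, 12, 13], depth=1)
# depth 0: [(0, 11), (1, 12), (2, 13)]; depth 1: [(0, 12), (1, 13)]
signals = sum(len(pairs) for pairs in targets.values())  # 5 targets instead of 3
loss = combined_loss(2.0, [3.0], lam=0.3)                # 2.0 + 0.3 * 3.0
print(targets, signals, loss)
```

<p>With depth 1 (the setting used in V3), a four-token sequence yields five training targets instead of three, which is the “denser” signal described above.</p>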

<h3 id="hardware-integration-and-system-level-optimizations">Hardware Integration and System-Level Optimizations</h3>

<p>DeepSeek co-designed its training framework closely with the NVIDIA hardware it runs on, optimizing every layer of the compute stack. By integrating networking, CUDA libraries, and low-level scheduling, they present thousands of GPUs as a single logical device, allowing AI researchers to focus on model design rather than resource management. Even at FP8, GPU utilization peaks at just 34.2% without these system-level improvements; DeepSeek’s unified approach raises utilization closer to peak through pipelining (the paper’s DualPipe algorithm overlaps computation with communication) and asynchronous data transfers, reducing time spent idle while waiting on data movement or caching.</p>
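<p>Why overlapping communication with computation matters can be shown with a toy timing model. This is a deliberately simple two-stage pipeline, not the paper’s DualPipe schedule: each micro-batch has a hypothetical compute time and a transfer time, and with overlap the transfer of one micro-batch hides behind the compute of the next. The millisecond values are made up for illustration.</p>

```python
def pipeline_time(chunks, overlap):
    """Wall-clock time for micro-batches given as (compute_ms, comm_ms) pairs.

    Without overlap, each micro-batch computes and then waits for its own transfer.
    With overlap (a simple two-stage pipeline), the transfer of micro-batch i - 1
    runs while micro-batch i computes, so only the longer of the two is paid.
    """
    if not chunks:
        return 0.0
    if not overlap:
        return sum(c + m for c, m in chunks)
    total = chunks[0][0]
    for i in range(1, len(chunks)):
        total += max(chunks[i][0], chunks[i - 1][1])
    return total + chunks[-1][1]


def utilization(chunks, overlap):
    """Fraction of wall-clock time spent on useful compute."""
    compute = sum(c for c, _ in chunks)
    return compute / pipeline_time(chunks, overlap)


chunks = [(4.0, 3.0)] * 3  # hypothetical: 4 ms compute, 3 ms transfer per micro-batch
print(pipeline_time(chunks, overlap=False), utilization(chunks, overlap=False))
print(pipeline_time(chunks, overlap=True), utilization(chunks, overlap=True))
```

<p>In this example, overlapping cuts the step from 21 ms to 15 ms and raises the fraction of time spent computing from about 57% to 80%, the same kind of idle-time reduction the paragraph above describes.</p>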

<h2 id="deepseek-r1-the-reasoning-model">DeepSeek R1: The Reasoning Model</h2>

<p>Bringing together all the elements discussed above, DeepSeek-V3 stands out as one of the most impressive general-purpose base models available on the market, and it has maintained its relevance for quite some time now (June 2025). However, the release of the DeepSeek-R1 model is what truly made waves. While most LLMs can be enhanced by prompting them to think step-by-step, reasoning models differentiate themselves by being specifically trained to break down complex problems and engage in deep, paragraph-length reasoning steps. This focused training allows them to tackle challenging tasks more effectively than prompting alone.</p>

<h3 id="evolution-from-v3-to-r1-and-release-context">Evolution from V3 to R1 and Release Context</h3>

<p>DeepSeek-R1 launched at the end of January 2025 as a reasoning-optimized variant of DeepSeek-V3, targeting benchmarks where chain-of-thought and multi-step reasoning are critical—particularly math and coding tasks. While many practitioners achieve better reasoning by prompting general models to think step-by-step, R1 was trained specifically to break down problems paragraph by paragraph, similar to OpenAI’s o1 which demonstrated chain-of-thought prowess in September 2024.</p>

<p><img src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2023/08/31/ML-14874_image001.jpg" alt="RLHF" />
Figure 3 - illustration of how RLHF can traditionally be performed on <a href="https://aws.amazon.com/blogs/machine-learning/improving-your-llms-with-rlhf-on-amazon-sagemaker/">Amazon SageMaker</a></p>

<h3 id="pure-reinforcement-learning-and-group-relative-policy-optimization-grpo">Pure Reinforcement Learning and Group Relative Policy Optimization (GRPO)</h3>

<p>Unlike conventional RLHF or RLAIF pipelines that rely on human or AI-generated feedback, DeepSeek-R1-zero (the initial iteration) used a purely rule-based grading system: the model’s final outputs on math and coding problems were scored for accuracy and formatting with simple heuristics. These scores were then fed into Group Relative Policy Optimization (GRPO), an RL algorithm first proposed in the DeepSeekMath paper, which estimates each output’s advantage by comparing its reward with those of a group of outputs sampled for the same prompt, removing the need for a separate critic model. This allowed the model to learn chain-of-thought behaviors without external expert examples. Over thousands of RL steps, R1-zero exhibited emergent reasoning skills, such as extended chains of thought and self-corrections akin to an “Aha moment” when it recognized mistakes. You can find more technical details in their <a href="https://arxiv.org/pdf/2501.12948">technical paper</a>.</p>
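<p>The group-relative part of GRPO is simple enough to sketch. For one prompt, several outputs are sampled and graded, and each output’s advantage is its reward minus the group mean, divided by the group’s standard deviation; the per-token objective then combines a PPO-style clipped ratio with a KL penalty toward a reference model. This follows the formulation in the DeepSeekMath and DeepSeek-R1 papers, but the grader, its 0.1 format bonus, the use of population standard deviation, and the <code>eps</code> and <code>beta</code> values are illustrative assumptions. A full implementation would also average the objective over tokens and over the group and backpropagate through the policy.</p>

```python
import math
import statistics


def rule_based_reward(answer: str, reference: str, follows_format: bool) -> float:
    """Hypothetical grader: 1.0 for a correct final answer plus a small format bonus."""
    accuracy = 1.0 if answer.strip() == reference.strip() else 0.0
    return accuracy + (0.1 if follows_format else 0.0)


def group_relative_advantages(rewards):
    """Normalize each sampled output's reward against its own group (same prompt)."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0:
        return [0.0] * len(rewards)  # identical rewards carry no learning signal
    return [(r - mean) / std for r in rewards]


def grpo_token_objective(ratio, advantage, ref_over_policy, eps=0.2, beta=0.04):
    """PPO-style clipped surrogate for one token minus a KL penalty to the reference model.

    ratio: pi_theta(token) / pi_old(token)
    ref_over_policy: pi_ref(token) / pi_theta(token), used by the KL estimator
    """
    clipped = min(max(ratio, 1 - eps), 1 + eps)
    surrogate = min(ratio * advantage, clipped * advantage)
    kl = ref_over_policy - math.log(ref_over_policy) - 1  # always >= 0
    return surrogate - beta * kl


answers = ["42", "41", "42", "7"]  # four sampled outputs for one prompt
rewards = [rule_based_reward(a, "42", follows_format=True) for a in answers]
advantages = group_relative_advantages(rewards)
print([round(a, 3) for a in advantages])  # [1.0, -1.0, 1.0, -1.0]
print(grpo_token_objective(ratio=1.5, advantage=1.0, ref_over_policy=1.0))
```

<p>Here the two correct answers receive an advantage of +1 and the two wrong ones −1, so the policy is pushed toward whatever reasoning produced the correct outputs, with no critic network involved.</p>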

<p><img src="/assets/images/deepseek-grpo.png" alt="Group Relative Policy Optimization (GRPO)" />
Figure 4 - a snapshot of the technical equation of GRPO in <a href="https://arxiv.org/pdf/2501.12948">DeepSeek-R1 technical paper</a></p>

<h3 id="cold-start-fine-tuning-to-address-readability">Cold-Start Fine-Tuning to Address Readability</h3>

<p>R1-zero’s reasoning steps were hard to follow, suffering from poor readability for international users (switching between English and Chinese at random). To resolve this, DeepSeek introduced a cold-start fine-tuning phase, described in the <a href="https://arxiv.org/pdf/2501.12948">technical paper</a>, that uses a small amount of structured, readable reasoning data to nudge the model toward consistent English-based chains of thought, and added a language-consistency reward during RL. The fine-tuned R1 matched or exceeded o1 on multiple standardized math and coding benchmarks, with outputs far more comprehensible to global developers.</p>
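<p>The DeepSeek-R1 paper describes its language-consistency reward as the proportion of target-language words in the chain of thought. A minimal sketch, assuming English is the target and using a crude, hypothetical proxy (a word counts as English if it contains only ASCII characters):</p>

```python
def language_consistency_reward(chain_of_thought: str) -> float:
    """Share of words judged to be in the target language (English here).

    Hypothetical proxy: a word counts as English if all its characters are ASCII.
    """
    words = chain_of_thought.split()
    if not words:
        return 0.0
    english_words = sum(1 for word in words if word.isascii())
    return english_words / len(words)


print(language_consistency_reward("so the answer is 42"))  # 1.0
print(language_consistency_reward("so the 答案 is 42"))     # 0.8
```

<p>A mixed-language chain of thought scores lower than a consistent one, so adding this term to the RL reward discourages language switching.</p>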

<h2 id="performance-and-benchmark-comparisons">Performance and Benchmark Comparisons</h2>

<p><img src="/assets/images/deepseek-r1-performance-jan-2025.png" alt="Benchmark performance of DeepSeek-R1" />
Figure 5 - Benchmark performance of DeepSeek-R1 found in <a href="https://arxiv.org/pdf/2501.12948">DeepSeek-R1 technical paper</a></p>

<h3 id="deepseek-r1-vs-openai-o1-and-o3-mini">DeepSeek-R1 vs. OpenAI o1 and o3-mini</h3>

<p>On major math and coding benchmarks like AIME 2024, MATH-500, and Codeforces, DeepSeek-R1’s reasoning performance closely paralleled OpenAI’s o1, despite R1 relying primarily on large-scale RL rather than extensive human-labeled chain-of-thought examples. However, merely two weeks after R1’s release, OpenAI introduced <a href="https://openai.com/index/introducing-o3-and-o4-mini/">o3-mini</a>, an updated reasoning model that outperformed both o1 and R1 on several benchmarks, demonstrating how rapidly the frontier is shifting.</p>

<h3 id="community-driven-accessibility-and-reproducibility">Community-Driven Accessibility and Reproducibility</h3>

<p>DeepSeek’s decision to open-source R1 (along with R1-zero and several smaller distilled models) and host it freely on platforms like <a href="https://huggingface.co/deepseek-ai/DeepSeek-R1">Hugging Face</a> led to swift community adoption and experimentation. Within a month, a <a href="https://mashable.com/article/openai-o1-reasoning-model-rival-less-than-50-dollars">UC Berkeley research group</a> applied R1-zero’s reinforcement learning recipe to a much smaller model and reproduced similar emergent reasoning behavior on narrow tasks for roughly $30 of GPU compute, underlining how reproducible DeepSeek’s published methods are. This accessibility fueled widespread excitement, particularly among startups seeking cost-effective alternatives to closed-weight models from OpenAI, Anthropic, and Google.</p>

<blockquote>
  <p>… suffering from poor readability for international users (switching between English and Chinese at random). To resolve this, DeepSeek introduced a cold-start fine-tuning phase…</p>
</blockquote>

<h2 id="cost-efficiency-and-hype-dynamics">Cost Efficiency and Hype Dynamics</h2>

<h3 id="training-cost-breakdown-and-efficiency">Training Cost Breakdown and Efficiency</h3>

<p>DeepSeek’s headline training cost of roughly $5.6 million for V3 covers only the official training run on H800 GPUs (pre-training, context extension, and post-training), priced at an assumed rental rate of $2 per GPU hour. It excludes prior research and ablation experiments, and it does not include the later R1 phases, so cumulative spending was higher than the headline figure. Even so, the efficiency of the final run makes it feasible for smaller research labs to replicate core methods. On cost per token and overall performance per dollar, V3 and R1 rank among the most efficient contemporary models, challenging narratives that only deep-pocketed incumbents can lead AI progress.</p>
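<p>The headline number is simple arithmetic on the GPU-hour breakdown reported in the DeepSeek-V3 paper, at the paper’s assumed rental price of $2 per H800 GPU hour. A small sketch that reproduces it (the dictionary and function names are illustrative):</p>

```python
# H800 GPU hours for the official DeepSeek-V3 training run, as reported in the paper.
GPU_HOURS = {
    "pre-training": 2_664_000,
    "context extension": 119_000,
    "post-training": 5_000,
}
PRICE_PER_GPU_HOUR_USD = 2.0  # the rental price the paper assumes


def training_cost(gpu_hours, price):
    """Return (total GPU hours, total cost in USD) for the listed stages."""
    hours = sum(gpu_hours.values())
    return hours, hours * price


hours, cost = training_cost(GPU_HOURS, PRICE_PER_GPU_HOUR_USD)
print(f"{hours:,} GPU hours, ${cost / 1e6:.3f}M")  # 2,788,000 GPU hours, $5.576M
```

<p>Anything outside these three stages, such as earlier research runs and ablations, is not in this total.</p>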

<h3 id="hype-factors-beyond-pure-algorithms">Hype Factors Beyond Pure Algorithms</h3>

<p>While DeepSeek-R1’s algorithmic breakthroughs were noteworthy, a significant portion of its hype stemmed from the fact that the model and its associated papers (e.g., “DeepSeekMath” from February 2024) were readily downloadable, and its app was free to use. By contrast, OpenAI’s o1 and o3-mini are closed-weight models available only through OpenAI’s own products and API, with usage fees and rate limits that constrain large-scale experimentation. DeepSeek’s zero-cost entry point attracted a broad user base almost overnight, amplifying media coverage and discussions about potential shifts in the global balance of AI power.</p>

<h2 id="implications-for-ai-landscape-and-startups">Implications for AI Landscape and Startups</h2>

<h3 id="democratization-of-high-end-ai">Democratization of High-End AI</h3>

<p>DeepSeek’s approach demonstrates that high-performance LLMs with state-of-the-art reasoning can be developed transparently at a fraction of traditional costs. By releasing open weights for both V3 and R1 alongside detailed technical reports, DeepSeek has blurred the lines between academic research and industry labs, inviting academic and indie contributions to an ecosystem once dominated by proprietary incumbents. This democratization accelerates new use cases in B2C and B2B, from personalized assistants to domain-specific reasoning services, without requiring multi-million-dollar budgets.</p>

<h3 id="opportunity-for-new-entrants">Opportunity for New Entrants</h3>

<p>As GPU workloads become more efficient, thanks to FP8 training, MoE, MLA, and MTP, smaller teams can deliver competitive AI products with fewer resources. The fact that a group of researchers at UC Berkeley reproduced R1-zero-style reasoning behavior in a small model on a $30 compute budget highlights how accessible high-end research has become. For senior technology leaders evaluating AI strategies, this signals that the competitive moat around large LLMs is narrowing: innovation now centers on software optimizations, integration, and specialized fine-tuning rather than raw parameter count alone. The best time to build an AI startup is arguably now, as barriers to entry continue to fall and foundational models like DeepSeek-V3 and R1 set new benchmarks for open collaboration.</p>

<blockquote>
  <p>Even though OpenAI’s o3-mini eventually surpassed R1 on some reasoning benchmarks, DeepSeek’s open, transparent pipeline reshapes how we think about democratizing AI.</p>
</blockquote>

<h2 id="what-have-i-learnt-so-far">What have I learnt so far</h2>

<p>DeepSeek’s dual approach, rolling out an efficient, MoE-based V3 model alongside a reasoning-focused R1 model, shows that open-weight, cost-effective LLMs can truly rival the big closed-source players. Their innovations in FP8 training, MoE routing, KV cache compression with MLA, and multi-token prediction offer exciting blueprints for the next generation of AI infrastructure. What really caught my attention is R1-zero’s pure reinforcement learning (RL) approach to chain-of-thought training, which suggests that clever algorithms can sometimes beat simply scaling up hardware. This resonates with me because RL has been a key focus in research labs for years: <a href="https://deepmind.google/research/projects/alphago/">DeepMind’s AlphaGo</a>, for example, combined learning from human games with RL through self-play to <a href="https://www.bbc.com/news/technology-35810133">beat one of the world’s top Go players</a> back in 2016. Then in 2019, OpenAI showed RL’s potential again by training a robotic hand to <a href="https://www.theverge.com/2019/10/15/20914575/openai-dactyl-robotic-hand-rubiks-cube-one-handed-solve-dexterity-ai">solve a Rubik’s Cube</a> and by <a href="https://openai.com/index/openai-five-defeats-dota-2-world-champions/">beating top human players in DOTA 2</a>. Even though OpenAI’s o3-mini eventually surpassed R1 on some reasoning benchmarks, DeepSeek’s open, transparent pipeline reshapes how we think about democratizing AI.</p>

<p>For anyone leading tech teams, the <a href="https://simons.berkeley.edu/news/how-deepseek-changes-llm-story">DeepSeek story</a> is a great reminder to focus on compute efficiency, embrace open research, and invest in specialized reasoning to stay ahead in this fast-moving AI space.</p>]]></content><author><name>Bryan Chua</name></author><category term="Tech" /><category term="DeepSeek-V3" /><category term="DeepSeek-R1" /><category term="AI Architecture" /><category term="LLMs" /><category term="Open Source AI" /><summary type="html"><![CDATA[A CTO's deep dive into the engineering behind DeepSeek-V3 and R1. Exploring MoE, FP8 training, MLA, and how open-source AI is democratizing the market.]]></summary></entry></feed>