Parallel Code Audits with Agent Teams — Five Opuses Arguing at Once
A hands-on account of using Claude Code's Agent Teams to run a five-agent parallel audit across the whole project. Role separation, the false-positive retraction workflow, and a few surprising findings.
Opening — Hitting a Wall as the Project Grew
This blog is developed entirely solo. In the early days — a handful of pages and two or three games — I could keep the whole codebase in my head. Over time that broke down: containers, hooks, and shared utilities piled up, and subtle side effects of the same patterns scattered across files. Eventually I hit the point of "I should really audit the whole thing".
So on April 10, 2026, I started a paid Claude subscription. The reasoning was straightforward:
- The site had grown noticeably (file count, feature count, external API integrations)
- Keeping solo tabs on codebase-wide consistency, security, and performance had hit its limit
- Large-scale refactors and optimizations needed a reliable "reviewer AI"
For a few days after subscribing I used Claude Code's everyday features — file editing, testing, refactoring — to get comfortable. Then I noticed Agent Teams: a setup where multiple AI agents run in independent parallel sessions and can message each other. The thought arrived naturally: "what if I ran five Opus agents in parallel and had them audit the whole project?" I ran it, and the findings exceeded expectations.
This is the write-up.
What Is Agent Teams?
An experimental Claude Code feature where multiple agents run concurrently in independent sessions and can message each other. It's more than parallel invocation:
- Each agent has its own context
- A shared task list distributes work
- SendMessage enables agent-to-agent debate / collaboration
- The leader (main session) makes final decisions and applies fixes
"One AI with several hats" and "several AIs with distinct roles arguing with each other" produce very different outputs. The latter yields surprisingly thorough audits.
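The four bullets above can be pictured with a small sketch. This is my own model of the pattern — hypothetical types, not the actual Agent Teams API:

```typescript
// Hypothetical model of the Agent Teams pattern — NOT the real Claude Code API.
type TaskStatus = "open" | "claimed" | "done";

interface Task {
  id: string;     // e.g. "B2-01"
  title: string;
  status: TaskStatus;
  owner?: string;
}

interface Message {
  from: string;
  to: string;
  body: string;
}

// Shared task list plus a message channel: each agent keeps its own context
// and only coordinates through these two structures.
class TaskBoard {
  private tasks: Task[] = [];
  private inbox: Message[] = [];

  add(id: string, title: string): void {
    this.tasks.push({ id, title, status: "open" });
  }

  // Any agent can claim the next open task — this is how work gets distributed.
  claim(agent: string): Task | undefined {
    const task = this.tasks.find((t) => t.status === "open");
    if (task) {
      task.status = "claimed";
      task.owner = agent;
    }
    return task;
  }

  // Stand-in for SendMessage: agent-to-agent debate flows through here.
  send(from: string, to: string, body: string): void {
    this.inbox.push({ from, to, body });
  }

  messagesFor(agent: string): Message[] {
    return this.inbox.filter((m) => m.to === agent);
  }
}
```

The leader sits outside this loop: it reads the board, breaks ties, and applies fixes.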
Team Composition — 5 Roles
The team I assembled:
| Role | Agent Type | Area |
|---|---|---|
| Frontend auditor | yujh-auditor-frontend | components / views / routing / styles / a11y |
| Core logic auditor | yujh-auditor-core | business logic / state machines / pure functions |
| Data auditor | yujh-auditor-data | API / models / storage / caching / security |
| Infra auditor | yujh-auditor-infra | build tools / deps / CI / manifest |
| Verifier | yujh-auditor-verifier | disproves other auditors' findings |
The key piece is the Verifier. When the four auditors raise 🔴 issues, the verifier independently checks "is this really a problem?". Its default assumption is "every 🔴 might be a false positive". A devil's advocate that doesn't simply trust auditor claims.
The Actual Audit Flow
1. Leader Preflight
The main session (leader) surveys the project and builds a project profile — framework, build tooling, rules files, etc. This profile is passed to each teammate at spawn so they know "you're auditing this project".
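The profile itself can be a small record. The field names below are my own illustration, not a documented format; the example values mirror this project (`CLAUDE.md` as the rules file is an assumption):

```typescript
// Hypothetical shape of the leader's preflight output — not a prescribed format.
interface ProjectProfile {
  framework: string;     // detected from package.json / config files
  buildTool: string;
  rulesFiles: string[];  // convention docs the auditors must judge against
}

// What a preflight over a Next.js project might produce:
const profile: ProjectProfile = {
  framework: "Next.js",
  buildTool: "next build",
  rulesFiles: ["CLAUDE.md", ".eslintrc.json"],
};
```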
2. Parallel Audit Starts
Each auditor walks its domain in its own session, raising 🔴 issues as tasks. Example titles:
```text
B2-01 [data]     NASA API key exposed in client bundle
B2-02 [data]     visit_log lacks spam prevention
D-03  [infra]    GitLab Variables Protected flag not verified
A-08  [frontend] Cosmic Barrage resize debounce 400ms lacks comment
B1-07 [core]     useCosmicBarrageAudio missing visibilitychange handler
...
```
3. Verifier Disproves
The verifier claims each 🔴 task, re-reads the actual code, and renders one of three verdicts:
- ✅ Valid — confirmed; needs a fix
- ❌ False positive — rebut via SendMessage to the auditor
- ⚠️ Partial — partly right; propose downgrade to 🟡
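The three verdicts map cleanly onto severity changes. A sketch of that mapping (my own modeling, not Agent Teams machinery):

```typescript
// Each verifier verdict determines what happens to the issue's severity.
type Verdict = "valid" | "false-positive" | "partial";
type Severity = "🔴" | "🟡" | "retracted";

function applyVerdict(verdict: Verdict): Severity {
  switch (verdict) {
    case "valid":          return "🔴";        // confirmed; keep for fixing
    case "false-positive": return "retracted"; // rebutted via SendMessage
    case "partial":        return "🟡";        // downgraded to backlog
  }
}
```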
4. Cyclical Debate → Leader Escalation
If the auditor and verifier go three round trips without agreement, the verifier escalates to the leader. The leader reads the files directly and makes the final call.
Memorable Findings
Real ones from the actual audits:
🔴 → ✅ Confirmed
- NASA API key exposed in the client bundle (B2-01) — the 40-char key was readable in the deployed bundle. Switched to edge injection via CloudFront Function
- Guestbook lacked rate limiting — only 30-second session cooldown; swapping sessions bypassed it. Added daily and per-post caps via triggers
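The post doesn't show the B2-01 fix, but edge injection with a CloudFront Function typically looks like the sketch below. The key value is a placeholder; `api_key` is the query parameter NASA's API expects. The real function runs as plain JavaScript at the edge — the point is that the key lives in the edge function, never in the client bundle:

```typescript
// Sketch of API-key injection in a CloudFront Function (viewer-request trigger).
// The secret stays server-side; the browser requests the path without a key.
function handler(event: {
  request: { querystring: Record<string, { value: string }> };
}) {
  const request = event.request;
  // Append the key at the edge before forwarding to the origin.
  request.querystring["api_key"] = { value: "NASA_KEY_PLACEHOLDER" };
  return request;
}
```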
🔴 → ❌ Retracted
- "Exoplanet API rewrite doesn't work in prod" — the data agent raised this, citing that `rewrites()` in `next.config` is dev-only. The leader checked: the production CloudFront has a behavior-based proxy for that path. False positive — it couldn't be verified from files alone, since the proxy is infrastructure outside the repo
- "`useMemo(() => getUserId(), [])` re-runs every render — perf issue" — the verifier's initial read. In reality the SSR-computed `null` gets cached permanently — a real bug, so the auditor's rebuttal kept it as 🔴
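That `useMemo` finding is easy to reproduce in miniature. Below is a framework-free stand-in for `useMemo(fn, [])` — not the blog's actual hook — showing why an empty dependency array permanently caches the `null` computed before the ID exists:

```typescript
// Stand-in for useMemo(fn, []): compute on first call, cache forever.
function memoOnce<T>(fn: () => T): () => T {
  let computed = false;
  let value: T | undefined;
  return () => {
    if (!computed) {
      value = fn();
      computed = true;
    }
    return value as T;
  };
}

// During SSR the user ID isn't available yet, so the first computation sees null.
let storedId: string | null = null;
const cachedUserId = memoOnce(() => storedId);

cachedUserId();        // null — computed before the ID exists
storedId = "user-42";  // ID becomes available later
cachedUserId();        // still null — the empty-deps cache never recomputes
```

The fix direction is to recompute once the ID actually becomes available (e.g. state set in an effect) rather than memoizing with empty deps.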
🔴 → 🟡 Downgraded
- Many "optimization opportunity" issues were judged "acceptable as-is for now" and downgraded to 🟡 (backlog)
Real Numbers
Results across two runs:
| Metric | Value |
|---|---|
| Total 🔴 raised | 30+ |
| Actually fixed | 8 |
| False positives retracted | 4 |
| 🟡 downgrades | 6 |
| POST-AUDIT pending | 2 |
The "4 false positives" number matters. If only the auditors had run (no verifier), those 4 likely would have been "fixed" — breaking actually-fine code and introducing new bugs. The devil's-advocate role paid off concretely.
Limitations — Know Before You Use
Agent Teams is powerful but has clear constraints.
No Session Resume
Create a team, pause it, and you can't pick it up later. In-process teammates don't restore via `/resume` or `/rewind`. Design for one-shot runs.
One Team Per Session
Can't have multiple teams simultaneously. To re-invoke, the previous team must be explicitly cleaned up (`TeamDelete`).
Shutdown Delay
Even after sending `shutdown_request`, teammates only terminate after their current turn. Can take minutes.
Cost
Five Opuses in parallel isn't cheap. Weekly regular audits would be excessive — it fits "event-based" usage better: pre-release deep audits, post-large-refactor inspection.
Usage Tips
Pass Project Profile Explicitly
Include the leader-detected info (framework / build / rules files) in the spawn prompt. Otherwise the auditor can't judge against the project's conventions.
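Concretely, folding the profile into each spawn prompt can be as simple as a template. The wording and field names here are illustrative (and `CLAUDE.md` is an assumed rules file), not a prescribed format:

```typescript
// Sketch: embed the leader-detected profile into a teammate's spawn prompt.
function spawnPrompt(role: string, area: string): string {
  const profile = {
    framework: "Next.js",
    buildTool: "next build",
    rulesFiles: ["CLAUDE.md"],
  };
  return [
    `You are the ${role}. Your audit area: ${area}.`,
    `Project framework: ${profile.framework} (build: ${profile.buildTool}).`,
    `Judge every finding against the conventions in: ${profile.rulesFiles.join(", ")}.`,
  ].join("\n");
}
```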
Aggressive Verifier Prompt
Strongly state "Verifier must suspect every 🔴 as a false positive". Otherwise verifiers tend to accept auditor claims verbatim.
Account for Out-of-Repo Infrastructure
3 of the 4 false positives this time involved CloudFront settings outside the repo. Auditors only see files, so assumptions about CloudFront / nginx / external API config must be re-verified by the leader.
Retrospective
Running audits via Agent Teams made "multiple AIs collaborating" feel concrete. The false-positive retraction workflow was the standout — findings that would have been silently fixed by a solo auditor got rebutted and retracted after the verifier checked actual infra.
The biggest value: even a solo project can get peer-review-like pressure. Especially powerful for global infrastructure and security issues.
If you're about to ship a big release, or need to sanity-check after a large refactor, it's worth trying. There's cost, but it's smaller than finding that one missed issue in production.