# PKA

### How one bookkeeper built a 16-specialist AI workforce inside a command line, and used it to run a real practice.

*By Jimmie Needles · J2 Bookkeeping · New Braunfels, TX*
*First draft, May 2026*

---

## The Friction

A 40-client bookkeeping practice is not one job. It is roughly thirty jobs trench-coated as one. The client work itself is the smallest piece. Around it sits content marketing, tax research, payroll seams, app-stack triage, the website, the email backlog, the social calendar, the prospect list, the close checklist, the year-end planning, the "I think QuickBooks just broke again" support tickets. None of it is hard in isolation. All of it together is what makes a practice owner spend their Saturday on bookkeeping software instead of with their family.

I tried the usual answers. More VAs. More software. A practice management tool. None of it solved the actual problem, which was not "too much work." It was that I was the only person in the building who could route the work, hold the context, and make the judgment calls. The bottleneck was me.

In late 2025 I started building something different. By May 2026 it had a name, a sixteen-specialist roster plus an orchestrator at the front door, a Postgres database, a chat UI, and a real client book running through it every day. This is what it is and how it works.

---

## What PKA Is

PKA is an AI workforce. Sixteen specialists and one orchestrator, each a deeply defined role with persistent memory, file-based deliverables, and integrations into the live tools the business actually runs on. It is built inside Claude Code, runs on one Windows machine, and is operated by one person from a command line and a local browser tab.

It is not a chatbot. It is not a prompt library. It is not a single AI assistant doing many things badly. It is a team, in the same sense that a small accounting firm is a team. Each member knows their job, the others' jobs, and how to hand work off.

Every team member has shipped real work. Not demo work. Tax returns, journal entries, monthly closes, content calendars, websites, integrations. The same firm software, the same clients, the same QuickBooks files a human staffer would touch.

---

## What It Has Actually Done

The most honest way to describe a system like this is to list what came out of it.

**One afternoon, one client, a full year of cleanup.** Reconciled twelve months of bank and credit card statements for a contractor client. Verified $144K of payroll reclass against the W-3. Imported and categorized a full year of American Express activity that had never been in QuickBooks. Resolved $14,304 of unapplied cash payment income through payment matching, invoice creation, and journal entries. Identified a $24K uncollected sales tax liability. Three work paper PDFs and a CPA cover note delivered, ready for the client's tax preparer.

**565 vendor checks pushed into QuickBooks in a single session.** A client needed to issue 565 player balance withdrawal checks totaling $18,059.03. Two Python scripts and one careful pass through the QBO API: 565 unique vendors created with full addresses, 565 Check-type purchases drawn against the right cash account, and the print queue flipped for the external check printer. The whole pipeline ran end-to-end without a human in the loop after the trigger.
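The two scripts are not reproduced here. A minimal sketch of the payload-building step might look like the following, assuming the QBO `Purchase` entity with a `PaymentType` of `Check` and a `PrintStatus` of `NeedToPrint` (field names as I understand Intuit's API; confirm against the current reference before relying on them). The account IDs, function names, and sample rows are hypothetical. The part worth copying is the total check: the batch refuses to build unless the amounts reconcile to the expected figure.

```python
from decimal import Decimal

# Hypothetical account IDs; real values come from the client's chart of accounts.
CASH_ACCOUNT_ID = "35"
PLAYER_BALANCE_ACCOUNT_ID = "72"


def build_check(vendor_id: str, amount: Decimal) -> dict:
    """Build one Check-type Purchase payload flagged for the print queue."""
    return {
        "PaymentType": "Check",
        "AccountRef": {"value": CASH_ACCOUNT_ID},  # account the check draws on
        "EntityRef": {"value": vendor_id, "type": "Vendor"},
        "PrintStatus": "NeedToPrint",
        "Line": [{
            "Amount": float(amount),
            "DetailType": "AccountBasedExpenseLineDetail",
            "AccountBasedExpenseLineDetail": {
                "AccountRef": {"value": PLAYER_BALANCE_ACCOUNT_ID},
            },
        }],
    }


def build_batch(rows: list[tuple[str, str]], expected_total: Decimal) -> list[dict]:
    """Build payloads for (vendor_id, amount) rows; fail if the total drifts."""
    amounts = [Decimal(amount) for _, amount in rows]
    total = sum(amounts, Decimal("0"))
    if total != expected_total:
        raise ValueError(f"batch total {total} != expected {expected_total}")
    return [build_check(vendor_id, amt) for (vendor_id, _), amt in zip(rows, amounts)]


# Illustrative rows only, not client data.
sample = [("101", "25.00"), ("102", "40.53"), ("103", "12.50")]
checks = build_batch(sample, Decimal("78.03"))
```

Amounts are parsed with `Decimal` so the reconciliation compares exact cents rather than floating-point approximations.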

**A 56-post monthly content calendar across five platforms.** Loomly-ready CSVs, captions in five platform-native voices, mapping to a 500-piece infographic library, plus AI-generated video reels. Produced monthly. The owner does not film. The owner does not write captions.

**A working budget-vs-actuals engine, in-house.** Built into the internal app. Pulls QBO data, runs client-specific adapters, produces variance analysis, stores run history. Q1 2026 reports delivered. No third-party BvA SaaS subscription.

**A document library with 5,810 documents indexed and searchable.** Full-text search across PDFs, spreadsheets, and Word docs. Drag-and-drop ingest. Linked to clients, projects, and tags.

**A chat UI for the entire team.** Forty-two threads, 1,009 messages, multi-thread history per agent, drag-and-drop attachments, streaming responses. The same skill prompts that the CLI uses, served through a web app at `http://atlas.local`.

If this were a toy, none of these would exist.

---

## The Four Design Choices That Made It Work

If I had to point at why this works and a stack of impressive prompts on a desktop does not, it comes down to four things.

### 1. One orchestrator, never two

Every request goes to one front door: a router named Larry. He does no work himself. He reads what you typed, decides which specialist owns it, calls that specialist, and reviews what comes back before it lands in your hands.

This sounds small. It is not. The moment you have two front doors you have two opinions about who handles what, and the moment that happens you start losing things. One orchestrator is the simplest possible answer to "who is in charge here," and it is the answer that lets the rest of the system scale.
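Larry is a system prompt, not a program, so the routing itself is the model's judgment. The control flow around it can still be sketched in a few lines of Python. Everything here is hypothetical: the keyword table, the function names, and the two stand-in specialists exist only to show the shape of one front door that picks exactly one owner and reviews the result before delivery.

```python
from typing import Callable


# Hypothetical stand-ins; real specialists are role prompts, not functions.
def ledger(task: str) -> str:
    return f"[Ledger] handled: {task}"


def vox(task: str) -> str:
    return f"[Vox] handled: {task}"


# One routing table, owned by the single orchestrator.
ROUTES: dict[str, Callable[[str], str]] = {
    "reconcile": ledger,
    "journal entry": ledger,
    "linkedin": vox,
    "caption": vox,
}


def route(request: str) -> Callable[[str], str]:
    """Pick exactly one owner for a request, or fail loudly."""
    lowered = request.lower()
    owners = {fn for key, fn in ROUTES.items() if key in lowered}
    if len(owners) != 1:
        raise LookupError(f"expected one owner, found {len(owners)}")
    return owners.pop()


def review(result: str) -> str:
    """Placeholder for the orchestrator's review step before delivery."""
    if not result.strip():
        raise ValueError("empty deliverable")
    return result


def handle(request: str) -> str:
    """The single front door: route, delegate, review, deliver."""
    specialist = route(request)
    return review(specialist(request))
```

A request that matches no owner, or more than one, raises instead of being silently dropped. That is the property a second front door would break.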

### 2. Team members are roles, not prompts

Every specialist is a full system prompt: who they are, what they know, how they decide, what they will and will not do, and where they must coordinate with other team members. Not a one-paragraph "you are a helpful X" instruction. A full role definition, the kind you would write for a new hire.

When Larry calls Ledger (the bookkeeper), the model loads Ledger's complete role definition for the duration of the task and operates with that specialist's depth. When the task ends, the role goes away. The next task can be a different specialist with completely different domain knowledge, loaded fresh.

This is the difference between an "AI assistant" and an "AI team." A team has people who know their job cold and stay in their lane.

### 3. Memory has to live in two places

The model itself does not remember between conversations. So memory has to live outside it, in two layers.

The first layer is local, in a file the harness reads automatically on every conversation. It holds user preferences, feedback corrections, project context, and reference pointers. Things that should never be forgotten. Things that, if forgotten, would cause the same conversation to be had twice.

The second layer is shared, in a cloud-hosted knowledge base where any team member can drop notes, meeting transcripts, research outputs, and client context. Things that one team member learns and the others might need.

Without both layers, the team has amnesia and you have to re-explain everything every morning. With both layers, a specialist invoked in a new conversation can pick up exactly where the last session left off.
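A minimal sketch of how the two layers could combine at the start of a session, assuming the local layer is a single `MEMORY.md` file and the shared layer can be queried by tag. The shared store here is a plain list standing in for the cloud knowledge base; the function names and the note structure are hypothetical.

```python
from pathlib import Path


def load_local_memory(memory_file: Path) -> str:
    """Layer one: a local file read at the start of every conversation."""
    return memory_file.read_text(encoding="utf-8") if memory_file.exists() else ""


def search_shared_notes(notes: list[dict], topic: str) -> list[str]:
    """Layer two: stand-in for a query against the shared knowledge base."""
    return [n["text"] for n in notes if topic.lower() in n["tags"]]


def build_context(memory_file: Path, notes: list[dict], topic: str) -> str:
    """Combine both layers into the preamble a fresh session starts from."""
    sections = ["## Local memory", load_local_memory(memory_file)]
    sections += ["## Shared notes", *search_shared_notes(notes, topic)]
    return "\n".join(sections)
```

The local layer is loaded whole because it is small and always relevant. The shared layer is filtered because it grows without bound.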

### 4. Deliverables are files, not chat messages

The team produces real artifacts. PDFs, Excel files, CSVs, markdown documents, Python scripts, web pages. Not chat responses with copy-pasteable content. Actual files written to disk, sitting in folders the business operates out of.

A structured folder hierarchy makes this work. There is an "Owner Inbox" for finished, reviewed work. A "Team Inbox" for working files and handoffs between specialists. A "Team" folder for role definitions and templates. Every specialist knows where to deliver and where to look.

Why does this matter? Because chat is ephemeral and files are durable. A bookkeeping practice does not run on chat. It runs on the spreadsheet someone can open six months from now.
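The folder convention can be sketched as follows, using the three folder names above. The function names and the move-on-review step are assumptions for illustration, not a description of the actual tooling.

```python
import shutil
from pathlib import Path

FOLDERS = ("Owner Inbox", "Team Inbox", "Team")


def init_workspace(root: Path) -> None:
    """Create the folder hierarchy if it does not exist yet."""
    for name in FOLDERS:
        (root / name).mkdir(parents=True, exist_ok=True)


def hand_off(root: Path, filename: str, content: str) -> Path:
    """A specialist writes a working file to the Team Inbox."""
    path = root / "Team Inbox" / filename
    path.write_text(content, encoding="utf-8")
    return path


def deliver(root: Path, working_file: Path) -> Path:
    """After review, move the finished file to the Owner Inbox."""
    destination = root / "Owner Inbox" / working_file.name
    shutil.move(str(working_file), str(destination))
    return destination
```

Because a file is moved rather than copied on delivery, anything still sitting in the Team Inbox is by definition unreviewed.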

---

## The One App I Had to Build

Most of the system runs in plain markdown files and shell tools. One thing does not: the internal app.

The app is called Atlas. It is a single Next.js application running on the same machine as everything else, serving two faces on a local network address. One face is the document library: 5,810 documents indexed, full-text searchable, content-addressed so the same file ingested twice produces one record with two locations. The other face is the team chat UI: a sidebar of every chat-enabled specialist, multi-thread history per agent, drag-and-drop attachments, streaming responses.
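Content addressing is simple enough to show in full. The sketch below keys each record by the SHA-256 digest of the file's bytes, so a second ingest of identical content adds a location instead of a record. The in-memory dictionary and the `Library` class are stand-ins for illustration; the real records live in Postgres.

```python
import hashlib


class Library:
    """Toy document index keyed by content hash."""

    def __init__(self) -> None:
        self.records: dict[str, dict] = {}

    def ingest(self, data: bytes, location: str) -> str:
        """Record a file; identical bytes always map to the same record."""
        digest = hashlib.sha256(data).hexdigest()
        record = self.records.setdefault(
            digest, {"size": len(data), "locations": []}
        )
        if location not in record["locations"]:
            record["locations"].append(location)
        return digest
```

For example, ingesting the same PDF once from a client folder and once from an email attachment yields one record with two locations, which is what keeps search results free of duplicates.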

Under both faces is a single Postgres database, sixteen tables, one schema. The chat backend spawns a Claude CLI process per turn, so every conversation in the web UI uses the exact same skill prompts the CLI does. There is no second source of truth.

The latest addition, finished on the day this was written, is a project-events ingest endpoint. Every specialist can now POST a structured event into a shared event stream: a decision, a next action, a handoff, a chat summary. An inline classifier (a small, fast model) reads the event and routes it to the right project automatically. Unclassified events land in a bucket UI for one-click human re-tagging. This is what lets the team write its own history without a human transcribing every session.
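The flow can be sketched as follows. This is an illustration under loud assumptions: the field names, the `EventStream` class, and the project names are hypothetical, and the keyword match stands in for the small model that does the real classification. What it shows is the shape: a structured event comes in, gets a project if one can be inferred, and otherwise waits in a bucket for a human to tag it.

```python
from dataclasses import dataclass, field
from typing import Optional

EVENT_KINDS = {"decision", "next_action", "handoff", "chat_summary"}


@dataclass
class Event:
    specialist: str
    kind: str
    text: str
    project: Optional[str] = None


@dataclass
class EventStream:
    project_keywords: dict[str, list[str]]
    events: list[Event] = field(default_factory=list)
    bucket: list[Event] = field(default_factory=list)

    def classify(self, text: str) -> Optional[str]:
        """Stand-in classifier: first project with a keyword in the text."""
        lowered = text.lower()
        for project, keywords in self.project_keywords.items():
            if any(keyword in lowered for keyword in keywords):
                return project
        return None

    def ingest(self, event: Event) -> Event:
        """Accept an event; route it to a project or to the bucket."""
        if event.kind not in EVENT_KINDS:
            raise ValueError(f"unknown event kind: {event.kind}")
        event.project = event.project or self.classify(event.text)
        (self.events if event.project else self.bucket).append(event)
        return event

    def retag(self, event: Event, project: str) -> None:
        """Human re-tagging: move an event out of the bucket."""
        self.bucket.remove(event)
        event.project = project
        self.events.append(event)
```

Keeping unclassified events instead of rejecting them means a weak classifier costs a click, not a lost record.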

The point of Atlas is not "I built a CRM in a weekend." The point is that the team needed a single system of record, and a thin local-first web app was the lowest-friction way to get one.

If you are doing this yourself, this is the one piece of custom software you will likely need. Everything else can be markdown and shell.

---

## The Roster

Sixteen specialists and one orchestrator. Each one a full role definition, each one with a slash command that loads it.

| Name | Department | Role | What They Do |
|------|-----------|------|--------------|
| **Larry** | Orchestration | Personal Assistant | The front door. Routes every request, delegates, reviews before delivery. Never does the work. |
| **Ledger** | Operations | Senior Accounting & Finance | QuickBooks and Intuit Enterprise Suite bookkeeping, FP&A, tax planning support across the full client book. |
| **Slate** | Operations | FinOptimal & Sheets Expert | Automates QBO closes with Accruer, Booker, Wrangler. Expert Google Sheets modeling, Apps Script, FP&A. |
| **Cord** | Operations | Data Connector & Spreadsheet | Pulls QBO and other sources into Sheets via Coefficient. Builds financial models in Quadratic. |
| **Rex** | Operations | Tax Strategist | High-net-worth and business tax: entity structure, S-Corp compensation, QBI, cost segregation, sunset planning. |
| **Reid** | Operations | Operations Strategist | Priority setting, quarterly goals, accountability partner to the owner. |
| **Echo** | Operations | Email Manager | Inbox triage, tagging, drafting, client correspondence, on-demand summaries. |
| **Tally** | Operations | AP Specialist | Bill capture (WellyBox / Spark Receipt), Ramp bill pay, Rippling payroll seam, QBO bill coding and approval, 1099 prep. |
| **Loom** | Operations | Workflow Architect | Audits the firm's day-to-day, maps and automates the right workflows, kills the wrong ones, delegates the builds. |
| **Vox** | Marketing | B2B Content Marketing | LinkedIn strategy, social posts, AI video reels, CapCut and Canva production end to end. |
| **Pixel** | Marketing | Canva Pro Brand Designer | Branded visuals, templates, graphics, brand consistency. |
| **Wren** | Technology | WordPress & Elementor | Designs, builds, and optimizes the firm's WordPress site. |
| **Atlas** | Technology | Full-Stack Developer | Designs and builds the internal app, the document library, the team chat, the integrations. Owns the entire stack. |
| **Riv** | Technology | Integrations Specialist | API, MCP, Zapier connections. Custom Python for QBO bulk operations and any cross-tool plumbing. |
| **Pax** | Research | Senior Researcher | Researches the full skill profile of any role before it is hired. |
| **Nolan** | Research | HR Manager | Builds new specialists from Pax's research. |
| **Corvus** | Personal | Fantasy Lore Archivist | Encyclopedia content for a personal archmaester site. The proof that the same pattern works outside the firm. |

The roster is not fixed. Sage and Finn were hired, did not work, were retired, and were replaced by Ledger. Maya, Lane, and Lux were three marketing specialists that collapsed into one (Vox) once it became clear the seam between them was costing more than the depth was earning. Tally was hired this week. Loom and Vega were hired earlier this month.

The team is supposed to evolve. The orchestrator-and-Nolan loop is what makes evolution cheap.

---

## What Makes This Different

| It is not... | It is... |
|--------------|----------|
| A prompt library | Persistent specialists with domain knowledge, system access, memory, and real deliverable output. |
| A single AI assistant | Sixteen specialists with narrow expertise: a bookkeeper who knows every QuickBooks menu path, a tax strategist who models S-Corp comp, a developer who ships production Postgres and Next.js, a content marketer who runs five platforms. |
| A demo | Running a 40-client practice daily. Deliverables go to real CPAs, real clients, real social media accounts. Code runs in production as a Windows service that survives reboots. |
| Static | Self-expanding. Need a new capability, Pax researches the role, Nolan builds the specialist, and it is producing work within a session. |

---

## Lessons I Did Not Expect

None of these three was in the plan. They were forced on me by what actually happened.

### Lesson one: kill the team members that do not work, on purpose

The first two specialists I built (Sage, Finn) did not earn their keep. They were "almost right" for too long. I kept patching their role definitions, hoping a small tweak would land them. It never did.

What worked was retiring both of them and replacing the function with one new specialist (Ledger) defined from scratch around the work I had actually needed them to do. Two months of patching, deleted in a sentence.

The lesson is not "kill what does not work." Everyone says that. The lesson is "your specialists are markdown files, the cost of replacement is one session, and the longer you patch the harder it gets to see that the role definition was wrong, not the prompt." If you build with that in mind, you delete more, sooner.

### Lesson two: consolidate when the handoff costs more than the depth

I had three marketing specialists for a stretch: one for social media, one for LinkedIn, one for AI video. Each one had real depth. The problem was the handoffs. A single LinkedIn carousel needed three of them to coordinate, and the coordination cost more than the specialist depth saved.

They collapsed into one specialist (Vox) that owns the full content pipeline. Output got better, not worse. Specialization is only worth it if the seam between specialists is cheap.

### Lesson three: write the team's history before you need it

For a long time, the only memory of what happened on any given day was in my head and in chat scrollback. That worked for one specialist. It did not work for sixteen.

The fix was a shared event stream. Every specialist can post a structured event (a decision, a next action, a chat summary, a handoff) to a single endpoint. A small classifier auto-routes the event to the right project. Unclassified events land in a bucket for one-click human re-tagging.

This sounds boring. It is the single most important plumbing in the system. Without it, "the team" is a fiction held together by my memory. With it, the team has a history I can read.

---

## The Stack (Appendix)

| Layer | Technology |
|-------|-----------|
| AI runtime | Claude Code (Claude Opus 4.7, 1M context) |
| Orchestration | Larry, a system prompt in `CLAUDE.md` |
| Team members | Claude Code skills (markdown system prompt per role) |
| Memory | A local auto-loaded `MEMORY.md` plus a cloud knowledge base (Mem.ai) |
| Integrations | MCP servers (Mem, Canva, Google Workspace, Gmail, Playwright, Windows desktop) |
| Internal app | Atlas (Next.js 15, React 19, TypeScript, Tailwind) |
| Database | PostgreSQL 18 (local Windows service), full-text search via `tsvector` |
| Production | PM2 as an elevated Windows service, reachable on the LAN at `http://atlas.local` |
| File system | Content-addressed storage by SHA-256, structured folder hierarchy |
| Accounting | QuickBooks Online, Intuit Enterprise Suite via QBOA, FinOptimal |
| Content | Canva Pro, Loomly, HeyGen, ElevenLabs, CapCut, Quadratic |
| Website | WordPress with Neve and Elementor Pro |
| Automation | Zapier, custom QBO OAuth 2.0, Google Drive watcher |

Zero of these required a license the firm did not already need for its normal operations. The marginal software cost of running the AI workforce on top of a normal practice is the Claude Code subscription. Everything else was already paid for.

---

## By the Numbers

| Metric | Number |
|--------|--------|
| AI team members (16 specialists + orchestrator) | 17 |
| Bookkeeping clients managed | 40 |
| Documents in the internal library | 5,810 |
| Clients linked in the database | 52 |
| Chat threads in team chat | 42 |
| Chat messages logged | 1,009 |
| Posts per month across five platforms | 56 |
| Unapplied cash resolved in one afternoon | $14,304 |
| Largest single-session QBO push | 565 checks / $18,059.03 |
| Additional software licenses required | 0 |

---

## What This Is Not

It is not magic. It is a stack of careful decisions about who does what, where memory lives, how work moves, and what the front door looks like.

It is also not finished. The roster will keep changing. The internal app will keep adding tables. The lessons section will keep getting longer.

But it works. It is what is running my practice today, and it is what will be running it tomorrow.

---

*Built by Jimmie Needles using Claude Code. Every specialist, workflow, and deliverable above is real and in production at the time of writing.*
