Comparison · Build vs Buy

"We'll just use Claude Code." What that actually costs an FM company.

Every FM IT director in 2026 has fielded the same board email. Here is the honest version of the answer, from a team that runs on Claude Code every day.

Topic: Build vs buy with AI dev tools · 10 min read · Published May 2026
   FROM PROTOTYPE TO PRODUCTION FM AUTOMATION.

   weekend 1     ████████████████████████░░░░░░    "Claude wrote a bill extractor!"
   month 1       ██████████████████░░░░░░░░░░░░    edge cases, quoted-reply tails, dozens of supplier formats
   month 3       ██████████████░░░░░░░░░░░░░░░░    a CAFM ships a breaking API change. silent failures.
   month 6       ████████████░░░░░░░░░░░░░░░░░░    eval set? regression tests? drift detection?
   month 9       ██████████░░░░░░░░░░░░░░░░░░░░    new model lands. half the prompts re-baseline.
   month 12      ████████░░░░░░░░░░░░░░░░░░░░░░    "Sam (the AI champion) handed in his notice."
   month 18      ██████░░░░░░░░░░░░░░░░░░░░░░░░    "should we look at a vendor?"

   CUMULATIVE £   £15k → £40k → £85k → £140k → £210k → £320k     [1]
   ACCURACY       vibes → first eval → drift → re-baseline → unknown after upgrade

   [1] One UK senior engineer loaded + Anthropic API + SOC 2 readiness + legal.
       Sources at the end of the article.

The board email is always the same shape.

"Saw a demo of Claude Code at the conference. It writes working software. Why are we paying TYTEN £X a month when our IT team can just use this themselves?"

It is a fair question. Claude Code is genuinely remarkable.

So here is the honest answer, written by a team with twenty years of building and running production software behind it, and a clear view of what these tools change and what they do not. The answer is not "you cannot do it." The answer is that the gap between a working prototype and an FM back-office that runs in production for fifty thousand work orders a month, across six clients, with audit logs and an on-call rotation, is much wider than the demo suggests.

What Claude Code is genuinely great at

Before the case against, the case for. We are not neutral here. We ship faster because of Claude Code. Specifically:

  • Scaffolding. Spinning up a new admin page, a small internal tool, the bones of a service: minutes, not days.
  • One-off scripts. "Pull last month's contractor data from the supplier portal and reformat it for finance." Done before the kettle boils.
  • Refactors. Renaming a field across dozens of files, tidying up dead code, splitting a tangled module: machine-pace work that used to be a half-day.
  • First-pass analysis. "Here is a 200-row export from the portal, what does it actually contain?" Saves two hours every time.
  • Documentation. Inline notes, runbooks, the project docs that always slip.

If your IT team is not already using Claude Code on every other ticket, that is the conversation to have first. It is the highest-leverage productivity tool to land in 2026 and it is cheaper than a meeting room.

What Claude Code is not

Claude Code writes code. An FM back-office automation platform is not a codebase. It is a codebase, plus thousands of small decisions that nobody on your IT team has ever made before. The decisions are the product.

Six categories of decision that Claude Code cannot make for you, and that absorb the bulk of an FM platform's lifetime engineering cost:

1. Domain knowledge

An FM bill extractor is not "extract supplier name, line items, totals." It is: handle DiscountRate-shaped lines that Xero will reject if you forward them verbatim. Recognise that "callout fee" and "minimum charge" mean the same thing on three different supplier templates and different things on a fourth. Detect when a supplier has rebadged their invoices mid-quarter and the ALIASES table needs an update. Know that this client requires a 12% markup pre-VAT and that one requires it post-VAT.

None of that is in any prompt that Claude can be handed cold. It is two years of edge cases, captured one supplier at a time. Your IT team starts at row zero.
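To make that concrete, here is a minimal sketch of the kind of guidance store those decisions end up living in. Every name and value in it is hypothetical; the point is that the knowledge is data accumulated per supplier and per client, not something a model can infer from a cold prompt.

    # Hypothetical sketch: the "decisions" live in data like this, not in the prompt.
    # Supplier names, aliases, and client rules below are illustrative only.

    # Same phrase, different meaning, depending on whose template it appears on.
    LINE_ITEM_ALIASES = {
        ("acme_mechanical", "callout fee"): "minimum_charge",
        ("rapid_drains", "callout fee"): "attendance_charge",   # not a minimum charge here
        ("northline_hvac", "minimum charge"): "minimum_charge",
    }

    # Per-client commercial rules, captured one contract at a time.
    CLIENT_MARKUP_RULES = {
        "client_a": {"markup_pct": 12, "applied": "pre_vat"},
        "client_b": {"markup_pct": 12, "applied": "post_vat"},
    }

    def markup_base(client_id: str, net: float, vat_rate: float = 0.20) -> float:
        """One possible reading of pre- vs post-VAT markup: the amount the
        percentage is computed on. The real rule varies by contract."""
        rule = CLIENT_MARKUP_RULES[client_id]
        if rule["applied"] == "pre_vat":
            return net                       # markup computed on the net amount
        return net * (1 + vat_rate)          # markup computed on the VAT-inclusive amount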

2. Eval and regression

Here is the question every FM finance director asks within the first three months of an in-house build: "How accurate is it, actually?"

"Vibes" is not an answer. The answer requires a labelled holdout set. Hundreds of bills, manually corrected by a finance person who did the work before the AI did. A nightly job that runs the current model and prompt against the holdout and reports field-level accuracy. A drift detector that flags when this week's accuracy is two points below last week's. A regression suite that runs on every prompt change.

None of that is code that Claude Code wants to write for you. It is the unglamorous infrastructure that decides whether you trust the output enough to send the bill to the customer. We have been building ours for two years. It is the single biggest reason our clients let the AI post supplier invoices unattended.
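For readers who want to picture the shape of that infrastructure, here is a minimal sketch of the nightly field-level accuracy run and the drift check. The field list, the two-point threshold, and extract_bill() are all assumptions standing in for the real pipeline.

    # Minimal sketch of a nightly eval against a labelled holdout set.
    from typing import Callable

    FIELDS = ["supplier_name", "invoice_number", "net_total", "vat_total"]
    DRIFT_THRESHOLD = 0.02   # flag a two-point week-on-week drop

    def field_accuracy(holdout: list[dict], extract_bill: Callable[[bytes], dict]) -> dict[str, float]:
        """Compare model output to the manually corrected label for every bill."""
        correct = {f: 0 for f in FIELDS}
        for example in holdout:                       # each: {"pdf": bytes, "labels": {...}}
            predicted = extract_bill(example["pdf"])  # current model + current prompt
            for f in FIELDS:
                if predicted.get(f) == example["labels"][f]:
                    correct[f] += 1
        return {f: correct[f] / len(holdout) for f in FIELDS}

    def check_drift(today: dict[str, float], last_week: dict[str, float]) -> list[str]:
        """Return the fields whose accuracy dropped by more than the threshold."""
        return [f for f in FIELDS if last_week[f] - today[f] > DRIFT_THRESHOLD]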

3. The integration treadmill, with AI-flavoured extras

The integration list is the same one that sinks every classical in-house FM build: Infraspeak, SimPRO, JobLogic, Unifocus, Xero, Sage, Microsoft 365, the client portals. Two to four dev-months per year per integration. Eighteen to thirty-six dev-months a year before you ship a new feature.

The AI build adds two more lines on top:

  • Model upgrades. Anthropic ships a new flagship model every few months, each one better than the last. Each upgrade reshapes what your prompts need to say to keep producing the same answers. Some need reworking. Some look fine on a quick check and break in subtle ways only an automated test set will catch. It is a permanent re-tuning project.
  • Prompt regressions. A small change to handle one supplier silently breaks extraction for two others. Without a proper regression process, you find out when a customer emails. The discipline to catch that before it ships is something every AI team has to learn, usually the hard way.
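A sketch of the discipline the second point describes, assuming you already keep per-supplier accuracy numbers from the eval run above. The supplier names and the one-point tolerance are illustrative:

    # Illustrative regression gate: a prompt change only ships if no supplier's
    # extraction accuracy regresses beyond a small tolerance.
    def regression_gate(per_supplier_before: dict[str, float],
                        per_supplier_after: dict[str, float],
                        tolerance: float = 0.01) -> tuple[bool, list[str]]:
        """Block the change if any supplier drops by more than the tolerance."""
        regressed = [
            supplier
            for supplier, before in per_supplier_before.items()
            if before - per_supplier_after.get(supplier, 0.0) > tolerance
        ]
        return (len(regressed) == 0, regressed)

    # Usage: run the holdout once with the old prompt, once with the candidate,
    # then refuse to merge when the gate fails.
    ok, regressed = regression_gate(
        {"acme_mechanical": 0.97, "rapid_drains": 0.94},
        {"acme_mechanical": 0.97, "rapid_drains": 0.89},
    )
    assert not ok and regressed == ["rapid_drains"]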

4. Production reliability

Claude Code writes "happy path." Production is the other 5% that breaks at 3am.

A short list of the kind of safeguards a production platform needs and a Claude-generated v1 will not have:

  • An always-on watcher that pages the on-call within seconds of the first failed request.
  • A health check that runs immediately after every release, because the most subtle failures are the ones that happen quietly at restart.
  • Sensible behaviour when the database has a brief blip, so a five-second wobble does not cascade into thousands of failed background jobs.
  • Guards against accidents in production — a stray destructive command, a misconfigured deploy — before they hit live data.
  • A multi-stage kill switch on anything that emails customers, so a routing bug never sends thousands of messages to the wrong tenant.
  • Nightly database backups, retained, and tested for restore.

Each of those is learned from a real incident, somewhere, by someone. Each takes a senior engineer days to design and harden. Claude Code can help write the implementation, but only once you know what to ask for. Knowing what to ask for is the experience an FM IT team starts without.
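As an illustration of the kind of thing you end up asking for, here is a hypothetical sketch of the multi-stage kill switch mentioned above: a platform-wide switch, a per-tenant switch, and a volume fuse, all checked before any customer email leaves the building. The limits and tenant names are assumptions.

    # Hypothetical multi-stage kill switch for outbound customer email.
    import time
    from collections import deque

    GLOBAL_SEND_ENABLED = True
    TENANT_SEND_ENABLED = {"client_a": True, "client_b": False}   # one tenant paused
    MAX_SENDS_PER_HOUR = 200                                      # fuse for routing bugs
    _recent_sends: deque[float] = deque()

    def may_send(tenant_id: str) -> bool:
        """Every outbound customer email passes through this check first."""
        if not GLOBAL_SEND_ENABLED:
            return False                                   # stage 1: platform-wide stop
        if not TENANT_SEND_ENABLED.get(tenant_id, False):
            return False                                   # stage 2: per-tenant stop
        now = time.time()
        while _recent_sends and now - _recent_sends[0] > 3600:
            _recent_sends.popleft()                        # drop sends older than an hour
        if len(_recent_sends) >= MAX_SENDS_PER_HOUR:
            return False                                   # stage 3: volume fuse trips
        _recent_sends.append(now)
        return True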

5. Compliance and audit

Your enterprise client asks, in week one of the contract, for: GDPR data residency, audit-log retention with immutability, MFA enforcement on every admin surface, a data processing agreement, a SOC 2 statement of work, evidence that customer data is not used to train a third-party model, and a runbook for data subject access requests.

Claude Code can draft any of those documents. Claude cannot tell you whether your architecture actually meets them, whether the contract you signed with Anthropic has the right data-handling carve-outs, whether your in-house deployment of an LLM-driven platform has a lawful basis under UK GDPR Article 6, or whether your audit-log table is robust enough to satisfy an enterprise security review. Those are answers an experienced platform engineer with a privacy lawyer on speed-dial gives. Not a chat session.

6. The bus factor

The dirty secret of LLM-assisted development is that it makes the bus factor worse, not better.

Claude Code writes code faster than any one human ever fully reads it. The "AI champion" on your IT team is the only person with an end-to-end mental model of why a particular prompt has the shape it has, why three different retry strategies live in different services, why one queue handler swallows a specific exception class on purpose. When that person leaves, you do not have a codebase you can hand to the next engineer. You have a transcript graveyard.

This is the same risk the in-house build has always had, accelerated. The Stack Overflow tenure data still applies: roughly 30% turnover per year. With LLM-generated code, the institutional knowledge that walks out of the door with a single resignation is larger.

The maths

The naive build pitch looks like this:

£95k–£140k
One senior software engineer, total UK employer cost. Glassdoor's May 2026 London median is £91,669; PayScale and Indeed sit between £76k and £88k. Adding employer NIC (15% above the £5k secondary threshold), 3% pension, holiday, equipment, and workspace lifts the all-in cost to roughly 1.4×–1.5× base. [1]
£400–£1,200
Anthropic API spend per month for a small FM operation. Anthropic publishes Sonnet 4.6 at $3 / $15 per million input / output tokens and Opus 4.7 at $5 / $25, with up to 90% off via prompt caching. A few hundred bills and job sheets a day lands in this range. [2]
£0
Vendor licence fees you are no longer paying.

That is the pitch. Roughly £100k–£160k a year, total. Now the line items the pitch leaves off:

Line item by line item, what the pitch shows against what it actually costs:

  • Engineering coverage. The pitch: one senior engineer, "they'll do the AI bits in 30% of their time." The reality: production AI plus integrations is 1.0 FTE, minimum. The 30% number assumes a steady state your build never reaches.
  • On-call & reliability. The pitch: not mentioned. The reality: if finance bills run on this, you have just put one senior engineer on a 24/7 rotation with no backup. Add a second person (~0.4 FTE, ~£40k–£60k loaded) or accept the risk.
  • Eval infrastructure. The pitch: not mentioned. The reality: two to three months of one engineer's time to build a labelled set, regression suite, and drift detector. Then permanent maintenance of the labels as suppliers change formats.
  • Domain capture. The pitch: "we know our processes." The reality: knowing a process is not the same as turning it into prompt context the model can use. Eighteen months to accumulate a guidance store that a vendor has had two years to build across six clients.
  • Model upgrade churn. The pitch: not mentioned. The reality: one engineer-week per major model upgrade across the platform, three to four upgrades a year. A three-to-four-week annual "tax" you cannot opt out of.
  • Integrations. The pitch: "Claude can write those." The reality: two to four dev-months per integration per year, forever, irrespective of whether Claude or a human writes the diff. Breaking changes are not coding effort, they are detection-and-response effort.
  • Compliance posture. The pitch: not mentioned. The reality: SOC 2 readiness alone runs $20k–$80k (~£15k–£65k) for a small organisation in year one, plus six months and a fractional CISO. Add legal time for the DPA and GDPR review. Recurring on every renewal of every enterprise client. [3]
  • Bus factor. The pitch: "Sam loves AI, he'll own it." The reality: Stack Overflow's 2025 survey shows roughly a quarter of developers have under five years' experience, and median tenure at any one employer is well under a decade. When the AI champion leaves, the platform freezes. The board asks why no new chase rules have shipped in two quarters. [4]

Add the line items honestly and a small FM operation lands at:

  • 1.0 FTE senior engineer, loaded: £95k–£140k
  • ~0.4 FTE on-call backup engineer: £40k–£60k
  • Eval & regression infrastructure (year-1 build, then maintenance): £25k–£50k
  • Anthropic API at production volume: £5k–£15k per year
  • SOC 2 readiness + audit (year 1, recurring at lower rates): £15k–£65k [3]
  • Legal & privacy review (DPA, GDPR posture): £10k–£25k
  • Model-upgrade re-baselining tax, ~3–4 weeks of senior engineer time per year: £8k–£14k

Year-one all-in: £200k–£370k. Steady-state from year two: £160k–£280k once SOC 2 and the eval suite are amortised. Roughly double once the platform has to support more than a single client. And the McKinsey/Standish data on enterprise software builds (45% over budget, 56% less value than planned, 19% cancelled outright) applies in full. [5]
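For a finance director who wants to check the arithmetic, the year-one range is simply the sum of the line items above. A trivial sketch, using only the figures already listed (all in £k):

    # The year-one range is the sum of the line-item ranges listed above.
    line_items = {
        "senior engineer (loaded)":    (95, 140),
        "on-call backup (~0.4 FTE)":   (40, 60),
        "eval & regression build":     (25, 50),
        "Anthropic API":               (5, 15),
        "SOC 2 readiness + audit":     (15, 65),
        "legal & privacy review":      (10, 25),
        "model-upgrade re-baselining": (8, 14),
    }
    low = sum(lo for lo, _ in line_items.values())    # 198 -> roughly £200k
    high = sum(hi for _, hi in line_items.values())   # 369 -> roughly £370k
    print(f"year-one all-in: £{low}k–£{high}k")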

When in-house Claude Code is the right answer

Sometimes it is. Three honest cases:

  • The workflow is genuinely yours. A truly bespoke ops dashboard, a commercial pricing model, an in-house bid-management surface, a client-facing portal with your brand. These are not back-office plumbing. Claude Code is a tremendous accelerant on building them, and they are not for sale by anyone.
  • You have an existing engineering function with the senior bench to absorb the AI track. Two senior backends, an infra engineer, a designer, a PM. If you have that team for other reasons, adding the AI workstream at the margin with Claude Code is rational.
  • The thing you would automate is small, isolated, and not on the customer-facing path. "Pull last month's CSV from the supplier portal and pre-classify it for the AP person" is a 200-line script. You do not need a vendor for it. Use Claude Code on a Tuesday afternoon and ship.

What is not on that list: bill extraction at scale, contractor chase orchestration, PPM cert lifecycle, multi-CAFM integration, document QA, audit-log compliance, multi-tenant supplier learning. Those are the workloads that look easy in a Claude Code session and turn into the eighteen-month build above.

The honest summary

Claude Code is one of the top AI dev tools on the market in 2026. The argument here is not that the tools are weak.

The argument is that an FM back-office is not the codebase. It is the codebase, plus the eval set, plus the compliance posture, plus the integration history, plus the on-call playbook, plus the supplier-by-supplier guidance store, plus the model-upgrade discipline, plus the institutional memory of every incident the platform has lived through. Tooling makes writing the code faster. It does not collapse the gap between code and platform.

If your IT team has the time and the budget to spend the next two years building all of that for a single FM operation, the build is honest and the path is clear. If your IT team has a backlog of customer-facing work that earns revenue, the cleaner answer is the same one as the classical build-vs-buy: let a vendor own the back-office, and use Claude Code on the things only your team can build.

We are happy either way. We just want the board to see the full line item list before they sign off.


We have already done the two years your team would spend building it.

Same Claude models. Same level of automation. Productionised, evaluated, integrated, and on-call. Live in 4 to 6 weeks instead of two years.

Book a Demo

Sources & methodology

  1. UK senior software engineer salary, 2026. Glassdoor London median £91,669 (May 2026); Indeed £87,957; PayScale £76,300. Loaded employer cost = base × ~1.4–1.5 to cover employer NIC at 15% above the £5,000 secondary threshold, 3% minimum auto-enrolment pension, holiday, sick, equipment, and workspace. UK Employer Costs Calculator (2026) and HMRC NIC rates for 2025/26.
      Glassdoor London Senior SWE 2026, Indeed London Senior SWE, PayScale UK Senior SWE 2026, UK Total Employment Cost Calculator.
  2. Anthropic Claude API pricing, 2026. Sonnet 4.6: $3 input / $15 output per million tokens. Opus 4.7: $5 / $25. Up to 90% savings via prompt caching, up to 50% via batch processing. Volume estimate based on a few hundred bills and job sheets per day at typical token sizes.
      Anthropic Claude API pricing.
  3. SOC 2 readiness and audit costs, 2026. Readiness work $10k–$40k; Type 2 audit fees $20k–$60k; small-to-mid-size SaaS all-in first-year spend $30k–$80k. Sterling conversion at ~£0.8 / $1.
      Sprinto: SOC 2 compliance cost (2026), SecureLeap: SOC 2 audit + total spend (2026).
  4. Developer experience and tenure. Stack Overflow Developer Survey 2025: experience distribution and leadership-vs-developer profile.
      2025 Stack Overflow Developer Survey.
  5. Enterprise software project outcomes. Standish Group CHAOS Report 2020: 31% successful, 50% challenged, 19% failed across 50,000+ tracked projects. McKinsey / Oxford BT Centre, 2012, 5,400 IT projects: large IT projects average 45% over budget, 7% over time, 56% less business value than projected.
      Standish Group CHAOS Report, McKinsey: Delivering large-scale IT projects on time, on budget, on value.

Numbers above are mid-market estimates for a single-client UK FM operation. Actual costs depend on team seniority, document volume, integration count, and the compliance posture your largest customer demands. The figures are intended to be defensible to a finance director, not a final quote.