Comparison Build vs Buy

"We'll just use Claude Code." What that actually costs an FM company.

Every FM IT director in 2026 has fielded the same board email. Here is the honest version of the answer, from a team that runs on Claude Code every day.

Topic: Build vs buy with AI dev tools · 10 min read · Published May 2026 · Updated 15 May 2026
[Figure: a terminal session in fm-bill-extractor running "claude > build an FM bill extractor", progressing from prototype towards production grade. Caption: "~85% feels done. The last 15% runs forever." Chart: cumulative build cost, with axis marks at £15k, £40k and £85k, across the stages "prototype", "edge cases, formats, drift" and "forever": £85k and up to keep one in-house extractor production grade, ending on the question "should we look at a vendor?"]

The board email is always the same shape.

"Saw a demo of Claude Code at the conference. It writes working software. Why are we paying TYTEN £X a month when our IT team can just use this themselves?"

It is a fair question. Claude Code is genuinely remarkable.

So here is the honest answer, written by a team with twenty years of building and running production software behind it, and a clear view of what these tools change and what they do not. The answer is not "you cannot do it." The answer is that the gap between a working prototype and an FM back-office that runs in production for fifty thousand work orders a month, across six clients, with audit logs and an on-call rotation, is much wider than the demo suggests.

What Claude Code is genuinely great at

Before the case against, the case for. We are not neutral here. We ship faster because of Claude Code. Specifically:

  • Scaffolding. Spinning up a new admin page, a small internal tool, the bones of a service: minutes, not days.
  • One-off scripts. "Pull last month's contractor data from the supplier portal and reformat it for finance." Done before the kettle boils.
  • Refactors. Renaming a field across dozens of files, tidying up dead code, splitting a tangled module: machine-pace work that used to be a half-day.
  • First-pass analysis. "Here is a 200-row export from the portal, what does it actually contain?" Saves two hours every time.
  • Documentation. Inline notes, runbooks, the project docs that always slip.

If your IT team is not already using Claude Code on every other ticket, that is the conversation to have first. It is the highest-leverage productivity tool to land in 2026 and it is cheaper than a meeting room.

What Claude Code is not

Claude Code writes code. An FM back-office automation platform is not a codebase. It is a codebase, plus thousands of small decisions that nobody on your IT team has ever made before. The decisions are the product.

Six categories of decision that Claude Code cannot make for you, and that absorb the bulk of an FM platform's lifetime engineering cost:

1. Domain knowledge

An FM bill extractor is not "extract supplier name, line items, totals." It is: handle DiscountRate-shaped lines that Xero will reject if you forward them verbatim. Recognise that "callout fee" and "minimum charge" mean the same thing on three different supplier templates and different things on a fourth. Detect when a supplier has rebadged their invoices mid-quarter and the ALIASES table needs an update. Know that this client requires a 12% markup pre-VAT and that one requires it post-VAT.
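To make the alias problem concrete, here is a minimal sketch of a per-supplier lookup. The supplier IDs, the canonical charge names and the shape of the tables are all hypothetical, invented for illustration; this is not a description of any real ALIASES table.

```python
# Hypothetical default aliases: on most supplier templates these two
# labels mean the same thing.
DEFAULT_ALIASES = {
    "callout fee": "minimum_charge",
    "minimum charge": "minimum_charge",
}

# Hypothetical supplier-specific overrides: on this supplier's template,
# "callout fee" is a separate charge, not the minimum charge.
SUPPLIER_OVERRIDES = {
    "supplier_d": {"callout fee": "attendance_fee"},
}


def canonical_charge(supplier_id: str, label: str) -> str:
    """Map a raw invoice label to a canonical charge type for one supplier."""
    key = " ".join(label.lower().split())  # normalise case and whitespace
    override = SUPPLIER_OVERRIDES.get(supplier_id, {})
    if key in override:
        return override[key]
    # Unknown labels are flagged for human review rather than guessed.
    return DEFAULT_ALIASES.get(key, "needs_review")
```

The code is the easy part. The contents of the two tables are the domain knowledge, and they only fill up one supplier at a time.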

None of that is in any prompt that Claude can be handed cold. It is two years of edge cases, captured one supplier at a time. Your IT team starts at row zero.

2. Eval and regression

Here is the question every FM finance director asks within the first three months of an in-house build: "How accurate is it, actually?"

"Vibes" is not an answer. The answer requires a labelled holdout set. Hundreds of bills, manually corrected by a finance person who did the work before the AI did. A nightly job that runs the current model and prompt against the holdout and reports field-level accuracy. A drift detector that flags when this week's accuracy is two points below last week's. A regression suite that runs on every prompt change.

None of that is code that Claude Code wants to write for you. It is the unglamorous infrastructure that decides whether you trust the output enough to send the bill to the customer. We have been building ours for two years. It is the single biggest reason our clients let the AI post supplier invoices unattended.
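As a rough illustration of what the nightly job and drift check involve, here is a minimal sketch. It assumes each bill is a flat dictionary of field name to extracted string and that a field counts as correct only on an exact match; both are simplifying assumptions, and the function names are hypothetical.

```python
from typing import Dict, List

Bill = Dict[str, str]


def field_accuracy(predicted: List[Bill], labelled: List[Bill]) -> Dict[str, float]:
    """Share of holdout bills where each labelled field was extracted exactly."""
    if len(predicted) != len(labelled):
        raise ValueError("predictions and labels must cover the same bills")
    fields = sorted({name for bill in labelled for name in bill})
    scores = {}
    for name in fields:
        # Only score bills where the human labeller recorded this field.
        relevant = [(p, l) for p, l in zip(predicted, labelled) if name in l]
        hits = sum(1 for p, l in relevant if p.get(name) == l[name])
        scores[name] = hits / len(relevant)
    return scores


def drifted_fields(this_week: Dict[str, float],
                   last_week: Dict[str, float],
                   min_drop: float = 0.02) -> List[str]:
    """Fields whose accuracy fell by at least `min_drop` (two points by default)."""
    return sorted(
        name for name, acc in this_week.items()
        if name in last_week and last_week[name] - acc >= min_drop
    )
```

The functions are short. The cost sits in the labelled holdout itself and in keeping it current as suppliers change their formats.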

3. The integration treadmill, with AI-flavoured extras

The integration list is the same one that sinks every classical in-house FM build: Infraspeak, SimPRO, JobLogic, Unifocus, Xero, Sage, Microsoft 365, the client portals. Two to four dev-months per year per integration. Eighteen to thirty-six dev-months a year before you ship a new feature.

The AI build adds two more lines on top:

  • Model upgrades. Anthropic ships a new flagship model every few months, each one better than the last. Each upgrade reshapes what your prompts need to say to keep producing the same answers. Some need reworking. Some look fine on a quick check and break in subtle ways only an automated test set will catch. It is a permanent re-tuning project.
  • Prompt regressions. A small change to handle one supplier silently breaks extraction for two others. Without a proper regression process, you find out when a customer emails. The discipline to catch that before it ships is something every AI team has to learn, usually the hard way.
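A regression process for prompt changes can start as small as the sketch below: score every supplier in the holdout before and after the change, and block the release if any of them got worse. The per-supplier accuracy dictionaries and the function name are assumptions for illustration.

```python
from typing import Dict, List


def regressed_suppliers(before: Dict[str, float],
                        after: Dict[str, float],
                        tolerance: float = 0.0) -> List[str]:
    """Suppliers whose accuracy dropped by more than `tolerance` after a prompt change.

    A supplier missing from `after` is treated as fully broken (accuracy 0.0).
    """
    return sorted(
        supplier for supplier, acc in before.items()
        if after.get(supplier, 0.0) < acc - tolerance
    )


# Illustrative numbers only: the change helped supplier_a and hurt two others.
before = {"supplier_a": 0.90, "supplier_b": 0.97, "supplier_c": 0.99}
after = {"supplier_a": 0.99, "supplier_b": 0.90, "supplier_c": 0.91}
broken = regressed_suppliers(before, after)  # ["supplier_b", "supplier_c"]
```

Run on every prompt change, a check like this turns "a customer emailed" into "the release was blocked".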

4. Production reliability

Claude Code writes the "happy path." Production is the other 5% that breaks at 3am.

A short list of the kind of safeguards a production platform needs, that a Claude-generated v1 will not have:

  • An always-on watcher that pages the on-call within seconds of the first failed request.
  • A health check that runs immediately after every release, because the most subtle failures are the ones that happen quietly at restart.
  • Sensible behaviour when the database has a brief blip, so a five-second wobble does not cascade into thousands of failed background jobs.
  • Guards against accidents in production — a stray destructive command, a misconfigured deploy — before they hit live data.
  • A multi-stage kill switch on anything that emails customers, so a routing bug never sends thousands of messages to the wrong tenant.
  • Nightly database backups, retained, and tested for restore.
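To take one item from that list, "sensible behaviour when the database has a brief blip" usually means retrying with backoff rather than failing the job on the first error. The sketch below is one common way to do it, not a description of any particular platform; `TransientDatabaseError` is a hypothetical stand-in for whatever connection error your database driver raises.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientDatabaseError(Exception):
    """Hypothetical stand-in for a driver's 'connection lost' error."""


def with_backoff(operation: Callable[[], T],
                 attempts: int = 5,
                 base_delay: float = 0.5,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Run `operation`, retrying transient failures with exponential backoff."""
    for attempt in range(attempts):
        try:
            return operation()
        except TransientDatabaseError:
            if attempt == attempts - 1:
                raise  # out of retries: let the job fail loudly
            # Waits of 0.5s, 1s, 2s, 4s with the defaults.
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")
```

With the defaults, a job waits up to 7.5 seconds in total before giving up, which is enough to ride out a five-second wobble. Picking those numbers, and knowing which errors are safe to retry, is the part that comes from incidents.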

Each of those is learned from a real incident, somewhere, by someone. Each takes a senior engineer days to design and harden. Claude Code can help write the implementation, but only once you know what to ask for. Knowing what to ask for is the experience an FM IT team starts without.

5. Compliance and audit

Your enterprise client asks, in week one of the contract, for: GDPR data residency, audit-log retention with immutability, MFA enforcement on every admin surface, a data processing agreement, a SOC 2 statement of work, evidence that customer data is not used to train a third-party model, and a runbook for data subject access requests.

Claude Code can draft any of those documents. Claude cannot tell you whether your architecture actually meets them, whether the contract you signed with Anthropic has the right data-handling carve-outs, whether your in-house deployment of an LLM-driven platform has a lawful basis under UK GDPR Article 6, or whether your audit-log table is robust enough to satisfy an enterprise security review. Those are answers an experienced platform engineer with a privacy lawyer on speed-dial gives. Not a chat session.

6. The bus factor

The dirty secret of LLM-assisted development is that it makes the bus factor worse, not better.

Claude Code writes code at a velocity that no individual human ever fully reads. The "AI champion" on your IT team is the only person with an end-to-end mental model of why a particular prompt has the shape it has, why three different retry strategies live in different services, why one queue handler swallows a specific exception class on purpose. When that person leaves, you do not have a codebase you can hand to the next engineer. You have a transcript graveyard.

This is the same risk the in-house build has always had, accelerated. The Stack Overflow tenure data still applies: roughly 30% turnover per year. With LLM-generated code, the institutional knowledge that walks out of the door with a single resignation is larger.

7. And then there's "Claude for Small Business"

On 13 May 2026, Anthropic released Claude for Small Business. Fifteen prepackaged workflows that drop Claude into the office and accounting tools small businesses already run on. The pitch is the obvious one. "Anthropic ship this stuff themselves now, why are we paying anyone?"

Three things worth saying about that.

The office layer: what every small business does
  • send and chase invoices
  • approve expenses
  • book a meeting
  • update the CRM
  • draft a customer email
  • file a return
  • schedule a shift
  • post on social
Anthropic's release covers this layer.

The trade line

The FM layer: what an FM company runs on
  • triage a tenant fault, decide the SLA, pick the contractor
  • chase a Legionella or fire-safety cert to its statutory deadline
  • close a reactive job with audit-ready paperwork
  • match a supplier invoice to the work order behind it
  • run PPM schedules across hundreds of assets
  • build the monthly client SLA and KPI report
An FM company runs on this layer.

First, the shape of the release. The fifteen workflows stop at the office surface. Invoices, scheduling, paperwork, the work that runs in front of any small business regardless of what the business actually is. The release does not go deeper than that. There is no FM workflow. No workflow for healthcare, manufacturing, legal, or construction. What makes one trade different from another sits outside the release.

For an FM operation, the picture is clear enough. The work that defines the business sits outside the release. Triaging a reactive work order against the right SLA and dispatching it to a qualified contractor. Running PPM schedules across hundreds of assets so the statutory certificate is in the file before the deadline. Chasing contractor paperwork and validating it before the audit. Producing a monthly client report that holds up under contract review. None of those appear in the catalogue, and the architecture does not point that way.

Second, what the release says about Anthropic's broader strategy. Everything Anthropic has built so far points in the same direction. Eight out of every ten dollars of revenue come from the API, from other companies' products calling Claude rather than from consumers using Claude directly. The major cloud providers compete to host Claude rather than the other way round. An open integration standard published in late 2024 lets any third party connect their software to Claude without going through Anthropic for permission. None of that is the behaviour of a company that intends to be the application.

The release sits inside the same strategy rather than departing from it. The workflows do not ship as a standalone product. They ship as a plugin inside Claude itself, built out of two reusable parts: written instructions for how a piece of work gets done, and connectors into other software using the same open standard. It is the platform shipping a reference example of what the platform makes possible, not the platform becoming an applications company. [6]

Strategically, this is the mobile-OS pattern playing out in AI. The labs are platforms, the way iOS and Android are platforms. Apple ships a few first-party apps with the OS (Calendar, Notes, Reminders, Mail), and those apps are deliberately baseline. They keep the bottom of the market satisfied and leave the platform free to do the harder, more profitable work underneath. Nobody runs a real business on Apple's Notes app. They install software built by a company that does nothing else. Anthropic has now shipped its first-party app bundle for small business. Useful at the bottom of the market. Baseline by design. An FM operation sits at the top of the stack, not the bottom.

For an in-house build, the implication points the same way. Counting on Anthropic to ship an FM workflow at some future date is counting on a move the company's strategy steers it away from. The layer Anthropic wants is underneath. The FM workflow, when it arrives, will arrive from the vendors who build on top of that layer.

There is a more practical dimension to all this. Once an in-house team actually opens the tin, day-to-day is rougher than the pitch suggests. The promise is clean: Claude does the work, you approve before it sends. The lived reality, the moment a workflow drifts off the prepackaged template, is messier. Claude tells you the task cannot be done from the chat and you need to write a Python script. The chat environment will not reach the systems you actually need it to talk to. You move to a real machine, hit a rate limit, restart, and discover the bill at production volume is several multiples of the demo number. Something breaks at three in the morning and there is no number to call. None of this is unusual. It is how every powerful generic tool behaves when an organisation pushes it past the happy path. A vendor lives inside those rough edges so the customer does not have to. An in-house team learns them one outage at a time, in production.

Third, the question of who is leading next year. Today the conversation is about Anthropic. Three months ago it was a different lab. Three months before that it was another. The state of the art in AI moves faster than any other category of enterprise software, and the leader at the time of a build is rarely the leader at the time of the next refresh. An in-house FM platform written against one provider's prompt patterns, context windows, tool-use conventions and pricing is a platform that will face an expensive migration the first time a competitor opens a clear lead. A vendor absorbs that migration as part of normal product work. An internal IT team absorbs it as a project, on top of everything else on the roadmap.

The cost of staying current is shared differently in each case. A vendor spreads it across every customer who runs the platform. An in-house build carries it alone, and faces the same bill the next time the leader changes.

There is also a difference in ambition. The release is aimed at giving small businesses something useful out of the box, so Claude lands inside more day-to-day workflows. That is a reasonable goal for a horizontal platform. It is a different goal from running an FM back-office end to end, autonomously. Useful general help and complete autonomous operation are not the same product.

What stays constant in a market that moves this fast is not the underlying model. It is the team on the ground that listens to what the FM operation needs, watches every shift in the broader market, and rebuilds the parts that need rebuilding. Every specialised trade has reached the same conclusion. Legal work goes to lawyers. Audits go to auditors. Facilities-management software is built by the people who build facilities-management software, on whichever model is the right one this quarter.

The maths

The naive build pitch looks like this:

£95k–£140k
One senior software engineer, total UK employer cost. Glassdoor's May 2026 London median is £91,669; PayScale and Indeed sit between £76k and £88k. Adding employer NIC (15% above the £5k threshold), 3% pension, holiday, equipment, and workspace lifts the all-in cost to roughly 1.4×–1.5× base. [1]
£400–£1,200
Anthropic API spend per month for a small FM operation. Anthropic publishes Sonnet 4.6 at $3 / $15 per million input / output tokens and Opus 4.7 at $5 / $25, with up to 90% off via prompt caching. A few hundred bills and job sheets a day lands in this range. [2]
£0
Vendor licence fees you are no longer paying.
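For readers who want to sanity-check the API figure, here is a back-of-envelope sketch. The per-million-token prices are the Sonnet 4.6 rates quoted above and the exchange rate is the ~£0.8 / $1 used in the sources; the document volume and the token counts per document are assumptions picked for illustration, and prompt caching and batch discounts are ignored.

```python
def monthly_api_cost_gbp(docs_per_day: int,
                         input_tokens_per_doc: int,
                         output_tokens_per_doc: int,
                         input_usd_per_mtok: float = 3.0,
                         output_usd_per_mtok: float = 15.0,
                         days: int = 30,
                         gbp_per_usd: float = 0.8) -> float:
    """Estimate monthly API spend in GBP from volume and per-document token counts."""
    docs = docs_per_day * days
    input_usd = docs * input_tokens_per_doc / 1_000_000 * input_usd_per_mtok
    output_usd = docs * output_tokens_per_doc / 1_000_000 * output_usd_per_mtok
    return (input_usd + output_usd) * gbp_per_usd


# Assumed workload: 300 documents a day, 20,000 input tokens and
# 2,000 output tokens per document.
estimate = monthly_api_cost_gbp(300, 20_000, 2_000)
```

Under those assumptions the estimate comes out at about £650 a month, inside the £400–£1,200 range. Doubling the volume or the document length moves it towards the top of the range quickly.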

That is the pitch. Roughly £100k–£160k a year, total. Now the line items the pitch leaves off:

Line item: what the pitch shows, and what it actually costs

Engineering coverage
  • The pitch: 1 senior engineer, "they'll do the AI bits in 30% of their time"
  • The cost: Production AI plus integrations is 1.0 FTE, minimum. The 30% number assumes a steady state your build never reaches.

On-call & reliability
  • The pitch: Not in the pitch
  • The cost: If finance bills run on this, you have just put one senior engineer on a 24/7 rotation with no backup. Add a second person (~0.4 FTE, ~£40k–£60k loaded) or accept the risk.

Eval infrastructure
  • The pitch: Not in the pitch
  • The cost: Two to three months of one engineer's time to build a labelled set, regression suite, and drift detector. Then permanent maintenance of the labels as suppliers change formats.

Domain capture
  • The pitch: "We know our processes"
  • The cost: Knowing a process is not the same as turning it into prompt context the model can use. Eighteen months to accumulate a guidance store that a vendor has had two years to build across six clients.

Model upgrade churn
  • The pitch: Not in the pitch
  • The cost: One engineer-week per major model upgrade across the platform. Three to four upgrades a year. A three-to-four-week annual "tax" you cannot opt out of.

Integrations
  • The pitch: "Claude can write those"
  • The cost: Two to four dev-months per integration per year, forever, irrespective of whether Claude or a human writes the diff. Breaking changes are not coding effort, they are detection-and-response effort.

Compliance posture
  • The pitch: Not in the pitch
  • The cost: SOC 2 readiness and audit run $20k–$80k (~£15k–£65k) for a small organisation in year one, plus six months and a fractional CISO. Add legal time for the DPA and GDPR review. Recurring on every renewal of every enterprise client. [3]

Bus factor
  • The pitch: "Sam loves AI, he'll own it"
  • The cost: Stack Overflow's 2025 survey shows roughly a quarter of developers have under five years' experience and the median is well under a decade at any one employer. When the AI champion leaves, the platform freezes. The board asks why no new chase rules have shipped in two quarters. [4]

Add the line items honestly and a small FM operation lands at:

  • 1.0 FTE senior engineer, loaded: £95k–£140k
  • ~0.4 FTE on-call backup engineer: £40k–£60k
  • Eval & regression infrastructure (year-1 build, then maintenance): £25k–£50k
  • Anthropic API at production volume: £5k–£15k per year
  • SOC 2 readiness + audit (year 1, recurring at lower rates): £15k–£65k [3]
  • Legal & privacy review (DPA, GDPR posture): £10k–£25k
  • Model-upgrade re-baselining tax, ~3–4 weeks of senior engineer time per year: £8k–£14k

Year-one all-in: £200k–£370k. Steady-state from year two: £160k–£280k once SOC 2 and the eval suite are amortised. Roughly double once the platform has to support more than a single client. And the McKinsey/Standish data on enterprise software builds (45% over budget, 56% less value than planned, 19% cancelled outright) applies in full. [5]
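The year-one total is just the sum of the line items above. A short sketch for anyone who wants to swap in their own numbers, with the figures copied from the list in thousands of pounds:

```python
# (low, high) year-one estimates in £k, copied from the list above.
LINE_ITEMS_GBP_K = {
    "Senior engineer, 1.0 FTE loaded": (95, 140),
    "On-call backup, ~0.4 FTE": (40, 60),
    "Eval and regression infrastructure": (25, 50),
    "Anthropic API at production volume": (5, 15),
    "SOC 2 readiness and audit": (15, 65),
    "Legal and privacy review": (10, 25),
    "Model-upgrade re-baselining": (8, 14),
}


def total_range(items):
    """Sum the low and high ends of every line item."""
    low = sum(lo for lo, _ in items.values())
    high = sum(hi for _, hi in items.values())
    return low, high


year_one = total_range(LINE_ITEMS_GBP_K)  # (198, 369)
```

That gives £198k–£369k, which rounds to the £200k–£370k quoted above.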

When in-house Claude Code is the right answer

Sometimes it is. Three honest cases:

  • The workflow is genuinely yours. A truly bespoke ops dashboard, a commercial pricing model, an in-house bid-management surface, a client-facing portal with your brand. These are not back-office plumbing. Claude Code is a tremendous accelerant on building them, and they are not for sale by anyone.
  • You have an existing engineering function with the senior bench to absorb the AI track. Two senior backend engineers, an infra engineer, a designer, a PM. If you have that team for other reasons, adding the AI workstream at the margin with Claude Code is rational.
  • The thing you would automate is small, isolated, and not on the customer-facing path. "Pull last month's CSV from the supplier portal and pre-classify it for the AP person" is a 200-line script. You do not need a vendor for it. Use Claude Code on a Tuesday afternoon and ship.

What is not on that list: bill extraction at scale, contractor chase orchestration, PPM cert lifecycle, multi-CAFM integration, document QA, audit-log compliance, multi-tenant supplier learning. Those are the workloads that look easy in a Claude Code session and turn into the eighteen-month build above.

The honest summary

Claude Code is one of the top AI dev tools on the market in 2026. The argument here is not that the tools are weak.

The argument is that an FM back-office is not the codebase. It is the codebase, plus the eval set, plus the compliance posture, plus the integration history, plus the on-call playbook, plus the supplier-by-supplier guidance store, plus the model-upgrade discipline, plus the institutional memory of every incident the platform has lived through. Tooling makes writing the code faster. It does not collapse the gap between code and platform.

If your IT team has the time and the budget to spend the next two years building all of that for a single FM operation, the build is honest and the path is clear. If your IT team has a backlog of customer-facing work that earns revenue, the cleaner answer is the same one as the classical build-vs-buy: let a vendor own the back-office, and use Claude Code on the things only your team can build.

We are happy either way. We just want the board to see the full line item list before they sign off.


The two years your team would spend building it.

Same Claude models. Same level of automation. Productionised, evaluated, integrated, and on-call. Live in 4 to 6 weeks instead of two years.


Sources & methodology

  1. UK senior software engineer salary, 2026. Glassdoor London median £91,669 (May 2026); Indeed £87,957; PayScale £76,300. Loaded employer cost = base × ~1.4–1.5 to cover employer NIC at 15% above the £5,000 secondary threshold, 3% minimum auto-enrolment pension, holiday, sick, equipment, and workspace. UK Employer Costs Calculator (2026) and HMRC NIC rates for 2025/26.
      Glassdoor London Senior SWE 2026, Indeed London Senior SWE, PayScale UK Senior SWE 2026, UK Total Employment Cost Calculator.
  2. Anthropic Claude API pricing, 2026. Sonnet 4.6: $3 input / $15 output per million tokens. Opus 4.7: $5 / $25. Up to 90% savings via prompt caching, up to 50% via batch processing. Volume estimate based on a few hundred bills and job sheets per day at typical token sizes.
      Anthropic Claude API pricing.
  3. SOC 2 readiness and audit costs, 2026. Readiness work $10k–$40k; Type 2 audit fees $20k–$60k; small-to-mid-size SaaS all-in first-year spend $30k–$80k. Sterling conversion at ~£0.8 / $1.
      Sprinto: SOC 2 compliance cost (2026), SecureLeap: SOC 2 audit + total spend (2026).
  4. Developer experience and tenure. Stack Overflow Developer Survey 2025: experience distribution and leadership-vs-developer profile.
      2025 Stack Overflow Developer Survey.
  5. Enterprise software project outcomes. Standish Group CHAOS Report 2020: 31% successful, 50% challenged, 19% failed across 50,000+ tracked projects. McKinsey / Oxford BT Centre, 2012, 5,400 IT projects: large IT projects average 45% over budget, 7% over time, 56% less business value than projected.
      Standish Group CHAOS Report, McKinsey: Delivering large-scale IT projects on time, on budget, on value.
  6. Claude for Small Business launch, May 2026. Anthropic announced Claude for Small Business on 13 May 2026 with fifteen prepackaged workflows. The release ships as a plugin inside Claude Cowork, built from skills and connectors via the open Model Context Protocol Anthropic published as a public standard in late 2024. None of the workflows target a vertical such as FM, healthcare, manufacturing, or construction.
      Anthropic: Claude for Small Business, TechCrunch: Anthropic courts small business owners, SiliconAngle: Anthropic launches Claude for Small Business, Anthropic plugins overview.

Numbers above are mid-market estimates for a single-client UK FM operation. Actual costs depend on team seniority, document volume, integration count, and the compliance posture your largest customer demands. The figures are intended to be defensible to a finance director, not a final quote.