A mid-market CFO analyzing AI project costs on a laptop
29.04.2026

AI Over Budget: What Bitkom's 2026 Figures Mean for CFOs

8 min. reading time

According to the Bitkom study from April 2026, one third of DACH companies say AI has become more expensive than planned. That is not a question of mis-estimated volumes but an architecture and procurement issue. Mid-market CFOs who don't look at their cost models now will be cutting staff in twelve months instead of cutting the vendor.

Key Takeaways

  • 41 % AI usage, 33 % cost overruns: The Bitkom 2026 study shows that adoption has more than doubled since 2024 – but one in three companies exceeds its budget (Bitkom, April 2026).
  • Cost overrun is an architecture issue, not volume: Hyperscaler lock‑in, inefficient RAG pipelines and missing model routing drive inference costs far more than usage amount.
  • Procurement must be re‑engineered: Classic license logic doesn’t fit token pricing – CFOs need total‑cost‑of‑inference models before the next contract is signed.

What the Bitkom figures mean for CFOs

The Bitkom study from April 2026 measures two realities at once. 41 % of companies in the DACH region actively use AI – in 2024 it was 17 %. Adoption is therefore not the problem. The second figure shows the problem: 33 % report cost overruns. 19 % have cut jobs as a result. That is a stark number in a study that otherwise mainly gathers sentiment.

CFOs who only read the adoption rate miss the expensive side. Cost overruns are not evenly distributed. Industry observations show they cluster in two groups. First, mid‑size firms that entered a hyperscaler contract without an architecture review. Second, companies that have scaled a generative‑AI use case into core business without calculating inference cost per transaction.

For finance, this means: the AI line item is no longer a software line item. It behaves like a variable energy cost – with the difference that most ERP systems don’t model it that way. Writing it into the forecast as a fixed license cost means you’ve already lost.

33 %
of DACH companies with active AI usage report cost overruns in 2026. 19 % have therefore cut jobs.
Source: Bitkom press release, April 2026

The 19 % figure is the real alarm signal. It says: the mid‑market started adopting AI without defining a financial safety margin. As soon as cash flow tightens, the cuts hit the personnel side, not the vendor. That is the opposite of what the AI investment thesis promises – and it almost always stems from nobody having done an honest total‑cost‑of‑inference calculation before launch.

Four architectural anti‑patterns that generate cost overruns

Cost overruns rarely stem from the license fee. They arise from the architecture that sits beneath that fee. Four patterns show up especially often in DACH pilot projects – and all of them are avoidable if the CFO asks the right questions before the architecture sign‑off.

What drives costs

  • Hyperscaler lock‑in: inference on a single platform, no price comparison in the contract
  • RAG pipelines that load whole documents into the context instead of chunking
  • Premium model for every request – even for simple classifications
  • Oversized inference workload (GPU reserved 24/7 instead of on‑demand)

What keeps costs in check

  • Multi‑provider setup with a routing layer (at least two inference sources)
  • Token budget per use case, not per department
  • Small models as default, premium only when needed (see the sketch after this list)
  • Caching layer for recurring requests (often 30 % to 50 % of the load)
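
What the second list can look like in practice: a minimal routing sketch in Python. Every name and figure in it (route_request, call_model, the model labels, the budget) is an illustrative assumption, not a specific vendor API. The decision order is the point: cache first, small model by default, premium only for requests flagged as complex, and a hard stop once the use case's token budget is exhausted.

```python
import hashlib

# Illustrative budgets - assumptions, not vendor quotes.
TOKEN_BUDGET = {"invoice-classification": 20_000_000}   # token budget per use case, not per department
tokens_used = {"invoice-classification": 0}
cache: dict[str, str] = {}                               # caching layer for recurring requests


def call_model(model: str, prompt: str) -> str:
    """Placeholder for the actual provider SDK call (primary or second inference source)."""
    return f"[{model}] answer to: {prompt[:40]}"


def route_request(use_case: str, prompt: str, complex_task: bool, est_tokens: int) -> str:
    """Decision order: cache first, small model by default, premium only when needed."""
    key = hashlib.sha256(prompt.encode()).hexdigest()
    if key in cache:                                     # recurring request: no inference cost at all
        return cache[key]

    if tokens_used[use_case] + est_tokens > TOKEN_BUDGET[use_case]:
        raise RuntimeError(f"Token budget for '{use_case}' exhausted")

    model = "premium" if complex_task else "small"       # small model is the default
    answer = call_model(model, prompt)
    tokens_used[use_case] += est_tokens
    cache[key] = answer
    return answer


# Example: a simple classification stays on the small model and is cached afterwards.
print(route_request("invoice-classification", "Classify this invoice: ...",
                    complex_task=False, est_tokens=800))
```

Note that the budget check sits in front of the inference call; that is also the spot where the 80 % alert from the roadmap further down can hook in.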

The most expensive of the four points is usually the RAG pipeline. A poorly built retrieval architecture loads 8,000 to 16,000 tokens of context for each request, even though 1,500 would suffice. When tokens are billed per million, that factor translates directly into the monthly bill. CFOs should therefore, at the first sign‑off, ask not for the model price but for the average token consumption per transaction – and for the worst‑case scenario during load spikes.
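
A back-of-the-envelope illustration of that factor, assuming a price of 10 € per million input tokens (a placeholder, use your vendor's actual rate) and the 50,000 monthly requests cited in the FAQ below:

```python
# Assumed price: 10 € per million input tokens (illustrative, not a vendor quote).
price_per_token = 10 / 1_000_000

requests_per_month = 50_000
bloated_context = 8_000      # tokens loaded per request by a poorly chunked RAG pipeline
lean_context = 1_500         # tokens that would actually suffice

bloated_cost = requests_per_month * bloated_context * price_per_token   # 4,000 € per month
lean_cost = requests_per_month * lean_context * price_per_token         #   750 € per month

print(f"Bloated pipeline: {bloated_cost:,.0f} € per month")
print(f"Lean pipeline:    {lean_cost:,.0f} € per month")
```

Under these placeholder assumptions the difference is roughly 3,250 € per month, caused by context handling alone, before any model choice or provider negotiation.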

Hyperscaler lock‑in is the second major lever. Whoever builds a use case entirely on a single cloud without integrating a second inference provider has no negotiation anchor. In the DACH mid‑market this often results from a historically grown cloud dependency combined with sales pressure from the hyperscaler, which pitches the AI stack as a bundled offering. The consequence: price hikes cannot be absorbed because a replacement project takes six to nine months.

How a CFO Validates an AI Business Case (5 Steps)

Traditional software business cases check license price, implementation effort, and maintenance. With AI, the two most important items are missing: inference costs and scaling path. These five steps augment the business case so the real drivers become visible before the contract is signed.

CFO Validation in 5 Steps

  1. Ask for the token profile per transaction. Not per user, not per month – per actual use‑case transaction. Input plus output. If the vendor cannot quantify this, the business case is not yet mature.
  2. Calculate a peak‑load scenario. What does a day with triple volume cost? What about ten‑fold? With token pricing the cost scales linearly, with reserved GPUs it does not. Compare both models against each other (a worked comparison follows these steps).
  3. Include model routing as a contract clause. The agreement must state that the provider offers different model sizes and that migration between them is possible. Otherwise the company pays a premium for simple classifications.
  4. Define an exit path. How much would it cost to move the use case to a second provider? If the answer is six months, the risk pricing is mis‑calibrated.
  5. Lock in quarterly re‑forecasting. AI costs belong in the rolling forecast, not in the annual budget block. Otherwise the CFO only sees the drift at the Q4 close.
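
A minimal sketch of step 2. All inputs (token price, reserved-GPU day rate, token profile) are assumptions for illustration; the mechanics are what matter: the pay-per-use cost scales linearly with volume, while a reserved instance stays flat up to its capacity limit.

```python
# All figures are illustrative assumptions - replace them with your own quotes.
PRICE_PER_M_TOKENS = 10.0        # € per million tokens (pay-per-use)
RESERVED_GPU_PER_DAY = 250.0     # € per day for a 24/7 reserved instance
TOKENS_PER_TRANSACTION = 2_000   # input plus output, from the vendor's token profile (step 1)


def daily_cost_token_pricing(transactions: int) -> float:
    """Pay-per-use: cost scales linearly with volume."""
    return transactions * TOKENS_PER_TRANSACTION / 1_000_000 * PRICE_PER_M_TOKENS


def daily_cost_reserved_gpu(transactions: int) -> float:
    """Reserved instance: flat cost regardless of volume, up to its capacity limit."""
    return RESERVED_GPU_PER_DAY


for label, tx in [("normal day", 10_000), ("3x day", 30_000), ("10x day", 100_000)]:
    print(f"{label:>10}: token pricing {daily_cost_token_pricing(tx):7,.0f} €"
          f"  vs  reserved GPU {daily_cost_reserved_gpu(tx):7,.0f} €")
```

With these placeholder numbers the reserved instance only wins on spike days; on a normal day it is idle money. The comparison itself is the deliverable of step 2, not the specific figures.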

The five steps are not a framework. They are the minimum discipline needed to prevent a €50,000 pilot from turning into a €380,000 cost block without anyone being able to say stop. In industry observations this is exactly what happens in most cost‑overrun cases: no one was tasked with spotting the drift.

Rethinking procurement: from license purchase to inference contracts

Traditional software procurement in the mid‑market buys a license per user, perhaps plus maintenance. With AI this only works in pilot phases. As soon as a use case goes into production, the token model blows past the license model. If procurement isn’t reshaped, the same cost overrun will reappear each quarter – just on a larger scale.

The first step in the overhaul is moving from per‑seat to per‑outcome contracts, provided the vendor is willing to accept them. In the DACH mid‑market this is rarely negotiable outright, but a hybrid model works: a base fee covers a defined token amount, and any usage beyond that is listed transparently. What must never happen is token consumption without a contractual cap. That is the default path into the 33‑percent cluster.
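
A sketch of what such a hybrid contract looks like as a calculation. Base fee, included volume, overage price and cap are placeholder figures rather than market rates; the structural point is the cap applied at the end.

```python
def monthly_invoice(tokens_used: int,
                    base_fee: float = 3_000.0,           # covers the included token amount
                    included_tokens: int = 200_000_000,
                    overage_per_m: float = 12.0,          # € per million tokens beyond that
                    contractual_cap: float = 8_000.0) -> float:
    """Hybrid model: base fee plus transparent overage, never beyond the agreed cap."""
    overage_tokens = max(0, tokens_used - included_tokens)
    invoice = base_fee + overage_tokens / 1_000_000 * overage_per_m
    return min(invoice, contractual_cap)                  # the cap is the non-negotiable clause


print(monthly_invoice(150_000_000))   # within the included volume  -> 3,000 €
print(monthly_invoice(450_000_000))   # 250M tokens overage         -> 6,000 €
print(monthly_invoice(900_000_000))   # would be 11,400 € uncapped  -> capped at 8,000 €
```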

The second lever is vendor consolidation, which runs counter to the usual pattern. Where the rule of thumb has been “prefer three specialised suppliers to a one‑stop shop,” AI inference flips that. Running three inference providers in parallel is expensive in setup and monitoring. Most often the sensible approach is: one primary provider for about 70 % of the volume plus a second for routing and negotiating leverage. Running three models simultaneously only pays off at scale levels that the typical mid‑market company won’t reach by 2026.

The third point concerns data clauses. The more use cases tap internal data, the more important restrictions on training use, data residency and audit rights become. These clauses are no longer just a GDPR issue; they are a cost driver: a vendor that excludes training use can offer better terms because its business model is calculated differently. CFOs should ask procurement to negotiate these actively rather than swallowing them as a standard clause.

The next twelve months: what CFOs should put in place

The 33‑percent figure won’t correct itself. Any mid‑market firm that has AI in production by 2026 but lacks a clear cost‑tracking mechanism is a candidate for the next round. Three concrete actions can be realistically implemented over the next twelve months without killing adoption.

12‑Month CFO Roadmap
Q2 2026
Take inventory of all production AI use cases. For each use case record token consumption, vendor, contract term, and exit effort. No new tool required – a spreadsheet is enough to start.
Q3 2026
Set a token budget per use case and establish alerts at 80 % consumption. If the vendor doesn’t provide this, add your own logging before the inference call (a minimal sketch follows this roadmap).
Q4 2026
Create a procurement playbook for AI contracts. Minimum components: model‑routing clause, token profile, exit path, training restriction, quarterly pricing review.
Q1 2027
Onboard a second inference provider in parallel, activate the routing layer, and increase negotiating pressure on the primary provider. Goal: 5‑15 % price reduction through a demonstrable alternative.
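
A minimal sketch of the Q3 step, assuming the vendor offers no usage alerts of its own. The budget figure, the file-based ledger and the notify_finance hook are placeholders; what matters is that the counter and the 80 % check live in your own code, wrapped around the inference call.

```python
import json
from pathlib import Path

BUDGETS = {"support-summaries": 50_000_000}   # monthly token budget per use case (assumed figure)
LEDGER = Path("token_ledger.json")            # simple local store; a database table works as well
ALERT_THRESHOLD = 0.80                        # alert at 80 % consumption


def record_usage(use_case: str, input_tokens: int, output_tokens: int) -> None:
    """Log token consumption around each inference call and alert at 80 % of budget."""
    ledger = json.loads(LEDGER.read_text()) if LEDGER.exists() else {}
    ledger[use_case] = ledger.get(use_case, 0) + input_tokens + output_tokens
    LEDGER.write_text(json.dumps(ledger))

    share = ledger[use_case] / BUDGETS[use_case]
    if share >= ALERT_THRESHOLD:
        notify_finance(use_case, share)       # hypothetical hook: e-mail, chat message, ticket


def notify_finance(use_case: str, share: float) -> None:
    print(f"ALERT: '{use_case}' has consumed {share:.0%} of its monthly token budget")


record_usage("support-summaries", input_tokens=1_800, output_tokens=400)
```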

The roadmap may look unglamorous – that’s intentional. AI cost control in the mid‑market doesn’t need new frameworks. It needs the willingness to treat the line item seriously before it eats into the personnel budget. The 19‑percent figure from the Bitkom study shows what happens when that willingness is missing.

Conclusion

The Bitkom figures for 2026 are not an adoption problem but an architecture and procurement problem. CFOs in the mid‑market can cushion the cost‑overrun wave if they treat AI costs as a variable line item, demand token profiles before signing contracts, and keep at least a second inference provider on standby. Those who don’t will face the same choice as the 19 percent after twelve months: cut staff or scale back the use case. Both paths cost more than building a clean procurement framework now.

Frequently Asked Questions

Why is the 33 percent cost-overrun rate in AI projects so high compared with classic IT projects?

Because AI inference is billed variably, whereas traditional software is fixed‑price. CFOs often calculate with a license model (fixed price per user) and overlook that token pricing scales per transaction. A use case that gains traction drives costs disproportionately – the exact opposite of classic software scaling.

What is Total Cost of Inference and how do I calculate it?

Total Cost of Inference adds up all expenses incurred per AI request: input tokens, output tokens, routing overhead, caching share, monitoring. Calculation for the mid‑market: average tokens per transaction × number of transactions per month × token price, plus a 10‑15 percent surcharge for peak loads. That yields an honest monthly estimate for your forecast.
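
The same formula as a worked example, with assumed inputs (the token price and volumes are placeholders; the 12 percent surcharge sits inside the 10 to 15 percent band):

```python
avg_tokens_per_transaction = 2_500   # input plus output, assumed
transactions_per_month = 40_000      # assumed volume
price_per_m_tokens = 10.0            # € per million tokens, assumed
peak_surcharge = 0.12                # inside the 10-15 percent band

base = avg_tokens_per_transaction * transactions_per_month / 1_000_000 * price_per_m_tokens
total_cost_of_inference = base * (1 + peak_surcharge)

print(f"Base inference cost:     {base:8,.0f} € per month")                      # 1,000 €
print(f"Total cost of inference: {total_cost_of_inference:8,.0f} € per month")   # 1,120 €
```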

Is it enough for our IT manager to make the architecture decision, or must the CFO sit at the table?

For pilot projects the IT lead is sufficient. Once a use case moves into production and inference costs exceed roughly €5,000 per month, the finance function must be part of the architecture sign‑off. Hyperscaler lock‑in and missing model routing are no longer mere technical details but contract commitments with multi‑year cost impact.

How do I negotiate AI costs with a hyperscaler if I don’t have a second provider?

Without an alternative, negotiating power is limited. Realistically you have three levers: first, volume commitments in exchange for price reductions (typically 5‑10 percent); second, multi‑year contracts for cost caps; third, inclusion of a routing right in the agreement. More important is building a parallel second provider – without it future negotiations remain one‑sided.

How much more does a poorly built RAG pipeline typically cost in the mid‑market compared with an optimized one?

Industry observations show a factor of three to five at medium volume. A bad pipeline loads 8,000 to 16,000 tokens of context per request, an optimized one 1,500 to 3,000. At 50,000 requests per month that adds up to four‑ to five‑digit amounts – in the differential, not the total price. Caching layers and better chunking are often small technical tweaks with a large lever effect.

Source cover image: Pexels / Kampus Production (px:8353840)

Reading Tips from the Editorial Team

80 Percent AI Failure Rate 2026: How RAND and Gartner Assess the AI Productivity Gap

AI Data Maturity in SMEs 2026: Five Homework Tasks Before the First Productive Agent

Gartner $2.520 Billion AI Spending 2026: How SMEs Interpret the Figures
