Symbolic image: data quality and AI in an editorial magazine context
03.04.2026

Data Quality in SMEs: Why AI Fails Without Clean Data

7 min Read Time

AI projects fail – not usually because of the AI itself, but one layer deeper: the data. Any company investing in generative AI in 2026 without first auditing its data quality is burning budget and eroding trust in the technology.

The Key Takeaways

  • 57% unprepared: More than half of companies assess their own data as unfit for AI (Gartner, Q3 2024).
  • 60% abandonment rate: Per Gartner’s forecast, the majority of AI projects lacking a quality-assured data foundation will be abandoned (Gartner, February 2025).
  • 73% name data as the barrier: Data quality is the most frequently cited hurdle to AI success among decision-makers (Capital One/Morning Consult, July 2024).
  • Regulation intensifies pressure: The EU AI Act (Article 10) mandates demonstrable data quality for high-risk AI – effective August 2026.
  • Six dimensions decide quality: Completeness, accuracy, timeliness, consistency, uniqueness, and validity form the DAMA framework for measurable data quality.

The uncomfortable truth: Most data isn’t AI-ready

Germany is investing heavily in artificial intelligence. According to the Bitkom 2025 study, 36% of German companies already use AI actively – nearly double the figure from last year. Another 47% are planning or discussing deployment. Yet this enthusiasm masks a fundamental problem: The data underpinning these SME AI projects is, in most cases, not ready.

A Gartner survey of 248 data management leaders in Q3 2024 delivers sobering figures: 57% of companies judge their own data as unfit for AI. Even more alarming: 63% report either lacking appropriate data management practices – or being unaware of whether they have them. In February 2025, Gartner sharpened its forecast: 60% of all AI projects built on non-AI-ready data will be scrapped.

  • Not AI-ready: 57% of companies
  • Projects abandoned without data readiness: 60%
  • Barrier #1: 73% cite data quality
Sources: Gartner Q3 2024, Gartner February 2025, Capital One/Morning Consult July 2024

Why GenAI exacerbates the data problem

Generative AI is far more sensitive to data quality than traditional analytics. A dashboard displaying erroneous sales figures will eventually raise red flags. But an AI model trained on inconsistent master data produces outputs that look plausible – yet are wrong – and no one notices immediately. That’s the core issue: GenAI renders poor data invisible rather than visible.

In classic reporting, data inconsistencies trigger obvious contradictions. If two different revenue figures appear in the same sales report, someone asks why. With an AI-powered forecasting model, that doesn’t happen: it calculates a seemingly plausible answer based on skewed data. Only when demand forecasts miss the mark for months – or a chatbot feeds customers incorrect product specs – does the underlying data problem surface. By then, it’s too late – and too expensive.

The Informatica CDO Insights 2025 report – a global survey of 600 Chief Data Officers – reveals the consequence: 67% of respondents failed to successfully transition even half of their GenAI pilot projects into production. Meanwhile, 43% of data leaders cite data quality, completeness, and readiness as their biggest obstacle in AI projects. At the same time, 92% of CDOs expressed concern that AI pilots are advancing without resolving existing data issues first.

The NTT DATA Global GenAI Study (November 2024), based on interviews with 2,300 decision-makers across 34 countries, confirms the picture: 70-85% of GenAI deployments fail to deliver their targeted return on investment. The most common reason? An insufficiently robust data foundation for production use.

Especially insidious: The typical SME operates five to fifteen disparate systems – from ERP and CRM to industry-specific solutions and manually maintained Excel spreadsheets. Each system uses its own data formats, maintenance processes, responsible parties – and often, its own definitions for seemingly simple terms like “active customer” or “open order.” Data quality erodes precisely at the interfaces between these systems – the very places where AI models must train cross-functionally. Without systematically mapping those fault lines, you cannot fix them.

The six dimensions of data quality

Data quality isn’t intuition – it’s measurable. The DAMA International Framework (Data Management Body of Knowledge) defines six quantifiable dimensions. For SMEs, an honest self-assessment against these criteria pays off:

Dimension | What it measures | Typical SME pain point
Completeness | Are all required fields populated? | CRM contacts missing industry or company size
Accuracy | Do the data accurately reflect reality? | Outdated customer addresses, incorrect item numbers
Timeliness | Are the data current enough for their intended use? | Inventory levels synced only once daily
Consistency | Do data align across systems? | Customer master data differs between ERP and CRM
Uniqueness | Are there duplicates? | Same supplier entered three times – spelled differently each time
Validity | Do data conform to defined rules? | Free-text fields instead of structured inputs
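The uniqueness dimension in particular is easy to underestimate, because duplicates rarely match character for character. A minimal sketch of a fuzzy duplicate check, using Python's standard library; the supplier names, normalization rules, and similarity threshold are illustrative assumptions, not part of any specific tool:

```python
# Sketch of a uniqueness check: flag supplier records that are likely
# duplicates despite different spellings. Names, suffix list, and the
# 0.75 threshold are hypothetical examples.
from difflib import SequenceMatcher

def normalize(name: str) -> str:
    """Lowercase, strip punctuation and common legal-form suffixes."""
    name = name.lower().replace(".", "").replace(",", "")
    for suffix in (" gmbh", " ag", " inc", " ltd"):
        name = name.removesuffix(suffix)
    return " ".join(name.split())

def likely_duplicates(names: list[str], threshold: float = 0.75) -> list[tuple[str, str]]:
    """Return pairs of records whose normalized names are near-identical."""
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            ratio = SequenceMatcher(
                None, normalize(names[i]), normalize(names[j])
            ).ratio()
            if ratio >= threshold:
                pairs.append((names[i], names[j]))
    return pairs

suppliers = ["Müller GmbH", "Mueller GmbH", "MÜLLER G.m.b.H.", "Schmidt AG"]
print(likely_duplicates(suppliers))  # flags the three Müller variants as pairs
```

In practice, a dedicated matching tool or database-side deduplication will scale better; the point is that even a short script surfaces the "same supplier, three spellings" problem from the table above.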

Analytics firm BARC confirms the relevance: In its annual Data, BI and Analytics Trend Monitor, data quality management has ranked among the top two priorities for six consecutive years – again landing second only to data security in 2024. It’s not a new challenge – but with AI, it becomes dramatically more costly.

A real-world example: A mid-sized machinery manufacturer wants to introduce AI-driven demand forecasting. Its ERP item master data is 85% complete – sounds acceptable. But the missing 15% disproportionately covers new products and high-margin spare parts. The forecasting model learns systematically from flawed input, blind to its most profitable items. The deviation only surfaces after six months – six months of lost optimization.

Regulatory pressure mounts

Beyond financial risk, regulatory pressure is rising. The EU AI Act introduces concrete data quality requirements for high-risk AI systems in Article 10: training, validation, and test data must be relevant, sufficiently representative, and – as far as possible – error-free and complete. Providers must demonstrate systematic bias detection and correction. The high-risk provisions take effect in August 2026.

While most SME AI applications – such as demand forecasting, chatbots, or process optimization – fall outside the high-risk category, companies deploying AI in HR, creditworthiness assessment, or safety-critical domains are directly affected. And even without formal high-risk classification, the AI Act sets a de facto standard increasingly expected by customers and partners.

Simultaneously, the CSRD tightens ESG data requirements. According to the Workiva Sustainability Practitioner Survey 2024 (2,000 professionals surveyed), 83% of companies already find collecting required sustainability data difficult – and 79% struggle with verification. The EFRAG standards include over 1,100 individual data points for CSRD reporting – a major challenge for any organization that hasn’t yet implemented systematic data quality governance.

Without solid data governance, companies face two parallel challenges: AI implementation and compliance. The upside? Investments made for AI data quality automatically benefit ESG reporting – and vice versa. Both demands converge on the same goal: structured, complete, and traceable data.

Five steps toward an AI-ready data foundation

Data quality isn’t a project with a start and end date. It’s a capability an organization must build. These five steps offer a realistic entry point for SMEs:

1. Conduct a data inventory. Before launching any AI initiative, ask: What data do we have, where is it stored, and who maintains it? Many SMEs underestimate the number of data sources. ERP, CRM, Excel files, SharePoint folders, email inboxes – count them all, omit nothing. The result is a data map: a clear overview of all sources, including responsible parties, update frequency, and quality ratings. This document forms the basis for every subsequent decision.

2. Measure quality – don’t guess. Use the six DAMA dimensions as a checklist. For your specific AI use case, identify the three most critical dimensions and test them via sampling. Example: For demand forecasting, completeness, timeliness, and consistency are vital; for a customer service chatbot, accuracy and validity matter most. Manually inspecting 100 records and extrapolating the error rate takes half a day – and yields a reliable baseline assessment.
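The sampling approach from step 2 can be sketched in a few lines of Python. Field names, the required-field list, and the one-year freshness rule are hypothetical; adapt them to your own CRM or ERP export:

```python
# Sample-based quality baseline: per-record pass/fail flags for two
# DAMA dimensions (completeness, timeliness), extrapolated to error rates.
# REQUIRED_FIELDS and MAX_AGE_DAYS are illustrative assumptions.
from datetime import date

REQUIRED_FIELDS = ("customer_id", "industry", "company_size")  # completeness
MAX_AGE_DAYS = 365  # timeliness: last update no older than one year

def check_record(record: dict, today: date) -> dict:
    """Return per-dimension pass/fail flags for one sampled record."""
    complete = all(record.get(f) not in (None, "") for f in REQUIRED_FIELDS)
    updated = record.get("last_updated")
    timely = updated is not None and (today - updated).days <= MAX_AGE_DAYS
    return {"complete": complete, "timely": timely}

def error_rates(sample: list[dict], today: date) -> dict:
    """Extrapolate dimension error rates from a manually drawn sample."""
    results = [check_record(r, today) for r in sample]
    return {
        dim: sum(1 for r in results if not r[dim]) / len(sample)
        for dim in ("complete", "timely")
    }
```

Running this over a hand-checked sample of 100 records gives exactly the kind of extrapolated baseline the step describes: an error rate per dimension, not a gut feeling.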

3. Define clear ownership. Data quality won’t improve without unambiguous accountability. You don’t need a Chief Data Officer – but you do need one designated person per core system responsible for data upkeep. In SMEs, that’s often the department head – not IT. Crucially, responsibility must be backed by allocated time and tools. A sales manager tasked with CRM data quality “on the side” will inevitably deprioritize it.

4. Introduce automated checks. Manual cleanup doesn’t scale. Data observability tools like Soda.io or Great Expectations automatically detect anomalies – for instance, if a mandatory field suddenly appears blank in 30% of new records, or a numeric value deviates by orders of magnitude from its usual range. This market is growing at over 16% annually – and usage-based licensing makes these tools accessible even to smaller firms. For those avoiding new software, simple SQL queries or Python scripts on existing database infrastructure can serve as a starting point.
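For teams starting without new software, the kind of check described above can be approximated in plain Python. The column names, price range, and 5% null-rate threshold below are assumptions for illustration; tools like Great Expectations wrap the same idea in declarative, reusable rules:

```python
# Illustrative automated checks along the lines of step 4: alert when a
# mandatory field goes blank too often or a numeric value leaves its
# plausible range. Thresholds and field names are hypothetical.
def null_rate(rows: list[dict], field: str) -> float:
    """Share of rows where a mandatory field is missing or empty."""
    missing = sum(1 for r in rows if r.get(field) in (None, ""))
    return missing / len(rows)

def out_of_range(values: list[float], lo: float, hi: float) -> list[float]:
    """Values outside the historically plausible range [lo, hi]."""
    return [v for v in values if not lo <= v <= hi]

def run_checks(rows: list[dict],
               price_range: tuple[float, float] = (1.0, 10_000.0),
               max_null_rate: float = 0.05) -> list[str]:
    """Return a list of alert strings; an empty list means all checks pass."""
    alerts = []
    rate = null_rate(rows, "industry")
    if rate > max_null_rate:
        alerts.append(f"industry missing in {rate:.0%} of new records")
    bad = out_of_range([r["unit_price"] for r in rows], *price_range)
    if bad:
        alerts.append(f"{len(bad)} unit prices outside {price_range}")
    return alerts
```

Scheduled nightly against new records, even checks this simple catch the "mandatory field suddenly blank in 30% of records" scenario before a model trains on it.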

5. Start small – and learn. Don’t attempt to cleanse your entire data estate at once. Instead: select one concrete AI use case, bring only the data it needs up to standard, and learn from the experience. Insights from the first project – which sources proved problematic, which cleanup steps delivered the greatest impact – will transfer directly to future initiatives. Gartner forecasts that by 2028, 80% of GenAI business applications will be built on existing data management platforms. Laying that groundwork today positions you to capitalize on that shift.

Conclusion

The numbers are unequivocal: AI investments made without prior data quality assurance are high-risk bets. Fifty-seven percent of companies already know this – and still do too little. For SMEs, however, that gap also represents opportunity: Those who now clean up their data foundation gain a structural advantage over competitors who launch AI projects only to discover – too late – that the ground beneath them is unstable.

The first step need not be a massive undertaking. A data inventory focused on your most critical use case, an honest quality assessment, and clearly assigned accountability are enough to begin. Everything else follows – provided data quality is treated not as a one-off IT project, but as an ongoing management discipline. The technology is ready. The question is: Is your data?

Frequently Asked Questions

How do I know whether my data is AI-ready?

Test the six DAMA dimensions (completeness, accuracy, timeliness, consistency, uniqueness, and validity) using a sample drawn from your planned AI use case. If more than 10% of records fall short in any dimension, cleansing is essential before launching AI. Gartner estimates that 57% of companies would fail this test.

What does poor data quality cost?

Direct costs arise from flawed decisions, manual cleanup efforts, and failed projects. Indirect costs include eroded trust in AI initiatives and delayed digital transformation. The NTT DATA 2024 study shows that 70-85% of GenAI deployments miss their target ROI – often due to an inadequate data foundation.

Does an SME need a Chief Data Officer?

Not necessarily. More important than the title is clear, system-level accountability for data quality. In SMEs, the IT lead may coordinate oversight, while department heads retain operational responsibility for their respective data. What matters is having someone who regularly monitors and measures quality metrics.

What role does the EU AI Act play for data quality?

Article 10 of the EU AI Act mandates verifiable data quality for high-risk AI systems: training data must be relevant, representative, and – as far as possible – free of errors. Bias must be systematically assessed and corrected. While most SME AI applications don’t qualify as high-risk, this standard is becoming a market expectation. Companies with clean data today avoid costly retrofits later.

How long does it take to make a data foundation AI-ready?

For a single use case, a realistic timeframe is four to eight weeks – assuming data sources are known and the use case clearly defined. Enterprise-wide data quality programs typically require six to twelve months before delivering measurable improvement. Crucially: don’t try to fix everything at once – proceed use-case by use-case.

Header Image Source: Pexels / Kampus Production (px:6248957)

A magazine by evernine media GmbH