[Checklist] - Data Foundation Before Scaling AI
Why most AI programmes fail quietly—and how to fix the problem before it starts
The uncomfortable truth
Most AI programmes don’t fail because the models are bad. They fail because the data underneath them is fragmented, inconsistent, and stripped of context.
Regulators and standards bodies—such as the European Union Aviation Safety Agency and the National Institute of Standards and Technology—have been clear: AI systems are only as trustworthy as the data, lineage, and governance behind them.
Yet many organisations are still trying to scale AI on top of the following:
- Spreadsheet workarounds
- Disconnected enterprise systems (ERP, MES, CRM)
- Duplicate and conflicting data definitions
- Broken ownership models
That’s not a foundation. It’s technical debt with a machine-learning wrapper.
The pattern: where AI actually breaks
Across industries—manufacturing, aviation, and financial services—the failure pattern is consistent:
1. Fragmented data landscape: ERP says one thing. MES says another. Excel “fixes” both.
2. Hidden spreadsheet dependencies: Critical logic lives in someone’s desktop file. No version control. No audit trail.
3. No shared business meaning: “customer”, “asset”, “order”, and “defect” mean something different in each system.
4. AI amplifies the problem: instead of one bad report, you now have automated bad decisions at scale.
AI doesn’t fix data problems. It industrialises them.
What “AI-ready data” actually means
Most teams misunderstand this.
AI-ready data is not:
- A data lake full of raw data
- A warehouse with dashboards
- A set of APIs
AI-ready data is consistent, governed, and context-rich data aligned to real business entities and decisions.
That takes architecture, not just storage.
The Data Foundation Checklist
If you can’t confidently answer yes to these, you’re not ready to scale AI.
1. Do you have a clear system of business entities?
You need canonical definitions for:
- Customer
- Product
- Asset
- Order
- Supplier
- Event (e.g., failure, transaction, interaction)
If these differ across systems, your AI will learn contradictions.
Test: Can two systems describe the same “customer” identically?
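That test can be made concrete in code. The sketch below is illustrative, not a prescription: the field names (`erp_id`, `name1`, `account_no`, and so on) are hypothetical, and the point is only that every source system maps into one canonical `Customer` shape, so the same customer compares as equal regardless of origin.

```python
from dataclasses import dataclass

# Hypothetical canonical "Customer" entity: one definition that every
# source system must map into, regardless of its local schema.
@dataclass(frozen=True)
class Customer:
    customer_id: str   # canonical ID, not the ERP or CRM key
    legal_name: str
    country: str

def from_erp(record: dict) -> Customer:
    # Illustrative ERP field names; normalise into the canonical shape.
    return Customer(
        customer_id=f"CUST-{record['erp_id']}",
        legal_name=record["name1"].strip().upper(),
        country=record["land"],
    )

def from_crm(record: dict) -> Customer:
    # The CRM uses different field names and casing for the same customer.
    return Customer(
        customer_id=f"CUST-{record['account_no']}",
        legal_name=record["account_name"].strip().upper(),
        country=record["country_code"],
    )

# The checklist test: two systems describing the same customer should
# produce identical canonical entities.
erp = from_erp({"erp_id": "1001", "name1": "Acme GmbH ", "land": "DE"})
crm = from_crm({"account_no": "1001", "account_name": "acme gmbh", "country_code": "DE"})
print(erp == crm)  # True when the mapping rules agree
```

If the two constructors ever disagree on normalisation, the equality check fails loudly—which is exactly the contradiction an AI model would otherwise silently learn.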
2. Is data ownership explicit and enforced?
Every critical dataset needs:
- A named owner
- Clear accountability
- Defined quality expectations
If ownership is “IT” or “the data team”, you don’t have ownership—you have diffusion.
3. Are spreadsheet dependencies eliminated or controlled?
Spreadsheets aren’t the problem. Undocumented spreadsheet logic is.
Test:
- Are business-critical transformations happening outside governed systems?
- Can you trace how a KPI was calculated without opening Excel?
If not, your AI will inherit invisible logic.
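A minimal sketch of what “controlled” looks like: a KPI that typically hides in someone’s spreadsheet (here a hypothetical on-time delivery rate) rewritten as a small, version-controlled, testable function. The record structure is an assumption for illustration.

```python
# Hypothetical KPI logic, moved out of Excel and into a governed,
# reviewable function with an explicit definition.
def on_time_delivery_rate(orders: list[dict]) -> float:
    """Share of delivered orders that arrived on or before the promised date."""
    delivered = [o for o in orders if o["delivered_on"] is not None]
    if not delivered:
        return 0.0
    on_time = sum(1 for o in delivered if o["delivered_on"] <= o["promised_on"])
    return on_time / len(delivered)

orders = [
    {"promised_on": "2024-05-01", "delivered_on": "2024-04-30"},
    {"promised_on": "2024-05-01", "delivered_on": "2024-05-03"},
    {"promised_on": "2024-05-02", "delivered_on": None},  # still open
]
print(on_time_delivery_rate(orders))  # 0.5
```

The KPI definition—open orders excluded, ISO dates compared directly—is now explicit, diffable, and auditable instead of buried in a cell formula.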
4. Is data lineage visible end-to-end?
You need to be able to answer the following:
Where did this data come from, and how did it change?
Standards bodies such as the National Institute of Standards and Technology (NIST) emphasise traceability as a core requirement for trustworthy AI.
Test: Can you trace a prediction back to source systems and transformations?
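One lightweight way to make lineage concrete is to let every derived value carry its source and transformation history. The sketch below is a simplified illustration (real lineage tooling is far richer); the source name and steps are hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical lineage record: a value plus where it came from and
# every transformation applied along the way.
@dataclass
class LineageRecord:
    value: object
    source: str                      # e.g. "ERP.sales_orders"
    steps: list[str] = field(default_factory=list)

def transform(rec: LineageRecord, step: str, fn) -> LineageRecord:
    # Apply a transformation and append it to the audit trail.
    return LineageRecord(value=fn(rec.value), source=rec.source,
                         steps=rec.steps + [step])

raw = LineageRecord(value=" 42,0 ", source="ERP.sales_orders")
clean = transform(raw, "strip whitespace", str.strip)
parsed = transform(clean, "parse decimal comma", lambda v: float(v.replace(",", ".")))

print(parsed.value)   # 42.0
print(parsed.steps)   # ['strip whitespace', 'parse decimal comma']
```

With this discipline, “where did this number come from?” has a mechanical answer—the same property the checklist test demands of a prediction.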
5. Is your architecture designed around meaning, not systems?
Most architectures are system-centric:
- ERP schema
- CRM schema
- MES schema
AI requires business-centric architecture:
- Entity models
- Relationships
- Context
This is where approaches like semantic layers or knowledge graphs become critical.
6. Are data quality issues measured, not assumed?
You need explicit metrics for:
- Completeness
- Accuracy
- Consistency
- Timeliness
If your answer is “the data is mostly fine”, it isn’t.
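Two of those metrics are simple enough to sketch directly. The records and thresholds below are invented for illustration; the point is that completeness and timeliness become numbers you can track, not impressions.

```python
from datetime import date

# Hypothetical supplier records with a populated-or-not field and a
# last-updated date.
records = [
    {"supplier_id": "S1", "email": "a@x.com", "updated": date(2024, 6, 1)},
    {"supplier_id": "S2", "email": None,      "updated": date(2023, 1, 15)},
    {"supplier_id": "S3", "email": "c@x.com", "updated": date(2024, 5, 20)},
]

def completeness(rows, field_name):
    # Share of rows where the field is actually populated.
    return sum(1 for r in rows if r[field_name] is not None) / len(rows)

def timeliness(rows, as_of, max_age_days=180):
    # Share of rows updated within the accepted freshness window.
    fresh = sum(1 for r in rows if (as_of - r["updated"]).days <= max_age_days)
    return fresh / len(rows)

print(round(completeness(records, "email"), 2))          # 0.67
print(round(timeliness(records, date(2024, 6, 30)), 2))  # 0.67
```

Once measured, these numbers can be given owners and thresholds—which is what turns “mostly fine” into an accountable statement.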
7. Are cross-domain processes actually connected?
Example:
- Customer order → production → shipment → service
If these are stitched together manually, AI can’t reason across them.
8. Can your data support real decisions—not just reporting?
Dashboards describe the past. AI drives decisions.
Test: Can your data answer the following?
- What should we do next?
- What happens if we change X?
If not, you’re unprepared.
A practical example: where this breaks
In a manufacturing environment:
- ERP tracks orders
- MES tracks production
- Quality systems track defects
- Suppliers send updates via email or spreadsheets
A delivery delay happens.
Root cause?
- Supplier delay (email)
- Production rework (MES)
- Incorrect order priority (ERP)
- Manual override in Excel
No single system sees the full picture.
Now add AI.
It predicts delays—but based on incomplete, inconsistent data.
Result: confidently wrong decisions at scale.
What to do instead (30–60 day focus)
This doesn’t need to start as a multi-year transformation.
Step 1 — Identify 3–5 critical business entities: start small, e.g., customers, orders, and assets.
Step 2 — Map where they live today: across ERP, CRM, MES, and spreadsheets.
Step 3 — Define a canonical model: agree on what each entity actually means.
Step 4 — Expose conflicts: identify where definitions or values diverge.
Step 5 — Establish ownership and governance: assign accountability.
Step 6 — Remove or formalise spreadsheet logic: bring critical transformations into governed pipelines.
Step 7 — Build a thin semantic layer: create a consistent interface for data consumers and AI systems.
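A “thin semantic layer” can start almost embarrassingly small. The sketch below is a hypothetical illustration, not a reference architecture: a single lookup function hides which system a business entity lives in, so consumers (including AI pipelines) ask for a customer, not for an ERP table or a CRM object.

```python
# Hypothetical source systems, each holding part of the customer picture.
ERP_CUSTOMERS = {"1001": {"name1": "ACME GMBH", "land": "DE"}}
CRM_ACCOUNTS = {"1001": {"segment": "Industrial", "owner": "j.doe"}}

def get_customer(customer_id: str) -> dict:
    """Return one canonical customer view, assembled from both systems."""
    erp = ERP_CUSTOMERS.get(customer_id, {})
    crm = CRM_ACCOUNTS.get(customer_id, {})
    return {
        "customer_id": customer_id,
        "legal_name": erp.get("name1"),
        "country": erp.get("land"),
        "segment": crm.get("segment"),
        "account_owner": crm.get("owner"),
    }

print(get_customer("1001"))
```

Everything behind `get_customer` can later be swapped for real integrations or a knowledge graph; the consistent interface is what matters on day one.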
The shift most organisations avoid
This isn’t a tooling problem. It’s a discipline problem:
- Agreeing on definitions
- Enforcing ownership
- Removing hidden workarounds
- Designing for meaning, not convenience
AI exposes these weaknesses brutally.
Supporting deep dives
These articles go beyond the checklist to explain the real failure patterns that stall AI programmes: brittle spreadsheets, disconnected architecture, weak governance, unclear ownership, and data you can’t trust when decisions need to be made quickly.
I recommend starting here:
Your transformation might be one spreadsheet away from failure
- Use this when manual planning, quality holds, reconciliations, or fragile shop-floor workarounds hide AI readiness risk.
Your Data Isn’t Broken – Your Architecture Is
- Use this when the issue is not “bad data” but disconnected systems, weak ownership, inconsistent definitions, and poor data architecture.
AI in Aviation: Reality check from EASA and NIST
- Use this when AI needs to be governed as a socio-technical system, not treated as just another technology deployment.
What is AI-Ready Data?
- Use this when readers need a plain-English definition of what makes data suitable for AI, analytics, and decision automation.
Why Digital Transformation Fails
- Use this when the reader needs the broader business context: weak execution, unclear ownership, culture gaps, and disconnected strategy.
Final takeaway
You don’t scale AI by adding more models. You scale AI by removing ambiguity from your data.
Until then, every AI investment is built on unstable ground.