From Data Lineage to Relationship Intelligence in Manufacturing Data Platforms

Introduction

Modern data platforms are usually discussed through their technical building blocks: lakehouses, warehouses, pipelines, notebooks, semantic models, reports, catalogs, governance tools, and increasingly AI assistants.

Those building blocks matter. But in a large manufacturing company, the most interesting questions are often not about individual assets. They are about relationships.

Where does this KPI come from?
Which data products are used by this business process?
Who owns the data behind this report?
What changes if this source system, semantic model, or data product changes?
Which customers, factories, machines, components, service events, or spare parts are connected through the data?
Can an AI assistant explain not only an answer, but also the data context behind the answer?

This post is a thinking-out-loud exploration of a spike, meaning a short, time-boxed investigation, that I would run on top of a modern Microsoft Fabric-based data platform. The goal is not to declare a final architecture. The goal is to ask better questions before building another metadata solution.

The Starting Point: Lineage Is Useful, but Not Enough

Data lineage is usually understood as the traceability of data: where data comes from, how it is transformed, and where it is consumed.

A simplified lineage path might look like this:

Source system
  -> Data pipeline
    -> Lakehouse table
      -> Warehouse table
        -> Semantic model
          -> Power BI report

This is valuable. It helps with troubleshooting, impact analysis, governance, and trust.
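The path above can be treated as a small directed graph. The following is a minimal sketch in Python, not a description of how any lineage tool stores its data. The node names mirror the simplified path, and the helper names (`build_index`, `reachable`) are made up for illustration.

```python
from collections import defaultdict, deque

# Hypothetical lineage edges (upstream -> downstream), mirroring the path above.
EDGES = [
    ("Source system", "Data pipeline"),
    ("Data pipeline", "Lakehouse table"),
    ("Lakehouse table", "Warehouse table"),
    ("Warehouse table", "Semantic model"),
    ("Semantic model", "Power BI report"),
]

def build_index(edges):
    """Index edges in both directions so the graph can be walked up or down."""
    downstream, upstream = defaultdict(list), defaultdict(list)
    for src, dst in edges:
        downstream[src].append(dst)
        upstream[dst].append(src)
    return downstream, upstream

def reachable(start, neighbours):
    """Breadth-first walk: every node reachable from start, in visit order."""
    seen, order, queue = {start}, [], deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbours.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order

downstream, upstream = build_index(EDGES)
impacted = reachable("Lakehouse table", downstream)  # impact analysis
origins = reachable("Power BI report", upstream)     # "where does this come from?"
```

Walking downstream answers the impact question, and walking upstream answers the origin question. Both are the same traversal over a differently indexed edge list.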

But manufacturing data platforms need more than technical lineage diagrams. The business rarely asks only “which table feeds which report?” More often, the business asks questions like:

Which process depends on this data?
Which KPI will be affected?
Which factory or customer is impacted?
Who should be informed?
Is this data product reliable enough for operational or AI use?

That requires a broader view.

From Lineage to Relationship Intelligence

The next useful layer on top of a modern data platform may not be another database or another catalog. It may be relationship intelligence.

By relationship intelligence, I mean the ability to connect technical data flows with business context:

Technical lineage
  + Data product contracts
  + Catalog metadata
  + Data portal content
  + Ownership and support responsibilities
  + Usage metrics
  + Business process and KPI context
  + Governance metadata
  + AI-readable context
= Relationship intelligence

This is not just a nicer diagram. It is a decision-support capability.

A data catalog tells people what exists. A data portal helps people find and request data products. Data product contracts define expectations. Lineage shows technical dependencies. Governance tools help classify and control data. AI assistants can help users ask questions.

The real opportunity is to connect these pieces.
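One way to picture "connecting the pieces" is a single list of typed relationships where every fact remembers which metadata source asserted it. This is a minimal sketch under my own assumptions: the `Edge` fields, the relation names, and all example values are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Edge:
    source: str
    relation: str
    target: str
    origin: str  # which metadata source asserted this relationship

# Hypothetical facts, each contributed by a different metadata source.
EDGES = [
    Edge("sales_orders", "feeds", "Order Backlog report", "technical lineage"),
    Edge("sales_orders", "governed_by", "sales_orders contract v2", "data product contract"),
    Edge("sales_orders", "owned_by", "Sales domain team", "ownership metadata"),
    Edge("Order Backlog report", "supports", "Order-to-cash", "business process map"),
    Edge("Order Backlog report", "measures", "Open order value", "KPI catalog"),
]

def context_for(node, edges):
    """Collect every relationship touching a node, grouped by relation type."""
    context = {}
    for e in edges:
        if e.source == node:
            context.setdefault(e.relation, []).append((e.target, e.origin))
        elif e.target == node:
            context.setdefault(f"inverse:{e.relation}", []).append((e.source, e.origin))
    return context

report_context = context_for("Order Backlog report", EDGES)
product_context = context_for("sales_orders", EDGES)
```

Keeping the `origin` on each edge matters for trust: a relationship that comes from a contract can be weighed differently from one that was inferred or maintained by hand.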

Why Manufacturing Makes This Interesting

Manufacturing is full of relationships.

A customer may have several sites. A site may contain production lines. A line may contain machines. A machine may contain components. Components may be connected to spare parts, service events, suppliers, maintenance plans, telemetry, and operational KPIs.

Another relationship chain could look like this:

Supplier
  -> Component
    -> Product
      -> Factory
        -> Customer

Or this:

Customer
  -> Site
    -> Machine
      -> Component
        -> Service event
          -> Spare part

In this environment, data platform metadata is not only an IT concern. It can become a way to understand business impact, operational risk, and AI-readiness.
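A question such as "which customers are affected by this component?" is a multi-hop walk up the containment chain. Here is a minimal sketch, assuming a simple parent-to-children mapping. All customer, site, machine, and component names are invented.

```python
# Hypothetical "contains" relationships: parent -> children.
CONTAINS = {
    "Customer A": ["Site A1"],
    "Customer B": ["Site B1"],
    "Site A1": ["Machine M1", "Machine M2"],
    "Site B1": ["Machine M3"],
    "Machine M1": ["Component C-100"],
    "Machine M2": ["Component C-200"],
    "Machine M3": ["Component C-100"],
}

def invert(contains):
    """Turn parent -> children into child -> parents."""
    parents = {}
    for parent, children in contains.items():
        for child in children:
            parents.setdefault(child, []).append(parent)
    return parents

def paths_to_roots(node, parents):
    """Return every chain from node up to a root (a node with no parent)."""
    if node not in parents:
        return [[node]]
    return [
        [node] + path
        for parent in parents[node]
        for path in paths_to_roots(parent, parents)
    ]

PARENTS = invert(CONTAINS)
chains = paths_to_roots("Component C-100", PARENTS)
affected_customers = sorted({chain[-1] for chain in chains})
```

The full chains are kept on purpose. For field service, the path (which machine at which site) is often as useful as the final answer.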

Microsoft Fabric as the Strategic Platform Context

If Microsoft Fabric is the strategic data platform, the first principle should be simple:

Test the native platform capabilities before adding a separate relationship or graph technology.

The initial candidate capabilities would include:

Microsoft Fabric lineage views
Semantic model impact analysis
OneLake Catalog
Microsoft Purview integration
Fabric REST APIs
Fabric Data Agent
Fabric Graph preview
Fabric IQ and ontology-related preview capabilities

Some of these are mature platform capabilities. Some are still preview capabilities. The point of the spike is not to assume they solve everything. The point is to find out how far the strategic platform already takes us.

The Spike Should Start with Roles, Not Tools

A common mistake would be to start with a technology question such as: “Do we need a graph database?”

A better first question is:

Who would benefit if we understood the relationships around data better?

Potential stakeholder roles include:

Business Analyst or Data Consumer

They want to find the right data product, understand whether it can be trusted, and see where it comes from.

Useful questions:

Is this the recommended data product for my use case?
Who owns it?
What reports or semantic models already use it?
Is there a contract, description, or known limitation?

Possible outputs:

Enriched data portal detail page
Data product relationship view
Source and ownership summary
Related reports and related data products

Report Owner or Analytics Product Owner

They care about the reports and analytical products used by the business.

Useful questions:

Which upstream data products feed this report?
What changed upstream if the numbers suddenly look different?
Which semantic model (formerly called a dataset) does the report depend on?
Are there cross-domain dependencies behind this report?

Possible outputs:

Report lineage diagram
Affected reports list
Semantic model dependency view
Report confidence or source transparency summary

KPI Owner or Business Controller

They care about the meaning, reliability, and comparability of business metrics.

Useful questions:

Where does this KPI come from?
Which source systems and transformations affect it?
Has the calculation logic changed?
Can I distinguish business change from data pipeline change?

Possible outputs:

KPI lineage view
Calculation and source transparency page
KPI impact report
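The question "has the calculation logic changed?" can be approached by fingerprinting the KPI definition and storing the fingerprint next to each reported value. This is a minimal sketch of that idea, not an existing platform feature. The definition fields, the expression text, and the numbers are all hypothetical.

```python
import hashlib
import json

def fingerprint(definition: dict) -> str:
    """Hash a KPI definition in a key-order-independent way."""
    canonical = json.dumps(definition, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def classify_change(previous: dict, current: dict) -> str:
    """Compare two KPI snapshots, each holding a value and a logic fingerprint."""
    if previous["logic"] != current["logic"]:
        return "calculation logic changed"
    if previous["value"] != current["value"]:
        return "inputs changed, logic unchanged"
    return "no change"

# Hypothetical KPI definition; the expression and source names are made up.
definition_v1 = {
    "kpi": "On-time delivery rate",
    "expression": "delivered_on_time / delivered_total",
    "sources": ["erp.deliveries"],
}
definition_v2 = {**definition_v1, "expression": "delivered_on_time / shipped_total"}

last_month = {"value": 0.94, "logic": fingerprint(definition_v1)}
this_month = {"value": 0.89, "logic": fingerprint(definition_v1)}
after_edit = {"value": 0.91, "logic": fingerprint(definition_v2)}
```

An unchanged fingerprint with a changed value does not prove a real business change. It only rules out an edited definition, which narrows the search to the input data and the pipelines that produce it.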

Business Process Owner

They care about operational processes such as order-to-cash, procure-to-pay, supply chain planning, field service, spare parts, production follow-up, or finance closing.

Useful questions:

Which data products support this process?
Which KPIs, reports, or AI use cases depend on this data?
What business process is affected if the data is late, stale, or incorrect?

Possible outputs:

Business process dependency map
Critical data product list
Process-level data risk view
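A process-level risk view could start as a join between freshness expectations and a process-to-data-product map. The sketch below assumes both exist in some form, which is exactly what the spike would need to verify. Product names, timestamps, and freshness limits are invented.

```python
from datetime import datetime, timedelta, timezone

UTC = timezone.utc

# Hypothetical data products: last successful refresh and agreed maximum age.
PRODUCTS = {
    "spare_parts_stock": {"last_refresh": datetime(2024, 5, 2, 6, 0, tzinfo=UTC), "max_age": timedelta(hours=4)},
    "service_events": {"last_refresh": datetime(2024, 5, 2, 11, 0, tzinfo=UTC), "max_age": timedelta(hours=2)},
    "general_ledger": {"last_refresh": datetime(2024, 5, 1, 22, 0, tzinfo=UTC), "max_age": timedelta(hours=24)},
}

# Hypothetical mapping from business process to the data products it relies on.
PROCESS_DEPENDENCIES = {
    "Field service": ["spare_parts_stock", "service_events"],
    "Finance closing": ["general_ledger"],
}

def stale_products(products, now):
    """Names of products whose last refresh is older than the agreed maximum age."""
    return {name for name, p in products.items() if now - p["last_refresh"] > p["max_age"]}

def processes_at_risk(dependencies, stale):
    """Map each affected process to the stale products it depends on."""
    risk = {}
    for process, needed in dependencies.items():
        hit = sorted(set(needed) & stale)
        if hit:
            risk[process] = hit
    return risk

now = datetime(2024, 5, 2, 12, 0, tzinfo=UTC)  # fixed for a reproducible example
at_risk = processes_at_risk(PROCESS_DEPENDENCIES, stale_products(PRODUCTS, now))
```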

Field Service, Spare Parts, Supply Chain, or Manufacturing Owner

These roles bring the strongest manufacturing-specific use cases.

Useful questions:

Which machines, components, customers, suppliers, or spare parts are connected through data?
Which customers are affected by a component or material change?
Which service events are related to a recurring asset issue?
What data context would improve field service or spare part recommendations?

Possible outputs:

Asset relationship diagram
Component impact view
Supply chain dependency map
AI-readiness assessment for service or spare parts use cases

Data Product Responsible or Domain Owner

They need visibility into the data products they provide and consume.

Useful questions:

Who consumes my data product?
Does the implementation match the data product contract?
Which reports, AI use cases, or business processes depend on it?
Which consumer teams should be informed before changes?

Possible outputs:

Data product consumer map
Contract coverage report
Producer-consumer dependency list
Data product health score
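A first version of a contract coverage report can be plain set arithmetic between what the contracts promise and what is actually deployed. This sketch assumes both sides can be reduced to table names per data product. The registries and names are hypothetical.

```python
# Hypothetical contract registry: data product -> tables the contract promises.
CONTRACTS = {
    "sales_orders": {"orders", "order_lines"},
    "service_events": {"events"},
}

# Hypothetical deployed tables per data product, e.g. collected from the platform.
DEPLOYED = {
    "sales_orders": {"orders", "order_lines", "orders_tmp"},
    "telemetry_raw": {"signals"},
}

def contract_coverage(contracts, deployed):
    """Compare promised tables with deployed tables for every known product."""
    report = {}
    for product in sorted(set(contracts) | set(deployed)):
        promised = contracts.get(product, set())
        actual = deployed.get(product, set())
        report[product] = {
            "has_contract": product in contracts,
            "missing_tables": sorted(promised - actual),
            "uncontracted_tables": sorted(actual - promised),
        }
    return report

coverage = contract_coverage(CONTRACTS, DEPLOYED)
```

The three outcomes map to three different conversations: a promised table that is missing, a deployed table nobody promised, and a data product with no contract at all.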

Data Steward, Governance Owner, Security, or Compliance

They care about traceability, ownership, classification, and auditability.

Useful questions:

Which data products lack ownership or documentation?
Where does sensitive data flow?
Which data products are used in critical business processes?
Can we prove how a metric or report is built?

Possible outputs:

Governance gap report
Sensitive data lineage diagram
Missing owner list
Audit evidence package
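The governance gap report and the missing owner list are the simplest outputs to prototype. Here is a minimal sketch, assuming the catalog can be exported as records with an owner, a description, and a criticality flag. The `DataProduct` fields and the entries are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class DataProduct:
    name: str
    owner: Optional[str]
    description: Optional[str]
    critical_process: bool  # used in a business-critical process

# Hypothetical catalog export.
CATALOG = [
    DataProduct("sales_orders", "Sales domain team", "Confirmed customer orders", True),
    DataProduct("telemetry_raw", None, "Raw machine signals", True),
    DataProduct("legacy_extract", None, None, False),
]

def governance_gaps(catalog):
    """List products missing an owner or description, critical ones first."""
    gaps = []
    for product in catalog:
        missing = [f for f in ("owner", "description") if not getattr(product, f)]
        if missing:
            gaps.append({
                "product": product.name,
                "missing": missing,
                "critical": product.critical_process,
            })
    return sorted(gaps, key=lambda g: (not g["critical"], g["product"]))

gaps = governance_gaps(CATALOG)
```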

AI Lead or Business AI Owner

AI makes this topic more urgent.

An AI assistant can retrieve documents, but that is not enough. Enterprise AI needs context: source systems, ownership, trust, usage, business meaning, and relationships.

Useful questions:

Can an AI assistant explain where an answer came from?
Can it identify the owner of the data behind the answer?
Can it understand that a machine belongs to a site, a site belongs to a customer, and a service event is related to a component?
Which data products are ready for AI use?

Possible outputs:

AI-ready metadata model
Data product discovery assistant
Relationship-aware Q&A
Source and ownership explanation for AI answers
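One low-tech way to give an assistant relationship context is to render known relationships as plain sentences and place them in the prompt. This is only a sketch of that idea. It does not use any Fabric or model API, and the templates, facts, and prompt wording are all assumptions of mine.

```python
# Hypothetical sentence templates per relation type.
TEMPLATES = {
    "belongs_to": "{s} belongs to {o}.",
    "related_to": "{s} is related to {o}.",
    "owned_by": "{s} is owned by {o}.",
}

# Hypothetical relationship facts as (subject, relation, object) triples.
FACTS = [
    ("Machine M1", "belongs_to", "Site A1"),
    ("Site A1", "belongs_to", "Customer A"),
    ("Service event SE-42", "related_to", "Component C-100"),
    ("service_events data product", "owned_by", "Field service domain team"),
]

def render_context(facts, templates=TEMPLATES):
    """Turn triples into sentences; skip relations that have no template."""
    lines = []
    for s, relation, o in facts:
        template = templates.get(relation)
        if template is None:
            continue  # better to omit a fact than to phrase it wrongly
        lines.append(template.format(s=s, o=o))
    return "\n".join(lines)

def build_prompt(question, facts):
    """Combine an instruction, the rendered context, and the user question."""
    return (
        "Use only the relationship context below and name the owner "
        "of any data you rely on.\n\n"
        f"Context:\n{render_context(facts)}\n\n"
        f"Question: {question}"
    )

prompt = build_prompt("Which customer is affected by Machine M1?", FACTS)
```

Whether this is good enough, or whether a graph or ontology capability is needed, is one of the things an AI assistant PoC would have to show.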

How I Would Run the Spike

The first deliverable should not be a solution architecture. It should be a better set of questions and a prioritized set of use cases.

1. Collect Role-Based Questions

Start with interviews or short workshops. Ask each role what they would like to know if the data platform could explain relationships better.

Do not start with lineage diagrams. Start with decisions.

Example prompt:

What decision would become easier if you understood the origin, ownership, usage, and business impact of data better?

2. Map Existing Metadata Sources

Inventory what already exists.

Possible sources:

Data product contracts
Data catalog content
Data portal content
Fabric workspaces and items
Semantic models and reports
OneLake Catalog
Purview metadata and lineage
Source system metadata
Usage metrics
Ownership metadata
Support and operational metadata
Documentation and architecture decisions
CI/CD and repository metadata

Classify each source:

already structured
manually maintained
available through API
available only by convention
missing completely
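The inventory itself can be kept as a small structured artifact instead of a slide. Here is a minimal sketch using the five classes above. The classification assigned to each source is invented. Finding the real answers is the point of the spike.

```python
from collections import Counter
from enum import Enum

class Availability(Enum):
    STRUCTURED = "already structured"
    MANUAL = "manually maintained"
    API = "available through API"
    CONVENTION = "available only by convention"
    MISSING = "missing completely"

# Hypothetical classification of a few metadata sources.
INVENTORY = {
    "Data product contracts": Availability.MANUAL,
    "Fabric workspaces and items": Availability.API,
    "Ownership metadata": Availability.CONVENTION,
    "Business process links": Availability.MISSING,
    "Usage metrics": Availability.API,
}

def summarize(inventory):
    """Count how many sources fall into each class."""
    return Counter(status.value for status in inventory.values())

def needs_work(inventory):
    """Sources that cannot be read automatically yet."""
    automatable = {Availability.STRUCTURED, Availability.API}
    return sorted(name for name, status in inventory.items() if status not in automatable)

summary = summarize(INVENTORY)
backlog = needs_work(INVENTORY)
```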

3. Test Native Fabric Capabilities First

Use the strategic platform before adding new components.

Test questions such as:

What lineage does Fabric show automatically?
What impact can semantic model analysis reveal?
What does OneLake Catalog already know?
What can Purview add?
Which metadata is available through APIs?
Can a Data Agent answer simple relationship questions?
Could Fabric Graph or ontology capabilities help with multi-hop or business-semantic questions?
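For the API question, a first probe could simply list the items in a workspace. The sketch below assumes the Fabric REST API's list-items endpoint under `https://api.fabric.microsoft.com/v1` and a paged response with `value` and `continuationUri` fields. That is my understanding of the API, so check it against the current reference before relying on it. Token acquisition is left out, and the `get_json` parameter is my own addition so the paging logic can be tried without network access.

```python
import json
import urllib.request
from collections import Counter

FABRIC_API = "https://api.fabric.microsoft.com/v1"  # assumed base URL

def http_get_json(url, token):
    """GET a URL with a bearer token and parse the JSON body."""
    request = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)

def list_items(workspace_id, token, get_json=http_get_json):
    """Collect all items in a workspace, following continuation links."""
    url = f"{FABRIC_API}/workspaces/{workspace_id}/items"
    items = []
    while url:
        page = get_json(url, token)
        items.extend(page.get("value", []))
        url = page.get("continuationUri")  # assumed paging field; absent on the last page
    return items

def count_by_type(items):
    """Summarize which item types the workspace contains."""
    return Counter(item.get("type", "Unknown") for item in items)

# Usage against a real tenant (not executed here):
#   items = list_items("<workspace-id>", "<access-token>")
#   print(count_by_type(items))
```

Even this small probe helps with the gap analysis: it shows which item types are visible through the API and which relationship information has to come from somewhere else.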

4. Identify the Gaps

The gap analysis is the most valuable output of the spike.

Typical gaps may include:

ownership is manually maintained
data product contracts are not connected to actual Fabric items
business processes are not linked to reports or KPIs
downstream consumers are not fully known
usage metrics are not connected to data product criticality
AI assistants do not yet have trusted relationship context

5. Select PoC Candidates

After the spike, pick a small number of proofs of concept (PoCs). Each PoC should test one business question, not one technology.

Good candidates:

KPI lineage PoC: where does this KPI come from?
Data product consumer PoC: who uses this data product?
Data portal enrichment PoC: can we reduce manual maintenance?
Manufacturing asset relationship PoC: how are customer, site, machine, component, service event, and spare part connected?
AI assistant PoC: can an assistant explain an answer with source, ownership, and relationship context?

What Good Outputs Might Look Like

The spike should produce tangible outputs, even before any production solution exists.

Possible outputs:

Prioritized use case list
Role-to-question map
Metadata source inventory
Fabric-native capability assessment
Gap analysis
Candidate relationship model
PoC backlog
Example diagrams
Example JSON output for a data portal
Example Power BI report concept
AI assistant prompt and response examples
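As an illustration of the "example JSON output for a data portal" item, here is a minimal sketch of what such a payload could contain. The field names and values are hypothetical and do not follow any existing portal schema.

```python
import json

def portal_payload(product, owner, contract_version, sources, reports):
    """Assemble a relationship summary for a data product detail page."""
    return {
        "dataProduct": product,
        "owner": owner,
        "contractVersion": contract_version,
        "upstreamSources": sorted(sources),
        "relatedReports": sorted(reports),
    }

# Hypothetical example values.
payload = portal_payload(
    product="sales_orders",
    owner="Sales domain team",
    contract_version="2.0",
    sources={"ERP", "CRM"},
    reports={"Order Backlog report", "Customer Revenue report"},
)

document = json.dumps(payload, indent=2)
```

A concrete payload like this is useful in workshops because stakeholders can react to specific fields instead of an abstract metadata model.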

What I Would Avoid at the Beginning

I would avoid starting with product selection.

The wrong first questions are:

Do we need a graph database?
Should this be implemented with a specific product?
Can AI solve this for us?
Should we build a new catalog?

The better first questions are:

Who needs relationship context?
Which decisions would improve?
Which metadata already exists?
Which parts can be automated?
Which use cases are valuable enough for a PoC?

Conclusion

Data lineage is a useful starting point, but the bigger opportunity is relationship intelligence.

For a manufacturing data platform, that means connecting data products, contracts, catalogs, portals, reports, KPIs, business processes, assets, ownership, governance, and AI context.

The first step is not to choose a product. The first step is to run a focused spike that discovers the best use cases, tests the native platform capabilities, and identifies what additional metadata or automation is actually needed.

Only after that does it make sense to decide which PoCs to build.

Okko Oulasvirta
