Introduction

Modern data platforms are usually discussed through their technical building blocks: lakehouses, warehouses, pipelines, notebooks, semantic models, reports, catalogs, governance tools, and increasingly AI assistants.

Those building blocks matter. But in a large manufacturing company, the most interesting questions are often not about individual assets. They are about relationships.

  • Where does this KPI come from?
  • Which data products are used by this business process?
  • Who owns the data behind this report?
  • What changes if this source system, semantic model, or data product changes?
  • Which customers, factories, machines, components, service events, or spare parts are connected through the data?
  • Can an AI assistant explain not only an answer, but also the data context behind the answer?

This post is a thinking-out-loud exploration of a spike (a short, time-boxed investigation) I would run on top of a modern Microsoft Fabric-based data platform. The goal is not to declare a final architecture. The goal is to ask better questions before building another metadata solution.

The Starting Point: Lineage Is Useful, but Not Enough

Data lineage is usually understood as the traceability of data: where data comes from, how it is transformed, and where it is consumed.

A simplified lineage path might look like this:

Source system
  -> Data pipeline
    -> Lakehouse table
      -> Warehouse table
        -> Semantic model
          -> Power BI report
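Read as data, this path is a small directed graph. Impact analysis then becomes a walk downstream from the item that changed. The sketch below is a minimal Python example. It assumes the lineage has already been exported into a plain adjacency list. The item names are placeholders, not real Fabric identifiers or API output.

```python
from collections import deque

# Placeholder item names mirroring the simplified lineage path above.
# Each key lists the items that consume it directly (downstream edges).
LINEAGE: dict[str, list[str]] = {
    "source_system": ["data_pipeline"],
    "data_pipeline": ["lakehouse_table"],
    "lakehouse_table": ["warehouse_table"],
    "warehouse_table": ["semantic_model"],
    "semantic_model": ["power_bi_report"],
    "power_bi_report": [],
}


def downstream_impact(graph: dict[str, list[str]], start: str) -> list[str]:
    """Return every item reachable downstream of `start`, in breadth-first order."""
    seen = {start}
    affected: list[str] = []
    queue = deque([start])
    while queue:
        item = queue.popleft()
        for consumer in graph.get(item, []):
            if consumer not in seen:  # guard against revisits in branching graphs
                seen.add(consumer)
                affected.append(consumer)
                queue.append(consumer)
    return affected


if __name__ == "__main__":
    # What is affected if the lakehouse table changes?
    print(downstream_impact(LINEAGE, "lakehouse_table"))
    # ['warehouse_table', 'semantic_model', 'power_bi_report']
```

A real lineage graph branches, which is why the sketch tracks visited items rather than simply following a single chain.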

This is valuable. It helps with troubleshooting, impact analysis, governance, and trust.

But manufacturing data platforms need more than technical lineage diagrams. The business rarely asks only “which table feeds which report?” More often, the business asks questions like:

  • Which process depends on this data?
  • Which KPI will be affected?
  • Which factory or customer is impacted?
  • Who should be informed?
  • Is this data product reliable enough for operational or AI use?

That requires a broader view.

From Lineage to Relationship Intelligence

The next useful layer on top of a modern data platform may not be another database or another catalog. It may be relationship intelligence.

By relationship intelligence, I mean the ability to connect technical data flows with business context:

Technical lineage
  + Data product contracts
  + Catalog metadata
  + Data portal content
  + Ownership and support responsibilities
  + Usage metrics
  + Business process and KPI context
  + Governance metadata
  + AI-readable context
= Relationship intelligence
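At its simplest, each "+" in this list is a join on a shared identifier. The sketch below assumes every metadata source can be reduced to a dictionary keyed by the same item id. All ids, team names, contract names, and KPIs are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical metadata fragments, each as it might arrive from a different source.
LINEAGE = {"sales_semantic_model": ["sales_lakehouse_table"]}  # item -> upstream items
OWNERSHIP = {"sales_semantic_model": "Sales Analytics Team"}
CONTRACTS = {"sales_semantic_model": "contract-sales-v2"}
KPIS = {"sales_semantic_model": ["Net Revenue", "Order Intake"]}


@dataclass
class EnrichedItem:
    """One item's technical lineage joined with its business context."""
    item_id: str
    upstream: list[str] = field(default_factory=list)
    owner: Optional[str] = None
    contract: Optional[str] = None
    kpis: list[str] = field(default_factory=list)


def enrich(item_id: str) -> EnrichedItem:
    """Join the separate sources on a shared item id; missing context stays None or empty."""
    return EnrichedItem(
        item_id=item_id,
        upstream=list(LINEAGE.get(item_id, [])),
        owner=OWNERSHIP.get(item_id),
        contract=CONTRACTS.get(item_id),
        kpis=list(KPIS.get(item_id, [])),
    )


if __name__ == "__main__":
    print(enrich("sales_semantic_model"))
```

The join itself is trivial once the key exists. In practice, getting the catalog, portal, contracts, and Fabric items to agree on that shared key is likely to be a large part of the work.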

This is not just a nicer diagram. It is a decision-support capability.

A data catalog tells people what exists. A data portal helps people find and request data products. Data product contracts define expectations. Lineage shows technical dependencies. Governance tools help classify and control data. AI assistants can help users ask questions.

The real opportunity is to connect these pieces.

Why Manufacturing Makes This Interesting

Manufacturing is full of relationships.

A customer may have several sites. A site may contain production lines. A line may contain machines. A machine may contain components. Components may be connected to spare parts, service events, suppliers, maintenance plans, telemetry, and operational KPIs.

Another relationship chain could look like this:

Supplier
  -> Component
    -> Product
      -> Factory
        -> Customer

Or this:

Customer
  -> Site
    -> Machine
      -> Component
        -> Service event
          -> Spare part

In this environment, data platform metadata is not only an IT concern. It can become a way to understand business impact, operational risk, and AI-readiness.
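The chains above are multi-hop questions, where each hop follows one kind of relationship. The sketch below assumes relationships are stored as (subject, relation, object) triples; the ids and relation names are hypothetical. It answers "which customers are affected by a change to this component?" by following a fixed sequence of relation types.

```python
# Hypothetical (subject, relation, object) triples following the
# Supplier -> Component -> Product -> Factory -> Customer chain.
EDGES: list[tuple[str, str, str]] = [
    ("supplier:S1", "supplies", "component:C1"),
    ("component:C1", "used_in", "product:P1"),
    ("component:C1", "used_in", "product:P2"),
    ("product:P1", "produced_at", "factory:F1"),
    ("product:P2", "produced_at", "factory:F2"),
    ("factory:F1", "delivers_to", "customer:K1"),
    ("factory:F2", "delivers_to", "customer:K1"),
    ("factory:F2", "delivers_to", "customer:K2"),
]


def follow_path(start: str, relations: list[str]) -> set[str]:
    """Follow a fixed sequence of relation types from `start`, one hop per relation."""
    frontier = {start}
    for relation in relations:
        frontier = {obj for subj, rel, obj in EDGES if subj in frontier and rel == relation}
    return frontier


if __name__ == "__main__":
    # Which customers are affected by a change to component C1?
    hops = ["used_in", "produced_at", "delivers_to"]
    print(sorted(follow_path("component:C1", hops)))  # ['customer:K1', 'customer:K2']
```

Graph query languages express the same idea as a path pattern. The sketch spells out the hops to show that the question depends on relationship types being modelled consistently, not on a particular product.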

Microsoft Fabric as the Strategic Platform Context

If Microsoft Fabric is the strategic data platform, the first principle should be simple:

Test the native platform capabilities before adding a separate relationship or graph technology.

The initial candidate capabilities would include:

  • Microsoft Fabric lineage views
  • Semantic model impact analysis
  • OneLake Catalog
  • Microsoft Purview integration
  • Fabric REST APIs
  • Fabric Data Agent
  • Fabric Graph (preview)
  • Fabric IQ and ontology-related preview capabilities

Some of these are mature platform capabilities. Some are still preview capabilities. The point of the spike is not to assume they solve everything. The point is to find out how far the strategic platform already takes us.

The Spike Should Start with Roles, Not Tools

A common mistake would be to start with a technology question such as: “Do we need a graph database?”

A better first question is:

Who would benefit if we understood the relationships around data better?

Potential stakeholder roles include:

Business Analyst or Data Consumer

They want to find the right data product, understand whether it can be trusted, and see where it comes from.

Useful questions:

  • Is this the recommended data product for my use case?
  • Who owns it?
  • What reports or semantic models already use it?
  • Is there a contract, description, or known limitation?

Possible outputs:

  • Enriched data portal detail page
  • Data product relationship view
  • Source and ownership summary
  • Related reports and related data products

Report Owner or Analytics Product Owner

They care about the reports and analytical products used by the business.

Useful questions:

  • Which upstream data products feed this report?
  • If the numbers suddenly look different, what changed upstream?
  • Which semantic model or dataset does the report depend on?
  • Are there cross-domain dependencies behind this report?

Possible outputs:

  • Report lineage diagram
  • Affected reports list
  • Semantic model dependency view
  • Report confidence or source transparency summary

KPI Owner or Business Controller

They care about the meaning, reliability, and comparability of business metrics.

Useful questions:

  • Where does this KPI come from?
  • Which source systems and transformations affect it?
  • Has the calculation logic changed?
  • Can I distinguish business change from data pipeline change?

Possible outputs:

  • KPI lineage view
  • Calculation and source transparency page
  • KPI impact report
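"Where does this KPI come from?" is the same kind of graph, walked in the opposite direction. The sketch below assumes the KPI has already been linked to the semantic model that calculates it; that link is hypothetical here, and every name is a placeholder. It walks upstream and reports the original source systems.

```python
# Hypothetical upstream edges: each item maps to the items it reads from.
UPSTREAM: dict[str, list[str]] = {
    "kpi:on_time_delivery": ["semantic_model:logistics"],
    "semantic_model:logistics": ["warehouse:deliveries", "warehouse:orders"],
    "warehouse:deliveries": ["lakehouse:shipments"],
    "warehouse:orders": ["lakehouse:sales_orders"],
    "lakehouse:shipments": ["source:tms"],
    "lakehouse:sales_orders": ["source:erp"],
}


def kpi_lineage(kpi: str) -> tuple[list[str], list[str]]:
    """Walk upstream from a KPI; return (all upstream items, original sources)."""
    visited: list[str] = []
    stack = [kpi]
    while stack:
        item = stack.pop()
        for parent in UPSTREAM.get(item, []):
            if parent not in visited:
                visited.append(parent)
                stack.append(parent)
    # Items with no further upstream edges are treated as original sources.
    sources = sorted(item for item in visited if not UPSTREAM.get(item))
    return visited, sources


if __name__ == "__main__":
    _, sources = kpi_lineage("kpi:on_time_delivery")
    print(sources)  # ['source:erp', 'source:tms']
```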

Business Process Owner

They care about operational processes such as order-to-cash, procure-to-pay, supply chain planning, field service, spare parts, production follow-up, or finance closing.

Useful questions:

  • Which data products support this process?
  • Which KPIs, reports, or AI use cases depend on this data?
  • What business process is affected if the data is late, stale, or incorrect?

Possible outputs:

  • Business process dependency map
  • Critical data product list
  • Process-level data risk view

Field Service, Spare Parts, Supply Chain, or Manufacturing Owner

These roles bring the strongest manufacturing-specific use cases.

Useful questions:

  • Which machines, components, customers, suppliers, or spare parts are connected through data?
  • Which customers are affected by a component or material change?
  • Which service events are related to a recurring asset issue?
  • What data context would improve field service or spare part recommendations?

Possible outputs:

  • Asset relationship diagram
  • Component impact view
  • Supply chain dependency map
  • AI-readiness assessment for service or spare parts use cases

Data Product Responsible or Domain Owner

They need visibility into the data products they provide and consume.

Useful questions:

  • Who consumes my data product?
  • Does the implementation match the data product contract?
  • Which reports, AI use cases, or business processes depend on it?
  • Which consumer teams should be informed before changes?

Possible outputs:

  • Data product consumer map
  • Contract coverage report
  • Producer-consumer dependency list
  • Data product health score
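The question "which consumer teams should be informed?" combines lineage with ownership. The sketch below assumes producer-consumer links and consumer ownership are available as simple lookups; all names are hypothetical. It groups the direct consumers of one data product by owning team.

```python
from collections import defaultdict

# Hypothetical producer -> direct consumer links and consumer ownership.
CONSUMERS: dict[str, list[str]] = {
    "dp:spare_parts_stock": [
        "report:parts_availability",
        "model:service_planning",
        "ai:parts_recommender",
        "report:legacy_export",
    ],
}
OWNERS: dict[str, str] = {
    "report:parts_availability": "Service Analytics",
    "model:service_planning": "Service Analytics",
    "ai:parts_recommender": "AI Team",
}


def teams_to_inform(data_product: str) -> dict[str, list[str]]:
    """Group the direct consumers of a data product by owning team."""
    by_team: dict[str, list[str]] = defaultdict(list)
    for consumer in CONSUMERS.get(data_product, []):
        # Consumers without a recorded owner are surfaced, not silently dropped.
        by_team[OWNERS.get(consumer, "UNKNOWN OWNER")].append(consumer)
    return dict(by_team)


if __name__ == "__main__":
    print(teams_to_inform("dp:spare_parts_stock"))
```

Unowned consumers are grouped under a placeholder rather than skipped, because they are exactly the ones nobody would think to inform.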

Data Steward, Governance Owner, Security, or Compliance

They care about traceability, ownership, classification, and auditability.

Useful questions:

  • Which data products lack ownership or documentation?
  • Where does sensitive data flow?
  • Which data products are used in critical business processes?
  • Can we prove how a metric or report is built?

Possible outputs:

  • Governance gap report
  • Sensitive data lineage diagram
  • Missing owner list
  • Audit evidence package
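A first governance gap report can be a handful of simple checks over catalog entries. The sketch below uses a reduced, hypothetical view of a catalog entry, not the actual OneLake Catalog or Purview schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataProduct:
    """A reduced, hypothetical view of a catalog entry."""
    name: str
    owner: Optional[str]
    description: Optional[str]
    sensitive: bool = False


def governance_gaps(products: list[DataProduct]) -> dict[str, list[str]]:
    """Collect data products with missing owners or documentation."""
    report: dict[str, list[str]] = {
        "missing_owner": [],
        "missing_description": [],
        "sensitive_without_owner": [],
    }
    for product in products:
        has_owner = bool(product.owner and product.owner.strip())
        has_description = bool(product.description and product.description.strip())
        if not has_owner:
            report["missing_owner"].append(product.name)
            if product.sensitive:
                # Treated here as the most urgent kind of gap.
                report["sensitive_without_owner"].append(product.name)
        if not has_description:
            report["missing_description"].append(product.name)
    return report
```

Blank or whitespace-only values count as missing, since a field that is merely present does not make a data product governed.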

AI Lead or Business AI Owner

AI makes this topic more urgent.

An AI assistant can retrieve documents, but that is not enough. Enterprise AI needs context: source systems, ownership, trust, usage, business meaning, and relationships.

Useful questions:

  • Can an AI assistant explain where an answer came from?
  • Can it identify the owner of the data behind the answer?
  • Can it understand that a machine belongs to a site, a site belongs to a customer, and a service event is related to a component?
  • Which data products are ready for AI use?

Possible outputs:

  • AI-ready metadata model
  • Data product discovery assistant
  • Relationship-aware Q&A
  • Source and ownership explanation for AI answers
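One way to make an assistant's answer explainable is to attach the data context deterministically from metadata, instead of asking the model to describe its own sources. The sketch below assumes the assistant reports which data products it drew on. The product ids, owners, source systems, and contract versions are hypothetical, and no specific AI service is involved.

```python
# Hypothetical metadata for the data products an assistant may draw on.
METADATA: dict[str, dict[str, str]] = {
    "dp:service_events": {
        "owner": "Field Service Data Team",
        "source": "Service management system",
        "contract": "v1.3",
    },
    "dp:installed_base": {"owner": "Customer Data Team", "source": "CRM", "contract": "v2.0"},
}


def explain_sources(answer: str, used_products: list[str]) -> str:
    """Append a source-and-ownership explanation to an assistant answer."""
    lines = [answer, "", "Data context:"]
    for product in used_products:
        meta = METADATA.get(product)
        if meta is None:
            # Be explicit about missing context instead of omitting it.
            lines.append(f"- {product}: no metadata registered")
            continue
        lines.append(
            f"- {product}: source {meta['source']}, owner {meta['owner']}, contract {meta['contract']}"
        )
    return "\n".join(lines)


if __name__ == "__main__":
    # Illustrative answer text only.
    print(explain_sources("Machine M7 had several service events last quarter.", ["dp:service_events"]))
```

With this design, ownership and source statements come from the metadata rather than from model output, so they are only as good as that metadata.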

How I Would Run the Spike

The first deliverable should not be a solution architecture. It should be a better set of questions and a prioritized set of use cases.

1. Collect Role-Based Questions

Start with interviews or short workshops. Ask each role what they would like to know if the data platform could explain relationships better.

Do not start with lineage diagrams. Start with decisions.

Example prompt:

What decision would become easier if you understood the origin, ownership, usage, and business impact of data better?

2. Map Existing Metadata Sources

Inventory what already exists.

Possible sources:

  • Data product contracts
  • Data catalog content
  • Data portal content
  • Fabric workspaces and items
  • Semantic models and reports
  • OneLake Catalog
  • Purview metadata and lineage
  • Source system metadata
  • Usage metrics
  • Ownership metadata
  • Support and operational metadata
  • Documentation and architecture decisions
  • CI/CD and repository metadata

Classify each source:

  • already structured
  • manually maintained
  • available through API
  • available only by convention
  • missing completely
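The classification can be kept as a small, explicit data structure so the inventory is easy to count and filter. The sketch below is a minimal example. The assignments are illustrative placeholders, not an assessment of any real platform.

```python
from collections import Counter
from enum import Enum


class Availability(Enum):
    """The five classes from the list above."""
    STRUCTURED = "already structured"
    MANUAL = "manually maintained"
    API = "available through API"
    CONVENTION = "available only by convention"
    MISSING = "missing completely"


# Illustrative assignments only; real values come out of the inventory itself.
INVENTORY: dict[str, Availability] = {
    "Data product contracts": Availability.STRUCTURED,
    "Data portal content": Availability.MANUAL,
    "Fabric workspaces and items": Availability.API,
    "Usage metrics": Availability.API,
    "Ownership metadata": Availability.CONVENTION,
    "Support and operational metadata": Availability.MISSING,
}


def summary(inventory: dict[str, Availability]) -> dict[str, int]:
    """Count sources per availability class."""
    return dict(Counter(a.value for a in inventory.values()))


def automation_candidates(inventory: dict[str, Availability]) -> list[str]:
    """Sources that could feed an automated relationship view without new manual work."""
    automatable = {Availability.STRUCTURED, Availability.API}
    return sorted(name for name, a in inventory.items() if a in automatable)
```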

3. Test Native Fabric Capabilities First

Use the strategic platform before adding new components.

Test questions such as:

  • What lineage does Fabric show automatically?
  • What impact can semantic model analysis reveal?
  • What does OneLake Catalog already know?
  • What can Purview add?
  • Which metadata is available through APIs?
  • Can a Data Agent answer simple relationship questions?
  • Could Fabric Graph or ontology capabilities help with multi-hop or business-semantic questions?

4. Identify the Gaps

The gap analysis is the most valuable output of the spike.

Typical gaps may include:

  • ownership is manually maintained
  • data product contracts are not connected to actual Fabric items
  • business processes are not linked to reports or KPIs
  • downstream consumers are not fully known
  • usage metrics are not connected to data product criticality
  • AI assistants do not yet have trusted relationship context
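The gap "data product contracts are not connected to actual Fabric items" can be made measurable with a few set operations. The sketch below assumes each contract records the id of the item that implements it; that field and all ids are hypothetical.

```python
from typing import Optional

# Hypothetical: each contract records the id of the item that implements it.
CONTRACT_TO_ITEM: dict[str, Optional[str]] = {
    "contract:orders": "item:lh_orders",
    "contract:telemetry": None,  # contract exists, but no item is linked
    "contract:parts": "item:lh_parts_old",  # linked item no longer exists
}
FABRIC_ITEMS: set[str] = {"item:lh_orders", "item:lh_parts", "item:wh_finance"}


def contract_coverage(contracts: dict[str, Optional[str]], items: set[str]) -> dict[str, list[str]]:
    """Compare contract-to-item links with the items that actually exist."""
    linked = {item for item in contracts.values() if item is not None}
    return {
        "unlinked_contracts": sorted(c for c, i in contracts.items() if i is None),
        "broken_links": sorted(c for c, i in contracts.items() if i is not None and i not in items),
        "items_without_contract": sorted(items - linked),
    }


if __name__ == "__main__":
    print(contract_coverage(CONTRACT_TO_ITEM, FABRIC_ITEMS))
```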

5. Select PoC Candidates

After the spike, pick a small number of PoCs. Each PoC should test one business question, not one technology.

Good candidates:

  • KPI lineage PoC: where does this KPI come from?
  • Data product consumer PoC: who uses this data product?
  • Data portal enrichment PoC: can we reduce manual maintenance?
  • Manufacturing asset relationship PoC: how are customer, site, machine, component, service event, and spare part connected?
  • AI assistant PoC: can an assistant explain an answer with source, ownership, and relationship context?

What Good Outputs Might Look Like

The spike should produce tangible outputs, even before any production solution exists.

Possible outputs:

  • Prioritized use case list
  • Role-to-question map
  • Metadata source inventory
  • Fabric-native capability assessment
  • Gap analysis
  • Candidate relationship model
  • PoC backlog
  • Example diagrams
  • Example JSON output for a data portal
  • Example Power BI report concept
  • AI assistant prompt and response examples
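The example JSON output for a data portal could be generated directly from enriched metadata. The sketch below is a minimal example. The field names form a hypothetical schema, not an existing portal or Fabric format.

```python
import json
from typing import Optional


def portal_detail(
    product_id: str,
    owner: Optional[str],
    contract: Optional[str],
    upstream: list[str],
    consumers: list[str],
) -> str:
    """Render a data portal detail record as JSON (hypothetical schema)."""
    payload = {
        "dataProduct": product_id,
        "owner": owner,  # None becomes null, keeping the gap visible
        "contract": contract,
        "upstream": upstream,
        "consumers": consumers,
        "consumerCount": len(consumers),
    }
    return json.dumps(payload, indent=2)


if __name__ == "__main__":
    print(portal_detail("dp:installed_base", "Customer Data Team", "v2.0", ["source:crm"], ["report:fleet_overview"]))
```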

What I Would Avoid at the Beginning

I would avoid starting with product selection.

The wrong first questions are:

  • Do we need a graph database?
  • Should this be implemented with a specific product?
  • Can AI solve this for us?
  • Should we build a new catalog?

The better first questions are:

  • Who needs relationship context?
  • Which decisions would improve?
  • Which metadata already exists?
  • Which parts can be automated?
  • Which use cases are valuable enough for a PoC?

Conclusion

Data lineage is a useful starting point, but the bigger opportunity is relationship intelligence.

For a manufacturing data platform, that means connecting data products, contracts, catalogs, portals, reports, KPIs, business processes, assets, ownership, governance, and AI context.

The first step is not to choose a product. The first step is to run a focused spike that discovers the best use cases, tests the native platform capabilities, and identifies what additional metadata or automation is actually needed.

Only after that does it make sense to decide which PoCs to build.
