Introduction
Microsoft Fabric represents a significant leap forward in unified data analytics platforms, bringing together data engineering, data science, and business intelligence into a cohesive experience.
However, as organizations adopt Fabric for enterprise-scale implementations, they quickly encounter a familiar challenge: how to implement robust DevOps practices for this new platform.
While Fabric excels at enabling data professionals to create powerful analytics solutions, the platform is still evolving its enterprise-grade deployment and lifecycle management capabilities.
In this blog series, we’ll explore the challenges and solutions for implementing effective DevOps practices with Microsoft Fabric in enterprise environments. This first post identifies the key challenges that organizations face when trying to manage Fabric items across development, testing, and production environments. By understanding these challenges, we can better appreciate the various solutions we’ll examine in subsequent posts.
Fabric Items and DevOps
Microsoft Fabric encompasses various item types, known as Fabric items, each with unique characteristics that impact how they’re managed in a DevOps pipeline:
- Notebooks: Code-based items that combine code, visualizations, and narrative text.
- Dataflows: Visual or code-based data transformation definitions.
- Semantic Models: Metadata structures that define business metrics and relationships.
- Pipelines: Orchestration workflows that define dependencies and execution order.
- Lakehouses: Data storage and organization structures.
- Warehouses: Data storage and organization structures.
- Other specialized items: Reports, dashboards, ML models, etc.
Each of these items has different serialization formats, dependencies, and parameterization requirements, making a unified DevOps solution challenging.
Enterprise DevOps Requirements
Enterprise DevOps for Fabric must address several core requirements. Let’s outline some of the key expectations and use them as a lens to evaluate potential solutions in the subsequent posts.
EDR-01 - Environment isolation
Strict separation between development, testing, and production.
EDR-02 - Promotion workflow
Controlled processes for moving items through environments.
EDR-03 - Version control
Tracking and reviewing changes.
EDR-04 - Automation
Minimizing manual steps to reduce errors and increase efficiency.
EDR-05 - Governance
Ensuring compliance with organizational policies.
EDR-06 - Collaboration
Enabling teams to work together without conflicts.
Fabric Item Deployment Requirements
Fabric provides Git integration for storing Fabric item definitions in version control, but item types differ in serialization format and granularity. For example, notebooks are relatively straightforward to version control, while warehouses may have complex dependency structures that can be challenging to track.
Fabric items often have interdependencies (e.g., a notebook or data pipeline that references a lakehouse). Ensuring that these dependencies target the correct environment during deployment is critical. Unfortunately, Fabric does not currently manage this automatically.
Many Fabric items contain hardcoded references to data sources, schemas, or other environment-specific elements. Parameterizing these references for multi-environment deployment is complex.
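One common way to approach this is a find-and-replace pass over the serialized item definitions before they are deployed. The sketch below assumes the definitions are available as text and that the environment-specific values are GUIDs; the IDs, the `ENV_VALUES` mapping, and both function names are hypothetical and only illustrate the idea.

```python
import re

# Hypothetical mapping: each DEV-specific value and its replacement per environment.
ENV_VALUES = {
    "11111111-1111-1111-1111-111111111111": {  # made-up DEV lakehouse ID
        "TEST": "22222222-2222-2222-2222-222222222222",
        "PROD": "33333333-3333-3333-3333-333333333333",
    },
}

GUID = re.compile(r"[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}")


def parameterize(definition: str, environment: str) -> str:
    """Replace known DEV values with the values of the target environment."""
    for dev_value, targets in ENV_VALUES.items():
        definition = definition.replace(dev_value, targets[environment])
    return definition


def unmapped_guids(definition: str, environment: str) -> set[str]:
    """Return GUIDs in the definition that are not known values of the target environment."""
    known = {targets[environment] for targets in ENV_VALUES.values()}
    return set(GUID.findall(definition)) - known


notebook = 'lakehouse_id = "11111111-1111-1111-1111-111111111111"'
deployable = parameterize(notebook, "TEST")
leftovers = unmapped_guids(deployable, "TEST")  # empty when every reference was mapped
```

The second function is a cheap safety net: any GUID that survives the replacement pass is a candidate for a reference that still points at the wrong environment.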
Below is a list of the Fabric Item Deployment Requirements we want to test for each solution and Fabric item type.
FIDR-DP - Data Pipeline Deployment
- Version control: Ability to track changes to data pipelines over time.
- Dependency management: Handling dependencies between data pipelines and other Fabric items.
- Parameterization: Ability to set environment-specific parameters.
- Deployment of Data Pipeline resources: Deploying data pipelines to different environments.
- Deployment of Data Pipeline content: Deploying data pipeline content.
FIDR-DF - Dataflow Deployment
- Version control: Ability to track changes to dataflows over time.
- Dependency management: Handling dependencies between dataflows and other Fabric items.
- Parameterization: Ability to set environment-specific parameters.
- Deployment of Dataflow resources: Deploying dataflows to different environments.
- Deployment of Dataflow content: Deploying dataflow content.
FIDR-ES - Eventstream Deployment
- Version control: Ability to track changes to eventstreams over time.
- Dependency management: Handling dependencies between eventstreams and other Fabric items.
- Parameterization: Ability to set environment-specific parameters.
- Deployment of Eventstream resources: Deploying eventstreams to different environments.
- Deployment of Eventstream content: Deploying eventstream content.
FIDR-NB - Notebook Deployment
- Version control: Ability to track changes to notebooks over time.
- Parameterization: Ability to set environment-specific parameters.
- Dependency management: Handling dependencies between notebooks and other Fabric items.
- Deployment of Notebook resources: Deploying notebooks to different environments.
- Deployment of Notebook content: Deploying notebook content (code, visualizations, etc.).
FIDR-EH - Eventhouse Deployment
- Version control: Ability to track changes to eventhouses over time.
- Dependency management: Handling dependencies between eventhouses and other Fabric items.
- Parameterization: Ability to set environment-specific parameters.
- Deployment of Eventhouse resources: Deploying eventhouses to different environments.
- Deployment of Eventhouse schema: Deploying eventhouse schema.
FIDR-LH - Lakehouse Deployment
- Version control: Ability to track changes to lakehouses over time.
- Dependency management: Handling dependencies between lakehouses and other Fabric items.
- Parameterization: Ability to set environment-specific parameters.
- Deployment of Lakehouse resources: Deploying lakehouses to different environments.
- Deployment of Lakehouse schema: Deploying lakehouse schema.
FIDR-WH - Warehouse Deployment
- Version control: Ability to track changes to warehouses over time.
- Dependency management: Handling dependencies between warehouses and other Fabric items.
- Parameterization: Ability to set environment-specific parameters.
- Deployment of Warehouse resources: Deploying warehouses to different environments.
- Deployment of Warehouse schema: Deploying warehouse schema.
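Since every solution is tested against the same criteria for every item type, the requirements above can be expanded into a checklist. The following is only a sketch of one way to track the evaluation; the shortened criterion names, the `build_checklist` function, and the `result` field are illustrative assumptions, not part of any Fabric tooling.

```python
from itertools import product

# Requirement codes and item types as listed above.
ITEM_TYPES = {
    "FIDR-DP": "Data Pipeline",
    "FIDR-DF": "Dataflow",
    "FIDR-ES": "Eventstream",
    "FIDR-NB": "Notebook",
    "FIDR-EH": "Eventhouse",
    "FIDR-LH": "Lakehouse",
    "FIDR-WH": "Warehouse",
}

# Shortened names for the five criteria each item type is tested against.
CRITERIA = [
    "Version control",
    "Dependency management",
    "Parameterization",
    "Resource deployment",
    "Content or schema deployment",
]


def build_checklist(solution: str) -> list[dict]:
    """Create one unanswered check per (item type, criterion) pair."""
    return [
        {
            "solution": solution,
            "requirement": code,
            "item_type": name,
            "criterion": criterion,
            "result": None,  # to be filled in during evaluation
        }
        for (code, name), criterion in product(ITEM_TYPES.items(), CRITERIA)
    ]


checklist = build_checklist("Terraform")  # 7 item types x 5 criteria = 35 checks
```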
Fabric Identity and Access Management Requirements
Enterprise environments require different access permissions in different environments, with production typically having the most restrictive controls. Automating these permission changes during promotion is challenging.
FIAMR-01 - Service Principal Authentication
Managing service principal authentication across environments.
FIAMR-02 - Access Control Automation
Managing role-based access control across environments programmatically.
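To make FIAMR-02 more concrete, here is a minimal sketch that turns a per-environment role matrix into request descriptions for workspace role assignments. The endpoint path and payload shape follow my understanding of the Fabric REST API's workspace role assignment operation and should be verified against the current documentation. The principal IDs, workspace ID, and `ROLE_MATRIX` are made up, and nothing is sent: authentication and the HTTP call are deliberately left out.

```python
FABRIC_API = "https://api.fabric.microsoft.com/v1"  # assumed base URL

# Hypothetical matrix: (principal ID, principal type) -> workspace role, per environment.
ROLE_MATRIX = {
    "DEV": {("aaaaaaaa-0000-0000-0000-000000000001", "Group"): "Contributor"},
    "PROD": {
        ("aaaaaaaa-0000-0000-0000-000000000001", "Group"): "Viewer",
        ("bbbbbbbb-0000-0000-0000-000000000002", "ServicePrincipal"): "Admin",
    },
}


def role_assignment_requests(workspace_id: str, assignments: dict) -> list[dict]:
    """Build one POST request description per principal; no request is sent."""
    url = f"{FABRIC_API}/workspaces/{workspace_id}/roleAssignments"
    return [
        {
            "method": "POST",
            "url": url,
            "body": {"principal": {"id": pid, "type": ptype}, "role": role},
        }
        for (pid, ptype), role in assignments.items()
    ]


prod_requests = role_assignment_requests("prod-workspace-id", ROLE_MATRIX["PROD"])
```

Keeping the matrix in version control makes the difference between environments reviewable, for example that the same group is a Contributor in DEV but only a Viewer in PROD.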
Fabric Operational Requirements
Validating that data pipelines work correctly with reduced test datasets while ensuring they will scale to production volumes is technically challenging.
Rollback strategies for failed deployments are equally critical. Unlike traditional software, where rollbacks can be relatively straightforward, data platform rollbacks may need to account for data changes that have already occurred.
FOR-01 - Testing & Validation
Automated testing of Fabric items including data quality checks and performance validation.
FOR-02 - Rollback Strategies
Implementing safe rollback mechanisms for failed deployments.
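As an illustration of how FOR-01 and FOR-02 can connect, the sketch below runs two simple data quality checks after a deployment and uses the results to decide whether a rollback should be triggered. It assumes rows have already been fetched as a list of dictionaries (in practice they would come from a query against a lakehouse or warehouse); all names and thresholds are hypothetical, and the rollback itself is out of scope.

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    name: str
    passed: bool
    detail: str


def check_row_count(rows: list[dict], minimum: int) -> CheckResult:
    """Pass when the table holds at least `minimum` rows."""
    count = len(rows)
    return CheckResult("row_count", count >= minimum, f"{count} rows, minimum {minimum}")


def check_null_ratio(rows: list[dict], column: str, max_ratio: float) -> CheckResult:
    """Pass when the share of missing values in `column` is at most `max_ratio`."""
    nulls = sum(1 for row in rows if row.get(column) is None)
    ratio = nulls / len(rows) if rows else 1.0  # an empty table counts as all-null
    return CheckResult(f"null_ratio:{column}", ratio <= max_ratio, f"ratio {ratio:.2f}")


def should_roll_back(results: list[CheckResult]) -> bool:
    """Signal a rollback as soon as any check failed."""
    return any(not result.passed for result in results)


sample = [{"id": 1, "amount": 10.0}, {"id": 2, "amount": None}]
results = [check_row_count(sample, 1), check_null_ratio(sample, "amount", 0.25)]
rollback_needed = should_roll_back(results)
```

Note that this gate only decides whether to roll back. Undoing data changes that a pipeline has already written remains the harder part of FOR-02.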
Evaluating Potential Solutions
Now that we have defined the requirements for our enterprise Fabric DevOps solution (and even some cool new acronyms 🤣), we can evaluate potential solutions. In the next posts, I’ll review the following tools as potential solutions for enterprise-scale Fabric DevOps:
Microsoft Terraform provider for Fabric: Managing Fabric resources declaratively using Infrastructure as Code (IaC) principles. This provider enables teams to define Fabric workspaces, capacities, and various Fabric items in code, facilitating repeatable deployments across environments. Terraform’s state management capabilities help track deployed resources and manage dependencies between them.
🧪 Validate Terraform first approach
- Terraform Fabric resource templates are created first.
- Terraform resource templates are deployed to the DEV workspace with the Terraform apply command.
- Configuration changes are made to the DEV workspace.
- The Terraform plan command is executed to see what changes it can detect.
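The last step of this scenario can be automated. A minimal sketch, assuming the Terraform CLI is installed and the working directory is already initialized: `terraform plan -detailed-exitcode` returns 0 when there is nothing to change, 2 when the plan contains changes, and 1 on errors. Whether manual changes to Fabric items actually show up in the plan depends on the provider, which is exactly what this test is meant to find out. The function names are illustrative.

```python
import subprocess

# -detailed-exitcode makes the exit code reflect whether the plan contains changes.
PLAN_CMD = ["terraform", "plan", "-detailed-exitcode", "-input=false", "-no-color"]


def interpret_plan_exit_code(code: int) -> str:
    """Translate the exit code of `terraform plan -detailed-exitcode`."""
    if code == 0:
        return "no-changes"
    if code == 2:
        return "changes-detected"
    return "error"


def detect_changes(working_dir: str) -> str:
    """Run the plan in an initialized Terraform directory and classify the result."""
    completed = subprocess.run(PLAN_CMD, cwd=working_dir, capture_output=True, text=True)
    return interpret_plan_exit_code(completed.returncode)
```

Calling `detect_changes` with the path of the Terraform configuration after editing the DEV workspace by hand shows whether Terraform noticed the edit.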
🧪 Validate Create Fabric items first approach
- Empty Fabric items are created in the DEV environment.
- Fabric items are added to Terraform state management using the Terraform import command.
- Terraform resource templates are created and deployed to DEV.
🧪 Validate Deploy resources to TEST workspace with Terraform
- Create Terraform workspaces for DEV and TEST environments.
- Switch local Terraform state to use the TEST workspace.
- Execute the Terraform apply command to deploy resources to the TEST workspace.
- Verify resources are deployed correctly in the TEST workspace.
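The steps above map to a short sequence of Terraform CLI calls. The sketch below only builds the command lists so they can be reviewed or handed to a runner. It assumes one Terraform workspace per environment and a per-environment variable file such as `test.tfvars`; the file naming and the function are hypothetical.

```python
def deployment_commands(environment: str, create_workspace: bool = False) -> list[list[str]]:
    """Build the Terraform CLI calls that deploy to one environment.

    `terraform workspace new` creates the workspace and switches to it;
    `terraform workspace select` switches to an existing one.
    """
    name = environment.lower()
    action = "new" if create_workspace else "select"
    return [
        ["terraform", "workspace", action, name],
        # Hypothetical convention: one variable file per environment.
        ["terraform", "apply", "-input=false", "-auto-approve", f"-var-file={name}.tfvars"],
    ]


test_commands = deployment_commands("TEST")
```

Because each Terraform workspace keeps its own state, applying with the TEST workspace selected leaves the state of the DEV deployment untouched.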
Microsoft fabric-cicd: A Python library for Fabric CI/CD. It is a code-first solution for deploying Microsoft Fabric items from a repository into a workspace. Its capabilities are intentionally simplified, with the primary goal of streamlining script-based deployments.
Native Fabric DevOps Tools: Using Fabric’s built-in Git integration and Fabric deployment pipelines.
Custom solutions: The solutions above will probably leave some requirements unmet without custom scripting that uses Fabric APIs, SDKs, and the Fabric CLI.