Microsoft Fabric: Understanding Capacity

Mar 20, 2024

Learn how Microsoft Fabric utilizes capacity to manage and optimize the compute resources for your Microsoft Fabric tenant

Navigating costs in Microsoft Fabric is straightforward at first glance: choose a SKU for a fixed monthly fee. That's it, article over? Not quite: understanding the real value and usage coverage that SKU offers is more complex. Unlike the per-user pricing of M365 or the consumption-based cost of Azure Synapse Analytics, Fabric operates as a Software-as-a-Service (SaaS) subscription, where you pay for allocated capacity rather than per user or per unit of direct consumption. This article seeks to demystify how your usage draws against this capacity and how Fabric manages user activities within it, offering a clearer perspective on optimizing your investment in Microsoft Fabric.

What is Microsoft Fabric Capacity?

Microsoft Fabric is a unified cloud-based platform engineered to streamline the development, deployment, and management of data analytics solutions across a wide spectrum of computational needs. From executing queries and generating reports to managing data warehouses and leveraging Azure AI models, Microsoft Fabric offers a diverse array of compute types to accommodate the multifaceted demands of modern data solutions. At the heart of this versatility lies the concept of Microsoft Fabric Capacity, which encapsulates these diverse compute requirements into a single, standardized measure, thereby simplifying the management of resources across the platform.

Microsoft Fabric Capacity is essentially a quantifiable allocation of universal compute resources made available to your tenant, enabling the use of whichever compute types are most appropriate for your specific workloads under a unified usage model. This fosters a shared resource environment, where the total capacity allocated to your tenant is determined by the SKU you select. Capacity units (CUs) are the metric used to represent this allocation, with options ranging from 2 CUs to 2,048 CUs, offering scalability to meet the needs of any size project or organization.

The essence of Microsoft Fabric Capacity is its dynamic and flexible nature, allowing resources to adapt to your workload and usage patterns. Resource utilization can scale up or down based on real-time demand, ensuring that you benefit from the cloud's elasticity and scalability, and the number of allocated CUs can be adjusted through SKU upgrades, providing a straightforward path to scale resources as project requirements evolve. By simplifying the complex landscape of cloud compute resources, Microsoft Fabric Capacity lets users leverage the full potential of the cloud without intricate resource management strategies.

What are the types of compute associated with Microsoft Fabric?

Microsoft Fabric offers a flexible and dynamic platform that caters to a wide array of compute needs, drawing on resources such as CPU, memory, disk IO, and network bandwidth. This versatility is embodied in the platform's universal compute model, which supports various types of workloads, from data processing with Apache Spark clusters to managing databases with a serverless SQL engine. Key to this model is the ability to apply the most appropriate compute type to the task at hand, with usage measured in CUs, which serve as the basic unit.

At the heart of Microsoft Fabric's compute model are two primary operation categories, interactive and background, each supporting different tasks and managed differently within the platform. Interactive operations are on-demand activities initiated by user interactions, either through the Fabric UI or external interfaces like SQL Server Management Studio. These operations, designed for fast and responsive user experiences, are allocated compute resources on demand, based on the frequency and number of user requests. Background operations, in contrast, are tasks such as scheduled data refreshes or data pipeline processes that run without immediate user interaction. These operations are allocated compute resources in advance and are designed for reliability and consistency. To manage resource consumption and prevent spikes, background operations are smoothed over time, distributing their reported usage to maintain system performance without exceeding capacity.

Microsoft Fabric's capacity is not a static resource but a dynamic allocation that adjusts to workload and usage patterns. This allows tenants to utilize more or fewer resources than their allocated CUs, based on the availability and demand of the underlying infrastructure, offering the scalability and elasticity of cloud computing. As workloads evolve, tenants can easily scale their resources by upgrading their SKU, ensuring that their compute needs are always met efficiently. Consumption rates for these diverse compute types are calculated from the operations executed within Fabric, making the CU a universal measure for all activities. This approach provides a unified way to manage resources, enabling tenants to optimize their investment in Microsoft Fabric while ensuring high performance and reliability across their operations.
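
For readers who think in code, here is a minimal, hypothetical sketch (Python, not a Fabric API) of the two operation categories and the reporting window each is smoothed over; the operation names and CU figures are made up for illustration, and the window lengths anticipate the smoothing section later in this article.

```python
from dataclasses import dataclass
from enum import Enum


class OperationType(Enum):
    INTERACTIVE = "interactive"   # on-demand, user-initiated (Fabric UI, SSMS, ...)
    BACKGROUND = "background"     # scheduled refreshes, pipelines, ...


@dataclass
class Operation:
    name: str
    op_type: OperationType
    cu_seconds: float             # total CU seconds the operation consumes

    @property
    def smoothing_window_seconds(self) -> int:
        # Interactive usage is smoothed over ~5 minutes, background over 24 hours
        # (see the smoothing section later in this article).
        return 5 * 60 if self.op_type is OperationType.INTERACTIVE else 24 * 3600


ops = [
    Operation("DAX query behind a report visual", OperationType.INTERACTIVE, 120),
    Operation("Nightly pipeline run", OperationType.BACKGROUND, 86_400),
]
for op in ops:
    rate = op.cu_seconds / op.smoothing_window_seconds
    print(f"{op.name}: ~{rate:.3f} CU/s reported over {op.smoothing_window_seconds} s")
```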

What are the available SKUs?

Microsoft Fabric offers different SKUs, each aligned with a different number of Capacity Units. The SKU sizes range from F2 to F2048, with the number in each "F" SKU indicating how many CUs it provides. For example, F2 has 2 CUs, F4 has 4 CUs, and so on. Some specialized workloads, like the GPU-enabled compute that powers Fabric Copilot or integration with the Azure OpenAI Service, may only be available at higher SKU levels. A Power BI Pro license is required to publish Power BI content to Microsoft Fabric on all Power BI Premium ("P") and Fabric capacity ("F") SKUs. Enabling content consumers to view and interact with Power BI reports without additional paid per-user licenses is available at P1 and above (and F64 and above).

SKU Table 1
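
As a quick illustration of the SKU ladder and the F64 viewer threshold described above, here is a small Python sketch; the dictionary and helper are illustrative only, not a Microsoft API.

```python
# Illustrative sketch of the F-SKU ladder: the number in each F SKU is its CU
# count, doubling from F2 up to F2048. The F64 threshold for free report
# viewing (and P1+ for Premium SKUs) follows the text above.
FABRIC_SKUS = {f"F{2 ** n}": 2 ** n for n in range(1, 12)}   # F2, F4, ..., F2048

def free_viewer_access(sku: str) -> bool:
    """True when report consumers do not need paid per-user licenses."""
    if sku.startswith("F"):
        return int(sku[1:]) >= 64
    return sku in {"P1", "P2", "P3", "P4", "P5"}

print(FABRIC_SKUS["F64"])            # 64
print(free_viewer_access("F32"))     # False -> viewers still need Pro licenses
print(free_viewer_access("F64"))     # True
```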

How is consumption measured?

The measurement of capacity consumption within Microsoft Fabric is designed to reflect the diverse needs of various operations and translate them into cost. A fundamental aspect of this system is the measurement of capacity in terms of CUs, which represent the computational resources, such as CPU, memory, and disk IO, allocated to your tasks. Each CU embodies a pool of computational power available to your Microsoft Fabric operations, calculated against a standard month of 730 hours. This equates to 43,800 minutes or 2,628,000 seconds of capacity each month. Most meters correlate directly with the duration an operation runs, simplifying the conversion to CU seconds, the most granular measurement within the system. The exact granularity of the meter is determined by the specific operation: while many consume at the CU-second level, others aggregate to a coarser granularity such as the CU minute. Also note that while many operations measure consumption based on duration, others, like Copilot for Fabric, use different metrics such as the number of processed tokens per call. To keep the CU as the standardized meter, these outlier services convert their native meter into CU consumption.

Practical example: Copilot for Fabric. Consider a scenario where a Copilot request processes 2,000 input tokens and 500 output tokens. Copilot's metering rates are set at 400 CU seconds per 1,000 input tokens and 1,200 CU seconds per 1,000 output tokens. For this request, the computation yields:

Consumption for one Copilot request = (2,000 × 400 + 500 × 1,200) / 1,000 = 1,400 CU seconds ≈ 23.33 CU minutes
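
The same arithmetic can be expressed as a small Python helper; the metering rates below are the ones quoted in this article and may change over time, so treat this as a sketch rather than a pricing tool.

```python
# Reproduces the worked Copilot example above. The metering rates are the ones
# quoted in this article (400 CU-s per 1,000 input tokens, 1,200 CU-s per
# 1,000 output tokens); treat them as illustrative rather than a price sheet.
CU_S_PER_1K_INPUT = 400
CU_S_PER_1K_OUTPUT = 1_200

def copilot_cu_seconds(input_tokens: int, output_tokens: int) -> float:
    """Convert a Copilot request's token counts into CU seconds."""
    return (input_tokens * CU_S_PER_1K_INPUT
            + output_tokens * CU_S_PER_1K_OUTPUT) / 1_000

cu_s = copilot_cu_seconds(2_000, 500)
print(f"{cu_s:.0f} CU seconds = {cu_s / 60:.2f} CU minutes")   # 1400 CU s = 23.33 CU min
```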

Translating CUs into Cost

The cost associated with CU consumption varies by Azure region. Taking the US West region as an example, 1 CU hour costs $0.18. Therefore:

Monthly cost per CU (pay-as-you-go): $0.18 × 730 = $131.40, so an F2 SKU costs $131.40 × 2 = $262.80 per month.

For reserved instances, committing to a SKU level for 12 months offers significant savings (roughly 40.5% over pay-as-you-go rates):

Monthly cost per CU (reserved instance): $0.18 × (1 − 0.405) × 730 ≈ $78.17, so an F2 SKU costs roughly $78.17 × 2 ≈ $156.33 per month.

This model ensures that billing is transparent, predictable, and adaptable to your usage patterns, allowing for efficient and effective resource management in Microsoft Fabric.
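
A short Python sketch reproduces these figures; the $0.18 rate and the ~40.5% discount are this article's example numbers for US West and should be checked against current Azure pricing for your region.

```python
# Recalculates the US West example above. The $0.18 per CU hour rate and the
# ~40.5% reservation saving are the figures quoted in this article; actual
# rates vary by region and over time, so check current Azure pricing.
HOURS_PER_MONTH = 730
PRICE_PER_CU_HOUR = 0.18          # US West, pay-as-you-go
RESERVATION_DISCOUNT = 0.405      # approximate saving for a 12-month reservation

def monthly_cost(cus: int, reserved: bool = False) -> float:
    rate = PRICE_PER_CU_HOUR * (1 - RESERVATION_DISCOUNT) if reserved else PRICE_PER_CU_HOUR
    return rate * HOURS_PER_MONTH * cus

print(f"F2 pay-as-you-go: ${monthly_cost(2):.2f}/month")        # $262.80
print(f"F2 reserved:      ${monthly_cost(2, True):.2f}/month")  # ~$156.37
```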

How does bursting work?

Burstable capacity is a dynamic feature designed to enhance performance and stability by allowing workloads to access more resources than their allocated baseline capacity. This capability is crucial for handling spikes or surges in demand, whether they stem from user interactions or background processes. By leveraging bursting, a task that would traditionally run on a predetermined CU allocation, such as 64 CUs, can instead utilize additional resources, for example 256 CUs. This significant increase in resources can drastically reduce execution times, transforming a job that might take 60 seconds to complete into one that finishes in just 15 seconds.

This bursting functionality is seamlessly integrated into the service as a SaaS feature, eliminating the need for user management. It operates by drawing CUs from a communal pool shared among all tenants at the same service tier or SKU. This pool benefits from the unused CUs of other tenants, ensuring efficient resource utilization. However, access to this burst capacity is contingent on the availability of resources within the pool, which is influenced by overall demand and the unused capacity of others. Therefore, while bursting aims to be universally accessible, its availability cannot be guaranteed at all times for every tenant. Microsoft's backend management of this feature involves pre-provisioning virtualized compute resources, managed directly by Microsoft, to ensure optimal performance levels are achievable without risking throttling. This is made possible through specific smoothing policies that manage compute spikes, ensuring that sudden increases in resource consumption during bursting periods do not negatively impact service stability or performance. Smoothing is explained in more detail in the next section.

Bursting is subject to SKU guardrails, the limits that define how much and how long you can burst. SKU guardrails are expressed in terms of the maximum number of CUs you can burst to and the maximum duration of the burst. They are different for interactive and background compute, and they vary depending on the SKU of the capacity.
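
The arithmetic behind the 64 CU versus 256 CU example is simple enough to sketch: the total CU seconds of work stay the same, only the wall-clock time changes.

```python
# Back-of-the-envelope illustration of the burst example above: the job
# consumes the same number of CU seconds either way, but borrowing CUs from
# the shared pool shortens the wall-clock time.
def run_time_seconds(cu_seconds_of_work: float, cus_applied: float) -> float:
    """Wall-clock duration when a fixed amount of work is spread over more CUs."""
    return cu_seconds_of_work / cus_applied

work = 64 * 60                                # a job sized at 64 CUs for 60 seconds
print(run_time_seconds(work, 64))             # 60.0 s at the baseline allocation
print(run_time_seconds(work, 256))            # 15.0 s when bursting to 256 CUs
```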

How does smoothing work?

Standard usage patterns on Fabric capacities see periods of under-utilization (idle times) and over-utilization (peak times) of the compute resource allocation. A key Fabric feature for balancing this is capacity smoothing, which ensures that sudden spikes in demand do not compromise efficiency or performance. This approach simplifies the overall management of capacity by evenly distributing reported compute usage over time. Smoothing techniques vary based on the type of job: for interactive jobs initiated by users, capacity demand is typically evened out over a five-minute window to mitigate short-term spikes, whereas for scheduled or background jobs, demand is spread across 24 hours, eliminating scheduling conflicts and contention issues. Importantly, this smoothing process does not affect execution times, which remain at peak performance levels, allowing for capacity planning based on average rather than peak usage.

Interactive bursting ensures that regardless of the service tier, resources are automatically allocated as needed to maintain maximum performance. This can, however, result in a single query consuming all available quota within a given time window. To prevent overloads, interactive smoothing comes into play, distributing the reported usage of queries across future time windows. This approach, akin to an installment plan, ensures that no single query can trigger a system overload.

Similarly, background bursting addresses the challenges posed by large batch processes, which historically risked overloading compute resources and impacting interactive queries. Traditionally, database administrators (DBAs) had to schedule these jobs during off-hours to avoid such interference. Background smoothing applies the same installment plan logic to batch jobs, evenly distributing their load over the next 24 hours. This strategy frees DBAs from scheduling concerns, ensuring a uniform load distribution regardless of job timing, without degrading the performance of interactive queries.
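
To make the installment-plan idea concrete, here is a minimal Python sketch that books an operation's CU seconds evenly across future reporting windows; the window granularity chosen here is an assumption for illustration, not Fabric's internal implementation.

```python
# Minimal sketch of the "installment plan" described above: an operation's CU
# seconds are booked evenly across future reporting windows instead of at the
# moment of execution. The five-minute interactive horizon and 24-hour
# background horizon follow the article; the bucket granularity is assumed.
from collections import defaultdict

def smooth(ledger: dict, start_window: int, cu_seconds: float, n_windows: int) -> None:
    """Spread cu_seconds evenly over n_windows consecutive reporting windows."""
    per_window = cu_seconds / n_windows
    for w in range(start_window, start_window + n_windows):
        ledger[w] += per_window

ledger = defaultdict(float)
smooth(ledger, start_window=0, cu_seconds=1_500, n_windows=5)    # interactive spike
smooth(ledger, start_window=0, cu_seconds=86_400, n_windows=24)  # nightly batch job
print(dict(ledger))   # each window carries its installment of the total usage
```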

Despite these smoothing measures, there may be instances when multiple jobs accumulate, potentially exceeding capacity limits. In such cases, the system doesn't immediately enter an overload state but instead carries forward excess usage into future periods where additional capacity is available. This mechanism is called Carry Forward, and combined with smoothing it further mitigates the impact of compute spikes.

When considering capacity load, two dimensions are critical: the percentage of capacity consumed and the number of future periods already filled. By managing these dimensions through smoothing and carrying forward usage, organizations can effectively lessen the challenges associated with capacity spikes and ensure consistent, peak-performance execution across all types of jobs.
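
A small sketch shows how carry forward rolls excess usage into later windows instead of overloading the current one; the capacity and usage numbers are illustrative only.

```python
# Sketch of carry forward: when the usage booked into a window exceeds that
# window's capacity, the excess rolls into later windows rather than causing
# an immediate overload. All numbers are illustrative only.
def apply_carry_forward(booked: list[float], capacity_per_window: float) -> list[float]:
    """Return per-window usage after rolling any excess into later windows."""
    settled, carried = [], 0.0
    for usage in booked:
        total = usage + carried
        settled.append(min(total, capacity_per_window))
        carried = max(0.0, total - capacity_per_window)
    return settled   # anything left in `carried` is debt still outstanding

print(apply_carry_forward([120, 50, 10, 10], capacity_per_window=60))
# [60, 60, 60, 10] -> the 120-unit spike is paid off across later windows
```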

How does throttling work?

In the context of Microsoft Fabric, throttling is implemented as a vital mechanism to ensure that compute resources are efficiently managed and distributed among all tenants. Despite advanced techniques like smoothing and carry forward to optimize resource allocation, there remains a possibility for individual Fabric capacities to become over-allocated. The SKUs within Microsoft Fabric are designed with fixed price points and capacity limits to prevent unexpected spikes in costs due to over-utilization. To maintain the availability of compute resources across the board and prevent any tenant from exceeding their share, throttling may be applied to capacities that surpass their designated allocation.

Throttling triggers and stages

Throttling within Microsoft Fabric targets individual capacities, ensuring that only the over-utilized capacity within a tenant is impacted. A capacity triggers throttling when it depletes its available CUs for the forthcoming 10-minute window, after smoothing has been accounted for. This depletion triggers the initial phase of throttling: Microsoft Fabric introduces a 20-second delay for all new interactive operations on the affected capacity. This delay is designed to alleviate the immediate demand, allowing the system to manage the overload more effectively and aim for a return to normal CU availability within the critical window.

If throttling escalates because the carry forward mechanism has maxed out a capacity's CU usage for a full hour, Microsoft Fabric progresses to stage two. In this phase, the system outright rejects new interactive requests, indicating that the current usage pattern is unsustainable. Despite this, background operations that are scheduled to run can still proceed.

In the event that over-allocation extends to a full 24 hours of carry forward, Microsoft Fabric enforces the most severe form of throttling by freezing the entire capacity. This freeze impacts all new operations, both interactive and background, effectively pausing any new demands on the system until the accumulated deficit is cleared. This level of throttling is a clear indicator that the capacity's selected SKU is insufficient for the tenant's workload demands. Resolution requires scaling up to a more suitable SKU that can adequately accommodate operational needs, ensuring that such extensive throttling does not recur. This underscores the importance of carefully selecting SKUs that align with anticipated workloads to prevent service disruptions.

It's important to note that throttling within Microsoft Fabric specifically targets new operations. Any operation, regardless of its runtime, that commenced prior to the activation of throttling will not be interrupted. This policy is critical for maintaining stability within the platform, allowing ongoing tasks to continue unaffected by the initiation of throttling measures.
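
The staged policy can be summarized in a few lines of Python; the thresholds mirror the description above, while the function itself is purely illustrative and not a Fabric API.

```python
# Sketch of the staged throttling policy described above, keyed on how far
# into the future smoothed and carried-forward usage is already fully booked.
# The 10-minute, 60-minute, and 24-hour thresholds come from the article;
# the function itself is illustrative, not a Fabric API.
def throttling_stage(future_minutes_fully_booked: float) -> str:
    if future_minutes_fully_booked >= 24 * 60:
        return "freeze: reject all new interactive and background operations"
    if future_minutes_fully_booked >= 60:
        return "stage 2: reject new interactive requests; background still runs"
    if future_minutes_fully_booked >= 10:
        return "stage 1: delay new interactive operations by 20 seconds"
    return "no throttling"

for booked_minutes in (5, 15, 90, 24 * 60):
    print(booked_minutes, "->", throttling_stage(booked_minutes))
```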

Special Considerations for Event Streams and Real-time Analytics

Event Streams within Microsoft Fabric are engineered to support continuous data flows, potentially extending over years. Given their long-term nature, standard throttling mechanisms, if applied indiscriminately, could disrupt these essential data streams. To circumvent such interruptions, Microsoft Fabric adopts a nuanced approach by moderating the CUs allocated to these streams rather than halting operations. This adjustment aims to maintain the operational continuity of Event Streams, albeit at a reduced capacity. This strategy ensures that even under throttling conditions, Event Streams can sustain their critical data flows, thus mitigating the impact on the overall system's performance.

Real-time Analytics represents another pivotal area where immediate data processing and analysis are paramount. Recognizing the urgency associated with these operations, Real-time Analytics are granted an exemption from the initial 20-second delay typically imposed in the first phase of throttling. This exemption allows Real-time Analytics to proceed uninterrupted, preserving their immediate responsiveness and ensuring the timely delivery of analytical insights. However, it is essential to note that in scenarios of severe over-allocation, where system stability is at risk, these operations may still be subjected to subsequent throttling measures, including outright rejection at later stages. This balanced approach ensures the integrity of real-time processing while maintaining the overall health of the Microsoft Fabric ecosystem.

Managing throttling

Navigating the throttling landscape requires planning and proactive management. The Microsoft Fabric Capacity Metrics App serves as a crucial resource, offering deep insights that enable administrators to closely monitor usage patterns. This tool facilitates the configuration of alerts to identify when capacity nears critical thresholds, thereby enabling timely and effective interventions. While a temporary increase in capacity SKU often presents a straightforward solution to surges in demand, a more nuanced approach involving the optimization of operations to minimize resource consumption may be necessary in other scenarios.

It's essential to recognize that throttling impacts individual capacities, highlighting the importance of strategic capacity planning. To mitigate the risk of throttling and ensure uninterrupted service, it is generally advisable for tenants to deploy multiple capacities. This strategy allows workloads to be distributed more effectively, minimizing the risk of interference between different types of operations. Typically, segregating interactive operations and background tasks into separate capacities can enhance system efficiency. Moreover, due to the critical nature and immediate data processing requirements of Real-time Analytics, allocating these operations to a distinct capacity is recommended to preserve their 'real-time' performance, even in the face of initial throttling measures.

The specific arrangement of capacities, whether segregating interactive operations, background tasks, or Real-time Analytics, depends on the complexity and scale of the tenant's environment. While smaller organizations may operate effectively within a single shared capacity, larger or more complex operations require a deliberate and thoughtful approach to capacity design and management. This tailored strategy ensures that workloads are balanced, performance is optimized, and the risk of throttling is minimized across the board.

Through its comprehensive throttling mechanism, Microsoft Fabric achieves a delicate balance, optimizing resource utilization while maintaining consistent access to its services. This equilibrium underscores the platform's dedication to delivering high-performance and reliable service, guiding tenants toward practices that sustain efficient and effective use of resources. In adopting a structured approach to capacity planning and leveraging the insights provided by tools like the Capacity Metrics App, tenants can navigate the throttling landscape with confidence, ensuring that their operations remain robust and resilient under varying demand conditions.

How does pause and resume work?

Microsoft Fabric's pause and resume feature offers a method for managing costs within a pay-as-you-go billing model, allowing for the suspension and reactivation of capacity as needed. This functionality halts cost accrual by stopping the consumption of CUs when paused, making the capacity unavailable for use, and resumes CU consumption upon reactivation. While this offers cost management benefits, it's important to understand its broader impact, especially on user experience and system operations.

The feature is not designed to be applied at a granular level; it cannot target individual components like lakehouses or data warehouses but is instead applied across the entire capacity. Administrators can pause capacity at any point, which leads to the immediate rejection of new requests and cancellation of ongoing processes, including SQL executions and portal activities, with a rollback of transactions. Background tasks critical for maintaining query execution speeds and other routine operations are also stopped. Upon resuming, the system must undergo a reinitialization phase, where compute resources restart with a clear cache. This necessitates a period where the system slowly rebuilds its cache with relevant data, leading to temporary performance slowdowns until the cache is sufficiently populated.

Given these considerations, the decision to pause capacity requires a careful evaluation of the trade-offs between cost savings and operational performance. In scenarios where continuous, high performance is crucial, the need to maintain a warmed-up cache may supersede the benefits of pausing to save on costs. Users are advised to assess their operational requirements and performance expectations thoroughly before utilizing the pause and resume feature.
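
For automation scenarios, pause and resume can be driven through Azure Resource Manager. The sketch below is hedged: the resource provider path, the suspend/resume action names, and the api-version are assumptions that should be verified against the current ARM documentation before use, and all identifiers are placeholders.

```python
# Hedged sketch of pausing/resuming a Fabric capacity through Azure Resource
# Manager. The resource provider path (Microsoft.Fabric/capacities), the
# suspend/resume action names, and the api-version are assumptions based on
# how dedicated capacities are commonly managed in Azure; verify them against
# the current ARM reference before use. All identifiers below are placeholders.
import requests

API_VERSION = "2023-11-01"   # assumed; confirm the supported api-version

def set_capacity_state(token: str, subscription_id: str, resource_group: str,
                       capacity_name: str, action: str) -> int:
    """POST the 'suspend' or 'resume' action against the capacity resource."""
    url = (
        f"https://management.azure.com/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}/providers/Microsoft.Fabric"
        f"/capacities/{capacity_name}/{action}?api-version={API_VERSION}"
    )
    response = requests.post(url, headers={"Authorization": f"Bearer {token}"})
    return response.status_code   # 202 typically means the action was accepted

# set_capacity_state(token, "<subscription-id>", "<resource-group>", "<capacity>", "suspend")
# set_capacity_state(token, "<subscription-id>", "<resource-group>", "<capacity>", "resume")
```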

How does the Capacity Metrics App work?

The Capacity Metrics App in Microsoft Fabric is a pivotal tool for tenant administrators, offering a comprehensive suite of features for monitoring and managing capacity usage effectively. This app provides real-time and historical data on capacity utilization, enabling administrators to identify which operations or users are consuming the most resources. It breaks down usage into interactive and background operations, offering a detailed view of resource allocation across different tasks. Administrators can leverage the Capacity Metrics App to set up alerts for approaching or exceeding capacity thresholds, facilitating proactive management of resources. This functionality is crucial for preventing throttling and ensuring efficient usage of Fabric capacities. Moreover, the app includes features for drilling down into specific incidents of high usage, allowing administrators to pinpoint and address the root causes of spikes in demand.

For those seeking to fully exploit the capabilities of the Capacity Metrics App, Microsoft provides a comprehensive set of documentation on its Learn site. This resource is designed to assist administrators in navigating the app's features and employing them to optimize capacity management within Microsoft Fabric. The documentation is available at Microsoft Learn and offers valuable insights into maximizing the utility of the app for capacity management. In essence, the Capacity Metrics App is an indispensable tool for tenant administrators in Microsoft Fabric, providing necessary visibility and control over capacity usage. By utilizing this app, alongside the detailed guidance provided by Microsoft, administrators can ensure that their Fabric capacities are leveraged efficiently without compromising performance.

Conclusion

Microsoft Fabric provides an innovative approach to managing capacity: the platform offers a robust and flexible framework designed to meet the evolving needs of modern data solutions. From interactive to background operations, Microsoft Fabric ensures that every workload is supported with the right type of compute, tailored to maximize efficiency and performance. The introduction of mechanisms like bursting, smoothing, and throttling demonstrates Microsoft's commitment to providing a balanced environment where resources are optimally utilized while maintaining service reliability and performance integrity for all tenants.

The strategic deployment of multiple capacities, as recommended for larger or more complex organizations, illustrates the platform's adaptability. This approach not only mitigates the risk of throttling but also ensures that operations are streamlined, performance is optimized, and workloads are balanced across the board. For smaller organizations, the ability to operate effectively within a single shared capacity highlights the scalability of Microsoft Fabric, catering to a wide range of operational sizes and complexities.

Moreover, the Capacity Metrics App demonstrates Microsoft Fabric's dedication to empowering administrators with the tools and insights needed to manage their capacities proactively. This app is essential to mastering capacity management, ensuring that tenants can leverage their resources most effectively. In embracing these principles and tools, organizations can navigate the intricacies of cloud resource management with confidence, harnessing the full potential of Microsoft Fabric to drive their data solutions forward. As we look ahead, it's evident that Microsoft Fabric will continue to play a pivotal role in shaping the future of cloud computing, fostering innovation, and enabling businesses to achieve their technological aspirations with precision and efficiency.

Unlock the Full Potential of Microsoft Fabric Now

Dive deeper into the world of Microsoft Fabric with our dedicated consulting services, tailored specifically to your needs in capacity planning, sizing, design, and management. As you've learned from our comprehensive guide, navigating the complexities of Microsoft Fabric's capacity management is crucial for maximizing efficiency and unlocking the full potential of your cloud resources. Are you ready to ensure your Microsoft Fabric environment is not just meeting but exceeding your operational requirements? Our team of experts specializes in crafting bespoke solutions that align with your specific goals, ensuring your investments in Microsoft Fabric are optimized for both performance and cost.

Take the Next Step: Let us help you navigate the challenges of capacity planning with a strategic approach designed to future-proof your infrastructure. From detailed sizing assessments to advanced design strategies and ongoing management, our consulting services are your key to unlocking enhanced scalability, efficiency, and performance in Microsoft Fabric. Contact us for next steps at Covenant Technology Partners – Data & AI or plafromboise@mailctp.com