High-performing Power BI using composite models with the aggregations feature
A Power BI semantic model for large data sets (multiple terabytes) typically requires some form of data warehouse implementation. A data warehouse consolidates, cleanses, and transforms the data to create a curated or certified copy of it.
A data warehouse implementation gives Power BI a performance boost for analysis. However, many organizations struggle to choose the right query mode in Power BI.
Power BI provides two query modes when connecting to data sources.
Import mode caches the data in Power BI. All data is exported from the data sources and held in memory in a compressed format. This provides faster query performance and a higher level of concurrency. However, caching the data also means it must be refreshed before new data is reflected in reports, and a larger data set can take a long time to refresh.
Further, unless you are using Power BI Premium, you are limited to a 1 GB data set size. A large data set in Import mode may not fit at all.
Data sets are large mainly because of the volume of detailed data. That size hurts summary query performance, and drill-through to detailed data also runs slower.
DirectQuery mode, on the other hand, queries the data directly from the source. The data is as current as the latest data warehouse refresh, and there is no Power BI refresh cycle. Large data sets can be accommodated without upgrading to Premium. However, aggregate queries take longer, so there are performance implications.
What is the right strategy?
The right strategy depends on data access patterns. Typically, most analytical reports show summary-level data, and self-service features like drill-down and slicing also operate at the summary level. Summary-level data is low in volume.
Drill-through reports, by contrast, usually require larger amounts of data at a lower-level grain, but they are relatively few in number.
Until now, the right Power BI strategy for large data sets depended on these access patterns. If there are many drill-through reports, DirectQuery mode might give a better experience; otherwise, Import mode boosts summary-level performance.
Introducing Power BI composite models with aggregations
A recently added Power BI feature gives us the best of both worlds: composite models allow connecting in both Import and DirectQuery mode within the same model, providing a balance between the two.
The aggregations feature supercharges performance. Summary tables can be configured in Import mode, giving fast performance for most report visuals, while detailed information in drill-through visuals comes through DirectQuery straight from the data warehouse.
As a result, the Power BI data set is smaller, and scheduled refreshes complete quickly, allowing more frequent refresh cycles.
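As a sketch of how this can be set up, the Import-mode aggregation table is often backed by a pre-summarized view in the warehouse (the table, view, and column names below are hypothetical):

```sql
-- Hypothetical summary view backing an Import-mode aggregation table.
-- Grain: one row per day/product/region instead of one row per transaction.
CREATE VIEW dbo.vSalesAgg AS
SELECT
    DateKey,
    ProductKey,
    RegionKey,
    SUM(SalesAmount) AS SalesAmount,
    SUM(Quantity)    AS Quantity,
    COUNT(*)         AS SalesRowCount   -- row count needed for correct AVG rollups
FROM dbo.FactSales
GROUP BY DateKey, ProductKey, RegionKey;
```

In Power BI Desktop, this view is imported and mapped to the detail table through the Manage aggregations dialog (GroupBy on the keys, Sum on the measures, Count for the row count), so summary visuals hit the in-memory cache while drill-through falls back to DirectQuery.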
Additional best practices for performance improvement
A combination of a few other tuning techniques can further boost performance.
- A page is only as fast as its slowest visual. Avoid placing visuals that query detail data on a summary page.
- Power BI relationships should assume referential integrity so that inner joins are used wherever possible.
- Implement a layered information stack for fast Power BI performance against large data sets: serve the highest-level summary data in Import mode using aggregations, build indexed or materialized views for intermediate-level aggregated data, and query detailed data in DirectQuery mode against the fact tables.
- Separate the layered summarization on different pages.
- Use clustered indexes and hash distribution for large tables in the data warehouse (Synapse Analytics)
- Replicate small tables (Synapse analytics)
- Leverage data warehouse result set caching
- Ensure that the data types of matching columns in aggregate and detail tables are the same.
- If using Power BI Premium and Azure Synapse Analytics for the data warehouse, colocate both tenants in the same region
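The Synapse-side recommendations above can be sketched in dedicated SQL pool DDL as follows (table, view, and database names are hypothetical):

```sql
-- Large fact table: hash-distributed on a high-cardinality key,
-- with a clustered columnstore index for fast scans.
CREATE TABLE dbo.FactSales
(
    DateKey      int            NOT NULL,
    ProductKey   int            NOT NULL,
    RegionKey    int            NOT NULL,
    SalesAmount  decimal(18, 2) NOT NULL,
    Quantity     int            NOT NULL
)
WITH
(
    DISTRIBUTION = HASH(ProductKey),
    CLUSTERED COLUMNSTORE INDEX
);

-- Small dimension table: replicated to every compute node
-- to avoid data movement during joins.
CREATE TABLE dbo.DimRegion
(
    RegionKey   int          NOT NULL,
    RegionName  nvarchar(50) NOT NULL
)
WITH
(
    DISTRIBUTION = REPLICATE,
    CLUSTERED COLUMNSTORE INDEX
);

-- Intermediate-level aggregate as a materialized view
-- (Synapse materialized views require COUNT_BIG(*) in the select list).
CREATE MATERIALIZED VIEW dbo.mvSalesByDayProduct
WITH (DISTRIBUTION = HASH(ProductKey))
AS
SELECT
    DateKey,
    ProductKey,
    SUM(SalesAmount) AS SalesAmount,
    COUNT_BIG(*)     AS SalesRowCount
FROM dbo.FactSales
GROUP BY DateKey, ProductKey;

-- Result set caching, enabled at the database level (run against master).
ALTER DATABASE MySynapseDW SET RESULT_SET_CACHING ON;
```

With this layering, repeated dashboard queries are answered from the result set cache or the materialized view, and only genuine drill-through queries scan the distributed fact table.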
Power BI allows creating enterprise-level large data sets. With the right data strategy and these best practices, we can deliver the performance needed for enterprise-wide adoption.