Author: Magnus Cederberg
© BIXL AB 2021
The topics in this section are not an exhaustive guide to Power BI query and data model design. The following are common issues and recommendations that we often encounter in projects. We do not claim that our recommendations are always right, and if you follow them, you do so at your own risk.
The importance of a good model cannot be overstated. The model is the foundation of the Power BI solution, and a well-designed model makes everything else easier. A poor model will hurt overall performance and make writing calculated measures in DAX much harder.
Power BI (and Power Pivot in Excel) expects a star schema model to perform well. The star schema consists of one or more fact tables and connected dimensions. The fact table is typically the largest table and should normally hold the following: the key columns that relate it to the dimensions, and the numeric columns you want to aggregate (amounts, quantities etc.).
A dimension table should typically hold the columns that you would like to filter the facts by, such as customer name, product name etc.
Use this design also for smaller projects and even for pilots. Sooner or later the data volume will grow and the demands will increase, which will affect both the performance and the complexity of the model.
For bigger enterprise solutions it could be a good idea to build this star schema already in the database as a Data Warehouse/Data Mart. There are several advantages to this approach:
It is possible to connect dimensions to other dimensions (known as a snowflake) without violating the principles of a good model. However, if possible, it is better to de-normalize, i.e. to move the columns from the outer dimensions into the inner ones to get a pure star schema. This gives better performance and a less complex, more understandable model.
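As a sketch of this de-normalization, the view below flattens a hypothetical Product → Category snowflake into a single Product dimension (all table and column names here are illustrative, not from any specific source system):

```sql
-- Hypothetical example: flatten a Product -> Category snowflake
-- into one Product dimension view for a pure star schema.
CREATE VIEW dbo.vDimProduct AS
SELECT
    p.ProductKey,
    p.ProductName,
    c.CategoryName       -- moved in from the outer Category dimension
FROM dbo.DimProduct       AS p
LEFT JOIN dbo.DimCategory AS c
    ON p.CategoryKey = c.CategoryKey;
```

The model then only needs one relationship, from the fact table to vDimProduct, instead of a chain of two.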
When you import data from a relational database, do so from views rather than from tables if possible. Over time the database structure may change in ways that break the report. Instead of making changes in every report that uses a specific table, you can often make the change in the view instead.
Do not import more columns than you need. Every column in the model consumes memory. A smaller model performs better and is also easier to understand and maintain. If you need to rename columns, it is better to do so in the views than in Power Query.
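A minimal sketch of such a view, exposing only the columns the model needs with business-friendly names (the object names are hypothetical):

```sql
-- Hypothetical view: only the needed columns, renamed here
-- so the renames do not have to happen in Power Query.
CREATE VIEW dbo.vSales AS
SELECT
    s.ProductKey,
    s.OrderDate   AS [Order Date],
    s.SalesAmount AS [Revenue]
FROM dbo.FactSales AS s;
```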
There are two options for getting data – Import and DirectQuery. Use Import wherever possible, for query performance; you also lose some DAX functionality with DirectQuery. However, sometimes the size of the fact table forces you to use DirectQuery. If so, use DirectQuery for the fact table and Import for the dimensions, resulting in a composite model.
Dimension Key/ID values must be unique. Even though a key value “should be” unique, make sure of it by performing grouping and duplicate removal in either the data warehouse source or in Power Query.
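In Power Query, the duplicate removal can be done with Table.Distinct on the key column. A sketch, assuming a hypothetical server, database and view name:

```m
// Hypothetical Power Query query: guarantee a unique CustomerKey
// before the dimension is loaded into the model.
let
    Source = Sql.Database("ServerName", "DatabaseName"),
    Customers = Source{[Schema = "dbo", Item = "vDimCustomer"]}[Data],
    Deduplicated = Table.Distinct(Customers, {"CustomerKey"})
in
    Deduplicated
```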
Avoid bi-directional relationships and bridging tables if possible. However, if the model includes many-to-many relationships, a bridging table is necessary. Do not use the many-to-many “relationship” in Power BI unless you are absolutely sure that it will work. In fact, it does not describe a many-to-many relationship but rather a many-to-many cardinality. If you use it, you can get the wrong results in the report, as the outcome will not be what you expect.
If possible, also avoid bi-directional relationships, as they can give unexpected results. In the past, many Power BI developers used them to reduce the number of items shown in slicers. A better way of solving this is to create a measure that looks like this:
Slicer Reduction = INT ( NOT ( ISEMPTY ( FactTable ) ) )
Then use this measure as a filter on the slicer visual and set the condition that Slicer Reduction = 1.
More information about relationships can be found here: https://www.sqlbi.com/articles/relationships-in-power-bi-and-tabular-models/
As a rule, you should use calculated measures instead of calculated columns wherever possible. Calculated columns consume memory, whereas calculated measures consume CPU power. The main situation where a calculated column is preferable is when you want to use the column as a filter in a visualization, in the filter pane or in a slicer.
However, when you do decide to use calculated columns, it is better to create them in the data source or in Power Query. Avoid creating them in DAX if possible, as such columns are not as well compressed. Creating calculated columns in the data source or in Power Query also ensures a consistent design pattern for maintainability.
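As an illustration, a column like Margin can be added as a Power Query step instead of as a DAX calculated column, so it is compressed like any other imported column (the query and column names are hypothetical):

```m
// Hypothetical Power Query step: add a Margin column at load time
// instead of as a DAX calculated column.
Table.AddColumn(Sales, "Margin", each [Revenue] - [Cost], type number)
```

If the source is a relational database, the same column can be computed in the view, which also lets the step fold back to the server.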
Format all DAX code properly. A good place to get help for this is daxformatter.com where you can paste your code and format it.
In DAX code, calculated measures should always be referred to by the measure name only, whereas columns and calculated columns should be referred to by both table name and column name. This is a best practice within the Power BI community: anyone reading the code should immediately see whether a referenced element is a measure or a column.
Example measure: [Revenue]
Example column: Calendar[Year]
There are three occasions where you need to surround the table name with the ' character: when the name contains spaces or other special characters, when it starts with a digit, and when it is a reserved word.
Power BI supports implicit measures for all numeric columns, which makes it possible to use the columns directly in different visuals. You can recognize an implicit measure by the summation sign (∑) in front of the column name. The default behavior is to sum the data, but other options are available, such as Average, Min, Max and Count. In most projects these implicit measures will not cover all the calculations you need in a report, so you will have to create your own explicit measures. A consequence is that the reports could end up with a mix of implicit and explicit measures, which can be confusing.
Instead, create explicit calculated measures for all numeric columns that you would like to aggregate. Also, reuse the measures you have created inside other measures – a technique known as measure branching.
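A minimal sketch of measure branching, with hypothetical table and column names – the base measures are defined once and then reused:

```dax
// Base measures over the fact table
Revenue := SUM ( Sales[SalesAmount] )
Cost := SUM ( Sales[Cost] )

// Branched measures reuse the base measures instead of repeating the logic
Margin := [Revenue] - [Cost]
Margin % := DIVIDE ( [Margin], [Revenue] )
```

If the definition of Revenue later changes, every branched measure picks up the change automatically.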
When you have created explicit measures for all the columns that should be aggregated, you can hide those columns and their implicit measures. Often you can hide the whole fact table.
Another reason for creating your own measures is that implicit measures will not work if you decide to use the “Analyze in Excel” option in the Power BI portal.
A widespread convention is to place calculated measures where they logically belong, which means most measures end up in the fact table. An even better method is to create an empty table and put all measures in that table. If you have many measures, it can even be a good idea to have several measure tables. To create a (new) measure table: use Enter data on the Home ribbon to create an empty table named e.g. Measures, create or move your measures into it, and then hide its single placeholder column – Power BI will then display it as a dedicated measure table.
Before you start the project, decide whether to keep the dataset and the reports in the same pbix file or to separate them. There are pros and cons to both alternatives. Separating them and creating the report with a live connection to a dataset improves version control manageability. It also makes it possible to reuse the same dataset for different reports and purposes, and allows a division of labor and ownership, particularly for certified datasets.
At the same time, development takes longer, since you manage the dataset in one file and work with the reports in another. In the report file you cannot review the DAX code, or go to the Table or Model view, when using a live connection.
Name all objects – tables, columns etc. – from a business user perspective. Users normally do not know what dimensions and facts are, so do not name your tables dimProduct, factCost etc. Always try to use names that a business user understands. Furthermore, do not abbreviate column or measure names excessively. One can probably understand what “Revenue YTD” stands for, but maybe not “Revenue L3MSPLY”.
Give files intuitive names and try to keep them short. In the Power BI portal navigation bar, only around 20 characters are visible, so it is good practice to differentiate file names early in the name.
Comment your code in SQL, Power Query/M and DAX to explain the logic. Rename the applied steps in Power Query where it is not obvious what a step does.
You should always have a Date dimension in your dataset.
By default, Power BI creates a visible date hierarchy for every date column in the dataset. Behind the scenes, Power BI also creates a hidden table for every date column. The reason is that power users without data modeling knowledge should be able to use the built-in time intelligence functions in DAX, such as SAMEPERIODLASTYEAR. These hidden tables occupy memory and should be avoided. Also, if you use the Analyze in Excel option in the Power BI portal, these hierarchies will not show up in Excel.
First, make sure you have a “rich” date dimension with Date, Year, Quarter, Month, Week, Year Quarter, Year Month, Year Week, Weekday etc. (and also columns for sorting text columns like Month, Weekday, Year Week etc.)
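If no date dimension exists in the source, a rich one can be sketched as a DAX calculated table (it can equally well be built in SQL or Power Query; the date range below is an assumption to adjust to your data):

```dax
// A minimal sketch of a date dimension built in DAX.
Date =
ADDCOLUMNS (
    CALENDAR ( DATE ( 2018, 1, 1 ), DATE ( 2025, 12, 31 ) ),
    "Year", YEAR ( [Date] ),
    "Month", FORMAT ( [Date], "MMM" ),
    "Month Number", MONTH ( [Date] ),        // used to sort the Month column
    "Year Month", FORMAT ( [Date], "YYYY-MM" ),
    "Weekday", FORMAT ( [Date], "ddd" ),
    "Weekday Number", WEEKDAY ( [Date], 2 )  // used to sort the Weekday column
)
```

Remember to mark the table as a date table and to set Sort by column for the text columns (Month by Month Number, Weekday by Weekday Number).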
Then, turn off “Auto date/time” under File > Options and settings > Options > Current file > Data load.
In visuals, you should pick date related columns from the Date dimensions and not from other tables in the model. It is a good practice to hide all date columns in other tables.
Make sure that the dates in the Date table align with the dates in the fact table. There is no point in a date table that starts at 2012-01-01 if the first date in your fact table is 2018-01-01. One way of ensuring this is to have a parameter for the start date and then filter both the date table and the fact table by this parameter.
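A sketch of this pattern in Power Query, assuming a parameter named StartDate has been defined and that the source names are hypothetical; the same filter step is repeated in the date query:

```m
// Hypothetical fact query filtered by a shared StartDate parameter,
// so the fact and date tables always cover the same range.
let
    Source = Sql.Database("ServerName", "DatabaseName"),
    Sales = Source{[Schema = "dbo", Item = "vSales"]}[Data],
    Filtered = Table.SelectRows(Sales, each [Order Date] >= StartDate)
in
    Filtered
```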
An optimization technique used by Power Query is called “query folding”. The steps in the M code are translated into a native SQL query that can be executed by the database server. If the M code includes steps that prevent query folding, all raw data is instead streamed through the gateway, held in memory, and processed row by row, which is far less efficient.
Query folding works well with queries based on relational tables or views. If you instead supply a hand-written native SQL statement as the source, the subsequent steps will no longer fold.
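A sketch of the difference, with hypothetical source names – the filter step folds into the native query, while a step like adding an index column would break folding for everything after it:

```m
// Hypothetical query illustrating query folding.
let
    Source = Sql.Database("ServerName", "DatabaseName"),
    Sales = Source{[Schema = "dbo", Item = "vSales"]}[Data],
    Folded = Table.SelectRows(Sales, each [Revenue] > 0)  // folds to a WHERE clause
    // Indexed = Table.AddIndexColumn(Folded, "Index")    // would prevent folding from here on
in
    Folded
```

You can check whether a step folds by right-clicking it in the Applied Steps pane: if View Native Query is enabled, the step still folds to the server.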