Fact table data modeling involves connecting fact and dimension tables together into an interlinked data warehouse. Some best practices can reduce work required of data professionals and enhance business user analytics experiences.
Loading Detailed Atomic Data into Dimensional Structures
Fact tables are databases designed to store information regarding business processes. Users can then view and access this data, providing answers to numerous types of business questions.
Dependent upon the business process, fact tables may contain single records or multiple rows within each row. Sometimes they contain foreign keys for dimension tables.
One way of ensuring that the information in a fact table is accurate is to fill it at its smallest grain or level of detail, which allows it to expand easily in future without needing to rebuild it all from scratch.
One best practice for loading atomic data into dimensional structures is ensuring every fact table contains an atomic date dimension table. This is essential since fact tables measure data at specific moments in time.
Not only can tables serve as a source of dates, but they can also store other kinds of data like service logs, measurement information and various forms of other types of information.
Best practices for loading atomic data into dimensional structures involve following three basic rules of dimensional modeling:
Structuring around business processes. This requires making sure the fact table represents as closely as possible the business process it represents.
This can be accomplished by asking questions that make sense within the context of your business. For instance, if a company sells products directly to customers, their fact table could include questions such as ‘What number of sales are relevant?” before exploring alternative numbers that may make more sense.
Structuring Dimensional Models Around Business Processes
When creating a dimensional model, it is crucial to take into account how its data will be utilized within a business process. Doing this allows you to structure the fact table data model around this process rather than designing it according to an analysis or report.
Dimension tables contain attributes that provide context to data contained within fact tables, such as users, accounts, invoices and customers. Typical dimensions might include users, accounts and invoices as examples of dimensions that might be stored here.
Step one in creating a dimension table involves identifying its attributes, creating a fact table to link these facts with records in the dimension table and adding a primary key that uniquely identifies each record; this key can either be generated using the dbt_utils.surrogate_key macro or concatenated from several foreign keys within its source fact table but must remain distinct for each record in the dimension table.
Similarly, if the dimension table contains an attribute identifying customer types, make sure it matches in both tables so it will be easy to identify records in both. If necessary.
Additionally, it is critical that the fact table’s primary key reflects the uniqueness of its dimensional model. This means ensuring its primary key does not derive from source application data but instead generated and controlled by the data warehouse.
At all measurement events, it is wise to associate each with its own date dimension table with different date characteristics, such as monthly balance snapshots or financial transactions captured to the hundredth of a second. When multiple date dimension tables represent one measurement event, then Bridge(bdg_) tables should be utilized in order to resolve many-to-many relationships between these tables.
Ensuring Every Fact Table Has an Associated Date Dimension Table
Date dimension tables are additional tables that contain data related to fact tables. It connects directly with them via foreign keys.
Date dimension tables serve to document dates related to events, such as when an order was shipped out to its customer. Furthermore, these tables should provide metrics related to calendar attributes or nonstandard characteristics related to measurement events like fiscal month indicators and corporate holiday dates.
As part of a dimensional model, it is vitally important that every fact table has an accompanying date dimension table in the date dimension table hierarchy. This ensures fewer date foreign keys and that a single fact table can be used to analyze all facts at the same level of detail or grain.
This approach to fact table data modeling is one of the best approaches, as it allows future updates without needing to re-build all the data. Furthermore, using such date dimension tables enables fact tables to be updated periodically rather than continuously.
Changes to any of the dimensional attributes result in the addition of new rows with updated values, while deletion of records produces an equivalent deletion column which helps remove unwanted information from the fact table logically and avoids being picked up when querying for facts; this technique is known as “logical delete”, making fact table data modeling simpler.
Ensuring All Facts in a Single Fact Table Are at the Same Grain or Level of Detail
Fact tables contain data related to business processes. As the center table in a star schema, they contain measurements, metrics, facts about specific problems or solutions, and descriptive attributes that can serve as query constraints.
One of the key decisions in data modeling is deciding on a table’s level of detail or grain, which impacts its presentation and maintenance. This is also key when creating a data warehouse design plan.
Example: the daily granularity of a sales table could range from very fine (storing every purchase by each customer), to coarse (only daily totals for product categories at each store). The choice will have an impactful impact on how data is stored as well as its analysis capabilities.
Another key consideration in designing a fact table is its type. There are different variants available ranging from transactional tables to periodic snapshots.
Transaction grains refer to events in time at one precise instant in time, such as grocery store beeps or receiving credit card payments. Each event’s measured facts apply only for its current moment in time and could come back around again microseconds or months later.
Figure 1 depicts a periodic snapshot grain. It represents an extended span of time – often an accounting period – with facts only valid within that specific timeframe, making the snapshot dense and consistent over time.
Fact tables must include multiple grains. For instance, transaction grain fact tables should have daily granularity while periodic snapshot grain tables require monthly granularity.
Resolving Many-to-Many Relationships in Fact Tables
Resolving many-to-many relationships is one of the primary difficulties associated with modeling fact tables. For instance, you might have an order data set for multiple customers and products in two separate tables that relate back to one another.
Resolving these relationships requires creating a bridge table between two tables. This bridge table stores unique identifiers for each record in both tables to facilitate identification of individual records from both tables.
As part of your task to ensure referential integrity is being observed correctly in Tableau, make sure the bridge table is connected to all other dimension tables. Otherwise, Tableau will rely on data warehouse identifiers rather than the ones stored in the bridge table for referential integrity purposes.
It could lead to incorrect results when users filter a dimension table using columns not associated with its bridge table; in extreme cases, this could even result in query failures.
Another challenge lies in relating non-date columns from a dimension table to a fact table. If a category or other non-date column in its related dimension table has duplicate values, establishing one-to-many relationships will not be feasible.
Finally, it’s essential that relationships only propagate filters in one direction from the dimension table to the fact table. Otherwise, any single table could end up filtering all other tables within its relationship before becoming subject to further filters when it gets accessed by an end-user.
An effective star schema requires each fact table row to match an entry in each dimension table; otherwise, Tableau will fail to recognize their many-to-many relationship and set referential integrity to Some records match instead.
Here are a few guidelines for modeling and designing fact tables.
Fact Table Effective Practices
- The table naming convention should identify it as a fact table. For example:
- Suffix Pattern:
- <<TableName>>_Fact
- <<TableName>>_F
- Prefix Pattern:
- FACT_<TableName>>
- F_<TableName>>
- Must contain a temporal dimension surrogate key (e.g. date dimension)
- Measures should be nullable – this has an impact on aggregate functions (SUM, COUNT, MIN, MAX, AVG, etc.)
- Dimension Surrogate keys (srky) should have a foreign key (FK) constraint
- Do not place the dimension processing in the fact jobs
- Suffix Pattern:
2 Comments
Comments are closed.