
Data Warehouse Design Patterns

November 10, 2015


This post does not dive into each topic in detail; it serves as a curriculum of things to research on the data journey. Anyone getting into the Data Warehouse (DW) space should have a handle on the following design patterns:

Connection Patterns

Four connection patterns can be used between applications in the cloud and on premise. The caller and provider combinations they apply to are as follows:

On-premise caller to cloud provider
Cloud caller to on-premise provider
Cloud caller to cloud provider

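The four patterns are:

Remote procedure call (RPC): the caller invokes the provider and waits for the reply.
Asynchronous (fire and forget) messaging using queues: the caller enqueues a message and carries on without waiting for a reply.
Shared database, hosted in the cloud or on premise.
Data/file synchronization: copying data (ETL) through flat-file loads or database-to-database transfers from sources to targets.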
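To make the fire-and-forget idea concrete, here is a minimal sketch that uses Python's in-process queue as a stand-in for a real message broker. The function names, the message strings and the upper-casing "work" are all hypothetical; a production setup would use a durable queue service instead.

```python
import queue
import threading

def start_consumer(q, processed):
    """Start a background worker that drains the queue until it sees None."""
    def worker():
        while True:
            msg = q.get()
            if msg is None:                  # sentinel: stop the worker
                q.task_done()
                break
            processed.append(msg.upper())    # stand-in for the provider's real work
            q.task_done()
    t = threading.Thread(target=worker, daemon=True)
    t.start()
    return t

def send(q, msg):
    """Fire and forget: enqueue the message and return without waiting for a reply."""
    q.put(msg)

q = queue.Queue()
processed = []
start_consumer(q, processed)
for m in ["order-1", "order-2"]:
    send(q, m)        # the caller is not blocked by the provider's processing
send(q, None)
q.join()              # only the demo waits, so the result can be printed
print(processed)
```

The caller and the provider share only the queue, which is what lets either side live in the cloud or on premise.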

Extract Transform Load (ETL) Patterns

Truncate and Load Pattern (AKA full load): It is good for small to medium data volumes that load quickly. It suits staging areas and it is simple. The key benefit is that deletions in the source are reflected in the target with no extra work. The disadvantage is that no history is kept and there is no change tracking: created, updated, inserted or deleted (CUID) rows cannot be identified.
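A minimal sketch of truncate and load, using an in-memory SQLite database so it runs anywhere. The table name stg_customer and the sample rows are made up for illustration, and SQLite has no TRUNCATE statement, so an unqualified DELETE plays that role here.

```python
import sqlite3

def truncate_and_load(conn, rows):
    """Replace the whole target table with the current source extract."""
    with conn:  # one transaction: the empty-and-reload is all-or-nothing
        conn.execute("DELETE FROM stg_customer")  # stands in for TRUNCATE
        conn.executemany(
            "INSERT INTO stg_customer (id, name) VALUES (?, ?)", rows
        )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stg_customer (id INTEGER PRIMARY KEY, name TEXT)")

truncate_and_load(conn, [(1, "Ann"), (2, "Bob"), (3, "Cy")])
# Second extract: Bob was deleted in the source and Cy was renamed.
truncate_and_load(conn, [(1, "Ann"), (3, "Cyrus")])

loaded = conn.execute("SELECT id, name FROM stg_customer ORDER BY id").fetchall()
print(loaded)  # [(1, 'Ann'), (3, 'Cyrus')]
```

The deletion and the rename both show up in the target, but nothing records that they happened, which is the trade-off described above.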

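Slowly Changing Dimension Type 1 Pattern: Changed attributes are overwritten in place, so no history is kept. This pattern is simple, but it is very slow when rows are compared and updated one at a time, and in that form it should not be used for anything over 1000 rows. See the dimensions definition for Type 1.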
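As an illustration of the Type 1 overwrite, the sketch below uses SQLite's upsert syntax (available from SQLite 3.24) against a hypothetical dim_customer table. It is one possible way to express the pattern, not a reference implementation.

```python
import sqlite3

def scd1_merge(conn, rows):
    """Insert new keys; overwrite the attribute of existing keys (no history)."""
    with conn:
        conn.executemany(
            "INSERT INTO dim_customer (customer_id, city) VALUES (?, ?) "
            "ON CONFLICT(customer_id) DO UPDATE SET city = excluded.city",
            rows,
        )

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, city TEXT)"
)

scd1_merge(conn, [(1, "Paris"), (2, "Rome")])
scd1_merge(conn, [(1, "Lyon"), (3, "Oslo")])  # customer 1 moved, customer 3 is new

dim = conn.execute(
    "SELECT customer_id, city FROM dim_customer ORDER BY customer_id"
).fetchall()
print(dim)  # [(1, 'Lyon'), (2, 'Rome'), (3, 'Oslo')]
```

After the second load there is no trace that customer 1 ever lived in Paris.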

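Slowly Changing Dimension Type 2 Pattern: Instead of overwriting, each change closes the current row and inserts a new version, so history is kept. The logic is simple to state, but it is very slow when done row by row and in that form should not be used for anything over 1000 rows. See the dimensions definition for Type 2.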
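A minimal sketch of the Type 2 close-and-insert step, again on SQLite. The column names (valid_from, valid_to, is_current) and the surrogate key sk are common conventions assumed for this example, not something prescribed by this post. It processes one row at a time purely for readability.

```python
import sqlite3

def scd2_apply(conn, customer_id, city, load_date):
    """Close the current version if the attribute changed, then add a new one."""
    with conn:
        current = conn.execute(
            "SELECT city FROM dim_customer "
            "WHERE customer_id = ? AND is_current = 1",
            (customer_id,),
        ).fetchone()
        if current is not None and current[0] == city:
            return  # nothing changed, keep the current version
        if current is not None:
            conn.execute(
                "UPDATE dim_customer SET valid_to = ?, is_current = 0 "
                "WHERE customer_id = ? AND is_current = 1",
                (load_date, customer_id),
            )
        conn.execute(
            "INSERT INTO dim_customer "
            "(customer_id, city, valid_from, valid_to, is_current) "
            "VALUES (?, ?, ?, NULL, 1)",
            (customer_id, city, load_date),
        )

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE dim_customer (
        sk          INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
        customer_id INTEGER NOT NULL,                   -- business key
        city        TEXT NOT NULL,
        valid_from  TEXT NOT NULL,
        valid_to    TEXT,
        is_current  INTEGER NOT NULL
    )
""")

scd2_apply(conn, 1, "Paris", "2015-01-01")
scd2_apply(conn, 1, "Paris", "2015-02-01")  # unchanged: no new row
scd2_apply(conn, 1, "Lyon", "2015-03-01")   # changed: close Paris, add Lyon

history = conn.execute(
    "SELECT city, valid_from, valid_to, is_current FROM dim_customer ORDER BY sk"
).fetchall()
print(history)
```

Each business key ends up with one open row (is_current = 1) and any number of closed rows that carry the history.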

Declarative/Adhoc SQL Query Patterns

Join patterns: directional, inner or equijoin, left and right outer join, full outer join. A theta join allows arbitrary comparison relationships (such as ≥ or BETWEEN). An equijoin is a theta join using the equality operator. A natural join is an equijoin on attributes that have the same name in each relation.
Flattened hierarchies, which put all the levels on one row as columns, versus ragged hierarchies, in which, as in unbalanced hierarchies, the branches can descend to different levels.
Join tables / translation tables, usually used when bringing two siloed systems into the same context so that their data can be merged.
Parent/child tables and cardinality (fan traps that occur when using aggregate measures): the parent key is a foreign key (FK) on the child record, so the relationship creates data clusters. We have to ensure we do not write queries that multiply the aggregated values.
Self joins (aka alias): in SQL we can refer to the same table by another name and join it to itself. For example, a Manager is a type of Person, so the person table can be self-joined to get the manager's info.
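Two of the items above, the self join and the fan trap, are easier to see with data. The sketch below runs them on an in-memory SQLite database; the tables person, orders and order_line and their contents are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (
    id INTEGER PRIMARY KEY, name TEXT,
    manager_id INTEGER REFERENCES person(id)
);
INSERT INTO person VALUES (1, 'Dana', NULL), (2, 'Eli', 1), (3, 'Fay', 1);

CREATE TABLE orders (order_id INTEGER PRIMARY KEY, amount REAL);
CREATE TABLE order_line (
    line_id INTEGER PRIMARY KEY,
    order_id INTEGER REFERENCES orders(order_id),
    qty INTEGER
);
INSERT INTO orders VALUES (10, 100.0), (11, 50.0);
INSERT INTO order_line VALUES (1, 10, 2), (2, 10, 1), (3, 11, 4);
""")

# Self join: the same table under two aliases, employee (e) and manager (m).
managers = conn.execute("""
    SELECT e.name, m.name
    FROM person AS e
    LEFT JOIN person AS m ON m.id = e.manager_id
    ORDER BY e.id
""").fetchall()
print(managers)  # [('Dana', None), ('Eli', 'Dana'), ('Fay', 'Dana')]

# Fan trap: joining parent to child repeats the parent row once per child,
# so summing a parent-level measure counts order 10 twice.
inflated = conn.execute("""
    SELECT SUM(o.amount)
    FROM orders AS o
    JOIN order_line AS l ON l.order_id = o.order_id
""").fetchone()[0]
print(inflated)  # 250.0, although the two orders total 150.0

# One way out: aggregate the child to the parent's grain before joining.
correct = conn.execute("""
    SELECT SUM(o.amount), SUM(l.qty)
    FROM orders AS o
    JOIN (SELECT order_id, SUM(qty) AS qty
          FROM order_line GROUP BY order_id) AS l
      ON l.order_id = o.order_id
""").fetchone()
print(correct)  # (150.0, 7)
```

Pre-aggregating the child table is only one remedy; the point is that each measure must be summed at its own grain.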

Query Performance Patterns

Explain plans, indexing and partitions: this is the bedrock of performance tuning in relational databases, and the topic alone deserves its own post. It depends on the table storage and data type choices made at Data Definition Language (DDL) setup. It also needs knowledge of the data cardinality, to choose between balanced tree (B-tree) and bitmap indexes, and of user query patterns, to create covering indexes, to get index range scans when a query does not select a unique index entry, and to hit partitions so that queries run against much smaller data sets.
ETL aggregation and aggregate awareness: pre-computing summary tables during ETL and having queries use the smallest aggregate table that can answer them.
Table constraints for data quality, including PK, FK, NOT NULL, and check functions or regular expressions that can be put on columns to ensure that accurate data, and no unwanted nulls, are stored.
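As a small, hedged illustration of explain plans, covering indexes and constraints, the sketch below uses SQLite's EXPLAIN QUERY PLAN. Other databases expose plans differently and word them differently, and the sales table and index name are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE sales (
        sale_id INTEGER PRIMARY KEY,
        region  TEXT NOT NULL,
        amount  REAL NOT NULL CHECK (amount >= 0)  -- data quality rule
    )
""")
conn.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("EU", 10.0), ("US", 20.0), ("EU", 5.0)],
)

def plan(conn, sql):
    """Return the detail column of SQLite's query plan as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)

query = "SELECT region, amount FROM sales WHERE region = 'EU'"

before = plan(conn, query)  # no usable index yet: a scan of the table
# The index holds every column the query needs, so it "covers" the query.
conn.execute("CREATE INDEX idx_sales_region_amount ON sales (region, amount)")
after = plan(conn, query)   # now a search on the covering index

print(before)
print(after)

def rejects(conn, region, amount):
    """Return True if the constraints refuse the row."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO sales (region, amount) VALUES (?, ?)",
                (region, amount),
            )
    except sqlite3.IntegrityError:
        return True
    return False

print(rejects(conn, "EU", -1.0))  # True: violates the CHECK constraint
print(rejects(conn, None, 5.0))   # True: violates NOT NULL
```

Reading the plan before and after a change is the habit that matters; the exact plan text varies by database and version.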

Dashboard Design Patterns

Layout Patterns
Leading Indicators Aggregation Pattern
Drill Down Pattern
Progressive Filtering Choice Pattern

Security Patterns

I have a dedicated article on security patterns, which are getting increasingly complex as time progresses and new regulations arrive.
