Too often, metrics become trapped in BI tools like Looker. When metric calculations live in the BI layer, they can’t be reused in other dashboards and reports, or in tools outside Looker. Analytics engineers are left guessing which dimensions are used downstream and which metrics the business relies on most often.
Slowly, calculations of the same metric appear in multiple places within Looker, and each calculation differs slightly! Soon, nobody knows which metric is the source of truth and which they can depend on. Stakeholders are left scratching their heads, turning to their own best guesses instead of relying on the data.
Once trust is lost, it’s hard to get back.
Centralizing metrics in a data modeling tool like dbt™️, rather than a BI tool like Looker, creates a central source of truth further upstream. Instead of searching through metric calculations across various dashboards and reports, analytics engineers and data analysts know the one place to look: the semantic layer.
What is the dbt Semantic Layer?
The dbt semantic layer allows you to define metric calculations on top of your already-defined data models within dbt. By doing this, it centralizes the calculations, taking them out of the BI layer, and making them accessible to everyone.
This is particularly helpful when working with different business units, each of which has a slightly different idea of how a metric should be derived. The semantic layer allows you to create one consistent metric with the different dimensions each business unit requests. It also standardizes these metrics for other data users, such as data scientists, who may be using tools downstream of these metrics. Tools that sit alongside BI, like reverse ETL platforms and notebooks, can reference metrics with confidence, knowing that they are accurate and up-to-date.
Why should you use the dbt Semantic Layer?
While LookML is a powerful semantic language for data teams to employ, we still recommend moving metric calculations to a tool like dbt, allowing them to exist outside of Looker. dbt is an open-source tool with a built-in semantic layer standard, meaning it is accessible to anyone. It uses SQL to transform your data, making it easy for analytics engineers and data analysts to build into their existing workflow. You can even integrate it into an existing CI/CD pipeline!
dbt promotes best practices such as modularity, documentation, and testing, improving your data stack and allowing it to scale as your data grows. Adding the semantic layer to an already-loved transformation tool with clear guidelines is a fairly low lift.
When using a feature like the semantic layer, the data team can better keep tabs on the metric definitions being used, ensuring all considerations from various teams go into calculating metrics. This prevents metrics from being siloed in the team requesting them, creating a more accurate picture of your data. When your metrics are more accurate and aligned across different teams, it leaves little room for questioning and distrust.
Other benefits of semantic layers include:
- Increased flexibility (no longer locked into a specific BI tool)
- More transparency and easier data governance
- Increased efficiency (no longer running the same metric calculation in multiple places)
Challenges of Building a Semantic Layer
The biggest challenge when building a semantic layer involves capturing the context around a metric. Like anything with data, it’s only helpful if you understand how to correctly use the metric. Because of this, the conventions you put in place and how you document them are key to creating a strong semantic layer.
We recommend creating a style guide not just for your dbt project but also for your semantic layer. Your style guide outlines naming conventions for your metrics, so analysts understand exactly what a metric means based on its name.
For example, if you have a base metric and add a filter to it in another metric calculation, the filtered metric should have the same name as the base metric followed by the name of the applied filters. If revenue is the base metric, revenue from one-time products would be named revenue_one_time_product.
Setting other standards such as including the compiled SQL code in a metric definition, specifying alias names, and walking through how to handle NULL values in derived metrics are just as important in ensuring the semantic layer creates a reliable, easy-to-use source of truth.
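To make these conventions concrete, here is a minimal sketch of what they might look like in metric definitions. All metric and field names below are hypothetical: the filtered metric carries the base metric’s name plus its applied filter, and the derived metric uses nullif as one way to handle NULLs and division by zero.

metrics:
  - name: revenue
    description: "Total revenue across all orders"
    type: simple
    label: "Revenue"
    type_params:
      measure: revenue
  # Filtered metric: base name + applied filter, per the style guide
  - name: revenue_one_time_product
    description: "Revenue filtered to one-time products"
    type: simple
    label: "Revenue (one-time products)"
    type_params:
      measure: revenue
    filter: |
      {{ Dimension('order_id__product_type') }} = 'one_time'
  # Derived metric: nullif prevents divide-by-zero errors
  - name: revenue_per_order
    description: "Average revenue per order"
    type: derived
    label: "Revenue per order"
    type_params:
      expr: revenue / nullif(order_count, 0)
      metrics:
        - name: revenue
        - name: order_count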
The Process of Building a Semantic Layer
Curating your metrics
What metrics do you want to calculate in the semantic layer? Before building out your metrics, you need to take inventory of the metrics being used downstream of your dbt data models in Looker.
Making a list of every measure and custom measure used in Looker and LookML will allow you to prioritize the ones used most often. Be sure to also take inventory of where you find these metrics so you can easily replace them once your semantic layer is built out!
Many times, when you take inventory of your metrics, you realize that you don’t have the data models in place to support these metrics. When using a tool like Looker, some of your metrics are built on derived views generated within the tool itself. To move outside of Looker, you will need to “shift left” and build out your transformations within the data modeling layer. Until you have the data models to support metric calculations, you can’t properly utilize the semantic layer.
During this time of curating your metrics, it’s also important to have conversations with stakeholders who rely on these metrics. You need to identify duplicate metrics and clarify which, if any, are the reliable ones. Just because a metric is calculated a certain way within your BI tools does not mean it’s accurate! As an analytics engineer, I’ve found that historic metric calculations tend to linger in reports and dashboards, with nobody realizing that the calculations are out-of-date.
By talking with stakeholders, you can clarify their expectations around the metrics, eventually documenting these within dbt for easy tracking of how metrics change over time. This way, you aren’t spending time building data models and a semantic layer based on old assumptions of your data.
Choosing what you want to build
As I mentioned before, to create a semantic layer, you need core data models already written in dbt. Start building your semantic layer on the models you already have. However, if any metrics you gathered in the previous step still lack supporting models, be sure to prioritize building the data models that allow you to define them.
Next, focus on the metrics that are most commonly used across different teams within the business. These are the metrics that will benefit most from being coded into the semantic layer. Once you identify these, you need to consider entities, dimensions, and measures that the business would be interested in.
In dbt’s semantic layer, an entity represents a unique object within the business. This is typically the primary key of your model. In the case of orders, order ID would be the entity.
Dimensions are the qualitative fields you can group and filter your data by. Order type and order date are examples of this.
Measures are aggregations performed on fields within your model. Think of these as sums or counts.
The entities, measures, and dimensions that you define should consider how the metric is currently being used and how it should be used in the future.
For example, if multiple lines of business look at order revenue, you need to understand how they like to slice and dice this metric. Do they want to know order revenue by order type? Do they like to compare how order revenue changes week over week? Answering questions like this will allow you to code these details into your calculation in the next step.
Coding everything into the semantic layer
When coding metrics into the semantic layer, you need to follow this order of operations:
- Create your semantic model with the entities, dimensions, and measures needed for your metric calculations.
- Define your metrics.
- Document your metrics.
First, you’ll want to create a YAML file within a metrics subdirectory in the models directory of your dbt project (models/metrics/example.yml). This is where you will define one semantic model for each of the data models you plan to use in the semantic layer.
We recommend naming the YAML file the same name as the model it sits on top of. For example, a model named sessions_calculated would have a YAML file named sessions_calculated.yml.
Within this file, you define the name of the semantic model, its description, and the dbt model it sits on top of. It would look like this:
semantic_models:
  - name: sessions_calculated
    description: |
      Sessions calculated (< 30min). This table's grain is one row per session, per account.
    model: ref('sessions_calculated')
You then need to define the entities, or keys, of the model. In this case, it would be session_id and account_id:
semantic_models:
  - name: sessions_calculated
    description: |
      Sessions calculated (< 30min). This table's grain is one row per session, per account.
    model: ref('sessions_calculated')
    entities:
      - name: session_id
        type: primary
      - name: account_id
        type: foreign
After defining your entities, you must define your dimensions. These are determined by stakeholders’ interests in how they’d like to slice and dice the data.
semantic_models:
  - name: sessions_calculated
    description: |
      Sessions calculated (< 30min). This table's grain is one row per session, per account.
    model: ref('sessions_calculated')
    entities:
      - name: session_id
        type: primary
      - name: account_id
        type: foreign
    dimensions:
      - name: session_started_at
        type: time
        type_params:
          time_granularity: hour
      - name: device_type
        type: categorical
Lastly, you need to define your measures. Measures are often your final metrics, but they can also be combined in metric calculations that reference multiple measures.
semantic_models:
  - name: sessions_calculated
    description: |
      Sessions calculated (< 30min). This table's grain is one row per session, per account.
    model: ref('sessions_calculated')
    entities:
      - name: session_id
        type: primary
      - name: account_id
        type: foreign
    dimensions:
      - name: session_started_at
        type: time
        type_params:
          time_granularity: hour
      - name: device_type
        type: categorical
    measures:
      - name: session_count
        description: The number of sessions
        agg: count_distinct
        expr: session_id
Now you are ready to build your metrics!
Metrics can be added to the same YAML file where your semantic model is defined. The metrics block sits at the same level as the semantic_models block. Here, you define the metric calculation used by the end business user. Because metrics use the entities, dimensions, and measures you defined in the semantic model, you must identify those before you begin the building process!
There are five things every metric needs: a name, description, type, label, and type_params.
Name and description help define the metric and explain more about what it means and how it should be used.
Label is the human-readable name that downstream tools display for the metric.
Type refers to the type of metric: simple, ratio, cumulative, or derived. This depends on how you calculate the metric from its measures.
Type_params depend on the type you define. For a simple metric, this is just one measure. For the other types, it may include a numerator, denominator, window, or expression.
metrics:
  - name: "session_count"
    description: "The number of total sessions"
    type: simple
    label: "session_count"
    type_params:
      measure: session_count
  - name: "session_count_tablet"
    description: "The number of total sessions on a tablet"
    type: simple
    label: "session_count_tablet"
    type_params:
      measure: session_count
    filter: |
      {{ Dimension('session_id__device_type') }} = 'tablet'
Notice that we added a filter to this metric based on the dimension value. This is optional for all metrics, but necessary when you want to filter your metrics on any of the dimensions defined in the semantic model.
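The other metric types follow the same pattern with different type_params. As a hedged sketch building on the metrics above (the metric name here is hypothetical), a ratio metric divides one metric by another:

metrics:
  - name: tablet_session_share
    description: "The share of all sessions that occurred on a tablet"
    type: ratio
    label: "Tablet session share"
    type_params:
      numerator: session_count_tablet
      denominator: session_count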
Integrating the semantic layer with Looker
When working with the semantic layer in dbt, you have two options for integrating your metrics into your BI tool. The first involves a dbt Cloud subscription and making use of the integrations dbt offers directly with BI tools. In this case, the semantic layer generates a query that can then be used within the tool. However, an integration for Looker is not yet available.
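That said, metrics defined in the semantic layer can still be queried directly while you wait on an integration. For example, with the dbt Cloud CLI, a command along the lines of dbt sl query --metrics session_count --group-by metric_time returns the centrally defined calculation (exact syntax depends on your dbt setup).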
The second option involves creating measures equivalent to what you have stored in your semantic layer, directly within Looker. This can work by pre-aggregating each of your metrics into tables and then building LookML views on top of this. However, this requires a lot of effort and maintenance. You need to ensure these metrics in Looker are always in sync with the calculations you’ve defined in dbt.
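As a rough sketch of this second option (model and column names are hypothetical), the pre-aggregation could be a dbt model like the one below, with a LookML view then defined on top of the resulting table:

-- models/marts/agg_sessions_by_device.sql
-- Pre-aggregates the session_count metric by device type and day
-- so Looker can expose it as a simple LookML measure.
select
    device_type,
    date_trunc('day', session_started_at) as session_date,
    count(distinct session_id) as session_count
from {{ ref('sessions_calculated') }}
group by 1, 2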
Because this can end up leading to the same problem you were attempting to solve in the first place, your best option is to use Euno to keep calculations in sync. Euno creates a LookML measure in the view that points to the respective dbt model, ensuring every change within dbt’s semantic layer manifests in LookML. By doing it this way, Looker users can dynamically query these metrics.
Teaching analysts how to work with the semantic layer
Once you have the semantic layer built out by your analytics engineers, you need to ensure it is ready to be used by data analysts. If you haven’t properly documented your metrics, do this before anything else. The metrics you’ve defined are only as powerful as the documentation written to support them!
Silja Mardla, Senior Analytics Engineer at Bolt, has a great system in place for defining reusable doc blocks that can be used in metric definitions. She created a collection of generic doc blocks to describe the different types of metrics (cumulative, ratio, derived, etc.) and uses those in her metric definitions along with a doc block for the base metric.
For example, a doc block called metrics_cumulative describes how cumulative metrics are calculated within the semantic layer. She then pairs this with a separate doc block for the base metric, rides_orders_created, which contains the compiled SQL code of the metric.
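Doc blocks like these live in a Markdown file inside your dbt project. A minimal sketch of what the metrics_cumulative block might contain (the wording here is illustrative, not the actual text):

{% docs metrics_cumulative %}
Cumulative metrics aggregate a measure over a given time window,
or over all time, rather than within a single period.
{% enddocs %}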
Together, the description for the rides_orders_created_in_7_days metric looks like this:
metrics:
  - name: rides_orders_created_in_7_days
    description: >
      {{ doc('metrics_cumulative') }}
      {{ doc('rides_orders_created') }}
    type: cumulative
    label: Rides Orders Created (UTC) in 7 days
    type_params:
      measure: rides_orders
      window: 7 days
Thorough documentation like this is key for analysts’ successful adoption of the semantic layer. It will also enable them to better understand how each metric works so that they can properly use it downstream in the dashboards and reports they create. It will take some time for them to adjust from calculating metrics themselves in views to choosing a LookML measure from an aggregated view.
Building processes around using the semantic layer
Lastly, now that you have the semantic layer built, you need to ensure the data team and its stakeholders start and continue to adopt it. A tool is not valuable unless it’s being consistently used and updated, replacing the old processes before it.
While there may be an initial overhaul of all metrics to the semantic layer, it’s important to consider how the process of building metrics will change going forward. You don’t want to be in a continual process of overhauling metric calculations from Looker to dbt!
When a stakeholder asks an analyst to build a metric, how will that request be handled? What is the process for working with analytics engineers to build supporting dbt models? What happens if you don’t have a data model built to support a metric calculation? Consider everything that needs to happen for the process to go smoothly, ensuring you don’t end up right where you started.
Enter Euno
Looking to streamline this entire process? Euno can help you automatically generate semantic models and their metrics in dbt with minimal lift. The tool identifies the most-used measures and custom measures in Looker and LookML and helps you shift them left into dbt’s semantic layer, saving your analytics engineers time and effort. Euno then syncs these metrics, enabling analysts to dynamically query them from Looker without worrying whether they’re up-to-date with the semantic layer definitions.
By using Euno, you don’t need to take inventory of every calculation across Looker yourself, and you can trust that your data is reliable and always up-to-date with what analytics engineers have defined in the semantic layer. Euno does the tedious work for you, allowing you to focus your time on what really matters: clarifying expectations with stakeholders and making your data environment more scalable.
***
Euno it → We’re officially on the dbt Cloud integrations list. Check us out!