Skip to content

Dimension

A dimension is something that you can group and filter your metrics by. A user registration date, order delivery address, total number of orders a user made.

Most dimensions are just table columns, like name in the product_categories table. The power of Dictum comes from the fact that when you request revenue by product category, you don't have to remember that you have to first join products by product_id to orders and then product_categories by category_id to products. Because all the information about table relationships is already in your model, all you have to do is specify the dimension.

Dimensions are tied to tables, so unlike measures, they can only be defined in the same file as the table, under the dimensions key.

Schema

name: |
    string, required
    Human-readable descriptive display name for a dimension

description: |
    string, optional
    A longer description for documentation purposes.

expr: |
    string expression, required
    Must be an non-aggregate expression.
    An aggregate expression will result in an error.

type: |
    float | int | date | datetime | bool | str, required
    The data type of the calculation.

format: |
    string or dict, optional
    Formatting information (see Formatting).

missing: |
    whatever type is specified in the type field, optional
    Missing values (NULL) in the dimension will be replaced
    with value specified in this option.

union: |
    string, optional
    If present, this dimension will be treated as belonging
    to a union with this ID. The union must be defined
    in unions.yml. See below for explanation.

Unions

Data in data warehouses is not always perfectly normalized. If dimensions are tied to tables, it means that each dimension can only exist in one table at a time, meaning that the data must be normalized for Dictum to work.

So, for example, if you have a dimension called country, Dictum will expect that there's some table holding a list of countries that it can join to the fact tables.

What if your data is denormalized, and country is just a literal column in, say, users and orders table? Common sense tells us that "France" in users.country and "France" in orders.country is the same France. We might want to construct a query that gives us both number of users and number of orders per country.

To connect the two columns we can define a thing called a union. A union is like an alias for several other dimensions.

# unions.yml
country:
  name: Country
  type: str
# tables/users.yml
...
dimensions:
  user_country:
    name: Users's Country in Profile
    type: str
    expr: country
    union: country
# tables/orders.yml
...
dimensions:
  order_country:
    name: Order Delivery Country
    type: str
    expr: country
    union: country

For a user querying Dictum, there are now 3 dimensions: a generic country, a user_country and an order_country. So they can get number of users by country, number of orders by country and number of orders by user_country.

Aggregate dimensions

Info

This feature, while convenient, produces subobptimal queries. It's recommended that you materialize such information as a physical column and just use normal dimensions.

Some dimensions can be calculated from measures. Let's say you have a measure called revenue and you want to calculate the number of users per revenue bracket. That is, first you need to calculate revenue on a per-user basis and then use that value as a dimension.

To do this, you can just use $revenue in the dimension expression:

# tables/users.yml
...
primary_key: id  # primary_key is required for aggregate dimensions to work
dimensions:
  user_revenue_bracket:
    name: User Revenue Bracket
    description: |
      Revenue bracket for a user. The brackets are:
      [0, 10), [10, 100), [100, 1000), [1000, ...]
    type: string
    expr: |
        case when $revenue < 10 then '< 10'
             when $revenue < 100 then '< 100'
             when $revenue < 1000 then '< 1000'
             else '>= 1000'
        end