Python API
Before reading this, make sure you're familiar with the concepts from the Query language guide.
Query language is good when you want to explore your data interactively. But sometimes you might want to go further and build your query with Python code. This is useful when you want to parametrize your queries, build custom functions around Dictum or expose your metric store as a web API.
Dictum provides several helpers for this use case.
Metrics and dimensions can be retrieved by name from the corresponsing attributes
of the Project
object. Calculations can be accessed by their ID as both attributes and
dict keys.
from dictum import Project
project = Project.example("chinook")
project.metrics["revenue"]
Revenue
:revenue
sum(UnitPrice * Quantity)
on table invoice_items: invoice_items
project.dimensions.Year
Year
:Year
The Project
object has a select
method that corresponds to the select
clause of
the query language. It accepts a list of metrics we want to select. You can give it a
list of strings, "pieces" of the actual QL query.
project.select("revenue", "revenue.percent as \"pct\"")
Revenue | pct | |
---|---|---|
0 | $2,328.60 | 100% |
But, as you can see, this becomes problematic as the expressions become more complex.
So, you can pass the metric objects from the project directly. Transforms are applied
the same way, by calling the corresponding methods with or without ()
.
m = project.metrics
project.select(m.revenue, m.revenue.percent)
Revenue | Revenue (%) | |
---|---|---|
0 | $2,328.60 | 100% |
To add a grouping, use the by
method. Dimensions are passed the same way as metrics,
either as a string or as an object.
d = project.dimensions
(
project.select(m.revenue, m.revenue.percent)
.by(d.genre)
)
Genre | Revenue | Revenue (%) | |
---|---|---|---|
0 | Alternative | $13.86 | 1% |
1 | Alternative & Punk | $241.56 | 10% |
2 | Blues | $60.39 | 3% |
3 | Bossa Nova | $14.85 | 1% |
4 | Classical | $40.59 | 2% |
5 | Comedy | $17.91 | 1% |
6 | Drama | $57.71 | 2% |
7 | Easy Listening | $9.90 | 0% |
8 | Electronica/Dance | $11.88 | 1% |
9 | Heavy Metal | $11.88 | 1% |
... | ... | ... | ... |
14 | Pop | $27.72 | 1% |
15 | R&B/Soul | $40.59 | 2% |
16 | Reggae | $29.70 | 1% |
17 | Rock | $826.65 | 35% |
18 | Rock And Roll | $5.94 | 0% |
19 | Sci Fi & Fantasy | $39.80 | 2% |
20 | Science Fiction | $11.94 | 1% |
21 | Soundtrack | $19.80 | 1% |
22 | TV Shows | $93.53 | 4% |
23 | World | $12.87 | 1% |
There are where
and limit
methods too.
select = (
project.select(m.revenue, m.revenue.percent)
.by(d.genre)
.where(d.music, d.Time.year >= 2012)
.limit(m.revenue.top(5))
)
select
Genre | Revenue | Revenue (%) | |
---|---|---|---|
0 | Alternative & Punk | $94.05 | 11% |
1 | Jazz | $27.72 | 3% |
2 | Latin | $142.56 | 17% |
3 | Metal | $120.78 | 15% |
4 | Rock | $336.60 | 40% |
select
method (as well as ql
) doesn't return a Pandas DataFrame, it returns a special
query object. When you're using Jupyter, it runs the query for you, but to get the
DataFrame in code, call the df
method. The DataFrame will be unformatted. To keep the
formatting, pass format=True
to the method. Note that in this case, all values will be
strings.
type(select)
dictum_core.project.analyses.Select
df = select.df()
type(df)
pandas.core.frame.DataFrame
df
genre | revenue | revenue__percent | |
---|---|---|---|
0 | Alternative & Punk | 94.05 | 0.112961 |
1 | Jazz | 27.72 | 0.033294 |
2 | Latin | 142.56 | 0.171225 |
3 | Metal | 120.78 | 0.145065 |
4 | Rock | 336.60 | 0.404281 |
select.df(format=True)
Genre | Revenue | Revenue (%) | |
---|---|---|---|
0 | Alternative & Punk | $94.05 | 11% |
1 | Jazz | $27.72 | 3% |
2 | Latin | $142.56 | 17% |
3 | Metal | $120.78 | 15% |
4 | Rock | $336.60 | 40% |
of
and within
can be passed as keyword arguments to the metric transform, e.g.
m.revenue.percent(within=[d.Year])
.
Lets's write a function that builds a typical query for us. We'll give it the metric ID, and it will display that metric as a percentage by Year and Quarter.
def percentage_of_quarter(metric_id: str):
metric = project.metrics[metric_id]
return (
project.select(metric.percent(within=[project.d.Year]))
.by(project.d.Year, project.d.Quarter)
)
percentage_of_quarter("items_sold")
Year | Quarter | Number of Items Sold (%) | |
---|---|---|---|
0 | 2009 | Q1 2009 | 25% |
1 | 2009 | Q2 2009 | 25% |
2 | 2009 | Q3 2009 | 25% |
3 | 2009 | Q4 2009 | 25% |
4 | 2010 | Q1 2010 | 25% |
5 | 2010 | Q2 2010 | 25% |
6 | 2010 | Q3 2010 | 25% |
7 | 2010 | Q4 2010 | 25% |
8 | 2011 | Q1 2011 | 26% |
9 | 2011 | Q2 2011 | 26% |
10 | 2011 | Q3 2011 | 26% |
11 | 2011 | Q4 2011 | 23% |
12 | 2012 | Q1 2012 | 26% |
13 | 2012 | Q2 2012 | 26% |
14 | 2012 | Q3 2012 | 23% |
15 | 2012 | Q4 2012 | 26% |
16 | 2013 | Q1 2013 | 24% |
17 | 2013 | Q2 2013 | 25% |
18 | 2013 | Q3 2013 | 26% |
19 | 2013 | Q4 2013 | 26% |