Python API

Before reading this, make sure you're familiar with the concepts from the Query language guide.

Query language is good when you want to explore your data interactively. But sometimes you might want to go further and build your query with Python code. This is useful when you want to parametrize your queries, build custom functions around Dictum or expose your metric store as a web API.

Dictum provides several helpers for this use case.

Metrics and dimensions can be retrieved by name from the corresponsing attributes of the Project object. Calculations can be accessed by their ID as both attributes and dict keys.

In [1]:

            
                Copied!
                
from dictum import Project

project = Project.example("chinook")
project.metrics["revenue"]
from dictum import Project

project = Project.example("chinook")
project.metrics["revenue"]

Out[1]:

Revenue

:revenue

sum(UnitPrice * Quantity) on table invoice_items: invoice_items

In [2]:

            
                Copied!
                
project.dimensions.Year
project.dimensions.Year

Out[2]:

Year

:Year

The Project object has a select method that corresponds to the select clause of the query language. It accepts a list of metrics we want to select. You can give it a list of strings, "pieces" of the actual QL query.

In [3]:

            
                Copied!
                
project.select("revenue", "revenue.percent as \"pct\"")
project.select("revenue", "revenue.percent as \"pct\"")

Out[3]:

	Revenue	pct
0	$2,328.60	100%

But, as you can see, this becomes problematic as the expressions become more complex. So, you can pass the metric objects from the project directly. Transforms are applied the same way, by calling the corresponding methods with or without ().

In [4]:

            
                Copied!
                
m = project.metrics
project.select(m.revenue, m.revenue.percent)
m = project.metrics
project.select(m.revenue, m.revenue.percent)

Out[4]:

	Revenue	Revenue (%)
0	$2,328.60	100%

To add a grouping, use the by method. Dimensions are passed the same way as metrics, either as a string or as an object.

In [5]:

            
                Copied!
                
d = project.dimensions
(
    project.select(m.revenue, m.revenue.percent)
    .by(d.genre)
)
d = project.dimensions
(
    project.select(m.revenue, m.revenue.percent)
    .by(d.genre)
)

Out[5]:

	Genre	Revenue	Revenue (%)
0	Alternative	$13.86	1%
1	Alternative & Punk	$241.56	10%
2	Blues	$60.39	3%
3	Bossa Nova	$14.85	1%
4	Classical	$40.59	2%
5	Comedy	$17.91	1%
6	Drama	$57.71	2%
7	Easy Listening	$9.90	0%
8	Electronica/Dance	$11.88	1%
9	Heavy Metal	$11.88	1%
...	...	...	...
14	Pop	$27.72	1%
15	R&B/Soul	$40.59	2%
16	Reggae	$29.70	1%
17	Rock	$826.65	35%
18	Rock And Roll	$5.94	0%
19	Sci Fi & Fantasy	$39.80	2%
20	Science Fiction	$11.94	1%
21	Soundtrack	$19.80	1%
22	TV Shows	$93.53	4%
23	World	$12.87	1%

There are where and limit methods too.

In [6]:

            
                Copied!
                
                    
                    
                
                

        
select = (
    project.select(m.revenue, m.revenue.percent)
    .by(d.genre)
    .where(d.music, d.Time.year >= 2012)
    .limit(m.revenue.top(5))
)
select
select = (
    project.select(m.revenue, m.revenue.percent)
    .by(d.genre)
    .where(d.music, d.Time.year >= 2012)
    .limit(m.revenue.top(5))
)
select

Out[6]:

	Genre	Revenue	Revenue (%)
0	Alternative & Punk	$94.05	11%
1	Jazz	$27.72	3%
2	Latin	$142.56	17%
3	Metal	$120.78	15%
4	Rock	$336.60	40%

select method (as well as ql) doesn't return a Pandas DataFrame, it returns a special query object. When you're using Jupyter, it runs the query for you, but to get the DataFrame in code, call the df method. The DataFrame will be unformatted. To keep the formatting, pass format=True to the method. Note that in this case, all values will be strings.

In [7]:

            
                Copied!
                
type(select)
type(select)

Out[7]:

dictum_core.project.analyses.Select

In [8]:

            
                Copied!
                
df = select.df()
type(df)
df = select.df()
type(df)

Out[8]:

pandas.core.frame.DataFrame

In [9]:

Out[9]:

	genre	revenue	revenue__percent
0	Alternative & Punk	94.05	0.112961
1	Jazz	27.72	0.033294
2	Latin	142.56	0.171225
3	Metal	120.78	0.145065
4	Rock	336.60	0.404281

In [10]:

            
                Copied!
                
select.df(format=True)
select.df(format=True)

Out[10]:

	Genre	Revenue	Revenue (%)
0	Alternative & Punk	$94.05	11%
1	Jazz	$27.72	3%
2	Latin	$142.56	17%
3	Metal	$120.78	15%
4	Rock	$336.60	40%

of and within can be passed as keyword arguments to the metric transform, e.g. m.revenue.percent(within=[d.Year]).

Lets's write a function that builds a typical query for us. We'll give it the metric ID, and it will display that metric as a percentage by Year and Quarter.

In [11]:

            
                Copied!
                
                    
                    
                
                

        
def percentage_of_quarter(metric_id: str):
    metric = project.metrics[metric_id]
    return (
        project.select(metric.percent(within=[project.d.Year]))
        .by(project.d.Year, project.d.Quarter)
    )

percentage_of_quarter("items_sold")
def percentage_of_quarter(metric_id: str):
    metric = project.metrics[metric_id]
    return (
        project.select(metric.percent(within=[project.d.Year]))
        .by(project.d.Year, project.d.Quarter)
    )

percentage_of_quarter("items_sold")

Out[11]:

	Year	Quarter	Number of Items Sold (%)
0	2009	Q1 2009	25%
1	2009	Q2 2009	25%
2	2009	Q3 2009	25%
3	2009	Q4 2009	25%
4	2010	Q1 2010	25%
5	2010	Q2 2010	25%
6	2010	Q3 2010	25%
7	2010	Q4 2010	25%
8	2011	Q1 2011	26%
9	2011	Q2 2011	26%
10	2011	Q3 2011	26%
11	2011	Q4 2011	23%
12	2012	Q1 2012	26%
13	2012	Q2 2012	26%
14	2012	Q3 2012	23%
15	2012	Q4 2012	26%
16	2013	Q1 2013	24%
17	2013	Q2 2013	25%
18	2013	Q3 2013	26%
19	2013	Q4 2013	26%