Maintain Saved Formulas effectively

In the last post Handle NULL efficiently in Aggregation Function, I used saved formulas to split long and complex PQL to reusable components. Today I would like to share my best practice to use saved formulas.

For example I would like to ananlyze throughput time between arbitrary two activities. As below screenshot I created three dropdown buttons, switching time unit (sec, min, hour, day etc.) and two activities (from / to). Variables (time_unit, from, to) are set by user action and passed to PQL for calculating other components relevant to throughtput time (single KPI, histogram, boxplot, drill down table, and fact table).

In each component it is possible to write down throughput time PQL like below. But if I need to change something in PQL, I need to open each component and maintain it multiple times. So I will try to create saved formulas and simplify each component.

DATEDIFF(
    <%=time_unit%>,
    PU_FIRST(
        EKPO,
        _CEL_P2P_ACTIVITIES_EN.EVENTTIME,
        _CEL_P2P_ACTIVITIES_EN.ACTIVITY_EN = <%=from%>
    ),
    PU_FIRST(
        EKPO,
        _CEL_P2P_ACTIVITIES_EN.EVENTTIME,
        _CEL_P2P_ACTIVITIES_EN.ACTIVITY_EN = <%=to%>
    )
)

First I focused on the PU_FIRST portion. These functions can be summarized to formula first_time as below.

--first_time
PU_FIRST(
    EKPO,
    _CEL_P2P_ACTIVITIES_EN.EVENTTIME,
    _CEL_P2P_ACTIVITIES_EN.ACTIVITY_EN = {p1}
)

Then original PQL is changed to below.

DATEDIFF(
    <%=time_unit%>,
    KPI(first_time,VARIABLE(<%=from%>)),    -- call first_time formula passing parameter 'from'
    KPI(first_time,VARIABLE(<%=to%>))       -- call first_time formula passing parameter 'to'
)

This is something better than previous PQL, but let’s encapsulate the DATEDIFF portion. I will create second formula throughput_time as below.

--throughput_time
DATEDIFF(
    {p1},
    KPI(first_time,VARIABLE({p2})),
    KPI(first_time,VARIABLE({p3}))
)

Finally original PQL is changed to KPI(throughput_time, VARIABLE(<%=time_unit%>), VARIABLE(<%=from%>), VARIABLE(<%=to%>)). It means each component is just call throughput_time with three parameters.

By the way, single KPI and drill down table are aggregating throughput time by AVG and COUNT function. In my opinion it is not required to create one more formula of each aggregation function. My focus is replacement of complex ‘dimension’ determination to formula (Exceptionally, I would like to create formula for adding multiple aggregation function case, like Calculate Multi Dimensional KPIs post). In the end each component are determined like below.

Title	Component	PQL
Avg Throughput Time	single KPI	`AVG(KPI(throughput_time, VARIABLE(<%=time_unit%>), VARIABLE(<%=from%>), VARIABLE(<%=to%>)))`
Case Count	single KPI	`COUNT(KPI(throughput_time, VARIABLE(<%=time_unit%>), VARIABLE(<%=from%>), VARIABLE(<%=to%>)))`
Distribution	histogram	`KPI(throughput_time, VARIABLE(<%=time_unit%>), VARIABLE(<%=from%>), VARIABLE(<%=to%>))`
Time Trend	boxplot	same as histogram
Drill Down by vendor	OLAP table	same as single KPI
Fact	OLAP table	same as histogram

One more tips, I recommend to determine multi layered formualas as above because I can use interim result in verification purpose. In the right lower fact sheet, columns of from / to timestamp are displayed then KPI(first_time,VARIABLE(<%=from%>)) and KPI(first_time,VARIABLE(<%=to%>)) are assigned. So that it is easy to verify throughput time calculation of each case.

To summarize today’s discussion,

decomposite complex dimensional PQL to multi layered formulas
assign variable to formula’s parameters (not directly assign variable in the formula)
not required to create formula for simple aggregation

Kaz

This post’s program can be downloaded here then push to your environment by content-cli.

See also