I looked at many PQLs that can be simplified if they know about NULL handling well. Today I would like to tell how to handle NULL efficiently in Aggregation Functions (COUNT
,SUM
,AVG
etc.).
Today I would like to use O2C process to explain my case. I determined KPI Send Invoice within a day after Ship Goods
, because sales company will Ship Goods then should Send Invoice immediately.
Same as previous posts, first I would like to create OLAP table to look into the cases. As below screenshot, I determined _CASE_KEY
(sales order item), Customer
for classifying cases later, timestamp of Ship Goods
and Send Inovice
activities at first. Timestamps are extracted by PU_FIRST( VBAP, _CEL_O2C_ACTIVITIES.EVENTTIME, _CEL_O2C_ACTIVITIES.ACTIVITY_EN = 'Ship Goods')
etc. Timestamp columns are NULL
(displayed as -
) if there is no record fullfills Pull up filter condition, that means no specific activity in that case.
Next column days between Ship and Invoice
is time difference between two activities, as below PQL (This PQL is frequently used so I registered saved formula days_between_ship_and_invoice
). Screenshot shows that if either of timestamps are NULL
then time difference is also NULL
. In other word, time difference value can only be calculated if both timestamps exists.
-- days_between_ship_and_invoice
DATEDIFF(
dd, -- calculate time difference as day
PU_FIRST(
VBAP,
_CEL_O2C_ACTIVITIES.EVENTTIME,
_CEL_O2C_ACTIVITIES.ACTIVITY_EN = 'Ship Goods'
),
PU_FIRST(
VBAP,
_CEL_O2C_ACTIVITIES.EVENTTIME,
_CEL_O2C_ACTIVITIES.ACTIVITY_EN = 'Send Invoice'
)
)
The last column Within a day
(it is also registered as within_a_day
formula) is judging if time difference is less than 1 day by PQL CASE WHEN KPI(days_between_ship_and_invoice) < 1 THEN 1 ELSE 0 END
. If days_between_ship_and_invoice
is NULL
, then comparing result between days_between_ship_and_invoice
and 1
is also NULL
.
To summarize above discussion, within_a_day
formula returns
1
if both activities exists and time difference is less than 1 day0
if both actiivties exists and time difference is more than 1 dayNULL
if either activity is not found
Next step is to group Within a day
column (dimension) by Customer using aggregation functions.
- Invoice Count :
COUNT(KPI(within_a_day))
ignoresNULL
record and count the rest of record. - Within a day Count :
SUM(KPI(within_a_day))
ignoresNULL
record and sum up the rest of record. - Within a day Ratio :
AVG(KPI(within_a_day))
is same asSUM(KPI(within_a_day)) / COUNT(KPI(within_a_day))
, ignoringNULL
record count from denominator (COUNT
) and calculate avarage of the rest of record.
Especially for calculating ratio (AVG
) you need to take care denominator (COUNT
). Below screenshot shows example of one Customer that have 16 record including 6 NULL
record, so denominator of ratio is 10.
In this series of calculation I assume that I can ignore the case that does not have Send Invoice
. If I think differently that no Send Invoice
means forgetting to send invoice, of cource it takes more than 1 day and include Ratio calculation.
To implement this considering NULL handling, I will minimumly change days_between_ship_and_invoice
formula to set default value (today) instead of NULL
as below PQL. Below screenshot is the result of this PQL change, changing denominator from 10 to 13.
DATEDIFF(
dd,
PU_FIRST(
VBAP,
_CEL_O2C_ACTIVITIES.EVENTTIME,
_CEL_O2C_ACTIVITIES.ACTIVITY_EN = 'Ship Goods'
),
COALESCE( -- set default value if PU_FIRST result is NULL
PU_FIRST(
VBAP,
_CEL_O2C_ACTIVITIES.EVENTTIME,
_CEL_O2C_ACTIVITIES.ACTIVITY_EN = 'Send Invoice'),
TODAY() -- default value is today
)
)
Kaz
This post’s program can be downloaded here then push to your environment by content-cli.