Three Routes: SQL LEAD / DAX EARLIER / Pandas Shift

Simi Talkar
6 min read · Nov 22, 2020

Relevant Scenario

In a company that handles leases, one of the rates that accounting applies to generate journal entries is the depreciation rate. The accounting team has been tasked with updating the depreciation rates for assets in various countries beginning January 01, 2020. While they can insert the new rates and set their effective date from January 01, 2020, the team also has to cap the previous rates, denoting that the stored historic rates are effective only until a second before midnight of December 31, 2019.

This is where the analytic LEAD function in SQL steps in. It can be used to look into the rows ahead (when rows are ordered in ascending order of effective date) and retrieve the start (effective from date) of the next rate validity window.

Data tables

IMPORTANT TO NOTE: The data is fabricated for demo purposes and should not be considered in real-world business applications.

A small subset of the historic data* (please see note above) in DEPRECIATION_RATE is shown below. The columns are ASSET, COUNTRY, RATE, EFFECTIVE_FROM and EFFECTIVE_UNTIL. There are rows for two asset types in each of two countries, all of which are effective from Jan 01, 2019 until new rates arrive. This is indicated by a blank/(null) EFFECTIVE_UNTIL date. When new rates arrive, these records will need to be capped (their EFFECTIVE_UNTIL will be set to a second before the new rate’s EFFECTIVE_FROM date).

DEPRECIATION_RATE

ASSET        COUNTRY  RATE  EFFECTIVE_FROM  EFFECTIVE_UNTIL
Automobiles  Andorra  12.5  2019-01-01      (null)
Building     Andorra   4.5  2019-01-01      (null)
Automobiles  Brunei   14.0  2019-01-01      (null)
Building     Brunei    6.5  2019-01-01      (null)

Goal:

Once new rates have been reported to the IT team, they have to be entered along with their dates of effectiveness. At the same time, the old rate is flagged as no longer effective by setting its EFFECTIVE_UNTIL to one second prior to the new rate’s EFFECTIVE_FROM, as seen below.

FINAL RATE TABLE DATA

ASSET        COUNTRY  RATE        EFFECTIVE_FROM  EFFECTIVE_UNTIL
Automobiles  Andorra  12.5        2019-01-01      2019-12-31 23:59:59
Building     Andorra   4.5        2019-01-01      2019-12-31 23:59:59
Automobiles  Brunei   14.0        2019-01-01      2020-01-02 23:59:59
Building     Brunei    6.5        2019-01-01      2020-01-02 23:59:59
Automobiles  Andorra  (new rate)  2020-01-01      (null)
Building     Andorra  (new rate)  2020-01-01      (null)
Automobiles  Brunei   (new rate)  2020-01-03      (null)
Building     Brunei   (new rate)  2020-01-03      (null)

Technique in SQL

Oracle

The historic data in DEPRECIATION_RATE is shown in the table above. It can be set up as seen in the Setup Data section below.

The new rates to be entered are inserted as seen in the SQL Developer screenshot. You may have noticed that the rates for Brunei change effective January 3rd, 2020, whereas the new rates for Andorra are effective from January 1st, 2020. The 2019 rates for both countries need to be capped, but while the 2019 rates for Andorra will be effective until a second before midnight of December 31st, 2019, the 2019 rates for Brunei will be effective until a second before midnight of January 2nd, 2020.
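The screenshot itself is not reproduced here; the statements below are a sketch of those inserts. The rate values are hypothetical, while the effective dates follow the text above:

-- Rate values are illustrative only; effective dates are as described above.
insert into depreciation_rate (asset, country, rate, effective_from, effective_until)
values ('Automobiles', 'Andorra', 13.0, to_date('2020-01-01', 'YYYY-MM-DD'), null);
insert into depreciation_rate (asset, country, rate, effective_from, effective_until)
values ('Building', 'Andorra', 5.0, to_date('2020-01-01', 'YYYY-MM-DD'), null);
insert into depreciation_rate (asset, country, rate, effective_from, effective_until)
values ('Automobiles', 'Brunei', 15.0, to_date('2020-01-03', 'YYYY-MM-DD'), null);
insert into depreciation_rate (asset, country, rate, effective_from, effective_until)
values ('Building', 'Brunei', 7.0, to_date('2020-01-03', 'YYYY-MM-DD'), null);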

Oracle has a MERGE statement that, just like CTEs, makes the logic of an update, insert or delete easy to break down. The USING clause creates the values we will use in our update. The outer table is DEPRECIATION_RATE aliased as ‘d’. The inner table is the same DEPRECIATION_RATE table aliased as ‘n’. The ASSET, COUNTRY and EFFECTIVE_FROM columns are selected in ‘n’ to identify the rows to be joined on. When the join columns match, the EFFECTIVE_UNTIL of the outer ‘d’ table is set to the EFFECTIVE_UNTIL created by the LEAD function in the inner ‘n’ table.

The LEAD analytic function partitions the rows into groups with the same ASSET and COUNTRY and orders each group by the EFFECTIVE_FROM date in ascending order. By default this function reaches forward by 1 row (this default can be changed) within each group and retrieves the EFFECTIVE_FROM value in that row. And so for the rows with year 2019 (the historic year), the next row in each group will contain the effective from date (with year 2020) of the new rates. Since the cap must occur a second before midnight, we subtract 1/(24*60*60) of a day from the next EFFECTIVE_FROM date.

In PostgreSQL, we cannot avail of MERGE, so we must identify the row to update by its CTID (PostgreSQL’s physical row identifier, analogous to Oracle’s ROWID). Here is the code for the two databases.
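The original statements appear as images in the article; the versions below are a sketch reconstructed from the description above. The WHERE conditions restrict the update to the open-ended historic rows, and in the PostgreSQL version the effective columns are assumed to be timestamps so the one-second offset is preserved.

Oracle:

MERGE INTO depreciation_rate d
USING (
    SELECT asset,
           country,
           effective_from,
           LEAD(effective_from) OVER (
               PARTITION BY asset, country
               ORDER BY effective_from
           ) - 1/(24*60*60) AS new_effective_until  -- one second before the next window
    FROM depreciation_rate
) n
ON (d.asset = n.asset
    AND d.country = n.country
    AND d.effective_from = n.effective_from)
WHEN MATCHED THEN
    UPDATE SET d.effective_until = n.new_effective_until
    WHERE d.effective_until IS NULL            -- only cap open-ended rows
      AND n.new_effective_until IS NOT NULL;   -- only rows that have a successor

PostgreSQL:

UPDATE depreciation_rate d
SET effective_until = n.new_effective_until
FROM (
    SELECT ctid,
           LEAD(effective_from) OVER (
               PARTITION BY asset, country
               ORDER BY effective_from
           ) - INTERVAL '1 second' AS new_effective_until
    FROM depreciation_rate
) n
WHERE d.ctid = n.ctid                          -- join on the physical row id
  AND d.effective_until IS NULL
  AND n.new_effective_until IS NOT NULL;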

This results in the capping of the EFFECTIVE_UNTIL for the prior rate, as seen in the Goal section in the picture captioned “FINAL RATE TABLE DATA”.

Technique in Power BI

The rate files are typically provided as Excel worksheets and can be imported using Get Data (from Excel). At the point seen below, we have both the historic and the freshly supplied rates in the model.

We will now create an EFFECTIVE_UNTIL calculated column in the model. A very powerful way to partition in DAX is to use the EARLIER function. It ropes off the rows with the same country and asset but an EFFECTIVE_FROM date greater than that of the row in the current row context. If this filtered table has rows (that is, if we have entered new rates), then those rows carry the new EFFECTIVE_FROM date. The latest of these is picked with the MAX function, and one second is subtracted from it by creating a time delta with TIME(0, 0, 1).

Calculated column Effective Until

Formula for the calculated column:
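The formula image is not reproduced here; below is a minimal sketch, assuming the model table is named DEPRECIATION_RATE. With a single newer rate per country/asset pair, MAXX picks that rate’s EFFECTIVE_FROM as described above (were several future windows possible, MINX would pick the immediately next one).

EFFECTIVE_UNTIL =
VAR NewerRows =
    // Rows with the same country and asset whose EFFECTIVE_FROM lies after
    // the current row's (EARLIER reaches back to the outer row context)
    FILTER (
        DEPRECIATION_RATE,
        DEPRECIATION_RATE[COUNTRY] = EARLIER ( DEPRECIATION_RATE[COUNTRY] )
            && DEPRECIATION_RATE[ASSET] = EARLIER ( DEPRECIATION_RATE[ASSET] )
            && DEPRECIATION_RATE[EFFECTIVE_FROM] > EARLIER ( DEPRECIATION_RATE[EFFECTIVE_FROM] )
    )
RETURN
    IF (
        COUNTROWS ( NewerRows ) > 0,
        // cap one second before the next window opens
        MAXX ( NewerRows, DEPRECIATION_RATE[EFFECTIVE_FROM] ) - TIME ( 0, 0, 1 )
    )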

Below is the result of applying the formula above to create the calculated column.

Result of the formula in the calculated column

Technique in Pandas

The historic data was created as shown in the Setup Data section below. The lists of columnar data are zipped to be served up as a list of tuples to the DataFrame constructor. The rate changes for each country, with the associated effective dates, were applied using the append function of the pandas DataFrame.

The key code in the above snippet is again the partitioning, achieved by grouping rows by the combination of country and asset. But here, note the shift being applied to grab the EFFECTIVE_FROM value from the next row in a group and store it in the current row. Since our rows are already sorted, this brings the next rate’s EFFECTIVE_FROM into the prior rate’s EFFECTIVE_UNTIL.

Bringing the code out from the above snippet for a deep dive:

depreciation_rate_df['effective_until'] = (
    depreciation_rate_df.groupby(['country', 'asset'])['effective_from'].shift(-1)
)

We then subtract one second from this date, if a capping date exists, using a lambda function in the line after it.

depreciation_rate_df['effective_until'] = depreciation_rate_df['effective_until'].apply(
    lambda x: x - datetime.timedelta(seconds=1) if not pd.isnull(x) else None
)

The result is as seen in the Goal section above.

Setup Data

The data was set up in Oracle using the CREATE TABLE and INSERT statements provided below:

create table depreciation_rate
(
asset varchar2(100) NOT NULL,
country varchar2(100) NOT NULL,
rate NUMBER(5,2) NOT NULL,
effective_from date,
effective_until date
);
insert into depreciation_rate (asset, country, rate, effective_from, effective_until)
values ('Automobiles', 'Andorra', 12.5, to_date('2019-01-01', 'YYYY-MM-DD'), null);
insert into depreciation_rate (asset, country, rate, effective_from, effective_until)
values ('Building', 'Andorra', 4.5, to_date('2019-01-01', 'YYYY-MM-DD'), null);
insert into depreciation_rate (asset, country, rate, effective_from, effective_until)
values ('Automobiles', 'Brunei', 14, to_date('2019-01-01', 'YYYY-MM-DD'), null);
insert into depreciation_rate (asset, country, rate, effective_from, effective_until)
values ('Building', 'Brunei', 6.5, to_date('2019-01-01', 'YYYY-MM-DD'), null);

In Pandas, the depreciation_rate dataframe was created using the DataFrame constructor:

import pandas as pd
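import datetime  # needed for the timedelta used when capping

# The remainder of the original snippet appears as an image in the article; the
# lines below are a sketch consistent with the description in the Technique in
# Pandas section. New-rate values are hypothetical; the effective dates follow
# the article (Andorra from 2020-01-01, Brunei from 2020-01-03).

# Historic rates: columnar lists zipped into row tuples for the DataFrame constructor.
assets = ['Automobiles', 'Building', 'Automobiles', 'Building']
countries = ['Andorra', 'Andorra', 'Brunei', 'Brunei']
rates = [12.5, 4.5, 14.0, 6.5]
effective_from = [pd.Timestamp('2019-01-01')] * 4

depreciation_rate_df = pd.DataFrame(
    list(zip(assets, countries, rates, effective_from)),
    columns=['asset', 'country', 'rate', 'effective_from'],
)

# Append the new rates; newer pandas versions would use pd.concat instead of append.
new_rates_df = pd.DataFrame(
    [
        ('Automobiles', 'Andorra', 13.0, pd.Timestamp('2020-01-01')),
        ('Building', 'Andorra', 5.0, pd.Timestamp('2020-01-01')),
        ('Automobiles', 'Brunei', 15.0, pd.Timestamp('2020-01-03')),
        ('Building', 'Brunei', 7.0, pd.Timestamp('2020-01-03')),
    ],
    columns=['asset', 'country', 'rate', 'effective_from'],
)
depreciation_rate_df = depreciation_rate_df.append(new_rates_df, ignore_index=True)

# Sort so that shift(-1) within each country/asset group sees the next effective window.
depreciation_rate_df = depreciation_rate_df.sort_values(
    ['country', 'asset', 'effective_from']
).reset_index(drop=True)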

A final thought:

I appreciate suggestions to implement the functionality detailed in my article in other efficient ways. Please feel free to touch base with comments and questions.


Simi Talkar

Certified DS Associate (DP-100, DA-100), pursuing a Master’s in the University of Michigan’s Applied Data Science Program. https://www.linkedin.com/in/simi-talkar/