More OLAP

This post is a follow up to Crunching Numbers in APL available for kindle at Amazon.

Undaunted by my last foray into online analytical processing (OLAP), I sat down to code what I thought was OLAP. Workspace more_olap is here.

Should you load this workspace, you will find the following variables:

irdb The balance sheets for International Bank for Reconstruction and Development for the years 2010 and 2011. cats, subs, yrs Tables of categories compiled from irdb facts An OLAP cube of facts compiled from irdb. This cube has two dimensions, subcategories and years. Each cell of the cube contains the facts in irdb for that combination of subcategory and year.

A fact in this exercise is the line item and amount columns from irdb.

      utl∆numberedArray ⊃ irdb[1;]
[001] Category code
[002] Category
[003] Subcategory code
[004] Subcategory
[005] Line item
[006] Fiscal year
[007] Amount (millions of dollars)

The function olap∆buildFacts loads the irdb data and produces all of these variables. You should also consider olap∆buildVars, which will produce variables to use as indices of facts.

      olap∆buildVars cats subs yrs
      )vars cat_
cat_a   cat_e   cat_l
      )vars sub_
sub_b       sub_cs  sub_da  sub_dfb     sub_dl      sub_i       sub_lo  sub_nn
sub_o       sub_oa  sub_oe  sub_ol      sub_orcv    sub_rcv     sub_re  sub_s
sub_sol
      )vars yr_
yr_2009     yr_2010

We now have some variables to use as indices of our fact cube and can look at our cube:

      olap∆combineFacts facts[cat_e;yr_2010]
 Paid-in capital                      11492
 Retained earnings                    28793
 Accumulated other comprehensive loss ¯3043

      ⍝ Or
      +/(olap∆combineFacts facts[cat_e;yr_2010])[;2]
37242

I still haven’t concluded that it’s easier than this:

SELECT Line_item, Amount from irdb where Category_code = 'e'
and Fiscal_year = 2010;

But I’m biased. While I’ve been writing APL code longer, I’ve spent more time writing SQL.

This is a simple exercise with simple data. It demonstrates what a data cube might look like and how to simplify slicing and dicing the data. There is no generalized code in this workspace, and therefore I must write a whole new workspace when I want to analyze a new data set.

I’m reminded of the weeks I spent designing a gross margin reporting system for a manufacturer. This company had several product lines and three departments. It had detailed time reports from the factory floor, so that I knew how much labor cost was incurred by product line and by department. It had a perpetual inventory system, so that I knew what material had been drawn from raw material inventory and production counts for each department. This allowed me to construct a model of the manufacturing process and estimates of costs incurred through each step in that process.

I’d like to get my hands on that long-lost data and see if a data cube would simplify anything.