This post is a follow up to Crunching Numbers in APL available for kindle at Amazon.
Undaunted by my last foray into online analytical processing (OLAP), I sat down to code what I thought was OLAP. Workspace more_olap is here.
Should you load this workspace, you will find the following variables:
irdb The balance sheets for International Bank for Reconstruction and Development for the years 2010 and 2011. cats, subs, yrs Tables of categories compiled from irdb facts An OLAP cube of facts compiled from irdb. This cube has two dimensions, subcategories and years. Each cell of the cube contains the facts in irdb for that combination of subcategory and year.A fact in this exercise is the line item and amount columns from irdb.
utl∆numberedArray ⊃ irdb[1;]
[001] Category code
[002] Category
[003] Subcategory code
[004] Subcategory
[005] Line item
[006] Fiscal year
[007] Amount (millions of dollars)
The function olap∆buildFacts loads the irdb data and produces all of these variables. You should also consider olap∆buildVars, which will produce variables to use as indices of facts.
olap∆buildVars cats subs yrs
)vars cat_
cat_a cat_e cat_l
)vars sub_
sub_b sub_cs sub_da sub_dfb sub_dl sub_i sub_lo sub_nn
sub_o sub_oa sub_oe sub_ol sub_orcv sub_rcv sub_re sub_s
sub_sol
)vars yr_
yr_2009 yr_2010
We now have some variables to use as indices of our fact cube and can look at our cube:
olap∆combineFacts facts[cat_e;yr_2010]
Paid-in capital 11492
Retained earnings 28793
Accumulated other comprehensive loss ¯3043
⍝ Or
+/(olap∆combineFacts facts[cat_e;yr_2010])[;2]
37242
I still haven’t concluded that it’s easier than this:
SELECT Line_item, Amount from irdb where Category_code = 'e' and Fiscal_year = 2010;
But I’m biased. While I’ve been writing APL code longer, I’ve spent more time writing SQL.
This is a simple exercise with simple data. It demonstrates what a data cube might look like and how to simplify slicing and dicing the data. There is no generalized code in this workspace, and therefore I must write a whole new workspace when I want to analyze a new data set.
I’m reminded of the weeks I spent designing a gross margin reporting system for a manufacturer. This company had several product lines and three departments. It had detailed time reports from the factory floor, so that I knew how much labor cost was incurred by product line and by department. It had a perpetual inventory system, so that I knew what material had been drawn from raw material inventory and production counts for each department. This allowed me to construct a model of the manufacturing process and estimates of costs incurred through each step in that process.
I’d like to get my hands on that long-lost data and see if a data cube would simplify anything.
