This post is a follow up to Crunching Numbers in APL available for kindle at Amazon.
Undaunted by my last foray into online analytical processing (OLAP), I sat down to code what I thought was OLAP. Workspace more_olap is here.
Should you load this workspace, you will find the following variables:
irdb The balance sheets for International Bank for Reconstruction and Development for the years 2010 and 2011. cats, subs, yrs Tables of categories compiled from irdb facts An OLAP cube of facts compiled from irdb. This cube has two dimensions, subcategories and years. Each cell of the cube contains the facts in irdb for that combination of subcategory and year.A fact in this exercise is the line item and amount columns from irdb.
utl∆numberedArray ⊃ irdb[1;] [001] Category code [002] Category [003] Subcategory code [004] Subcategory [005] Line item [006] Fiscal year [007] Amount (millions of dollars)
The function olap∆buildFacts loads the irdb data and produces all of these variables. You should also consider olap∆buildVars, which will produce variables to use as indices of facts.
olap∆buildVars cats subs yrs )vars cat_ cat_a cat_e cat_l )vars sub_ sub_b sub_cs sub_da sub_dfb sub_dl sub_i sub_lo sub_nn sub_o sub_oa sub_oe sub_ol sub_orcv sub_rcv sub_re sub_s sub_sol )vars yr_ yr_2009 yr_2010
We now have some variables to use as indices of our fact cube and can look at our cube:
olap∆combineFacts facts[cat_e;yr_2010] Paid-in capital 11492 Retained earnings 28793 Accumulated other comprehensive loss ¯3043 ⍝ Or +/(olap∆combineFacts facts[cat_e;yr_2010])[;2] 37242
I still haven’t concluded that it’s easier than this:
SELECT Line_item, Amount from irdb where Category_code = 'e' and Fiscal_year = 2010;
But I’m biased. While I’ve been writing APL code longer, I’ve spent more time writing SQL.
This is a simple exercise with simple data. It demonstrates what a data cube might look like and how to simplify slicing and dicing the data. There is no generalized code in this workspace, and therefore I must write a whole new workspace when I want to analyze a new data set.
I’m reminded of the weeks I spent designing a gross margin reporting system for a manufacturer. This company had several product lines and three departments. It had detailed time reports from the factory floor, so that I knew how much labor cost was incurred by product line and by department. It had a perpetual inventory system, so that I knew what material had been drawn from raw material inventory and production counts for each department. This allowed me to construct a model of the manufacturing process and estimates of costs incurred through each step in that process.
I’d like to get my hands on that long-lost data and see if a data cube would simplify anything.