This post is a follow up to *Crunching Numbers in APL* available for kindle at Amazon.
Undaunted by my last foray into online analytical processing (OLAP), I sat down to code what I thought was OLAP. Workspace more_olap is here.
Should you load this workspace, you will find the following variables:
A fact in this exercise is the line item and amount columns from irdb.
irdb is made up of these columns:
utl∆numberedArray ⊃ irdb[1;] [001] Category code [002] Category [003] Subcategory code [004] Subcategory [005] Line item [006] Fiscal year [007] Amount (millions of dollars)
The function olap∆buildFacts loads the irdb data and produces all of these variables. You should also consider olap∆buildVars, which will produce variables to use as indices of facts.
olap∆buildVars cats subs yrs )vars cat_ cat_a cat_e cat_l )vars sub_ sub_b sub_cs sub_da sub_dfb sub_dl sub_i sub_lo sub_nn sub_o sub_oa sub_oe sub_ol sub_orcv sub_rcv sub_re sub_s sub_sol )vars yr_ yr_2009 yr_2010
We now have some variables to use as indices of our fact cube and can look at our cube:
olap∆combineFacts facts[cat_e;yr_2010] Paid-in capital 11492 Retained earnings 28793 Accumulated other comprehensive loss ¯3043 ⍝ Or +/(olap∆combineFacts facts[cat_e;yr_2010])[;2] 37242
I still haven’t concluded that it’s easier than this:
SELECT Line_item, Amount from irdb where Category_code = 'e' and Fiscal_year = 2010;
But I’m biased. While I’ve been writing APL code longer, I’ve spent more time writing SQL.
This is a simple exercise with simple data. It demonstrates what a data cube might look like and how to simplify slicing and dicing the data. There is no generalized code in this workspace, and therefore I must write a whole new workspace when I want to analyze a new data set.
I’m reminded of the weeks I spent designing a gross margin reporting system for a manufacturer. This company had several product lines and three departments. It had detailed time reports from the factory floor, so that I knew how much labor cost was incurred by product line and by department. It had a perpetual inventory system, so that I knew what material had been drawn from raw material inventory and production counts for each department. This allowed me to construct a model of the manufacturing process and estimates of costs incurred through each step in that process.
I’d like to get my hands on that long-lost data and see if a data cube would simplify anything.