OLAP stands for OnLine Analytical Processing. This post describes why
I’m not ready to write about OLAP.
I’ve been reading various web pages about OLAP and have reached two
conclusions. First, the demand for OLAP is driven by SQL commands’
complexity, which can arise when the programmer is querying complex
databases. Nontechnical users stumble badly in that environment, and
even correct queries can take too long to execute.
Second, the processing is done on a new non-SQL database designed to
make querying easier and processing time faster. Generally, this means
an underlying SQL database is kept to record transactions, and an OLAP
database is updated periodically from the SQL database. The OLAP
database usually is an array of facts determined by the SQL queries.
Each dimension of the array is a fact attribute for which aggregate data
may be sought.
I’ve been struggling to a third conclusion, that I can reach a better
understanding of the data by investigating why and how the data was
compiled than by constructing complicated SQL queries.
I found an open-source OLAP application, Cubes, written in
python([[http://cubes.databrewery.org/]]). I downloaded the application
and started on the tutorial. Step one, called Hello World, constructs a
data cube from balance sheets of the International Bank for
Reconstruction and Development for the years 2010 and 2011.
I was planning to code an OLAP database in APL, so rather than following
the tutorial, I just loaded the supplied csv file into APL.
ibrd←import∆file 'Downloads/IBRD_Balance_Sheet__FY2010.csv' ⍴ibrd 63 7
That didn’t look like a lot of data, so I displayed some of it:
ibrd[⍳5;] Category Code Category Subcategory Code Subcategory Line Item Fiscal Year Amount (US$, Millions) a Assets dfb Due from Banks Unrestricted currencies 2010 1581 a Assets dfb Due from Banks Unrestricted currencies 2009 2380 a Assets dfb Due from Banks Currencies subject to restriction 2010 222 a Assets dfb Due from Banks Currencies subject to restriction 2009 664
My seven columns:
1. Category code
2. Category
3. Subcategory code
4. Subcategory
5. Description (called line item above)
6. Fiscal year
7. Amount (in $US millions)
I concluded that our cube should have two dimensions, description and
year. Each fact (cell in the array) should be made up of a description
and an amount. The category Descriptions is a hierarchy of category,
subcategory and item.
With that in mind I dived off the cliff and started writing APL queries.
Question one always is whether the debits equal the credits, or in this
case whether total assets equal total liabilities plus total equities.
I needed to confirm that both years and amounts were in fact numbers:
utl∆numberp ¨ 1 0↓ibrd[⍳10;6 7] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Yes.
I knew I’d get tired of the column heads, so I copied the array without
them.
db←1 0 ↓ibrd
I also determined the universe of category codes to simplify my next
query.
db[;1] a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a l l l l l l l l l l l l l l l l l l l l l l e e e e e e e e
I concluded that *a* means assets, *l* means liabilities, and *e* means
equity. My queries:
+/(∊db[;1]='a')/db[;7] 558430 +/(∊db[;1]='l')/db[;7] 480838 +/(∊db[;1]='e')/db[;7] 77592 480838 + 77592 558430
I started planning my next query and was curious: what descriptions
describe each fact?
⍞←⎕tc[3] utl∆join (db[;6]=2010)/db[;5] Unrestricted currencies Currencies subject to restriction Trading Securities purchased under resale agreements Nonnegotiable, nonintrest-bearing demand obligations on account of subscribed capital Investments Client operations Borrowings Other Receivables to maintain value of currency holdings on account of subscribed capital Receivables from investment securities traded Accrued income on loans Net loans outstanding Assets under retirement benefit plans Premises and equipment (net) Miscellaneous All Securities Sold under Repurchase Agreements, Securities Lent under Securities Lending Agreements, and Payable for Cash Collateral Received Investments Client Operations Borrowings Other Payable to Maintain Value of Currency Holdings on Account of Subscribed Capital Payable for investment securities purchased Accrued charges on borrowings Liabilities under retirement benefit plans Accounts payable and misc liabilities Paid-in capital Deferred Amounts to Maintain Value of Currency Holdings Retained Earnings Accumulated Other Comprehensive Loss
I stopped.
I am not a bank accountant, so I must open the accounting rulebook and
read it cover to cover before I try to read banks’ financial statements.
Let’s consider a few facts. Accumulated other comprehensive loss is an
equity account that accumulates unrealized gains and losses. What they
might be is in the financial statements. We have just one page.
The subcategories make matters worse.
⍞←⎕tc[3] utl∆join (db[;6]=2010)/db[;4] Due from Banks Due from Banks Investments Securities Nonnegotiable Derivative Assets Derivative Assets Derivative Assets Derivative Assets Receivables Other Receivables Other Receivables Loans Outstanding Other Assets Other Assets Other Assets Borrowings Sold or Lent Derivative Liabilities Derivative Liabilities Derivative Liabilities Derivative Liabilities Other Other Liabilities Other Liabilities Other Liabilities Other Liabilities Capital Stock Deferred Amounts Retained Earnings Other
When the Financial Accounting Standards Board issued guidance on
derivatives, I passed. I could do better in Atlantic City than with
derivatives, and so how to account for them was irrelevant. It certainly
seemed probable that these liabilities give rise to some of the
accumulated other comprehensive losses.
So after I complete my study of the accounting rules, I need to digest
the footnotes to the financial statements to get some understanding of
four different kinds of derivatives.
I have yet to extract an analysis of the facts I have. And that’s why
I’m not writing about OLAP…yet.
This post is a follow up to *Crunching Numbers in APL* available
for kindle at Amazon.All of the examples presented here used GNU APL.