OLAP

OLAP stands for OnLine Analytical Processing. This post describes why
I’m not ready to write about OLAP.

I’ve been reading various web pages about OLAP and have reached two
conclusions. First, the demand for OLAP is driven by SQL commands’
complexity, which can arise when the programmer is querying complex
databases. Nontechnical users stumble badly in that environment, and
even correct queries can take too long to execute.

Second, the processing is done on a new non-SQL database designed to
make querying easier and processing time faster. Generally, this means
an underlying SQL database is kept to record transactions, and an OLAP
database is updated periodically from the SQL database. The OLAP
database usually is an array of facts determined by the SQL queries.
Each dimension of the array is a fact attribute for which aggregate data
may be sought.

I’ve been struggling to a third conclusion, that I can reach a better
understanding of the data by investigating why and how the data was
compiled than by constructing complicated SQL queries.

I found an open-source OLAP application, Cubes, written in
python([[http://cubes.databrewery.org/]]). I downloaded the application
and started on the tutorial. Step one, called Hello World, constructs a
data cube from balance sheets of the International Bank for
Reconstruction and Development for the years 2010 and 2011.

I was planning to code an OLAP database in APL, so rather than following
the tutorial, I just loaded the supplied csv file into APL.

      ibrd←import∆file 'Downloads/IBRD_Balance_Sheet__FY2010.csv'
      ⍴ibrd
63 7

That didn’t look like a lot of data, so I displayed some of it:

      ibrd[⍳5;]
 Category Code Category Subcategory Code Subcategory    Line Item                         Fiscal Year Amount (US$, Millions) 
 a             Assets   dfb              Due from Banks Unrestricted currencies                  2010                   1581 
 a             Assets   dfb              Due from Banks Unrestricted currencies                  2009                   2380 
 a             Assets   dfb              Due from Banks Currencies subject to restriction        2010                    222 
 a             Assets   dfb              Due from Banks Currencies subject to restriction        2009                    664 

My seven columns:

1. Category code
2. Category
3. Subcategory code
4. Subcategory
5. Description (called line item above)
6. Fiscal year
7. Amount (in $US millions)

I concluded that our cube should have two dimensions, description and
year. Each fact (cell in the array) should be made up of a description
and an amount. The category Descriptions is a hierarchy of category,
subcategory and item.

With that in mind I dived off the cliff and started writing APL queries.
Question one always is whether the debits equal the credits, or in this
case whether total assets equal total liabilities plus total equities.

I needed to confirm that both years and amounts were in fact numbers:

      utl∆numberp ¨ 1 0↓ibrd[⍳10;6 7]
1 1
1 1
1 1
1 1
1 1
1 1
1 1
1 1
1 1

Yes.
I knew I’d get tired of the column heads, so I copied the array without
them.

      db←1 0 ↓ibrd

I also determined the universe of category codes to simplify my next
query.

      db[;1]
 a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a l l l l l l l l l l l l l l l l l l l l l l e e e e e e e e 

I concluded that *a* means assets, *l* means liabilities, and *e* means
equity. My queries:

      +/(∊db[;1]='a')/db[;7]
558430
      +/(∊db[;1]='l')/db[;7]
480838
      +/(∊db[;1]='e')/db[;7]
77592
      480838 + 77592
558430

I started planning my next query and was curious: what descriptions
describe each fact?

      ⍞←⎕tc[3] utl∆join (db[;6]=2010)/db[;5]
Unrestricted currencies
Currencies subject to restriction
Trading
Securities purchased under resale agreements
Nonnegotiable, nonintrest-bearing demand obligations on account of subscribed capital
Investments
Client operations
Borrowings
Other
Receivables to maintain value of currency holdings on account of subscribed capital
Receivables from investment securities traded
Accrued income on loans
Net loans outstanding
Assets under retirement benefit plans
Premises and equipment (net)
Miscellaneous
All
Securities Sold under Repurchase Agreements, Securities Lent under Securities Lending Agreements, and Payable for Cash Collateral Received
Investments
Client Operations
Borrowings
Other
Payable to Maintain Value of Currency Holdings on Account of Subscribed Capital
Payable for investment securities purchased
Accrued charges on borrowings
Liabilities under retirement benefit plans
Accounts payable and misc liabilities
Paid-in capital
Deferred Amounts to Maintain Value of Currency Holdings
Retained Earnings
Accumulated Other Comprehensive Loss

I stopped.

I am not a bank accountant, so I must open the accounting rulebook and
read it cover to cover before I try to read banks’ financial statements.

Let’s consider a few facts. Accumulated other comprehensive loss is an
equity account that accumulates unrealized gains and losses. What they
might be is in the financial statements. We have just one page.

The subcategories make matters worse.

       ⍞←⎕tc[3] utl∆join (db[;6]=2010)/db[;4]
Due from Banks
Due from Banks
Investments
Securities
Nonnegotiable
Derivative Assets
Derivative Assets
Derivative Assets
Derivative Assets
Receivables
Other Receivables
Other Receivables
Loans Outstanding
Other Assets
Other Assets
Other Assets
Borrowings
Sold or Lent
Derivative Liabilities
Derivative Liabilities
Derivative Liabilities
Derivative Liabilities
Other
Other Liabilities
Other Liabilities
Other Liabilities
Other Liabilities
Capital Stock
Deferred Amounts
Retained Earnings
Other

When the Financial Accounting Standards Board issued guidance on
derivatives, I passed. I could do better in Atlantic City than with
derivatives, and so how to account for them was irrelevant. It certainly
seemed probable that these liabilities give rise to some of the
accumulated other comprehensive losses.

So after I complete my study of the accounting rules, I need to digest
the footnotes to the financial statements to get some understanding of
four different kinds of derivatives.

I have yet to extract an analysis of the facts I have. And that’s why
I’m not writing about OLAP…yet.

This post is a follow up to *Crunching Numbers in APL* available
for kindle at Amazon.

All of the examples presented here used GNU APL.

This entry was posted in Uncategorized. Bookmark the permalink.

Leave a Reply

Your email address will not be published. Required fields are marked *