OLAP

OLAP

OLAP stands for OnLine Analytical Processing. This post describes why I'm not ready to write about OLAP.

I've been reading various web pages about OLAP and have reached two conclusions. First, the demand for OLAP is driven by SQL commands' complexity, which can arise when the programmer is querying complex databases. Nontechnical users stumble badly in that environment, and even correct queries can take too long to execute.

Second, the processing is done on a new non-SQL database designed to make querying easier and processing time faster. Generally, this means an underlying SQL database is kept to record transactions, and an OLAP database is updated periodically from the SQL database. The OLAP database usually is an array of facts determined by the SQL queries. Each dimension of the array is a fact attribute for which aggregate data may be sought.

I've been struggling to a third conclusion, that I can reach a better understanding of the data by investigating why and how the data was compiled than by constructing complicated SQL queries.

I found an open-source OLAP application, Cubes, written in python(http://cubes.databrewery.org/). I downloaded the application and started on the tutorial. Step one, called Hello World, constructs a data cube from balance sheets of the International Bank for Reconstruction and Development for the years 2010 and 2011.

I was planning to code an OLAP database in APL, so rather than following the tutorial, I just loaded the supplied csv file into APL.

        ibrd←import∆file 'Downloads/IBRD_Balance_Sheet__FY2010.csv'
      ⍴ibrd
63 7

That didn't look like a lot of data, so I displayed some of it:

     ibrd[;]
 Category Code Category Subcategory Code Subcategory    Line Item                         Fiscal Year Amount (US$, Millions)
 a             Assets   dfb              Due from Banks Unrestricted currencies                  2010                   1581
 a             Assets   dfb              Due from Banks Unrestricted currencies                  2009                   2380
 a             Assets   dfb              Due from Banks Currencies subject to restriction        2010                    222
 a             Assets   dfb              Due from Banks Currencies
     subject to restriction        2009                    664

My seven columns: 1. Category code 2. Category 3. Subcategory code 4. Subcategory 5. Description (called line item above) 6. Fiscal year 7. Amount (in $US millions)

I concluded that our cube should have two dimensions, description and year. Each fact (cell in the array) should be made up of a description and an amount. The category Descriptions is a hierarchy of category, subcategory and item.

With that in mind I dived off the cliff and started writing APL queries. Question one always is whether the debits equal the credits, or in this case whether total assets equal total liabilities plus total equities.

I needed to confirm that both years and amounts were in fact numbers:

      utl∆numberp ¨ 1 0↓ibrd[⍳10;6 7]
1 1
1 1
1 1
1 1
1 1
1 1
1 1
1 1
1 1

Yes. I knew I'd get tired of the column heads, so I copied the array without them.

     ddb←1 0 ↓ibrd

I concluded that a means assets, l means liabilities, and e means equity. My queries:

       +/(∊db[;1]='a')/db[;7]
558430
      +/(∊db[;1]='l')/db[;7]
480838
      +/(∊db[;1]='e')/db[;7]
77592
      480838 + 77592
558430

I started planning my next query and was curious: what descriptions describe each fact?

      ⍞←⎕tc[3] utl∆join (db[;6]=2010)/db[;5]
Unrestricted currencies
Currencies subject to restriction
Trading
Securities purchased under resale agreements
Nonnegotiable, nonintrest-bearing demand obligations on account of subscribed capital
Investments
Client operations
Borrowings
Other
Receivables to maintain value of currency holdings on account of subscribed capital
Receivables from investment securities traded
Accrued income on loans
Net loans outstanding
Assets under retirement benefit plans
Premises and equipment (net)
Miscellaneous
All
Securities Sold under Repurchase Agreements, Securities Lent under Securities Lending Agreements, and Payable for Cash Collateral Received
Investments
Client Operations
Borrowings
Other
Payable to Maintain Value of Currency Holdings on Account of Subscribed Capital
Payable for investment securities purchased
Accrued charges on borrowings
Liabilities under retirement benefit plans
Accounts payable and misc liabilities
Paid-in capital
Deferred Amounts to Maintain Value of Currency Holdings
Retained Earnings
Accumulated Other Comprehensive Loss

I stopped. I am not a bank accountant, so I must open the accounting rulebook and read it cover to cover before I try to read banks' financial statements.

Let's consider a few facts. Accumulated other comprehensive loss is an equity account that accumulates unrealized gains and losses. What they might be is in the financial statements. We have just one page.

The subcategories make matters worse.

       ⍞←⎕tc[3] utl∆join (db[;6]=2010)/db[;4]
Due from Banks
Due from Banks
Investments
Securities
Nonnegotiable
Derivative Assets
Derivative Assets
Derivative Assets
Derivative Assets
Receivables
Other Receivables
Other Receivables
Loans Outstanding
Other Assets
Other Assets
Other Assets
Borrowings
Sold or Lent
Derivative Liabilities
Derivative Liabilities
Derivative Liabilities
Derivative Liabilities
Other
Other Liabilities
Other Liabilities
Other Liabilities
Other Liabilities
Capital Stock
Deferred Amounts
Retained Earnings
Other

When the Financial Accounting Standards Board issued guidance on derivatives, I passed. I could do better in Atlantic City than with derivatives, and so how to account for them was irrelevant. It certainly seemed probable that these liabilities give rise to some of the accumulated other comprehensive losses.

So after I complete my study of the accounting rules, I need to digest the footnotes to the financial statements to get some understanding of four different kinds of derivatives.

I have yet to extract an analysis of the facts I have. And that's why I'm not writing about OLAP…

yet.