column oriented databases & formats
April 30, 2015
Column oriented databases have become a staple in the big data analytics scene. This article will analyze the in-memory and on-disk formats of kdb+.
kdb+ is a high performance timeseries database implemented in the K language and widely deployed within the financial sector. Q is a wrapper around K to provide an easier interface to the database. kdb+ is implemented as a column store which makes operations like aggregations extremely fast.
###Architecture While kdb+ is both an in-memory and a persistent disk based column oriented data store, the in-memory representation and the on-disk file formats are quite simple. Tables in kdb+ are essentially ordered dictionaries of typed columns sorted on one or more columns. In python an in-memory table of stock symbols, dates, and their closing prices could be implemented as follows:
from collections import OrderedDict
daily = OrderedDict(
[
('symbol', [1, 1, 1, 1, 2, 2, 2, 2]),
('date', [0, 0, 0, 0, 0, 0, 0, 0]),
('prices', [90.01, 91.02, 92.03, 93.04, 60.01, 61.02, 62.02, 63.03])
])
kdb+ makes tradeoffs in order to improve performance, two of which I have implemented in the table above and will explain below.
Strings
Strings stored in tables are symbolized and mapped to integers in a bidirectional dictionary. This tradeoff enables very fast selects on columns like a stock symbol as we only have to seek for a integer (8bit to 64bit depending on the size of the symbol dictionary). For our table the symbol map could be implemented as:
symbols = {
1 : 'GOOG',
2 : 'YHOO'
}
symbols2 = {
'GOOG' : 1,
'YHOO' : 2
}
Dates
Dates in kdb+ are also stored as integers and calculated as the distance from the epoch 01-01-2000. So, in our example implementation, 0 represents 01-01-2000.
Disk File Format
kdb+ can export tables in multiple formats. First, the in-memory table can be written to a binary file in row format:
uint8,uint32,float
1,0,90.01
1,0,91.02
1,0,92.03
1,0,93.04
2,0,60.01
2,0,61.02
2,0,62.03
2,0,63.03
Secondly, the table can be splayed and each column is written to its own binary file:
symbols.col (uint8): [1, 1, 1, 1, 2, 2, 2, 2]
dates.col (uint32): [0, 0, 0, 0, 0, 0, 0, 0]
prices.col (float): [90.01, 91.02, 92.03, 93.04, 60.01, 61.02, 62.02, 63.03]
Thirdly, the table can be partitioned by a column value and then splayed:
symbol_1 directory:
dates.col (uint32): [0, 0, 0, 0]
prices.col (float): [90.01, 91.02, 92.03, 93.04]
symbol_2 directory:
dates.col (uint32): [1, 1, 1, 1]
prices.col (float): [60.01, 61.02, 62.02, 63.03]
Querying
When kdb+ loads a table, it memory maps (mmap on linux) either the single table file, each column for a splayed table, or all columns for a partitioned table. Every splayed column is mapped as a continuous array of a data type and accessed accordingly. kdb+ also loads the symbol dictionary for symbol lookups. In our example, the table is sorted on the symbol and date columns.
Select price WHERE symbol = ‘GOOG’
Symbol is a symbol column, so GOOG is looked up and mapped to 1
In the single table file layout, we perform a binary search on the array of structures and find all rows with a symbol value of 1 and return them
In the splayed table layout, we perform a binary search and record the indices of the symbol array where symbol = 1. Then we index into the memory mapped column (array) for price and use our recorded indices to select the matching entries. i.e. price[0:3].
In the partitioned table, we perform the same steps as in the splayed table layout within the symbol_1 directory.
Splaying a table enables us to seek through less data, particularly when dealing with wide row data sets, in order to find the records we want to query. As data size increases, partitioning tables on a column like symbol or date further reduces I/O overhead. Another benefit of single type columns is that aggregations such as calculating the mean or sum become single vector operations.
Concluding Remarks
kdb+ supports many more features, including indexing and compression, but the basic architecture is as described. The primary contributing factors to its performance are in its column oriented structure and data representation format. In a real world deployment, real-time data is typically appended to an in-memory table and then archived to a splayed/partitioned table sorted on one or multiple columns. Often, the same table is stored in varying sorted orders to further improve the performance of specific queries. Similar to big data database systems like Cassandra, tables are modeled to fit commonly executed queries.