What are Advantages of Columnar Tables in HANA ?

Advantages of Columnar Tables: 

HANA Columnar Tables advantage



Advantage: Higher Data Compression Rates
The goal of keeping all relevant data in main memory can be achieved with less cost if data compression is used. Columnar data storage allows highly efficient compression. Especially if a column is
sorted, there will normally be several contiguous values place adjacent to each other in memory. In
this case compression methods, such as run-length encoding, cluster coding or dictionary coding can
be used. This is especially promising for business applications as many of the table columns contain
only a few distinct values compared to the number of rows. Extreme examples are codes such as
country codes or ZIP codes, other valid examples include customer and account numbers, region
codes, sales channel codes, or status codes. Many of the latter are used as foreign keys in other
tables in the database, e. g. tables containing data for customer orders and accounting records. This
high degree of redundancy within a column allows for effective compression of the data. In rowbased storage, successive memory locations contain data of different columns, so compression
methods such as run-length encoding cannot be used. In column stores a compression factor of 5 can
typically be achieved compared to traditional row-oriented storage systems.

Compression techniques discussed briefly below:
Run-length encoding. If data in a column is sorted there is a high probability that two or more
elements contain the same values. Run-length encoding counts the number of consecutive column
elements with the same values. To achieve this, the original column is replaced with a two-column
list. The first column contains the values as they appear in the original table column and the second
column contains the counts of consecutive occurrences of the respective value. From this information the original column can easily be reconstructed.

Cluster encoding. This compression technique works by searching for multiple occurrences of the
same sequence of values within the original column. The compressed column consists of a two-column list with the first column containing the elements of a particular sequence and the second
column containing the row numbers where the sequence starts in the original column. Many popular
data compression programs use this technique to compress text files.

Dictionary encoding. Table columns which contain only a comparably small number of distinct values
can be effectively be compressed by enumerating the distinct values and storing only their numbers.
This technique requires that an additional table, the dictionary, is maintained which in the first column contains the original values and in the second one the numbers representing the values. This
technique leads to high compression rates, is very common, e. g. in country codes or customer numbers, but is seldom regarded as a compression technique.

Advantage: Higher Performance for Column Operations
With columnar data organization operations on single columns, such as searching or aggregations,
can be implemented as loops over an array stored in contiguous memory locations. Such an operation has high spatial locality and can efficiently be executed in the CPU cache. With row-oriented storage, the same operation would be much slower because data of the same column is distributed
across memory and the CPU is slowed down by cache misses.
Assume that we want to aggregate the sum of all sales amounts in the example in figure 2 using a
row-based table. Data transfer from main memory into CPU cache happens always in blocks of fixed
size called "cache lines" (for example 64 bytes). With row-based data organization it may happen
that each cache line contains only one "sales" value (stored in 4 bytes) while the remaining bytes are
used for the other fields of the data record. For each value needed for the aggregation a new access
to main memory would be required. This demonstrates that with row-based data organization the
operation will be slowed by cache misses that cause the CPU to wait until the required data is available. With column-based storage, all sales values are stored in contiguous memory, so the cache line
would contain 16 values that are all needed in the calculation of the sum. In addition, the fact that
columns are stored in contiguous memory allows memory controllers to pre-fetch, which further
minimizes the number of cache misses


Advantage: Elimination of Additional Indexes
Columnar storage, in many cases, eliminates the need for additional index structures. Storing data in
columns is functionally similar to having a built-in index for each column. The column scanning speed
of the in-memory column store and the compression mechanisms – especially dictionary compression – allow read operations with very high performance. In many cases it will not be required to
have additional indexes. Eliminating additional indexes reduces complexity and eliminates effort for
defining and maintaining metadata.

Advantage: Parallelization
Column-based storage also makes it easy to execute operations in parallel using multiple processor
cores. In a column store data is already vertically partitioned. This means that operations on different
columns can easily be processed in parallel. If multiple columns need to be searched or aggregated,
each of these operations can be assigned to a different processor core. In addition operations on one
column can be parallelized by partitioning the column into multiple sections that can be processed by
different processor cores.


Advantage: Elimination of Materialized Aggregates
Traditional business applications use materialized aggregates to increase read performance. This
means that the application developers define additional tables in which the application redundantly
stores the results of aggregates (for example sums) computed on other tables. The materialized
aggregates are computed and stored either after each write operation on the aggregated data, or at
scheduled times. Read operations read the materialized aggregates instead of computing them each
time they are required.
With a scanning speed of several gigabytes per millisecond, in-memory column stores make it
possible to calculate aggregates on large amounts of data on the fly with high performance. This is
expected to eliminate the need for materialized aggregates in many cases.




Tags: Advantages of Columnar Tables in HANA,  Advantage: Elimination of Materialized Aggregates, HANA Advantage: Parallelization, HANA Elimination of Additional Indexes, HANA Higher Performance for Column Operations, HANA Advantage: Higher Data Compression Rates,
SAP HANA Run-length encoding, SAP HANA Cluster encoding, Dictionary encoding. Table columns,


Share this article :

Post a Comment

 
Copyright © 2011. SAP HANA TUTORIALS FREE - S/4 HANA - All Rights Reserved