Intel’s Haswell Architecture on SAP’s HANA

The development of Intel-based hardware and its adoption by SAP’s HANA continue to improve the applicability of the in-memory technology. The cooperation between the two companies in research and development produces stunning results. Instead of just looking at the raw specifications of the Haswell CPU and its application in database servers, I want to explain why some of the new features are so important in the HANA case.

First, the overall workload capacity and the performance of HANA depend on the number of cores per socket, the number of sockets per server chassis, the QPI bus speed and the available memory. With Haswell, we can now have a maximum of 144 cores (8 sockets × 18 cores) on one chassis with up to 12 terabytes of memory. The QPI bus speed, the L1/L2/L3 cache sizes and the very important scan speed of HANA attribute vectors have increased by 20%, which has a linear effect on the overall performance of HANA. In several previous blogs we discussed the penalty for spreading data across multiple chassis. This is particularly true for the actual data partitions in an OLTP system. With the new Haswell-based servers we will be able to run all actual data partitions of 99% of all SAP S/4HANA customers on one chassis. This is made possible by increasingly effective data compression and by the intelligent split into actual and historical data based on business rules. A redundant replica of the actual data partition serves as a fast failover system and also helps to nearly double the workload capacity of the whole system. This is important because I anticipate that more analytical (read-only) applications will move to the OLTP system, and new applications will in any case have a much stronger analytical component, increasing the read-only workload. Historical data is defined as data that is no longer subject to change, and new inserts into it occur only periodically (quarterly, yearly). The database requirements are therefore drastically simplified (no inserts, no updates, no delta management, no backup, massively parallel map & reduce), and a scale-out architecture using cheaper servers is appropriate.
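The split into actual and historical data can be pictured as a simple rule-based partitioning step. The sketch below is only an illustration under assumptions: the `Document` record, its fields and the rule in `is_historical` (a completed document in an already closed fiscal year) are hypothetical and are not SAP's actual S/4HANA business rules.

```python
from dataclasses import dataclass


# Hypothetical document record; field names are illustrative, not SAP's schema.
@dataclass(frozen=True)
class Document:
    doc_id: int
    fiscal_year: int
    status: str  # e.g. "open" or "completed"


def is_historical(doc: Document, last_closed_year: int) -> bool:
    """Hypothetical business rule: a document can no longer change once it is
    completed and belongs to a fiscal year that has already been closed."""
    return doc.status == "completed" and doc.fiscal_year <= last_closed_year


def split_partitions(docs, last_closed_year):
    """Split documents into an 'actual' partition (still subject to change,
    kept on the single scale-up chassis) and a 'historical' partition
    (read-only, suitable for a scale-out cluster of cheaper servers)."""
    actual, historical = [], []
    for doc in docs:
        target = historical if is_historical(doc, last_closed_year) else actual
        target.append(doc)
    return actual, historical


docs = [
    Document(1, 2013, "completed"),
    Document(2, 2014, "completed"),
    Document(3, 2014, "open"),
    Document(4, 2015, "completed"),
]
actual, historical = split_partitions(docs, last_closed_year=2014)
print([d.doc_id for d in actual])      # [3, 4]
print([d.doc_id for d in historical])  # [1, 2]
```

Document 3 stays in the actual partition even though its year is closed, because it is still open and may change; only data that can no longer change moves to the simpler historical store.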

Second, HANA SP10 exploits some of the new Haswell features intensively. As is well known, one of the great strengths of HANA is that it was designed for in-memory operation only, using primarily a columnar store with dictionary compression. The attributes of a table are stored as integers with a high compression factor in so-called attribute vectors. HANA still uses a primary key concept, but most data access has shifted from direct access to set operations (accessing a number of rows of a table), and the rows needed are identified via attribute vector scan operations. This is extremely fast (compression, integer operations), completely flexible (every attribute can be used as an index), and requires no involvement of a DBA (database administrator). With the new vector operations in Haswell (AVX2), these scan operations improve significantly (on average by close to 50%). Another optimization is NUMA-aware data distribution and execution. Remember, the fastest access to data still happens when the data is in the memory attached to the executing CPU. HANA achieves dramatic throughput improvements, especially for systems with 8 sockets (more than 100%).
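To make the attribute vector idea concrete, here is a minimal Python sketch of dictionary compression and a column scan. It assumes a sorted dictionary per column, so that equality and range predicates turn into integer comparisons on value IDs. The function names are hypothetical; in HANA the scans run as native vectorized code (AVX2 on Haswell, as described above), so the plain loop here shows only the logic, not the performance.

```python
from bisect import bisect_left


def dictionary_encode(values):
    """Build a sorted dictionary of distinct values and an attribute vector
    holding one integer value ID per row."""
    dictionary = sorted(set(values))
    value_id = {v: i for i, v in enumerate(dictionary)}
    attribute_vector = [value_id[v] for v in values]
    return dictionary, attribute_vector


def scan_equals(dictionary, attribute_vector, value):
    """Return row positions whose attribute equals `value`. The predicate is
    translated once into a value ID; the scan then compares integers only."""
    i = bisect_left(dictionary, value)
    if i == len(dictionary) or dictionary[i] != value:
        return []  # value does not occur in this column
    return [row for row, vid in enumerate(attribute_vector) if vid == i]


def scan_range(dictionary, attribute_vector, low, high):
    """Return rows with low <= value < high. Because the dictionary is sorted,
    the value range maps to a contiguous range of value IDs."""
    lo_id = bisect_left(dictionary, low)
    hi_id = bisect_left(dictionary, high)
    return [row for row, vid in enumerate(attribute_vector) if lo_id <= vid < hi_id]


cities = ["Berlin", "Walldorf", "Berlin", "Potsdam", "Walldorf", "Berlin"]
dictionary, av = dictionary_encode(cities)
print(dictionary)                             # ['Berlin', 'Potsdam', 'Walldorf']
print(av)                                     # [0, 2, 0, 1, 2, 0]
print(scan_equals(dictionary, av, "Berlin"))  # [0, 2, 5]
print(scan_range(dictionary, av, "P", "Z"))   # [1, 3, 4]
```

Because every column is encoded the same way, any attribute can serve as a filter without a separately maintained index, which is the flexibility described above.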

Third, in S/4HANA we no longer maintain aggregates (totals) but calculate them on the fly on request. The “old” predefined aggregates are less popular, and the flexibility to aggregate data freely along multiple hierarchies is more important. I know this goes against the common practice of the last 50 years in enterprise systems, but it simplifies everything dramatically. The early adopters of S/4HANA all confirm this. Since totals are no longer updated, there is no need for database locks (read data for update), and data entry transactions can now run in parallel, with a huge impact on high-volume systems such as physical warehouses and order entry systems. All that remains are the database inserts, but here HANA has to use an internal lock when multiple inserts into the same table occur in parallel. Haswell now offers a hardware feature for synchronization (TSX, Transactional Synchronization Extensions) with which parallel inserts improve by up to 5x. I cannot emphasize enough what that means for high-volume transactional systems: it’s a dream.
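The following Python sketch illustrates the insert-only idea under stated assumptions: the `InsertOnlyTable` class and its methods are hypothetical, totals are computed on request rather than stored, and a single `threading.Lock` stands in for HANA's internal insert lock. TSX itself cannot be expressed in Python; on Haswell it lets the hardware execute such short critical sections speculatively in parallel instead of strictly one at a time.

```python
import threading
from collections import defaultdict


class InsertOnlyTable:
    """Hypothetical insert-only line-item table. No stored totals exist, so
    writers never lock a shared totals row; one short internal lock only
    serializes appends to the table itself."""

    def __init__(self):
        self._rows = []
        self._lock = threading.Lock()  # plain software mutex, not TSX

    def insert(self, row):
        with self._lock:
            self._rows.append(row)

    def totals(self, key):
        """Aggregate 'amount' on request along any attribute (hierarchy)."""
        with self._lock:
            snapshot = list(self._rows)  # consistent copy for the read
        sums = defaultdict(int)
        for row in snapshot:
            sums[row[key]] += row["amount"]
        return dict(sums)


table = InsertOnlyTable()


def enter_orders(customer, n):
    """Simulate a data-entry transaction inserting n order items."""
    for i in range(n):
        plant = "P1" if i % 2 == 0 else "P2"
        table.insert({"customer": customer, "plant": plant, "amount": 10})


threads = [threading.Thread(target=enter_orders, args=(c, 1000))
           for c in ("C1", "C2", "C3")]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(table.totals("customer").items()))  # [('C1', 10000), ('C2', 10000), ('C3', 10000)]
print(sorted(table.totals("plant").items()))     # [('P1', 15000), ('P2', 15000)]
```

The same line items answer both the per-customer and the per-plant question; no second set of totals has to be maintained, so the only contention left is the short append inside `insert`.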

Internal benchmarks are often of limited value to customers and their real-world systems, but when you see a 6x improvement in OLTP processing from an Ivy Bridge system (4 sockets) with HANA SP08 to a Haswell system (4 sockets) with HANA SP10, you can only congratulate the engineers of both companies on the work they have done. I can’t wait to see these systems in production at our customer sites or in the cloud. If there was ever any question about the viability of in-memory database systems, here is the answer. By the way, the pricing looks very attractive, but I have to leave that to the market.

Source: Hasso Plattner Blog


Copyright © 2011. SAP HANA TUTORIALS FREE - S/4 HANA - All Rights Reserved