Guest Blogger: Manan Trivedi, Big Data Solution Architect

Apache Hadoop started as a framework for storing and processing very large data sets in a distributed, cost-effective manner. Initially, this distributed processing was limited to batch workloads. More recently, tools have been developed that extend Hadoop's big data processing into the realm of decision support systems and data warehousing. Decision support systems (DSS) are the beating heart of an efficient organization. In recent years, the explosion of data has strained traditional decision support systems, and enterprises are looking beyond the traditional data warehouse to meet their needs.

We are pleased to announce the first fully audited result of the complete TPC-DS benchmark on Hadoop, run on Cisco UCS. The benchmark results have been published and are available on the TPC website.

Recognizing the crucial role decision support systems play in today’s organizations, the Transaction Processing Performance Council (TPC) developed the TPC-DS benchmark to provide the industry with relevant, objective, verifiable and vendor-neutral performance data.

TPC-DS has been referenced extensively, more than 900 times. Academia and industry have written hundreds of publications on various aspects of the workload, leading to innovation, better performance and systems with better price/performance. Some vendors have also published, in blogs only, cherry-picked performance measures from a small subset of the full TPC-DS benchmark, or results obtained with many modifications to the queries. Until now, no fully audited TPC-DS result had been published.

The TPC-DS benchmark models key aspects of a decision support system, using 99 queries and well-defined data maintenance functions. It provides a representative evaluation of performance as a general-purpose decision support system, covering the examination of large volumes of data, answers to real-world business questions via ad-hoc reporting, online analytical processing and data mining, and database maintenance functions that synchronize with transaction processing systems.

The benchmark result measures query response time in single-user mode, query throughput in multi-user mode, and data maintenance performance for a specified hardware, operating system, and data processing system configuration.
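To make the relationship between these measurements and a single score more concrete, here is a minimal Python sketch of how a composite queries-per-hour figure can be derived from the individual timings. It follows our reading of the metric in version 2 of the TPC-DS specification (a geometric mean of the power, throughput, data maintenance and load timings, all in hours); the function name, parameter names and sample timings are hypothetical and are not the audited figures, so please consult the specification on the TPC website for the authoritative definition.

```python
import math

QUERIES_PER_STREAM = 99  # the 99 TPC-DS queries


def composite_qphds(scale_factor_gb, streams, t_power_h,
                    t_tt1_h, t_tt2_h, t_dm1_h, t_dm2_h, t_load_h):
    """Sketch of a QphDS-style composite metric (all timings in hours).

    Assumption: this mirrors our reading of the TPC-DS v2 metric and is
    not a substitute for the specification text.
    """
    total_queries = streams * QUERIES_PER_STREAM
    t_pt = t_power_h * streams        # single-user (power) test, scaled by streams
    t_tt = t_tt1_h + t_tt2_h          # two multi-user throughput tests
    t_dm = t_dm1_h + t_dm2_h          # two data maintenance tests
    t_ld = 0.01 * streams * t_load_h  # database load, weighted at 1%
    geometric_mean = (t_pt * t_tt * t_dm * t_ld) ** 0.25
    return math.floor(scale_factor_gb * total_queries / geometric_mean)


if __name__ == "__main__":
    # Made-up timings purely for illustration.
    score = composite_qphds(10_000, streams=4, t_power_h=1.0,
                            t_tt1_h=2.0, t_tt2_h=2.0,
                            t_dm1_h=0.5, t_dm2_h=0.5, t_load_h=5.0)
    print(f"Illustrative composite score: {score:,} QphDS")
```

Because the timings sit in the denominator, shortening any one of the four phases raises the score, which is why a system has to perform well across all of them rather than on a handful of queries.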

The benchmark was run on Cisco UCS Integrated Infrastructure for Big Data and Analytics with Transwarp Data Hub v5.1. At a scale factor of 10,000 GB, it achieved a composite queries-per-hour metric of 1,580,649 QphDS and a price/performance of $0.64 USD per QphDS.
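Since TPC price/performance is the total priced system cost divided by the performance metric, the two published figures can be combined to estimate the order of magnitude of the priced configuration. The short Python sketch below does that arithmetic; the function and constant names are ours, and because the published ratio is rounded, the result is only an approximation and not the audited total price, which is listed in the full disclosure report on the TPC website.

```python
from decimal import Decimal

# Figures quoted in this post.
QPHDS = Decimal("1580649")         # composite queries per hour at 10,000 GB
PRICE_PER_QPHDS = Decimal("0.64")  # USD per QphDS (rounded, as published)


def implied_total_price(qphds, price_per_qphds):
    """Invert price/performance = total price / QphDS to estimate total price."""
    return qphds * price_per_qphds


if __name__ == "__main__":
    estimate = implied_total_price(QPHDS, PRICE_PER_QPHDS)
    # Approximate only: the published ratio is rounded to two decimals.
    print(f"Implied total system price: about ${estimate:,.2f} USD")
```

In other words, the published numbers imply a priced configuration of roughly one million US dollars.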

The Transwarp Data Hub (TDH) is a full suite of Hadoop distribution components, including a supplemental SQL engine (Inceptor), machine learning & deep learning components, a NoSQL search engine and stream processing.

For more information about Cisco UCS Integrated Infrastructure for Big Data and Analytics, visit: http://www.cisco.com/go/bigdata.

Information about the Transwarp Data Hub can be found here: http://transwarp.cn/?lang=en. More information about the TPC is available at www.tpc.org.