Hive and HBase are both great additions to the Hadoop ecosystem. While Hive provides a SQL-like interface for Hadoop, HBase acts as a NoSQL layer for HDFS. The two are quite different, but they work well together within Hadoop. In this article, we discuss the key differences between Hive and HBase: what sets them apart, when to use each one, and why.
Apache Hive is a data warehouse built on top of Hadoop. It allows you to easily run MapReduce jobs on a Hadoop cluster while using a SQL-like syntax.
A data warehouse is a system for reporting and data analysis. While implementations vary, a data warehouse typically serves as a central repository of data fed by multiple sources.
Hive provides its own query language, HiveQL (HQL), for running MapReduce jobs on a Hadoop cluster. It adds a relational schema to data in HDFS so you can express traditionally complicated MapReduce jobs as more familiar SQL-like queries.
When you run a Hive query, it runs as a batch job on Hadoop to aggregate data. While it was not originally designed for row-level updates, Hive otherwise takes an RDBMS-like approach to reads and writes on HDFS.
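To make the idea of a query becoming a batch job more concrete, here is a toy Python sketch of how a SQL-like aggregation could be broken into map, shuffle, and reduce steps. It is only an illustration of the general MapReduce pattern, not Hive's actual execution plan; the `page_views` table, its columns, and all function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical rows of a "page_views" table: (user, page, seconds_on_page)
ROWS = [
    ("alice", "/home", 12),
    ("bob", "/home", 7),
    ("alice", "/docs", 30),
    ("carol", "/docs", 5),
]


def map_phase(rows):
    """Emit (page, seconds) pairs: the GROUP BY key and the value to aggregate."""
    for _user, page, seconds in rows:
        yield page, seconds


def shuffle_phase(pairs):
    """Group values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Aggregate each group into (COUNT(*), SUM(seconds))."""
    return {key: (len(values), sum(values)) for key, values in grouped.items()}


def run_query(rows):
    # Roughly: SELECT page, COUNT(*), SUM(seconds) FROM page_views GROUP BY page
    return reduce_phase(shuffle_phase(map_phase(rows)))


if __name__ == "__main__":
    for page, (views, total_seconds) in sorted(run_query(ROWS).items()):
        print(page, views, total_seconds)
```

The point of Hive is that the analyst writes only the one-line query in the comment, and the system works out the equivalent of the three phases.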
Hive is great for SQL-savvy developers who want to run MapReduce jobs without having to implement MapReduce themselves. This includes data analysts and anyone else who is familiar with SQL or an RDBMS. Remember that the key point of Hive is to provide a SQL-like abstraction for running MapReduce jobs on Hadoop. This makes it a good fit for analytical queries (OLAP).
HBase is a non-relational, distributed database that runs on top of HDFS. It brings the benefits of NoSQL to Hadoop. For more on NoSQL, see choosing the right database.
Through its NoSQL key/value store, HBase is great for real-time querying on big data. This makes it well suited to lightning-fast reads and writes on live data streams, and it adds a degree of transactional support on top of HDFS.
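As a rough picture of what a key/value store looks like from the caller's side, the following Python sketch models a tiny in-memory table with sorted row keys and `family:qualifier` column names, which is the general shape of HBase's data model. It is not the HBase client API: the `TinyTable` class and its methods are hypothetical, and real HBase features such as cell versions and timestamps are left out.

```python
import bisect


class TinyTable:
    """Toy in-memory stand-in for an HBase-style table (not the HBase API)."""

    def __init__(self):
        self._keys = []  # row keys kept in sorted order
        self._rows = {}  # row key -> {"family:qualifier": value}

    def put(self, row_key, column, value):
        """Write one cell; rows are created on first write."""
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)
            self._rows[row_key] = {}
        self._rows[row_key][column] = value

    def get(self, row_key):
        """Point lookup by row key; returns an empty dict if the row is absent."""
        return dict(self._rows.get(row_key, {}))

    def scan(self, start, stop):
        """Return rows with start <= key < stop, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        return [(key, dict(self._rows[key])) for key in self._keys[lo:hi]]


if __name__ == "__main__":
    table = TinyTable()
    table.put("user#002", "info:name", "Bob")
    table.put("user#001", "info:name", "Alice")
    table.put("user#001", "stats:visits", 3)
    table.put("order#001", "info:total", 42)
    print(table.get("user#001"))
    print(table.scan("user#", "user#~"))
```

Because lookups go straight to a row key and range scans walk neighboring keys, access patterns like "fetch this user" or "fetch all keys with this prefix" stay fast without running a batch job.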
Use HBase for real-time queries and fast lookups. HBase is perfect for quickly storing and processing data on top of a static HDFS data store.
Remember that HBase is a database, while Hive is a SQL-like query engine that runs on top of Hadoop. Comparing the two is like comparing apples and oranges.
Despite their differences, Hive and HBase actually work well together. For example, you can run Hive queries on top of HBase. This couples the convenience of a SQL-like syntax with the benefits of a non-relational data store for HDFS.
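One way to picture running SQL-like queries over a key/value store is as a projection step: cells are mapped onto named columns, and the resulting flat rows can then be filtered and aggregated. The Python sketch below illustrates that idea under stated assumptions. The mapping string is loosely modeled on the `:key,family:qualifier` style of column mapping used by Hive's HBase integration, but the `to_rows` function, the sample data, and the column names are all hypothetical.

```python
def to_rows(cells_by_key, schema, mapping):
    """Project key/value cells into flat records using a column mapping."""
    sources = mapping.split(",")
    if len(sources) != len(schema):
        raise ValueError("schema and mapping must have the same length")
    rows = []
    for row_key in sorted(cells_by_key):
        cells = cells_by_key[row_key]
        record = {}
        for column, source in zip(schema, sources):
            # ":key" means "use the row key"; anything else names a cell.
            # Missing cells become None, much like NULL in a relational view.
            record[column] = row_key if source == ":key" else cells.get(source)
        rows.append(record)
    return rows


# Hypothetical key/value data: row key -> {"family:qualifier": value}
HBASE_LIKE_DATA = {
    "user#001": {"info:name": "Alice", "stats:visits": 3},
    "user#002": {"info:name": "Bob", "stats:visits": 5},
    "user#003": {"info:name": "Carol"},  # sparse row: no visits cell
}

ROWS_VIEW = to_rows(
    HBASE_LIKE_DATA,
    schema=["id", "name", "visits"],
    mapping=":key,info:name,stats:visits",
)

# Roughly: SELECT SUM(visits) FROM users WHERE visits IS NOT NULL
total_visits = sum(r["visits"] for r in ROWS_VIEW if r["visits"] is not None)

if __name__ == "__main__":
    print(ROWS_VIEW)
    print(total_visits)
```

The sparse third row shows why the mapping matters: the key/value side does not need every cell to exist, while the relational view still presents a fixed set of columns.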
It is a mistake to think that Hive and HBase compete within the Hadoop ecosystem. While Hive improves the analytical side of HDFS, HBase improves transactions in a real-time environment. For these reasons, it's recommended that both are used together to enhance Hadoop.