In our previous discussions, we compared Apache Cassandra with Apache HBase and Apache Cassandra with MongoDB. In one of those articles, we mentioned that Apache Phoenix is a more appropriate point of comparison for Apache Cassandra than Apache HBase is. This article should also help explain why HBase, rather than Apache Phoenix, is the one commonly compared with Apache Cassandra.
Apache Phoenix is a relational database engine that supports OLTP on Hadoop. It uses Apache HBase as its store and provides a JDBC driver. Phoenix compiles SQL queries into calls to the native API of the underlying NoSQL store. Apache Phoenix and HBase can be operated as one system, which is why our Apache Phoenix installation guide bears the phrase “SQL on HBase”. HBase on its own has no SQL interface: data is accessed through its own client API and shell commands, which have to be learned. Apache Phoenix helps to avoid that part.
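To make the idea of compiling SQL into store-level calls more concrete, here is a minimal Python sketch. It is a toy model and not Phoenix's actual compiler: the `ToyStore` class, its `put` and `scan` methods, and the `compile_range_query` helper are hypothetical names invented for this illustration. It only shows how a range predicate on a primary key could be turned into a single bounded scan over a sorted key-value store, which is the kind of operation a store like HBase exposes.

```python
import bisect


class ToyStore:
    """Hypothetical stand-in for a sorted key-value store such as HBase."""

    def __init__(self):
        self._keys = []   # row keys, kept in sorted order
        self._rows = {}   # row key -> dict of column values

    def put(self, key, row):
        # Insert the key in sorted position the first time it is seen.
        if key not in self._rows:
            bisect.insort(self._keys, key)
        self._rows[key] = row

    def scan(self, start_key, stop_key):
        """Return rows with start_key <= key < stop_key, in key order."""
        lo = bisect.bisect_left(self._keys, start_key)
        hi = bisect.bisect_left(self._keys, stop_key)
        return [(k, self._rows[k]) for k in self._keys[lo:hi]]


def compile_range_query(store, lower, upper):
    """Toy 'compiler' for: SELECT * FROM t WHERE pk >= lower AND pk < upper.

    Rather than filtering every row, the predicate on the primary key
    becomes one bounded scan over the sorted keys.
    """
    return store.scan(lower, upper)


store = ToyStore()
for user_id in ("u001", "u002", "u010", "u020"):
    store.put(user_id, {"name": "user-" + user_id})

result = compile_range_query(store, "u002", "u020")
print([key for key, _ in result])  # prints ['u002', 'u010']
```

The point of the sketch is the design choice: because the store keeps row keys sorted, a SQL range condition on the key maps to a start key and a stop key instead of a full table pass.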
We already know that Apache Cassandra is a distributed NoSQL database management system designed to handle very large amounts of data while providing high availability with no single point of failure.
Apache Phoenix is easy to get started with. It is not a bad choice for low-latency queries on medium-sized tables of roughly 1M to 100M rows. However, good performance cannot be guaranteed for tables with many columns.
Not all SQL-on-Hadoop systems are designed for OLTP workloads. Most of them are optimized for aggregations and analytical workloads behind a SQL interface, and they do not handle data ingestion. Apache Phoenix supports aggregations and analytical queries while relying on the underlying HBase for the OLTP workload. Apache Phoenix bridges the gap between SQL and NoSQL, and it is optimized to interact efficiently with HBase.
Hortonworks has published tests showing HBase consistently outperforming Cassandra on read-heavy workloads. On that basis, Apache Phoenix with medium-sized tables is likely to outperform Cassandra. In terms of lineage, HBase is modeled mainly on Bigtable, whereas Cassandra was derived from both Bigtable and Amazon’s Dynamo.
Phoenix and HBase build on the Hadoop infrastructure. Cassandra has its own infrastructure and operational model, separate from Hadoop, although it can be used alongside Hadoop. Cassandra supports ordered partitioning, which can result in row sizes of up to 10M. Cassandra has only limited support for range-based row scans and for coprocessor-like functionality.
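The relationship between partitioning and range scans can be sketched in a few lines of Python. This is a simplified model under stated assumptions: the cluster size, the key format, the split points, and the helpers `ordered_node` and `hashed_node` are all made up for illustration, and MD5 merely stands in for whatever hash function a real partitioner uses. It does not reproduce Cassandra's internals.

```python
import hashlib

NUM_NODES = 4  # assumed cluster size for this toy example


def ordered_node(key, boundaries):
    """Ordered partitioning: the node is chosen by where the key sorts."""
    for node, upper in enumerate(boundaries):
        if key < upper:
            return node
    return len(boundaries)


def hashed_node(key):
    """Hash partitioning: the node is chosen by a hash of the key."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_NODES


keys = ["row%03d" % i for i in range(100)]
boundaries = ["row025", "row050", "row075"]  # assumed split points for 4 nodes

# A range scan over row010 .. row019.
wanted = [k for k in keys if "row010" <= k < "row020"]

ordered_nodes = {ordered_node(k, boundaries) for k in wanted}
hashed_nodes = {hashed_node(k) for k in wanted}

print("nodes touched with ordered partitioning:", sorted(ordered_nodes))
print("nodes touched with hash partitioning:", sorted(hashed_nodes))
```

With ordered partitioning, the ten adjacent keys all fall on a single node, so a range scan stays local. With hash partitioning, adjacent keys are generally spread over several nodes, which balances load but makes range scans more expensive.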