Project Impala is a parallel real-time query engine that can run atop the raw Hadoop Distributed File System (HDFS) or the HBase tabular overlay for HDFS that makes it look somewhat like a relational database.
Impala does not work through Hadoop MapReduce. Impala uses a SQL-like syntax and allows you to query data, whether stored in HDFS or Apache HBase – including SELECT, JOIN, and aggregate functions – in real time.
Here is a list of articles and opinions on what this will mean for Hadoop users:
- Cloudera makes SQL a first-class citizen in Hadoop
- Quick notes on Impala
- Cloudera’s Impala brings Hadoop to SQL and BI
- Big Data in More Hands
- Cloudera Debuts Real-Time Hadoop Query
- Cloudera Brings Real-Time, SQL-like Experience to Hadoop
- Cloudera Announces Game-Changing, Real-Time Query on Hadoop and Leads a New Era of Data Management
- Cloudera Impala: Real-Time Queries in Apache Hadoop, For Real