Hadoop Technology Stack
Hadoop Core / Common Project
- Distributed Storage : HDFS
- Distributed Processing : MapReduce (MR1)
- Distributed Scheduling and Resource Management : YARN (MR2), introduced in Hadoop v2
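To make the MapReduce model concrete, here is a minimal single-process sketch of its three phases (map, shuffle, reduce) applied to a word count. It uses plain Python only and is not the Hadoop API; the function names `map_phase`, `shuffle`, `reduce_phase` and `word_count` are hypothetical names chosen for illustration. On a real cluster the framework runs mappers and reducers on many nodes and performs the shuffle itself.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key (done by the framework in Hadoop)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reducer: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in order on an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["hadoop stores data", "hadoop processes data"]))
```

Keeping the mapper and reducer free of shared state is what would let a framework such as Hadoop run many copies of them in parallel.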
Accessing and Processing Data in Hadoop Without Writing MapReduce Jobs
- Pig : http://pig.apache.org/
- Hive : http://hive.apache.org/
Databases and Data Stores Used with Hadoop
- HBase : http://hbase.apache.org/
- Cassandra : http://cassandra.apache.org/
Storage Management Services
- HCatalog : http://incubator.apache.org/projects/hcatalog.html
Full-Text Indexing and Search Library
- Lucene : http://lucene.apache.org/
Bulk Synchronous Parallel computing engine
- Hama : http://hama.apache.org/
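The Bulk Synchronous Parallel (BSP) model organizes work into supersteps: each peer computes locally, sends messages, and then waits at a barrier before reading what it received. The sketch below simulates that pattern in a single Python process to find a global maximum. It does not use the Hama API; `bsp_max` and its variables are hypothetical names chosen for illustration.

```python
def bsp_max(local_values):
    """Simulate peers agreeing on the global maximum in two BSP supersteps."""
    n = len(local_values)

    # Superstep 1: local computation and communication.
    # Every peer sends its own value to every peer, including itself.
    outgoing = [[] for _ in range(n)]
    for value in local_values:
        for target in range(n):
            outgoing[target].append(value)

    # Barrier synchronization: messages become visible to their
    # receivers only after all peers have finished the superstep.
    inboxes = outgoing

    # Superstep 2: each peer computes on the messages it received.
    return [max(inbox) for inbox in inboxes]


if __name__ == "__main__":
    print(bsp_max([3, 9, 4]))
```

After the second superstep every peer holds the same result, which is the usual way a BSP program reaches agreement without shared memory.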
Writing and Managing MapReduce Pipelines
- Crunch : http://crunch.apache.org/
Data Serialization for exchanging schema-defined data between applications (compact binary encodings, with JSON encodings also available)
- Avro : http://avro.apache.org/
- Thrift : http://thrift.apache.org/
Data Intelligence (interactive SQL querying and machine learning)
- Drill : https://incubator.apache.org/drill/drill_overview.html
- Mahout : http://mahout.apache.org/
Real-Time Log Collection and Processing Tools
- Flume : http://flume.apache.org/
- Chukwa : http://chukwa.apache.org/
Data Integration between RDBMS and HDFS
- Sqoop : http://sqoop.apache.org/
Distributed Service Coordinator
- ZooKeeper : http://zookeeper.apache.org/
Workflow and Job Scheduler
- Oozie : http://oozie.apache.org/
Centralized Service Management, Monitoring and Orchestration
- Ambari : http://ambari.apache.org/
Centralized Security Gateway for Hadoop Clusters
- Knox : http://knox.apache.org/
Eclipse IDE plugin for Development
- HDT : http://hdt.incubator.apache.org/
In-memory processing engine, reported by the project to run some workloads up to 100x faster than MapReduce
- Spark : http://spark.apache.org/
For the list of all Apache Incubator projects : http://incubator.apache.org/projects/