Hadoop Ecosystem Hyperlinks



| Name | Project Type | Description | More Info |
|---|---|---|---|
| Apache HDFS | Storage | Distributed file system | http://hadoop.apache.org/hdfs/ |
| Apache MapReduce | Framework | Framework for parallel processing | http://hadoop.apache.org/mapreduce |
| Apache HBase | Storage | Column-oriented database, real-time querying | http://hbase.apache.org/ |
| Apache Hive | Analytics | High-latency data warehousing analytics | http://hive.apache.org/ |
| Apache Pig | Analytics | High-level Hadoop programming language | http://pig.apache.org/ |
| Apache Mahout | Analytics | Machine learning for clustering, classification, recommendation, and itemset mining | http://mahout.apache.org/ |
| Apache ZooKeeper | Management | Configuration information, naming, distributed synchronization, and group services | http://hadoop.apache.org/zookeeper/ |
| Ganglia | Monitoring | Scalable distributed monitoring system for high-performance computing systems | http://ganglia.sourceforge.net/ |
| Nagios | Monitoring | Monitors applications, services, and business processes | http://www.nagios.org/about/ |
| Apache Hadoop Streaming | Analytics | Create and run MapReduce jobs with any executable or script as the mapper and/or the reducer | http://hadoop.apache.org/common/docs/r0.15.2/streaming.html |
| Cascading | Storage | Java query API, query planner, and process scheduler for Hadoop | http://www.cascading.org/ |
| HadoopDB | Development Environment | Hybrid DBMS and MapReduce technology | http://hadoopdb.sourceforge.net/guide/ |
| Karmasphere | User interface | Development environment | http://www.karmasphere.com/Products-Information/karmasphere-studio.html |
| HUE | Deployment | Web UI for Hadoop | http://www.cloudera.com/blog/2010/07/whats-new-in-cdh3b2-hue/ |
| Puppet | Configuration | Deployment and configuration management | http://docs.puppetlabs.com/guides/introduction.html |
| Chef | Management | Configuration management | http://wiki.opscode.com/display/chef/Home |
| Flume | Data import | Load data into HDFS from a variety of sources | http://www.cloudera.com/blog/2010/07/whats-new-in-cdh3b2-flume/ |
| HOP | Realtime processing | Online aggregation (approximate answers as a job runs) and stream processing (MapReduce jobs that run continuously, processing new data as it arrives) | http://code.google.com/p/hop/ |
| Sqoop | Database integration | A database import/export tool for Hadoop | https://github.com/cloudera/sqoop |
| Ora-Oop | Database integration | Connector between Oracle and Apache Hadoop | http://www.quest.com/Ora-Oop/ |
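
The Apache Hadoop Streaming entry above says that any executable or script can act as the mapper and/or the reducer. As a rough illustration, here is a minimal word-count sketch in Python. It assumes the usual Streaming convention of tab-separated key/value lines on standard input and output, and it assumes the mapper output reaches the reducer sorted by key. The function names `map_lines` and `reduce_lines`, and the `map`/`reduce` command-line argument, are hypothetical choices for this example and are not part of Hadoop.

```python
import sys
from itertools import groupby


def map_lines(lines):
    """Mapper: emit one 'word<TAB>1' record for every word seen."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reduce_lines(lines):
    """Reducer: sum the counts for each run of identical keys.

    Assumes the input is already sorted by key, so that equal words
    arrive on consecutive lines.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


# Hypothetical wiring: run as "script.py map" or "script.py reduce".
if __name__ == "__main__" and sys.argv[1:] in (["map"], ["reduce"]):
    step = map_lines if sys.argv[1] == "map" else reduce_lines
    for record in step(sys.stdin):
        print(record)
```

Outside a cluster, the same logic can be tried by passing the sorted mapper output straight to the reducer, for example `list(reduce_lines(sorted(map_lines(["a b a"]))))`. On a cluster, a script like this would be passed to the Streaming jar through its `-mapper` and `-reducer` options. The jar location depends on the Hadoop version and is not given here.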