Starting with Apache Hadoop

In Hadoop, a single master manages many slaves. The master node consists of a JobTracker, TaskTracker, NameNode, and DataNode. A slave (worker) node acts as both a DataNode and a TaskTracker, though it is possible to have data-only worker nodes and compute-only worker nodes. The NameNode holds the file system metadata, while the files themselves are broken up and spread across the DataNodes. The JobTracker schedules and manages jobs, and the TaskTracker executes the individual map and reduce tasks. If a machine fails, Hadoop continues to operate the cluster by shifting work to the remaining machines.

The input file, which resides on a distributed file system spread throughout the cluster, is split into even-sized chunks that are replicated for fault tolerance. Hadoop divides each MapReduce job into a set of tasks. Each chunk of input is processed by a map task, which outputs a list of key-value pairs. In Hadoop, the shuffle phase occurs between map and reduce: the map outputs are transferred to the reducers and grouped by key, so that each reduce task receives every value produced for a given key.
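As a minimal sketch of this flow, the classic word-count job below uses the standard Hadoop MapReduce API: each map task emits (word, 1) key-value pairs, the shuffle groups the pairs by key, and each reduce task sums the counts for one word. The input and output paths are assumed to be passed on the command line; the class names follow the well-known tutorial example.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Each map task processes one input split and emits (word, 1) pairs.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one); // output one key-value pair per word
      }
    }
  }

  // After the shuffle, each reduce task sees all values for a given key.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {

    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result); // final (word, count) pair
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // HDFS input directory
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // HDFS output directory
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Because the input lives in the distributed file system, the framework can schedule each map task close to the DataNode holding its chunk, which is exactly the data-locality benefit described above.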