Apache Hadoop YARN: Moving beyond MapReduce and Batch Processing with Apache Hadoop 2 (Addison-Wesley Data & Analytics)

Arun Murthy, Vinod Vavilapalli

Language: English

Pages: 400

ISBN: 0321934504

Format: PDF / Kindle (mobi) / ePub

“This book is a critically needed resource for the newly released Apache Hadoop 2.0, highlighting YARN as the significant breakthrough that broadens Hadoop beyond the MapReduce paradigm.”
—From the Foreword by Raymie Stata, CEO of Altiscale

The Insider’s Guide to Building Distributed, Big Data Applications with Apache Hadoop™ YARN


Apache Hadoop is helping drive the Big Data revolution. Now, its data processing has been completely overhauled: Apache Hadoop YARN provides resource management at data center scale and easier ways to create distributed applications that process petabytes of data. And now in Apache Hadoop™ YARN, two Hadoop technical leaders show you how to develop new applications and adapt existing code to fully leverage these revolutionary advances.


YARN project founder Arun Murthy and project lead Vinod Kumar Vavilapalli demonstrate how YARN increases scalability and cluster utilization, enables new programming models and services, and opens new options beyond Java and batch processing. They walk you through the entire YARN project lifecycle, from installation through deployment.


You’ll find many examples drawn from the authors’ cutting-edge experience—first as Hadoop’s earliest developers and implementers at Yahoo! and now as Hortonworks developers moving the platform forward and helping customers succeed with it.


Coverage includes

  • YARN’s goals, design, architecture, and components—how it expands the Apache Hadoop ecosystem
  • Exploring YARN on a single node 
  • Administering YARN clusters and Capacity Scheduler 
  • Running existing MapReduce applications 
  • Developing a large-scale clustered YARN application 
  • Discovering new open source frameworks that run under YARN

Intelligent Distributed Computing (Advances in Intelligent Systems and Computing, Volume 321)

Linux Bible (8th Edition)

Data Communications and Computer Networks: A Business User's Approach (7th Edition)

Game AI Pro: Collected Wisdom of Game AI Professionals

Cloud Computing for Dummies

Office 2016 Simplified












different than shown in this listing:

  $ jps
  15140 SecondaryNameNode
  15015 NameNode
  15335 Jps
  15214 DataNode

If the process did not start, it may be helpful to inspect the log files. For instance, examine the log file for the NameNode. (Note that the path is taken from the preceding command.)

  vi /opt/yarn/hadoop-2.2.0/logs/hadoop-hdfs-namenode-limulus.log

All Hadoop services can be stopped using the hadoop-daemon.sh script. For example, to stop the datanode

(pid 36772) is running...

  # service hadoop-resourcemanager status
  Hadoop YARN ResourceManager daemon is stopped

We can use grep to confirm the “running” response or assume the service is stopped otherwise. Once we’re satisfied with the script, we name it check_resource_manager.sh and put it in the Nagios plug-in directory (e.g., /usr/lib64/nagios/plugins). We tell Nagios about this plug-in by adding the following lines to our hadoop-cluster.cfg file: define
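The check described above can be sketched in a few lines. This is only an illustration of the logic (look for "running" in the status output, otherwise treat the service as stopped), written in Python rather than the shell script the book refers to, which is not reproduced here. The function names are hypothetical; the exit codes follow the usual Nagios plug-in convention of 0 for OK and 2 for CRITICAL.

```python
import subprocess

# Nagios plug-in exit codes (standard convention): 0 = OK, 2 = CRITICAL.
NAGIOS_OK = 0
NAGIOS_CRITICAL = 2


def classify_status(status_output: str) -> int:
    """Return OK if the status text reports "running", otherwise CRITICAL.

    Hypothetical helper: mirrors the "grep for running, else assume
    stopped" rule described in the text.
    """
    return NAGIOS_OK if "running" in status_output else NAGIOS_CRITICAL


def check_resource_manager() -> int:
    """Run the status command quoted in the text and classify its output."""
    result = subprocess.run(
        ["service", "hadoop-resourcemanager", "status"],
        capture_output=True,
        text=True,
    )
    return classify_status(result.stdout)
```

A wrapper script would call `check_resource_manager()` and pass its return value to `sys.exit()`, so that Nagios sees the status as the process exit code.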

/usercache//appcache// Irrespective of the application type, once the resources are downloaded and the containers are running, the containers can access these resources locally by making use of the symbolic links created by the NodeManager in each container’s working directory.

Resource Localization Configuration

Administrators can control various aspects of resource localization by setting or changing certain configuration parameters in yarn-site.xml when starting

specific to the application in question.

Scheduling Example

Assume there are four racks (rackA, rackB, rackC, and rackD) in the cluster. Also assume that each rack has only four machines, named host-rackName-12[3-6].domain.com. Imagine an application whose data consists of a total of four files, which are physically located on host-A-123.domain.com, host-A-124.domain.com, host-B-123.domain.com, and host-B-124.domain.com, respectively. For efficient operation, this application expects YARN to
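To make the example cluster concrete, the following sketch enumerates the hosts implied by the naming pattern and works out which racks hold the application's four files. It only restates the layout given above and does not model how YARN schedules containers. The helper names are hypothetical, and the sketch assumes that "rackName" in the pattern stands for the rack letter, as the host names in the example suggest.

```python
RACKS = ["A", "B", "C", "D"]


def hosts_in_rack(rack: str) -> list:
    """Expand host-<rack>-12[3-6].domain.com into its four host names."""
    return [f"host-{rack}-{n}.domain.com" for n in range(123, 127)]


def rack_of(host: str) -> str:
    """Derive the rack name from a host name, e.g. host-A-123... -> rackA."""
    return "rack" + host.split("-")[1]


# Cluster layout from the example: four racks with four machines each.
cluster = {f"rack{r}": hosts_in_rack(r) for r in RACKS}

# Hosts that hold the application's four data files.
DATA_HOSTS = [
    "host-A-123.domain.com",
    "host-A-124.domain.com",
    "host-B-123.domain.com",
    "host-B-124.domain.com",
]

# Racks that contain at least one of the data files.
data_racks = sorted({rack_of(h) for h in DATA_HOSTS})
```

Here `cluster` holds sixteen host names in total, and `data_racks` contains only rackA and rackB, since all four files live on machines in those two racks.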

org.apache.hadoop.yarn.api.records.LocalResource as the value. The map key is translated into a symbolic link in the file system visible to the container. More details on this aspect will follow, but for now, let’s describe the code in Listing 10.4 that builds the map of LocalResources for the ApplicationMaster ContainerLaunchContext. First, we create the ContainerLaunchContext as a YARN record. ContainerLaunchContext amContainer =
