Apache Hadoop | Running MapReduce Jobs
Step 1 - Configure extra environment variables
Open the Start Menu, type 'environment', and press Enter. A System Properties window should open. Click the Environment Variables button near the bottom right.
Create two new environment variables: HADOOP_CP, holding the Java classpath Hadoop needs (the output of the hadoop classpath command), and HDFS_LOC, holding the HDFS location your job's files will live under (for example, hdfs://localhost:9000/user/<your_name> on a default single-node setup). Open a new command prompt and verify that both are set:
$ echo %HADOOP_CP%
$ echo %HDFS_LOC%
Step 2 - Compiling our project
Run the following commands from your CLI, pointing javac at the directory that contains your .java files:
$ mkdir dist/
$ javac -cp %HADOOP_CP% <some_directory>/*.java -d dist/
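If you do not already have MapReduce code on hand, the classic word count job works as a stand-in. The sketch below is a minimal version of it; the package name wordcount and the class name WordCount are arbitrary placeholders, and they become <pkg_name> and <class_name> in Step 5.

package wordcount;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Mapper: emits (word, 1) for every token in the input split.
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reducer: sums the counts emitted for each word.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    // Driver: this main class is the <pkg_name>.<class_name> entry point used in Step 5.
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

With the javac command above, the compiled classes land under dist/wordcount/, so that package directory plays the role of <some_directory> in the jar command of Step 3.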
Step 3 - Producing a JAR file
Run the following commands to create a JAR file with the compiled classes from Step 2.
$ cd dist
$ jar -cvf <application_name>.jar <some_directory>/*.class
added manifest
...
Step 4 - Copy our inputs to HDFS
Make sure the HDFS and YARN daemons are running. We can now copy our inputs to HDFS using the copyFromLocal command and verify the contents with the ls command:
$ hadoop fs -copyFromLocal <some_directory>/input %HDFS_LOC%/input
$ hadoop fs -ls %HDFS_LOC%/input
Found X items
...
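If you would rather script these transfers than type them, the same copies can be made through Hadoop's FileSystem API. Below is a minimal sketch under assumed placeholder paths (the hdfs://localhost:9000 URI, some_directory/input, and some_output_directory are stand-ins for your own values); its last call is the programmatic twin of Step 6's copyToLocal.

package wordcount;

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsCopy {
    public static void main(String[] args) throws Exception {
        // Placeholder URI: point this at the same location as %HDFS_LOC%.
        FileSystem fs = FileSystem.get(URI.create("hdfs://localhost:9000"), new Configuration());

        // Equivalent of: hadoop fs -copyFromLocal <some_directory>/input %HDFS_LOC%/input
        fs.copyFromLocalFile(new Path("some_directory/input"), new Path("/input"));

        // Equivalent of Step 6: hadoop fs -copyToLocal %HDFS_LOC%/output <some_output_directory>/
        fs.copyToLocalFile(new Path("/output"), new Path("some_output_directory"));

        fs.close();
    }
}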
Step 5 - Run our MapReduce Job
Run the following command from the dist folder where we compiled our code in Step 2. The <pkg_name>.<class_name> argument is the fully qualified name of your driver class (wordcount.WordCount in the Step 2 sketch):
$ hadoop jar <application_name>.jar <pkg_name>.<class_name> %HDFS_LOC%/input %HDFS_LOC%/output
2020-10-16 17:44:40,331 INFO client.RMProxy: Connecting to ResourceManager at localhost/127.0.0.1:8032
...
2020-10-16 17:44:43,115 INFO mapreduce.Job: Running job: job_1602881955102_0001
2020-10-16 17:44:55,439 INFO mapreduce.Job: Job job_1602881955102_0001 running in uber mode : false
2020-10-16 17:44:55,441 INFO mapreduce.Job: map 0% reduce 0%
2020-10-16 17:45:04,685 INFO mapreduce.Job: map 100% reduce 0%
2020-10-16 17:45:11,736 INFO mapreduce.Job: map 100% reduce 100%
2020-10-16 17:45:11,748 INFO mapreduce.Job: Job job_1602881955102_0001 completed successfully
...
We can verify the contents of our output using the cat command, just as we would in a regular shell:
$ hadoop fs -cat %HDFS_LOC%/output/part*
2020-10-16 18:19:50,225 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
...
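This check can also be scripted. Here is a minimal sketch, again assuming the placeholder hdfs://localhost:9000 URI, that globs /output/part* and streams each match to stdout, much like hadoop fs -cat does:

package wordcount;

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class CatOutput {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(URI.create("hdfs://localhost:9000"), new Configuration());

        // Expand the part* glob; globStatus returns null when nothing matches.
        FileStatus[] parts = fs.globStatus(new Path("/output/part*"));
        if (parts != null) {
            for (FileStatus status : parts) {
                try (FSDataInputStream in = fs.open(status.getPath())) {
                    // Stream the file to stdout without closing stdout afterwards.
                    IOUtils.copyBytes(in, System.out, 4096, false);
                }
            }
        }
        fs.close();
    }
}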
Step 6 - Copy outputs to our local machine
Once we are satisfied with the results, we can copy the contents to our local machine using the copyToLocal command:
$ hadoop fs -copyToLocal %HDFS_LOC%/output <some_output_directory>/