Big Data Technology - Lab 05: MapReduce Practice (Verified Working)
Practice 1: MapReduce Job
1) Copy 02-上机实验/ds.txt to the /opt directory on the client machine and upload it to HDFS.
# hadoop fs -put ds.txt /user/root/ds.txt
# hadoop fs -ls /user/root
Found 1 items
-rw-r--r-- 3 root supergroup 9135 2015-05-29 19:49 /user/root/ds.txt
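To sanity-check the upload, you can print the first few lines of the file (assuming ds.txt is plain text):
# hadoop fs -cat /user/root/ds.txt | head -n 5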
2) Copy the MapReduce examples jar from the Hadoop installation directory to /opt.
# sudo cp ~/local/opt/hadoop-2.6.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar /opt
# ls /opt/hadoop-mapreduce*
/opt/hadoop-mapreduce-examples-2.6.0.jar
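Running the jar without arguments prints the list of bundled example programs (wordcount, grep, pi, and so on), which is a quick way to confirm the jar is usable:
# hadoop jar /opt/hadoop-mapreduce-examples-2.6.0.jar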
3) Run the WordCount example job. Note that the output directory must not already exist, or submission will fail; remove a stale one with hadoop fs -rm -r /user/root/ds_out before rerunning.
[hadoop@master hadoop-2.6.0]$ hadoop jar /opt/hadoop-mapreduce-examples-2.6.0.jar wordcount /user/root/ds.txt /user/root/ds_out
17/04/16 12:21:29 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/04/16 12:21:32 INFO client.RMProxy: Connecting to ResourceManager at master/192.168.164.5:8032
17/04/16 12:21:35 INFO input.FileInputFormat: Total input paths to process : 1
17/04/16 12:21:36 INFO mapreduce.JobSubmitter: number of splits:1
17/04/16 12:21:40 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1492366567239_0001
17/04/16 12:21:45 INFO impl.YarnClientImpl: Submitted application application_1492366567239_0001
17/04/16 12:21:46 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1492366567239_0001/
17/04/16 12:21:46 INFO mapreduce.Job: Running job: job_1492366567239_0001
17/04/16 12:22:35 INFO mapreduce.Job: Job job_1492366567239_0001 running in uber mode : false
17/04/16 12:22:35 INFO mapreduce.Job: map 0% reduce 0%
17/04/16 12:23:26 INFO mapreduce.Job: map 100% reduce 0%
17/04/16 12:23:45 INFO mapreduce.Job: map 100% reduce 100%
17/04/16 12:23:46 INFO mapreduce.Job: Job job_1492366567239_0001 completed successfully
17/04/16 12:23:46 INFO mapreduce.Job: Counters: 49
    File System Counters
        FILE: Number of bytes read=10341
        FILE: Number of bytes written=231931
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=9230
        HDFS: Number of bytes written=9375
        HDFS: Number of read operations=6
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Launched map tasks=1
        Launched reduce tasks=1
        Data-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=44983
        Total time spent by all reduces in occupied slots (ms)=14322
        Total time spent by all map tasks (ms)=44983
        Total time spent by all reduce tasks (ms)=14322
        Total vcore-seconds taken by all map tasks=44983
        Total vcore-seconds taken by all reduce tasks=14322
        Total megabyte-seconds taken by all map tasks=46062592
        Total megabyte-seconds taken by all reduce tasks=14665728
    Map-Reduce Framework
        Map input records=240
        Map output records=240
        Map output bytes=9855
        Map output materialized bytes=10341
        Input split bytes=95
        Combine input records=240
        Combine output records=240
        Reduce input groups=240
        Reduce shuffle bytes=10341
        Reduce input records=240
        Reduce output records=240
        Spilled Records=480
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=873
        CPU time spent (ms)=23610
        Physical memory (bytes) snapshot=301469696
        Virtual memory (bytes) snapshot=1954639872
        Total committed heap usage (bytes)=136450048
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=9135
    File Output Format Counters
        Bytes Written=9375
4) View the job output.
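A successful job leaves an empty _SUCCESS marker plus one part-r-NNNNN file per reduce task in the output directory (a single reducer here, hence part-r-00000):
# hadoop fs -ls /user/root/ds_out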
# hadoop fs -cat /user/root/ds_out/part-r-00000
16.75481160342442,0.5590169943749481 1
17.759065824032646,0.6708203932499373 1
17.944905786933322,0.5852349955359809 1
18.619213022043585,0.5024937810560444 1
18.664436259885097,0.7433034373659246 1
……
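Each output line is a word (here a coordinate pair from ds.txt) followed by a tab and its count. As a cross-check, the number of output lines should match the Reduce output records=240 counter above:
# hadoop fs -cat /user/root/ds_out/part-r-00000 | wc -l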
Practice 2: HBase MapReduce Commands
1) Copy 02-上机实验/user.csv to the /opt directory on the client machine and upload it to HDFS.
# hadoop fs -put /opt/user.csv /user/root/user.csv
# hadoop fs -ls /user/root/user.csv
Found 1 items
-rw-r--r-- 3 root supergroup 8393 2015-08-13 11:04 /user/root/user.csv
2) Use the HBase shell to create the user table with one column family, info.
hbase(main):001:0> create 'user','info'
0 row(s) in 1.2520 seconds
=> Hbase::Table - user
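The bulk-load job that follows creates one reduce partition per region (the log below reports "Configuring 1 reduce partitions to match current region count"), and a freshly created table has a single region. For larger imports you may want to pre-split the table at creation time so the load is spread across regions; a sketch, with purely illustrative split points and table name:
hbase(main):002:0> create 'user2', 'info', SPLITS => ['100', '300', '500']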
3) Run the MapReduce jobs to import the data.
① Generate the HFiles.
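First extend HADOOP_CLASSPATH so the hadoop command can see the HBase jars, then run the ImportTsv tool from the hbase-server jar. -Dimporttsv.separator="," switches the field separator from the default tab to a comma; -Dimporttsv.bulk.output writes HFiles to the given HDFS directory instead of writing Puts directly to the table; and -Dimporttsv.columns maps the CSV fields in order: the first field becomes the row key, the second info:name, the third info:age. Judging from the scan output at the end of this practice, each line of user.csv looks like the following (note the first line is a header):
rowkey,name,age
99,user99,57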
# export HADOOP_CLASSPATH=$HBASE_HOME/lib/*:$CLASSPATH
# hadoop jar $HBASE_HOME/lib/hbase-server-1.0.3.jar importtsv -Dimporttsv.separator="," -Dimporttsv.bulk.output=/user/root/hbase_tmp -Dimporttsv.columns=HBASE_ROW_KEY,info:name,info:age user /user/root/user.csv
15/08/13 11:16:50 INFO zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0xdcfda20 connecting to ZooKeeper ensemble=master:2181,slave1:2181,slave2:2181
15/08/13 11:16:50 INFO zookeeper.ZooKeeper: Client environment:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT
15/08/13 11:16:50 INFO zookeeper.ZooKeeper: Client environment:host.name=master.example.com
15/08/13 11:16:50 INFO zookeeper.ZooKeeper: Client environment:java.version=1.7.0_75
15/08/13 11:16:50 INFO zookeeper.ZooKeeper: Client environment:java.vendor=Oracle Corporation
15/08/13 11:16:50 INFO zookeeper.ZooKeeper: Client environment:java.home=/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.75.x86_64/jre
15/08/13 11:16:50 INFO zookeeper.ZooKeeper: Client environment:java.class.path=/opt/hadoop-2.6.0/etc/hadoop:/opt/hadoop-2.6.0/share/hadoop/common/lib/log4j-1.2.17.jar:/opt/hadoop-2.6.0/share/hadoop/common/lib/paranamer-2.3.jar:/opt/hadoop-2.6.0/share/hadoop/common/lib/commons-logging-1.1.3.jar:/opt/hadoop-2.6.0/share/hadoop/common/lib/asm-3.2.jar……
15/08/13 11:16:50 INFO zookeeper.ZooKeeper: Client environment:java.library.path=/opt/hadoop-2.6.0/lib/native
15/08/13 11:16:50 INFO zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp
15/08/13 11:16:50 INFO zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
15/08/13 11:16:50 INFO zookeeper.ZooKeeper: Client environment:os.name=Linux
15/08/13 11:16:50 INFO zookeeper.ZooKeeper: Client environment:os.arch=amd64
15/08/13 11:16:50 INFO zookeeper.ZooKeeper: Client environment:os.version=2.6.32-504.el6.x86_64
15/08/13 11:16:50 INFO zookeeper.ZooKeeper: Client environment:user.name=root
15/08/13 11:16:50 INFO zookeeper.ZooKeeper: Client environment:user.home=/root
15/08/13 11:16:50 INFO zookeeper.ZooKeeper: Client environment:user.dir=/root
15/08/13 11:16:50 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=master:2181 sessionTimeout=90000 watcher=hconnection-0xdcfda200x0, quorum=master:2181, baseZNode=/hbase
15/08/13 11:16:50 INFO zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
15/08/13 11:16:50 INFO zookeeper.ClientCnxn: Socket connection established to localhost/127.0.0.1:2181, initiating session
15/08/13 11:16:50 INFO zookeeper.ClientCnxn: Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x14f2495fb0f000c, negotiated timeout = 90000
15/08/13 11:16:52 INFO mapreduce.HFileOutputFormat2: Looking up current regions for table user
15/08/13 11:16:52 INFO mapreduce.HFileOutputFormat2: Configuring 1 reduce partitions to match current region count
15/08/13 11:16:52 INFO mapreduce.HFileOutputFormat2: Writing partition information to /tmp/hadoop-root/partitions_aa02a3fe-23be-40a6-844f-cd2d64a38e92
15/08/13 11:16:52 INFO compress.CodecPool: Got brand-new compressor [.deflate]
15/08/13 11:16:52 INFO mapreduce.HFileOutputFormat2: Incremental table user output configured.
15/08/13 11:16:52 INFO client.ConnectionManager$HConnectionImplementation: Closing master protocol: MasterService
15/08/13 11:16:52 INFO client.ConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x14f2495fb0f000c
15/08/13 11:16:52 INFO zookeeper.ZooKeeper: Session: 0x14f2495fb0f000c closed
15/08/13 11:16:52 INFO zookeeper.ClientCnxn: EventThread shut down
15/08/13 11:16:53 INFO client.RMProxy: Connecting to ResourceManager at master/192.168.222.131:8032
15/08/13 11:16:55 INFO input.FileInputFormat: Total input paths to process : 1
15/08/13 11:16:55 INFO mapreduce.JobSubmitter: number of splits:1
15/08/13 11:16:55 INFO Configuration.deprecation: io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
15/08/13 11:16:55 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1439427807631_0002
15/08/13 11:16:56 INFO impl.YarnClientImpl: Submitted application application_1439427807631_0002
15/08/13 11:16:56 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1439427807631_0002/
15/08/13 11:16:56 INFO mapreduce.Job: Running job: job_1439427807631_0002
15/08/13 11:17:11 INFO mapreduce.Job: Job job_1439427807631_0002 running in uber mode : false
15/08/13 11:17:11 INFO mapreduce.Job: map 0% reduce 0%
15/08/13 11:17:19 INFO mapreduce.Job: map 100% reduce 0%
15/08/13 11:17:29 INFO mapreduce.Job: map 100% reduce 100%
15/08/13 11:17:29 INFO mapreduce.Job: Job job_1439427807631_0002 completed successfully
15/08/13 11:17:29 INFO mapreduce.Job: Counters: 50
    File System Counters
        FILE: Number of bytes read=42188
        FILE: Number of bytes written=356921
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=8496
        HDFS: Number of bytes written=44391
        HDFS: Number of read operations=8
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=3
    Job Counters
        Launched map tasks=1
        Launched reduce tasks=1
        Data-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=5813
        Total time spent by all reduces in occupied slots (ms)=11916
        Total time spent by all map tasks (ms)=5813
        Total time spent by all reduce tasks (ms)=5958
        Total vcore-seconds taken by all map tasks=5813
        Total vcore-seconds taken by all reduce tasks=5958
        Total megabyte-seconds taken by all map tasks=5952512
        Total megabyte-seconds taken by all reduce tasks=7888392
    Map-Reduce Framework
        Map input records=538
        Map output records=538
        Map output bytes=41106
        Map output materialized bytes=42188
        Input split bytes=103
        Combine input records=538
        Combine output records=538
        Reduce input groups=538
        Reduce shuffle bytes=42188
        Reduce input records=538
        Reduce output records=1076
        Spilled Records=1076
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=241
        CPU time spent (ms)=4890
        Physical memory (bytes) snapshot=476753920
        Virtual memory (bytes) snapshot=5710151680
        Total committed heap usage (bytes)=384303104
    ImportTsv
        Bad Lines=0
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=8393
    File Output Format Counters
        Bytes Written=44391
Check HDFS to see the generated HFile:
# hadoop fs -ls -R /user/root/hbase_tmp
-rw-r--r-- 3 root supergroup 0 2015-08-13 11:17 /user/root/hbase_tmp/_SUCCESS
drwxr-xr-x - root supergroup 0 2015-08-13 11:17 /user/root/hbase_tmp/info
-rw-r--r-- 3 root supergroup 44391 2015-08-13 11:17 /user/root/hbase_tmp/info/e8cf8a1ac70d40e2a985711dfb678cdd
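The info subdirectory matches the column family name. If you want to peek inside the generated HFile, HBase ships an HFile pretty-printer you can try (the file name hash will differ on your cluster):
# hbase org.apache.hadoop.hbase.io.hfile.HFile -p -f /user/root/hbase_tmp/info/e8cf8a1ac70d40e2a985711dfb678cdd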
② Load the HFile data into the user table.
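completebulkload does not push the data through the region servers; it moves the HFiles into the table's region directories and registers them, so the load is fast and the files disappear from /user/root/hbase_tmp afterwards. If the jar path gives you trouble, an equivalent invocation is to call the tool class through the hbase launcher:
# hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /user/root/hbase_tmp user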
# hadoop jar $HBASE_HOME/lib/hbase-server-1.0.3.jar completebulkload /user/root/hbase_tmp user
15/08/13 11:29:02 INFO zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x5ed731d0 connecting to ZooKeeper ensemble=master:2181,slave1:2181,slave2:2181
15/08/13 11:29:02 INFO zookeeper.ZooKeeper: Client environment:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT
15/08/13 11:29:02 INFO zookeeper.ZooKeeper: Client environment:host.name=master.example.com
15/08/13 11:29:02 INFO zookeeper.ZooKeeper: Client environment:java.version=1.7.0_75
15/08/13 11:29:02 INFO zookeeper.ZooKeeper: Client environment:java.vendor=Oracle Corporation
15/08/13 11:29:02 INFO zookeeper.ZooKeeper: Client environment:java.home=/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.75.x86_64/jre
15/08/13 11:29:02 INFO zookeeper.ZooKeeper: Client environment:java.class.path=/opt/hadoop-2.6.0/etc/hadoop:/opt/hadoop-2.6.0/share/hadoop/common/lib/log4j-1.2.17.jar:/opt/hadoop-2.6.0/share/hadoop/common/lib/paranamer-2.3.jar:/opt/hadoop-2.6.0/share/ha……
:/opt/hbase-1.0.1.1/lib/hbase-thrift-1.0.1.1.jar:classpath:/opt/hadoop-2.6.0/contrib/capacity-scheduler/*.jar
15/08/13 11:29:02 INFO zookeeper.ZooKeeper: Client environment:java.library.path=/opt/hadoop-2.6.0/lib/native
15/08/13 11:29:02 INFO zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp
15/08/13 11:29:02 INFO zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
15/08/13 11:29:02 INFO zookeeper.ZooKeeper: Client environment:os.name=Linux
15/08/13 11:29:02 INFO zookeeper.ZooKeeper: Client environment:os.arch=amd64
15/08/13 11:29:02 INFO zookeeper.ZooKeeper: Client environment:os.version=2.6.32-504.el6.x86_64
15/08/13 11:29:02 INFO zookeeper.ZooKeeper: Client environment:user.name=root
15/08/13 11:29:02 INFO zookeeper.ZooKeeper: Client environment:user.home=/root
15/08/13 11:29:02 INFO zookeeper.ZooKeeper: Client environment:user.dir=/root
15/08/13 11:29:02 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=master:2181 sessionTimeout=90000 watcher=hconnection-0x5ed731d00x0, quorum=master:2181, baseZNode=/hbase
15/08/13 11:29:02 INFO zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
15/08/13 11:29:02 INFO zookeeper.ClientCnxn: Socket connection established to localhost/127.0.0.1:2181, initiating session
15/08/13 11:29:02 INFO zookeeper.ClientCnxn: Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x14f2495fb0f000f, negotiated timeout = 90000
15/08/13 11:29:03 INFO zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x4bee18dc connecting to ZooKeeper ensemble=localhost:2181
15/08/13 11:29:03 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=master:2181 sessionTimeout=90000 watcher=hconnection-0x4bee18dc0x0, quorum=master:2181, baseZNode=/hbase
15/08/13 11:29:03 INFO zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
15/08/13 11:29:03 INFO zookeeper.ClientCnxn: Socket connection established to localhost/127.0.0.1:2181, initiating session
15/08/13 11:29:03 INFO zookeeper.ClientCnxn: Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x14f2495fb0f0010, negotiated timeout = 90000
15/08/13 11:29:04 WARN mapreduce.LoadIncrementalHFiles: Skipping non-directory hdfs://master:8020/user/root/hbase_tmp/_SUCCESS
15/08/13 11:29:04 INFO hfile.CacheConfig: CacheConfig:disabled
15/08/13 11:29:04 INFO mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://master:8020/user/root/hbase_tmp/info/e8cf8a1ac70d40e2a985711dfb678cdd first=1 last=rowkey
15/08/13 11:29:05 INFO client.ConnectionManager$HConnectionImplementation: Closing master protocol: MasterService
15/08/13 11:29:05 INFO client.ConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x14f2495fb0f0010
15/08/13 11:29:05 INFO zookeeper.ZooKeeper: Session: 0x14f2495fb0f0010 closed
15/08/13 11:29:05 INFO zookeeper.ClientCnxn: EventThread shut down
4) View the data in the user table in HBase.
hbase(main):007:0> scan 'user'
……
99 column=info:age, timestamp=1439435809424, value=57
99 column=info:name, timestamp=1439435809424, value=user99
rowkey column=info:age, timestamp=1439435809424, value=age
rowkey column=info:name, timestamp=1439435809424, value=name
538 row(s) in 2.2050 seconds
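Note the row with key rowkey whose cell values are the literal strings name and age: ImportTsv does not skip header lines, so the CSV header was loaded as an ordinary row. If that is unwanted, strip the header before uploading, for example:
# sed -i '1d' /opt/user.csv
You can also confirm the total with count 'user' in the HBase shell.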
Note: If you see a warning that the classpath contains multiple SLF4J bindings (here, one under /usr/local/hbase/lib and another under /usr/local/hadoop/hadoop-1.0.3/lib), the fix is simply to remove one of the two bindings. After removing slf4j-log4j12-1.4.3.jar from /usr/local/hadoop/hadoop-1.0.3/lib, the warning no longer appears.
If a jar file cannot be found, using an absolute path such as /home/Hadoop/local/opt/hbase-1.0.3 avoids the problem; setting the corresponding environment variable (e.g. HBASE_HOME) also works.
Original article: https://blog.csdn.net/u013571432/article/details/140561053