Apache Hadoop: A First Look at File Upload, Download, and a Distributed Computing Example

Published 2024-07-07 15:04. Tags: apache, hadoop, big data

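In the previous post we built a complete Hadoop cluster. In this post we use that cluster to upload and download a file, and we run the distributed WordCount example. Later posts will look more closely at distributed computing and distributed storage.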

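Upload and download test

Upload a file from the local Linux file system to HDFS and download it again to verify that the HDFS cluster is working properly.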

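# Create a directory in HDFS
hdfs dfs -mkdir -p /test/input

# Create a file in the local home directory and put any content in it
cd /root
vim test.txt

# Upload the Linux file to HDFS
hdfs dfs -put /root/test.txt /test/input

# Download the file from HDFS to the local Linux file system (run this from another directory or another node, since -get does not overwrite an existing local test.txt)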
hdfs dfs -get /test/input/test.txt
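
To check more than the exit status of each command, the round trip can be scripted and the downloaded copy compared byte for byte with the original. The following is a minimal sketch, assuming `hdfs` is on the PATH and the target HDFS directory already exists. The function names `same_content` and `hdfs_roundtrip` are hypothetical helpers introduced here for illustration and are not part of Hadoop.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration; not part of Hadoop.

# Succeeds only if the two local files are byte-identical.
same_content() {
  cmp -s "$1" "$2"
}

# Upload a local file to an HDFS directory, download it into a fresh
# temporary directory, and compare the copy with the original.
hdfs_roundtrip() {
  local src="$1" hdfs_dir="$2" name tmp rc
  name="$(basename "$src")"
  tmp="$(mktemp -d)"
  rc=0
  # -put -f overwrites an existing HDFS copy so the check can be repeated.
  hdfs dfs -put -f "$src" "$hdfs_dir" \
    && hdfs dfs -get "$hdfs_dir/$name" "$tmp/" \
    && same_content "$src" "$tmp/$name" \
    || rc=1
  rm -rf "$tmp"
  return "$rc"
}
```

With the paths used above, a call might look like `hdfs_roundtrip /root/test.txt /test/input && echo "round trip OK"`. Downloading into a temporary directory avoids clashing with the original test.txt.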

Distributed computing test

Create a wcinput directory under the root of the HDFS file system:

[root@hadoop01 hadoop-2.9.2]# hdfs dfs -mkdir /wcinput

Create a file named wc.txt with the following content:

hadoop mapreduce yarn
hdfs hadoop mapreduce
mapreduce yarn kmning
kmning
kmning

Upload wc.txt to the HDFS directory /wcinput:

hdfs dfs -put wc.txt /wcinput

Run the MapReduce job (from the hadoop-2.9.2 installation directory, because the jar path is relative):

hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.2.jar wordcount /wcinput/ /wcoutput
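
One thing to keep in mind when rerunning the job: MapReduce refuses to start if the output directory already exists, so /wcoutput has to be removed first. Below is a minimal sketch, assuming `hdfs` is on the PATH and that deleting the old output is acceptable. `clean_output` is a hypothetical helper name, not a Hadoop command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: delete an HDFS output directory if it exists.
clean_output() {
  local out="$1"
  # `hdfs dfs -test -d` exits 0 when the path exists and is a directory.
  if hdfs dfs -test -d "$out"; then
    hdfs dfs -rm -r "$out"
  fi
}
```

Calling `clean_output /wcoutput` before the `hadoop jar` command makes the run repeatable.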

The job prints the following:

24/07/03 20:44:26 INFO client.RMProxy: Connecting to ResourceManager at hadoop03/192.168.43.103:8032
24/07/03 20:44:28 INFO input.FileInputFormat: Total input files to process : 1
24/07/03 20:44:28 INFO mapreduce.JobSubmitter: number of splits:1
24/07/03 20:44:28 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
24/07/03 20:44:29 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1720006717389_0001
24/07/03 20:44:29 INFO impl.YarnClientImpl: Submitted application application_1720006717389_0001
24/07/03 20:44:29 INFO mapreduce.Job: The url to track the job: http://hadoop03:8088/proxy/application_1720006717389_0001/
24/07/03 20:44:29 INFO mapreduce.Job: Running job: job_1720006717389_0001
24/07/03 20:44:45 INFO mapreduce.Job: Job job_1720006717389_0001 running in uber mode : false
24/07/03 20:44:45 INFO mapreduce.Job:  map 0% reduce 0%
24/07/03 20:44:57 INFO mapreduce.Job:  map 100% reduce 0%
24/07/03 20:45:13 INFO mapreduce.Job:  map 100% reduce 100%
24/07/03 20:45:14 INFO mapreduce.Job: Job job_1720006717389_0001 completed successfully
24/07/03 20:45:14 INFO mapreduce.Job: Counters: 49
        File System Counters
                FILE: Number of bytes read=70
                FILE: Number of bytes written=396911
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=180
                HDFS: Number of bytes written=44
                HDFS: Number of read operations=6
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=2
        Job Counters
                Launched map tasks=1
                Launched reduce tasks=1
                Data-local map tasks=1
                Total time spent by all maps in occupied slots (ms)=9440
                Total time spent by all reduces in occupied slots (ms)=11870
                Total time spent by all map tasks (ms)=9440
                Total time spent by all reduce tasks (ms)=11870
                Total vcore-milliseconds taken by all map tasks=9440
                Total vcore-milliseconds taken by all reduce tasks=11870
                Total megabyte-milliseconds taken by all map tasks=9666560
                Total megabyte-milliseconds taken by all reduce tasks=12154880
        Map-Reduce Framework
                Map input records=5
                Map output records=11
                Map output bytes=124
                Map output materialized bytes=70
                Input split bytes=100
                Combine input records=11
                Combine output records=5
                Reduce input groups=5
                Reduce shuffle bytes=70
                Reduce input records=5
                Reduce output records=5
                Spilled Records=10
                Shuffled Maps =1
                Failed Shuffles=0
                Merged Map outputs=1
                GC time elapsed (ms)=498
                CPU time spent (ms)=3050
                Physical memory (bytes) snapshot=374968320
                Virtual memory (bytes) snapshot=4262629376
                Total committed heap usage (bytes)=219676672
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters
                Bytes Read=80
        File Output Format Counters
                Bytes Written=44
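
Several of these counters can be cross-checked against the input file with ordinary shell tools. "Map input records" is the number of lines, "Map output records" is the number of words (the wordcount mapper emits one key/value pair per word), and "Bytes Read" under File Input Format Counters is the file size. The sketch below assumes wc.txt contains exactly the five lines shown earlier, each ending in a newline and with no trailing spaces. The temporary file path is arbitrary.

```shell
#!/usr/bin/env bash
# Recreate the sample input in a temporary file (the path is arbitrary).
wc_file="$(mktemp)"
cat > "$wc_file" <<'EOF'
hadoop mapreduce yarn
hdfs hadoop mapreduce
mapreduce yarn kmning
kmning
kmning
EOF

# $(( ... )) strips the padding that some wc implementations add.
lines=$(( $(wc -l < "$wc_file") ))   # compare with "Map input records"
words=$(( $(wc -w < "$wc_file") ))   # compare with "Map output records"
bytes=$(( $(wc -c < "$wc_file") ))   # compare with "Bytes Read"

echo "lines=$lines words=$words bytes=$bytes"
rm -f "$wc_file"
```

Under those assumptions this prints `lines=5 words=11 bytes=80`, which agrees with the counters in the log.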

View the result:

[root@hadoop01 hadoop-2.9.2]# hdfs dfs -cat /wcoutput/part-r-00000
hadoop  2
hdfs    1
kmning  3
mapreduce       3
yarn    2

As shown, the program used MapReduce distributed computation to count how many times each word occurs.
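
For intuition, the same counts can be reproduced on a single machine with a shell pipeline whose stages loosely mirror the job: split lines into words (map), sort so that equal words sit next to each other (shuffle and sort), then count each group (reduce). This is only an analogy, sketched under the assumption that the input file holds the five lines above. It is not how the Hadoop example jar is implemented, and `wordcount_local` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical single-machine analogue of the wordcount job.
# Stage 1 (map):          awk emits one word per line.
# Stage 2 (shuffle/sort): sort brings identical words together (byte order).
# Stage 3 (reduce):       uniq -c counts each run of identical words.
# Stage 4 (format):       awk prints "word<TAB>count", like part-r-00000.
wordcount_local() {
  awk '{ for (i = 1; i <= NF; i++) print $i }' "$1" \
    | LC_ALL=C sort \
    | uniq -c \
    | awk '{ print $2 "\t" $1 }'
}
```

Running `wordcount_local wc.txt` on the sample file should print the same five lines as the part-r-00000 output above.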

Original article: https://blog.csdn.net/u012882823/article/details/140171027
