【HBase分布式数据库】第七章数据的导入导出 (2-5)

🕗 发布于 2024-11-11 22:39 分布式 数据库 hbase

7.2 bulkload导入数据

任务目的

掌握引入外部依赖包的方法
掌握eclipse打包的方法
掌握bulkload导入数据的逻辑代码

任务清单

任务1：引入外部依赖包
任务2：bulkload导入数据

任务步骤

任务1：引入外部依赖包

Bulkload是通过一个MapReduce Job来实现的，通过Job直接生成一个HBase的内部HFile格式文件来形成一个特殊的HBase数据表，然后直接将数据文件加载到运行的集群中。使用bulk load功能最简单的方式就是使用importtsv 工具。importtsv 是从TSV文件直接加载内容至HBase的一个内置工具。它通过运行一个MapReduce Job，将数据从TSV文件中直接写入HBase的表或者写入一个HBase的自有格式数据文件。在编写代码逻辑之前，我们首先要引入程序依赖的jar包，步骤如下：

1、右键项目，选择【build path】>【configure build path】

8.2-1

2、在弹出的对话框内，单击【libraries】> 【add external jars】

8.2-2

3、弹出的对话框中，找到Hadoop存放jar包的路径，路径如图所示。

8.2-3

4、当前页面下的文件夹包括MapReduce、hdfs、yarn和common下的所有jar包，选中之后，单击底部的open按钮。需要注意的是，这4个包每个包需要单独打开，单独选中。全部添加完毕之后，单击【apply and close】

任务2：bulkload导入数据
构建BulkLoadJob类

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FsShell;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2;
import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.io.IOException;

public class BulkLoadJob {
    //指定的类BulkLoadJob初始化日志对象，方便在日志输出的时候，可以打印出日志信息所属的类。
        static Logger logger = LoggerFactory.getLogger(BulkLoadJob.class);
//构建map端输入
        public static class BulkLoadMap extends Mapper<LongWritable, Text, ImmutableBytesWritable, KeyValue> {
//map方法
                public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
//对输入数据进行切分
                        String[] valueStrSplit = value.toString().split("\t");
                    //拿到行键
                        String hkey = valueStrSplit[0];
                    //拿到列族
                        String family = valueStrSplit[1].toString().split(":")[0];
                    //拿到列
                        String column = valueStrSplit[1].toString().split(":")[1];
                    //拿到数值
                        String hvalue = valueStrSplit[2];
                    //行键转换成不可变型的字节
                        final byte[] rowKey = Bytes.toBytes(hkey);
                        final ImmutableBytesWritable HKey = new ImmutableBytesWritable(rowKey);
                        //把行键、列族、列和值封装成KV对儿
                    KeyValue kv = new KeyValue(rowKey, Bytes.toBytes(family), Bytes.toBytes(column), Bytes.toBytes(hvalue));
                        //写到磁盘
                    context.write(HKey, kv);
                }
        }

        public static void main(String[] args) throws Exception {
            //配置信息的创建
                Configuration conf = HBaseConfiguration.create();
                conf.set("hbase.zookeeper.property.clientPort", "2181");
                conf.set("hbase.zookeeper.quorum", "localhost");
            //指定数据的输入和输出
                String[] dfsArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
                String inputPath = dfsArgs[0];
                System.out.println("source: " + dfsArgs[0]);
                String outputPath = dfsArgs[1];
                System.out.println("dest: " + dfsArgs[1]);
                HTable hTable = null;
                Job job = Job.getInstance(conf, "Test Import HFile & Bulkload");
                job.setJarByClass(BulkLoadJob.class);
                job.setMapperClass(BulkLoadJob.BulkLoadMap.class);
                job.setMapOutputKeyClass(ImmutableBytesWritable.class);
                job.setMapOutputValueClass(KeyValue.class);
                // 避免测试task
                job.setSpeculativeExecution(false);
                job.setReduceSpeculativeExecution(false);
                // 输入输出端的格式
                job.setInputFormatClass(TextInputFormat.class);
                job.setOutputFormatClass(HFileOutputFormat2.class);

                FileInputFormat.setInputPaths(job, inputPath);
                FileOutputFormat.setOutputPath(job, new Path(outputPath));
            //指定表名
                hTable = new HTable(conf, "ns:t_table");
                HFileOutputFormat2.configureIncrementalLoad(job, hTable);

                if (job.waitForCompletion(true)) {
                        FsShell shell = new FsShell(conf);
                        try {
                                shell.run(new String[] { "-chmod", "-R", "777", dfsArgs[1] });
                        } catch (Exception e) {
                                logger.error("Couldnt change the file permissions ", e);
                                throw new IOException(e);
                        }
                        //数据导入hbase表
                        LoadIncrementalHFiles loader = new LoadIncrementalHFiles(conf);
                        loader.doBulkLoad(new Path(outputPath), hTable);
                } else {
                        logger.error("loading failed.");
                        System.exit(1);
                }
                if (hTable != null) {
                        hTable.close();
                }
        }
}

创建测试表

进入hbase的shell环境，创建测试命名空间和测试表。

bin/hbase shell
create_namespace 'ns'
create 'ns:t_table','cf1','cf2'

8.2-4

数据上传

HDFS上创建目录，把数据bulkdata.csv上传到HDFS。

cat bulkdata.csv
hadoop fs -mkdir -p /data/input
hadoop fs -put bulkdata.csv /data/input

8.2-5

完成了数据和程序之后，就要对程序打包了。

1、选中程序所在包，右键选择【export】

8.2-6

2、弹出的对话框中选择【Java】下的【jar file】，单击next。

8.2-7

3、在弹出的对话框，勾选依赖，指定jar包的输出路径。单击next。

8.2-8

4、本对话框不需要操作

8.2-9

5、在接下来的对话框中，需要指定运行主类。最后单击finish

8.2-10

运行jar包

使用Hadoop运行jar包的命令，执行导入数据操作。

hadoop jar /headless/Desktop/test.jar /data/input/bulkdata.csv /data/output/bulk_out

8.2-11

查看结果

进入shell环境，查看表中是否有数据。

bin/hbase shell
scan 'ns:t_table'

8.2-12

7.3 HBase的WordCount

任务目的
实践hbase的Wordcount
任务清单
任务1：准备工作
任务2：WordCount

任务步骤

任务1：准备工作

测试命名空间和测试表

进入shell环境。创建测试命名空间ns以及测试表src_table和dest_table，两张表都只有一个列族cf。

bin/hbase shell
create_namespace 'ns'
create 'ns:src_table','cf'
create 'ns:dest_table','cf'

测试数据

为测试表插入测试数据。

put 'ns:src_table','1','cf:word','hello'
put 'ns:src_table','2','cf:word','Java'
put 'ns:src_table','3','cf:word','hello'
put 'ns:src_table','4','cf:word','Scala'

8.3-1

任务2：WordCount

程序逻辑

新建wordcount包，在包下新建HbaseWordCount类。

package wordcount;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Mutation;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class HbaseWordCount {
//日志输出的时候，可以打印出日志信息所在类
        static Logger logger = LoggerFactory.getLogger(HbaseWordCount.class);
    //设置服务器端口以及服务器地址
        static Configuration conf = null;
        static {
                conf = HBaseConfiguration.create();
                conf.set("hbase.zookeeper.property.clientPort", "2181");
                conf.set("hbase.zookeeper.quorum", "localhost");
        }

        public static class HBMapper extends TableMapper<Text,IntWritable>{
                private static IntWritable one = new IntWritable(1);
                private static Text word = new Text();
                @Override
                protected void map(ImmutableBytesWritable key, Result value,
                                Mapper<ImmutableBytesWritable, Result, Text, IntWritable>.Context context)
                                throws IOException, InterruptedException {
                        for(Cell cell : value.rawCells()) {
                                word.set(CellUtil.cloneValue(cell));
                                context.write(word, one);
                        }
                }
        }

        public static class HBReducer extends TableReducer<Text,IntWritable,ImmutableBytesWritable>{

                @SuppressWarnings("deprecation")
                @Override
                protected void reduce(Text key, Iterable<IntWritable> values,
                                Reducer<Text, IntWritable, ImmutableBytesWritable, Mutation>.Context context)
                                throws IOException, InterruptedException {
                        int sum = 0;
                        for(IntWritable value : values) {
                                sum += value.get();
                        }
                        //把单词作为行键进行存储
                        Put put = new Put(Bytes.toBytes(key.toString()));
                        //数据存储到hbase表，列族为cf，列为col，值为sum
                        put.add(Bytes.toBytes("cf"),
                                        Bytes.toBytes("col"), 
                                        Bytes.toBytes(String.valueOf(sum)));
                        //写到hbase中的需要指定行键和put
                        context.write(new ImmutableBytesWritable(Bytes.toBytes(key.toString())), put);
                }
        }

        public static void main(String [] args) throws IOException, ClassNotFoundException, InterruptedException {
                @SuppressWarnings("deprecation")
                Job job = new Job(conf,"hbase wordcount");
                Scan scan = new Scan();
            //使用TableMapReduceUtil工具类初始化map，扫描源表中数据执行map操作
                TableMapReduceUtil.initTableMapperJob(
                                "ns:src_table",
                                scan,
                                HBMapper.class,
                                Text.class,
                                IntWritable.class,
                                job);
            //使用TableMapReduceUtil工具类初始化reduce，把reduce之后的结果存储到目标表
                TableMapReduceUtil.initTableReducerJob(
                                "ns:dest_table",
                                HBReducer.class,
                                job);
                job.waitForCompletion(true);
                System.out.println("finished");
        }
}

执行结果

程序完成之后，运行程序。当我们看到“finished”后，进入shell环境查看目标表中数据。

scan 'ns:dest_table'

8.3-2

8.4 HDFS数据导入HBase

任务目的
掌握Hadoop与HBase的集成使用
任务清单
任务1：HDFS数据导入HBase

任务步骤

任务1：HDFS数据导入HBase

上传数据

在HDFS上新建/data/input目录，把hbase目录下的testdata中的csvdata.txt上传到该目录。

hadoop fs -mkdir -p /data/input
hadoop fs -put ./csvdata.txt /data/input

在这里插入图片描述
8.4-1

程序

新建一个hdfsandhbase包，包下新建一个Hdfs2Hbase类。

package hdfsandhbase;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.NamespaceDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class Hdfs2Hbase {

        public static class MyMapper extends Mapper<LongWritable,Text,Text,Text>{
                @Override
                protected void map(LongWritable key, Text value, Mapper<LongWritable, Text, Text, Text>.Context context)
                                throws IOException, InterruptedException {
                        String line = value.toString();
                        String[] lines = line.split(",");
                        for (String s : lines) {
                                context.write(new Text(s), new Text(1+""));
                        }
                }
        }

        public static class MyReduce extends TableReducer<Text,Text,ImmutableBytesWritable>{
                @Override
                protected void reduce(Text key, Iterable<Text> value,Context context)
                                throws IOException, InterruptedException {
                        int counter = 0;
                        for(Text t:value) {
                                counter += Integer.parseInt(t.toString());
                        }
                        //写出到hbase中去
                        Put put = new Put(Bytes.toBytes(key.toString()));
                        put.addColumn("data".getBytes(), "count".getBytes(), (counter+"").getBytes());
                        context.write(new ImmutableBytesWritable(key.getBytes()), put);
                }
        }

        public static void main(String [] args) throws IOException, ClassNotFoundException, InterruptedException {
                Configuration conf = new Configuration();
                conf.set("fs.defaultFS", "hdfs://localhost:9000");
                conf.set("hbase.zookeeper.quorum", "localhost");
                TableName tn = TableName.valueOf("ns:test");
            //对hbase进行操作
                Connection conn = ConnectionFactory.createConnection(conf);
                Admin admin = conn.getAdmin();
            //创建命名空间
                NamespaceDescriptor nsd = NamespaceDescriptor.create("ns").build();
                admin.createNamespace(nsd);
            //创建表
                HTableDescriptor htd = new HTableDescriptor(TableName.valueOf("ns:test"));
                HColumnDescriptor hcd = new HColumnDescriptor("data");
                htd.addFamily(hcd);
            //判断表是否存在
                if(admin.tableExists(tn)) {
                        if(admin.isTableEnabled(tn)) {
                                admin.disableTable(tn);
                        }
                        admin.deleteTable(tn);
                }
                admin.createTable(htd);
            //定义job
                Job job = Job.getInstance(conf,"hdfs2hbase");
                job.setMapperClass(MyMapper.class);
                job.setMapOutputKeyClass(Text.class);
                job.setMapOutputValueClass(Text.class);
            //数据输入路径
                FileInputFormat.addInputPath(job, new Path("/data/input/csvdata.txt"));
            //使用TableMapreduceUtil初始化reduce 
                TableMapReduceUtil.initTableReducerJob(
                                "ns:test",
                                MyReduce.class,
                                job);
                job.waitForCompletion(true);
                System.out.println("finished");
        }
}

查看结果

运行程序没有报错的情况下，进入shell环境，查看test表中数据。

bin/hbase shell
scan 'ns:test'

8.4-2

8.5 HBase数据导入HDFS

任务目的
掌握hbase数据导入到HDFS的程序逻辑
任务清单
任务1：HBase数据导入HDFS

任务步骤

任务1：HBase数据导入HDFS

原始表和数据

进入shell环境，创建测试表“ns:test”，包括一个列族cf和一个列col，并插入两条数据。

create_namespace 'ns'
create 'ns:test','cf'
put 'ns:test','1','cf:col','value1'
put 'ns:test','2','cf:col','value2'

8.5-1

程序

在hdfsandhbase包下新建Hbase2Hdfs类。

package hdfsandhbase;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class Hbase2Hdfs {
        public static class MyMapper extends TableMapper<Text,Text>{
                @Override
                protected void map(ImmutableBytesWritable key, Result value,
                                Mapper<ImmutableBytesWritable, Result, Text, Text>.Context context)
                                throws IOException, InterruptedException {
                        //获取对应的列族和列，设置为utf-8
                        String cfandc = new String(value.getValue("cf".getBytes(),
                                        "col".getBytes()),"utf-8");
                        context.write(new Text(""), new Text(cfandc));
                }
        }

        public static class MyReducer extends Reducer<Text,Text,Text,Text>{

            //实例化Text  用来存储获取到的数据
                private Text result = new Text();
                @Override
                protected void reduce(Text key, Iterable<Text> values, Context context)
                                throws IOException, InterruptedException {
                        for(Text t : values) {
                                result.set(t);
                                context.write(key, result);
                        }

                }
        }

        public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
                //配置相关信息
            Configuration conf = new Configuration();
                conf.set("fs.defaultFS", "hdfs://localhost:9000");
                conf.set("hbase.zookeeper.quorum", "localhost");
            //实例化任务
                Job job = Job.getInstance(conf,"hbase2hdfs");
                //设置运行主类
            job.setJarByClass(Hbase2Hdfs.class);
                Scan scan = new Scan();
                TableMapReduceUtil.initTableMapperJob(
                                "ns:test",
                                scan,
                                MyMapper.class,
                                Text.class,
                                Text.class,
                                job);
                job.setReducerClass(MyReducer.class);
            //设置输出路径
                FileOutputFormat.setOutputPath(job, new Path("/data/output/out"));
                job.waitForCompletion(true);
                System.out.println("finished");
        }
}

执行结果

退出shell环境，查看输出路径下的文件结果。

quit
hadoop fs -cat /data/output/out/part-r-00000

8.5-2
8.5-2

原文地址：https://blog.csdn.net/qq_23934063/article/details/140887612

免责声明：本站文章内容转载自网络资源，如本站内容侵犯了原著者的合法权益，可联系本站删除。更多内容请关注自学内容网（zxcms.com）！

上一篇：力扣每日一题有序数组中的单一元素
下一篇：【LeetCode】每日一题 2024_11_9 设计相邻元素求和服务（构造，哈希）

【软件工程】耦合
耦合性指软件结构中模块相互紧密连接的紧密程度。耦合性由高到低分别为：内容耦合、公共耦合、外部耦合、控制耦合、标记耦合、数据耦合、非直接耦合。
阅读更多2024-11-13
【How AI Works】读书笔记1 前言
开头部分说明:本书没有从复杂的数学上讲解AI其次介绍了AI的发展历史,以1980s为节点,经过中间的几十年,定义出了人工智能的目标:用机器模仿智能行为机器学习的分支学科:深度学习暂且可以认为深度学习,
阅读更多2024-11-13
第八章利用css制作导航菜单
nav>标签是HIML5新增文档结构标签，用于标记导航栏，以便后续与网站的其他内容整合，所以<nav>标签在页面上创建导航栏菜单区域。
阅读更多2024-11-13
PostgreSql - 字符串函数
PostgreSql - 字符串函数
阅读更多2024-11-13
[241108] AMD 开源首批 10 亿参数语言模型：AMD OLMo | Xfce 4.20 Pre1发布
- AMD 开源首批 10 亿参数语言模型：AMD OLMo- Xfce 4.20 Pre1发布
阅读更多2024-11-13
利用滑动窗口解题
本篇文章还是以先介绍题目意思，然后再解释以下算法原理，最后再是给出代码的实现。但是与上篇文章不同的是，这篇文章回给出为什么要用滑动窗口解题，怎么想到的，以及滑动窗口的原理到底是什么。所以前两题会以最详
阅读更多2024-11-13
【OceanBase 诊断调优】—— OceanBase 数据库统计信息被禁用，状态为 broken 的原因和解决方法
因为人为因素导致部分统计信息函数未安装，自动统计信息触发执行长期失败。重新安装统计信息相关函数后，发现仍然无法正常自动统计信息收集，统计信息状态为 broken。
阅读更多2024-11-13
hbase集成phoenix
hbase集成phoenix
阅读更多2024-11-13
./mysqld: error while loading shared libraries: libaio.so.1: cannot open sha
使用离线方式安装：rpm -ivh --nodeps mysql* ，执行 systemctl start mysqld.service发现启动不了，通过vi /var/log/mysql.log看到
阅读更多2024-11-13
Jtti：服务器为什么要做raid？原因是什么？
不同的RAID级别(如RAID 1、RAID 5、RAID 6等)通过将数据复制或分布到多个硬盘上，当其中一个硬盘发生故障时，系统仍然能够从其他硬盘上恢复数据，从而避免数据丢失。RAID 10(镜像+
阅读更多2024-11-13

【HBase分布式数据库】第七章 数据的导入导出 (2-5)

7.2 bulkload导入数据

任务目的

任务清单

任务步骤

任务1：引入外部依赖包

7.3 HBase的WordCount

任务步骤

任务1：准备工作

任务2：WordCount

8.4 HDFS数据导入HBase

任务步骤

任务1：HDFS数据导入HBase

8.5 HBase数据导入HDFS

任务步骤

任务1：HBase数据导入HDFS

相关文章

【HBase分布式数据库】第七章数据的导入导出 (2-5)