Apache Paimon: The Hive Engine Explained
Hive Engine
Paimon currently supports Hive 3.1, 2.3, 2.2, 2.1, and 2.1-cdh-6.3.
1. Execution Engines
When reading data through Hive, Paimon supports both the MR and Tez engines; when writing data through Hive, Paimon supports only the MR engine. Note that if you use Beeline, you need to restart the Hive cluster.
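For reference, the engine is chosen through Hive's standard `hive.execution.engine` setting. A minimal config fragment, run in the Hive CLI:

```sql
-- Select the MR engine (required when writing to Paimon tables)
SET hive.execution.engine=mr;

-- Tez may be used for read-only queries
-- SET hive.execution.engine=tez;
```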
2. Installation

Version | Jar |
---|---|
Hive 3.1 | paimon-hive-connector-3.1-0.7.0-incubating.jar |
Hive 2.3 | paimon-hive-connector-2.3-0.7.0-incubating.jar |
Hive 2.2 | paimon-hive-connector-2.2-0.7.0-incubating.jar |
Hive 2.1 | paimon-hive-connector-2.1-0.7.0-incubating.jar |
Hive 2.1-cdh-6.3 | paimon-hive-connector-2.1-cdh-6.3-0.7.0-incubating.jar |
There are two ways to add the jar to Hive:

- Create an auxlib folder under the Hive root directory and copy paimon-hive-connector-0.7.0-incubating.jar into auxlib.
- Copy the jar to a path accessible to Hive, then run add jar /path/to/paimon-hive-connector-0.7.0-incubating.jar in Hive to enable Paimon support. Note that this method is not recommended: if you run a join statement with the MR engine, it may fail with org.apache.hive.com.esotericsoftware.kryo.KryoException: unable to find class.
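The recommended auxlib approach can be sketched as shell commands. This is an illustration only: it uses temp directories and an empty file as a stand-in for the real Hive root and downloaded jar, so the sketch runs anywhere.

```shell
#!/bin/sh
# Sketch of option 1: copy the connector jar into Hive's auxlib directory.
# HIVE_HOME and DOWNLOAD_DIR are placeholders; substitute your real paths.
HIVE_HOME="$(mktemp -d)"
DOWNLOAD_DIR="$(mktemp -d)"
JAR="paimon-hive-connector-3.1-0.7.0-incubating.jar"

touch "$DOWNLOAD_DIR/$JAR"              # stand-in for the downloaded jar
mkdir -p "$HIVE_HOME/auxlib"            # create auxlib under the Hive root
cp "$DOWNLOAD_DIR/$JAR" "$HIVE_HOME/auxlib/"

ls "$HIVE_HOME/auxlib"                  # lists the installed connector jar
```

After the copy, restart HiveServer2 so the jar is picked up.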
Note:

- If you use HDFS, make sure the environment variable HADOOP_HOME or HADOOP_CONF_DIR is set.
- The Hive CBO optimizer may produce incorrect query results in some cases, for example when querying a struct type with a not null predicate. You can disable the CBO optimizer with the command set hive.cbo.enable=false;.
3. Flink SQL: Using the Paimon Hive Catalog
With the Paimon Hive catalog, you can create, drop, select from, and insert into Paimon tables from Flink. These operations directly affect the corresponding Hive metastore, and tables created this way can also be accessed directly from Hive.
Step 1: Prepare the Flink Hive Connector bundled jar.
Step 2: Execute the following Flink SQL script in the Flink SQL Client to define a Paimon Hive catalog and create a table.
-- Flink SQL CLI
-- Define paimon Hive catalog
CREATE CATALOG my_hive WITH (
'type' = 'paimon',
'metastore' = 'hive',
-- 'uri' = 'thrift://<hive-metastore-host-name>:<port>', default use 'hive.metastore.uris' in HiveConf
-- 'hive-conf-dir' = '...', this is recommended in the kerberos environment
-- 'hadoop-conf-dir' = '...', this is recommended in the kerberos environment
-- 'warehouse' = 'hdfs:///path/to/table/store/warehouse', default use 'hive.metastore.warehouse.dir' in HiveConf
);
-- Use paimon Hive catalog
USE CATALOG my_hive;
-- Create a table in paimon Hive catalog (use "default" database by default)
CREATE TABLE test_table (
a int,
b string
);
-- Insert records into test table
INSERT INTO test_table VALUES (1, 'Table'), (2, 'Store');
-- Read records from test table
SELECT * FROM test_table;
/*
+---+-------+
| a | b |
+---+-------+
| 1 | Table |
| 2 | Store |
+---+-------+
*/
4. Hive SQL: Accessing Paimon Tables Already in the Hive Metastore
Run the following Hive SQL in the Hive CLI to access the table created above.
-- Assume that paimon-hive-connector-<hive-version>-0.7.0-incubating.jar is already in auxlib directory.
-- List tables in Hive
-- (you might need to switch to "default" database if you're not there by default)
SHOW TABLES;
/*
OK
test_table
*/
-- Read records from test_table
SELECT a, b FROM test_table ORDER BY a;
/*
OK
1	Table
2	Store
*/
-- Insert records into test table
-- Note: the Tez engine does not support writing; only the MR engine supports writes.
INSERT INTO test_table VALUES (3, 'Paimon');
SELECT a, b FROM test_table ORDER BY a;
/*
OK
1	Table
2	Store
3	Paimon
*/
-- time travel
SET paimon.scan.snapshot-id=1;
SELECT a, b FROM test_table ORDER BY a;
/*
OK
1	Table
2	Store
*/
SET paimon.scan.snapshot-id=null;
5. Hive SQL: Creating New Paimon Tables
Run the following Hive SQL in the Hive CLI to create a new Paimon table.
-- Assume that paimon-hive-connector-0.7.0-incubating.jar is already in auxlib directory.
-- Let's create a new paimon table.
SET hive.metastore.warehouse.dir=warehouse_path;
CREATE TABLE hive_test_table(
a INT COMMENT 'The a field',
b STRING COMMENT 'The b field'
)
STORED BY 'org.apache.paimon.hive.PaimonStorageHandler';
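The new table can then be written and queried like any other Paimon table. A sketch (the values are illustrative; this requires a running Hive cluster with the connector installed):

```sql
-- Write to and read from the newly created Paimon table
INSERT INTO hive_test_table VALUES (1, 'Hi'), (2, 'Hello');
SELECT * FROM hive_test_table;
```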
6. Hive SQL: Accessing Paimon Tables via External Tables
To register a Paimon table as an external table in Hive, run the following Hive SQL in the Hive CLI.
-- Assume that paimon-hive-connector-0.7.0-incubating.jar is already in auxlib directory.
-- Let's use the test_table created in the above section.
-- To create an external table, you don't need to specify any column or table properties.
-- Pointing the location to the path of table is enough.
CREATE EXTERNAL TABLE external_test_table
STORED BY 'org.apache.paimon.hive.PaimonStorageHandler'
LOCATION '/path/to/table/store/warehouse/default.db/test_table';
-- In addition to setting the location as above, you can also place the location setting in TBLPROPERTIES
-- to avoid Hive accessing Paimon's location through its own file system when creating the table.
-- This method is effective in scenarios using object storage, such as S3.
CREATE EXTERNAL TABLE external_test_table
STORED BY 'org.apache.paimon.hive.PaimonStorageHandler'
TBLPROPERTIES (
'paimon_location' = 's3://xxxxx/path/to/table/store/warehouse/default.db/test_table'
);
-- Read records from external_test_table
SELECT a, b FROM external_test_table ORDER BY a;
/*
OK
1	Table
2	Store
*/
-- Insert records into test table
INSERT INTO external_test_table VALUES (3, 'Paimon');
SELECT a, b FROM external_test_table ORDER BY a;
/*
OK
1	Table
2	Store
3	Paimon
*/
7. Type Mapping Between Hive and Paimon
The table below lists all supported type conversions between Hive and Paimon. All Hive data types can be found in org.apache.hadoop.hive.serde2.typeinfo.
Hive Data Type | Paimon Data Type | Atomic Type |
---|---|---|
StructTypeInfo | RowType | false |
MapTypeInfo | MapType | false |
ListTypeInfo | ArrayType | false |
PrimitiveTypeInfo("boolean") | BooleanType | true |
PrimitiveTypeInfo("tinyint") | TinyIntType | true |
PrimitiveTypeInfo("smallint") | SmallIntType | true |
PrimitiveTypeInfo("int") | IntType | true |
PrimitiveTypeInfo("bigint") | BigIntType | true |
PrimitiveTypeInfo("float") | FloatType | true |
PrimitiveTypeInfo("double") | DoubleType | true |
CharTypeInfo(length) | CharType(length) | true |
PrimitiveTypeInfo("string") | VarCharType(VarCharType.MAX_LENGTH) | true |
VarcharTypeInfo(length) | VarCharType(length), length is less than VarCharType.MAX_LENGTH | true |
PrimitiveTypeInfo("date") | DateType | true |
PrimitiveTypeInfo("timestamp") | TimestampType | true |
DecimalTypeInfo(precision, scale) | DecimalType(precision, scale) | true |
PrimitiveTypeInfo("binary") | VarBinaryType, BinaryType | true |
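As a toy illustration of the atomic mappings in the table above (this is not Paimon's actual TypeInfo converter, just a lookup built from the table for reference):

```python
# Toy lookup of the atomic Hive -> Paimon type mappings from the table above.
# Illustration only; Paimon's real conversion lives in its Hive connector.
HIVE_TO_PAIMON = {
    "boolean": "BooleanType",
    "tinyint": "TinyIntType",
    "smallint": "SmallIntType",
    "int": "IntType",
    "bigint": "BigIntType",
    "float": "FloatType",
    "double": "DoubleType",
    "string": "VarCharType(VarCharType.MAX_LENGTH)",
    "date": "DateType",
    "timestamp": "TimestampType",
}

def to_paimon(hive_type: str) -> str:
    """Map a primitive Hive type name to its Paimon data type name."""
    return HIVE_TO_PAIMON[hive_type.lower()]

print(to_paimon("string"))  # VarCharType(VarCharType.MAX_LENGTH)
```

Note that Hive's plain string maps to a max-length VarCharType, so round-tripping a schema through Hive does not preserve an unbounded string type.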
Original article: https://blog.csdn.net/m0_50186249/article/details/136389540