Apache Paimon: The Hive Engine Explained
Hive Engine
Paimon currently supports Hive 3.1, 2.3, 2.2, 2.1, and 2.1-cdh-6.3.
1. Execution Engines
When reading data through Hive, Paimon supports both the MR and Tez engines; when writing data through Hive, Paimon supports only the MR engine. Note that if you use Beeline, you need to restart the Hive cluster.
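For reference, the engine is chosen through Hive's standard `hive.execution.engine` setting. A minimal config fragment, run in the Hive CLI:

```sql
-- Select the MR engine (required when writing to Paimon tables)
SET hive.execution.engine=mr;

-- Tez may be used for read-only queries
-- SET hive.execution.engine=tez;
```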
2. Installation

Version | Jar |
---|---|
Hive 3.1 | paimon-hive-connector-3.1-0.7.0-incubating.jar |
Hive 2.3 | paimon-hive-connector-2.3-0.7.0-incubating.jar |
Hive 2.2 | paimon-hive-connector-2.2-0.7.0-incubating.jar |
Hive 2.1 | paimon-hive-connector-2.1-0.7.0-incubating.jar |
Hive 2.1-cdh-6.3 | paimon-hive-connector-2.1-cdh-6.3-0.7.0-incubating.jar |
There are two ways to add the jar to Hive:

- Create an auxlib folder under the Hive root directory and copy paimon-hive-connector-0.7.0-incubating.jar into auxlib.
- Copy the jar to a path accessible to Hive, then run add jar /path/to/paimon-hive-connector-0.7.0-incubating.jar in Hive to enable Paimon support. Note that this method is not recommended: if you run a join statement with the MR engine, it may fail with org.apache.hive.com.esotericsoftware.kryo.KryoException: unable to find class.
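The recommended auxlib approach can be sketched as shell commands. This is an illustration only: it uses temp directories and an empty file as a stand-in for the real Hive root and downloaded jar, so the sketch runs anywhere.

```shell
#!/bin/sh
# Sketch of option 1: copy the connector jar into Hive's auxlib directory.
# HIVE_HOME and DOWNLOAD_DIR are placeholders; substitute your real paths.
HIVE_HOME="$(mktemp -d)"
DOWNLOAD_DIR="$(mktemp -d)"
JAR="paimon-hive-connector-3.1-0.7.0-incubating.jar"

touch "$DOWNLOAD_DIR/$JAR"              # stand-in for the downloaded jar
mkdir -p "$HIVE_HOME/auxlib"            # create auxlib under the Hive root
cp "$DOWNLOAD_DIR/$JAR" "$HIVE_HOME/auxlib/"

ls "$HIVE_HOME/auxlib"                  # lists the installed connector jar
```

After the copy, restart HiveServer2 so the jar is picked up.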
Note:

- If you use HDFS, make sure the environment variable HADOOP_HOME or HADOOP_CONF_DIR is set.
- The Hive CBO optimizer may produce incorrect query results in some cases, for example when querying a struct type with a not null predicate. You can disable the CBO optimizer with the command set hive.cbo.enable=false;.
3. Flink SQL: Using the Paimon Hive Catalog
With the Paimon Hive catalog, you can create, drop, select from, and insert into Paimon tables from Flink. These operations directly affect the corresponding Hive metastore, and tables created this way can also be accessed directly from Hive.
Step 1: Prepare the Flink Hive Connector bundled jar.
Step 2: Execute the following Flink SQL script in the Flink SQL Client to define a Paimon Hive catalog and create a table.
-- Flink SQL CLI
-- Define paimon Hive catalog
CREATE CATALOG my_hive WITH (
'type' = 'paimon',
'metastore' = 'hive',
-- 'uri' = 'thrift://<hive-metastore-host-name>:<port>', default use 'hive.metastore.uris' in HiveConf
-- 'hive-conf-dir' = '...', this is recommended in the kerberos environment
-- 'hadoop-conf-dir' = '...', this is recommended in the kerberos environment
-- 'warehouse' = 'hdfs:///path/to/table/store/warehouse', default use 'hive.metastore.warehouse.dir' in HiveConf
);
-- Use paimon Hive catalog
USE CATALOG my_hive;
-- Create a table in paimon Hive catalog (use "default" database by default)
CREATE TABLE test_table (
a int,
b string
);
-- Insert records into test table
INSERT INTO test_table VALUES (1, 'Table'), (2, 'Store');
-- Read records from test table
SELECT * FROM test_table;
/*
+---+-------+
| a | b |
+---+-------+
| 1 | Table |
| 2 | Store |
+---+-------+
*/
4. Hive SQL: Accessing Paimon Tables Already in the Hive Metastore
Run the following Hive SQL in the Hive CLI to access the table created above.
-- Assume that paimon-hive-connector-<hive-version>-0.7.0-incubating.jar is already in auxlib directory.
-- List tables in Hive
-- (you might need to switch to "default" database if you're not there by default)
SHOW TABLES;
/*
OK
test_table
*/
-- Read records from test_table
SELECT a, b FROM test_table ORDER BY a;
/*
OK
1	Table
2	Store
*/
-- Insert records into test table
-- Note: the Tez engine does not support writing; only the MR engine supports writes.
INSERT INTO test_table VALUES (3, 'Paimon');
SELECT a, b FROM test_table ORDER BY a;
/*
OK
1	Table
2	Store
3	Paimon
*/
-- time travel
SET paimon.scan.snapshot-id=1;
SELECT a, b FROM test_table ORDER BY a;
/*
OK
1	Table
2	Store
*/
SET paimon.scan.snapshot-id=null;
5. Hive SQL: Creating New Paimon Tables
Run the following Hive SQL in the Hive CLI to create a new Paimon table.
-- Assume that paimon-hive-connector-0.7.0-incubating.jar is already in auxlib directory.
-- Let's create a new paimon table.
SET hive.metastore.warehouse.dir=warehouse_path;
CREATE TABLE hive_test_table(
a INT COMMENT 'The a field',
b STRING COMMENT 'The b field'
)
STORED BY 'org.apache.paimon.hive.PaimonStorageHandler';
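The new table can then be written and queried like any other Paimon table. A sketch (the values are illustrative; this requires a running Hive cluster with the connector installed):

```sql
-- Write to and read from the newly created Paimon table
INSERT INTO hive_test_table VALUES (1, 'Hi'), (2, 'Hello');
SELECT * FROM hive_test_table;
```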
6. Hive SQL: Accessing Paimon Tables via External Tables
To register a Paimon table as an external table in Hive, run the following Hive SQL in the Hive CLI.
-- Assume that paimon-hive-connector-0.7.0-incubating.jar is already in auxlib directory.
-- Let's use the test_table created in the above section.
-- To create an external table, you don't need to specify any column or table properties.
-- Pointing the location to the path of table is enough.
CREATE EXTERNAL TABLE external_test_table
STORED BY 'org.apache.paimon.hive.PaimonStorageHandler'
LOCATION '/path/to/table/store/warehouse/default.db/test_table';
-- In addition to setting the location as above, you can also place the location setting in TBLPROPERTIES
-- to avoid Hive accessing Paimon's location through its own file system when creating the table.
-- This method is effective in scenarios using object storage, such as S3.
CREATE EXTERNAL TABLE external_test_table
STORED BY 'org.apache.paimon.hive.PaimonStorageHandler'
TBLPROPERTIES (
'paimon_location' = 's3://xxxxx/path/to/table/store/warehouse/default.db/test_table'
);
-- Read records from external_test_table
SELECT a, b FROM external_test_table ORDER BY a;
/*
OK
1	Table
2	Store
*/
-- Insert records into test table
INSERT INTO external_test_table VALUES (3, 'Paimon');
SELECT a, b FROM external_test_table ORDER BY a;
/*
OK
1	Table
2	Store
3	Paimon
*/
7. Type Mapping Between Hive and Paimon
The table below lists all supported type conversions between Hive and Paimon. All Hive data types can be found in org.apache.hadoop.hive.serde2.typeinfo.
Hive Data Type | Paimon Data Type | Atomic Type |
---|---|---|
StructTypeInfo | RowType | false |
MapTypeInfo | MapType | false |
ListTypeInfo | ArrayType | false |
PrimitiveTypeInfo("boolean") | BooleanType | true |
PrimitiveTypeInfo("tinyint") | TinyIntType | true |
PrimitiveTypeInfo("smallint") | SmallIntType | true |
PrimitiveTypeInfo("int") | IntType | true |
PrimitiveTypeInfo("bigint") | BigIntType | true |
PrimitiveTypeInfo("float") | FloatType | true |
PrimitiveTypeInfo("double") | DoubleType | true |
CharTypeInfo(length) | CharType(length) | true |
PrimitiveTypeInfo("string") | VarCharType(VarCharType.MAX_LENGTH) | true |
VarcharTypeInfo(length) | VarCharType(length), length is less than VarCharType.MAX_LENGTH | true |
PrimitiveTypeInfo("date") | DateType | true |
PrimitiveTypeInfo("timestamp") | TimestampType | true |
DecimalTypeInfo(precision, scale) | DecimalType(precision, scale) | true |
PrimitiveTypeInfo("binary") | VarBinaryType, BinaryType | true |
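As a toy illustration of the atomic mappings in the table above (this is not Paimon's actual TypeInfo converter, just a lookup built from the table for reference):

```python
# Toy lookup of the atomic Hive -> Paimon type mappings from the table above.
# Illustration only; Paimon's real conversion lives in its Hive connector.
HIVE_TO_PAIMON = {
    "boolean": "BooleanType",
    "tinyint": "TinyIntType",
    "smallint": "SmallIntType",
    "int": "IntType",
    "bigint": "BigIntType",
    "float": "FloatType",
    "double": "DoubleType",
    "string": "VarCharType(VarCharType.MAX_LENGTH)",
    "date": "DateType",
    "timestamp": "TimestampType",
}

def to_paimon(hive_type: str) -> str:
    """Map a primitive Hive type name to its Paimon data type name."""
    return HIVE_TO_PAIMON[hive_type.lower()]

print(to_paimon("string"))  # VarCharType(VarCharType.MAX_LENGTH)
```

Note that Hive's plain string maps to a max-length VarCharType, so round-tripping a schema through Hive does not preserve an unbounded string type.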
Original article: https://blog.csdn.net/m0_50186249/article/details/136389540