
Apache Paimon Hive Engine Explained

Hive Engine

Paimon currently supports Hive versions 3.1, 2.3, 2.2, 2.1, and 2.1-cdh-6.3.

1. Execution Engines

When reading data through Hive, Paimon supports both the MR and Tez engines; when writing data through Hive, Paimon supports only the MR engine. If you use Beeline, you need to restart the Hive cluster.
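In practice this means that before writing to a Paimon table you may need to switch the session back to MR. The sketch below uses the standard Hive property hive.execution.engine; it only changes the current session:

```sql
-- Show the engine the current session is using
SET hive.execution.engine;

-- Reads work on MR or Tez; writes to Paimon tables require MR
SET hive.execution.engine=mr;
```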

2. Installation

Version            Jar
Hive 3.1           paimon-hive-connector-3.1-0.7.0-incubating.jar
Hive 2.3           paimon-hive-connector-2.3-0.7.0-incubating.jar
Hive 2.2           paimon-hive-connector-2.2-0.7.0-incubating.jar
Hive 2.1           paimon-hive-connector-2.1-0.7.0-incubating.jar
Hive 2.1-cdh-6.3   paimon-hive-connector-2.1-cdh-6.3-0.7.0-incubating.jar

There are two ways to add the jar to Hive:

  • Create an auxlib folder under the Hive root directory and copy paimon-hive-connector-0.7.0-incubating.jar into auxlib.
  • Copy the jar to a path accessible to Hive, then run add jar /path/to/paimon-hive-connector-0.7.0-incubating.jar in Hive to enable Paimon support. Note that this method is not recommended: if you run a Join statement on the MR engine, you may hit the error org.apache.hive.com.esotericsoftware.kryo.KryoException: unable to find class.
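The second method runs directly in a Hive session; LIST JARS (a standard Hive CLI command) confirms the jar was registered. The path below is a placeholder, not a real install location:

```sql
-- Register the Paimon connector for the current session only
ADD JAR /path/to/paimon-hive-connector-3.1-0.7.0-incubating.jar;

-- Verify that the jar is visible to this session
LIST JARS;
```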

Note

  • If you use HDFS, make sure the environment variables HADOOP_HOME and HADOOP_CONF_DIR are set.
  • The Hive CBO optimizer may produce incorrect query results, for example when querying a struct type with a not null predicate. You can disable it with set hive.cbo.enable=false;.
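As a concrete session-level workaround for the CBO issue above (the table and struct column names here are illustrative, not from the original):

```sql
-- Disable the CBO optimizer for this session before filtering on struct columns
SET hive.cbo.enable=false;

SELECT * FROM some_table WHERE nested_struct_col IS NOT NULL;
```

The setting only affects the current session, so other workloads keep CBO enabled.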
3. Flink SQL: Using the Paimon Hive Catalog

With the Paimon Hive catalog, you can create, drop, select from, and insert into Paimon tables from Flink. These operations directly affect the corresponding Hive metastore, and tables created this way can be accessed directly from Hive.

Step 1

Prepare the Flink Hive Connector Bundled Jar.

Step 2

Execute the following Flink SQL script in the Flink SQL Client to define a Paimon Hive catalog and create a table.

-- Flink SQL CLI
-- Define paimon Hive catalog

CREATE CATALOG my_hive WITH (
  'type' = 'paimon',
  'metastore' = 'hive',
  -- 'uri' = 'thrift://<hive-metastore-host-name>:<port>', default use 'hive.metastore.uris' in HiveConf
  -- 'hive-conf-dir' = '...', this is recommended in the kerberos environment
  -- 'hadoop-conf-dir' = '...', this is recommended in the kerberos environment
  -- 'warehouse' = 'hdfs:///path/to/table/store/warehouse', default use 'hive.metastore.warehouse.dir' in HiveConf
);

-- Use paimon Hive catalog

USE CATALOG my_hive;

-- Create a table in paimon Hive catalog (use "default" database by default)

CREATE TABLE test_table (
  a int,
  b string
);

-- Insert records into test table

INSERT INTO test_table VALUES (1, 'Table'), (2, 'Store');

-- Read records from test table

SELECT * FROM test_table;

/*
+---+-------+
| a |     b |
+---+-------+
| 1 | Table |
| 2 | Store |
+---+-------+
*/
4. Hive SQL: Accessing Paimon Tables Already in the Hive Metastore

Run the following Hive SQL in the Hive CLI to access the table created above.

-- Assume that paimon-hive-connector-<hive-version>-0.7.0-incubating.jar is already in auxlib directory.
-- List tables in Hive
-- (you might need to switch to "default" database if you're not there by default)

SHOW TABLES;

/*
OK
test_table
*/

-- Read records from test_table

SELECT a, b FROM test_table ORDER BY a;

/*
OK
1    Table
2    Store
*/

-- Insert records into test table
-- Note: the Tez engine does not support Hive writes; only the MR engine is supported.

INSERT INTO test_table VALUES (3, 'Paimon');

SELECT a, b FROM test_table ORDER BY a;

/*
OK
1    Table
2    Store
3    Paimon
*/

-- time travel

SET paimon.scan.snapshot-id=1;
SELECT a, b FROM test_table ORDER BY a;
/*
OK
1    Table
2    Store
*/
SET paimon.scan.snapshot-id=null;
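Besides pinning a snapshot id, Paimon tables also support timestamp-based time travel through the scan.timestamp-millis table option; that the Hive connector exposes it via a paimon.-prefixed session variable, in the same way as paimon.scan.snapshot-id above, is an assumption worth checking against your connector version's documentation:

```sql
-- Read the latest snapshot committed at or before the given epoch-millis timestamp
-- (paimon.scan.timestamp-millis is assumed to map to the scan.timestamp-millis option)
SET paimon.scan.timestamp-millis=1708585026000;
SELECT a, b FROM test_table ORDER BY a;
SET paimon.scan.timestamp-millis=null;
```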
5. Hive SQL: Creating New Paimon Tables

Run the following Hive SQL in the Hive CLI to create a new Paimon table.

-- Assume that paimon-hive-connector-0.7.0-incubating.jar is already in auxlib directory.
-- Let's create a new paimon table.

SET hive.metastore.warehouse.dir=warehouse_path;

CREATE TABLE hive_test_table(
    a INT COMMENT 'The a field',
    b STRING COMMENT 'The b field'
)
STORED BY 'org.apache.paimon.hive.PaimonStorageHandler';
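Paimon table options can also be passed through TBLPROPERTIES at creation time. The sketch below assumes the connector accepts the standard Paimon 'primary-key' option, as described in the Paimon documentation; the table name is illustrative:

```sql
-- Create a Paimon table with a primary key on column a
CREATE TABLE hive_pk_table (
    a INT COMMENT 'The a field',
    b STRING COMMENT 'The b field'
)
STORED BY 'org.apache.paimon.hive.PaimonStorageHandler'
TBLPROPERTIES (
    'primary-key' = 'a'
);
```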
6. Hive SQL: Accessing Paimon Tables via External Tables

To register a Paimon table as an external table in Hive, run the following Hive SQL in the Hive CLI.

-- Assume that paimon-hive-connector-0.7.0-incubating.jar is already in auxlib directory.
-- Let's use the test_table created in the above section.
-- To create an external table, you don't need to specify any column or table properties.
-- Pointing the location to the path of table is enough.

CREATE EXTERNAL TABLE external_test_table
STORED BY 'org.apache.paimon.hive.PaimonStorageHandler'
LOCATION '/path/to/table/store/warehouse/default.db/test_table';
    
-- In addition to setting the location as above, you can also place the location setting in TBLPROPERTIES
-- to avoid Hive accessing Paimon's location through its own file system when creating the table.
-- This method is useful in scenarios that use object storage, such as S3.

CREATE EXTERNAL TABLE external_test_table
STORED BY 'org.apache.paimon.hive.PaimonStorageHandler'
TBLPROPERTIES (
 'paimon_location' ='s3://xxxxx/path/to/table/store/warehouse/default.db/test_table'
);

-- Read records from external_test_table

SELECT a, b FROM external_test_table ORDER BY a;

/*
OK
1    Table
2    Store
*/

-- Insert records into test table

INSERT INTO external_test_table VALUES (3, 'Paimon');

SELECT a, b FROM external_test_table ORDER BY a;

/*
OK
1    Table
2    Store
3    Paimon
*/
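Because external_test_table is an external table, dropping it in Hive removes only the Hive-side metadata; under standard Hive external-table semantics, the underlying Paimon data files at the LOCATION are left untouched:

```sql
-- Remove only the Hive registration; the Paimon table data remains on storage
DROP TABLE external_test_table;
```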
7. Hive and Paimon Type Mapping

The table below lists all supported type conversions between Hive and Paimon. All of Hive's data types can be found in org.apache.hadoop.hive.serde2.typeinfo.

Hive Data Type                       Paimon Data Type                                        Atomic Type
StructTypeInfo                       RowType                                                 false
MapTypeInfo                          MapType                                                 false
ListTypeInfo                         ArrayType                                               false
PrimitiveTypeInfo("boolean")         BooleanType                                             true
PrimitiveTypeInfo("tinyint")         TinyIntType                                             true
PrimitiveTypeInfo("smallint")        SmallIntType                                            true
PrimitiveTypeInfo("int")             IntType                                                 true
PrimitiveTypeInfo("bigint")          BigIntType                                              true
PrimitiveTypeInfo("float")           FloatType                                               true
PrimitiveTypeInfo("double")          DoubleType                                              true
CharTypeInfo(length)                 CharType(length)                                        true
PrimitiveTypeInfo("string")          VarCharType(VarCharType.MAX_LENGTH)                     true
VarcharTypeInfo(length)              VarCharType(length), length less than VarCharType.MAX_LENGTH   true
PrimitiveTypeInfo("date")            DateType                                                true
PrimitiveTypeInfo("timestamp")       TimestampType                                           true
DecimalTypeInfo(precision, scale)    DecimalType(precision, scale)                           true
PrimitiveTypeInfo("binary")          VarBinaryType, BinaryType                               true

Original article: https://blog.csdn.net/m0_50186249/article/details/136389540
