When running SQL jobs with the Tez engine in a TPC-DS scenario, the following problem appears:

[error stack trace omitted]

It looks like a snappy problem, but the table is stored as Parquet and does not use snappy at all. Further digging shows that on Tez, any SQL involving an aggregate function hits the same error, for example:

CREATE TABLE `test_hive02`(
  `id` int COMMENT 'id',
  `name` string COMMENT 'name',
  `age` int COMMENT 'age',
  `create_time` string COMMENT 'create_time')
PARTITIONED BY (
  `date` string COMMENT 'date',
  `hour` string COMMENT 'hour')
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';
insert into test_hive02 partition(`date`='2020-01-01', hour=12) values(1, 'tom', 12, '123');
select sum(id) from test_hive02;

Running a checknative check produces the following:

root@xxxx:# hadoop checknative -a
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
log4j:WARN custom level class [Relative to Yarn Log Dir Prefix] not found.
SLF4J: Actual binding is of type [org.slf4j.impl.Reload4jLoggerFactory]
22/08/30 11:34:06 INFO bzip2.Bzip2Factory: Successfully loaded & initialized native-bzip2 library system-native
22/08/30 11:34:06 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
22/08/30 11:34:06 ERROR snappy.SnappyCompressor: failed to load SnappyCompressor
java.lang.UnsatisfiedLinkError: Cannot load libsnappy.so.1 (libsnappy.so.1: cannot open shared object file: No such file or directory)!
at org.apache.hadoop.io.compress.snappy.SnappyCompressor.initIDs(Native Method)
at org.apache.hadoop.io.compress.snappy.SnappyCompressor.<clinit>(SnappyCompressor.java:57)
at org.apache.hadoop.io.compress.SnappyCodec.isNativeCodeLoaded(SnappyCodec.java:82)
at org.apache.hadoop.util.NativeLibraryChecker.main(NativeLibraryChecker.java:91)
Native library checking:
hadoop: true /opt/emr/2.0.0/hadoop-2.10.2/lib/native/libhadoop.so
zlib: true /lib/x86_64-linux-gnu/libz.so.1
snappy: false
zstd : true /lib/x86_64-linux-gnu/libzstd.so.1
lz4: true revision:10301
bzip2: true /lib/x86_64-linux-gnu/libbz2.so.1
openssl: true /usr/local/lib/libcrypto.so
22/08/30 11:34:06 INFO util.ExitUtil: Exiting with status 1: ExitException

As you can see, snappy is not found. Hadoop's core-site file contains the following configuration:

[core-site.xml configuration omitted]
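The original configuration is not reproduced here, but a typical core-site.xml entry that declares the snappy codec (and therefore expects the native library to be present at runtime) looks roughly like this:

<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.SnappyCodec,org.apache.hadoop.io.compress.Lz4Codec</value>
</property>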

The root cause is that snappy support was required when Hadoop was built, for example:

mvn package -Pdist,native -DskipTests -Dtar -Drequire.snappy

So there are two ways to fix this: either do not require snappy at build time, in which case it is simply not enabled, or install the OS-level snappy library after the build.
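For the first option, just drop the snappy flag from the build command above, e.g.:

mvn package -Pdist,native -DskipTests -Dtar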

For how to install snappy on the OS, refer to this article: https://www.jianshu.com/p/554e033bfa65
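The library paths in the checknative output above point to a Debian/Ubuntu style layout, so a minimal sketch of the second option might look like the following (package names are assumptions, adjust for your distribution):

# Debian/Ubuntu: this package ships libsnappy.so.1
apt-get install -y libsnappy1v5
# CentOS/RHEL equivalent:
# yum install -y snappy
# then verify that the native library is picked up
hadoop checknative -a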

