Notes on repairing a whole table with msck in Hive

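With partitioned tables, data and metadata easily get out of sync during a Hive migration or in an architecture that separates storage from compute: the data is already present in the underlying storage, but the matching metadata has not yet appeared in HMS (the Hive Metastore). You can add the partitions one by one, or use: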

msck repair table TableName

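This repairs the whole table in one pass. There are a few things to watch when using msck. The first is memory: msck runs in CliDriver, so the CliDriver heap needs to be raised. To do that, add the following to hive-env.sh: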

export HADOOP_HEAPSIZE=4096

One detail matters here. hive-env.sh is structured like this:

if [ "$SERVICE" = "metastore" ]; then
  XXXX
fi

if [ "$SERVICE" = "service" ]; then
  XXXX
fi

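In this layout, if you start Hive directly with the hive command, $SERVICE is client or cli, so a setting placed inside those if blocks does not take effect. Put it before the if blocks so that it applies to every service.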
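As a rough sketch of that layout (the heap value is the one used above; the empty per-service bodies are placeholders, not settings from a real cluster), the export sits ahead of every service check:

```shell
# Sketch of a hive-env.sh layout. The ":" bodies are placeholders (assumption):
# a real file would hold service-specific settings there.

# Set the heap before any per-service block, so it also applies when
# SERVICE is "cli" (the plain `hive` command).
export HADOOP_HEAPSIZE=4096

if [ "${SERVICE:-}" = "metastore" ]; then
  :  # metastore-only settings would go here
fi

if [ "${SERVICE:-}" = "service" ]; then
  :  # settings for this service would go here
fi
```

The `${SERVICE:-}` form is only there so the fragment also runs when SERVICE is unset; a plain `$SERVICE`, as in the original file, behaves the same once the hive launcher has set it.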

msck is implemented in src/main/java/org/apache/hadoop/hive/metastore/Msck.java:

int batchSize = MetastoreConf.getIntVar(getConf(), MetastoreConf.ConfVars.MSCK_REPAIR_BATCH_SIZE);
 if (batchSize == 0) {
   //batching is not enabled. Try to add all the partitions in one call
   batchSize = partsNotInMs.size();
 }

In other words, when repairing partitions, msck updates the backing database in batches. The batch size is defined as:

MSCK_REPAIR_BATCH_SIZE("msck.repair.batch.size",
      "hive.msck.repair.batch.size", 3000,
      "Batch size for the msck repair command. If the value is greater than zero,\n "
        + "it will execute batch wise with the configured batch size. In case of errors while\n"
        + "adding unknown partitions the batch size is automatically reduced by half in the subsequent\n"
        + "retry attempt. The default value is 3000 which means it will execute in the batches of 3000."),

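The default is 3000. To change it, set hive.msck.repair.batch.size in hive-site. If a batch is too large or the backing database is underprovisioned, a single batch can run long enough to hit a database connection timeout, so lower the batch size in that case. Alternatively, increase the Thrift connection timeout.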
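To get a feel for what a given batch size means, here is a small sketch that computes how many batches a repair would need. The function name batch_count and the partition counts are made up for illustration; the only behavior taken from the Hive snippet is that a batch size of 0 means "everything in one call".

```shell
# batch_count TOTAL BATCH_SIZE
# Prints how many batches are needed for TOTAL missing partitions.
# A BATCH_SIZE of 0 is treated as "one call for everything",
# mirroring the batchSize == 0 branch in the Msck.java snippet.
batch_count() {
  total=$1
  size=$2
  if [ "$total" -eq 0 ]; then
    echo 0
    return
  fi
  if [ "$size" -eq 0 ]; then
    size=$total
  fi
  # Integer ceiling division: (total + size - 1) / size
  echo $(( (total + size - 1) / size ))
}

batch_count 10000 3000   # prints 4
batch_count 10000 0      # prints 1
```

This only covers the happy path. Per the config description, Hive also halves the batch size on a failed attempt, which the sketch does not model.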

To do that, adjust metastore.client.socket.timeout, which defaults to 600 seconds (10 minutes).
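For a one-off repair, both values can in principle be passed on the command line with --hiveconf instead of editing hive-site. The sketch below only prints such a command and runs nothing. The helper name build_msck_cmd, the table name db.tbl and the values 1000 and 1800 are illustrative assumptions, and whether a per-session override is honored depends on your Hive version and on where msck runs, so treat hive-site as the reliable route.

```shell
# build_msck_cmd TABLE BATCH_SIZE TIMEOUT_SECONDS
# Prints (does not execute) a hive CLI invocation that overrides the
# msck batch size and the metastore client socket timeout for one session.
build_msck_cmd() {
  printf 'hive --hiveconf hive.msck.repair.batch.size=%s --hiveconf hive.metastore.client.socket.timeout=%s -e "MSCK REPAIR TABLE %s"\n' \
    "$2" "$3" "$1"
}

# db.tbl, 1000 and 1800 are hypothetical values.
build_msck_cmd db.tbl 1000 1800
```

Printing the command first makes it easy to review before pasting it into a session on the Hive host.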

jstat -gcutil PID 500 500

During the repair, the command above lets you watch how full GC activity changes.
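If you only care about the full GC count, a small filter can pull the FGC column out of the jstat output. This is a sketch: fgc_column is a made-up helper name, and it finds the column by its header rather than by position, because the set of columns varies between JDK versions.

```shell
# fgc_column: reads `jstat -gcutil` output on stdin and prints the FGC
# (full GC count) value from every sample row.
fgc_column() {
  awk '
    # Header row: remember which field is labelled FGC.
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == "FGC") col = i; next }
    # Sample rows: print that field, if the header had one.
    col { print $col }
  '
}

# Usage (PID is the CliDriver process id, as in the command above):
#   jstat -gcutil PID 500 500 | fgc_column
```

A count that keeps climbing from sample to sample suggests the heap is too small for the repair.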

If msck is slow, run jstack against the Hive Metastore process ID to see what it is doing. You can also set hive.metastore.transactional.event.listeners to empty, which turns off the notification step.

hive.metastore.event.listeners can be cleared as well.

Concretely: in hive-site and Advanced hivemetastore-site, remove the value org.apache.hive.hcatalog.listener.DbNotificationListener from hive.metastore.transactional.event.listeners.

If hive.metastore.event.listeners has a value, remove that too.

DbNotificationListener is only needed when you use REPL commands; if you do not, it can be removed.

DbNotificationListener is fine in itself, but in some cases a bug causes deadlocks in the backing database. See the issue: https://issues.apache.org/jira/browse/HIVE-24363
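Before and after changing the configuration, it can help to check whether a given config file still mentions the listener. A minimal sketch follows: has_db_notification_listener is a made-up helper, and the path in the usage comment is only a typical location, not something taken from this setup.

```shell
# has_db_notification_listener FILE
# Succeeds (exit status 0) if FILE mentions DbNotificationListener.
# -F matches the class name literally, so the dots are not regex wildcards.
has_db_notification_listener() {
  grep -qF 'org.apache.hive.hcatalog.listener.DbNotificationListener' "$1"
}

# Usage (the path is an assumption; adjust to your installation):
#   if has_db_notification_listener /etc/hive/conf/hive-site.xml; then
#     echo "DbNotificationListener is still configured"
#   fi
```

This only checks the file on disk. A running metastore keeps its old configuration until it is restarted.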

