I took the opportunity over the past couple of days to upgrade my Hadoop cluster, but running TestDFSIO:

hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.3.4-tests.jar TestDFSIO -write -nrFiles 1 -size 1GB -resFile /tmp/DFSIO-write.out

kept failing with the same error:

Stackmap Table:
same_frame(@19)
same_frame(@31)
same_frame(@40)

at org.apache.hadoop.mapreduce.v2.proto.MRProtos$JobIdProto.newBuilder(MRProtos.java:1017)
at org.apache.hadoop.mapreduce.v2.api.records.impl.pb.JobIdPBImpl.<init>(JobIdPBImpl.java:37)
... 15 more
2023-03-10 16:38:45,412 ERROR [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.lang.reflect.InvocationTargetException
at org.apache.hadoop.yarn.factories.impl.pb.RecordFactoryPBImpl.newRecordInstance(RecordFactoryPBImpl.java:76)
at org.apache.hadoop.yarn.util.Records.newRecord(Records.java:36)
at org.apache.hadoop.mapreduce.v2.util.MRBuilderUtils.newJobId(MRBuilderUtils.java:39)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceInit(MRAppMaster.java:299)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$6.run(MRAppMaster.java:1760)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1757)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1691)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.yarn.factories.impl.pb.RecordFactoryPBImpl.newRecordInstance(RecordFactoryPBImpl.java:73)
... 10 more
Caused by: java.lang.VerifyError: Bad type on operand stack
Exception Details:
Location:
org/apache/hadoop/mapreduce/v2/proto/MRProtos$JobIdProto$Builder.setAppId(Lorg/apache/hadoop/yarn/proto/YarnProtos$ApplicationIdProto;)Lorg/apache/hadoop/mapreduce/v2/proto/MRProtos$JobIdProto$Builder; @36: invokevirtual
Reason:
Type 'org/apache/hadoop/yarn/proto/YarnProtos$ApplicationIdProto' (current frame, stack[1]) is not assignable to 'com/google/protobuf/GeneratedMessage'
Current Frame:
bci: @36
flags: { }
locals: { 'org/apache/hadoop/mapreduce/v2/proto/MRProtos$JobIdProto$Builder', 'org/apache/hadoop/yarn/proto/YarnProtos$ApplicationIdProto' }
stack: { 'com/google/protobuf/SingleFieldBuilder', 'org/apache/hadoop/yarn/proto/YarnProtos$ApplicationIdProto' }
Bytecode:
0x0000000: 2ab4 0011 c700 1b2b c700 0bbb 002f 59b7
0x0000010: 0030 bf2a 2bb5 000a 2ab6 0031 a700 0c2a
0x0000020: b400 112b b600 3257 2a59 b400 1304 80b5
0x0000030: 0013 2ab0
Stackmap Table:
same_frame(@19)
same_frame(@31)
same_frame(@40)

This looks like a protobuf compatibility problem. I read through the corresponding MR AppMaster code and could not find anything wrong there; it then turned out that I had earlier put the following into mapred-site.xml:

<property>
  <name>yarn.app.mapreduce.am.env</name>
  <value>HADOOP_MAPRED_HOME=/workspace/fcbai/enhanced-sts/hadoop-3.3.4</value>
</property>

<property>
  <name>mapreduce.map.env</name>
  <value>HADOOP_MAPRED_HOME=/workspace/fcbai/enhanced-sts/hadoop-3.3.4</value>
</property>

<property>
  <name>mapreduce.reduce.env</name>
  <value>HADOOP_MAPRED_HOME=/workspace/fcbai/enhanced-sts/hadoop-3.3.4</value>
</property>

In other words, these properties hard-code HADOOP_MAPRED_HOME to the old installation directory, which is not the same path as the upgraded Hadoop, so the AppMaster and tasks load jars from the old release, and the protobuf messages generated by the two versions are not compatible with each other. The fix is simply to point these values at the upgraded installation, as in the sketch below.
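A minimal corrected snippet; /path/to/upgraded/hadoop is a placeholder for the directory of the release that is actually running:

<!-- mapred-site.xml: HADOOP_MAPRED_HOME must match the Hadoop version actually deployed.
     /path/to/upgraded/hadoop is a placeholder for the upgraded install directory. -->
<property>
  <name>yarn.app.mapreduce.am.env</name>
  <value>HADOOP_MAPRED_HOME=/path/to/upgraded/hadoop</value>
</property>

<property>
  <name>mapreduce.map.env</name>
  <value>HADOOP_MAPRED_HOME=/path/to/upgraded/hadoop</value>
</property>

<property>
  <name>mapreduce.reduce.env</name>
  <value>HADOOP_MAPRED_HOME=/path/to/upgraded/hadoop</value>
</property>

With this adjusted, the MR job ran fine, but the DataNode still failed with a cluster ID problem: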

2023-03-10 16:41:33,901 INFO org.apache.hadoop.hdfs.server.common.Storage: Lock on /Users/fcbai/workspace/fcbai/enhanced-sts/hadoop/data/in_use.lock acquired by nodename 50342@C02G328CMD6V
2023-03-10 16:41:33,903 WARN org.apache.hadoop.hdfs.server.common.Storage: Failed to add storage directory [DISK]file:/Users/fcbai/workspace/fcbai/enhanced-sts/hadoop/data
java.io.IOException: Incompatible clusterIDs in /Users/fcbai/workspace/fcbai/enhanced-sts/hadoop/data: namenode clusterID = CID-97763e54-0903-4260-af24-f055d2143698; datanode clusterID = CID-5b2b5976-0534-4de2-923d-1d797e02854e
at org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:746)
at org.apache.hadoop.hdfs.server.datanode.DataStorage.loadStorageDirectory(DataStorage.java:296)
at org.apache.hadoop.hdfs.server.datanode.DataStorage.loadDataStorage(DataStorage.java:409)
at org.apache.hadoop.hdfs.server.datanode.DataStorage.addStorageLocations(DataStorage.java:389)
at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:561)
at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1739)
at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1675)
at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:394)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:295)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:854)
at java.lang.Thread.run(Thread.java:748)
2023-03-10 16:41:33,905 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for Block pool <registering> (Datanode Uuid 0802e860-cde0-4ddf-bad2-caea377c2a33) service to localhost/127.0.0.1:8020. Exiting.
java.io.IOException: All specified directories have failed to load.
at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:562)
at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1739)
at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1675)
at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:394)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:295)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:854)
at java.lang.Thread.run(Thread.java:748)
2023-03-10 16:41:33,905 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Ending block pool service for: Block pool <registering> (Datanode Uuid 0802e860-cde0-4ddf-bad2-caea377c2a33) service to localhost/127.0.0.1:8020
2023-03-10 16:41:33,906 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Removed Block pool <registering> (Datanode Uuid 0802e860-cde0-4ddf-bad2-caea377c2a33)
2023-03-10 16:41:35,907 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Exiting Datanode
2023-03-10 16:41:35,910 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:

This one is easier to deal with: every namenode format generates a new clusterID, while the data directory still carries the clusterID from the previous format. Formatting wipes the namenode's metadata but does not touch the datanode's data, so the datanode fails at startup. The remedy is to clear everything under the data directories before each format.

If you want to keep the data, stop the cluster first and then edit the clusterID in the datanode's /dfs/data/current/VERSION file so that it matches the namenode's. If the data is not needed, stop the cluster and delete everything under the data directory of the affected node,

that is, the directory configured as dfs.data.dir (dfs.datanode.data.dir in current releases) in hdfs-site.xml, and then re-format the namenode.
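A minimal sketch of both options, using the paths from the log above (the datanode data directory is /Users/fcbai/workspace/fcbai/enhanced-sts/hadoop/data; NAMENODE_DIR is a placeholder for the dfs.namenode.name.dir configured in hdfs-site.xml):

# Stop HDFS before touching any storage directories
$HADOOP_HOME/sbin/stop-dfs.sh

# Option 1: keep the data -- copy the namenode's clusterID into the datanode's VERSION file.
# NAMENODE_DIR is a placeholder for the namenode metadata directory (dfs.namenode.name.dir).
grep clusterID $NAMENODE_DIR/current/VERSION
vi /Users/fcbai/workspace/fcbai/enhanced-sts/hadoop/data/current/VERSION   # set clusterID to the namenode's value

# Option 2: discard the data -- clear the datanode directory and re-format the namenode
rm -rf /Users/fcbai/workspace/fcbai/enhanced-sts/hadoop/data/*
$HADOOP_HOME/bin/hdfs namenode -format

# Restart HDFS
$HADOOP_HOME/sbin/start-dfs.sh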

