How to configure hbase.rootdir when HBase runs on a highly available (HA) Hadoop cluster

Problem description

In a previous article we set up a highly available Hadoop cluster, and HBase now uses that cluster's HDFS. However, our hbase.rootdir is still hard-coded to a single machine:

<property>
        <name>hbase.rootdir</name>
        <value>hdfs://node1:9000/hbase</value>
</property>

If node1 goes down at this point and node2 becomes the active NameNode, the HBase cluster becomes unavailable. Starting HBase then fails with org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby:

2019-07-22 11:11:24,431 ERROR [master/node1:16000:becomeActiveMaster] master.HMaster: ***** ABORTING master node1,16000,1563765070980: Unhandled exception. Starting shutdown. *****
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby
	at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:87)
	at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:1802)
	// some log lines omitted
2019-07-22 11:11:24,431 INFO  [master/node1:16000:becomeActiveMaster] regionserver.HRegionServer: ***** STOPPING region server 'node1,16000,1563765070980' *****
2019-07-22 11:11:24,432 INFO  [master/node1:16000:becomeActiveMaster] regionserver.HRegionServer: STOPPED: Stopped by master/node1:16000:becomeActiveMaster
2019-07-22 11:11:27,078 INFO  [master/node1:16000] ipc.NettyRpcServer: Stopping server on /192.168.229.128:16000
2019-07-22 11:11:27,097 WARN  [master/node1:16000] regionserver.HRegionServer: Initialize abort timeout task failed
java.lang.IllegalAccessException: Class org.apache.hadoop.hbase.regionserver.HRegionServer can not access a member of class org.apache.hadoop.hbase.regionserver.HRegionServer$SystemExitWhenAbortTimeout with modifiers "private"
	at sun.reflect.Reflection.ensureMemberAccess(Reflection.java:102)
	at java.lang.reflect.AccessibleObject.slowCheckMemberAccess(AccessibleObject.java:296)
	at java.lang.reflect.AccessibleObject.checkAccess(AccessibleObject.java:288)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:413)
	at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:1044)
	at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:598)
	at java.lang.Thread.run(Thread.java:745)
2019-07-22 11:11:27,097 INFO  [master/node1:16000] regionserver.HRegionServer: Stopping infoServer
2019-07-22 11:11:27,128 INFO  [master/node1:16000] handler.ContextHandler: Stopped o.e.j.w.WebAppContext@168cd36b{/,null,UNAVAILABLE}{file:/data/program/hbase-2.1.5/hbase-webapps/master}
2019-07-22 11:11:27,143 INFO  [master/node1:16000] server.AbstractConnector: Stopped ServerConnector@319c3a25{HTTP/1.1,[http/1.1]}{0.0.0.0:16010}
2019-07-22 11:11:27,147 INFO  [master/node1:16000] handler.ContextHandler: Stopped o.e.j.s.ServletContextHandler@3b2f4a93{/static,file:///data/program/hbase-2.1.5/hbase-webapps/static/,UNAVAILABLE}
2019-07-22 11:11:27,148 INFO  [master/node1:16000] handler.ContextHandler: Stopped o.e.j.s.ServletContextHandler@3ffb3598{/logs,file:///data/program/hbase-2.1.5/logs/,UNAVAILABLE}
2019-07-22 11:11:27,151 INFO  [master/node1:16000] regionserver.HRegionServer: aborting server node1,16000,1563765070980
2019-07-22 11:11:27,169 INFO  [master/node1:16000] regionserver.HRegionServer: stopping server node1,16000,1563765070980; all regions closed.
2019-07-22 11:11:27,170 INFO  [master/node1:16000] hbase.ChoreService: Chore service for: master/node1:16000 had [] on shutdown
2019-07-22 11:11:27,175 WARN  [master/node1:16000] master.ActiveMasterManager: Failed get of master address: java.io.IOException: Can't get master address from ZooKeeper; znode data == null
2019-07-22 11:11:27,193 INFO  [master/node1:16000] zookeeper.ZooKeeper: Session: 0x16c178f38b80009 closed
2019-07-22 11:11:27,194 INFO  [master/node1:16000] regionserver.HRegionServer: Exiting; stopping=node1,16000,1563765070980; zookeeper connection closed.
2019-07-22 11:11:27,195 ERROR [main] master.HMasterCommandLine: Master exiting
java.lang.RuntimeException: HMaster Aborted
	at org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:244)
	at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:140)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
	at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:149)
	at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:3117)

Solution

hbase.rootdir needs to point at the HA HDFS nameservice, not at a single hard-coded machine.

Edit hbase-site.xml:

<property>
        <name>hbase.rootdir</name>
        <value>hdfs://ns1/hbase</value>
</property>
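A quick way to confirm the nameservice address works is to check the NameNode states and list the root of the nameservice from any Hadoop client node. This is a sketch: nn1 and nn2 are assumed NameNode IDs from dfs.ha.namenodes.ns1, so substitute the names from your own hdfs-site.xml:

```shell
# Which NameNode is currently active? (nn1/nn2 are assumed
# NameNode IDs from dfs.ha.namenodes.ns1 -- adjust to yours.)
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2

# The nameservice URI should resolve no matter which node is active
hdfs dfs -ls hdfs://ns1/
```

If the second command works even after a failover, HBase will be able to reach its root directory through the same URI.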

Note: the ns1 here comes from the dfs.nameservices setting in hdfs-site.xml:

<configuration>
	<!-- Set the HDFS nameservice to ns1; this must match the value in core-site.xml -->
	<property>
	    <name>dfs.nameservices</name>
	    <value>ns1</value>
	</property>
	<!-- other settings omitted -->
</configuration>

At the same time, copy Hadoop's hdfs-site.xml and core-site.xml into HBase's conf directory. Otherwise HBase's HDFS client cannot resolve the nameservice name and fails with an error that the nameservice (ns1 in this setup) cannot be found.
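The copy itself is just two files. A minimal sketch, assuming a standard layout with HADOOP_HOME and HBASE_HOME pointing at the install directories (the paths are assumptions, not part of the original setup):

```shell
# Paths are assumptions -- adjust to your installation layout.
# Repeat on every node that runs an HBase master or region server.
cp "$HADOOP_HOME/etc/hadoop/core-site.xml" "$HBASE_HOME/conf/"
cp "$HADOOP_HOME/etc/hadoop/hdfs-site.xml" "$HBASE_HOME/conf/"
```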

Then restart the HBase cluster.
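The restart can be done with the scripts that ship with HBase, run from the node that manages the cluster:

```shell
# Stop and start the whole HBase cluster (scripts ship with HBase)
"$HBASE_HOME/bin/stop-hbase.sh"
"$HBASE_HOME/bin/start-hbase.sh"
```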

Many posts online claim that the java.lang.RuntimeException: HMaster Aborted error calls for cleaning up ZooKeeper: when HBase has been removed and re-added under CDH, stale HBase state is left behind in ZooKeeper, and deleting the /hbase znode fixes it.
But not every occurrence of this error is caused by ZooKeeper. Read a bit further up in the log and find the actual root cause.
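For completeness, if the log does show that stale ZooKeeper state is the real cause, the cleanup mentioned above looks like this. The host and port are assumptions for this setup, and the delete command depends on the ZooKeeper version:

```shell
# Connect to ZooKeeper (host/port are assumptions -- adjust to yours)
"$ZOOKEEPER_HOME/bin/zkCli.sh" -server node1:2181

# Inside the zkCli shell, remove HBase's znode recursively:
#   deleteall /hbase    (ZooKeeper 3.5+)
#   rmr /hbase          (older ZooKeeper releases)
```

Only do this with HBase stopped, and only after confirming in the log that the abort really traces back to ZooKeeper state rather than, as in this case, an HDFS configuration problem.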
