zk version: 3.8.3
3.8.3-6ad6d364c7c0bcf0de452d54ebefa3058098ab56, built on 2023-10-05 10:34 UTC
First of all, my zk cluster crashed, and I want to find out why.
- I copied the snapshot and datalog files to my test server.
- But it couldn't start up there either.
- Then I deleted the datalog file and restarted it, and it started successfully.
- BUT when I stopped the instance and restarted it again, it failed to start up.
- It printed the following log:
2024-06-19 13:46:36,287 [myid:3] - DEBUG [main:o.a.z.s.DataTree@1816] - Digests are matching for Zxid: 600014b95, Digest in log and actual tree: 255870301941346
2024-06-19 13:46:36,287 [myid:3] - TRACE [main:o.a.z.s.ZooTrace@78] - playLog --- close session in log: 0x11fbb7e4d9ec593
2024-06-19 13:46:36,287 [myid:3] - DEBUG [main:o.a.z.s.DataTree@1816] - Digests are matching for Zxid: 600014b96, Digest in log and actual tree: 255870301941346
2024-06-19 13:46:36,287 [myid:3] - TRACE [main:o.a.z.s.ZooTrace@78] - playLog --- close session in log: 0x437e2bc2178aef2
2024-06-19 13:46:36,287 [myid:3] - DEBUG [main:o.a.z.s.DataTree@1816] - Digests are matching for Zxid: 600014b97, Digest in log and actual tree: 255870301941346
2024-06-19 13:46:36,288 [myid:3] - ERROR [main:o.a.z.s.ZooKeeperServerMain@91] - Unexpected exception, exiting abnormally
java.io.IOException: Unreasonable length = 2456732
at org.apache.jute.BinaryInputArchive.checkLength(BinaryInputArchive.java:166)
at org.apache.jute.BinaryInputArchive.readBuffer(BinaryInputArchive.java:127)
at org.apache.zookeeper.server.persistence.Util.readTxnBytes(Util.java:159)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:750)
at org.apache.zookeeper.server.persistence.FileTxnSnapLog.fastForwardFromEdits(FileTxnSnapLog.java:361)
at org.apache.zookeeper.server.persistence.FileTxnSnapLog.lambda$restore$0(FileTxnSnapLog.java:267)
at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:312)
at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:285)
at org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:531)
at org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:704)
at org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:744)
at org.apache.zookeeper.server.ServerCnxnFactory.startup(ServerCnxnFactory.java:130)
at org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:161)
at org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:113)
at org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:68)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:141)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:91)
2024-06-19 13:46:36,288 [myid:3] - ERROR [main:o.a.z.a.Slf4jAuditLogger@33] - user=root operation=serverStart result=failure
2024-06-19 13:46:36,291 [myid:3] - ERROR [main:o.a.z.u.ServiceUtils@48] - Exiting JVM with code 1
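For scale, the length in the exception (2456732 bytes) can be compared with the default value of the jute.maxbuffer system property, 0xfffff bytes (just under 1 MiB). The comparison below is only approximate: the real check in BinaryInputArchive.checkLength may allow some extra headroom on top of jute.maxbuffer, and the helper name compare_to_limit is made up for this sketch.

```python
# Default value of the jute.maxbuffer system property (0xfffff bytes, just under 1 MiB).
DEFAULT_JUTE_MAXBUFFER = 0xFFFFF

# Length reported by "java.io.IOException: Unreasonable length = 2456732".
REPORTED_LENGTH = 2456732


def compare_to_limit(length, limit=DEFAULT_JUTE_MAXBUFFER):
    """Return (length in MiB, length as a multiple of limit). Hypothetical helper."""
    return length / (1024 * 1024), length / limit


mib, ratio = compare_to_limit(REPORTED_LENGTH)
print(f"{REPORTED_LENGTH} bytes = {mib:.2f} MiB = {ratio:.2f}x the default limit ({DEFAULT_JUTE_MAXBUFFER} bytes)")
```

This prints "2456732 bytes = 2.34 MiB = 2.34x the default limit (1048575 bytes)". Either a genuinely oversized record or a damaged length field read as garbage could produce this error, so the number alone does not say which one happened.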
Can anyone explain this situation? I tried parsing the datalogs directly and found a lot of 'closeSession' events, but there doesn't seem to be any log entry larger than 1 MB. This confuses me a lot.
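To check the "nothing bigger than 1 MB" observation independently of zkTxnLogToolkit.sh, the raw record lengths can be scanned directly. This is a sketch under an assumed on-disk layout (based on FileTxnLog and Util.readTxnBytes, to be verified against the 3.8.3 sources): a 16-byte header (int magic "ZKLG", int version, long dbid), then per record a long Adler32 checksum, an int length, the txn bytes and a trailing 0x42 ('B') byte, with zero-filled preallocated space marking the end. scan_txn_log and main are hypothetical helpers, not ZooKeeper APIs.

```python
import struct
import sys
import zlib

# ASSUMED txn log layout (big-endian, jute BinaryOutputArchive); verify against
# the ZooKeeper 3.8.3 sources before trusting the results:
#   file header: int magic ("ZKLG"), int version, long dbid
#   each record: long Adler32 checksum, int length, <length> txn bytes, byte 0x42 ('B')
# Zero-filled preallocated space (length == 0) marks the logical end of the log.
HEADER = struct.Struct(">iiq")
RECORD_PREFIX = struct.Struct(">qi")
MAGIC = struct.unpack(">i", b"ZKLG")[0]
DEFAULT_LIMIT = 0xFFFFF  # default jute.maxbuffer


def scan_txn_log(data):
    """Yield (offset, length, ok) per record; hypothetical helper, not a ZooKeeper API.

    ok is True when the Adler32 checksum matches and the record ends with 'B'.
    Scanning stops at the zero padding, at EOF, or after the first bad record.
    """
    magic, _version, _dbid = HEADER.unpack_from(data, 0)
    if magic != MAGIC:
        raise ValueError("not a ZooKeeper txn log (bad magic)")
    pos = HEADER.size
    while pos + RECORD_PREFIX.size <= len(data):
        crc, length = RECORD_PREFIX.unpack_from(data, pos)
        if length == 0:
            return  # preallocated padding: logical end of the log
        body_start = pos + RECORD_PREFIX.size
        body_end = body_start + length
        if length < 0 or body_end >= len(data):
            # Negative length, or body plus 'B' terminator would run past EOF.
            yield pos, length, False
            return
        body = data[body_start:body_end]
        ok = crc == zlib.adler32(body) and data[body_end] == 0x42
        yield pos, length, ok
        if not ok:
            return  # the length field itself may be garbage; stop here
        pos = body_end + 1


def main(path, limit=DEFAULT_LIMIT):
    with open(path, "rb") as f:
        data = f.read()
    count = 0
    for offset, length, ok in scan_txn_log(data):
        count += 1
        if length > limit or not ok:
            print(f"record #{count} at offset {offset}: length={length} ok={ok}")
    print(f"scanned {count} record(s)")


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

It would be run as, for example, python scan_txn_log.py dataLog/version-2/log.600012cc0 (the script file name is hypothetical). A flagged record with a large length and ok=True would point to a genuinely oversized transaction; ok=False would point to a damaged or partially written entry instead.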
Thanks a lot!
When I run:
/usr/share/apache-zookeeper-3.8.3-bin/bin/zkTxnLogToolkit.sh -d dataLog/version-2/log.600012cc0
I found that the datalog contains a lot of 'closeSession' events. Maybe some session holds a lot of transient (ephemeral) znodes?
It looks similar to this issue, but that one has already been fixed.
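To get a feel for whether the ephemeral-znode hypothesis is plausible, the size of a closeSession transaction can be roughly estimated. This rests on an assumption not confirmed in this post: that since 3.6 the closeSession txn carries the paths of the session's ephemeral znodes (CloseSessionTxn.paths2Delete), serialized by jute as an int count followed by an int length plus UTF-8 bytes per path. The txn header and digest are ignored, both helper functions are hypothetical, and the 60-character path length is a made-up example.

```python
# HYPOTHESIS sketch, not confirmed: assumes the closeSession txn carries the
# session's ephemeral znode paths (CloseSessionTxn.paths2Delete in 3.6+),
# serialized by jute as int count + (int length + UTF-8 bytes) per path.
# The txn header, digest and checksum framing are ignored.
DEFAULT_JUTE_MAXBUFFER = 0xFFFFF


def estimate_close_session_payload(paths):
    """Rough serialized size, in bytes, of a list of znode paths (hypothetical helper)."""
    return 4 + sum(4 + len(p.encode("utf-8")) for p in paths)


def paths_needed_to_exceed(path_len, limit=DEFAULT_JUTE_MAXBUFFER):
    """Smallest count of equally long ASCII paths whose payload exceeds limit."""
    per_path = 4 + path_len
    # Solve 4 + n * per_path > limit for the smallest integer n.
    return (limit - 4) // per_path + 1


# Hypothetical example: ephemeral paths of 60 characters each.
n = paths_needed_to_exceed(60)
paths = ["/" + "x" * 59] * n
print(n, estimate_close_session_payload(paths))
```

With 60-character paths the estimate crosses the default limit at 16384 paths (1048580 bytes). At 64 bytes per path, a 2456732-byte payload would correspond to roughly 38,000 such paths, so if the failing entry really is a closeSession, a single session holding tens of thousands of ephemeral znodes would be consistent with the error.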