ZooKeeper version: 3.8.3

3.8.3-6ad6d364c7c0bcf0de452d54ebefa3058098ab56, built on 2023-10-05 10:34 UTC

My ZooKeeper cluster crashed, and I want to find out why.

  1. I copied the snapshot and transaction log (dataLog) files to my test server.
  2. The instance would not start there either.
  3. I then deleted the transaction log files and restarted, and it started successfully.
  4. But when I stopped the instance and started it again, it failed to start.
  5. It printed the following log:
2024-06-19 13:46:36,287 [myid:3] - DEBUG [main:o.a.z.s.DataTree@1816] - Digests are matching for Zxid: 600014b95, Digest in log and actual tree: 255870301941346
2024-06-19 13:46:36,287 [myid:3] - TRACE [main:o.a.z.s.ZooTrace@78] - playLog --- close session in log: 0x11fbb7e4d9ec593
2024-06-19 13:46:36,287 [myid:3] - DEBUG [main:o.a.z.s.DataTree@1816] - Digests are matching for Zxid: 600014b96, Digest in log and actual tree: 255870301941346
2024-06-19 13:46:36,287 [myid:3] - TRACE [main:o.a.z.s.ZooTrace@78] - playLog --- close session in log: 0x437e2bc2178aef2
2024-06-19 13:46:36,287 [myid:3] - DEBUG [main:o.a.z.s.DataTree@1816] - Digests are matching for Zxid: 600014b97, Digest in log and actual tree: 255870301941346
2024-06-19 13:46:36,288 [myid:3] - ERROR [main:o.a.z.s.ZooKeeperServerMain@91] - Unexpected exception, exiting abnormally
java.io.IOException: Unreasonable length = 2456732
    at org.apache.jute.BinaryInputArchive.checkLength(BinaryInputArchive.java:166)
    at org.apache.jute.BinaryInputArchive.readBuffer(BinaryInputArchive.java:127)
    at org.apache.zookeeper.server.persistence.Util.readTxnBytes(Util.java:159)
    at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:750)
    at org.apache.zookeeper.server.persistence.FileTxnSnapLog.fastForwardFromEdits(FileTxnSnapLog.java:361)
    at org.apache.zookeeper.server.persistence.FileTxnSnapLog.lambda$restore$0(FileTxnSnapLog.java:267)
    at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:312)
    at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:285)
    at org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:531)
    at org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:704)
    at org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:744)
    at org.apache.zookeeper.server.ServerCnxnFactory.startup(ServerCnxnFactory.java:130)
    at org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:161)
    at org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:113)
    at org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:68)
    at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:141)
    at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:91)
2024-06-19 13:46:36,288 [myid:3] - ERROR [main:o.a.z.a.Slf4jAuditLogger@33] - user=root operation=serverStart   result=failure
2024-06-19 13:46:36,291 [myid:3] - ERROR [main:o.a.z.u.ServiceUtils@48] - Exiting JVM with code 1

Can anyone explain this situation? I tried parsing the transaction logs directly and found a lot of 'closeSession' entries, but there does not seem to be any entry larger than 1 MB, which confuses me.
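For reference, here is a rough size check. It assumes the server runs with the default `jute.maxbuffer` of 0xfffff bytes (a little under 1 MB), i.e. that the property was not overridden on this instance. Under that assumption, the length in the exception is more than twice the limit, so the record being read is either genuinely large or its length field is being read from a damaged position in the file:

```shell
# Length reported in "Unreasonable length = 2456732", in bytes.
reported_len=2456732

# Default jute.maxbuffer (0xfffff). Assumption: the property was not
# overridden when the server was started.
default_max=$((0xfffff))

echo "reported=${reported_len} default_limit=${default_max}"
if [ "$reported_len" -gt "$default_max" ]; then
  echo "record exceeds the default limit"
fi
```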

Thanks a lot.

When I run

/usr/share/apache-zookeeper-3.8.3-bin/bin/zkTxnLogToolkit.sh -d dataLog/version-2/log.600012cc0

I see that the transaction log contains a lot of 'closeSession' events. Maybe some sessions held a lot of ephemeral znodes?
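To put a number on "a lot", the toolkit output can be saved to a file and the matching lines counted. This is only a sketch: `count_close_sessions` and the `/tmp/txnlog.dump` path are hypothetical names, and it assumes each transaction is printed on its own line containing the word `closeSession`:

```shell
# count_close_sessions FILE
# Prints the number of lines in a saved dump that mention closeSession.
# Hypothetical helper, not part of ZooKeeper.
count_close_sessions() {
  grep -c 'closeSession' "$1"
}

# Usage (the dump path is a placeholder):
#   /usr/share/apache-zookeeper-3.8.3-bin/bin/zkTxnLogToolkit.sh -d dataLog/version-2/log.600012cc0 > /tmp/txnlog.dump
#   count_close_sessions /tmp/txnlog.dump
```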

It looks like this issue, but that one has already been fixed.
