Forum: Managing VoltDB

Post: FATAL: Fatal exception while rejoining after failure

mateusz
Sep 19, 2013
Hi

I'm trying to emulate node recovery after a failure.
I'm using the example/voter database.
The K-factor is 1.
There are 2 nodes.

Initial config: two nodes.
asynch-benchmark is started on node 1.
Then I kill node 2.
The benchmark continues on node 1; some transactions are lost, but it keeps working.
Then I try to recover node 2 and get the following error:

Host id of this node is: 5
FATAL: Index: 0, Size: 0
FATAL: Fatal exception
java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
        at java.util.ArrayList.RangeCheck(ArrayList.java:547)
        at java.util.ArrayList.get(ArrayList.java:322)
        at org.voltdb.RealVoltDB.initialize(RealVoltDB.java:487)
        at org.voltdb.VoltDB.initialize(VoltDB.java:886)
        at org.voltdb.VoltDB.main(VoltDB.java:870)
VoltDB has encountered an unrecoverable error and is exiting.


The last entry in the log before the crash is:
2013-09-19 21:24:50,496   INFO  [LeaderCache] HOST: Mailbox is not registered for site id -4


I also noticed that the host id on node 2 increases each time I try to rejoin. Is that normal? Shouldn't it be the same every time?

The deployment file is:

<?xml version="1.0"?>
<deployment>
    <cluster hostcount="2" sitesperhost="4" kfactor="1" />
    <httpd enabled="true">
        <jsonapi enabled="true" />
    </httpd>
</deployment>
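
For reference, here is a rough sketch of what this cluster configuration implies. It assumes the usual VoltDB rule that every partition is stored kfactor + 1 times, so the number of unique partitions is hostcount * sitesperhost / (kfactor + 1). The variable names are only illustrative; the values are copied from the deployment file.

```shell
#!/bin/sh
# Values copied from the <cluster> element of the deployment file.
hostcount=2
sitesperhost=4
kfactor=1

# Assumption: each partition is stored (kfactor + 1) times across the cluster.
total_sites=$((hostcount * sitesperhost))
copies=$((kfactor + 1))
partitions=$((total_sites / copies))

echo "total sites: $total_sites"
echo "copies per partition: $copies"
echo "unique partitions: $partitions"
```

Under that assumption each of the two nodes holds a copy of every partition, which would explain why the benchmark keeps running on node 1 after node 2 is killed.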
rmorgenstein
Sep 19, 2013
This (unhelpful) error can be thrown if you don't specify the 'rejoin' start action when trying to restart the node. See http://voltdb.com/docs/UsingVoltDB/KSafeRecover.php for the correct syntax. We already have an issue logged to improve this message.
mateusz
Sep 20, 2013
I tried the rejoin action as well, but with no success:

#/voltdb rejoin host xxx deployment deployment.xml
Initializing VoltDB...

 _    __      ____  ____  ____
| |  / /___  / / /_/ __ \/ __ )
| | / / __ \/ / __/ / / / __  |
| |/ / /_/ / / /_/ /_/ / /_/ /
|___/\____/_/\__/_____/_____/

--------------------------------

Build: 3.6.1 This is not from a known repository Community Edition
Connecting to the VoltDB cluster leader /xxx:3021
2 Notified of host 0
Host id of this node is: 2
FATAL: KeeperException getting replicas for partition: 0
FATAL: Fatal exception
org.apache.zookeeper_voltpatches.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /db/leaders/initiators/partition_0
        at org.apache.zookeeper_voltpatches.KeeperException.create(KeeperException.java:101)
        at org.apache.zookeeper_voltpatches.KeeperException.create(KeeperException.java:44)
        at org.apache.zookeeper_voltpatches.ZooKeeper.getChildren(ZooKeeper.java:1301)
        at org.voltdb.iv2.Cartographer.getReplicasForPartition(Cartographer.java:293)
        at org.voltdb.iv2.Cartographer.getReplicaCountForPartition(Cartographer.java:348)
        at org.voltdb.iv2.Cartographer.getIv2PartitionsToReplace(Cartographer.java:400)
        at org.voltdb.RealVoltDB.initialize(RealVoltDB.java:460)
        at org.voltdb.VoltDB.initialize(VoltDB.java:886)
        at org.voltdb.VoltDB.main(VoltDB.java:870)
VoltDB has encountered an unrecoverable error and is exiting.
The log may contain additional information.