FATAL: Fatal exception while rejoining after failure


  • FATAL: Fatal exception while rejoining after failure

    Hi,

    I'm trying to emulate node recovery after a failure.
    I'm using the example/voter database.
    The K-factor is 1.
    There are 2 nodes.

    Initial configuration: two nodes.
    asynch-benchmark is started on node 1.
    Then I kill node 2.
    The benchmark continues on node 1; some transactions go missing, but it keeps working.
    Then I try to recover node 2 and get the following error:

    Code:
    Host id of this node is: 5
    FATAL: Index: 0, Size: 0
    FATAL: Fatal exception
    java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
            at java.util.ArrayList.RangeCheck(ArrayList.java:547)
            at java.util.ArrayList.get(ArrayList.java:322)
            at org.voltdb.RealVoltDB.initialize(RealVoltDB.java:487)
            at org.voltdb.VoltDB.initialize(VoltDB.java:886)
            at org.voltdb.VoltDB.main(VoltDB.java:870)
    VoltDB has encountered an unrecoverable error and is exiting.
    The last entry in the log before the crash is:
    Code:
    2013-09-19 21:24:50,496   INFO  [LeaderCache] HOST: Mailbox is not registered for site id -4
    I also noticed that the host id on node 2 increases each time I try to rejoin. Is that normal? Shouldn't it stay the same every time?

    Deployment file is:

    Code:
    <?xml version="1.0"?>
    <deployment>
        <cluster hostcount="2" sitesperhost="4" kfactor="1" />
        <httpd enabled="true">
            <jsonapi enabled="true" />
        </httpd>
    </deployment>
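
    As a rough sanity check on the deployment file above, here is a minimal Python sketch of what its `<cluster>` attributes imply. It assumes the usual K-safety arithmetic (unique partitions = hostcount × sitesperhost / (kfactor + 1)) and a default kfactor of 0 when the attribute is absent; `cluster_summary` is a hypothetical helper name, not part of VoltDB.

```python
import xml.etree.ElementTree as ET

# Deployment file copied from the post above.
DEPLOYMENT_XML = """<?xml version="1.0"?>
<deployment>
    <cluster hostcount="2" sitesperhost="4" kfactor="1" />
    <httpd enabled="true">
        <jsonapi enabled="true" />
    </httpd>
</deployment>
"""


def cluster_summary(xml_text):
    """Derive partition and redundancy figures from a deployment file.

    Hypothetical helper: assumes the standard K-safety arithmetic and
    treats a missing kfactor attribute as 0.
    """
    cluster = ET.fromstring(xml_text).find("cluster")
    if cluster is None:
        raise ValueError("no <cluster> element in deployment file")

    hosts = int(cluster.get("hostcount"))
    sites_per_host = int(cluster.get("sitesperhost"))
    kfactor = int(cluster.get("kfactor", "0"))

    copies = kfactor + 1                 # each partition is stored this many times
    total_sites = hosts * sites_per_host
    if copies > hosts:
        raise ValueError("kfactor + 1 must not exceed hostcount")
    if total_sites % copies != 0:
        raise ValueError("total sites must be divisible by kfactor + 1")

    return {
        "unique_partitions": total_sites // copies,
        "copies_per_partition": copies,
        "tolerated_host_failures": kfactor,
    }


summary = cluster_summary(DEPLOYMENT_XML)
print(summary)
```

    Under these assumptions the two-node cluster has four unique partitions, each held on both hosts, so it should keep running after losing one host, which matches the benchmark continuing on node 1.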

  • #2
    This (unhelpful) error can be thrown if you don't specify the 'rejoin' start action when trying to restart the node. See http://voltdb.com/docs/UsingVoltDB/KSafeRecover.php for the correct syntax. We already have an issue logged to improve this message.
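
    If the restart is scripted, a small sketch like the following can keep the start action explicit. The argument order mirrors the `voltdb rejoin host ... deployment ...` form used elsewhere in this thread with build 3.6.1; other versions may differ, and `build_rejoin_command` and its parameters are hypothetical names used only for illustration.

```python
import shlex


def build_rejoin_command(host, deployment_path, voltdb_bin="voltdb"):
    """Build the argv list for rejoining a failed node to a running cluster.

    Hypothetical helper: `host` is a node that is still up, and the
    argument order follows the 3.x-style command quoted in this thread.
    """
    if not host or not deployment_path:
        raise ValueError("host and deployment_path are required")
    return [voltdb_bin, "rejoin", "host", host, "deployment", deployment_path]


# "xxx" is the placeholder host name used in this thread.
cmd = build_rejoin_command("xxx", "deployment.xml")
command_line = " ".join(shlex.quote(arg) for arg in cmd)
print(command_line)
```

    The list form can be passed directly to `subprocess.run(cmd)` on the recovering node, which avoids shell quoting problems with unusual paths.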


    • #3
      Originally posted by rmorgenstein:
      This (unhelpful) error can be thrown if you don't specify the 'rejoin' start action when trying to restart the node. See http://voltdb.com/docs/UsingVoltDB/KSafeRecover.php for the correct syntax. We already have an issue logged to improve this message.
      I tried rejoin as well, but with no success:

      Code:
      #/voltdb rejoin host xxx deployment deployment.xml
      Initializing VoltDB...
      
       _    __      ____  ____  ____
      | |  / /___  / / /_/ __ \/ __ )
      | | / / __ \/ / __/ / / / __  |
      | |/ / /_/ / / /_/ /_/ / /_/ /
      |___/\____/_/\__/_____/_____/
      
      --------------------------------
      
      Build: 3.6.1 This is not from a known repository Community Edition
      Connecting to the VoltDB cluster leader /xxx:3021
      2 Notified of host 0
      Host id of this node is: 2
      FATAL: KeeperException getting replicas for partition: 0
      FATAL: Fatal exception
      org.apache.zookeeper_voltpatches.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /db/leaders/initiators/partition_0
              at org.apache.zookeeper_voltpatches.KeeperException.create(KeeperException.java:101)
              at org.apache.zookeeper_voltpatches.KeeperException.create(KeeperException.java:44)
              at org.apache.zookeeper_voltpatches.ZooKeeper.getChildren(ZooKeeper.java:1301)
              at org.voltdb.iv2.Cartographer.getReplicasForPartition(Cartographer.java:293)
              at org.voltdb.iv2.Cartographer.getReplicaCountForPartition(Cartographer.java:348)
              at org.voltdb.iv2.Cartographer.getIv2PartitionsToReplace(Cartographer.java:400)
              at org.voltdb.RealVoltDB.initialize(RealVoltDB.java:460)
              at org.voltdb.VoltDB.initialize(VoltDB.java:886)
              at org.voltdb.VoltDB.main(VoltDB.java:870)
      VoltDB has encountered an unrecoverable error and is exiting.
      The log may contain additional information.
