We have a 4-node cluster that automatically generates a snapshot every 5 minutes. Our k-factor is 2. To simulate a node failure, we killed the Java process on one node. The cluster kept working, but when we tried to rejoin the node with
voltdb rejoin host $NODE deployment $DEPLOYMENT_FILE
the whole cluster crashed with the message
FATAL: Stored procedure EndVisit generated different SQL queries at different partitions. Shutting down to preserve data integrity.
How can this happen? The corresponding procedure has a fixed number of fixed statements that do not change at all.
What is even more problematic is that the recovery didn't work. The command
voltdb recover host $RUNNING_NODE deployment $DEPLOYMENT_FILE
issued the messages
VoltDB has encountered an unrecoverable error and is exiting.
Message: Found multiple transactions ids during restore for a partition.
Have we done something wrong? Was there a misunderstanding?
We are using VoltDB 3.7 with 64-bit OpenJDK 1.6.