Forum: VoltDB Architecture

Post: Restart after the full cluster fails

Restart after the full cluster fails
gambitg
Sep 9, 2011
From my tests, it seems that restarting after the FULL cluster fails requires all nodes to be restarted.
Is this true? If so, why? Why can't the cluster restart with just a few nodes up?
When I kill nodes one after another on a fully live cluster, the database keeps serving with just one node up (when hosts=k+1). But after the FULL cluster fails, I cannot get the database to serve with just one node up.
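The running-cluster behavior described above can be illustrated with a simplified model. This is a sketch under stated assumptions, not VoltDB's implementation: it assumes each partition is stored on k+1 distinct hosts (k being the kfactor), and that a running cluster keeps serving as long as every partition still has at least one live replica. The round-robin placement and the helper names `place_replicas` and `still_serving` are hypothetical. With hosts=k+1, every host ends up holding every partition, so a single surviving node is enough while the cluster stays up.

```python
from typing import Dict, List, Set


def place_replicas(hosts: int, kfactor: int, partitions: int) -> Dict[int, List[int]]:
    """Hypothetical round-robin placement: each partition gets kfactor + 1 distinct hosts."""
    if kfactor + 1 > hosts:
        raise ValueError("need at least kfactor + 1 hosts to place every replica")
    return {
        p: [(p + r) % hosts for r in range(kfactor + 1)]
        for p in range(partitions)
    }


def still_serving(placement: Dict[int, List[int]], live_hosts: Set[int]) -> bool:
    """Simplified rule: keep serving while every partition has at least one live replica."""
    return all(
        any(h in live_hosts for h in replicas) for replicas in placement.values()
    )


# hosts = k + 1 (3 hosts, kfactor 2): every host holds every partition.
placement = place_replicas(hosts=3, kfactor=2, partitions=4)
live = {0, 1, 2}
for victim in (0, 1):  # kill nodes one after another on the live cluster
    live.discard(victim)
    print(sorted(live), still_serving(placement, live))
```

This prints `[1, 2] True` and then `[2] True`. Note that the model only covers failures on a cluster that is already running; it says nothing about restarting after every node has gone down, which is the case the question is about.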


Thanks for any input.
It's true.
jhugg
Sep 11, 2011
Starting a cluster under any condition requires the full contingent of nodes, as specified in the deployment file, to be active. After a global cluster failure, if you only have N working nodes, you can set your cluster size to N in the deployment file and start successfully. However, you cannot then add nodes to that running cluster to restore redundancy; doing so would require a second restart.
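The startup rule and the workaround above can be sketched as follows. This is a hedged illustration, not VoltDB code: it assumes a deployment file whose `<cluster>` element carries `hostcount`, `sitesperhost`, and `kfactor` attributes, and the helper names `can_start` and `shrink_hostcount` are hypothetical. The first function captures the rule that startup waits for the full configured contingent; the second rewrites the configured cluster size to the N surviving nodes. The sketch leaves `kfactor` untouched; whether a given kfactor remains valid for the smaller cluster is not addressed in this thread.

```python
import xml.etree.ElementTree as ET

# Assumed deployment-file layout for illustration; check your own file.
DEPLOYMENT = """<deployment>
    <cluster hostcount="3" sitesperhost="2" kfactor="1" />
</deployment>"""


def can_start(deployment_xml: str, joined_nodes: int) -> bool:
    """Startup proceeds only when every node counted in hostcount has joined."""
    cluster = ET.fromstring(deployment_xml).find("cluster")
    return joined_nodes == int(cluster.get("hostcount"))


def shrink_hostcount(deployment_xml: str, surviving_nodes: int) -> str:
    """Return a copy of the deployment with hostcount set to the surviving node count.

    Adding nodes back later means editing hostcount again and restarting.
    """
    root = ET.fromstring(deployment_xml)
    root.find("cluster").set("hostcount", str(surviving_nodes))
    return ET.tostring(root, encoding="unicode")


print(can_start(DEPLOYMENT, joined_nodes=2))  # False: only 2 of 3 configured hosts
smaller = shrink_hostcount(DEPLOYMENT, surviving_nodes=2)
print(can_start(smaller, joined_nodes=2))  # True: hostcount now matches
```

Keeping the check as a strict equality mirrors the reply's point that the full contingent must be present at startup, rather than "at least a quorum".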


Making this scenario easier, and supporting live changes to cluster topology, are both on our roadmap, but we don't currently have a public timeframe.