Forum: Building VoltDB Applications

Post: Unable to start a 4 nodes cluster

luke55
Feb 17, 2016
Hi,

For testing purposes, I was using a 3-node cluster, with no problems so far.
I decided to add a 4th node, and it's a nightmare.
When I start the DB from the enterprise manager, I consistently get an error for the 4th node, saying:
Received remote hangup from foreign host xxxxx

I have attached the log.txt taken on the 4th node, which to me is totally incomprehensible.
It says that the zookeeper service was unable to bind to port 7181 because it is already in use, but 2 lines above, it says that the voltdb process (14712) is itself bound to 7181!
I ran lsof on the voltdb process that is still running after the failure, and it confirms that the connection is established.
After the failure, the enterprise manager window shows a spinning wheel indefinitely.

Using lsof to look at how port 7181 is used, I get the same thing on all 4 nodes (see below), so why does this particular node have a problem while the other 3 nodes start successfully?
java 14153 root 82u IPv6 125443320 0t0 TCP localhost:7181 (LISTEN)
java 14153 root 88u IPv6 125443323 0t0 TCP localhost:58773->localhost:7181 (ESTABLISHED)
java 14153 root 89u IPv6 125443327 0t0 TCP localhost:7181->localhost:58773 (ESTABLISHED)
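
A quick way to test whether port 7181 is already taken on a node before starting the database is to try binding it. This is only a minimal Python sketch: the function name can_bind is made up for illustration, and it assumes the listener sits on the loopback address, as in the lsof output above. It says nothing about which process holds the port; lsof is still needed for that.

```python
import errno
import socket


def can_bind(port, host="127.0.0.1"):
    """Return True if a new TCP socket can bind host:port right now.

    Hypothetical helper, not part of VoltDB. It only probes the port and
    releases it immediately when the 'with' block exits.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError as exc:
            # EADDRINUSE is the "already in use" condition from the log.
            if exc.errno == errno.EADDRINUSE:
                return False
            raise  # any other error (e.g. permissions) is not hidden
        return True


if __name__ == "__main__":
    # 7181 is the port named in the log; run this before starting the DB.
    print("port 7181 free:", can_bind(7181))
```

If this reports the port as taken while no database is supposed to be running on that node, a leftover process from an earlier failed start is one possible cause worth ruling out.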

On top of all that, the log window of the enterprise manager says "cluster remains operational", but it certainly is not: I am completely unable to connect to the DB.
If I try to shut down the DB with voltadmin, all I get is: Connection refused
The only way to stop the DB is to manually kill the running VoltDB process on each host.


Thanks for helping.
pzhao
Feb 17, 2016
luke55,
Sounds like you're having some issues, and I'd be glad to help. I've taken a look at the files you sent.
It appears you are using v5.6 with VoltDB Enterprise Manager (VEM; correct me if I'm wrong), and that a VoltDB process or some other process is already using port 7181. As a side note, VoltDB Enterprise Manager was deprecated in V5.0 and is no longer recommended for managing a cluster. We recommend using the command line instead.
Back to the issue: you said you added a 4th node. How was this done?
Peter Zhao