Forum: Managing VoltDB

Post: Enterprise Manager Hang - port already in use during catalog.jar redeploy. Failure to stop database.

mathewbutler
Aug 19, 2014
Build: 4.0 voltdb-4.0-0-gab8313d-local Enterprise Edition

The details of the issue are below. The workaround was to manually kill the database process and redeploy. We are in development, so this is acceptable for now, but I would like a resolution before this system goes live.

Questions:

    Is the reported problem a known issue? (failure to cleanly shut down a clean database)

    How do I avoid this happening again?

    Is there a fix for this?

    If this happens again, how do I diagnose why the database did not shut down cleanly?



Thanks in advance for any assistance.

Regards,
Mat.

=========================================================================
=========================================================================
Details:


    Enterprise Manager was used to redeploy a catalog.jar to an existing database. The option "recover and start database" was selected. Enterprise Manager then hung.


Investigation:

    The log file (appended below) records that the port used by ZooKeeper was already in use.

    The process using the port was the database whose catalog.jar was being redeployed.
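
For anyone hitting the same symptom, one quick way to confirm the port conflict before redeploying is to try binding the port yourself. This is a minimal Python sketch, not anything VoltDB ships: `port_is_free` is a hypothetical helper name, and 127.0.0.1:2182 is simply the ZooKeeper address that appears in the log in this post.

```python
import socket


def port_is_free(host: str, port: int) -> bool:
    """Return True if a TCP socket can bind host:port right now.

    Hypothetical helper for illustration; it only tests the bind and
    releases the port again immediately.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError:
            # Typically EADDRINUSE: some other process still holds the port.
            return False
        return True


if __name__ == "__main__":
    # Address taken from the "ZK-SERVER: binding to port" log line.
    state = "free" if port_is_free("127.0.0.1", 2182) else "in use"
    print(f"127.0.0.1:2182 is {state}")
```

If this reports the port as in use, a tool such as `lsof -i :2182` on Linux should name the owning process, which in this case was the old database instance.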



Conclusion:

    The existing VoltDB database required a restart to redeploy, but it failed to stop and kept holding the port.



[SNIP log file...]

2014-08-19 10:34:48,202 WARN [main] HOST: Catalog is ignored for 'recover' action.
2014-08-19 10:34:48,205 INFO [main] CONSOLE: Initializing VoltDB...

 _    __      ____  ____  ____
| |  / /___  / / /_/ __ \/ __ )
| | / / __ \/ / __/ / / / __  |
| |/ / /_/ / / /_/ /_/ / /_/ /
|___/\____/_/\__/_____/_____/

--------------------------------

2014-08-19 10:34:48,237 INFO [main] CONSOLE: Build: 4.0 voltdb-4.0-0-gab8313d-local Enterprise Edition
2014-08-19 10:34:48,252 INFO [main] NETWORK: Default network thread count: 2
2014-08-19 10:34:48,305 INFO [main] HOST: Beginning inter-node communication on port 3022.
2014-08-19 10:34:48,305 INFO [main] HOST: Attempting to bind to leader ip XXXXXXXXXXXXXXXXXXXXX
2014-08-19 10:34:48,307 INFO [main] CONSOLE: Connecting to the VoltDB cluster leader XXXXXXXXXXXXXXXXXXXXXXX
2014-08-19 10:34:48,316 INFO [Socket Joiner] HOST: Received request type REQUEST_HOSTID
2014-08-19 10:34:48,333 INFO [main] HOST: Leader provided address XX.XX.XX.XX
2014-08-19 10:34:48,334 INFO [main] HOST: Clock skew to across all nodes in the cluster is 3
2014-08-19 10:34:48,335 INFO [main] NETWORK: 1 notified of host 0
2014-08-19 10:34:48,368 INFO [ZooKeeperServer] REJOIN: Joining site 1:-1 known active sites 0:-1, 1:-1
2014-08-19 10:34:48,438 INFO [main] ZK-SERVER: binding to port /127.0.0.1:2182
2014-08-19 10:34:48,463 FATAL [main] HOST: Unable to list ports in use at this time.
2014-08-19 10:34:48,487 FATAL [main] HOST: ZooKeeper service unable to bind to port : 2182
2014-08-19 10:34:48,487 FATAL [main] HOST: java.net.BindException: Address already in use
2014-08-19 10:34:48,818 WARN [Volt Network - 1] HOST: Received remote hangup from foreign host dap220.devstandard.dsp/10.110.66.23:50860
2014-08-19 10:34:48,819 WARN [Volt Network - 1] NETWORK: Host 1 failed
2014-08-19 10:34:48,824 INFO [ZooKeeperServer] REJOIN: Agreement, Processing FaultMessage {failed: 1:-1, reporting: 0:-1, witnessed: true}
2014-08-19 10:34:48,894 INFO [ZooKeeperServer] REJOIN: Agreement, Sending survivors SITE_FAILURE_UPDATE from site: 0:-1 survivors: [0:-1] failed: [1:-1]
2014-08-19 10:34:48,905 INFO [ZooKeeperServer] REJOIN: Agreement, Adding 1:-1 to failed sites history
2014-08-19 10:34:48,905 INFO [ZooKeeperServer] REJOIN: Agreement, handling site faults for newly failed sites 1:-1 initiatorSafeInitPoints {1:-1->-1}
2014-08-19 10:34:48,909 WARN [ZooKeeperServer] NETWORK: Host 1 failed
[...SNIP]
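
When the full log is long, it can help to pull out just the FATAL entries, as was done by eye above. The following Python sketch assumes every line follows the `timestamp LEVEL [thread] LOGGER: message` layout seen in this excerpt; `fatal_lines` and the regular expression are illustrative, not part of VoltDB.

```python
import re

# Layout assumed from the excerpt above:
#   2014-08-19 10:34:48,487 FATAL [main] HOST: message text
LOG_LINE = re.compile(r"^(\S+ \S+) (\w+) \[([^\]]+)\] ([\w-]+): (.*)$")


def fatal_lines(log_text: str) -> list[tuple[str, str, str]]:
    """Return (timestamp, logger, message) for each FATAL line."""
    found = []
    for line in log_text.splitlines():
        match = LOG_LINE.match(line)
        if match and match.group(2) == "FATAL":
            found.append((match.group(1), match.group(4), match.group(5)))
    return found


if __name__ == "__main__":
    # Three lines copied from the log in this post.
    sample = (
        "2014-08-19 10:34:48,438 INFO [main] ZK-SERVER: binding to port /127.0.0.1:2182\n"
        "2014-08-19 10:34:48,487 FATAL [main] HOST: ZooKeeper service unable to bind to port : 2182\n"
        "2014-08-19 10:34:48,487 FATAL [main] HOST: java.net.BindException: Address already in use\n"
    )
    for timestamp, logger, message in fatal_lines(sample):
        print(timestamp, logger, message)
```

Lines that do not match the assumed layout, such as the startup banner or stack traces, are skipped without error.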
John T Crawford
Aug 19, 2014
Hi Mat,

This sounds like a support request. Could you please email this information to support@voltdb.com? Then one of our engineers will get back to you quickly. Thanks!
- John Crawford, VoltDB QA Automation & Support Engineer
mathewbutler
Aug 20, 2014
Thanks John. Now raised with Support.