Forum: Other

Post: [NETWORK] - VoltPort died due to an unexpected exception

radek1st
Jun 14, 2012
Hi,
I've been running a VoltDB CE server and a client (async calls to stored procedures) on one machine for two days. After around 200K inserts, selects, and deletes, the connection on the client side died; see the trace below. Not surprisingly, repeated NoConnectionsException errors followed. I have two questions:

1) Has anyone seen this error before or could offer a solution?

2) What is the recommended strategy for handling NoConnectionsException in the client code? For example, would you do something like this:

public void callX(ProcedureCallback callback, String abc) {
    try {
        client.callProcedure(callback, STORED_PROCEDURE_X, abc);
    } catch (NoConnectionsException e) {
        e.printStackTrace();
        // create new connection
        client.createConnection(hostName);
        // retry the same call
        callX(callback, abc);
    } catch (IOException e) {
        e.printStackTrace();
    }
}


And the stacktrace:

2012-06-14 13:19:32,026 ERROR (Volt Network - 0) [NETWORK] - VoltPort died due to an unexpected exception
java.lang.ArrayIndexOutOfBoundsException: -2
at org.voltdb.LatencyBucketSet.update(LatencyBucketSet.java:45)
at org.voltdb.client.ClientStats.update(ClientStats.java:193)
at org.voltdb.client.Distributer$NodeConnection.updateStats(Distributer.java:249)
at org.voltdb.client.Distributer$NodeConnection.handleMessage(Distributer.java:297)
at org.voltcore.network.VoltPort.run(VoltPort.java:189)
at org.voltcore.network.VoltNetwork.callPort(VoltNetwork.java:391)
at org.voltcore.network.VoltNetwork.invokeCallbacks(VoltNetwork.java:419)
at org.voltcore.network.VoltNetwork.run(VoltNetwork.java:309)
at java.lang.Thread.run(Thread.java:722)
java.nio.channels.CancelledKeyException
at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:73)
at sun.nio.ch.SelectionKeyImpl.readyOps(SelectionKeyImpl.java:87)
at org.voltcore.network.VoltPort.lockForHandlingWork(VoltPort.java:169)
at org.voltcore.network.VoltNetwork.callPort(VoltNetwork.java:389)
at org.voltcore.network.VoltNetwork.access$300(VoltNetwork.java:85)
at org.voltcore.network.VoltNetwork$3.run(VoltNetwork.java:276)
at org.voltcore.network.VoltNetwork.run(VoltNetwork.java:305)
at java.lang.Thread.run(Thread.java:722)
org.voltdb.client.NoConnectionsException: No connections.
at org.voltdb.client.Distributer.queue(Distributer.java:503)
at org.voltdb.client.ClientImpl.private_callProcedure(ClientImpl.java:308)
at org.voltdb.client.ClientImpl.callProcedure(ClientImpl.java:289)
at org.voltdb.client.ClientImpl.callProcedure(ClientImpl.java:238)
....
Client errors
rbetts
Jun 14, 2012
The ArrayIndexOutOfBoundsException is a regression in some latency statistics code that was recently changed. Possibly a clock moved backward somewhere and caused a negative latency value to be calculated. I filed https://issues.voltdb.com/browse/ENG-3200 to fix this; our next iteration release should contain the fix. Thank you for reporting this.
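
To illustrate how a negative latency can turn into a negative array index, here is a minimal sketch. It is not VoltDB's LatencyBucketSet; the class LatencyHistogram, its methods, and the bucket layout are all hypothetical. It assumes a histogram whose bucket index is simply the latency divided by the bucket width, and shows one possible guard (clamping) against a clock that stepped backward.

```java
/** Hypothetical fixed-width latency histogram; NOT VoltDB's LatencyBucketSet. */
final class LatencyHistogram {
    private final long[] buckets;
    private final int msPerBucket;

    LatencyHistogram(int numBuckets, int msPerBucket) {
        this.buckets = new long[numBuckets];
        this.msPerBucket = msPerBucket;
    }

    /** Naive bucket index; goes negative once latencyMs <= -msPerBucket. */
    int rawIndex(long latencyMs) {
        return (int) (latencyMs / msPerBucket);
    }

    /** Records a sample, clamping the index into [0, numBuckets - 1]. */
    void record(long latencyMs) {
        // A clock that stepped backward yields a negative latency: treat it as 0.
        int index = rawIndex(Math.max(0L, latencyMs));
        // Very large latencies all land in the last bucket.
        buckets[Math.min(index, buckets.length - 1)]++;
    }

    /** Returns the number of samples recorded in the given bucket. */
    long count(int bucket) {
        return buckets[bucket];
    }
}
```

With 1 ms buckets, a clock that moved back by 142 ms gives a raw index of -142, which would throw if used directly on the array; record() counts that sample in bucket 0 instead.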

You can retry your connection that way, or you can handle connection errors to the DB a little more canonically by registering a ClientStatusListenerExt that implements connectionLost() in the ClientConfig passed to the Client factory.

The examples/voter/src/voter/AsyncBenchmark.java client demonstrates implementing and registering the listener.

Ryan.
Thanks Ryan, Makes sense. I
radek1st
Jun 14, 2012
Thanks Ryan,

Makes sense. I can see this error every so often (running on Amazon EC2):

ERROR: Initiator time moved backwards from: 1339595676230 to 1339595676088, a difference of 0.14 seconds.
Initiator time moved backwards from: 1339595676230 to 1339595676088, a difference of 0.14 seconds.


At what point is the listener notified about the lost connection? Is it when there is an unsuccessful procedure call? I want to be certain that all of my client.callProcedure(callback, STORED_PROCEDURE_X, abc) calls complete successfully.

Cheers,
Radek
Listener notification
rbetts
Jun 14, 2012
The listener is notified when the connection is removed from the client library's network selector. There are multiple threads involved here, so the before/after ordering between them is a little fuzzy. Also, since you are using the callback-based client API, you can have many callbacks outstanding that can fail with ClientResponse.CONNECTION_LOST. You don't want to establish a new connection for each outstanding callback that reports no connection. Using the status listener will avoid that problem.
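
As a rough sketch of the "don't reconnect once per failed callback" point, the helper below lets only one thread at a time run a reconnect action and makes every other caller skip it. ReconnectGate and tryReconnect are hypothetical names, not part of the VoltDB client API, and the Runnable stands in for whatever your app does to reconnect (for example, something triggered from connectionLost()).

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical helper: lets only one caller at a time run the reconnect action. */
final class ReconnectGate {
    private final AtomicBoolean reconnecting = new AtomicBoolean(false);

    /**
     * Runs the action unless a reconnect is already in progress.
     *
     * @return true if this caller ran the action, false if it was skipped
     */
    boolean tryReconnect(Runnable reconnectAction) {
        // Only the caller that flips the flag from false to true proceeds.
        if (!reconnecting.compareAndSet(false, true)) {
            return false;
        }
        try {
            reconnectAction.run();
            return true;
        } finally {
            // Reopen the gate so a later connection loss can be handled.
            reconnecting.set(false);
        }
    }
}
```

Note that this only suppresses overlapping attempts. Callbacks that fail after the gate has reopened would still trigger another attempt, which is one reason to drive reconnection from the status listener rather than from each callback.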

I'm not quite sure what to advise about retrying; you could possibly enter a back-off retry loop, or push the rejected transaction to a retry queue somewhere. It depends on the structure of your app and on whether there are logical interactions between your different stored procedure invocations.
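
A back-off retry loop along those lines could look like the sketch below. It is plain Java with no VoltDB calls; BackoffRetry, its parameters, and the doubling policy are assumptions for illustration. The attempt is any action that returns true on success (for example, a reconnect followed by re-submitting the call), and the sleeper is passed in so the delay policy can be exercised without actually waiting.

```java
import java.util.function.BooleanSupplier;
import java.util.function.LongConsumer;

/** Hypothetical exponential back-off helper; not part of the VoltDB client. */
final class BackoffRetry {
    /**
     * Calls attempt until it returns true or maxAttempts is reached,
     * doubling the delay after each failure, capped at maxDelayMs.
     *
     * @return the 1-based number of the successful attempt, or -1 if all failed
     */
    static int run(BooleanSupplier attempt, int maxAttempts,
                   long initialDelayMs, long maxDelayMs, LongConsumer sleeper) {
        long delay = initialDelayMs;
        for (int i = 1; i <= maxAttempts; i++) {
            if (attempt.getAsBoolean()) {
                return i;
            }
            if (i < maxAttempts) {
                // Wait before the next try, then grow the delay up to the cap.
                sleeper.accept(delay);
                delay = Math.min(delay * 2, maxDelayMs);
            }
        }
        return -1;
    }

    /** Sleeper for real use; restores the interrupt flag if interrupted. */
    static void sleep(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

In an application you would pass BackoffRetry::sleep as the sleeper. Whether a blind retry is safe still depends on whether your procedure invocations are independent of one another, as noted above.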


Ryan.
Regarding the
rbetts
Jun 14, 2012
Regarding the time-moved-backwards...

Volt clusters care about time. Clock skew between servers and regressing clocks can affect transaction latency. There's some configuration help for EC2 available on Ariel's blog: http://www.afewmoreamps.com/2011/07/configuring-ntp-for-voltdb.html and in our documentation at: ntpSvcIntro

Ryan.
Big Thanks Ryan
radek1st
Jun 15, 2012
Big Thanks Ryan