"Voltport died", seeing 'java.nio.channels.CancelledKeyException' and 'java.nio.BufferOverflowException'
Jul 18, 2013
Folks,
I'm seeing 2 seperate errors when I try and run more than about 40K TPS on a 3 node cluster of 16 core IBM boxes. Each box is:
Linux f3sn05 2.6.32-279.el6.x86_64 #1 SMP Wed Jun 13 18:24:36 EDT 2012 x86_64 x86_64 x86_64 GNU/Linux
The first thing we see is a sudden application latency spike we we try to drain a connection, followed by nio.BufferOverflowException:
3079 [Volt Network - 0] INFO StatsUnloader11 - StatsUnloader:11: Opening stats file /home/voltdb/voltRuntime/SWBQGenerated/stats/volt_11_20130718_0453.stat
3483 [VoltSpace11] WARN Partition11 - Call to drain() took 3331ms
[voltdb@f3sn03 voltLocal]$ java.nio.BufferOverflowException
at java.nio.HeapByteBuffer.put(HeapByteBuffer.java:182)
at org.voltdb.VoltTable.flattenToBuffer(VoltTable.java:1582)
at org.voltdb.ClientResponseImpl.flattenToBuffer(ClientResponseImpl.java:279)
at org.voltdb.messaging.InitiateResponseMessage.flattenToBuffer(InitiateResponseMessage.java:210)
at org.voltcore.messaging.ForeignHost$1.serialize(ForeignHost.java:202)
at org.voltcore.network.NIOWriteStream.swapAndSerializeQueuedWrites(NIOWriteStream.java:356)
at org.voltcore.network.VoltPort.drainWriteStream(VoltPort.java:233)
at org.voltcore.network.VoltPort.run(VoltPort.java:194)
at org.voltcore.network.VoltNetwork.callPort(VoltNetwork.java:385)
at org.voltcore.network.VoltNetwork.invokeCallbacks(VoltNetwork.java:413)
at org.voltcore.network.VoltNetwork.run(VoltNetwork.java:292)
at java.lang.Thread.run(Thread.java:662)
ERROR: VoltPort died due to an unexpected exception
java.nio.BufferOverflowException
at java.nio.HeapByteBuffer.put(HeapByteBuffer.java:182)
at org.voltdb.VoltTable.flattenToBuffer(VoltTable.java:1582)
at org.voltdb.ClientResponseImpl.flattenToBuffer(ClientResponseImpl.java:279)
at org.voltdb.messaging.InitiateResponseMessage.flattenToBuffer(InitiateResponseMessage.java:210)
at org.voltcore.messaging.ForeignHost$1.serialize(ForeignHost.java:202)
at org.voltcore.network.NIOWriteStream.swapAndSerializeQueuedWrites(NIOWriteStream.java:356)
at org.voltcore.network.VoltPort.drainWriteStream(VoltPort.java:233)
at org.voltcore.network.VoltPort.run(VoltPort.java:194)
at org.voltcore.network.VoltNetwork.callPort(VoltNetwork.java:385)
at org.voltcore.network.VoltNetwork.invokeCallbacks(VoltNetwork.java:413)
at org.voltcore.network.VoltNetwork.run(VoltNetwork.java:292)
at java.lang.Thread.run(Thread.java:662)
Things then get a lot worse:
WARN: Attempted delivery of message to failed site: 1:-1
WARN: Attempted delivery of message to failed site: 1:-1
WARN: Attempted delivery of message to failed site: 1:-1
WARN: Attempted delivery of message to failed site: 1:-1
WARN: Received remote hangup from foreign host 129.40.80.12
WARN: Attempted delivery of message to failed site: 1:8
WARN: Host 1 failed
java.nio.BufferOverflowException
at java.nio.HeapByteBuffer.put(HeapByteBuffer.java:182)
at org.voltdb.VoltTable.flattenToBuffer(VoltTable.java:1582)
at org.voltdb.ClientResponseImpl.flattenToBuffer(ClientResponseImpl.java:279)
at org.voltdb.messaging.InitiateResponseMessage.flattenToBuffer(InitiateResponseMessage.java:210)
at org.voltcore.messaging.ForeignHost$1.serialize(ForeignHost.java:202)
at org.voltcore.network.NIOWriteStream.swapAndSerializeQueuedWrites(NIOWriteStream.java:356)
at org.voltcore.network.VoltPort.drainWriteStream(VoltPort.java:233)
at org.voltcore.network.VoltPort.run(VoltPort.java:194)
at org.voltcore.network.VoltNetwork.callPort(VoltNetwork.java:385)
at org.voltcore.network.VoltNetwork.invokeCallbacks(VoltNetwork.java:413)
at org.voltcore.network.VoltNetwork.run(VoltNetwork.java:292)
at java.lang.Thread.run(Thread.java:662)
The second error we see is different but also related to networking. It may or may not happen at the same time as the first one:
WARN: Attempted delivery of message to failed site: 2:-1
WARN: Attempted delivery of message to failed site: 0:7
WARN: Attempted delivery of message to failed site: 2:7
WARN: Attempted delivery of message to failed site: 0:4
WARN: Attempted delivery of message to failed site: 2:4
WARN: Attempted delivery of message to failed site: 0:7
WARN: Attempted delivery of message to failed site: 2:7
WARN: Attempted delivery of message to failed site: 0:10
WARN: Attempted delivery of message to failed site: 2:10
WARN: Attempted delivery of message to failed site: 0:7
WARN: Attempted delivery of message to failed site: 2:7
WARN: Attempted delivery of message to failed site: 0:10
WARN: Attempted delivery of message to failed site: 2:10
WARN: Attempted delivery of message to failed site: 0:10
WARN: Attempted delivery of message to failed site: 2:10
WARN: Attempted delivery of message to failed site: 0:-1
WARN: Attempted delivery of message to failed site: 2:-1
WARN: Received remote hangup from foreign host 129.40.80.13
WARN: Received remote hangup from foreign host 129.40.80.15
WARN: Host 2 failed
WARN: Host 0 failed
java.nio.channels.CancelledKeyException
at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:55)
at sun.nio.ch.SelectionKeyImpl.readyOps(SelectionKeyImpl.java:69)
at org.voltcore.network.VoltPort.lockForHandlingWork(VoltPort.java:164)
at org.voltcore.network.VoltNetwork.callPort(VoltNetwork.java:383)
at org.voltcore.network.VoltNetwork.access$200(VoltNetwork.java:85)
at org.voltcore.network.VoltNetwork$3.run(VoltNetwork.java:252)
at org.voltcore.network.VoltNetwork.run(VoltNetwork.java:301)
at java.lang.Thread.run(Thread.java:662)
FATAL: Partition detection snapshot completed. Shutting down.
VoltDB has encountered an unrecoverable error and is exiting.
The log may contain additional information.
Now, my suspicion is that my network is a bit iffy, but what should I do to prove that?
Note that AFAIK we're not doing anything unusual at the VoltDB level other than more work. Memory is fine and CPU is around 12% when we see this.
David Rolfe
I'm seeing 2 seperate errors when I try and run more than about 40K TPS on a 3 node cluster of 16 core IBM boxes. Each box is:
Linux f3sn05 2.6.32-279.el6.x86_64 #1 SMP Wed Jun 13 18:24:36 EDT 2012 x86_64 x86_64 x86_64 GNU/Linux
The first thing we see is a sudden application latency spike we we try to drain a connection, followed by nio.BufferOverflowException:
3079 [Volt Network - 0] INFO StatsUnloader11 - StatsUnloader:11: Opening stats file /home/voltdb/voltRuntime/SWBQGenerated/stats/volt_11_20130718_0453.stat
3483 [VoltSpace11] WARN Partition11 - Call to drain() took 3331ms
[voltdb@f3sn03 voltLocal]$ java.nio.BufferOverflowException
at java.nio.HeapByteBuffer.put(HeapByteBuffer.java:182)
at org.voltdb.VoltTable.flattenToBuffer(VoltTable.java:1582)
at org.voltdb.ClientResponseImpl.flattenToBuffer(ClientResponseImpl.java:279)
at org.voltdb.messaging.InitiateResponseMessage.flattenToBuffer(InitiateResponseMessage.java:210)
at org.voltcore.messaging.ForeignHost$1.serialize(ForeignHost.java:202)
at org.voltcore.network.NIOWriteStream.swapAndSerializeQueuedWrites(NIOWriteStream.java:356)
at org.voltcore.network.VoltPort.drainWriteStream(VoltPort.java:233)
at org.voltcore.network.VoltPort.run(VoltPort.java:194)
at org.voltcore.network.VoltNetwork.callPort(VoltNetwork.java:385)
at org.voltcore.network.VoltNetwork.invokeCallbacks(VoltNetwork.java:413)
at org.voltcore.network.VoltNetwork.run(VoltNetwork.java:292)
at java.lang.Thread.run(Thread.java:662)
ERROR: VoltPort died due to an unexpected exception
java.nio.BufferOverflowException
at java.nio.HeapByteBuffer.put(HeapByteBuffer.java:182)
at org.voltdb.VoltTable.flattenToBuffer(VoltTable.java:1582)
at org.voltdb.ClientResponseImpl.flattenToBuffer(ClientResponseImpl.java:279)
at org.voltdb.messaging.InitiateResponseMessage.flattenToBuffer(InitiateResponseMessage.java:210)
at org.voltcore.messaging.ForeignHost$1.serialize(ForeignHost.java:202)
at org.voltcore.network.NIOWriteStream.swapAndSerializeQueuedWrites(NIOWriteStream.java:356)
at org.voltcore.network.VoltPort.drainWriteStream(VoltPort.java:233)
at org.voltcore.network.VoltPort.run(VoltPort.java:194)
at org.voltcore.network.VoltNetwork.callPort(VoltNetwork.java:385)
at org.voltcore.network.VoltNetwork.invokeCallbacks(VoltNetwork.java:413)
at org.voltcore.network.VoltNetwork.run(VoltNetwork.java:292)
at java.lang.Thread.run(Thread.java:662)
Things then get a lot worse:
WARN: Attempted delivery of message to failed site: 1:-1
WARN: Attempted delivery of message to failed site: 1:-1
WARN: Attempted delivery of message to failed site: 1:-1
WARN: Attempted delivery of message to failed site: 1:-1
WARN: Received remote hangup from foreign host 129.40.80.12
WARN: Attempted delivery of message to failed site: 1:8
WARN: Host 1 failed
java.nio.BufferOverflowException
at java.nio.HeapByteBuffer.put(HeapByteBuffer.java:182)
at org.voltdb.VoltTable.flattenToBuffer(VoltTable.java:1582)
at org.voltdb.ClientResponseImpl.flattenToBuffer(ClientResponseImpl.java:279)
at org.voltdb.messaging.InitiateResponseMessage.flattenToBuffer(InitiateResponseMessage.java:210)
at org.voltcore.messaging.ForeignHost$1.serialize(ForeignHost.java:202)
at org.voltcore.network.NIOWriteStream.swapAndSerializeQueuedWrites(NIOWriteStream.java:356)
at org.voltcore.network.VoltPort.drainWriteStream(VoltPort.java:233)
at org.voltcore.network.VoltPort.run(VoltPort.java:194)
at org.voltcore.network.VoltNetwork.callPort(VoltNetwork.java:385)
at org.voltcore.network.VoltNetwork.invokeCallbacks(VoltNetwork.java:413)
at org.voltcore.network.VoltNetwork.run(VoltNetwork.java:292)
at java.lang.Thread.run(Thread.java:662)
The second error we see is different but also related to networking. It may or may not happen at the same time as the first one:
WARN: Attempted delivery of message to failed site: 2:-1
WARN: Attempted delivery of message to failed site: 0:7
WARN: Attempted delivery of message to failed site: 2:7
WARN: Attempted delivery of message to failed site: 0:4
WARN: Attempted delivery of message to failed site: 2:4
WARN: Attempted delivery of message to failed site: 0:7
WARN: Attempted delivery of message to failed site: 2:7
WARN: Attempted delivery of message to failed site: 0:10
WARN: Attempted delivery of message to failed site: 2:10
WARN: Attempted delivery of message to failed site: 0:7
WARN: Attempted delivery of message to failed site: 2:7
WARN: Attempted delivery of message to failed site: 0:10
WARN: Attempted delivery of message to failed site: 2:10
WARN: Attempted delivery of message to failed site: 0:10
WARN: Attempted delivery of message to failed site: 2:10
WARN: Attempted delivery of message to failed site: 0:-1
WARN: Attempted delivery of message to failed site: 2:-1
WARN: Received remote hangup from foreign host 129.40.80.13
WARN: Received remote hangup from foreign host 129.40.80.15
WARN: Host 2 failed
WARN: Host 0 failed
java.nio.channels.CancelledKeyException
at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:55)
at sun.nio.ch.SelectionKeyImpl.readyOps(SelectionKeyImpl.java:69)
at org.voltcore.network.VoltPort.lockForHandlingWork(VoltPort.java:164)
at org.voltcore.network.VoltNetwork.callPort(VoltNetwork.java:383)
at org.voltcore.network.VoltNetwork.access$200(VoltNetwork.java:85)
at org.voltcore.network.VoltNetwork$3.run(VoltNetwork.java:252)
at org.voltcore.network.VoltNetwork.run(VoltNetwork.java:301)
at java.lang.Thread.run(Thread.java:662)
FATAL: Partition detection snapshot completed. Shutting down.
VoltDB has encountered an unrecoverable error and is exiting.
The log may contain additional information.
Now, my suspicion is that my network is a bit iffy, but what should I do to prove that?
Note that AFAIK we're not doing anything unusual at the VoltDB level other than more work. Memory is fine and CPU is around 12% when we see this.
David Rolfe