Forum: Building VoltDB Applications

Post: java.io.InterruptedIOException: Interrupted while waiting for response

Edson Ramiro
Aug 19, 2014
Hi,

I am stress testing VoltDB using a new testing methodology that we are developing here in Luxembourg [1]. In our work, we propose a database state machine to represent the states of a DBMS when processing concurrent transactions. The state transitions are forced by increasing the concurrency of the testing workload.

The goal of this email is to report what we've found and also to be certain that our testing results can be trusted.

In our tests, we are using VoltDB 4.5 on four machines running Debian (Linux kernel 3.2.57, x86_64), each with 32 GB of RAM and an 8-core Intel Xeon.

After repeatedly calling the same stored procedure (which implements TPC-B; see the attached files), I got this error message: "Interrupted while waiting for response". It worked fine for 50 minutes, and then this error suddenly appeared.

Here is the stack trace:

java.io.InterruptedIOException: Interrupted while waiting for response
at org.voltdb.client.ClientImpl.callProcedure(ClientImpl.java:267)
at org.voltdb.client.ClientImpl.callProcedureWithTimeout(ClientImpl.java:224)
at org.voltdb.client.ClientImpl.callProcedure(ClientImpl.java:204)
at org.dart.dsm.workload.TPCBVolt.run(TPCBVolt.java:59)
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at java.util.concurrent.FutureTask.runAndReset(Unknown Source)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(Unknown Source)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)



I removed the logging statements from the attached files, so the line number in the stack trace doesn't match the attached file. This is line 59:

response = client.callProcedure("TPCBForVoltDB", bid, tid, aid, delta);

A similar issue was reported in this thread [2], but it received no answer.

1) I know that it's necessary to turn auto-commit off when using JDBC, but do I need to turn it off when I'm using a VoltDB client application? If so, how do I turn it off?
2) I think this issue is related to a timeout, but I'm not sure which timeout it could be.

[1] http://dx.doi.org/10.1145/2628194.2628201
[2] https://forum.voltdb.com/showthread.php?1185-SQL-problems-when-trying-to-execute-TPC-B
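On question 2: the exception type points to an interrupt rather than a timeout. java.io.InterruptedIOException signals that an I/O operation was interrupted, and a synchronous call layered on an asynchronous one typically blocks the calling thread until the response arrives. The sketch below is a hypothetical, simplified model of that pattern using only the JDK; it is not VoltDB's ClientImpl source, and PendingResponse, await and the "TIMEOUT" marker are invented for illustration. It shows that interrupting the waiting thread and letting the wait expire are two distinct outcomes.

```java
import java.io.InterruptedIOException;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical, simplified model of a synchronous call layered on an asynchronous one.
// NOT VoltDB's ClientImpl: it only shows that an interrupt and an expired wait differ.
final class SyncCallSketch {

    // Completed by a (simulated) network thread when the response arrives.
    static final class PendingResponse {
        private final CountDownLatch done = new CountDownLatch(1);
        private final AtomicReference<String> value = new AtomicReference<>();

        void complete(String response) {
            value.set(response);
            done.countDown();
        }

        // Blocks until the response arrives, the wait expires, or the thread is interrupted.
        String await(long timeoutMs) throws InterruptedIOException {
            try {
                if (!done.await(timeoutMs, TimeUnit.MILLISECONDS)) {
                    return "TIMEOUT"; // hypothetical marker for an expired wait
                }
                return value.get();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // keep the interrupt status for the caller
                throw new InterruptedIOException("Interrupted while waiting for response");
            }
        }
    }

    // A caller waits for a response that never comes and is then interrupted.
    static String interruptWaitingCaller() {
        PendingResponse pending = new PendingResponse();
        AtomicReference<String> observed = new AtomicReference<>("nothing");
        Thread caller = new Thread(() -> {
            try {
                observed.set(pending.await(60_000));
            } catch (InterruptedIOException e) {
                observed.set("InterruptedIOException: " + e.getMessage());
            }
        });
        caller.start();
        caller.interrupt(); // e.g. an executor being stopped with shutdownNow()
        try {
            caller.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return observed.get();
    }

    // No response arrives and nobody interrupts: the wait simply expires.
    static String waitUntilExpired() {
        try {
            return new PendingResponse().await(50);
        } catch (InterruptedIOException e) {
            return "InterruptedIOException";
        }
    }

    // A response that has already arrived is returned normally.
    static String completedCall() {
        PendingResponse pending = new PendingResponse();
        pending.complete("SUCCESS");
        try {
            return pending.await(50);
        } catch (InterruptedIOException e) {
            return "InterruptedIOException";
        }
    }
}
```

In this model, a timeout would have to be configured and would surface differently; the "Interrupted while waiting for response" path is only reached when something interrupts the thread that is blocked waiting.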
bballard
Sep 2, 2014
Hi Edson,

It looks like your Test.java class creates a new client object, connects to one node of VoltDB, calls a single procedure using a synchronous call, then closes the connection.

VoltDB Java "Client" instances are thread-safe and can be shared by many threads. Normally one client instance is created, connections are made to each server in the cluster, and this instance is then shared by all of the application threads. For example, see [SyncBenchmark.java](https://github.com/VoltDB/voltdb/blob/master/examples/voltkv/src/voltkv/SyncBenchmark.java), which uses an inner class, KVThread, to generate work; KVThread can access the client instance that was created in the SyncBenchmark class. I've seen some cases, with heavy payloads or when testing extreme performance, where a single client instance became a bottleneck and a pool of 5-10 client instances allowed higher throughput. But again, those instances are reused rather than closed after one invocation, and the pool is small.
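The sharing pattern described above can be sketched as follows. To keep it runnable without a cluster, the VoltDB Client is replaced by a hypothetical ProcedureClient interface and a fake CountingClient. In a real application the single instance would come from ClientFactory.createClient() and be connected to every server with createConnection(host), and the real callProcedure returns a ClientResponse and throws checked exceptions. The point here is only the lifecycle: create once, share across threads, close once.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for org.voltdb.client.Client so the sketch runs without a cluster.
// The real callProcedure returns a ClientResponse and throws checked exceptions.
interface ProcedureClient {
    String callProcedure(String procName, Object... params);
    void close();
}

// Fake thread-safe client that only counts invocations and closes.
final class CountingClient implements ProcedureClient {
    final AtomicInteger calls = new AtomicInteger();
    final AtomicInteger closes = new AtomicInteger();

    @Override
    public String callProcedure(String procName, Object... params) {
        calls.incrementAndGet();
        return "OK";
    }

    @Override
    public void close() {
        closes.incrementAndGet();
    }
}

final class SharedClientSketch {
    // One client created up front, shared by every worker thread, closed once at the end.
    static CountingClient run(int threads, int callsPerThread) {
        // Real application (assumption): ClientFactory.createClient(), then
        // createConnection(host) for each server in the cluster.
        CountingClient client = new CountingClient();
        ExecutorService workers = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            workers.execute(() -> {
                for (int i = 0; i < callsPerThread; i++) {
                    // Hypothetical (bid, tid, aid, delta) values for the TPC-B procedure.
                    client.callProcedure("TPCBForVoltDB", 1, 2, 3, 10);
                }
            });
        }
        workers.shutdown(); // accept no new work; let submitted tasks finish
        try {
            workers.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        client.close(); // closed once, only after all workers are done
        return client;
    }
}
```

With 4 threads making 25 calls each, the single shared client handles all 100 calls and is closed exactly once, in contrast to creating, connecting and closing a client around every invocation.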

However, it looks like you're actually trying to test how many connections you can make. Can you confirm that this (maximum connections) is the purpose of your test, and that you're not trying to get maximum throughput? Is this an arbitrary test? Again, even when supporting large numbers of concurrent users, the application doesn't need to use many clients. I saw your other question about the MAX_CONNECTIONS limit, and I'm thinking maybe you're just trying to test up to that limit without regard for whether so many connections would ever be necessary in a practical situation.

Do you have an idea of how many threads were running when you encountered this exception, i.e. how many client instances?

How do you stop your benchmark? Is it possible there is a system exit prior to the completion of all the threads running on your ScheduledThreadPoolExecutor? Or perhaps your client application hit some limit which caused the interrupt.
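One concrete way for a thread blocked in a synchronous call to be interrupted is for its executor to be stopped with shutdownNow(), which interrupts running workers, whereas shutdown() lets the in-flight task finish. The sketch below assumes a ScheduledExecutorService driving the calls periodically, as the ScheduledThreadPoolExecutor frames in the stack trace suggest, and replaces the real procedure call with a hypothetical fakeSyncCall that sleeps while "waiting for the response".

```java
import java.io.InterruptedIOException;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class ShutdownSketch {
    // Hypothetical stand-in for a synchronous procedure call: blocks while "waiting for the response".
    static void fakeSyncCall(long waitMs) throws InterruptedIOException {
        try {
            Thread.sleep(waitMs);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new InterruptedIOException("Interrupted while waiting for response");
        }
    }

    // Starts a fixed-rate task, stops the executor once the first call is in flight,
    // and returns how many calls ended with InterruptedIOException.
    static int run(boolean abrupt) {
        AtomicInteger interruptedCalls = new AtomicInteger();
        CountDownLatch firstCallStarted = new CountDownLatch(1);
        ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(1);
        scheduler.scheduleAtFixedRate(() -> {
            firstCallStarted.countDown();
            try {
                fakeSyncCall(500);
            } catch (InterruptedIOException e) {
                interruptedCalls.incrementAndGet();
            }
        }, 0, 1, TimeUnit.SECONDS);
        try {
            firstCallStarted.await();
            if (abrupt) {
                scheduler.shutdownNow(); // interrupts the worker blocked in the call
            } else {
                scheduler.shutdown(); // in-flight call completes; no further periodic runs
            }
            scheduler.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return interruptedCalls.get();
    }
}
```

run(true) reports one interrupted call and run(false) reports none, which is why an orderly shutdown at the end of a benchmark should not produce this exception while an abrupt one can.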

Please explain more about how you set up this benchmark; it will help us figure out whether you are hitting some sort of bug or whether this is expected behavior in an extreme scenario.

Thanks,
Ben
Edson Ramiro
Sep 2, 2014
Hi bballard,

Actually, you are right. Our methodology explores extreme scenarios in terms of both throughput and the connection queue.

In this way, we're trying to reach all of VoltDB's limits. That's why, in my other topic, I asked whether the check comparing MAX_CONNECTIONS with m_numConnections.get() should use >= instead of ==.
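On the >= versus == question, here is a generic illustration. It is a hypothetical connection-limit guard, not VoltDB's code (which isn't shown in this thread): if a check-then-increment race ever lets the counter overshoot the limit, an == test never matches again, while >= still rejects. Reserving the slot atomically with compareAndSet avoids the overshoot altogether. MAX_CONNECTIONS = 3 is an arbitrary demo value.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical connection-limit guard for illustration only; NOT VoltDB's implementation.
final class ConnectionLimitSketch {
    static final int MAX_CONNECTIONS = 3; // arbitrary demo value

    // Fragile: once the counter has overshot the limit, '==' never matches again.
    static boolean rejectWithEquals(AtomicInteger open) {
        return open.get() == MAX_CONNECTIONS;
    }

    // Robust: any count at or above the limit is rejected.
    static boolean rejectWithAtLeast(AtomicInteger open) {
        return open.get() >= MAX_CONNECTIONS;
    }

    // Reserves a slot atomically, so concurrent callers can never push the count past the limit.
    static boolean tryAcquire(AtomicInteger open) {
        while (true) {
            int current = open.get();
            if (current >= MAX_CONNECTIONS) {
                return false;
            }
            if (open.compareAndSet(current, current + 1)) {
                return true;
            }
        }
    }

    // Releases a previously acquired slot.
    static void release(AtomicInteger open) {
        open.decrementAndGet();
    }
}
```

With a counter that has already overshot to MAX_CONNECTIONS + 1, rejectWithEquals lets the next connection through and rejectWithAtLeast refuses it; tryAcquire never lets the count exceed the limit in the first place.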

>> Do you have an idea of how many threads were running when you encountered this exception, i.e. how many client instances?
We are running VoltDB on a cluster of four machines (32 GB of RAM and 8 cores each) with VoltDB's default settings.

>> How do you stop your benchmark? Is it possible there is a system exit prior to the completion of all the threads running on your ScheduledThreadPoolExecutor? Or perhaps your client application hit some limit which caused the interrupt.
It's possible that my application exited. I'm using the HPC cluster of the University of Luxembourg, and the job scheduler (OAR) may have killed my job in the middle of the execution. I'm working on this right now so that I can scale up the test.

>> Please explain more about how you set up this benchmark, it will help us figure out if you are hitting some sort of bug or if this is expected behavior in an extreme scenario.
In short, we have 10 machines submitting transactions at a fixed rate, once every second. The number of transactions increases every five minutes, from 50 tps up to 80,000 tps. We plan to reach 1 million tps, but I have to scale the test and solve these issues first.
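The ramp described above can be turned into per-machine numbers with a little arithmetic. In this sketch the linear step size (stepTps) and the latency figure are assumptions, since the post gives only the start (50 tps), the end (80,000 tps), the five-minute step interval and the 10 load-generating machines. requiredConcurrency applies Little's law (requests in flight = throughput × latency).

```java
// Hypothetical ramp helpers for a stepped load test; the step size and latency are assumptions.
final class RampSketch {
    // Target cluster-wide tps after elapsedSec, stepping up every stepSec and capped at maxTps.
    static long targetTps(long elapsedSec, long stepSec, long startTps, long stepTps, long maxTps) {
        long steps = elapsedSec / stepSec;
        return Math.min(maxTps, startTps + steps * stepTps);
    }

    // Share of the cluster-wide rate each load generator submits per 1-second tick (rounded up).
    static long perMachinePerTick(long clusterTps, int machines) {
        return (clusterTps + machines - 1) / machines;
    }

    // Little's law: outstanding requests needed = throughput (per second) x latency (seconds).
    static double requiredConcurrency(double tps, double latencyMs) {
        return tps * latencyMs / 1000.0;
    }
}
```

For example, 80,000 tps spread over 10 machines is 8,000 tps per machine; assuming, purely for illustration, a 2 ms round trip, that is about 16 calls in flight per machine. Note also that with scheduleAtFixedRate, if one execution takes longer than its period, later executions start late rather than overlapping, so a tick that blocks on synchronous calls can silently lower the submitted rate.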

I will keep you up to date on my experiments, and I hope to test VoltDB under extreme conditions.

Please let me know if you need any additional information.

Thank you very much for your help,
bballard
Sep 2, 2014
Hi Edson,

Thanks, we're interested in your results. Please ask us to review your tests further if you get lower than expected performance. We're happy to take the conversation private if you prefer.

To hit 1M TPS, you should only need a few Client instances. If you instead want to create the maximum number of Client instances and connections, I think this will require a lot more client machines than would otherwise be needed to hit 1M TPS.

Is there some reason why the connections should not be reused or shared?

For maximum concurrency and throughput, we've seen tests showing that one Client instance per server was not as fast as a pool of several instances, or simply running several instances of the client benchmark application. Beyond that, you'd just be incurring more overhead without increasing throughput.
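A small pool of long-lived clients, as suggested above, can be as simple as a fixed list handed out round-robin. PooledProcedureClient and ClientPool are hypothetical names standing in for VoltDB's Client (the real callProcedure signatures differ); each pooled instance would be created once, connected to every server, and closed only when the benchmark ends.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical stand-in for a VoltDB Client; real calls and signatures differ.
interface PooledProcedureClient {
    String callProcedure(String procName, Object... params);
    void close();
}

// Small fixed pool of long-lived clients, handed out round-robin and closed only at shutdown.
final class ClientPool {
    private final List<PooledProcedureClient> clients = new ArrayList<>();
    private final AtomicInteger next = new AtomicInteger();

    ClientPool(int size, Supplier<PooledProcedureClient> factory) {
        for (int i = 0; i < size; i++) {
            clients.add(factory.get()); // real app (assumption): create client, connect to every server
        }
    }

    // Thread-safe round-robin selection; floorMod keeps the index valid after the counter wraps.
    PooledProcedureClient get() {
        return clients.get(Math.floorMod(next.getAndIncrement(), clients.size()));
    }

    // Called once, when the benchmark is finished.
    void closeAll() {
        for (PooledProcedureClient c : clients) {
            c.close();
        }
    }
}
```

A pool of 5-10 instances built this way spreads calls evenly; beyond that, as noted above, extra instances mostly add overhead.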

thanks,
Ben