Forum: Building VoltDB Clients

Post: Merging client and DB?

Merging client and DB?
monster
Sep 25, 2010
When I first read that all the data is kept in memory in the JVM, and that all access goes through stored procedures, the first thing I asked myself was: could I just put a "lightweight" web application inside the VoltDB VM? I would completely bypass the marshaling/unmarshaling and socket I/O, leading to even greater performance. I could just have one huge VM using all the hardware memory on each machine (plus some HTTP load balancer and CDN).
If the client code were divided into "modules", and those were stored as "dummy" stored procedures, I could still update my client code at runtime (or the webserver could define its own dynamic class loader). Behind a load balancer, the webserver would only need a single socket connection to it, instead of one socket per client, saving resources.
If every request has a fixed upper bound on its memory footprint, and there is also an upper bound on the number of pending requests, then the amount of memory used by the client is bounded, so it should not interfere with VoltDB. Since the communication between the webserver and the DB would be basically instantaneous, the duration of a request would go down, and therefore the memory required by the webserver would also go down, since there would be fewer requests pending. This would also reduce the management work by eliminating the client VM altogether, along with its associated management.
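To make the memory argument concrete, here is a minimal sketch of a request gate that caps the number of pending requests, so the worst case is simply the cap times the per-request limit. The class name `BoundedRequestGate` and the numbers in `main` are made up for illustration; nothing here is VoltDB API.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

/** Hypothetical gate: at most maxPending requests are in flight at once. */
final class BoundedRequestGate {
    private final Semaphore permits;
    private final int maxPending;
    private final long maxBytesPerRequest;

    BoundedRequestGate(int maxPending, long maxBytesPerRequest) {
        this.permits = new Semaphore(maxPending);
        this.maxPending = maxPending;
        this.maxBytesPerRequest = maxBytesPerRequest;
    }

    /** Upper bound on memory held by pending requests, in bytes. */
    long worstCaseBytes() {
        return maxPending * maxBytesPerRequest;
    }

    /** Runs the handler if a slot is free; otherwise returns the rejection value at once. */
    <T> T handle(Supplier<T> handler, T rejected) {
        if (!permits.tryAcquire()) {
            return rejected; // shed load instead of queueing unbounded work
        }
        try {
            return handler.get();
        } finally {
            permits.release(); // free the slot even if the handler throws
        }
    }

    public static void main(String[] args) {
        // Assumed limits: 200 pending requests, 64 KiB each.
        BoundedRequestGate gate = new BoundedRequestGate(200, 64 * 1024);
        System.out.println("worst case bytes: " + gate.worstCaseBytes());
        System.out.println(gate.handle(() -> "200 OK", "503 Busy"));
    }
}
```

The bound only holds if each handler really stays under its per-request limit; the gate itself only enforces the count.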
I'm sure there are some disadvantages, the main one being that buggy client code could bring the DB down, but atm, I would think the advantages are worth the risk. I would like to know what you think on the subject.
Maybe...
jhugg
Sep 27, 2010
We have had the thought before that you could simply return HTML from individual stored procedures and use an HTTP proxy server to translate the HTTP requests into stored procedure calls (and to handle any error messages). That would require few changes to VoltDB.
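A rough sketch of what that translation step could look like on the proxy side, using only the JDK. The `/proc/<Name>?p=...` URL layout and the `Invoker` callback are assumptions made up for this example; they stand in for whatever client call actually runs the procedure and are not VoltDB's API.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URLDecoder;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical proxy-side translation of an HTTP request into a procedure call. */
final class ProcedureProxy {
    /** Assumed callback that runs a stored procedure and returns its HTML. */
    interface Invoker {
        String call(String procedure, List<String> params) throws Exception;
    }

    /** Procedure name from a path like /proc/GetPage (the /proc/ prefix is made up). */
    static String procedureName(URI uri) {
        String path = uri.getPath();
        String prefix = "/proc/";
        if (path == null || !path.startsWith(prefix) || path.length() == prefix.length()) {
            throw new IllegalArgumentException("not a procedure URL: " + uri);
        }
        return path.substring(prefix.length());
    }

    /** Ordered parameters from repeated p= query entries, e.g. ?p=42&p=abc. */
    static List<String> parameters(URI uri) {
        List<String> params = new ArrayList<>();
        String query = uri.getRawQuery();
        if (query == null) {
            return params;
        }
        try {
            for (String pair : query.split("&")) {
                if (pair.startsWith("p=")) {
                    params.add(URLDecoder.decode(pair.substring(2), "UTF-8"));
                }
            }
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
        return params;
    }

    /** Translates the request, calls the procedure, and maps failures to an error page. */
    static String handle(URI uri, Invoker invoker) {
        try {
            return invoker.call(procedureName(uri), parameters(uri));
        } catch (Exception e) {
            return "<html><body>Error: " + e.getMessage() + "</body></html>";
        }
    }

    public static void main(String[] args) {
        // Fake invoker so the sketch runs without a database.
        Invoker fake = (proc, params) -> "<html><body>" + proc + params + "</body></html>";
        System.out.println(handle(URI.create("/proc/GetPage?p=42&p=hello%20world"), fake));
    }
}
```

A real proxy would also need to escape the error text and convert parameters to the types the procedure expects; both are left out here.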
We've guessed that in practice, a web page might require a few differently-partitioned procedure calls to build the response, and perhaps even a blocking call to another system. In light of this, we think you might require a lot of slower, multi-partition procedures that would cancel out much of the benefit. Our guess is that best practices separate the app-serving layer from the data layer.
Still, let us know if you're interested in trying it out. As for serving HTTP directly from the process, we don't have any direct plans to implement this. We do currently embed the Jetty HTTP server in VoltDB 1.2 (releasing soon) to handle JSON/HTTP API calls. You could dual-purpose this code for your own needs if you wanted to try. It might be interesting.
My understanding is that
monster
Sep 27, 2010
My understanding is that Jetty is "lighter" than Tomcat, but I read recently about http://winstone.sourceforge.net/ , which might be even lighter and more appropriate for your simple use-case.
It is my understanding that for a "normal" database, actually reading and writing to the persistent storage media is where the time is spent, and therefore the various marshaling and unmarshaling steps have little impact on performance. But VoltDB is different, and the communication cost becomes much more important; that is why you have removed the usual JDBC driver interface, to cut back on the inefficient marshaling and unmarshaling steps. Unfortunately, these steps still exist, even with a more efficient wire protocol.
Example:
Internet => Proxy/Load-Balancer => Java Webserver => VoltDB
and then the other way around:
Internet <= Proxy/Load-Balancer <= Java Webserver <= VoltDB
So, moving the Java webserver into VoltDB would eliminate the communication between the two, surely saving some milliseconds. The web application could still call several stored procedures, and achieving optimal performance would "only" require that the web app query the VoltDB API to find out which VoltDB instance contains the data it needs, and then move the whole "request" to that VoltDB instance, as the request is normally much smaller than the "response". Since the front-end would be connected to all VoltDB instances, it would receive the response directly from the "most appropriate instance", even if it is a different one than the one that received the request.
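The "find the instance that holds the data, then forward the request there" step could be sketched like this. Plain modulo hashing and one partition per host are assumptions made only for illustration; this is not VoltDB's actual partitioning function, and `PartitionRouter` and the host names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical router: picks the host assumed to own a partitioning key. */
final class PartitionRouter {
    private final List<String> hosts;

    PartitionRouter(List<String> hosts) {
        if (hosts.isEmpty()) {
            throw new IllegalArgumentException("need at least one host");
        }
        this.hosts = Collections.unmodifiableList(new ArrayList<>(hosts));
    }

    /** Stand-in hash: floorMod keeps the result non-negative for negative hash codes. */
    int partitionFor(Object key) {
        return Math.floorMod(key.hashCode(), hosts.size());
    }

    /** Host that should receive the whole request for this key. */
    String hostFor(Object key) {
        return hosts.get(partitionFor(key));
    }

    public static void main(String[] args) {
        List<String> hosts = new ArrayList<>();
        hosts.add("volt-a");
        hosts.add("volt-b");
        hosts.add("volt-c");
        PartitionRouter router = new PartitionRouter(hosts);
        System.out.println("user 7 lives on " + router.hostFor(7));
    }
}
```

The instance that receives a request would call `hostFor` with the partitioning key and, if the answer is another host, forward the small request rather than pull the larger result across.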
If I go for VoltDB, this is something I will want to try, even if only to get a micro-benchmark that proves that it's not worth it. After all, every millisecond counts ...
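A first cut at such a micro-benchmark might look like the sketch below. It compares a direct in-process call with the same call routed through a marshal/unmarshal step. The `procedure` method is a made-up stand-in for a stored procedure, the wire format is invented, and no socket I/O is measured, so treat the numbers as a lower bound on the overhead at best. A harness like JMH would give more trustworthy timings than this hand-rolled loop.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical comparison: direct call vs. call through a marshaling round trip. */
final class MarshalingBench {
    /** Stand-in for a stored procedure body. */
    static long procedure(long id, String name) {
        return id + name.length();
    }

    /** Encodes the arguments as: 8-byte id, 4-byte length, UTF-8 bytes. */
    static byte[] marshal(long id, String name) {
        byte[] utf8 = name.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(8 + 4 + utf8.length);
        buf.putLong(id).putInt(utf8.length).put(utf8);
        return buf.array();
    }

    /** Decodes the arguments and invokes the procedure. */
    static long unmarshalAndCall(byte[] wire) {
        ByteBuffer buf = ByteBuffer.wrap(wire);
        long id = buf.getLong();
        byte[] utf8 = new byte[buf.getInt()];
        buf.get(utf8);
        return procedure(id, new String(utf8, StandardCharsets.UTF_8));
    }

    /** Elapsed nanoseconds for the given number of calls on one of the two paths. */
    static long timeNanos(int iterations, boolean viaWire) {
        long sink = 0;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            String name = "user-" + i;
            sink += viaWire ? unmarshalAndCall(marshal(i, name)) : procedure(i, name);
        }
        long elapsed = System.nanoTime() - start;
        if (sink == 42) {
            System.out.print(""); // use the result so the loop is not optimized away
        }
        return elapsed;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        timeNanos(n, false); // warm-up runs so the JIT compiles both paths
        timeNanos(n, true);
        System.out.println("direct   ns/call: " + timeNanos(n, false) / (double) n);
        System.out.println("via wire ns/call: " + timeNanos(n, true) / (double) n);
    }
}
```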
Ok then.
jhugg
Sep 28, 2010
If you end up trying this out, let us know how it goes.