Forum: VoltDB Architecture

Post: Is VoltDB suitable

Is VoltDB suitable
jamesst
Apr 19, 2011
We have an application that currently uses MySQL to store engineering data. The data volume will need to scale to 1 or 2 TB or more; 10 TB would be ideal, but I don't think that is possible, certainly not with the architecture we have now. Data is written at around 10K+ records per second and is stored and queried by time and unique name. The queries are not complex: the core data sits in partitioned tables and is queried, as I said, by time and name. Result sets can be large when requested, although we generate statistics so that large result sets are not needed often. The idea is that users look at the trend of an engineering value and, if there is an issue, fetch the raw data it is based on.


Is VoltDB suitable for this type of environment?
Yes, but...
sebc
Apr 19, 2011
Hi James,


Yes, VoltDB would be a perfect solution to ingest your 10K records per second (and then some) and provide real-time analytics on recent data. However, you will likely want to separate your real-time needs from your historical analysis. Just storing *everything* in VoltDB is unlikely to be the solution you want: since VoltDB is a pure in-memory database engine, you would need quite a bit of memory (and therefore quite a few server nodes) to hold 1 to 10 TB. It's certainly doable, but probably not the best use of your capital.


I think the first thing for you to look at is how you might partition your data over time: can your real-time analytics needs be covered by keeping 1 week's worth of data in VoltDB? 2 weeks? 1 month?
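To make the retention question concrete, here is a rough back-of-the-envelope sketch. The 10K records per second comes from the original post; the 100 bytes per record is purely an assumption (the thread gives no record size), and the result ignores indexes, replication and any per-row overhead, so treat it as a lower bound on raw data only.

```python
RECORDS_PER_SECOND = 10_000  # ingest rate quoted in the original post
BYTES_PER_RECORD = 100       # ASSUMPTION: record size is not given in the thread


def raw_bytes_for_window(days,
                         records_per_second=RECORDS_PER_SECOND,
                         bytes_per_record=BYTES_PER_RECORD):
    """Raw data volume (bytes) accumulated over a retention window of `days`."""
    seconds = days * 24 * 60 * 60
    return seconds * records_per_second * bytes_per_record


if __name__ == "__main__":
    for label, days in (("1 week", 7), ("2 weeks", 14), ("1 month", 30)):
        gigabytes = raw_bytes_for_window(days) / 1e9
        print(f"{label}: about {gigabytes:.1f} GB of raw data")
```

Plugging in your real average record size should show quickly which retention window fits in the memory you are willing to pay for.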


From there, VoltDB can serve as your integration/real-time analytics layer and gradually push old data out (through export tables) into your data warehouse, where you would perform longer-term historical analysis.
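The age-out idea itself is simple, and can be sketched independently of any database. The snippet below is not VoltDB's export mechanism; it is a plain-Python illustration, with a hypothetical row layout of (name, timestamp_seconds, value), of deciding which rows stay in the real-time layer and which are handed to the warehouse.

```python
def split_by_age(rows, now, retention_seconds):
    """Split rows into (keep, export) around a retention cutoff.

    Each row is a hypothetical (name, timestamp_seconds, value) tuple.
    Rows at or after the cutoff stay in the real-time store; older rows
    are the ones that would be pushed to the data warehouse.
    """
    cutoff = now - retention_seconds
    keep = [row for row in rows if row[1] >= cutoff]
    export = [row for row in rows if row[1] < cutoff]
    return keep, export


if __name__ == "__main__":
    one_week = 7 * 24 * 60 * 60
    now = 1_000_000
    rows = [
        ("pump_a", now - 10, 3.2),            # recent: stays
        ("pump_a", now - one_week - 1, 2.9),  # just past retention: exported
    ]
    keep, export = split_by_age(rows, now, one_week)
    print("keep:", keep)
    print("export:", export)
```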


Another thing that happens quite often in analytical systems is that large-scale analytical results from the data warehouse are re-exported back into the real-time analytics layer (think, for instance, of maintaining historical min/max metrics to compare the real-time data against and raise alerts). This is entirely feasible as well, and it can give you the best of both worlds with a truly optimized deployment. On performance, a data warehouse deployment *will* give you much better performance than VoltDB when it comes to truly complex aggregation, and possibly access to MDX queries that only OLAP systems can give you. On TCO, you waste no money over-provisioning servers on memory because you're trying to keep everything in VoltDB, or over-provisioning on CPU/IO because you're trying to use a slower disk-based DB system for your real-time analytics.
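As an illustration of the min/max comparison mentioned above, here is a minimal sketch. The `Bounds` type, the `out_of_range` function and the (name, timestamp, value) reading layout are all hypothetical names chosen for this example; in a real deployment the bounds would be computed in the warehouse and loaded into the real-time layer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bounds:
    """Historical min/max for one engineering value (hypothetical type)."""
    low: float
    high: float


def out_of_range(readings, historical_bounds):
    """Return the readings that fall outside their historical min/max.

    `readings` is an iterable of (name, timestamp, value) tuples and
    `historical_bounds` maps name -> Bounds. Names with no known bounds
    are skipped rather than flagged.
    """
    alerts = []
    for name, timestamp, value in readings:
        bounds = historical_bounds.get(name)
        if bounds is None:
            continue
        if value < bounds.low or value > bounds.high:
            alerts.append((name, timestamp, value))
    return alerts


if __name__ == "__main__":
    bounds = {"pump_a": Bounds(low=1.0, high=5.0)}
    readings = [("pump_a", 100, 3.0), ("pump_a", 101, 7.5), ("pump_b", 101, 9.9)]
    print(out_of_range(readings, bounds))
```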


I'm one of the senior Technical Consultants here at VoltDB - you should know that we're always quite happy to look into the specifics of a problem to help you figure out whether VoltDB is the right fit for your needs and how best to architect your application. Feel free to contact Mark Kiley (mkiley[at]voltdb.com) to get us involved.


Cheers,


Seb