Forum: Building VoltDB Applications

Post: Do folks use Hadoop for file store on top of VoltDB? and other questions

Aug 28, 2012
I am in the analysis phase of scaling our internal application, which handles a lot of file uploads by users.
I have recommended Volt to my manager and went through the benefits.
We have no internal experience with Hadoop/HDFS, but it seems like Hadoop is "the way to go" when scaling for user file uploads.
VM resources are expensive. It seems to be more advantageous to deploy independent commodity 1U rack units instead. Do folks agree?
Good questions
Aug 29, 2012
This is a little difficult to answer, as it is really an application design question that has many possible solutions.
First, yes, you can use VoltDB to store metadata associated with files that are stored within Hadoop/HDFS. There are applications that use Hadoop's HDFS without using any of the other features of Hadoop. So you could create one or more VoltDB tables that describe each file and point to its location within the HDFS cluster.
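To make the metadata idea concrete, here is a rough sketch of what such a table and its lookup might look like. It is only an illustration: the table name, columns, helper functions, and the HDFS URL are all made up, and Python's built-in sqlite3 module stands in for VoltDB so the example runs without a cluster. In a real VoltDB schema you would also pick a partitioning column, and the inserts and lookups would normally go through stored procedures.

```python
import sqlite3

# Hypothetical schema: one row per uploaded file, pointing at its HDFS location.
# sqlite3 is used purely as a stand-in for a VoltDB table.
DDL = """
CREATE TABLE file_metadata (
    file_id     INTEGER PRIMARY KEY,
    owner_id    INTEGER NOT NULL,
    file_name   VARCHAR(255) NOT NULL,
    size_bytes  BIGINT NOT NULL,
    hdfs_path   VARCHAR(1024) NOT NULL
)
"""


def open_store():
    """Create an in-memory database holding the metadata table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(DDL)
    return conn


def record_upload(conn, file_id, owner_id, file_name, size_bytes, hdfs_path):
    """Store the metadata row once the file bytes have been written to HDFS."""
    conn.execute(
        "INSERT INTO file_metadata VALUES (?, ?, ?, ?, ?)",
        (file_id, owner_id, file_name, size_bytes, hdfs_path),
    )


def locate(conn, file_id):
    """Return the HDFS path for a file, or None if the file is unknown."""
    row = conn.execute(
        "SELECT hdfs_path FROM file_metadata WHERE file_id = ?", (file_id,)
    ).fetchone()
    return row[0] if row else None


conn = open_store()
# The namenode host and path below are placeholders, not a real cluster.
record_upload(conn, 1, 42, "report.pdf", 1048576,
              "hdfs://namenode.example:8020/uploads/42/report.pdf")
print(locate(conn, 1))
```

The point of the split is that the database only ever holds small, fixed-shape rows, while the file bytes live in HDFS and are fetched by path when needed.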
As far as system resources go, that is a different question. You likely would not want to run a Volt cluster and an HDFS cluster on the same machines. You'd probably want to separate them, since you would be running significant queries between them. Volt, depending on the partitioning scheme you choose, can benefit from a high core count. Also, the amount of data you store affects how much memory you need.
Commodity rack systems can run well, again depending on what you are looking at as far as machines and the type of durability you want to support. Locally hosted machines tend to provide much higher throughput and lower latency than cloud services, but it may be that the performance of the cloud services is well within your expected peak usage.
Ideally you would contact field operations, set up a meeting, and go over your application design with an engineer; then we can give you a better answer.