Forum: VoltDB Architecture

Post: Big Data

Big Data
dragam
Mar 3, 2011
Hi all,


Are there plans to use VoltDB in any kind of
Big Data scenarios?


What about the problems outlined here?


http://highscalability.com/blog/2010/6/28/voltdb-decapitates-six-sql-urban-myths-and-delivers-internet.html


I'm very interested in Michael Stonebraker's ideas about next
generation databases that will be both relational and
ACID.


TIA and rgs,


Paul...
re: Big Data
tcallaghan
Mar 3, 2011
Paul,


We see VoltDB as a key component in many "Big Data" scenarios, as these applications often have a high-throughput transactional component and need real-time alerting or analytics functionality. Coincidentally, I'm presenting a webinar next week on this very topic. You can sign up at http://voltdb.com/content/voltdb-big-data-applications


As for the post on highscalability.com, are there specific portions of Todd's writeup that you are interested in discussing?


-Tim
Big Data
dragam
Mar 3, 2011
Hi Tim,


Thanks for getting back to me.


> We see VoltDB as a key component in many "Big Data" scenarios


With no HDD? How can a Big Data app work without some sort
of persistent storage (bigger than RAM) - unless you're talking
about Petabytes of the stuff?


OK, fair enough, let's say that the VoltDB part is just a fraction
of what you want to analyse in real time - i.e. the part that's passing
through VoltDB/s may be reasonable.


However, what I would wonder then would be, what is your permanent
persistent storage engine? Clusters of column-oriented Vertica
machines that can answer OLAP style queries?


Re: other post.


> Let's say during a stored procedure you need to make a REST call to get a discount rate


?


> I've worked on we've used the stored procedure approach. It works fine until it doesn't.
> So that's why projects have learned not to trust trust the success of their project
> to how good of an application server your database can be. Instead, they separate out
> logic from data and let each scale independently. This is a more risk reduced approach.


?


> This is a maintenance nightmare


?


> number of restrictions that make VoltDB less than ideal as general purpose database.


If I want to do a query that crosses two partitions - say I want to
join records from Europe with those in the States? Or even
just simply select from both regions at the same time?


I was thinking about this - if you have a query "optimiser" that
recognises that the queries cross partitions and, knowing that they
won't interfere with each other - the data being on different
machines ensures (how, I don't know) that they are performed
on the same timestamp value?


I find the project very interesting - it just also strikes me that
there's more to a system than speed - I mean would a user notice
a 3/4 second latency for the sake of threading &c.?


Take care,


Paul...
re: Big Data
tcallaghan
Mar 7, 2011
> With no HDD? How can a Big Data app work without some sort
> of persistent storage (bigger than RAM) - unless you're talking
> about Petabytes of the stuff?


> OK, fair enough, let's say that the VoltDB part is just a fraction
> of what you want to analyse in real time - i.e. the part that's passing
> through VoltDB/s may be reasonable.


> However, what I would wonder then would be, what is your permanent
> persistent storage engine? Clusters of column-oriented Vertica
> machines that can answer OLAP style queries?


We designed the "export" functionality of VoltDB to assist in moving important data that has become static to an OLAP database. That OLAP database could be anything (including Vertica).


> If I want to do a query that crosses two partitions - say I want to
> join records from Europe with those in the States? Or even
> just simply select from both regions at the same time?


> I was thinking about this - if you have a query "optimiser" that
> recognises that the queries cross partitions and, knowing that they
> won't interfere with each other - the data being on different
> machines ensures (how, I don't know) that they are performed
> on the same timestamp value?


> I find the project very interesting - it just also strikes me that
> there's more to a system than speed - I mean would a user notice
> a 3/4 second latency for the sake of threading &c.?


VoltDB allows data in different partitions to be accessed via SQL in a multi-partition stored procedure. The drawback to this approach is that all partitions must perform this transaction at the same transactional point in time, potentially blocking other transactions in the queue.
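

To make the blocking effect a bit more concrete, here is a rough toy model in Python. It is not VoltDB code and does not reflect how VoltDB is actually implemented; it only assumes that each partition runs its transactions one at a time in arrival order, and that a multi-partition transaction has to start at the same moment on every partition. The names (Txn, schedule, europe_1 and so on) and the costs are made up for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Txn:
    name: str
    cost: int                        # time units of work (made-up numbers)
    partition: Optional[int] = None  # None means "touches every partition"


def schedule(txns: List[Txn], num_partitions: int) -> Dict[str, Tuple[int, int]]:
    """Return {name: (start, end)} for transactions run in arrival order.

    Assumption of this toy model: each partition executes serially, and a
    multi-partition transaction needs all partitions at the same time.
    """
    free_at = [0] * num_partitions   # when each partition next becomes idle
    timeline: Dict[str, Tuple[int, int]] = {}
    for t in txns:
        if t.partition is None:
            # Must wait for the slowest partition, then occupies all of them.
            start = max(free_at)
            end = start + t.cost
            free_at = [end] * num_partitions
        else:
            # Single-partition work only waits on its own partition.
            start = free_at[t.partition]
            end = start + t.cost
            free_at[t.partition] = end
        timeline[t.name] = (start, end)
    return timeline


txns = [
    Txn("europe_1", cost=3, partition=0),
    Txn("usa_1", cost=1, partition=1),
    Txn("join_eu_us", cost=2),            # multi-partition
    Txn("usa_2", cost=1, partition=1),
]
timeline = schedule(txns, num_partitions=2)
for name, (start, end) in timeline.items():
    print(name, start, end)
```

In this sketch partition 1 is idle after usa_1 finishes at time 1, but usa_2 cannot start until time 5, because join_eu_us has to wait for partition 0 and then holds both partitions while it runs. That is the kind of queueing cost to weigh when deciding how much work should go through multi-partition procedures.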


> p.s. I'll sign up for the webinar - thanks.


Great, see you tomorrow.