Forum: VoltDB Architecture

Optimistic Locking?
henning
May 16, 2010
Is it correct to say that VoltDB uses optimistic locking internally?


Thanks,
Henning
No.
jhugg
May 16, 2010
In version 1.0, VoltDB uses little or no locking in its core execution paths.


In a single partition procedure, from the moment the stored procedure begins to the moment it ends, there is no need to acquire a lock on anything, as the execution path owns the data entirely.


We do use some simple locks to order transactions and to message between sites. In a multi-partition transaction, locks are used to synchronize messaging, and it's likely that some parts of the cluster will sit idle waiting for messages for some amount of time. This is why multi-partition transactions are so much slower.


In the future, we have ways to get around a large portion of the idleness for many flavors of multi-partition transactions. Some of these ideas are just smarter ways to do what we do now. Others could be described as "optimistic" or "speculative" execution.
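To make the single-partition fast path concrete, here's a toy sketch (hypothetical Python, not VoltDB's implementation): each partition owns its slice of the data and applies queued transactions one at a time, so no lock on the data is ever taken.

```python
# Toy model of serial per-partition execution. All names here are
# illustrative; this is not VoltDB code.

class Partition:
    def __init__(self):
        self.table = {}          # this partition's slice of the data
        self.queue = []          # transactions, ordered before execution

    def submit(self, txn):
        self.queue.append(txn)   # ordering is the only synchronization point

    def run(self):
        results = []
        while self.queue:
            txn = self.queue.pop(0)
            results.append(txn(self.table))  # runs to completion, unlocked
        return results

def deposit(account, amount):
    """A 'stored procedure': reads and writes this partition's data."""
    def txn(table):
        table[account] = table.get(account, 0) + amount
        return table[account]
    return txn

p = Partition()
p.submit(deposit("alice", 100))
p.submit(deposit("alice", 50))
print(p.run())  # [100, 150]
```

Ordering the queue is the only synchronization; once a transaction starts, it runs to completion against data nothing else can touch.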
Global database partition lock
Michael Robinson
May 19, 2010
"In a single partition procedure, from the moment the stored procedure begins to the moment it ends, there is no need to acquire a lock on anything, as the execution path owns the data entirely."


This sounds like managing concurrency control by eliminating concurrency. I.e., every transaction, whether writing or reading, effectively obtains a global read/write lock on the entire partition for the duration of the transaction.


Does this imply that in an SMP system, it is necessary to segment memory and allocate one partition per core (and incur the multi-partition performance hit)?


If so, is this really more effective use of the RAM than an in-memory MVCC implementation, where writers do not block readers on a larger contiguous partition?
Yes
jhugg
May 19, 2010
"I.e., every transaction, whether writing or reading, effectively obtains a global read/write lock on the entire partition for the duration of the transaction."


This behavior is actually a core design decision. Most OLTP transactions take microseconds to run to completion, even with many SQL statements. Why bother with thread-safe data structures or all the bookkeeping that comes with MVCC? In VoltDB, an 8-core machine becomes 8 pipelines of streamlined execution.


"Does this imply that in an SMP system, it is necessary to segment memory and allocate one partition per core"


It's not only necessary, it's desirable. If you don't do this, then as you add cores, you increase contention for locks. VoltDB's architecture has minimal contention. The primary bottleneck to scaling with cores is memory bandwidth.
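As a toy illustration of the partition-per-core layout (hypothetical code; the hash function and partition count are made up, not VoltDB's API), hashing the partitioning column gives each core a disjoint slice of memory, and a transaction is on the fast path exactly when every key it touches lands in one slice:

```python
import zlib

NUM_PARTITIONS = 8  # e.g., one partition per core on an 8-core machine

def partition_for(key, num_partitions=NUM_PARTITIONS):
    # Deterministic hash of the partitioning column -> owning partition.
    return zlib.crc32(str(key).encode()) % num_partitions

def is_single_partition(keys, num_partitions=NUM_PARTITIONS):
    # The lock-free fast path applies only when every key lands in one slice.
    return len({partition_for(k, num_partitions) for k in keys}) == 1
```

Two transactions routed to different partitions can run in parallel without ever looking at the same memory, which is where the contention disappears.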


"(and incur the multi-partition performance hit)?"


You're correct that this is a catch. Multi-partition transactions are significantly slower than single-partition transactions.


There is some good news though. First, many of the real-world applications we've talked to customers about do partition well, such that the majority of transactions can be made single-partition. Second, over the next few releases, we have some implementation tricks planned that will really speed up many types of multi-partition transactions.


There is also another catch. A single-partition transaction that runs for more than 50 milliseconds or so can clog up one of the cores and back up work behind it. This is another reason we don't recommend VoltDB for ad-hoc reporting or any query that reads a non-trivial fraction of multi-gigabyte tables. We've chosen to optimize for OLTP specifically; if you need reporting, VoltDB 1.0.01 will support live exporting of data suitable for an OLAP-focused database.
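A back-of-the-envelope sketch of that clogging effect, with made-up service times rather than measurements: because the partition executes serially, everything queued behind a slow transaction inherits its latency.

```python
def completion_times(service_times_ms):
    # Serial execution: transaction i finishes only after all earlier ones.
    finished, now = [], 0.0
    for s in service_times_ms:
        now += s
        finished.append(now)
    return finished

# One 50 ms report queued ahead of 1000 typical 50-microsecond transactions:
times = completion_times([50.0] + [0.05] * 1000)
# The last fast transaction, worth 0.05 ms of work, finishes ~100 ms in.
```

Half of that final latency is the report itself; the other half is the fast work serialized behind it.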


"If so, is this really more effective use of the RAM than an in-memory MVCC implementation, where writers do not block readers on a larger contiguous partition?"


We believe that for OLTP-style workloads, with frequent writes and many small, repetitive transactions, our architecture offers a significant performance improvement, a significantly easier-to-debug architecture, and very strong transactional guarantees. VoltDB users see data as if each transaction executed serially and exclusively. This serial ordering also allows us to make strong guarantees about the consistency of local replicas today, and eventually WAN replicas.


Developing high-volume transactional applications is never easy. For some of these applications, VoltDB should make it much easier.


-John Hugg
VoltDB Engineering
OLTP -> OLAP?
Michael Robinson
May 19, 2010
"We've chosen to optimize for OLTP specifically; if you need reporting, VoltDB 1.0.01 will support live exporting of data suitable for an OLAP-focused database."


So, if I understand correctly, you've designed a write-optimized database with aggregate application throughput a couple orders of magnitude faster than current solutions, which scales out to millions of transactions per second on commodity hardware, and for reporting, you're going to stream all that transactional data to an OLAP-focused database which then will have to deal with the problem of managing all the locking and concurrency and writers blocking readers and whatnot to be able to incorporate this monster incoming data torrent into a practical reporting solution.


Is that roughly correct?
Getting good...
jhugg
May 20, 2010
Yes, that's roughly correct.


... you've designed a write-optimized database ...


It's not too shabby at reads; it's just that reads aren't a whole lot faster than writes. Also, there are more solutions out there if your pain is scaling reads.


... an OLAP-focused database which then will have to deal with the problem of managing all the locking and concurrency and writers blocking readers and whatnot ...


A couple of notes now that we're getting into details.


1. You don't have to send 100% of your log to an OLAP system. VoltDB exporting allows you to specify a subset of tables that export live. Also, you can define an append-only table (no reads) that is essentially a transactional queue into the export stream, deleting tuples automatically once they have been acknowledged by the export client. This allows you to do some transformation, aggregation, and/or filtering in VoltDB. This can make it much easier to deal with the resulting stream, which, when unconstrained at 1M tps, can get pretty big. There will be more details in the forthcoming 1.0 manual.


2. You can use an ETL tool (Pentaho, Talend, etc.) with the export stream to transform the data before you insert it into an OLAP system. Disclaimer: we've never tried this and aren't aware of anyone who has. There's no obvious reason why it wouldn't work, though.


3. When faced with a big nail, you can get a big hammer. If you want to do your reporting on a larger (or complete) subset of the log, you can use a tool that can handle the stream. For one, Vertica 4.0 is right around the corner, and we've seen it's pretty capable of handling the stream. We haven't tested with the other players in the OLAP space, but many support pretty hefty insert rates as well. As for a free option (to own, not to run), Hadoop is the only thing that comes to mind at volume. Using anything that isn't MPP is going to be hard, and few of the other free options are MPP. If anyone has a tool they want to use with VoltDB exporting, contact us and we'll discuss how to help make it happen.


4. VoltDB has some limited support for materialized views. Specifically, we can maintain sum and count aggregates for a "group by" statement on any single table. See the docs for details. We hope to expand this functionality in the future, but this allows you to solve some problems that might otherwise be cumbersome.
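As a rough sketch of what such a view does (illustrative Python, not VoltDB's implementation): for a GROUP BY on a single table, the view keeps a running count and sum per group, updated on every insert and delete instead of being recomputed at query time.

```python
# Toy incremental materialized view: COUNT(*) and SUM(value) per group key.
# Names are illustrative; VoltDB maintains its views internally.

class GroupByView:
    def __init__(self):
        self.groups = {}                 # group key -> [count, sum]

    def on_insert(self, key, value):
        c = self.groups.setdefault(key, [0, 0])
        c[0] += 1
        c[1] += value

    def on_delete(self, key, value):
        c = self.groups[key]
        c[0] -= 1
        c[1] -= value
        if c[0] == 0:
            del self.groups[key]         # empty groups drop out of the view

v = GroupByView()
v.on_insert("east", 10)
v.on_insert("east", 5)
v.on_insert("west", 3)
v.on_delete("east", 10)
# v.groups == {"east": [1, 5], "west": [1, 3]}
```

Because each maintenance step is tiny and runs inside the owning partition's serial pipeline, the view stays transactionally consistent with the base table for free.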
Thanks for the clarification
Michael Robinson
May 20, 2010
Much clearer. Thanks.