Forum: VoltDB Architecture

Post: What are the downsides of VoltDB using a command log?

ccherng
Jan 16, 2013
The documentation indicates that VoltDB uses a command log instead of a write-ahead log. Why do the standard SQL databases use a write-ahead log and not a command log? And what do they gain by using a write-ahead log?
bballard
Jan 16, 2013
In most disk-based relational databases, a write-ahead log consists of both UNDO and REDO information and is in the form of changes to the data. For example, if you are updating a record, the UNDO contains the "before" copy of that record, and REDO contains the "after". This architecture involves multiple disk reads and writes in the execution of a transaction.

In VoltDB, each transaction is a procedure invocation that is committed upon successful completion and rolled back otherwise. We do use an UNDO log, but it only needs to be kept until the procedure completes successfully, so it is stored in memory for only one transaction at a time. Rather than a REDO log that contains "after" records, we store the transaction invocation data. This consists of the timestamp and transaction id that were assigned when the request was received, along with the inputs. In some cases this is significantly smaller than the resulting changes to data, and it also does not need to wait for the procedure to execute before it can be written to disk. Writes to the command log are sequential, one write per transaction, and the log is only read during recovery from disk. If a transaction results in a rollback, it doesn't affect the command log, i.e. the command remains in the log and will produce the same result (a rollback) when it is replayed.
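
To make the mechanism above concrete, here is a minimal Python sketch. It is a toy, not VoltDB code: the class `ToyCommandLogDB`, the `transfer` procedure, and the JSON log format are all made up for illustration, the log is a list in memory rather than a file on disk, and the timestamp is left out for brevity. It only shows the shape of the idea: log the invocation before executing it, keep the undo data in memory for one transaction, and recover by replaying the log.

```python
import json


def transfer(state, src, dst, amount):
    """Hypothetical stored procedure: a deterministic function of state and inputs."""
    if state.get(src, 0) < amount:
        raise ValueError("insufficient funds")  # causes a rollback
    state[src] -= amount
    state[dst] = state.get(dst, 0) + amount


PROCEDURES = {"transfer": transfer}


class ToyCommandLogDB:
    def __init__(self, initial):
        self.state = dict(initial)
        self.command_log = []  # stands in for the sequential on-disk log
        self.next_txn_id = 1

    def invoke(self, name, *params):
        # One append per transaction, made before the procedure runs.
        txn_id = self.next_txn_id
        self.next_txn_id += 1
        entry = {"txn": txn_id, "proc": name, "params": list(params)}
        self.command_log.append(json.dumps(entry))
        return self._execute(name, params)

    def _execute(self, name, params):
        undo = dict(self.state)  # in-memory undo, kept for this transaction only
        try:
            PROCEDURES[name](self.state, *params)
            return True   # committed
        except Exception:
            self.state = undo
            return False  # rolled back; the log entry is left in place

    @classmethod
    def recover(cls, initial, command_log):
        # Recovery re-executes every logged command in order.
        db = cls(initial)
        for line in command_log:
            entry = json.loads(line)
            db._execute(entry["proc"], entry["params"])
        db.command_log = list(command_log)
        db.next_txn_id = len(command_log) + 1
        return db
```

For example, after `db.invoke("transfer", "a", "b", 30)` succeeds and `db.invoke("transfer", "a", "b", 500)` rolls back, the log holds both commands, and `ToyCommandLogDB.recover(...)` run from the same starting data ends in the same state, because the second command rolls back again on replay.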

Dr. Stonebraker talks about this and other architectural differences between VoltDB and traditional databases in the webinar "OldSQL vs. NoSQL vs. NewSQL on New OLTP" which you can find here: (http://voltdb.com/dig-deeper/webinars.php). For further reading, you may be interested in the academic papers here (http://hstore.cs.brown.edu/publications/) from the development of H-Store, which VoltDB was initially based on.
ccherng
Jan 17, 2013
Is it correct to infer that most disk-based relational databases use a write-ahead log in the form of changes to the data because they are very old code bases? That is, would a disk-based relational database written from scratch today probably use a command log?
bballard
Jan 17, 2013
No, I wouldn't infer that. Command logging is appropriate for VoltDB's architecture, which is specialized for high velocity workloads.
ccherng
Jan 17, 2013
So then I'm confused. When does a write-ahead log architecture give advantages over a command log?
bballard
Jan 17, 2013
A WAL would be more efficient for workloads where the changes to data are on average smaller than the commands that caused them.

There is also an efficiency trade-off on recovery. With a command log, you have to re-execute the commands rather than just applying the logged changes.

Also, a command log requires that the commands be deterministic. If all you have is the command, not the changes to data, you need to ensure that running the command again on recovery produces the same result. This isn't easy and requires some enforcement. A WAL makes this easier because it logs the results of the initial execution, whatever they happen to be.
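
The determinism point can be illustrated with a small Python sketch. Everything in it is hypothetical (the `issue_token` procedure, the log layouts, and the idea of seeding the random generator from the logged transaction id are illustration only, not a description of how VoltDB enforces determinism). It contrasts a log of after-images, which recovers correctly even when the original run used unpredictable randomness, with command replay, which only works if any randomness is derived from logged values.

```python
import random


def issue_token(state, user, rng):
    """Hypothetical procedure whose result depends on a random draw."""
    state[user] = rng.randrange(1_000_000)


def run_with_after_images(users):
    """Execute with an unseeded RNG and log the resulting values (WAL-style)."""
    state, after_images = {}, []
    rng = random.Random()  # seeded from OS entropy, so it differs on every run
    for user in users:
        issue_token(state, user, rng)
        after_images.append((user, state[user]))
    return state, after_images


def recover_from_after_images(after_images):
    """Recovery applies the logged values; the procedure is not run again."""
    state = {}
    for key, value in after_images:
        state[key] = value
    return state


def replay_commands(command_log):
    """Command-log replay: re-execute each command, taking randomness
    from a generator seeded with the logged transaction id."""
    state = {}
    for txn_id, user in command_log:
        issue_token(state, user, random.Random(txn_id))
    return state
```

With the after-image log, `recover_from_after_images(images)` reproduces whatever the first run produced. With the command log, `replay_commands(log)` gives the same state on every replay only because the random draw is a function of the logged transaction id; swapping in an unseeded generator there would make each replay diverge.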