It could be a viable VoltDB
Dec 20, 2011
Jan 24, 2012
Partitioning is an important part of the design of a VoltDB application, but hopefully it isn't a stumbling block to further analysis and testing of VoltDB. It is relatively easy to change in a test application.
For fast inserts of individual records, data can be partitioned on essentially any column that distributes the data evenly across the partitions: generally any column with incremental or highly variable values, or a categorical column if there are enough categories of roughly equal volume. The point is simply to spread the processing of transactions evenly across the partitions. If you rarely needed to run queries, any of these columns could serve as the partitioning column. Usually some queries are needed, however, so for individual records the queries often drive the choice of partitioning column. In many cases where you want to summarize data on more than one key, one key can be used for partitioning, which lets queries on that key run in a single partition, and materialized views can summarize the data by the other keys. The final partitioning decision may depend on the frequency and complexity of the various queries.
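To make "evenly distribute" concrete, here is a minimal Java sketch that simulates hash partitioning and compares an incremental ID column with a categorical column that has only three values. It assumes a simple modulo hash in a hypothetical `partitionFor` helper; that is a stand-in, not VoltDB's actual hashing scheme, and the class name, partition count, and sample values are illustrative only.

```java
import java.util.Arrays;

/**
 * Illustrative sketch only: simulates hash partitioning to show how the
 * partitioning column affects how evenly rows spread across partitions.
 * partitionFor() is a stand-in modulo hash, NOT VoltDB's hashing scheme.
 */
public class PartitionSpread {

    // Map a key to one of `partitions` buckets using the key's hashCode().
    static int partitionFor(Object key, int partitions) {
        return Math.floorMod(key.hashCode(), partitions);
    }

    // Count the rows that land in each partition.
    static int[] spread(Object[] keys, int partitions) {
        int[] counts = new int[partitions];
        for (Object key : keys) {
            counts[partitionFor(key, partitions)]++;
        }
        return counts;
    }

    // Number of partitions that receive any rows at all.
    static int busyPartitions(int[] counts) {
        int busy = 0;
        for (int c : counts) {
            if (c > 0) busy++;
        }
        return busy;
    }

    public static void main(String[] args) {
        int partitions = 8;
        int rows = 80_000;

        // Incremental IDs, e.g. an ID assigned by the application.
        Object[] ids = new Object[rows];
        for (int i = 0; i < rows; i++) {
            ids[i] = (long) i;
        }
        int[] byId = spread(ids, partitions);
        // Each of the 8 partitions receives exactly 10000 rows.
        System.out.println("incremental id: " + Arrays.toString(byId));

        // A categorical column with only three values.
        String[] regions = {"EAST", "WEST", "NORTH"};
        Object[] cats = new Object[rows];
        for (int i = 0; i < rows; i++) {
            cats[i] = regions[i % regions.length];
        }
        int[] byRegion = spread(cats, partitions);
        System.out.println("3-value category: " + Arrays.toString(byRegion)
                + ", busy partitions: " + busyPartitions(byRegion));
    }
}
```

With contiguous IDs the stand-in hash gives every partition the same share, while the three-value column can keep at most three of the eight partitions busy no matter how many rows arrive, which is why a categorical key only works when there are enough categories.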
In a double-entry bookkeeping use case, the partitioning decision is more complex, but it still follows the same general guidelines. If two records must be inserted in a single transaction, you can no longer pick just any column with incremental or variable values as the partitioning key; the key must be a value that both records share, so that they land in the same partition. This narrows the options whenever it is vital for the records to be inserted together and the throughput requirement is high enough that single-partition transactions are needed. If the records do not share any common value, you may need to add a column used solely for partitioning, into which the application inserts a common value (such as an incremental ID, a common timestamp, or some other value) for each set of records that must run in a single transaction. Another option may be to put debits in one table and credits in another, and to copy a key from each record onto the corresponding record, i.e. include both the buyer and seller ID on both records, and partition both tables on the buyer ID only or the seller ID only.
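The two double-entry options can be checked with a similar sketch. It models the two legs of a transfer and tests whether every row of the transaction maps to one partition for a given partitioning column. The `Entry` fields (`txnId`, `accountId`, `buyerId`, `sellerId`), the `singlePartition` helper, and the modulo hash are all hypothetical. The sketch also assumes that rows with equal partition-key values land in the same partition even when they sit in different tables (here, a debits table and a credits table), which is what lets one transaction touch both.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.ToLongFunction;

/**
 * Illustrative sketch only: decides whether all rows written by one
 * transaction fall in a single partition for a chosen partitioning column.
 * Entry, its fields, and partitionFor() are hypothetical stand-ins.
 */
public class DoubleEntrySketch {

    // One leg of a transfer. accountId is the account this leg posts to;
    // txnId is a column added solely for partitioning; buyerId and sellerId
    // are copied onto both the debit row and the credit row.
    static final class Entry {
        final long txnId, accountId, buyerId, sellerId, amountCents;

        Entry(long txnId, long accountId, long buyerId, long sellerId, long amountCents) {
            this.txnId = txnId;
            this.accountId = accountId;
            this.buyerId = buyerId;
            this.sellerId = sellerId;
            this.amountCents = amountCents;
        }
    }

    // Stand-in modulo hash: NOT VoltDB's real partitioning function.
    static int partitionFor(long key, int partitions) {
        return (int) Math.floorMod(key, (long) partitions);
    }

    // True when every row maps to the same partition as the first row,
    // i.e. the insert could run as a single-partition transaction.
    static boolean singlePartition(List<Entry> txn, ToLongFunction<Entry> column, int partitions) {
        int first = partitionFor(column.applyAsLong(txn.get(0)), partitions);
        for (Entry e : txn) {
            if (partitionFor(column.applyAsLong(e), partitions) != first) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int partitions = 8;
        long buyer = 1001, seller = 2002, txn = 42;

        // Debit the buyer's account and credit the seller's, as one transfer.
        Entry debit = new Entry(txn, buyer, buyer, seller, -5000);
        Entry credit = new Entry(txn, seller, buyer, seller, 5000);
        List<Entry> transfer = Arrays.asList(debit, credit);

        // Each leg's own account: 1001 and 2002 hash to partitions 1 and 2.
        System.out.println("accountId: " + singlePartition(transfer, e -> e.accountId, partitions)); // false
        // Common value added solely for partitioning: both rows stay together.
        System.out.println("txnId:     " + singlePartition(transfer, e -> e.txnId, partitions));     // true
        // Buyer ID copied onto both rows: both rows stay together.
        System.out.println("buyerId:   " + singlePartition(transfer, e -> e.buyerId, partitions));   // true
    }
}
```

Partitioning on each leg's own account splits the transfer across partitions, so it would need a multi-partition transaction; either a shared partitioning column or a copied buyer ID keeps both rows in one partition.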
Hopefully this provides some ideas, but I don't presume to understand your application well enough to offer these options as anything more than food for thought. Ultimately this is an issue of designing distributed logic for an application that requires distributed computing throughput vs. using non-distributed logic and operating at single-node scale.
For further reading, consult Chapter 2 of the Planning Guide and Chapter 3 of Using VoltDB.