Forum: Building VoltDB Applications

Post: Effect of Partitioning Stored Procedure, Partitioned and Replicated Tables

Effect of Partitioning Stored Procedure, Partitioned and Replicated Tables
Nov 24, 2016
I am trying to improve the VoltDB performance in terms of TPS for TPCC queries. Now I am using some trial and error method to get higher performance. The things which I have tried are:
1) I have replicated the tables, warehouse, item and district.
2) I have partitioned all other tables on column 'district id' rather than 'warehouse id'.
3) I have also partitioned all stored procedures. (Actually when I used all queries as stored procedures without partitioning I got worse TPS than using AdHoc queries. But as soon as I partitioned the stored procedures as recommended by warnings while VoltDB server startup, I got almost double TPS)

Hence my question is , is there any thumb rule for partitioning and replicating the tables, which column is used for partitioning and when to use AdHoc queries, Stored Procedures and Partitioned Stored Procedure? so that for large number of tables I can make the firm decision regarding all these facts.
Nov 25, 2016
Have you looked at our tpc-c implementation? You will notice that all tables are partitioned on the warehouse id. I'm not sure we've tried your method - of using district id. Is it much faster?

Here are a few things to keep in mind (and your results confirm these rules)
- Partitioned stored procedures are the fastest.
- An ad-hoc query that can determine partitioning will also be pretty fast. For example, if your statement includes the district_id, the planner figures this out and runs a single-partition adhoc query.
- Multi-partition procedures and ad-hoc queries will be the slower.

I'd be curious to see a comparison between warehouse partition (as we do in our implementation) and district partitioning.

Nov 26, 2016
Thanks for the guidance. Yes, from the experience that I came across, using the partition on district_id is faster that warehouse_id. I have also seen your tpc-c implementation and hence I initially used warehouse_id for partitioning but later I just tried district_id partitioning.
Overhead of partitioning
Nov 26, 2016
When I was going through the documentation of partitioning, I came across the overhead involved in managing partitions. What kinds of overheads are there? Currently I am working with 5 partitions and my cpu is 24 core without hyper-threading and I am getting the best TPS as compared to other number of partitions. Can you please guide me on this issue?