Forum: VoltDB Architecture

Index statistics in query planner
petrohi
Sep 30, 2010
Does VoltDB consider index statistics when choosing a query execution plan?


For example, in the SQL statement "select id from a, b where a.id=b.id", execution time can depend heavily on the join order when the join is executed with inner-loop index lookups. Suppose table a has one million rows and table b has 100 rows.


Now using "a" as outer table in inner loop will require one million lookups in "b", while using table "b" as outer will require only 100 lookups in "a".
re: Index statistics in query planner
tcallaghan
Oct 1, 2010
VoltDB currently plans the execution of SQL statements at compile time, so statistical information is not available. The SQL planner looks at the available indexes to determine the optimal execution plan.


We may add functionality in the future to allow "hinting" within SQL or for users to provide table statistics (estimated row counts, number of unique values by column, etc.) to improve execution plans.


-Tim
re: Index statistics in query planner
petrohi
Oct 1, 2010
In the VoltDB source code, I see that the frontend planner uses a DatabaseEstimates object, which carries max/min tuple counts per table, to compute plan cost.


Is there a way to provide these estimated values in project.xml or DDL SQL? What are they derived from if not provided?
re: Index statistics in query planner
petrohi
Oct 1, 2010
Ah, I see the table estimates are hardcoded to a 1M/100K max/min.
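

Paraphrasing from memory rather than quoting the source, the relevant part of DatabaseEstimates seems to boil down to something like this:

    public class DatabaseEstimates {
        public static class TableEstimates {
            // fixed guesses, used until real statistics are available
            public long maxTuples = 1000000;
            public long minTuples = 100000;
        }
    }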


Where on the roadmap is the user-provided estimates feature you mentioned?
Scheduled as needed.
jhugg
Oct 3, 2010
As you saw, we have not yet fully implemented a cost-based optimizer that considers the size of tables or the cardinality/distribution of columns.


For the moment, completing this work is scheduled on an as-needed basis. Right now, most of the workloads we've seen run quite fast (if not always perfectly optimally) with the current optimizer. When we run into a customer with a legitimate use case that performs much worse than expected, we figure out why and incrementally improve the optimizer.


This is the point where I ask: are you interested because you have a VoltDB app that runs slower than expected? If so, we'd love to hear more about it. As I said, if it's a good VoltDB use case, we'll do what we can to make it run faster in future versions of VoltDB.
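

To illustrate the kind of comparison such an optimizer would make, here is a toy sketch; the cost formula (outer rows times log2 of inner rows, approximating one tree-index probe per outer row) is purely illustrative, not our actual cost model:

    public class JoinOrderChooser {
        // Estimated cost of a nested-loop index join:
        // ~log2(innerRows) comparisons per outer row.
        static double cost(long outerRows, long innerRows) {
            return outerRows * (Math.log(innerRows) / Math.log(2));
        }

        public static void main(String[] args) {
            long rowsA = 1000000, rowsB = 100;
            String outer = cost(rowsA, rowsB) <= cost(rowsB, rowsA) ? "a" : "b";
            System.out.println("cheaper outer table: " + outer); // prints "b"
        }
    }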
re: Scheduled as needed.
petrohi
Oct 6, 2010
Our database schema consists of tens of narrow tables, and the number of rows varies among them by orders of magnitude. The workload is (a) inserts into most of these tables and (b) selects averaging six joins across these tables, with ordering and small limits.


In other database engines we have often seen significant performance improvements when the planner was smart about reordering joins.
re: query planning
tcallaghan
Oct 6, 2010
Petrohi,


How far along are you in the design and implementation of your use case in VoltDB? I'd like to see your performance numbers and the associated query plans when you have them available.


Please email anything you have to support@voltdb.com.


-Tim