Results 1 to 6 of 6

Thread: VoltDB partitioning

  1. #1
    New Member
    Join Date
    Apr 2014
    Posts
    3

    VoltDB partitioning

    Lets say I have a table

    id col1 col2 col3
    1 ABC DEF XYZ
    2 AAA BBB CCC


    Most frequent queries is going to be

    SELECT * from XYZ where col1='abc' and col2='def'
    SELECT * from XYZ where col1='abc' and col2='def' and col3='xyz'


    As per VoltDB docs:-

    1. Partition should be done on column on which most searches are going to be performed
    2. Partitioning should be done on one column


    I couldn't find any example where search is performed on multiple columns.

    I wonder what is the best way to partition table for multi column searches?

  2. #2
    Super Moderator
    Join Date
    Jun 2013
    Posts
    23
    Hi,

    For your use case, I would partition the table on column 'col1'. As you have multiple filters in WHERE clause, I would suggest building an index on those columns, like
    -- CREATE INDEX index1 ON XYZ (col1,col2,col3).

    In order to check whether your query is optimized or not, you can check the query plan by using Explain for AdHoc queries or ExplainProc for queries in stored procedures. Related documentation links can be found at:
    https://voltdb.com/docs/UsingVoltDB/sysprocexplain.php
    https://voltdb.com/docs/UsingVoltDB/...xplainproc.php

    Xin

    Quote Originally Posted by atul.pundhir View Post
    Lets say I have a table

    id col1 col2 col3
    1 ABC DEF XYZ
    2 AAA BBB CCC


    Most frequent queries is going to be

    SELECT * from XYZ where col1='abc' and col2='def'
    SELECT * from XYZ where col1='abc' and col2='def' and col3='xyz'


    As per VoltDB docs:-

    1. Partition should be done on column on which most searches are going to be performed
    2. Partitioning should be done on one column


    I couldn't find any example where search is performed on multiple columns.

    I wonder what is the best way to partition table for multi column searches?

  3. #3
    Super Moderator
    Join Date
    Feb 2012
    Posts
    42
    "The column on which most searches are going to be performed" is intended to include queries that filter multiple columns.
    So, for your two example queries, either col1 or col2 is a candidate for the partition column.

    If your third most frequent query were " SELECT * from XYZ where col1='abc' and col3='xyz' " (eliminating col2), col1 would be the better choice.
    If your third most frequent query were " SELECT * from XYZ where col1>'abc' and col2='def' " (only range-filtering col1), col2 would be the better choice.
    Only equality filters should be considered when choosing a partitioning column.

    Note that queries that use a column in JOIN filters that equate 2 keys rather than a filter equating a key and a constant can also influence choice of the partitioning column.
    This is particularly true when joining a large table with itself to get pairs of rows with the same column value or when joining multiple large tables with a matching column.
    A join between partitioned tables must always specify an equality filter between the partition column on each table. This will often be a primary key of one table that is referenced by a column in the other table.

    If two candidate partitioning columns seem to be used with about the same frequency, choose the column that has the larger number of unique values.

  4. #4
    New Member
    Join Date
    Apr 2014
    Posts
    3
    Quote Originally Posted by pmartel View Post
    "The column on which most searches are going to be performed" is intended to include queries that filter multiple columns.
    So, for your two example queries, either col1 or col2 is a candidate for the partition column.

    If your third most frequent query were " SELECT * from XYZ where col1='abc' and col3='xyz' " (eliminating col2), col1 would be the better choice.
    If your third most frequent query were " SELECT * from XYZ where col1>'abc' and col2='def' " (only range-filtering col1), col2 would be the better choice.
    Only equality filters should be considered when choosing a partitioning column.

    Note that queries that use a column in JOIN filters that equate 2 keys rather than a filter equating a key and a constant can also influence choice of the partitioning column.
    This is particularly true when joining a large table with itself to get pairs of rows with the same column value or when joining multiple large tables with a matching column.
    A join between partitioned tables must always specify an equality filter between the partition column on each table. This will often be a primary key of one table that is referenced by a column in the other table.

    If two candidate partitioning columns seem to be used with about the same frequency, choose the column that has the larger number of unique values.

    Thanks for the answer. However, What is my queries are "SELECT * from XYZ where col1 IN ('ABC', '....', '...') AND col2 IN ('XYZ', 'DEF')"

  5. #5
    New Member
    Join Date
    Apr 2014
    Posts
    3
    Quote Originally Posted by xin View Post
    Hi,

    For your use case, I would partition the table on column 'col1'. As you have multiple filters in WHERE clause, I would suggest building an index on those columns, like
    -- CREATE INDEX index1 ON XYZ (col1,col2,col3).

    In order to check whether your query is optimized or not, you can check the query plan by using Explain for AdHoc queries or ExplainProc for queries in stored procedures. Related documentation links can be found at:
    https://voltdb.com/docs/UsingVoltDB/sysprocexplain.php
    https://voltdb.com/docs/UsingVoltDB/...xplainproc.php

    Xin
    Thanks for the answer. However, What if my queries are "SELECT * from XYZ where col1 IN ('ABC', '....', '...') AND col2 IN ('XYZ', 'DEF')". Will it make sense to partition the data? I think voltdb will have to go to all partitions to fetch the data.

  6. #6
    Super Moderator
    Join Date
    Jun 2013
    Posts
    23
    Yes. If your table XYZ is partitioned on col1, VoltDB has to fetch data from multiple partitions for your new query. In generally, if your table XYZ has been read a lot without updating very often, you can design that table to be replicated if you can expect the size of the table is not too big for your cluster.

    For the questions like "Whether a table needs to be partitioned or not", you can find more information on page with link next:
    http://voltdb.com/docs/UsingVoltDB/C...esignPartition

    By the way, VoltDB currently does not support using multiple IN with an index. So for your query with index on (col1,col2), VoltDB only use this index for the first IN filter col1, the second filter for col2 will be as a normal predicate. That means VoltDB has to scan all the records satisfying the first IN filter in the table XYZ to validate the second filters. More query optimization with multiple IN filters are in our backlog with ticket number ENG-5549: https://issues.voltdb.com/browse/ENG-5549

    Please let us know if you have more questions.

    Xin



    Quote Originally Posted by atul.pundhir View Post
    Thanks for the answer. However, What if my queries are "SELECT * from XYZ where col1 IN ('ABC', '....', '...') AND col2 IN ('XYZ', 'DEF')". Will it make sense to partition the data? I think voltdb will have to go to all partitions to fetch the data.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •