Forum: Building VoltDB Applications

Post: Should partition key be included in secondary index ?

Should partition key be included in secondary index ?
gambitg
Feb 7, 2012
If a table is partitioned by column A and I wish to have an index on column B, should the secondary index by on (A,B) or just on B ?
Hi, You can put the index on
aweisberg
Feb 7, 2012
Hi,

You can put the index on just B. You do have to be aware of what you are doing because from a single partition procedure the index won't contain entries for rows in other partitions and will contain rows for each value of B that may be from values of column A that are different from the one used to route your stored procedure.

It might help to know more about what your are trying to index on and what query you want to use the index for.

-Ariel
Re: Secondary Index
gambitg
Feb 8, 2012
Hi,

You can put the index on just B. You do have to be aware of what you are doing because from a single partition procedure the index won't contain entries for rows in other partitions and will contain rows for each value of B that may be from values of column A that are different from the one used to route your stored procedure.

It might help to know more about what your are trying to index on and what query you want to use the index for.

-Ariel



The query will always include both A and B in the where clause. So I think what you mentioned (index on just B) should suffice.

Is there a scenario in general, in which one would want to put index on both A and B ?

Thanks.
Hi, If you only make an index
aweisberg
Feb 8, 2012
The query will always include both A and B in the where clause. So I think what you mentioned (index on just B) should suffice.

Is there a scenario in general, in which one would want to put index on both A and B ?

Thanks.


Hi,

If you only make an index on B but query on A,B then the index scan will have to scan all rows that match B even if they have the wrong value of A. If A,B is a primary key you could declare that and that would generate an index. If B is unique for each A then you would save some space in the index by not including A.

An index scan can only use the index up until it is covering so A = ?, ( B >[=] ? | B <[=] ? | B = ?) could use the index, but if the predicate for A is not equality then the index can't be used.

-Ariel
In your case, if column B is
nshi
Feb 8, 2012
The query will always include both A and B in the where clause. So I think what you mentioned (index on just B) should suffice.

Is there a scenario in general, in which one would want to put index on both A and B ?

Thanks.


In your case, if column B is non-unique and it has low cardinality, then having an index on both A and B would help.

In other cases, if an order by on both A and B is required, having a tree index on these two columns would save a sort because an index scan preserves the order.