Forum: Building VoltDB Applications

Post: Is my index not being used?

Is my index not being used?
pmarch
Aug 11, 2011
Let's say I have a couple tables and associated indexes:

CREATE TABLE Comment (
Comment_ID INT NOT NULL ,
User_ID INT NOT NULL ,
Parent_Thing_ID INT NOT NULL,
Parent_Comment_ID INT NOT NULL,
Conversation_ID INT NOT NULL,
Comment_Type_ID INT NOT NULL,
User_Group_ID INT NOT NULL,
Text VARCHAR(1024) NOT NULL,
Created BIGINT NOT NULL ,
Updated BIGINT NOT NULL ,
Deleted BIGINT ,
Delete_User_ID INT ,
Is_Deleted INT NOT NULL ,
PRIMARY KEY (Comment_ID) );
CREATE INDEX Comment_IDX_1 ON Comment ( User_ID );
CREATE INDEX Comment_IDX_2 ON Comment ( Is_Deleted );
CREATE INDEX Comment_IDX_3 ON Comment ( User_ID, Is_Deleted );

CREATE TABLE Temp_ID_Table_1 (
ID INT NOT NULL ,
PRIMARY KEY (ID) );

The Comment table is populated with a little over 3 million rows. The Temp... table can be populated with anywhere from 0-10000 rows (the timings below are for 10000). The number of Comment records with Is_Deleted = 0 is the total record count minus a few thousand (so, almost all of them).

Query 1: Runs in ~500ms
SELECT C.User_ID, COUNT(C.Comment_ID) as comments FROM Comment C, Temp_ID_Table_1 TEMP WHERE C.User_ID = TEMP.ID AND C.Is_Deleted = 0 GROUP BY C.User_ID;

Query 2: Runs in ~5 ms
SELECT C.User_ID, COUNT(C.Comment_ID) as comments FROM Comment C, Temp_ID_Table_1 TEMP WHERE C.User_ID = TEMP.ID GROUP BY C.User_ID;

Can you explain why Query 1 with the inclusion of C.Is_Deleted = 0 in the WHERE clause is so much slower than when it is left out? I tried using only IDX_1 and IDX_2 - then only IDX_3 - then all of them together.

I'm seeing behavior like this in more than this case - it's as if indexes are used in one case and not the other - is there anything I should be doing differently to get similar speed in both cases?
You can use join order hinting
nshi
Aug 12, 2011
pmarch,

The reason query 1 takes much longer than query 2 is because the planner plans it as an index scan on the Comment table using Comment_IDX_2, followed by a nest-loop index join on the two tables. Since records with Is_Deleted = 0 is close to the total record count in Comment, it takes much longer.

On the other hand, query 2 is planned as a sequential scan on the Temp table, followed by a nest-loop index join. The planner did not do this for query 1 because it thinks a sequential scan is much more expensive than an index scan.

In this case, you can use join order hinting to force the planner to plan query 1 the same way it plans query 2. To do so, you provide a list of comma-separated table names as a second argument to the constructor of the SQLStmt class. The order you list the tables will be used as the join order. As an example, for query 1, you can write it as the following,

new SQLStmt("SELECT C.User_ID,  COUNT(C.Comment_ID) as comments FROM Comment C, Temp_ID_Table_1 TEMP  WHERE C.User_ID = TEMP.ID AND C.Is_Deleted = 0 GROUP BY C.User_ID;",  "Temp_ID_Table_1,Comment");


The planner will plan it using index Comment_IDX_1. If you do not have Comment_IDX_1 and Comment_IDX_2, it will use Comment_IDX_3.

Ning
That's great info...
pmarch
Aug 12, 2011
pmarch,

The reason query 1 takes much longer than query 2 is because the planner plans it as an index scan on the Comment table using Comment_IDX_2, followed by a nest-loop index join on the two tables. Since records with Is_Deleted = 0 is close to the total record count in Comment, it takes much longer.




Your explanation of why join order hinting would help and your example of how to use it in this case was most helpful. I believe with a little thought, this will help me in a couple other areas as well. Thank you for your excellent reply.
Explain plans
nshi
Aug 12, 2011
Your explanation of why join order hinting would help and your example of how to use it in this case was most helpful. I believe with a little thought, this will help me in a couple other areas as well. Thank you for your excellent reply.


pmarch,

VoltDB compiler writes the plans it generates for SQL statements in a directory called "debugoutput" in your application root. If you look into the "statement-winner-plans" subdirectory, you will find the plans for all your SQL statements.

Ning