Forum: Building VoltDB Applications

Post: Catalog report complains about insufficient ordering

Catalog report complains about insufficient ordering
yzhu
Jun 16, 2014
Hi there,

We have this message showin in the catalog report:
Procedure getTag is RW and has a statement whose result has a non-deterministic ordering - statement: "SELECT tag FROM document WHERE master_id = ? AND document_id = master_id ;", reason: index scan may provide insufficient ordering

Here is the table definition. This table stores documents, and each of document may have many parts, both parts and the documents are stored in the same table.
CREATE TABLE Document
(
document_id VARCHAR(255) NOT NULL
,master_id VARCHAR(255) NOT NULL //FK to the master table
,tag VARCHAR(32) NOT NULL
,content VARBINARY(131071) DEFAULT NULL
,PRIMARY KEY (document_id, master_id)
);

CREATE INDEX Document_master_id_idx ON Document (master_id);

PARTITION TABLE Document ON COLUMN master_id;

Here is the SQL select statment in question: This question returns the tag information for a master document:

SELECT tag FROM Document WHERE master_id = ? AND document_id = master_id ; //if

I was wondering if you could help us explain why the catelog report complains about the insufficient ordering and how to resolve this complaint?

Regards,
Yaoling
xin
Jun 16, 2014
Hi Yaoling,

The select query results in your GetTag procedure is not deterministic, which is important to RW stored procedure. A simple way to get around this is to add a order by column on tag, which will be something like
"SELECT tag FROM document WHERE master_id = ? AND document_id = master_id ORDER BY tag;"

The reason that VoltDB require strong determinism for RW stored procedure is that VoltDB wants to get consistent results when replaying the transaction, either on VoltDB replica or data restore.

Let us know if you have more questions.

Thanks,
Xin

Hi there,

We have this message showin in the catalog report:
Procedure getTag is RW and has a statement whose result has a non-deterministic ordering - statement: "SELECT tag FROM document WHERE master_id = ? AND document_id = master_id ;", reason: index scan may provide insufficient ordering

Here is the table definition. This table stores documents, and each of document may have many parts, both parts and the documents are stored in the same table.
CREATE TABLE Document
(
document_id VARCHAR(255) NOT NULL
,master_id VARCHAR(255) NOT NULL //FK to the master table
,tag VARCHAR(32) NOT NULL
,content VARBINARY(131071) DEFAULT NULL
,PRIMARY KEY (document_id, master_id)
);

CREATE INDEX Document_master_id_idx ON Document (master_id);

PARTITION TABLE Document ON COLUMN master_id;

Here is the SQL select statment in question: This question returns the tag information for a master document:

SELECT tag FROM Document WHERE master_id = ? AND document_id = master_id ; //if

I was wondering if you could help us explain why the catelog report complains about the insufficient ordering and how to resolve this complaint?

Regards,
Yaoling
yzhu
Jun 16, 2014
Hi Xin,

Thanks for your reply. Would it increase the latency on a select statement and add additional workloads on the voltDB clusters if we added order-by statement?

My understanding on solving not deterministic queries is to either add an order-by clause to the select statement or using a tree index to scan records in a specific order from the VoltDB documentations. Would the PK and the additional index "Document_master_id_idx" we created not enforce the deterministic rule? Does a tree index have to be on the "select" part instead of "where" part?

We also have more complicated queries such as the following one:

SELECT a.name, b.value, b.context, b.expiry_date from TableX a, TableY b where a.name =? and a.name=b.name and b.expiry_date between ? and ?.

Shall we add the clause "order by name, value, context, expiry_date" to get around this ordering issue?

I look forward to your reply.

Thanks,
Yaoling
xin
Jun 16, 2014
Yes, Adding the extra ORDER BY clause will do extra work indeed. Your select query will pick up the index you defined to use. However, it is not deterministic because of possible duplicates on master_id column. If you can make the index unique, then there is no need for the extra ORDER BY clause.

So If the write query in your stored procedure does not relay on the results of your select queries, you can ignore the warning safely as you do not modify the data randomly.

For performance on your complicated case, you'd better define an [Maybe Hash] index on "name" column for TableX and another tree index on "expiry_date" column for tableY.
If non of your index is unique, you would probably need to add "order by name, value, context, expiry_date". If assuming "a.name" is unique, you may not need the extra ORDER BY clause. Again, if your stored procedure does not modify the data based on your select queries, you can safely ignore the WARNING message.

Let us know if this helps.

Xin