No announcement yet.

Single partition stored procedures clarifications

  • Filter
  • Time
  • Show
Clear All
new posts

  • Single partition stored procedures clarifications

    Hi everyone,

    I recently posted a question regarding single partition stored procedure .

    I was under the impression that in order for a custom stored procedure do be single partition we need 2 things:
    1. when defining the new custom stored procedure in the server we need to use partition on in the sqlcmd.
    2. in the stored procedure sql we need to execute queries that work on a partition (for example "select * from sometable where id=1" where id is the partition for sometable). However looking at the ContinuousDeleter example I encountered the DeleteAfterDate stored procedure. It seems like the custom delete stored procedure receives the partitionValue as an input parameter (the first argument to the run method partitionValue) even though it (the custom stored procedure) doesn't use it anywhere in its code.

    And so my question is: is it a requirement for a single stored procedure to receive the partitionkey as its argument or is it just a mistake in the example code? If it's a requirement could you explain what happens behind the scenes with this value? Going back to basics on how to create a stored procedure it seems like passing the partition value as the first argument to the run method in the custom stored procedure is not a requirement, but the DeleteAfterDate example shows something different.

    Another question I had regarding writing my own custom stored procedure is: you can also annotate a stored procedure as single partition using the @procInfo or define it when you save it in the server using "partition on", what's the difference between the 2? is there any benefit for using 1 of the approaches vs the other?

    As always, thanks in advance for the help :)

  • #2
    This is an astute question. While the most common case is as you mentioned, where the partition on <tablename> column <columname> in the DDL to corresponds to "SELECT * FROM <tablename> WHERE <columname> = ?" in the query, but that does not have to be the case. It is possible to run SQL statements within a single-partition procedure that do not use the partition key input parameter. You cannot insert a record into partitioned table in the wrong partition, that will cause an error which will roll back the transaction. But you can run queries against replicated tables that do not involve the partition key, or even against partitioned tables if you wish.

    The ContinuousDeleter client and DeleteAfterDate procedure are examples of the "Run Everywhere" pattern, in which the partitioning parameter to the stored procedure is arbitrary and used to cause the procedure to run on a single partition, but the SQL is then not necessarily specific to that partition. This can be useful for low-priority maintenance type activities like deleting expired records, where you want to run in a single-partition procedure for efficiency, or sometimes if you want to run queries on a sampling of data, or if you wanted to use client-side processing on the results of queries that were forced to run independently in each partition. There is a blog about this pattern here:


    • #3
      Specifically in answer to your questions, the syntax for partioning a procedure requires a table and column name, but also can have a PARAMETER index number which is optional because it defaults to 0 (the first parameter, it counts by z). If the partition key value was the second parameter, rather than the first, then you would have to specify "PARAMETER 1". The little known truth is that you could specify any table name and column name you wanted, as long as the column is in fact the partition key of that table, and the datatype of the column matches the datatype of the input parameter. What determines which partition the procedure runs in is the VALUE of the input parameter.

      The @ProcInfo annotation is an older way of partitioning the procedure that has been deprecated. There was no advantage to using it. The DDL syntax is recommended.


      • #4
        Thanks for the explanation it made a lot of clarifications.

        So just to make sure I got it right: if my SQL queries are "Run Everywhere" and I want to make them single partition, I need a way to let VoltDB know how to send the stored procedure to each partition. By passing the partition value to the run method I allow these "Run Everywhere" methods to be executed as single partition?

        The more I get to know the less I find that I understand :|


        • #5
          A procedure is either partitioned or not. If it is partitioned, the value of the parameter is what determines which partition it will run in. Once there, the database will check against illegal inserts that violate the partitioning key of tables, but otherwise the procedure is allowed to run any SQL, however the SQL statements executed within a single-partition procedure can only access data within that partition.

          "Run Everywhere" is just how we describe the pattern of usage where you have partitioned a procedure somewhat arbitrarily and call the procedure once for each partition using values provided from the GetPartitionKeys system procedure.


          • #6
            Got it. Thanks a lot