Announcement

Collapse
No announcement yet.

Partitioning: Table Data and Stored Procedures

Collapse
This is a sticky topic.
X
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Partitioning: Table Data and Stored Procedures

    This post is to help explain single-partition stored procedures that operate on multiple single-partition tables (select, insert, update, delete).
    Sample Schema

    create table table1 (col1 integer not null, value1 varchar(20) not null); create table table2 (col2 integer not null, col1 integer not null, value2 varchar(20) not null); Both tables are partitioned, table1 on col1 and table2 on col2.
    Sample Stored Procedure

    @ProcInfo( partitionInfo = "TABLE1.COL1: 0", singlePartition = true ) public class SampleStoredProcedure extends VoltProcedure { public final SQLStmt selectData = new SQLStmt("select t1.value1, t2.value2 from table1 t1, table2 t2 where t1.col1 = t2.col1 and t1.col1 = ?;"); public VoltTable[] run( int col1 ) { voltQueueSQL(selectData, col1); return voltExecuteSQL(true); } } The issue with the above code is that, as a single-partition stored procedure, it will only operate on data within that partition. Since table1 is partitioned on col1 and table2 is partitioned on col2, the select statement will only include rows from table2 that exist in the partition and not return a complete result set.
    The above code will return the correct result set if you change the partition column of table2 to be col1.
    Rule: You can only join partitioned tables in a stored procedure if they are partitioned on the same key.
    -Tim

  • #2
    Tim, Can a single table be

    Tim, Can a single table be partitioned on mutiple columns and if yes then will the joins on the table work if table2 has partition on col1 and col2 and we join table 1 and table 2 using col1 ?

    Comment


    • #3
      re: partitioning on multiple columns

      Originally posted by shetty_ritesh View Post
      Tim, Can a single table be partitioned on mutiple columns and if yes then will the joins on the table work if table2 has partition on col1 and col2 and we join table 1 and table 2 using col1 ?
      Ritesh,
      In the current release of VoltDB a table can only be partitioned on a single column.
      If you have two discreet INTEGER columns you could create a BIGINT that is the composite of both values as your partitioning column or do something similar for VARCHAR types.
      -Tim

      Comment


      • #4
        I have a problem and I suppose it is related to the one described here, but I do not see how.
        I have 12 tables (emp1, emp2 ... emp12), all identical, defined as this one:

        CREATE TABLE EMP1 (
        EMPNO INTEGER NOT NULL,
        ENAME VARCHAR(10) NOT NULL,
        JOB VARCHAR(9),
        MGR INTEGER,
        HIREDATE VARCHAR(19),
        SAL INTEGER,
        COMM INTEGER,
        DEPTNO INTEGER,
        PRIMARY KEY (EMPNO)
        );
        PARTITION TABLE EMP1 ON COLUMN EMPNO;

        I have, also, a stored procedure that does this:

        select count(*)
        from emp1,emp2,emp3,emp4,emp5,emp6,emp7,emp8,emp9,emp10 ,emp11,emp12
        where emp12.EMPNO=? and emp1.EMPNO>=emp2.EMPNO and emp2.EMPNO>=emp3.EMPNO and emp3.EMPNO>=emp4.EMPNO and emp4.EMPNO>=emp5.EMPNO and emp5.EMPNO>=emp6.EMPNO and emp6.EMPNO>=emp7.EMPNO and emp7.EMPNO>=emp8.EMPNO and emp8.EMPNO>=emp9.EMPNO and emp9.EMPNO>=emp10.EMPNO and emp10.EMPNO>=emp11.EMPNO and emp11.EMPNO>=emp12.EMPNO;

        That store procedure is declared as single-partitioned on all tables:

        CREATE PROCEDURE FROM CLASS ProcedureSelectEmp;
        PARTITION PROCEDURE ProcedureSelectEmp ON TABLE Emp1 COLUMN EMPNO;
        PARTITION PROCEDURE ProcedureSelectEmp ON TABLE Emp2 COLUMN EMPNO;
        PARTITION PROCEDURE ProcedureSelectEmp ON TABLE Emp3 COLUMN EMPNO;
        PARTITION PROCEDURE ProcedureSelectEmp ON TABLE Emp4 COLUMN EMPNO;
        PARTITION PROCEDURE ProcedureSelectEmp ON TABLE Emp5 COLUMN EMPNO;
        PARTITION PROCEDURE ProcedureSelectEmp ON TABLE Emp6 COLUMN EMPNO;
        PARTITION PROCEDURE ProcedureSelectEmp ON TABLE Emp7 COLUMN EMPNO;
        PARTITION PROCEDURE ProcedureSelectEmp ON TABLE Emp8 COLUMN EMPNO;
        PARTITION PROCEDURE ProcedureSelectEmp ON TABLE Emp9 COLUMN EMPNO;
        PARTITION PROCEDURE ProcedureSelectEmp ON TABLE Emp10 COLUMN EMPNO;
        PARTITION PROCEDURE ProcedureSelectEmp ON TABLE Emp11 COLUMN EMPNO;
        PARTITION PROCEDURE ProcedureSelectEmp ON TABLE Emp12 COLUMN EMPNO;

        Why, if I run this without partitioning, it shows me about 2M results (that is the right response) and if I run with partitioning enabled, it shows me only 300 results?
        Is the problem you described here? But I'm using always the EMPNO column, wich is the one with the key of the partitioning...
        Thanks in advance for your reply.
        Sincerely,
        Diego

        Comment


        • #5
          Diego,

          All of the "PARTITION PROCEDURE ProcedureSelectEmp ..." statements after the first one are redundant.
          The TABLE and COLUMN specifiers merely identify the type of the procedure's partitioning parameter.
          A procedure is either single-paritition (with respect to all partitioned tables) or it is multi-partition (with respect to all partitioned tables).
          It is the responsibility of every single-partition stored procedure to only issue queries that can be satisfied by data (in ANY table) residing on the single partition associated with the value of the partitioning parameter -- and its type as implied by the PARTITION PROCEDURE statement.
          In the case of data in partitioned tables, this is typically accomplished by a where clause that restricts the partitioning column of the paritioned table(s) to have the value of the partitioning parameter.
          Your stored procedure does not give correct results as a single partition procedure because it does not meet this restriction. Only table emp12 is properly filtered.
          Some queries, this one included, can only be satisfied by accessing data on multiple partitions via a multi-partition procedure or an ad hoc statement. An ad hoc statement is treated as implicitly multi-partitioned unless it is properly filtered.
          For example, if you replaced the ">=" filters in the query with "=" filters, the procedure would properly filter all tables so that they comply with single partition access and would run correctly within or as a single-partition procedure or as a single-partition ad hoc statement.

          Thanks for your question.
          --paul

          Comment

          Working...
          X