Forum: Building VoltDB Applications

Post: Aggregation Over Time Window

Aggregation Over Time Window
Jas
Apr 21, 2014
I have millions of transactions per day and I would like to create a materialized view aggregating the count grouping by the type of transaction over a 3 day rolling time window.

In the press release for VoltDB 4.0 it says "..VoltDB can now build a materialized view that aggregates the value of a JSON field by 5-minute time windows." I am hoping this means that VoltDB supports grouping over a time window.

Does someone know how to do this in VoltDB 4.0?
rmorgenstein
Apr 21, 2014
Here is an example of time-series view. We're working on a new sample application for time-series data, but here's the relevant DDL. The view creates 1 minute buckets based on the timestamped data. You could do the same thing with hourly buckets. When selecting, you might want to ignore the newest bucket - as it is still accumulating inserts until the current minute passes (or hour, for hourly buckets).

CREATE TABLE sensordata

(
recvtime TIMESTAMP NOT NULL
, boxid VARCHAR(8) NOT NULL
, lanenum TINYINT NOT NULL
, hovnum TINYINT NOT NULL
, breakdown TINYINT NOT NULL
, xbound TINYINT NOT NULL
, avgspeed SMALLINT NOT NULL
, vehicles SMALLINT NOT NULL
);

PARTITION TABLE sensordata ON COLUMN boxid;

-- rollup of average speed and vehicle count by boxid, lane number, and minute number
CREATE VIEW v_speed_totals_per_minute
(
boxid
, xbound
, bucket
, cnt
, sumspeeds
, sumcars
)
AS
SELECT boxid
, xbound
, TRUNCATE(MINUTE,recvtime)
, COUNT(*)
, SUM(avgspeed)
, SUM(vehicles)
FROM sensordata
WHERE lanenum > 0
AND breakdown = 0
AND avgspeed > 0
GROUP BY boxid
, xbound
, TRUNCATE(MINUTE,recvtime)
;

CREATE INDEX avgspeed ON v_speed_totals_per_minute (sumspeeds/cnt);
new sample application for time-series data
Jas
Sep 1, 2014
Here is an example of time-series view. We're working on a new sample application for time-series data, but here's the relevant DDL. The view creates 1 minute buckets based on the timestamped data. You could do the same thing with hourly buckets. When selecting, you might want to ignore the newest bucket - as it is still accumulating inserts until the current minute passes (or hour, for hourly buckets).

CREATE TABLE sensordata

(
recvtime TIMESTAMP NOT NULL
, boxid VARCHAR(8) NOT NULL
, lanenum TINYINT NOT NULL
, hovnum TINYINT NOT NULL
, breakdown TINYINT NOT NULL
, xbound TINYINT NOT NULL
, avgspeed SMALLINT NOT NULL
, vehicles SMALLINT NOT NULL
);

PARTITION TABLE sensordata ON COLUMN boxid;

-- rollup of average speed and vehicle count by boxid, lane number, and minute number
CREATE VIEW v_speed_totals_per_minute
(
boxid
, xbound
, bucket
, cnt
, sumspeeds
, sumcars
)
AS
SELECT boxid
, xbound
, TRUNCATE(MINUTE,recvtime)
, COUNT(*)
, SUM(avgspeed)
, SUM(vehicles)
FROM sensordata
WHERE lanenum > 0
AND breakdown = 0
AND avgspeed > 0
GROUP BY boxid
, xbound
, TRUNCATE(MINUTE,recvtime)
;

CREATE INDEX avgspeed ON v_speed_totals_per_minute (sumspeeds/cnt);


Did you get around to creating the new sample application for time-series data. I am interested in creating a time series that has 3 day buckets so the view should have 2 columns, bucket date (last day of the 3 day bucket) and a cnt column with the sum for the last three days.

Thanks.
rmorgenstein
Sep 1, 2014
Yes, it is in the downloadable kit at examples/windowing. You can also get it from our repository at https://github.com/VoltDB/voltdb/tree/master/examples/windowing.

Ruth
Index on View
Jas
Sep 1, 2014
Yes, it is in the downloadable kit at examples/windowing. You can also get it from our repository at https://github.com/VoltDB/voltdb/tree/master/examples/windowing.

Ruth


If the view is aggregating by time seconds and on a device id and the view returns millions of rows is it possible to create an index on the view so the store procedure that does the windowing runs faster?
jhugg
Sep 1, 2014
If the view is aggregating by time seconds and on a device id and the view returns millions of rows is it possible to create an index on the view so the store procedure that does the windowing runs faster?


Are you trying something and having a performance problem?

All views in VoltDB are materialized with an implicit index on the group-by columns. You can add additional indexes to the view, but none should be needed to make the windowing operation itself run faster. One of the points of this example is to demonstrate how fast VoltDB can be at this kind of operation without asking the user to do too much.
Jas
Sep 2, 2014
Are you trying something and having a performance problem?

All views in VoltDB are materialized with an implicit index on the group-by columns. You can add additional indexes to the view, but none should be needed to make the windowing operation itself run faster. One of the points of this example is to demonstrate how fast VoltDB can be at this kind of operation without asking the user to do too much.


I will be testing it soon and will let you know if I run into any issues, but I don't think I should since there is an implicit index on the group-by columns. I guess that is used to quickly refresh the materialized view when data is the base table changes.