Forum: Building VoltDB Applications

Post: Avg with materialized views.

Avg with materialized views.
Jan 24, 2017
How come we can't avg with Materialized views? I assume cause it would have recalculate all the time right?

So what kind of analytics are materialized views used mostly for? Is it mostly to count stuff?

So how do we do analytics for avg?
Jan 24, 2017
Yes, the columns in a materialized view are recalculated every time the underlying table is modified. There are restrictions on how a view can be defined so that the values can be maintained incrementally.

To avoid introducing floating point arithmetic error with every increment of an AVG() column, it would be better to internally store SUM() and COUNT() separately, and calculate the AVG() only when the view is queried. We simply haven't prioritized something like this yet, as it is very easy to define SUM() and COUNT() columns in the view and divide them to get the average when querying the view.

Average is a very common use case, as well as counts, sums, mins and maxes. It is also common to use expressions or functions within the GROUP BY columns, so you can get metrics for a unit of time, or other derived groupings. For example, if you wanted to get the average value for some sensor minute by minute, you might do something like this:

CREATE TABLE sensor_values (
sensor_id BIGINT not null,
value DECIMAL,
PARTITION TABLE sensor_values ON COLUMN sensor_id;

CREATE VIEW sensor_values_minutely AS
TRUNCATE(MINUTE,ts) as minute,
COUNT(*) as num_readings,
SUM(value) as sum_value
FROM sensor_values

Then you can query the last however many minutes of average readings for a given sensor like this:

SELECT sensor_id, minute, sum_value/num_readings AS avg_value
FROM sensor_values_minutely
WHERE sensor_id = ? and minute > ?;

Each view automatically has a primary key index on the GROUP BY columns. So if you wanted the values for ALL sensors within a given range of minutes, you might then define the view with the minute column first, followed by the sensor_id. Or, if you needed to query both ways, you could add a secondary index with the columns in the opposite order.