No announcement yet.

Avg with materialized views.

  • Filter
  • Time
  • Show
Clear All
new posts

  • Avg with materialized views.

    How come we can't avg with Materialized views? I assume cause it would have recalculate all the time right?

    So what kind of analytics are materialized views used mostly for? Is it mostly to count stuff?

    So how do we do analytics for avg?

  • #2
    Yes, the columns in a materialized view are recalculated every time the underlying table is modified. There are restrictions on how a view can be defined so that the values can be maintained incrementally.

    To avoid introducing floating point arithmetic error with every increment of an AVG() column, it would be better to internally store SUM() and COUNT() separately, and calculate the AVG() only when the view is queried. We simply haven't prioritized something like this yet, as it is very easy to define SUM() and COUNT() columns in the view and divide them to get the average when querying the view.

    Average is a very common use case, as well as counts, sums, mins and maxes. It is also common to use expressions or functions within the GROUP BY columns, so you can get metrics for a unit of time, or other derived groupings. For example, if you wanted to get the average value for some sensor minute by minute, you might do something like this:

    CREATE TABLE sensor_values (
    sensor_id BIGINT not null,
    value DECIMAL,
    PARTITION TABLE sensor_values ON COLUMN sensor_id;

    CREATE VIEW sensor_values_minutely AS
    TRUNCATE(MINUTE,ts) as minute,
    COUNT(*) as num_readings,
    SUM(value) as sum_value
    FROM sensor_values
    GROUP BY sensor_id, TRUNCATE(MINUTE,ts);

    Then you can query the last however many minutes of average readings for a given sensor like this:

    SELECT sensor_id, minute, sum_value/num_readings AS avg_value
    FROM sensor_values_minutely
    WHERE sensor_id = ? and minute > ?;

    Each view automatically has a primary key index on the GROUP BY columns. So if you wanted the values for ALL sensors within a given range of minutes, you might then define the view with the minute column first, followed by the sensor_id. Or, if you needed to query both ways, you could add a secondary index with the columns in the opposite order.