The query performance of a distribute hypertable depends heavily on the ability of the planner to push down computations to data nodes. Without a way to push down computations, the access node will fetch the raw data from each data node and then perform any necessary computations locally. Therefore, queries that can be pushed down and involve many data nodes are more likely to see better performance on a distributed hypertable compared to a regular hypertable. In particular, an aggregate computation that down-samples data is a good candidate to push down since it both distributes the computational workload across data nodes and reduces the amount of data that needs to be transferred in the result set.
The planner can consider two methods to push down aggregates:
FULL: fully pushing down an aggregate offloads all the
computation to data nodes and the access node only appends the
individual results. To fully push down an aggregate computation,
GROUP BY clause must include all the partition keys
(dimension columns) or only the first space partition key. An
example is computing the
max temperature for each location
SELECT location, max(temperature) FROM conditions GROUP BY location;
Given that we use
location as a space partition, each data node
can compute the max on its own distinct subset of the data.
PARTIAL: if it is not possible to fully push down the
aggregate, the planner can instead consider a partial aggregation
on data nodes. In this case, the access node offloads most of the
computation to data nodes, yielding a partial result that is sent
back and finalized on the access node by combining all the partials
from each data node. An example is computing the
each data node computes a local max and then the access node
finalizes the result by computing the
max over all the data
SELECT max(temperature) FROM conditions;
Other types of computations that can be pushed down include sorting
operations, groupings, and joins. Joins on data nodes are currently
unsupported, however. To see how the a query is pushed down to a data
EXPLAIN VERBOSE on the query and inspect the query plan
and the remote SQL statement sent to each data node.
The query planner might not always be able to push down queries, however, or it can only push down parts of it:
random()function depends on the current seed and the state of the pseudo-random sequence. If the function was to be pushed down to each data node, it would not generate a valid pseudo-random sequence from the point of view of the access node that executes the query. Another example is the
now()function to get the current transaction time. This function depends on the current time zone setting on each node.
TimescaleDB employs several optimizations to increase the likelihood
of being able to push down queries and getting around some of the
above limitations. For instance, to get around the limitation of not
pushing down the
now() function, the function is constified on the
access node so that the resulting timestamp is instead pushed down to
the data nodes.
Found an issue on this page?Report an issue!