This is an historical document, which outlines the primary differences between Timescale 1 and Timescale 2.0. Timescale 2.0 was released in February, 2021. For release notes pertaining to the most recent version of Timescale, see the current release section.
TimescaleDB 2.0 introduces new features and capabilities to its advanced relational database for time-series data. Driven by user feedback, 2.0 is a major milestone and introduces the first multi-node, petabyte-scale relational database for time-series. In addition to multi-node capabilities, this release includes new features, and improvements to existing ones, focused on giving you more flexibility, control over your data, and the ability to customize behavior to suit your needs.
To facilitate many of the improvements in TimescaleDB 2.0, several existing APIs and function definitions have been modified which may require updates to your existing code.
Most notably, the following API and PostgreSQL compatibility changes may impact existing code using these interfaces. If your workloads use any of the features, this document covers each in detail to help you understand what action you need to take.
- Dropping support for PostgreSQL versions 9.6 and 10: As mentioned in our upgrade documentation, TimescaleDB 2.0 no longer supports older PostgreSQL versions. You need to be running PostgreSQL version 11 or 12 to upgrade your TimescaleDB installation to 2.0.
- Continuous aggregates: We have made major changes in the creation and management of continuous aggregates to address user feedback.
- Data retention and compression policies: All policies are now managed through a unified API and a consolidated set of Views.
- Informational views: Based on user feedback and our desire to make TimescaleDB easy to manage, multiple information views have been consolidated for better clarity.
The following migration document provides more information on each of these API changes and how they affect users of TimescaleDB 1.x . We've also included a bit of information about the impetus for these changes and our design decisions. For a more in-depth coverage of APIs, as well as a detailed description of new features included in 2.0, such as distributed hypertables and the ability to run user-defined actions (custom background jobs), please review our updated documentation (navigate to specific features of interest).
Read this guide to understand the impact of upgrading to the latest version and to evaluate the effects on your application and infrastructure. Then, for upgrade instructions and recommendations, see the documentation on upgrading to TimescaleDB 2.0.
It's been two years since we released TimescaleDB 1.0 in October 2018 with the ambition to make it easy to scale PostgreSQL for time-series workloads. Since that initial release, we've added many new capabilities: support for compression, continuous aggregates, data lifecycle management, and numerous optimizations, as well as two new underlying PostgreSQL versions (11 and 12). At the same time, the core of TimescaleDB hasn't changed significantly: we still have hypertables, auto-partitioning, query optimizations, and a basic user experience that many have come to appreciate because things "just work."
Still, as with any technology solution, there are some things that we didn't get 100% right in our first release. In particular, as usage of TimescaleDB has skyrocketed, many users have told us that our informational views aren't always clear and consistent, nor do they provide enough detail on what is happening in the background. Likewise, while it is "easy" to create a continuous aggregate, it wasn't always clear why aggregates sometimes returned no data, or why new raw data is slow to be materialized. As a side effect of creating a continuous aggregate, data retention on hypertables can become blocked, and understanding how to re-enable it is a common support question.
While our ambition to provide a solution that "just works" remains, too much "magic" and automation behind the scenes related to features like hypertables, compression, and continuous aggregations can lead to outcomes not in line with user expectations or intentions. And, with a general database like PostgreSQL, the expected behavior varies widely depending on the use case (and users' expertise). This has led us to recommit to the user experience: simplifying APIs and making them more consistent, yet at the same time empowering users with more control to customize behaviors when needed.
For example, in TimescaleDB 2.0 we've separated the automation of refreshing continuous aggregate data from the core functionality, giving users the option to manually refresh the data and, if desired, add automation via a policy. This also makes this feature consistent with TimescaleDB's other policy-driven features such as retention, reordering, and compression, which offer both manual control and automation using policies. We have also opened up our jobs scheduling framework to enable user-defined actions. User-defined actions are custom background jobs that you can define yourself.
In the rest of this document, we go through each of the features and API changes in detail, and what users migrating from an earlier version of TimescaleDB should consider prior to updating to TimescaleDB 2.0.
In TimescaleDB 2.0, we have made changes to existing APIs for working with hypertables, as well as improvements to the related information views and size functions. These views and functions provide information about basic configuration, partitioning, data chunks, and disk size, and they also have been updated to work for distributed hypertables.
The following APIs to create and configure hypertables have changed:
main_tableparameter has been renamed to
relation, and additional parameters for distributed hypertables have been added.
main_tableparameter has been renamed to
Consistent with our desire to improve visibility into all aspects of TimescaleDB configuration, the following views and functions about hypertable information have been updated or added:
- The view with basic information about hypertables has been renamed from the singular "hypertable".
- Some columns have new names for consistency with other views.
- Table size information has been removed and made available through new size functions discussed later.
- Additional columns have been added related to distributed hypertables.
- The view no longer shows internal hypertables for continuous aggregates and compression.
- For continuous aggregates, the internal materialized hypertable name is available in the
timescaledb_information.dimensions: A new view allows users to see partitioning information and settings for various dimensions, such as the chunk time interval or number of space partitions used in a hypertable.
timescaledb_information.chunks: A new view allows users to see information about individual data chunks of all hypertables, including the tablespace or data node on which each chunk is stored.
show_chunks(relation): The function now requires providing a hypertable or continuous aggregate identifier as the first argument, which is consistent with
drop_chunks(relation). Previously, it was possible to view the chunks of all hypertables by eliding the hypertable argument. To view all chunks in the database, we instead recommend using the new chunks view described above.
These views can be used together to answer certain questions. For example:
Q: Get all chunks written to tablespace "ts1" during the past month:
Q: Get compression status of all chunks on hypertables with compression enabled:
Information views no longer display size information about hypertables and other objects. Instead, size information is available through a set of functions that all return the size in number of bytes. Removing size information makes the views faster since the information is often read dynamically from disk, or, in the case of distributed hypertables, read across a network.
Size functions are also split into basic and detailed. Basic functions return only a single aggregate value and can be easily applied in queries, while the detailed functions return multiple columns and possibly multiple rows of information.
hypertable_detailed_size(hypertable): The function has been renamed from
hypertable_relation_size(hypertable). Further, if the hypertable is distributed, it returns multiple rows, one per each of the hypertable's data nodes.
hypertable_size(hypertable): Returns a single value giving the aggregated hypertable size, including both tables (chunks) and indexes.
chunks_detailed_size(hypertable): Returns the size information about each of the chunks in a hypertable. On a distributed hypertable, this function returns one row per data node that holds a copy of the chunk.
hypertable_index_size(index): Returns the aggregate number of bytes corresponding to a hypertable index across all chunks.
approximate_row_count(relation): The function has been renamed from
hypertable_approximate_row_count. It can now also be called on regular SQL tables.
In previous versions of TimescaleDB, you could get size information for all hypertables in the
In TimescaleDB 2.0, you can now instead combine the new
hypertables view with size functions to achieve a similar result:
Major changes have been made to continuous aggregates in TimescaleDB 2.0 to better clarify this feature.
First, continuous aggregates have always been more closely aligned with materialized views in PostgreSQL. Therefore,
creating a continuous aggregate now uses
CREATE MATERIALIZED VIEW, rather than
Second, the continuous aggregate API now separates the explicit mechanism for updating a continuous aggregate from the policy that automates the process of keeping a continuous aggregate up-to-date. This change both simplifies the continuous aggregate API in TimescaleDB 2.0, provides more flexibility to users (especially when combined with user-defined actions and a newly exposed API for scheduling jobs directly), and makes it consistent with other policy automation in TimescaleDB 2.0:
|Policy API for Automation
In practice, this means that creating a continuous aggregate in TimescaleDB 2.0 is now a two-step process:
- Create via a
CREATE MATERIALIZED VIEWstatement
- Add an (automation) policy on the continuous aggregate via a separate API function call
In the example above,
CREATE MATERIALIZED VIEWcreates a continuous aggregate without any automation yet
associated with it. Notice also that
WITH NO DATA is specified at the end. This prevents the view from
materializing data at creation time, instead deferring the population of aggregated data until the policy runs
as a background job or as part of a manual refresh. Therefore, we recommend that users create continuous aggregates
WITH NO DATA option, especially if a significant amount of historical data can be materialized.
Once the continuous aggregate is created, calling
add_continuous_aggregate_policy creates a continuous
aggregate policy, which automatically materializes or refreshes the data following the schedule and rules
provided. Inputs to the policy function include the continuous aggregate name, a refresh window, and a
schedule interval. The refresh window is specified by the start and end offsets, which are used to calculate
a new refresh window every time the policy runs by subtracting the offsets from the current time (as normally
returned by the function
It is worth noting the way that
end_offset work as new data arrives and is added to the source hypertables.
This example sets the refresh interval between four weeks and two hours
end_offset). Therefore, if any late data
arrives with timestamps within the last four weeks that is backfilled into the
source hypertable, the continuous aggregate view is updated with the old
data the next time the policy executes.
This policy can, in the worst case, materialize the whole window every time it runs if data at least four weeks old continues to arrive and be inserted into the source hypertables. However, since a continuous aggregate tracks changes since the last refresh, it can in most cases materialize a subset of the window that corresponds to the data that has actually changed.
In this example, data backfilled more than 4 weeks ago is not rematerialized, nor does the continuous aggregate
include data less than 2 hours old. However, querying the continuous aggregate view can still return aggregates
about the latest data, and not just aggregated data more than 2 hours old, based on support for
specified as before with the
timescaledb.materialized_only=false parameter. Real-time aggregates are still the default
setting unless otherwise specified.
Finally, it is recommended that the
end_offset lags the current time by at least one
in the aggregate SQL, otherwise it might affect performance when inserting new data, which usually is written to
what would be the latest bucket. In TimescaleDB 1.x, the
refresh_lag parameter was used for a similar purpose,
but we found that using it correctly was more difficult to understand.
TimescaleDB 2.0 removes support for
REFRESH MATERIALIZED VIEW in favor of the new, more flexible function,
refresh_continuous_aggregate, which enables
a user to refresh a specific window in a continuous aggregate:
Users can use this command explicitly; a typical use of this command is to manually refresh historical data
while creating a continuous aggregate policy to aggregate new data. It is even possible to build custom
continuous aggregation policies using this function within the new user-defined action framework which are
then scheduled via
refresh_continuous_aggregate only recomputes the aggregated time buckets that completely fall
inside the given refresh window and are in a region that has seen changes in the underlying hypertable.
Thus, if no changes have occurred in the underlying source data (that is, no data has been backfilled to the
region or no updates to existing data have been made), no materialization is performed over that
region. This behavior is similar to the continuous aggregate policy and ensures more efficient operation.
With greater power and flexibility also comes a greater responsibility to understand how features interact. Specifically, users should understand the interactions between data retention policy settings and continuous aggregate settings.
Before starting the upgrade to TimescaleDB 2.0, we highly recommend checking the database log for errors related
to failed retention policies that were occurring in TimescaleDB 1.x and then either removing them or updating them
to be compatible with existing continuous aggregates. Any remaining retention policies that are still incompatible
ignore_invalidation_older_than setting is automatically disabled with a notice during the upgrade.
As an example, if a data retention policy on a hypertable is set for
drop_after => '4 weeks', then the policy
associated with a continuous aggregate on that same hypertable should have a
start_offset less than or equal
to 4 weeks. Similarly, any manual call to
refresh_continuous_aggregate should likely specify a
that's also less than the date from 4 weeks ago.
If not understood properly, users could overwrite existing aggregated data in continuous aggregates with empty data by recomputing data over a now-dropped time window of data, not likely the result a user was expecting. Instead, users seeking to keep only a certain time window of data in their continuous aggregate view should create a data retention policy on the continuous aggregate itself.
This differs significantly from how TimescaleDB 1.x handled conflicts between retention policies and continuous aggregates.
Previously, if the continuous aggregation setting
ignore_invalidation_older_than overlapped with data that would
be dropped by a retention policy, the retention policy would silently fail. Making the retention policy work again
required users to modify settings in either the retention policy or the continuous aggregate, and even then some
data wasn't always materialized as expected.
After upgrading to TimescaleDB 2.0, retention policies no longer fail due to incompatibilities with continuous aggregates and users have to ensure that retention and continuous aggregate policies have the desired interplay.
Another change in 2.0 is that
drop_chunks and the retention policy no longer
automatically refresh continuous aggregates to account for changes in original hypertable
after the last refresh. Previously, the goal was to ensure that all updates were processed
prior to dropping chunks in the original hypertable. In practice, it often didn't work as intended.
In TimescaleDB 2.0 users can ensure that all updates are processed before dropping data by
combining the following experimental function, which refreshes updates in the given chunk,
The example below demonstrates how to use the chunk-based refresh function to define a retention policy, which ensures that all updates to data in the original hypertable are refreshed in all continuous aggregates prior to dropping chunks older than 2 weeks:
In TimescaleDB 1.x, data that was backfilled into hypertables wasn't handled
optimally. Modification of any hypertable data, regardless of how old, would
cause the continuous aggregate materializer job to restart from the point of the
earliest backfill and then materialize from that point forward. Unfortunately,
this could also cause the materializer to get "stuck," since the background job
only processed a limited amount of data per run, as governed by the
max_interval_per_job setting. When this happened, one job run could
potentially force the next job to start all over again.
To prevent this, the
ignore_invalidation_older_than setting could be used to ignore backfill data older than
a specified interval (for example, ignore backfill older than 1 week), preventing the materializer from restarting if
hypertable data was modified beyond this interval boundary. However, this would also stop TimescaleDB 1.x from
tracking changes in the underlying data beyond the
ignore_invalidation_older_than threshold too. This meant
that it was not possible to revert this setting later to a larger interval (for example, 1 month) without potentially
having a mismatch between the raw data and the aggregated data in the continuous aggregate.
In contrast, TimescaleDB 2.0 never stops tracking backfill, and to avoid materializing too much historical data, one should simply use a refresh window that does not include that region of data. The backfilled region can always be refreshed at a later time, either manually or via a policy.
To ensure that previously ignored backfill can be refreshed after the upgrade to TimescaleDB 2.0, the upgrade
process marks the region older than the
ignore_invalidation_older_than threshold as "requiring refresh."
This allows a manual refresh to bring a continuous aggregate up-to-date with the underlying source data. If
ignore_invalidation_older_than threshold was modified at some point to a longer interval, we recommend
setting it back to the smaller interval prior to upgrading to ensure that all the backfill can be refreshed,
if one so desires.
Note again, however, that if backfill was previously ignored due to a retention policy on the underlying hypertable, a manual refresh of older data into a continuous aggregate could remove data when hypertable chunks have been dropped due to a data retention policy as discussed in the previous section.
In TimescaleDB 2.0, views surrounding continuous aggregates (and other policies) have been simplified and generalized.
timescaledb_information.continuous_aggregates: now provides information related to the materialized view, which includes the view name and owner, the real time aggregation flag, the materialization and the view definition (the select statement defining the view).
timescaledb_information.jobs: displays information for all policies including continuous aggregates.
timescaledb_information.job_stats: displays job statistics related to all jobs.
timescaledb_information.continuous_aggregate_stats: Removed in favor of the
job_statsview mentioned above.
If you have existing continuous aggregates and you update your database to TimescaleDB 2.0, the update scripts automatically reconfigure your continuous aggregates to use the new framework.
In particular, the update process should:
- Maintain the same view schema, name, owner, and view definition as before, as well as the setting for
- Schedule the refresh to run at the same interval as before (although the parameter is now named
- Automatically configure
start_offsetto the same offset as specified by the old
- Automatically configure
end_offsetto have an offset from
now()equivalent to the old
- Mark all the data older than the interval
ignore_invalidation_older_thanas out-of-date, so that it can be refreshed.
- Disable any retention policies that are failing due to being incompatible with the current setting of
ignore_invalidation_older_thanon a continuous aggregate (as described above). Disabled policies remain after upgrade, but are not scheduled to run (
timescaledb_information.jobs). If failing policies were to be migrated to 2.0 they would start to work again, but likely with unintended consequences. Therefore, any retention policies that are disabled post update should have their settings carefully reviewed before being enabled again.
You can validate these in the proper informational views, given above.
Other minor changes were made to various APIs for greater understandability and consistency, including in the following areas.
drop_chunks: This function now requires specifying a hypertable or continuous aggregate as the first argument, and does not allow dropping chunks across all hypertables in a database. Additionally, the arguments
cascade_to_materializationswere removed (and behave as if the arguments were set to
falsein earlier versions). In TimescaleDB 2.0, we instead recommend creating a separate retention policy on each continuous aggregate.
remove_retention_policy: Creating (or removing) a data retention policy now has explicit functions. Additionally, the arguments
cascade_to_materializationswere removed (and behave as if the arguments were set to
falsein earlier versions).
timescaledb_information.jobs: General information about data retention policies are now available in the main jobs view.
add_drop_chunks_policy: removed in favor of the explicit functions above.
timescaledb_information.drop_chunks_policies: view has been removed in favor of the more general jobs view.
remove_compression_policy: Creating (or removing) a compression policy now has explicit functions.
hypertable_compression_stats(hypertable): The function returns statistics only for hypertables with compression enabled.
chunk_compression_stats(hypertable): The function returns information about currently compressed chunks.
timescaledb_information.compression_settings: This new view gives information about the compression settings on hypertables.
timescaledb_information.jobs: General information about compression policies are now available in the main jobs view.
add_compress_chunk_policy: Removed in favor of the explicit functions above.
timescaledb_information.compressed_hypertable_stats: Removed in favor of the new
hypertable_compression_stats(hypertable)function linked above
timescaledb_information.compressed_chunk_stats: Removed in favor of the new
chunk_compression_stats(hypertable)function linked above.
TimescaleDB 2.0 introduces user-defined actions and creates a more unified jobs API. Now, jobs created by the TimescaleDB policies and for user-defined actions can be managed and viewed through a single API.
add_job: Adds a new user-defined action to the job scheduling framework.
alter_job: Changes settings for existing jobs. Renamed from
alter_job_schedulein previous versions, it introduces additional settings, including
scheduledto pause and resume jobs, and
configto change policy or action-specific settings.
run_job: Manually executes a job immediately and in the foreground.
delete_job: Removes the job from the scheduler. This is equivalent to functions that remove policies for built-in actions (for example,
timescaledb_information.jobs: The new view provides all job settings available, and it replaces all policy-specific views.
timescaledb_information.jobs_stats: The view presents statistics of executing jobs for policies and actions.
In TimescaleDB 2.0, all features which had been classified previously as "enterprise" have become "community" features and are available for free under the Timescale License. As such, the need for an "enterprise license" to unlock any features has been removed. All features are available either under the community Timescale License or under the open-source Apache-2 License. This blog post explains the changes. The following changes were made to license API:
timescaledb_information.license: This view has been removed, as it primarily provided information on the enterprise license key's expiration date, which is no longer applicable. The current license used by the extension can instead be viewed in the GUC below.
timescaledb.license: This GUC value (which replaces the former
timescaledb.license_keyGUC) can take the value
apache. It can be set only at startup (in the postgresql.conf configuration file or on the server command line), and allows limiting access to certain features by license. For example, setting the license to
apacheallows access to only Apache-2 licensed features.
Found an issue on this page?Report an issue!