The following instructions show you how to move your data from a self-hosted TimescaleDB multi-node deployment to a Timescale instance using pg_dump and psql. To avoid losing data, take applications that connect to the database offline for the duration of the migration. The migration takes time proportional to the amount of data stored in your database. If you want to migrate more than 400 GB of data with this method, open a support request to ensure that enough disk space is pre-provisioned on your Timescale instance.

You can open a support request directly from the Timescale console, or by email to [email protected].

Note

In the context of migrations, your existing production database is referred to as the "source" database, while the new Timescale database that you intend to migrate your data to is referred to as the "target" database.

For minimal downtime, the pg_dump and psql commands should be run from a machine with a low-latency, high-throughput link to the database that they are connected to.

Before you begin, ensure that you have:

  • Installed the PostgreSQL client libraries on the machine that you will perform the migration from; you require pg_dump and psql.
  • Created a database service in Timescale.
  • Checked that all PostgreSQL extensions you use are available on Timescale. For more information, see the list of compatible extensions.
  • Checked that the version of PostgreSQL in your target database is greater than or equal to that of the source database.
  • Checked that you're running exactly the same version of TimescaleDB on both your source and target databases (the major, minor, and patch versions must all match). For more information, see the upgrade instructions for self-hosted TimescaleDB.
Dump the database roles from your source database into a roles.sql file:

pg_dumpall -d "$SOURCE" \
-l $DB_NAME \
--quote-all-identifiers \
--roles-only \
--file=roles.sql
Important

Some providers like Managed Service for TimescaleDB (MST) and AWS RDS don't support role password dumps. If dumping the passwords results in the error:

pg_dumpall: error: query failed: ERROR: permission denied for table pg_authid

Execute the command again, adding the --no-role-passwords flag. After restoring the roles into the target database, manually set passwords with ALTER ROLE name WITH PASSWORD '<YOUR_PASSWORD>';

Timescale services do not support roles with superuser access. If your SQL dump includes roles that have such permissions, you'll need to modify the file to comply with Timescale's security model.

You can use the following sed command to remove unsupported statements and permissions from your roles.sql file:

sed -i -E \
-e '/CREATE ROLE "postgres";/d' \
-e '/ALTER ROLE "postgres"/d' \
-e 's/(NO)*SUPERUSER//g' \
-e 's/(NO)*REPLICATION//g' \
-e 's/(NO)*BYPASSRLS//g' \
-e 's/GRANTED BY "[^"]*"//g' \
roles.sql
Note

This command works only with the GNU implementation of sed (sometimes referred to as gsed). With the BSD implementation (the default on macOS), you need to pass an empty backup suffix to the in-place flag: -i ''.

To check which sed you have, run sed --version. The GNU version identifies itself explicitly, while the BSD version generally doesn't support a --version flag and simply outputs an "illegal option" error.
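If you're unsure which implementation is installed, a small shell check can distinguish them based on the behavior described above:

```shell
# GNU sed prints a version banner for --version; BSD sed exits with an error.
if sed --version >/dev/null 2>&1; then
  echo "GNU sed"
else
  echo "BSD sed"
fi
```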

A brief explanation of this script is:

  • CREATE ROLE "postgres"; and ALTER ROLE "postgres": These statements are removed because they require superuser access, which is not supported by Timescale.

  • (NO)SUPERUSER | (NO)REPLICATION | (NO)BYPASSRLS: These role attributes require superuser access to grant, so they are stripped.

  • GRANTED BY role_specification: The GRANTED BY clause can also carry permissions that require superuser access and should therefore be removed. Note: per the PostgreSQL documentation, the GRANTOR in the GRANTED BY clause must be the current user, and the clause mainly serves the purpose of SQL compatibility. Therefore, it's safe to remove it.
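To see what the script removes, you can run it against a small sample roles.sql. The role names below are hypothetical, and writing to a new file instead of editing in place sidesteps the GNU/BSD -i difference:

```shell
# Create a sample roles dump containing both supported and unsupported statements.
cat > roles.sql <<'EOF'
CREATE ROLE "postgres";
ALTER ROLE "postgres" WITH SUPERUSER;
CREATE ROLE "app_user";
ALTER ROLE "app_user" WITH NOSUPERUSER NOREPLICATION NOBYPASSRLS LOGIN PASSWORD 'secret';
GRANT "readers" TO "app_user" GRANTED BY "postgres";
EOF

# Same expressions as the command above, but writing to a new file.
sed -E \
  -e '/CREATE ROLE "postgres";/d' \
  -e '/ALTER ROLE "postgres"/d' \
  -e 's/(NO)*SUPERUSER//g' \
  -e 's/(NO)*REPLICATION//g' \
  -e 's/(NO)*BYPASSRLS//g' \
  -e 's/GRANTED BY "[^"]*"//g' \
  roles.sql > roles_clean.sql

# Only the "app_user" statements remain, minus the superuser-only attributes.
cat roles_clean.sql
```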

Note

For the sake of convenience, connection strings to the source and target databases are referred to as $SOURCE and $TARGET throughout this guide. These can be set in your shell, for example:

export SOURCE=postgres://<user>:<password>@<source host>:<source port>/<db_name>
export TARGET=postgres://<user>:<password>@<target host>:<target port>/<db_name>
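As a quick sanity check before starting, you can make the shell abort if either variable is missing. The host names and credentials below are placeholders:

```shell
# Placeholder values; substitute your real hosts, ports, and credentials.
export SOURCE="postgres://alice:secret@source.example.com:5432/tsdb"
export TARGET="postgres://tsdbadmin:secret@target.example.com:5432/tsdb"

# Abort early with a clear message if either variable is unset or empty.
: "${SOURCE:?SOURCE is not set}"
: "${TARGET:?TARGET is not set}"
```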

Migrate your pre-data from your source database to your Timescale instance. This includes table and schema definitions, as well as information on sequences, owners, and settings. This doesn't include Timescale-specific schemas.

  1. Dump the schema pre-data from your source database into a dump_pre_data.dump file, using your source database connection details. Exclude Timescale-specific schemas. If you are prompted for a password, use your source database credentials:

    pg_dump -U <SOURCE_DB_USERNAME> -W \
    -h <SOURCE_DB_HOST> -p <SOURCE_DB_PORT> -Fc -v \
    --section=pre-data --exclude-schema="_timescaledb*" \
    -f dump_pre_data.dump <DATABASE_NAME>
  2. Restore the dumped data from the dump_pre_data.dump file into your Timescale database, using your Timescale connection details. To avoid permissions errors, include the --no-owner flag:

    pg_restore -U tsdbadmin -W \
    -h <HOST> -p <PORT> --no-owner -Fc \
    -v -d tsdb dump_pre_data.dump

After the pre-data migration, the distributed hypertables from your source database exist as regular PostgreSQL tables in your Timescale instance. Recreate them as regular hypertables to restore hypertable behavior.

Note

Distributed hypertables are typically created with additional space dimensions (space partitioning). While these extra dimensions make sense for distributing data across data nodes, they are usually not useful in a standard single-node deployment. Consider dropping them when you convert distributed hypertables into regular hypertables.

  1. Connect to your Timescale database:

    psql "postgres://tsdbadmin:<PASSWORD>@<HOST>:<PORT>/tsdb?sslmode=require"
  2. Restore the hypertable:

    SELECT create_hypertable(
    '<TABLE_NAME>', '<TIME_COLUMN_NAME>',
    chunk_time_interval =>
    INTERVAL '<CHUNK_TIME_INTERVAL>');
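For example, for a hypothetical conditions table that was distributed on its time column (dropping a former space dimension, as suggested in the note above):

```sql
SELECT create_hypertable(
  'conditions', 'time',
  chunk_time_interval => INTERVAL '1 week');
```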

After restoring your hypertables, return to your source database to copy your data, table by table.

  1. Connect to your source database:

    psql "postgres://<SOURCE_DB_USERNAME>:<SOURCE_DB_PASSWORD>@<SOURCE_DB_HOST>:<SOURCE_DB_PORT>/<SOURCE_DB_NAME>?sslmode=require"
  2. Dump the data from the first table into a .csv file:

    \copy (SELECT * FROM <TABLE_NAME>) TO <TABLE_NAME>.csv CSV

    Repeat for each table and distributed hypertable you want to migrate.

Note

If your distributed hypertables are very large, you can migrate each distributed hypertable in multiple pieces. Split each distributed hypertable by time range, and copy each range individually. For example:

\copy (SELECT * FROM <TABLE_NAME> WHERE time > '2021-11-01' AND time < '2021-11-02') TO <TABLE_NAME_DATE_RANGE>.csv CSV
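To script this splitting, you can generate one \copy command per time range and pipe the result to psql. A sketch, assuming a hypothetical metrics table and GNU date:

```shell
# Print a \copy command for each of three one-month ranges of a hypothetical
# "metrics" table. Pipe the output to: psql "$SOURCE"
start=2021-11-01
for i in 0 1 2; do
  from=$(date -d "$start +$i month" +%Y-%m-%d)
  to=$(date -d "$start +$((i + 1)) month" +%Y-%m-%d)
  printf '%s\n' "\\copy (SELECT * FROM metrics WHERE time >= '$from' AND time < '$to') TO metrics_$from.csv CSV"
done
```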

When you have copied your data into .csv files, you can restore it to your Timescale instance by copying from the .csv files. There are two methods: using the regular PostgreSQL COPY command, or using the TimescaleDB timescaledb-parallel-copy tool. The timescaledb-parallel-copy tool is not included by default; you must install it first.

  1. At the command prompt, install timescaledb-parallel-copy:

    go install github.com/timescale/timescaledb-parallel-copy/cmd/timescaledb-parallel-copy@latest
  2. Use timescaledb-parallel-copy to import data into your Timescale database. Set <NUM_WORKERS> to twice the number of CPUs in your database. For example, if you have 4 CPUs, <NUM_WORKERS> should be 8.

    timescaledb-parallel-copy \
    --connection "host=<HOST> \
    user=tsdbadmin password=<PASSWORD> \
    port=<PORT> \
    sslmode=require" \
    --db-name tsdb \
    --table <TABLE_NAME> \
    --file <FILE_NAME>.csv \
    --workers <NUM_WORKERS> \
    --reporting-period 30s

    Repeat for each table and hypertable you want to migrate.

  1. Connect to your Timescale database:

    psql "postgres://tsdbadmin:<PASSWORD>@<HOST>:<PORT>/tsdb?sslmode=require"
  2. Restore the data to your Timescale database:

    \copy <TABLE_NAME> FROM '<TABLE_NAME>.csv' WITH (FORMAT CSV);

    Repeat for each table and hypertable you want to migrate.

When you have migrated your table and hypertable data, migrate your PostgreSQL schema post-data. This includes information about constraints.

  1. At the command prompt, dump the schema post-data from your source database into a dump_post_data.dump file, using your source database connection details. Exclude Timescale-specific schemas. If you are prompted for a password, use your source database credentials:

    pg_dump -U <SOURCE_DB_USERNAME> -W \
    -h <SOURCE_DB_HOST> -p <SOURCE_DB_PORT> -Fc -v \
    --section=post-data --exclude-schema="_timescaledb*" \
    -f dump_post_data.dump <DATABASE_NAME>
  2. Restore the dumped schema post-data from the dump_post_data.dump file into your Timescale database, using your connection details. To avoid permissions errors, include the --no-owner flag:

    pg_restore -U tsdbadmin -W \
    -h <HOST> -p <PORT> --no-owner -Fc \
    -v -d tsdb dump_post_data.dump

If you see these errors during the migration process, you can safely ignore them. The migration still completes successfully.

pg_restore: error: could not execute query: ERROR: relation "<relation_name>" already exists
pg_restore: error: could not execute query: ERROR: trigger "ts_insert_blocker" for relation "<relation_name>" already exists

Continuous aggregates aren't migrated by default when you transfer your schema and data separately. You can restore them by recreating the continuous aggregate definitions and recomputing the results on your Timescale database. The recomputed continuous aggregates only aggregate existing data in your Timescale database. They don't include deleted raw data.

  1. Connect to your source database:

    psql "postgres://<SOURCE_DB_USERNAME>:<SOURCE_DB_PASSWORD>@<SOURCE_DB_HOST>:<SOURCE_DB_PORT>/<SOURCE_DB_NAME>?sslmode=require"
  2. Get a list of your existing continuous aggregate definitions:

    SELECT view_name, view_definition FROM timescaledb_information.continuous_aggregates;

    This query returns the names and definitions for all your continuous aggregates. For example:

    view_name | view_definition
    ----------------+--------------------------------------------------------------------------------------------------------
    avg_fill_levels | SELECT round(avg(fill_measurements.fill_level), 2) AS avg_fill_level, +
    | time_bucket('01:00:00'::interval, fill_measurements."time") AS bucket, +
    | fill_measurements.sensor_id +
    | FROM fill_measurements +
    | GROUP BY (time_bucket('01:00:00'::interval, fill_measurements."time")), fill_measurements.sensor_id;
    (1 row)
  3. Connect to your Timescale database:

    psql "postgres://tsdbadmin:<PASSWORD>@<HOST>:<PORT>/tsdb?sslmode=require"
  4. Recreate each continuous aggregate definition:

    CREATE MATERIALIZED VIEW <VIEW_NAME>
    WITH (timescaledb.continuous) AS
    <VIEW_DEFINITION>
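Using the avg_fill_levels definition returned in step 2, the recreated continuous aggregate looks like this:

```sql
CREATE MATERIALIZED VIEW avg_fill_levels
WITH (timescaledb.continuous) AS
SELECT round(avg(fill_measurements.fill_level), 2) AS avg_fill_level,
       time_bucket('01:00:00'::interval, fill_measurements."time") AS bucket,
       fill_measurements.sensor_id
FROM fill_measurements
GROUP BY (time_bucket('01:00:00'::interval, fill_measurements."time")), fill_measurements.sensor_id;
```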

By default, policies aren't migrated when you transfer your schema and data separately. Recreate them on your Timescale database.

  1. Connect to your source database:

    psql "postgres://<SOURCE_DB_USERNAME>:<SOURCE_DB_PASSWORD>@<SOURCE_DB_HOST>:<SOURCE_DB_PORT>/<SOURCE_DB_NAME>?sslmode=require"
  2. Get a list of your existing policies. This query returns a list of all your policies, including continuous aggregate refresh policies, retention policies, compression policies, and reorder policies:

    SELECT application_name, schedule_interval, retry_period,
    config, hypertable_name
    FROM timescaledb_information.jobs WHERE owner = '<SOURCE_DB_USERNAME>';
  3. Connect to your Timescale database:

    psql "postgres://tsdbadmin:<PASSWORD>@<HOST>:<PORT>/tsdb?sslmode=require"
  4. Recreate each policy. For more information about recreating policies, see the sections on continuous-aggregate refresh policies, retention policies, compression policies, and reorder policies.
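For example, a continuous aggregate refresh policy and a retention policy can be recreated with the corresponding add_* functions. The table and view names below are hypothetical; take the actual intervals from the config column returned in step 2:

```sql
-- Refresh policy for a hypothetical continuous aggregate.
SELECT add_continuous_aggregate_policy('avg_fill_levels',
  start_offset      => INTERVAL '1 day',
  end_offset        => INTERVAL '1 hour',
  schedule_interval => INTERVAL '1 hour');

-- Retention policy for a hypothetical hypertable.
SELECT add_retention_policy('fill_measurements', INTERVAL '30 days');
```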

Update your table statistics by running ANALYZE on your entire dataset. Note that this might take some time depending on the size of your database:

ANALYZE;

If you see errors of the following form when you run ANALYZE, you can safely ignore them:

WARNING: skipping "<TABLE OR INDEX>" --- only superuser can analyze it

The skipped tables and indexes correspond to system catalogs that can't be accessed. Skipping them does not affect statistics on your data.
