The following instructions show you how to move your data from a self-hosted TimescaleDB multi-node deployment to a Timescale instance using `pg_dump` and `psql`.
To avoid losing any data, take applications that connect to the database offline for the duration of the migration. The migration takes time proportional to the amount of data stored in your database. If you want to migrate more than 400 GB of data with this method, open a support request to ensure that enough disk space is pre-provisioned on your Timescale instance.
You can open a support request directly from the Timescale console, or by email to support@timescale.com.
Note
In the context of migrations, your existing production database is referred to as the "source" database, while the new Timescale database that you intend to migrate your data to is referred to as the "target" database.
For minimal downtime, the `pg_dump` and `psql` commands should be run from a machine with a low-latency, high-throughput link to the database that they are connected to.
Before you begin, ensure that you have:

- Installed the PostgreSQL client libraries on the machine that you will perform the migration from. You require `pg_dump` and `psql`.
- Created a database service in Timescale.
- Checked that all PostgreSQL extensions you use are available on Timescale. For more information, see the list of compatible extensions.
- Checked that the version of PostgreSQL in your target database is greater than or equal to that of the source database.
- Checked that you're running the exact same version of TimescaleDB on both your target and source databases (the major, minor, and patch versions must all be the same). For more information, see the upgrade instructions for self-hosted TimescaleDB. A version-check query is sketched after this list.
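As a quick way to verify the version prerequisites, you can run the following queries against both databases and compare the output. This is a sketch using the standard PostgreSQL catalogs:

```sql
-- Run against both the source and the target, then compare the results
SELECT version();                                                   -- PostgreSQL version
SELECT extversion FROM pg_extension WHERE extname = 'timescaledb';  -- TimescaleDB version
```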
Dump the roles from your source database into a `roles.sql` file:

```bash
pg_dumpall -d "$SOURCE" \
  --quote-all-identifiers \
  --roles-only \
  --file=roles.sql
```
Timescale services do not support roles with superuser access. If your SQL dump includes roles with such permissions, you'll need to modify the file to comply with Timescale's security model.
You can use the following `sed` command to remove unsupported statements and permissions from your `roles.sql` file:

```bash
sed -i -E \
  -e '/CREATE ROLE "postgres";/d' \
  -e '/ALTER ROLE "postgres"/d' \
  -e 's/(NO)*SUPERUSER//g' \
  -e 's/(NO)*REPLICATION//g' \
  -e 's/(NO)*BYPASSRLS//g' \
  -e 's/GRANTED BY "[^"]*"//g' \
  roles.sql
```
Note
This command works only with the GNU implementation of `sed` (sometimes referred to as `gsed`). With the BSD implementation (the default on macOS), you need to pass an empty backup-suffix argument to the `-i` flag, that is, `-i ''`, as shown in the example below.
To check your `sed` version, you can run `sed --version`. The GNU version explicitly identifies itself as GNU, while the BSD version generally doesn't provide a `--version` flag and simply outputs an "illegal option" error.
A brief explanation of this script:

- `CREATE ROLE "postgres";` and `ALTER ROLE "postgres"`: these statements are removed because they require superuser access, which is not supported by Timescale.
- `(NO)SUPERUSER`, `(NO)REPLICATION`, `(NO)BYPASSRLS`: these permissions require superuser access.
- `GRANTED BY role_specification`: the `GRANTED BY` clause can also carry permissions that require superuser access and should therefore be removed. Per the PostgreSQL documentation, the GRANTOR in the `GRANTED BY` clause must be the current user, and the clause mainly serves the purpose of SQL compatibility. Therefore, it's safe to remove it.
Note
For the sake of convenience, connection strings to the source and target databases are referred to as `$SOURCE` and `$TARGET` throughout this guide. You can set them in your shell, for example:

```bash
export SOURCE=postgres://<user>:<password>@<source host>:<source port>/<dbname>
export TARGET=postgres://<user>:<password>@<target host>:<target port>/<dbname>
```
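To confirm that both variables point at reachable databases before you start, you can run a trivial query against each, for example:

```bash
psql "$SOURCE" -c "SELECT 1;"
psql "$TARGET" -c "SELECT 1;"
```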
Migrate your pre-data from your source database to your Timescale instance. This includes table and schema definitions, as well as information on sequences, owners, and settings. It doesn't include Timescale-specific schemas.
1. Dump the schema pre-data from your source database into a `dump_pre_data.dump` file, using your source database connection details. Exclude Timescale-specific schemas. If you are prompted for a password, use your source database credentials:

   ```bash
   pg_dump -U <SOURCE_DB_USERNAME> -W \
     -h <SOURCE_DB_HOST> -p <SOURCE_DB_PORT> -Fc -v \
     --section=pre-data --exclude-schema="_timescaledb*" \
     -f dump_pre_data.dump <DATABASE_NAME>
   ```

2. Restore the dumped data from the `dump_pre_data.dump` file into your Timescale database, using your Timescale connection details. To avoid permissions errors, include the `--no-owner` flag:

   ```bash
   pg_restore -U tsdbadmin -W \
     -h <HOST> -p <PORT> --no-owner -Fc \
     -v -d tsdb dump_pre_data.dump
   ```
After the pre-data migration, the distributed hypertables from your source database become regular PostgreSQL tables. Recreate them as regular hypertables in your Timescale instance to restore them.
Note
Distributed hypertables are typically created with additional space dimensions (space partitioning). While these extra dimensions make sense for distributing data across data nodes, they are often not useful in a standard single-node deployment. Consider dropping them when converting distributed hypertables into regular hypertables.
1. Connect to your Timescale database:

   ```bash
   psql "postgres://tsdbadmin:<PASSWORD>@<HOST>:<PORT>/tsdb?sslmode=require"
   ```

2. Restore the hypertable:

   ```sql
   SELECT create_hypertable(
       '<TABLE_NAME>', '<TIME_COLUMN_NAME>',
       chunk_time_interval => INTERVAL '<CHUNK_TIME_INTERVAL>');
   ```
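For example, assuming a hypothetical `conditions` table whose time column is named `time`, and picking a one-week chunk interval. No space dimension is specified, following the note above about dropping space partitioning on a single node:

```sql
-- Hypothetical table name and interval; substitute your own
SELECT create_hypertable(
    'conditions', 'time',
    chunk_time_interval => INTERVAL '1 week');
```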
After restoring your hypertables, return to your source database to copy your data, table by table.
1. Connect to your source database:

   ```bash
   psql "postgres://<SOURCE_DB_USERNAME>:<SOURCE_DB_PASSWORD>@<SOURCE_DB_HOST>:<SOURCE_DB_PORT>/<SOURCE_DB_NAME>?sslmode=require"
   ```

2. Dump the data from the first table into a `.csv` file:

   ```sql
   \copy (SELECT * FROM <TABLE_NAME>) TO <TABLE_NAME>.csv CSV
   ```

3. Repeat for each table and distributed hypertable you want to migrate.
Note
If your distributed hypertables are very large, you can migrate each one in multiple pieces. Split each distributed hypertable by time range, and copy each range individually. For example:

```sql
\copy (SELECT * FROM <TABLE_NAME> WHERE time > '2021-11-01' AND time < '2021-11-02') TO <TABLE_NAME_DATE_RANGE>.csv CSV
```
When you have copied your data into `.csv` files, you can restore it to your Timescale instance by copying from those files. There are two methods: using regular PostgreSQL `COPY`, or using the TimescaleDB `timescaledb-parallel-copy` tool. The `timescaledb-parallel-copy` tool is not included by default; you must install it first.
1. At the command prompt, install `timescaledb-parallel-copy`:

   ```bash
   go install github.com/timescale/timescaledb-parallel-copy/cmd/timescaledb-parallel-copy@latest
   ```

2. Use `timescaledb-parallel-copy` to import data into your Timescale database. Set `<NUM_WORKERS>` to twice the number of CPUs in your database. For example, if you have 4 CPUs, `<NUM_WORKERS>` should be `8`. A filled-in example follows this list:

   ```bash
   timescaledb-parallel-copy \
     --connection "host=<HOST> \
     user=tsdbadmin password=<PASSWORD> \
     port=<PORT> \
     sslmode=require" \
     --db-name tsdb \
     --table <TABLE_NAME> \
     --file <FILE_NAME>.csv \
     --workers <NUM_WORKERS> \
     --reporting-period 30s
   ```

3. Repeat for each table and hypertable you want to migrate.
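As a filled-in sketch of the command above, the host and table name below are hypothetical, and `--workers` is set to 8 for a 4-CPU service:

```bash
timescaledb-parallel-copy \
  --connection "host=example.tsdb.cloud.timescale.com user=tsdbadmin password=<PASSWORD> port=5432 sslmode=require" \
  --db-name tsdb \
  --table conditions \
  --file conditions.csv \
  --workers 8 \
  --reporting-period 30s
```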
1. Connect to your Timescale database:

   ```bash
   psql "postgres://tsdbadmin:<PASSWORD>@<HOST>:<PORT>/tsdb?sslmode=require"
   ```

2. Restore the data to your Timescale database:

   ```sql
   \copy <TABLE_NAME> FROM '<TABLE_NAME>.csv' WITH (FORMAT CSV);
   ```

3. Repeat for each table and hypertable you want to migrate. A shell loop that automates this is sketched below.
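If you have many `.csv` files, a small shell loop can handle the repetition. This is a sketch that assumes each file is named after its target table and that `$TARGET` is set as described earlier:

```bash
# Restore every .csv in the current directory, assuming each file
# is named <table>.csv for the table it should be loaded into.
for f in *.csv; do
  table="${f%.csv}"
  psql "$TARGET" -c "\copy ${table} FROM '${f}' WITH (FORMAT CSV)"
done
```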
When you have migrated your table and hypertable data, migrate your PostgreSQL schema post-data. This includes information about constraints.
1. At the command prompt, dump the schema post-data from your source database into a `dump_post_data.dump` file, using your source database connection details. Exclude Timescale-specific schemas. If you are prompted for a password, use your source database credentials:

   ```bash
   pg_dump -U <SOURCE_DB_USERNAME> -W \
     -h <SOURCE_DB_HOST> -p <SOURCE_DB_PORT> -Fc -v \
     --section=post-data --exclude-schema="_timescaledb*" \
     -f dump_post_data.dump <DATABASE_NAME>
   ```

2. Restore the dumped schema post-data from the `dump_post_data.dump` file into your Timescale database, using your connection details. To avoid permissions errors, include the `--no-owner` flag:

   ```bash
   pg_restore -U tsdbadmin -W \
     -h <HOST> -p <PORT> --no-owner -Fc \
     -v -d tsdb dump_post_data.dump
   ```
If you see these errors during the migration process, you can safely ignore them. The migration still completes successfully:

```
pg_restore: error: could not execute query: ERROR: relation "<relation_name>" already exists
pg_restore: error: could not execute query: ERROR: trigger "ts_insert_blocker" for relation "<relation_name>" already exists
```
Continuous aggregates aren't migrated by default when you transfer your schema and data separately. You can restore them by recreating the continuous aggregate definitions and recomputing the results on your Timescale database. The recomputed continuous aggregates only aggregate existing data in your Timescale database. They don't include deleted raw data.
1. Connect to your source database:

   ```bash
   psql "postgres://<SOURCE_DB_USERNAME>:<SOURCE_DB_PASSWORD>@<SOURCE_DB_HOST>:<SOURCE_DB_PORT>/<SOURCE_DB_NAME>?sslmode=require"
   ```

2. Get a list of your existing continuous aggregate definitions:

   ```sql
   SELECT view_name, view_definition FROM timescaledb_information.continuous_aggregates;
   ```

   This query returns the names and definitions of all your continuous aggregates. For example:

   ```
       view_name    |                              view_definition
   -----------------+----------------------------------------------------------------------------
    avg_fill_levels |  SELECT round(avg(fill_measurements.fill_level), 2) AS avg_fill_level,    +
                    |     time_bucket('01:00:00'::interval, fill_measurements."time") AS bucket,+
                    |     fill_measurements.sensor_id                                           +
                    |    FROM fill_measurements                                                 +
                    |   GROUP BY (time_bucket('01:00:00'::interval, fill_measurements."time")), fill_measurements.sensor_id;
   (1 row)
   ```

3. Connect to your Timescale database:

   ```bash
   psql "postgres://tsdbadmin:<PASSWORD>@<HOST>:<PORT>/tsdb?sslmode=require"
   ```

4. Recreate each continuous aggregate definition:

   ```sql
   CREATE MATERIALIZED VIEW <VIEW_NAME>
   WITH (timescaledb.continuous) AS
   <VIEW_DEFINITION>
   ```
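For example, to recreate the `avg_fill_levels` aggregate from the sample output above, paste its definition into the template:

```sql
CREATE MATERIALIZED VIEW avg_fill_levels
WITH (timescaledb.continuous) AS
SELECT round(avg(fill_measurements.fill_level), 2) AS avg_fill_level,
       time_bucket('01:00:00'::interval, fill_measurements."time") AS bucket,
       fill_measurements.sensor_id
FROM fill_measurements
GROUP BY (time_bucket('01:00:00'::interval, fill_measurements."time")), fill_measurements.sensor_id;
```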
By default, policies aren't migrated when you transfer your schema and data separately. Recreate them on your Timescale database.
1. Connect to your source database:

   ```bash
   psql "postgres://<SOURCE_DB_USERNAME>:<SOURCE_DB_PASSWORD>@<SOURCE_DB_HOST>:<SOURCE_DB_PORT>/<SOURCE_DB_NAME>?sslmode=require"
   ```

2. Get a list of your existing policies. This query returns all your policies, including continuous aggregate refresh policies, retention policies, compression policies, and reorder policies:

   ```sql
   SELECT application_name, schedule_interval, retry_period, config, hypertable_name
   FROM timescaledb_information.jobs
   WHERE owner = '<SOURCE_DB_USERNAME>';
   ```

3. Connect to your Timescale database:

   ```bash
   psql "postgres://tsdbadmin:<PASSWORD>@<HOST>:<PORT>/tsdb?sslmode=require"
   ```

4. Recreate each policy. For more information about recreating policies, see the sections on continuous aggregate refresh policies, retention policies, compression policies, and reorder policies. Example policy recreations are sketched after this list.
Update your table statistics by running `ANALYZE` on your entire dataset. Note that this might take some time, depending on the size of your database:

```sql
ANALYZE;
```
If you see errors of the following form when you run `ANALYZE`, you can safely ignore them:

```
WARNING: skipping "<TABLE OR INDEX>" --- only superuser can analyze it
```
The skipped tables and indexes correspond to system catalogs that can't be accessed. Skipping them does not affect statistics on your data.