This document provides detailed instructions to migrate data from your PostgreSQL database to a Timescale instance with minimal downtime (on the order of a few minutes) of your production applications, using the live migration strategy. To simplify the migration, we provide you with a docker image containing all the tools and scripts that you need to perform the live migration.
You should provision a dedicated instance to run the migration steps from. This instance should have sufficient space available to contain the buffered changes which occur while the data is being copied. This is approximately proportional to the amount of new data (uncompressed) being written to the database during this period. As a general rule of thumb, something between 100 GB and 500 GB should suffice.
Note
In the context of migrations, your existing production database is referred to as the "source" database, while the new Timescale database that you intend to migrate your data to is referred to as the "target" database.
In detail, the migration process consists of the following steps:
- Set up a target database instance in Timescale.
- Prepare the source database for live migration.
- Run the live migration docker image.
- Validate the data in the target database and use it as the new primary.
Tip
If you get stuck, you can get help by either opening a support request or taking your issue to the #migration channel in the community Slack, where the developers of this migration method are ready to help.
You can open a support request directly from the Timescale console, or by email to [email protected].
Create a database service in Timescale.
If you intend on migrating more than 400 GB, open a support request to ensure that enough disk is pre-provisioned on your Timescale instance.
You can open a support request directly from the Timescale console, or by email to [email protected].
Note
For the sake of convenience, connection strings to the source and target databases are referred to as $SOURCE and $TARGET throughout this guide. They can be set in your shell, for example:
export SOURCE=postgres://<user>:<password>@<source host>:<source port>/<db_name>
export TARGET=postgres://<user>:<password>@<target host>:<target port>/<db_name>
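Before proceeding, it can be worth confirming that both connection strings work; for example:

```shell
# Quick connectivity check; each command should print a single row
# containing "1" if the connection string is valid.
psql -X -d $SOURCE -c 'select 1'
psql -X -d $TARGET -c 'select 1'
```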
Important
Do not use a Timescale connection pooler connection for live migration. There are a number of issues which can arise when using a connection pooler, and no advantage. Very small instances may not have enough connections configured by default, in which case you should modify the value of max_connections in your instance, as shown in Configure database parameters.
It's important to ensure that the old_snapshot_threshold value is set to the default value of -1 in your source database. This prevents PostgreSQL from treating the data in a snapshot as outdated. If this value is set to anything other than -1, it might affect the existing data migration step.
To check the current value of old_snapshot_threshold, run the command:
psql -X -d $SOURCE -c 'show old_snapshot_threshold'
If the query returns something other than -1, you must change it.
If you have a superuser on a self-hosted database, run the following command:
psql -X -d $SOURCE -c 'alter system set old_snapshot_threshold=-1'
Otherwise, if you are using a managed service, use your cloud provider's configuration mechanism to set old_snapshot_threshold to -1.
Next, you should set wal_level to logical so that the write-ahead log (WAL) records information that is needed for logical decoding.
To check the current value of wal_level, run the command:
psql -X -d $SOURCE -c 'show wal_level'
If the query returns something other than logical, you must change it.
If you have a superuser on a self-hosted database, run the following command:
psql -X -d $SOURCE -c 'alter system set wal_level=logical'
Otherwise, if you are using a managed service, use your cloud provider's configuration mechanism to set wal_level to logical.
Restart your database for the changes to take effect, and verify that the settings are reflected in your database.
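Assuming $SOURCE is set as described above, the restarted instance can be checked with the same show commands:

```shell
# Both queries should now reflect the new settings.
psql -X -d $SOURCE -c 'show old_snapshot_threshold'   # expected: -1
psql -X -d $SOURCE -c 'show wal_level'                # expected: logical
```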
Next, you need to ensure that your source tables and hypertables have either a primary key or REPLICA IDENTITY set. This is important as it is a requirement for replicating DELETE and UPDATE operations. The replica identity assists the replication process in identifying the rows being modified. It defaults to using the table's primary key.
If a table doesn't have a primary key, you'll have to manually set the replica identity. One option is to use a unique, non-partial, non-deferrable index that includes only columns marked as NOT NULL. This can be set as the replica identity:
ALTER TABLE {table_name} REPLICA IDENTITY USING INDEX {index_name}
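As a concrete sketch, suppose a hypothetical readings table with no primary key, but with NOT NULL columns time and device_id (all names here are illustrative):

```sql
-- Create a unique index over NOT NULL columns, then use it as the
-- replica identity. The index must be non-partial and non-deferrable.
CREATE UNIQUE INDEX readings_time_device_idx ON readings (time, device_id);
ALTER TABLE readings REPLICA IDENTITY USING INDEX readings_time_device_idx;
```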
If there's no primary key or viable unique index to use, you have to set REPLICA IDENTITY to FULL. If you are expecting a large number of UPDATE or DELETE operations on the table, using FULL is not recommended. For each UPDATE or DELETE statement, PostgreSQL reads the whole table to find all matching rows, resulting in significantly slower replication.
ALTER TABLE {table_name} REPLICA IDENTITY FULL
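To find affected tables up front, a catalog query along these lines can help (a sketch; it excludes system schemas and TimescaleDB-internal schemas, and flags tables whose replica identity is DEFAULT without a primary key, or NOTHING):

```sql
SELECT c.oid::regclass AS table_name
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind = 'r'
  AND n.nspname NOT IN ('pg_catalog', 'information_schema')
  AND n.nspname NOT LIKE '\_timescaledb%'
  AND (
        c.relreplident = 'n'                 -- REPLICA IDENTITY NOTHING
     OR (c.relreplident = 'd'                -- DEFAULT, but no primary key
         AND NOT EXISTS (
               SELECT 1 FROM pg_index i
               WHERE i.indrelid = c.oid AND i.indisprimary))
  );
```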
Next, download the live migration docker image:
docker run --rm -it --name live-migration \
  -e PGCOPYDB_SOURCE_PGURI=$SOURCE \
  -e PGCOPYDB_TARGET_PGURI=$TARGET \
  --pid=host \
  -v ~/live-migration:/opt/timescale/ts_cdc \
  timescale/live-migration:v0.0.15 --help
Live migration moves your PostgreSQL/TimescaleDB to Timescale Cloud with minimal downtime.
options:
  -h, --help     Show this help message and exit
  -v, --version  Show the version of live-migration tool
Subcommands:
  {snapshot,clean,migrate}  Subcommand help
    snapshot  Create a snapshot
    clean     Clean up resources
    migrate   Start the migration
Live-migration provides three subcommands:
- snapshot
- clean
- migrate
At a high level, the snapshot subcommand creates a Postgres snapshot connection to the source database along with a replication slot. This is a prerequisite for running the migrate subcommand.
The migrate subcommand carries out the live migration process with the help of the snapshot and replication slot created by the snapshot subcommand.
The clean subcommand removes resources related to live migration. It should be run once the migration has successfully completed, or if you need to restart the migration process from the very start. You should not run clean if you want to resume the last interrupted live migration.
Execute this command to establish a snapshot connection; do not interrupt the process.
For convenience, consider using a terminal multiplexer such as tmux or screen, which enables the command to run in the background.
docker run --rm -it --name live-migration-snapshot \
  -e PGCOPYDB_SOURCE_PGURI=$SOURCE \
  -e PGCOPYDB_TARGET_PGURI=$TARGET \
  --pid=host \
  -v ~/live-migration:/opt/timescale/ts_cdc \
  timescale/live-migration:v0.0.15 snapshot
In addition to creating a snapshot, this process also validates prerequisites on the source and target to ensure the database instances are ready for replication.
For example, it checks if all tables on the source have either a PRIMARY KEY or REPLICA IDENTITY set. If not, it displays a warning message listing the tables without REPLICA IDENTITY and waits for user confirmation before proceeding with the snapshot creation.
2024-03-25T12:40:40.884 WARNING: The following tables in the Source DB have neither a primary key nor a REPLICA IDENTITY (FULL/INDEX)
2024-03-25T12:40:40.884 WARNING: UPDATE and DELETE statements on these tables will not be replicated to the Target DB
2024-03-25T12:40:40.884 WARNING: - public.metrics
Press 'c' and ENTER to continue
Next, open a new terminal and initiate the live migration. Allow it to run uninterrupted.
docker run --rm -it --name live-migration-migrate \
  -e PGCOPYDB_SOURCE_PGURI=$SOURCE \
  -e PGCOPYDB_TARGET_PGURI=$TARGET \
  --pid=host \
  -v ~/live-migration:/opt/timescale/ts_cdc \
  timescale/live-migration:v0.0.15 migrate
Note
If the migrate command stops for any reason during execution, you can resume the migration from where it left off by adding a --resume flag. This is only possible if the snapshot command is intact and if a volume mount, such as ~/live-migration, is utilized.
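Following the note above, a resume invocation is the same migrate command with --resume appended:

```shell
docker run --rm -it --name live-migration-migrate \
  -e PGCOPYDB_SOURCE_PGURI=$SOURCE \
  -e PGCOPYDB_TARGET_PGURI=$TARGET \
  --pid=host \
  -v ~/live-migration:/opt/timescale/ts_cdc \
  timescale/live-migration:v0.0.15 migrate --resume
```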
migrate utilizes the snapshot created in the previous step and migrates the schema to the target database. After migrating the schema, it prompts you to create hypertables in the target database.
Ideally, tables that contain time-series data should be converted to hypertables. You need to run create_hypertable() for each table that you want to convert to a hypertable in the target database. For more information, see the Hypertable docs.
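For example, assuming a hypothetical conditions table whose time column holds the timestamps (the table and column names are illustrative), you would run the following on the target database:

```sql
-- Convert the "conditions" table into a hypertable partitioned by "time".
SELECT create_hypertable('conditions', 'time');
```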
Once you have finished creating hypertables, signal "continue" to proceed by pressing the c key.
Next, the live migration will migrate the existing data in the source database to the target database. Then, it will stream live transactions (live replay) received on the source side to the target. During this process, it displays the status of the replay process.
Live-replay will complete in 1 minute 38.631 seconds (source_wal_rate: 106.0B/s, target_replay_rate: 589.0KiB/s, replay_lag: 56MiB)
If the live replay is not able to keep up with the load on the source database, you see a message like:
WARN live-replay not keeping up with source load (source_wal_rate: 3.0MiB/s, target_replay_rate: 462.0KiB/s, replay_lag: 73MiB)
Once the live replay has caught up, live migration displays the following:
Target has caught up with source (source_wal_rate: 751.0B/s, target_replay_rate: 0B/s, replay_lag: 7KiB)
To stop replication, hit 'c' and then ENTER
Important
Application downtime begins here. Ensure that you have a strategy to validate the data in your target database before taking your applications offline, to keep the overall downtime minimal.
Once the live replay has caught up, and you're ready to take your applications offline, stop all applications which are writing to the source database. This marks the beginning of the downtime phase, which lasts until you have validated the data in the target database.
Stopping writes to the source database allows the live migration process to finish replicating data to the target database.
When you see the Target has caught up with source message, and your applications are not writing to the database, press c followed by ENTER to stop replication.
Target has caught up with source (source_wal_rate: 46.0B/s, target_replay_rate: 0B/s, replay_lag: 221KiB)
To stop replication, hit 'c' and then ENTER
The live migration tool continues the remaining work, which includes migrating sequences and cleaning up resources. When the migration completes, you see the following message:
Migration successfully completed
Now that all data has been migrated, the contents of both databases should be the same. How best to validate this depends on your application. You could compare the number of rows or an aggregate of columns to validate that the target database matches the source.
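One way to sketch a row-count comparison in the shell, assuming $SOURCE and $TARGET are set as in this guide and using an illustrative table list:

```shell
# compare_counts prints a per-table status line given a table name and
# the row counts observed on the source and target databases.
compare_counts() {
  table=$1; src=$2; tgt=$3
  if [ "$src" = "$tgt" ]; then
    echo "$table: OK ($src rows)"
  else
    echo "$table: MISMATCH (source=$src target=$tgt)"
  fi
}

# Usage sketch (requires live databases; table names are illustrative):
# for t in metrics; do
#   compare_counts "$t" \
#     "$(psql -X -t -A -d "$SOURCE" -c "SELECT count(*) FROM $t")" \
#     "$(psql -X -t -A -d "$TARGET" -c "SELECT count(*) FROM $t")"
# done
```

Row counts alone are a weak check for large tables; an aggregate over a value column, or a comparison over a recent time window, catches more discrepancies.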
Important
Application downtime ends here.
Once you are confident with the data validation, the final step is to configure your applications to use the target database.
To clean up resources associated with live migration, use the following command:
docker run --rm -it --name live-migration-clean \
  -e PGCOPYDB_SOURCE_PGURI=$SOURCE \
  -e PGCOPYDB_TARGET_PGURI=$TARGET \
  --pid=host \
  -v ~/live-migration:/opt/timescale/ts_cdc \
  timescale/live-migration:v0.0.15 clean --prune
The --prune flag deletes temporary files in the ~/live-migration directory that were needed for the migration process. It's important to note that once you execute the clean command, you cannot resume the interrupted live migration.