This document provides a step-by-step guide to migrating a database from an AWS RDS Postgres instance to Timescale using our live migration strategy to achieve low application downtime (on the order of minutes).

Live migration's replication mechanism is fundamentally based on Postgres' logical decoding feature. However, for the purpose of this guide, an in-depth understanding of logical decoding is not necessary.

Note

In the context of migrations, your existing production database is referred to as the "source" database, while the new Timescale database that you intend to migrate your data to is referred to as the "target" database.

Before you start the migration process, you will need to:

  1. Gather information about your RDS instance.
  2. Prepare your RDS instance for Live migration.
  3. Prepare an intermediate machine.

You will need the following information about your Postgres RDS instance:

  • Endpoint
  • Port
  • Master username
  • Master password
  • VPC
  • DB instance parameter group

To gather the required information, navigate to the "Databases" panel and select your RDS instance.

RDS instances

Note the Endpoint, Port, and VPC details from the "Connectivity & Security" tab, and the Master Username and DB instance parameter group from the "Configuration" tab. Remember to use the Master Password that was supplied when the Postgres RDS instance was created.

Record endpoint, port, VPC details
Record master username and DB parameter group

To use your RDS instance as the source database for a Live migration, and to ensure the replication process runs smoothly, set the following configuration parameters:

Configuration              Required value
wal_level                  logical
old_snapshot_threshold     -1

You can check the current values of these configuration parameters:

postgres=> show wal_level;
wal_level
-----------
minimal
(1 row)
postgres=> show old_snapshot_threshold;
old_snapshot_threshold
------------------------
0
(1 row)

If the values returned do not match the required values (as in the example above), you will have to adjust them.

Important

Modifying either of these parameters requires restarting PostgreSQL, which will make your database briefly unavailable. Make the following configuration changes only when you're comfortable with a database restart.

RDS does not allow these parameters to be modified directly; instead, you must create a DB parameter group containing the desired configuration values.

A "DB parameter group" can be thought of as a set of key-value pairs that serve as the configuration settings for the database. To find the current DB parameter group that your RDS instance is using, go to the "Configuration" tab of your RDS service and look for the value listed under "DB instance parameter group".

Know your DB parameter group

Let’s create a DB parameter group that modifies the "wal_level" and "old_snapshot_threshold" fields. We will base this group on the existing "prod-pr-group" so that the other fields keep their current values and only the ones we need are changed.

Important

If your AWS RDS service is a Multi-AZ DB Cluster deployment, then you will need to use "DB Cluster Parameter Group" instead of "DB Parameter Group".

  1. Navigate to the "Parameter groups" panel using the sidebar.

    Navigate to parameter group panel
  2. Copy the parameter group that is active in your RDS instance.

    Important

    If the "DB parameter group" of your RDS instance displays "default.postgres15", you need to create a new DB parameter group. To do this, select "Create parameter group" and opt for "postgres15" in the "Parameter group family" field.

    Copy your existing parameter group
  3. Rename the parameter group, provide a description and click on "Create".

    Rename the new parameter group
You will find the new parameter group in the "Parameter groups" panel. Open the newly created parameter group by clicking on it.
    Open the new parameter group
  1. Click on "Edit".

    Edit the new parameter group
  2. Locate "rds.logical_replication" using the search box and adjust its value from 0 to 1. Next, locate "old_snapshot_threshold" and set its value to -1. Once you’ve made these changes, remember to click on "Save Changes".

    Modify rds.logical_replication field
    Modify old_snapshot_threshold field
  3. Navigate back to your RDS instance and click on "Modify".

    Navigate back to your RDS instance
  4. Navigate to the "Additional configuration" section and change the "DB parameter group" value to the new DB parameter group that includes the updated parameter values. After making this change, click "Continue" located at the bottom of the page.

    Modify DB parameter group
  5. Once you have verified the new parameter group value, select "Apply immediately" and click on "Modify DB Instance".

    Apply DB parameter group
    Important

    Configuring the new parameter group will not cause your database to be restarted. In the next step you will manually reboot the database.

  6. Your RDS service will be in the "Modifying" state for a short while, after which, you should see that the parameter group is in the "Pending reboot" state and must be rebooted to be applied.

  7. Manually reboot your service in order to apply the changes. In the "Actions" dropdown, click "Reboot" and then "Confirm".

  8. After your service reboots, ensure that the new "DB parameter group" is applied in your RDS instance.

    Ensure the new DB parameter group on your RDS instance
  9. Reconnect to your RDS instance and verify the configuration parameter values.

postgres=> show wal_level;
wal_level
-----------
logical
(1 row)
postgres=> show old_snapshot_threshold ;
old_snapshot_threshold
------------------------
-1
(1 row)
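If you prefer scripting over the console, the steps above can also be performed with the AWS CLI. The following is only a sketch; the parameter group name "live-migration-pg" and the instance identifier "my-rds-instance" are placeholders, and your parameter group family may differ from postgres15:

# Create a new parameter group for the postgres15 family (names are placeholders).
aws rds create-db-parameter-group \
  --db-parameter-group-name live-migration-pg \
  --db-parameter-group-family postgres15 \
  --description "Parameter group for live migration"

# Set the parameters required for live migration.
aws rds modify-db-parameter-group \
  --db-parameter-group-name live-migration-pg \
  --parameters "ParameterName=rds.logical_replication,ParameterValue=1,ApplyMethod=pending-reboot" \
               "ParameterName=old_snapshot_threshold,ParameterValue=-1,ApplyMethod=pending-reboot"

# Attach the parameter group to the RDS instance and apply the change immediately.
aws rds modify-db-instance \
  --db-instance-identifier my-rds-instance \
  --db-parameter-group-name live-migration-pg \
  --apply-immediately

# Reboot so that the static parameters take effect.
aws rds reboot-db-instance --db-instance-identifier my-rds-instance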

Live migration must be executed from a dedicated machine. The live migration tools connect to the source and target databases, and all data flows through this machine on its way from the source to the target. The machine must be able to access your RDS service and should ideally be in the same region as your Timescale instance. We will set up an EC2 instance in the same VPC as the RDS service.

  1. In the AWS search, type "EC2" and select the "EC2" option under services.

    Search for EC2 services
  2. Click on "Launch instance".

    Launch EC2 instance
  3. Configure your EC2 instance.

    a. For "Application and OS image", choose Ubuntu Server LTS.

    b. For "Instance type", use 2 CPU and 8 GB memory at least. If your migration involves a larger database, you should choose accordingly.

    Configure instance type

    c. For "Key pair", you can choose to use an existing key pair or create a new one. This will be necessary when connecting to the EC2 instance from your local machine.

    d. For "Network Settings", select the same VPC that your RDS instance is located in. Also, modify the "Source Type" of the security group to "My IP" so that your local machine can connect to the EC2 instance.

    Configure network

    e. For "Configure Storage" section, adjust the volume size to match the 1.5x the size of your database. If necessary, you should enable encryption on your volume.

    Configure storage
  4. Review the summary of the instance and click "Launch instance".

    Review EC2 instance
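Alternatively, the EC2 instance can be launched from the AWS CLI. This is only a sketch; the AMI ID, key pair, subnet, security group, and volume size below are placeholders that you need to replace with values from your own account:

# Launch an Ubuntu instance in the same VPC as the RDS service
# (all IDs below are placeholders).
aws ec2 run-instances \
  --image-id ami-xxxxxxxxxxxxxxxxx \
  --instance-type t3.large \
  --key-name my-key-pair \
  --subnet-id subnet-xxxxxxxx \
  --security-group-ids sg-xxxxxxxx \
  --block-device-mappings '[{"DeviceName":"/dev/sda1","Ebs":{"VolumeSize":100,"Encrypted":true}}]' \
  --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=live-migration}]'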

To prepare your EC2 instance for a low-downtime migration:

  1. Navigate to your EC2 instance and click on "Connect".

    Navigate to your EC2 instance
  2. Select the "SSH client" tab.

    Connect to your EC2 using SSH client
  3. Connect to your EC2 instance using the "Key pair" you chose when creating the EC2 instance.

chmod 400 <key-pair>.pem
ssh -i "<key-pair>.pem" ubuntu@<EC2 instance's Public IPv4>
  4. Install the PostgreSQL client and pgcopydb.
sudo sh -c 'echo "deb http://apt.postgresql.org/pub/repos/apt $(lsb_release -cs)-pgdg main" > /etc/apt/sources.list.d/pgdg.list'
wget -qO- https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo tee /etc/apt/trusted.gpg.d/pgdg.asc &>/dev/null
sudo apt update
sudo apt install postgresql-client-15 pgcopydb -y
psql --version && pg_dump --version && pgcopydb --version
  5. To allow the EC2 instance to connect to the RDS instance, you need to modify the security group associated with your RDS instance.

    a. Note the Private IPv4 address of your EC2 instance.

    Note the private IPv4 address

    b. Go to the security group associated with your RDS instance and select "Edit inbound rules".

    Edit the security group

    c. Click "Add rule". In the "Type" field, select "PostgreSQL". For "Source", select "Custom" and input the Private IPv4 address of your EC2 instance. Add a suitable description for this rule. Finally, click on "Save rules" to apply the changes.

    Add new rule
  6. Verify the connection to the RDS service from your EC2 instance. Note that you will be prompted to enter the master password; this is the same password that you used when creating your AWS RDS service.

# psql -h <rds_endpoint> -p <rds_port> -U postgres -c "select 1"
psql -h aws-rds-low-downtime-migration-db.c8tzn206yp6f.us-west-2.rds.amazonaws.com -p 5432 -U postgres -c "select 1"
Password for user postgres:
?column?
----------
1
(1 row)
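The inbound rule configured in the console above can also be added with the AWS CLI. This is only a sketch; the security group ID and the EC2 instance's private IP address are placeholders:

# Allow PostgreSQL (port 5432) traffic from the EC2 instance's private IP
# to the RDS security group (ID and IP below are placeholders).
aws ec2 authorize-security-group-ingress \
  --group-id sg-xxxxxxxx \
  --protocol tcp \
  --port 5432 \
  --cidr 172.31.10.25/32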
Tip

If you get stuck, you can get help by either opening a support request or taking your issue to the #migration channel in the community Slack, where the developers of this migration method are ready to help.

You can open a support request directly from the Timescale console, or by email to [email protected].

The migration process consists of the following steps:

  1. Set up a target database instance in Timescale.
  2. Prepare the source database for the live migration.
  3. Set up a replication slot and snapshot.
  4. Migrate roles and schema from source to target.
  5. Perform "Live migration".

Create a database service in Timescale.

If you intend to migrate more than 400 GB, open a support request to ensure that enough disk is pre-provisioned on your Timescale instance.

You can open a support request directly from the Timescale console, or by email to [email protected].

In order to replicate UPDATE and DELETE operations on tables in the source database, the tables must have either a primary key or REPLICA IDENTITY. Replica identity assists logical decoding in identifying the rows being modified. It defaults to using the table's primary key.

If a table doesn't have a primary key, you'll have to manually set the replica identity. One option is to use a unique, non-partial, non-deferrable index that includes only columns marked as NOT NULL.

ALTER TABLE {table_name} REPLICA IDENTITY USING INDEX {_index_name};

If there's no primary key or viable unique index to use, you will have to set REPLICA IDENTITY to FULL. If you are expecting a large number of UPDATE or DELETE operations on the table, we do not recommend using FULL: for each UPDATE or DELETE statement, Postgres has to read the whole table to find all matching rows, which slows the replication process.

ALTER TABLE {table_name} REPLICA IDENTITY FULL;
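To find which tables need attention, you can run a catalog query along the following lines on the source database. This is only a sketch; adjust the schema filter to your environment:

-- List ordinary tables (outside system schemas) that have neither a primary key
-- nor an explicit REPLICA IDENTITY (FULL or USING INDEX), and therefore need
-- attention before starting live migration.
SELECT c.relnamespace::regnamespace AS schema_name,
       c.relname AS table_name
FROM pg_class c
WHERE c.relkind = 'r'
  AND c.relnamespace::regnamespace::text NOT IN ('pg_catalog', 'information_schema')
  AND c.relreplident NOT IN ('f', 'i')
  AND NOT EXISTS (
        SELECT 1
        FROM pg_constraint con
        WHERE con.conrelid = c.oid
          AND con.contype = 'p'
  );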

Once you're sure that the tables which will be affected by UPDATE and DELETE queries have REPLICA IDENTITY set, you will need to create a replication slot.

Replication slots keep track of transactions (recorded in Write-Ahead Log files) in the source database that have not been streamed to the target database yet. We will use pgcopydb to create a replication slot in the source database.
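The commands that follow assume that the $SOURCE and $TARGET environment variables contain connection strings for the source (RDS) and target (Timescale) databases, for example (hosts, ports, passwords, and database names below are placeholders; use the connection details of your own services):

# Connection string for the RDS source database (placeholders).
export SOURCE="postgres://postgres:<password>@<rds_endpoint>:5432/<database>"
# Connection string for the Timescale target database (placeholders).
export TARGET="postgres://tsdbadmin:<password>@<timescale_host>:<port>/tsdb"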

pgcopydb follow \
--source "$SOURCE" \
--target "$TARGET" \
--fail-fast \
--plugin wal2json

This command is going to be active during most of the migration process. You should run it in the background or use terminal multiplexers like screen or tmux.
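For example, one way to keep it running in the background is with nohup, writing its output to a log file:

# Run the follow command in the background so it keeps running if the SSH
# session is closed, and capture its output in a log file.
nohup pgcopydb follow \
  --source "$SOURCE" \
  --target "$TARGET" \
  --fail-fast \
  --plugin wal2json > pgcopydb-follow.log 2>&1 &

# Check its progress later:
tail -f pgcopydb-follow.log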

The follow command sets up a replication slot in the source database and streams changes from it. These changes are buffered on disk on the intermediate machine until the apply command is executed. The apply command is discussed in subsequent steps.

Additionally, the follow command exports a snapshot ID to /tmp/pgcopydb/snapshot. This ID is used to migrate the data that existed in the database before the replication slot was created.

Before applying DML operations from the replication slot, the schema and data from the source database need to be migrated. The larger the size of the source database, the more time it takes to perform the initial migration, and the longer the buffered files need to be stored.

pg_dumpall -d "$SOURCE" \
--quote-all-identifiers \
--roles-only \
--no-role-passwords \
--file=roles.sql
Important

AWS RDS does not permit dumping roles with passwords, which is why the above command is executed with the --no-role-passwords flag. However, when the migration of roles to your Timescale instance is complete, you need to manually assign passwords to the necessary roles using the following command: ALTER ROLE name WITH PASSWORD 'password';

Timescale services do not support roles with superuser access. If your SQL dump includes roles that have such permissions, you'll need to modify the file to be compliant with Timescale's security model.

You can use the following sed command to remove unsupported statements and permissions from your roles.sql file:

sed -i -E \
-e '/CREATE ROLE "postgres";/d' \
-e '/ALTER ROLE "postgres"/d' \
-e '/CREATE ROLE "rds/d' \
-e '/ALTER ROLE "rds/d' \
-e '/TO "rds/d' \
-e '/GRANT "rds/d' \
-e 's/(NO)*SUPERUSER//g' \
-e 's/(NO)*REPLICATION//g' \
-e 's/(NO)*BYPASSRLS//g' \
-e 's/GRANTED BY "[^"]*"//g' \
roles.sql
Note

This command works only with the GNU implementation of sed (sometimes referred to as gsed). For the BSD implementation (the default on macOS), the -i flag requires an argument, so change it to -i ''.

To check the sed version, you can use the command sed --version. While the GNU version explicitly identifies itself as GNU, the BSD version of sed generally doesn't provide a straightforward --version flag and simply outputs an "illegal option" error.

A brief explanation of this script is:

  • CREATE ROLE "postgres"; and ALTER ROLE "postgres": These statements are removed because they require superuser access, which is not supported by Timescale.

  • (NO)SUPERUSER | (NO)REPLICATION | (NO)BYPASSRLS: These are permissions that require superuser access.

  • CREATE ROLE "rds, ALTER ROLE "rds, TO "rds, GRANT "rds: Any creation or alteration of rds-prefixed roles is removed because these roles serve no purpose in a Timescale instance. Similarly, any grants to or from rds-prefixed roles are removed as well.

  • GRANTED BY role_specification: The GRANTED BY clause can also have permissions that require superuser access and should therefore be removed. Note: Per the PostgreSQL documentation, the GRANTOR in the GRANTED BY clause must be the current user, and this clause mainly serves the purpose of SQL compatibility. Therefore, it's safe to remove it.

pg_dump -d "$SOURCE" \
--format=plain \
--quote-all-identifiers \
--no-tablespaces \
--no-owner \
--no-privileges \
--section=pre-data \
--file=pre-data-dump.sql \
--snapshot=$(cat /tmp/pgcopydb/snapshot)
  • --section=pre-data is used to dump only the definition of tables, sequences, check constraints and inheritance hierarchy. It excludes indexes, foreign key constraints, triggers and rules.

  • --snapshot is used to specify the synchronized snapshot when making a dump of the database.

  • --no-tablespaces is required because Timescale does not support tablespaces other than the default. This is a known limitation.

  • --no-owner is required because Timescale's tsdbadmin user is not a superuser and cannot assign ownership in all cases. This flag means that everything is owned by the user used to connect to the target, regardless of ownership in the source. This is a known limitation.

  • --no-privileges is required because Timescale's tsdbadmin user is not a superuser and cannot assign privileges in all cases. This flag means that privileges assigned to other users must be reassigned in the target database as a manual clean-up task. This is a known limitation.

psql -X -d "$TARGET" \
-v ON_ERROR_STOP=1 \
--echo-errors \
-f roles.sql \
-f pre-data-dump.sql
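After the restore completes, you can do a quick sanity check of the target by listing roles and table definitions with psql meta-commands, for example:

# List roles and tables on the target to confirm the restore worked.
psql -d "$TARGET" -c "\du"
psql -d "$TARGET" -c "\dt+"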

The remaining steps for migrating data from an RDS Postgres instance to Timescale with low downtime are the same as those described in the "Live migration" documentation from Step 5 onwards. Follow those steps to complete the migration process.
