You use the Livesync Docker image to synchronize all data, or specific tables, from a PostgreSQL database instance to a Timescale Cloud service in real time. You run Livesync continuously, turning PostgreSQL into a primary database with a Timescale Cloud service as a logical replica. This enables you to leverage Timescale Cloud's real-time analytics capabilities on your replica data.

Warning

You use Livesync for data synchronization, rather than migration. It is in alpha and is not recommended for production use.

Livesync leverages the PostgreSQL logical replication protocol, a well-established and widely understood feature in the PostgreSQL ecosystem. By relying on this protocol, Livesync ensures compatibility, familiarity, and a broader knowledge base, making it easier for you to adopt and integrate.

You use Livesync to:

  • Copy existing data from a PostgreSQL instance to a Timescale Cloud service:
    • Copy data at up to 150 GB/hr. You need at least a 4 CPU/16GB source database, and a 4 CPU/16GB target service.
    • Copy the publication tables in parallel. However, each large table is still copied using a single connection. Parallel copy of individual large tables is in the backlog.
    • Forget foreign key relationships. Livesync disables foreign key validation during the sync. For example, if a metrics table refers to the id column on the tags table, you can sync only the metrics table without worrying about the foreign key relationship.
    • Track progress. PostgreSQL exposes COPY progress in pg_stat_progress_copy.
  • Synchronize real-time changes from a PostgreSQL instance to a Timescale Cloud service.
  • Add and remove tables on demand using the PostgreSQL PUBLICATION interface.
  • Enable features such as hypertables, columnstore, and continuous aggregates on your logical replica.
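As a sketch, you can monitor the initial copy from another psql session using the standard PostgreSQL progress view (available in PostgreSQL 14 and later):

```sql
-- Monitor the initial COPY phase; one row per COPY in progress
SELECT relid::regclass AS table_name,
       bytes_processed,
       bytes_total,
       tuples_processed
FROM pg_stat_progress_copy;
```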

If you have any questions or feedback, talk to us in #livesync in Timescale Community.

Best practice is to use an Ubuntu EC2 instance hosted in the same region as your Timescale Cloud service as your sync machine. That is, the machine you run the commands on to move your data from your source database to your target Timescale Cloud service.

Before you move your data:

  • Create a target Timescale Cloud service.

    Each Timescale Cloud service has a single database that supports the most popular extensions. Timescale Cloud services do not support tablespaces, and there is no superuser associated with a service. Best practice is to create a Timescale Cloud service with at least 8 CPUs for a smoother experience. A higher-spec instance can significantly reduce the overall migration window.

  • To ensure that maintenance does not run while migration is in progress, best practice is to adjust the maintenance window.

  • Install Docker on your sync machine. You need a minimum of a 4 CPU/16GB EC2 instance to run Livesync.

  • Install the PostgreSQL client tools on your sync machine.

    This includes psql, pg_dump, and pg_dumpall.

  • The schema is not migrated by Livesync; you use pg_dump and psql to migrate it.
  • Schema changes must be coordinated. Make compatible changes to the schema in your Timescale Cloud service first, then make the same changes to the source PostgreSQL instance.
  • Expect WAL volume growth on the source PostgreSQL instance during a large table copy.
  • Livesync supports only PostgreSQL databases as the source. TimescaleDB as a source is not yet supported.

The <user> in the SOURCE connection must have the replication role granted in order to create a replication slot.
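As an illustration, assuming a source user named migration_user (a hypothetical name), you can grant the replication attribute like this:

```sql
-- Run as a superuser or a sufficiently privileged role on the source database.
-- The REPLICATION attribute allows the user to create replication slots.
ALTER ROLE migration_user WITH REPLICATION;
```

On managed platforms that do not expose superuser, use the provider's mechanism instead; for example, AWS RDS uses `GRANT rds_replication TO migration_user;`.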

These variables hold the connection information for the source database and target Timescale Cloud service. In Terminal on your migration machine, set the following:

export SOURCE="postgres://<user>:<password>@<source host>:<source port>/<db_name>"
export TARGET="postgres://tsdbadmin:<PASSWORD>@<HOST>:<PORT>/tsdb?sslmode=require"

You find the connection information for your Timescale Cloud service in the configuration file you downloaded when you created the service.

Important

Avoid using connection strings that route through connection poolers such as PgBouncer or similar tools. Livesync requires a direct connection to the database to function properly.

You need to tune the Write Ahead Log (WAL) on the PostgreSQL source database: wal_level must be set to logical, and max_wal_senders must allow enough WAL sender processes.

To do this:

psql $SOURCE -c "ALTER SYSTEM SET wal_level = logical;"
psql $SOURCE -c "ALTER SYSTEM SET max_wal_senders = 10;"

Because wal_level can only change at server start, restart the source database after running these commands.
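After changing these settings and restarting the source database, you can verify that they took effect:

```sql
-- Run via: psql $SOURCE
SHOW wal_level;        -- expect: logical
SHOW max_wal_senders;  -- expect: 10 or higher
```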

Replica identity assists data replication by identifying the rows being modified. By default, each table and hypertable in the source database uses its primary key as the replica identity. However, you can also have:

  • A viable unique index: each table has a unique, non-partial, non-deferrable index that includes only columns marked as NOT NULL. If a UNIQUE index does not exist, create one to assist the migration. You can delete it after live sync. For each table, set REPLICA IDENTITY to the viable unique index:

    psql -X -d $SOURCE -c 'ALTER TABLE <table name> REPLICA IDENTITY USING INDEX <index name>'
  • No primary key or viable unique index: use brute force. For each table, set REPLICA IDENTITY to FULL:

    psql -X -d $SOURCE -c 'ALTER TABLE <table name> REPLICA IDENTITY FULL'

    For each UPDATE or DELETE statement, PostgreSQL reads the whole table to find all matching rows. This results in significantly slower replication. If you are expecting a large number of UPDATE or DELETE operations on the table, best practice is to not use FULL.
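As an illustration, assuming a metrics table (hypothetical here) with no primary key but with NOT NULL time and device_id columns that uniquely identify each row, you could create a viable unique index and use it as the replica identity:

```sql
-- Create a unique, non-partial index over NOT NULL columns,
-- then use it as the replica identity for logical replication.
CREATE UNIQUE INDEX metrics_time_device_idx ON metrics (time, device_id);
ALTER TABLE metrics REPLICA IDENTITY USING INDEX metrics_time_device_idx;
```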

To capture only INSERT and ignore UPDATEs and DELETEs, use a publish config while creating the publication.
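For example, using the standard PostgreSQL publish parameter, a publication that replicates only INSERT operations looks like this:

```sql
-- Replicate INSERTs only; UPDATE and DELETE changes are not published
CREATE PUBLICATION analytics FOR TABLE metrics, tags
    WITH (publish = 'insert');
```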

Use pg_dump to:

  1. Download the schema from the source database

    pg_dump $SOURCE \
    --no-privileges \
    --no-owner \
    --no-publications \
    --no-subscriptions \
    --no-table-access-method \
    --no-tablespaces \
    --schema-only \
    --file=schema.sql
  2. Apply the schema on the target service

    psql $TARGET -f schema.sql

For efficient querying and analysis, you can convert tables that contain time-series or events data, and tables that are already partitioned using PostgreSQL declarative partitioning, into hypertables.

  1. Convert tables to hypertables

    Run the following on each table in the target Timescale Cloud service to convert it to a hypertable:

    psql -X -d $TARGET -c "SELECT create_hypertable('<table>', by_range('<partition column>', '<chunk interval>'::interval));"

    For example, to convert the metrics table into a hypertable with time as a partition column and 1 day as a partition interval:

    psql -X -d $TARGET -c "SELECT create_hypertable('public.metrics', by_range('time', '1 day'::interval));"
  2. Convert PostgreSQL partitions to hypertables

    Rename the partition and create a new normal table with the same name as the partitioned table, then convert to a hypertable:

    psql $TARGET -f - <<EOF
    BEGIN;
    ALTER TABLE public.events RENAME TO events_part;
    CREATE TABLE public.events(LIKE public.events_part INCLUDING ALL);
    SELECT create_hypertable('public.events', by_range('time', '1 day'::interval));
    COMMIT;
    EOF

You use the Livesync Docker image to synchronize changes in real time from a PostgreSQL database instance to a Timescale Cloud service:

  1. Start Livesync

    As you run Livesync continuously, best practice is to run it as a background process.

    docker run -d --rm --name livesync timescale/live-sync:v0.0.0-alpha.1-amd64 start --publication analytics --subscription livesync --source $SOURCE --target $TARGET
  2. Trace progress

    Once the Livesync container is running in the background, you can capture the logs:

    docker logs -f livesync
  3. View the tables being synchronized

    psql $TARGET -c "SELECT * FROM _ts_live_sync.subscription_rel"

     subname  | schemaname | tablename | rrelid | state | lsn
    ----------+------------+-----------+--------+-------+-----
     livesync | public     | metrics   |  17261 | d     |

    Possible values for state are:

    • d: initial table data sync
    • f: initial table data sync completed
    • s: catching up with the latest change
    • r: table is ready, syncing live changes
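    For example, to list tables that have not yet reached the ready state (assuming the _ts_live_sync.subscription_rel catalog shown above):

    ```sql
    -- Tables still in initial sync or catch-up (state <> 'r')
    SELECT schemaname, tablename, state
    FROM _ts_live_sync.subscription_rel
    WHERE state <> 'r';
    ```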
  4. Stop Livesync

    docker stop livesync
  5. Cleanup

    You need to manually execute a SQL snippet to clean up the replication slots created by Livesync.

    psql $SOURCE -f - <<EOF
    SELECT pg_drop_replication_slot(slot_name) FROM pg_replication_slots WHERE slot_name LIKE 'livesync%';
    SELECT pg_drop_replication_slot(slot_name) FROM pg_replication_slots WHERE slot_name LIKE 'ts%';
    EOF

    A command to clean up is coming shortly.

After the Livesync Docker container is up and running, you CREATE PUBLICATION on the SOURCE database to specify the list of tables you intend to synchronize. Once you create a PUBLICATION, it is automatically picked up by Livesync, which starts syncing the tables in it.

For example:

  1. Create a publication named analytics which publishes metrics and tags tables

    A PUBLICATION enables you to add all the tables in a schema, or even all the tables in the database. However, this requires superuser privileges on most managed PostgreSQL offerings.

    CREATE PUBLICATION analytics FOR TABLE metrics, tags;
  2. Add tables to an existing publication with a call to ALTER PUBLICATION

    ALTER PUBLICATION analytics ADD TABLE events;
  3. Publish PostgreSQL declarative partitioned table

    To publish declaratively partitioned table changes to your Timescale Cloud service, set the publish_via_partition_root special PUBLICATION config to true:

    ALTER PUBLICATION analytics SET(publish_via_partition_root=true);
  4. Stop syncing a table in the PUBLICATION with a call to DROP TABLE

    ALTER PUBLICATION analytics DROP TABLE tags;
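If you do have superuser privileges on the source, the broader publication forms mentioned in step 1 look like this (the publication names here are illustrative, and FOR TABLES IN SCHEMA requires PostgreSQL 15 or later):

```sql
-- Every table in the database (superuser required)
CREATE PUBLICATION analytics_all FOR ALL TABLES;

-- Or every table in one schema (PostgreSQL 15+)
CREATE PUBLICATION analytics_schema FOR TABLES IN SCHEMA public;
```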
