Livesync from S3 to Timescale Cloud

Timescale Cloud: Performance, Scale, Enterprise

Self-hosted products

MST

You use livesync to synchronize CSV and Parquet files from an S3 bucket to your Timescale Cloud service in real time. Livesync runs continuously, enabling you to leverage Timescale Cloud as your analytics database with data constantly synced from S3. This lets you take full advantage of Timescale Cloud's real-time analytics capabilities without having to develop or manage custom ETL solutions between S3 and Timescale Cloud.

You can use livesync to synchronize your existing and new data. Here's what livesync can do:

Sync data from an S3 bucket instance to a Timescale Cloud service:
- Use glob patterns to identify the objects to sync.
- Livesync uses the objects returned for subsequent queries. This efficient approach means files are synced in lexicographical order.
- Livesync watches an S3 bucket for new files and imports them automatically. It runs on a configurable schedule and tracks processed files.
- For large backlogs, livesync checks every minute until caught up.
Sync data from multiple file formats:
- CSV: files are checked for compression in .gz and .zip format, then processed using timescaledb-parallel-copy.
- Parquet: files are converted to CSV, then processed using timescaledb-parallel-copy.
Livesync offers an option to enable a hypertable during the file-to-table schema mapping setup. You can enable columnstore and continuous aggregates through the SQL editor once livesync has started.
Livesync offers a default 1-minute polling interval. This means that Timescale Cloud checks the S3 source every minute for new data. You can customize this interval by setting up a cron expression.

Livesync for S3 continuously imports data from an Amazon S3 bucket into your database. It monitors your S3 bucket for new files matching a specified pattern and automatically imports them into your designated database table.

Note: livesync for S3 currently only syncs existing and new files—it does not support updating or deleting records based on updates and deletes from S3 to tables in a Timescale Cloud service.

Early access: livesync is not supported for production use. If you have any questions or feedback, talk to us in #livesync in Timescale Community

Prerequisites

To follow the steps on this page:

Create a target Timescale Cloud service with time-series and analytics enabled.
You need your connection details.

Ensure access to a standard Amazon S3 bucket containing your data files.
Directory buckets are not supported.
Configure access credentials for the S3 bucket.
- The following credentials are supported:
  - IAM Role.
    - Configure the trust policy. Set the:
      - Principal: arn:aws:iam::142548018081:role/timescale-s3-connections.
      - ExternalID: set to the Timescale Cloud project and Timescale Cloud service ID of the service you are syncing to in the format <projectId>/<serviceId>.
        This is to avoid the confused deputy problem.
    - Give the following access permissions:
      - s3:GetObject.
      - s3:ListBucket.
  - Public anonymous user.

Limitations

CSV:
- Maximum file size: 1 GB
  To increase this limit, contact sales@timescale.com
- Maximum row size: 2 MB
- Supported compressed formats:
  - .gz
  - .zip
- Advanced settings:
  - Delimiter: the default character is ,, you can choose a different delimiter
  - Skip header: skip the first row if your file has headers
Parquet:
- Maximum file size: 1 GB
- Maximum row size: 2 MB
Sync iteration:
To prevent system overload, livesync tracks up to 100 files for each sync iteration. Additional checks only fill empty queue slots.

Synchronize data to your Timescale Cloud service

To sync data from your S3 bucket to your Timescale Cloud service using Timescale Console:

Connect to your Timescale Cloud service
In Timescale Console, select the service to sync live data to.
Start livesync
1. Click Actions > Livesync for S3.
2. Click New livesync for S3.
Connect the source S3 bucket to the target service
1. In Livesync for S3, set the Bucket name and Authentication method, then click Continue.
  For instruction on creating the IAM role you need to connect your S3 bucket, click Learn how:
  Timescale Console connects to the source bucket.
2. In Define files to sync, choose the File type and set the Glob pattern.
  Use the following patterns:
  - <folder name>/*: match all files in a folder. Also, any pattern ending with / is treated as /*.
  - <folder name>/**: match all recursively.
  - <folder name>/**/*.csv: match a specific file type.
  Livesync uses prefix filters where possible, place patterns carefully at the end of your glob expression. AWS S3 doesn't support complex filtering. If your expression filters too many files, the list operation may timeout.
3. Click the search icon, you see files to sync. Click Continue.
Optimize the data to synchronize in hypertables
Timescale Console checks the file schema and, if possible, suggests the column to use as the time dimension in a hypertable.
1. Choose the Data type for each column, then click Continue.
2. Choose the interval. This can be a minute, an hour, or use a cron expression.
3. Repeat this step for each table you want to sync.
4. Click Start Livesync.
  Timescale Console starts livesync between the source database and the target service and displays the progress.
Monitor syncronization
1. To view the progress of the livesync, click the name of the livesync process.
  You see the status of the file being synced. Only one file runs at a time.
2. To pause and restart livesync, click the buttons on the right of the livesync process and select an action.
  During pauses, you can edit the configuration before resuming.

And that is it, you are using livesync to synchronize all the data, or specific files, from an S3 bucket to your Timescale Cloud service in real time.

Livesync from S3 to Timescale Cloud

Prerequisites

Limitations

Synchronize data to your Timescale Cloud service

Related Content