Apache Parquet is a free and open-source column-oriented data storage format in the Apache Hadoop ecosystem. It provides efficient data compression and encoding schemes with enhanced performance to handle complex data in bulk.

This page shows you how to import data into your Timescale Cloud service from a Parquet file.

Before you import your data:

  • Install DuckDB on the source machine where the Parquet file is located.
  • Ensured that the time column in the Parquet file uses the TIMESTAMP data type.

For faster data transfer, best practice is that your target service, and the system running the data import are in the same region.

To import data from a Parquet file:

  1. Setup your service connection string

    This variable hold the connection information for the target Timescale Cloud service.

    In Terminal on the source machine, set the following:

    export TARGET=postgres://tsdbadmin:<PASSWORD>@<HOST>:<PORT>/tsdb?sslmode=require

    You find the connection information for your Timescale Cloud service in the configuration file you downloaded when you created the service.

  2. Create a hypertable to hold your data

    1. Create a new empty table with a schema that is compatible with the data in your parquet file.

      For example, if your parquet file contains the columns ts, location, and temperature with types TIMESTAMP, STRING, and DOUBLE:

      psql $TARGET -c "CREATE TABLE <TABLE_NAME> ( \
      ts TIMESTAMPTZ NOT NULL, \
      location TEXT NOT NULL, \
      temperature DOUBLE PRECISION NULL \
      );"

      If you prefer using a secure UI to the command line, use Data mode in Timescale Console.

    2. Convert the empty table to a hypertable:

      In the following command, replace <TABLE NAME> with the name of the table you just created, and <COLUMN_NAME> with the partitioning column in <TABLE NAME>.

      psql $TARGET -c "SELECT create_hypertable('<TABLE_NAME>', by_range('<COLUMN_NAME>'))"
  3. Setup a DuckDB connection to your service

    1. In Terminal on the source machine with your Parquet files, start a new DuckDB interactive session:

      duckdb
    2. Connect to your service in your DuckDB session:

      ATTACH '<Paste the value of $TARGET here' AS db (type postgres);

      $TARGET is the connection string you used to connect to your service using psql.

  4. Import data from Parquet to your service

    1. In DuckDB, upload the table data to your service

      COPY db.<TABLE_NAME> FROM '<FILENAME>.parquet' (FORMAT parquet);

      Where:

      • <TABLE_NAME>: the hypertable you created to import data to
      • <FILENAME>: the Parquet file to import data from
    2. Exit the DuckDB session:

      EXIT;
  5. Verify the data was imported correctly into your service

    In your psql session, or using Data mode in Timescale Console, view the data in <TABLE_NAME>

    SELECT * FROM <TABLE_NAME>;

And that is it, you have imported your data from a Parquet file to your Timescale Cloud service.

Keywords

Found an issue on this page?Report an issue or Edit this page in GitHub.