Real-time analytics refers to the process of collecting, analyzing, and interpreting data instantly as it is generated. This approach enables you track and monitor activity, and make decisions based on real-time insights on data stored in a Timescale Cloud service.

Real-time analytics geolocation

This page shows you how to integrate Grafana with a Timescale Cloud service and make insights based on visualization of data optimized for size and speed in the columnstore.

To follow the steps on this page:

Time-series data represents how a system, process, or behavior changes over time. Hypertables are PostgreSQL tables that help you improve insert and query performance by automatically partitioning your data by time, speeding up queries for real-time analytics and other challenging workloads. Each hypertable is made up of child tables called chunks. Each chunk is assigned a range of time, and only contains data from that range.

  1. Import time-series data into a hypertable

    1. Unzip

      to a <local folder>.

      This test dataset contains historical data from New York's yellow taxi network.

      To import up to 100GB of data directly from your current PostgreSQL based database, migrate with downtime using native PostgreSQL tooling. To seamlessly import 100GB-10TB+ of data, use the live migration tooling supplied by Timescale. To add data from non-PostgreSQL data sources, see Import and ingest data.

    2. In Terminal, navigate to <local folder> and update the following string with your connection details to connect to your service.

      psql -d "postgres://<username>:<password>@<host>:<port>/<database-name>?sslmode=require"
    3. Create tables to import time-series data:

      1. In your sql client, create a normal PostgreSQL table:

        CREATE TABLE "rides"(
        vendor_id TEXT,
        pickup_datetime TIMESTAMP WITHOUT TIME ZONE NOT NULL,
        dropoff_datetime TIMESTAMP WITHOUT TIME ZONE NOT NULL,
        passenger_count NUMERIC,
        trip_distance NUMERIC,
        pickup_longitude NUMERIC,
        pickup_latitude NUMERIC,
        rate_code INTEGER,
        dropoff_longitude NUMERIC,
        dropoff_latitude NUMERIC,
        payment_type INTEGER,
        fare_amount NUMERIC,
        extra NUMERIC,
        mta_tax NUMERIC,
        tip_amount NUMERIC,
        tolls_amount NUMERIC,
        improvement_surcharge NUMERIC,
        total_amount NUMERIC
        );
      2. Convert rides to a hypertable and partitioned on time:

        SELECT create_hypertable('rides', by_range('pickup_datetime'), create_default_indexes=>FALSE);
        SELECT add_dimension('rides', by_hash('payment_type', 2));

        To more fully understand how hypertables work, and how to optimize them for performance by tuning chunk intervals and enabling chunk skipping, see the hypertables documentation.

      3. Create an index to support efficient queries by vendor, rate code, and passenger count:

        CREATE INDEX ON rides (vendor_id, pickup_datetime DESC);
        CREATE INDEX ON rides (rate_code, pickup_datetime DESC);
        CREATE INDEX ON rides (passenger_count, pickup_datetime DESC);
    4. Create tables for relational data:

      1. Add a table to store the payment types data:

        CREATE TABLE IF NOT EXISTS "payment_types"(
        payment_type INTEGER,
        description TEXT
        );
        INSERT INTO payment_types(payment_type, description) VALUES
        (1, 'credit card'),
        (2, 'cash'),
        (3, 'no charge'),
        (4, 'dispute'),
        (5, 'unknown'),
        (6, 'voided trip');
      2. Add a table to store the rates data:

        CREATE TABLE IF NOT EXISTS "rates"(
        rate_code INTEGER,
        description TEXT
        );
        INSERT INTO rates(rate_code, description) VALUES
        (1, 'standard rate'),
        (2, 'JFK'),
        (3, 'Newark'),
        (4, 'Nassau or Westchester'),
        (5, 'negotiated fare'),
        (6, 'group ride');
    5. Upload the dataset to your service

      \COPY rides FROM nyc_data_rides.csv CSV;

      To more fully understand how hypertables work, and how to optimize them for performance by tuning chunk intervals and enabling chunk skipping, see the hypertables documentation.

  2. Have a quick look at your data

    You query hypertables in exactly the same way as you would a relational PostgreSQL table. Use one of the following SQL editors to run a query and see the data you uploaded:

    • Data mode: write queries, visualize data, and share your results in Timescale Console for all your Timescale Cloud services.
    • SQL editor: write, fix, and organize SQL faster and more accurately in Timescale Console for a Timescale Cloud service.
    • psql: easily run queries on your Timescale Cloud services or self-hosted TimescaleDB deployment from Terminal.

    For example:

    • Display the number of rides for each fare type:

      SELECT rate_code, COUNT(vendor_id) AS num_trips
      FROM rides
      WHERE pickup_datetime < '2016-01-08'
      GROUP BY rate_code
      ORDER BY rate_code;

      This simple query runs in 3 seconds. You see something like:

      rate_codenum_trips
      12266401
      254832
      34126
      4967
      57193
      617
      9942
    • To select all rides taken in the first week of January 2016, and return the total number of trips taken for each rate code:

      SELECT rates.description, COUNT(vendor_id) AS num_trips
      FROM rides
      JOIN rates ON rides.rate_code = rates.rate_code
      WHERE pickup_datetime < '2016-01-08'
      GROUP BY rates.description
      ORDER BY LOWER(rates.description);

      On this large amount of data, this analytical query on data in the rowstore takes about 59 seconds. You see something like:

      descriptionnum_trips
      group ride17
      JFK54832
      Nassau or Westchester967
      negotiated fare7193
      Newark4126
      standard rate2266401

Hypercore is the Timescale hybrid row-columnar storage engine used by hypertables. Hypertables partition your data in chunks. Chunks stored in the rowstore use a row-oriented data format optimized for high-speed inserts and updates. Chunks stored in the columnstore use a columnar data format optimized for analytics. You ingest hot data into the rowstore. As data cools and becomes more suited for analytics, Timescale Cloud automatically converts these chunks of data to the columnstore. You define the moment when data is converted using a columnstore policy.

When Timescale Cloud converts a chunk to the columnstore, TimescaleDB automatically creates a different schema for your data. TimescaleDB creates and uses custom indexes to incorporate the segmentby and orderby parameters when you write to and read from the columstore.

To increase the speed of your analytical queries by a factor of 10 and reduce storage costs by up to 90%, convert data to the columnstore:

  1. Connect to your Timescale Cloud service

    In Timescale Console open an SQL editor. The in-Console editors display the query speed. You can also connect to your service using psql.

  2. Enable columnstore on a hypertable

    Create a job that automatically moves chunks in a hypertable to the columnstore at a specific time interval. By default, your table is orderedby the time column. For efficient queries on columnstore data, remember to segmentby the column you will use most often to filter your data:

    ALTER TABLE rides SET (
    timescaledb.enable_columnstore = true,
    timescaledb.segmentby = 'vendor_id',
    timescaledb.orderby = 'pickup_datetime DESC'
    );
  3. Add a policy to convert chunks to the columnstore at a specific time interval

    For example, convert data older than 8 days old to the columstore:

    CALL add_columnstore_policy('rides', INTERVAL '8 days');

    See add_columnstore_policy.

  4. View your data space saving

    When you convert data to the columnstore, as well as being optimized for analytics, it is compressed by more than 90%. This saves on storage costs and keeps your queries operating at lightning speed. To see the amount of space saved:

    SELECT
    pg_size_pretty(before_compression_total_bytes) as before,
    pg_size_pretty(after_compression_total_bytes) as after
    FROM hypertable_compression_stats('rides');

    You see something like:

    beforeafter
    2818 MB673 MB
  5. Faster analytical queries on data in the columnstore

    Now run the analytical query again:

    SELECT rates.description, COUNT(vendor_id) AS num_trips
    FROM rides
    JOIN rates ON rides.rate_code = rates.rate_code
    WHERE pickup_datetime < '2016-01-08'
    GROUP BY rates.description
    ORDER BY LOWER(rates.description);

    On this large amount of data, this analytical query on data in the columnstore takes about 6 seconds.

Just to hit this one home, by converting cooling data to the columnstore, you have increased the speed of your analytical queries by a factor of 10, and reduced storage by up to 90%.

To visualize the results of your queries, enable Grafana to read the data in your service:

  1. Log in to Grafana

    In your browser, log in to either:

    • Self-hosted Grafana: at http://localhost:3000/. The default credentials are admin, admin.
    • Grafana Cloud: use the URL and credentials you set when you created your account.
  2. Add your service as a data source

    1. Open Connections > Data sources, then click Add new data source.

    2. Select PostgreSQL from the list.

    3. Configure the connection:

      • Host URL, Database name, Username, and Password

        Configure using your connection details. Host URL is in the format <host>:<port>.

      • TLS/SSL Mode: select require.

      • PostgreSQL options: enable TimescaleDB.

      • Leave the default setting for all other fields.

    4. Click Save & test.

      Grafana checks that your details are set correctly.

A Grafana dashboard represents a view into the performance of a system, and each dashboard consists of one or more panels, which represent information about a specific metric related to that system.

To visually monitor the volume of taxi rides over time:

  1. Create the dashboard

    1. On the Dashboards page, click New and select New dashboard.

    2. Click Add visualization.

    3. Select the data source that connects to your Timescale Cloud service. The Time series visualization is chosen by default.

      Grafana create dashboard

    4. In the Queries section, select Code, then select Time series in Format.

    5. Select the data range for your visualization: the data set is from 2016. Click the date range above the panel and set:

      • From: 2016-01-01 01:00:00
      • To: 2016-01-30 01:00:00
  2. Combine TimescaleDB and Grafana functionality to analyze your data

    Combine a TimescaleDB time_bucket, with the Grafana $__timefilter() function to set the pickup_datetime column as the filtering range for your visualizations.

    SELECT
    time_bucket('1 day', pickup_datetime) AS "time",
    COUNT(*)
    FROM rides
    WHERE $__timeFilter(pickup_datetime)
    GROUP BY time
    ORDER BY time;

    This query groups the results by day and orders them by time.

    Grafana real-time analytics

  3. Click Save dashboard

Having all this data is great but how do you use it? Monitoring data is useful to check what has happened, but how can you analyse this information to your advantage? This section explains how to create a visualization that shows how you can maximize potential revenue.

To add geospatial analysis to your ride count visualization, you need geospatial data to work out which trips originated where. As TimescaleDB is compatible with all PostgreSQL extensions, use PostGIS to slice data by time and location.

  1. Connect to your Timescale Cloud service and add the PostGIS extension:

    CREATE EXTENSION postgis;
  2. Add geometry columns for pick up and drop off locations:

    ALTER TABLE rides ADD COLUMN pickup_geom geometry(POINT,2163);
    ALTER TABLE rides ADD COLUMN dropoff_geom geometry(POINT,2163);
  3. Convert the latitude and longitude points into geometry coordinates that work with PostGIS:

    UPDATE rides SET pickup_geom = ST_Transform(ST_SetSRID(ST_MakePoint(pickup_longitude,pickup_latitude),4326),2163),
    dropoff_geom = ST_Transform(ST_SetSRID(ST_MakePoint(dropoff_longitude,dropoff_latitude),4326),2163);

    This updates 10,906,860 rows of data on both columns, it takes a while. Coffee is your friend.

In this section you visualize a query that returns rides longer than 5 miles for trips taken within 2 km of Times Square. The data includes the distance travelled and is GROUP BY trip_distance and location so that Grafana can plot the data properly.

This enables you to see where a taxi driver is most likely to pick up a passenger who wants a longer ride, and make more money.

  1. Create a geolocalization dashboard

    1. In Grafana, create a new dashboard that is connected to your Timescale Cloud service data source with a Geomap visualization.

    2. In the Queries section, select Code, then select the Time series Format.

      Real-time analytics geolocation

    3. To find rides longer than 5 miles in Manhattan, paste the following query:

      SELECT time_bucket('5m', rides.pickup_datetime) AS time,
      rides.trip_distance AS value,
      rides.pickup_latitude AS latitude,
      rides.pickup_longitude AS longitude
      FROM rides
      WHERE rides.pickup_datetime BETWEEN '2016-01-01T01:41:55.986Z' AND '2016-01-01T07:41:55.986Z' AND
      ST_Distance(pickup_geom,
      ST_Transform(ST_SetSRID(ST_MakePoint(-73.9851,40.7589),4326),2163)
      ) < 2000
      GROUP BY time,
      rides.trip_distance,
      rides.pickup_latitude,
      rides.pickup_longitude
      ORDER BY time
      LIMIT 500;

      You see a world map with a dot on New York.

    4. Zoom into your map to see the visualization clearly.

  2. Customize the visualization

    1. In the Geomap options, under Map Layers, click + Add layer and select Heatmap. You now see the areas where a taxi driver is most likely to pick up a passenger who wants a longer ride, and make more money.

      Real-time analytics geolocation

You have integrated Grafana with a Timescale Cloud service and made insights based on visualization of your data.

Keywords

Found an issue on this page?Report an issue or Edit this page in GitHub.