Join time-series data with relational data for deeper analysis

TimescaleDB is packaged as a PostgreSQL extension. As such, TimescaleDB is PostgreSQL with super-powers. You can do anything in TimescaleDB that you can in PostgreSQL, including joining tables and combining data for further analysis.

The Mile-High advantage

In professional football, it's a well-understood phenomenon that teams that visit Denver's Mile-High stadium are at a disadvantage because unlike the home team (the Denver Broncos), they are not accustomed to playing in high altitude.

Earlier we ingested stadium data. Now we can run a query to see the performance of players when they are playing at Mile High Stadium.

Like many of the queries in our analysis section, for this example you will utilize the relational nature of this data. You will join the tracking, player, and game tables to compare the average acceleration and yards run of individual players when they are performing in stadiums outside of Denver versus when they are playing within Denver. The columns avg_acc_den and avg_yards_den represent the acceleration and yard statistics while in Denver.

WITH stat_vals AS (
-- This table collects the summed yard and avg acceleration data of a player during one game
  SELECT a.player_id, displayname, SUM(yards) AS yards, AVG(acc) AS acc, team, gameid 
  FROM player_yards_by_game a
  LEFT JOIN player p ON a.player_id = p.player_id 
  GROUP BY a.player_id, displayname, gameid, team
), team_data AS (
-- This table gets us the team information so that we can filter on teams
    SELECT a.player_id, acc, yards, a.gameid,
        CASE 
            WHEN a.team = 'away' THEN g.visitor_team 
            WHEN a.team = 'home' THEN g.home_team 
            ELSE NULL 
            END AS team_name,
        g.home_team
    FROM stat_vals a
    LEFT JOIN game g ON a.gameid = g.game_id 
), avg_stats AS (
-- This table takes the avg acceleration and yards run for players when they are not in denver
-- and then when they are in denver
SELECT p.player_id, p.displayname, 
    AVG(acc) FILTER (WHERE team_name != 'DEN' AND home_team !='DEN') AS avg_acc, 
    AVG(acc) FILTER (WHERE team_name != 'DEN' AND home_team = 'DEN') AS avg_acc_den,
    AVG(yards) FILTER (WHERE team_name != 'DEN' AND home_team !='DEN') AS avg_yards, 
    AVG(yards) FILTER (WHERE team_name != 'DEN' AND home_team = 'DEN') AS avg_yards_den, 
    COUNT(gameid) FILTER (WHERE team_name != 'DEN' AND home_team !='DEN') AS games,
    COUNT(gameid) FILTER (WHERE team_name != 'DEN' AND home_team ='DEN') AS games_den
FROM team_data t
LEFT JOIN player p ON t.player_id = p.player_id 
GROUP BY p.player_id, p.displayname
)
SELECT * FROM avg_stats
WHERE avg_acc IS NOT NULL AND avg_acc_den IS NOT NULL
ORDER BY avg_acc DESC, avg_acc_den DESC

You should see this:

player_iddisplaynameavg_accavg_acc_denavg_yardsavg_yards_dengamesgames_den
2552597Breshad Perriman2.322.06408.79461.0891
2560988Antonio Callaway2.272.30778.91741.06151
2556214Tyreek Hill2.272.191007.311004.26141
2558194Josh Reynolds2.26]2.40]527.80]529.16151
2543498Brandin Cooks2.262.26975.61875.90151

You can see that generally, it appears many players may have worse acceleration and average number of yards run per game while playing in Denver. However, it is good to note that you only have one sample point showing Denver averages which effects statistical significance.

Grass vs. turf, the eternal (football) question

Players often say they "feel" faster on artificial turf. How much faster are they in reality?

Using this query you will join the tracking, stadium_info, game, and player tables, to extract the average acceleration that a player has while using turf verses grass. The column avg_acc_turf represents the players average acceleration while using artificial turf, and avg_acc_grass represents their average acceleration while on grass.

WITH acceleration AS (
-- This table gets the average acceleration for a player over each game
    SELECT a.player_id, avg(acc) AS acc, a.gameid, a.team
    FROM player_yards_by_game a
    GROUP BY a.player_id, a.gameid, a.team
), team_data AS (
-- This table gets us the surface information so that we can filter on turf type
    SELECT a.player_id, acc, g.game_id, si."location", si.surface 
    FROM acceleration a
    LEFT JOIN game g ON a.gameid = g.game_id 
    LEFT JOIN stadium_info si on g.home_team = si.team_abbreviation 
), avg_stats AS (
-- This table takes the avg acceleration and yards run for players when they are not in denver
-- and then when they are in denver
SELECT p.player_id, p.displayname, 
    AVG(acc) FILTER (WHERE surface LIKE '%Turf%') AS avg_acc_turf, 
    AVG(acc) FILTER (WHERE surface NOT LIKE '%Turf%') AS avg_acc_grass
FROM team_data t
LEFT JOIN player p ON t.player_id = p.player_id 
GROUP BY p.player_id, p.displayname
)
SELECT * FROM avg_stats
WHERE avg_acc_turf IS NOT NULL AND avg_acc_grass IS NOT NULL AND player_id IS NOT NULL
ORDER BY avg_acc_turf DESC, avg_acc_grass DESC

You should see this:

player_iddisplaynameavg_acc_turfavg_acc_grass
2559066Gehrig Dieter3.252.29
2507948Frank Zombo2.932.30
2555173Eric Murray2.781.86
2552374Ameer Abdullah2.762.48
2552408Darren Waller2.692.83

For many players, it appears that they are indeed faster on artificial turf. This 'feeling' of increased speed may in fact be grounded in reality.

We're going to overtime!

Sometimes it's helpful to visualize time-series data in order to fully understand how a system is performing. The NFL dataset includes play-by-play data. It is possible to visualize this information so that you can see how a play unfolds.

Found an issue on this page?

Report an issue!

Keywords

Related Content