ML Feature Engineering: from Pandas to SQL (BigQuery) — Time Resampling

This article is a section of the article ML Feature Engineering: from Pandas to SQL (BigQuery).

Time Resampling

When working with time series, resampling time on a regular step is common: it allows data to be aligned for shifting rows usage, doing consistent aggregation operations, graphics, and so on. With Pandas, it is almost straight forward with the resample method:

Time Resampling with Pandas
Time Resampling with Pandas

The code above realigns data each 30 minutes, using a mean aggregation. For instance, for the apple product, between 8:00 and 8:30, there were two sales rows, with respectively values 10 and 5. As these values stand in the same time interval, the average result is 7.5 .

Using BigQuery SQL to achieve that is not so easy (not specific to BQ):

We need first to create a time step referential (ts_steps_ref) which contains each time step (in that case each 30 mins) and each category product.

The final query joins the time step referential with the initial data on the rounded timestamp using integer division (DIVfunction), and applies the AVG function to obtain the sales mean value per time step/category product.

Time Resampling using BigQuery

Note: as I used CROSS JOIN to build the time step referential, I obtain all the possible combinations between time steps and category product: it results in an extra line compared to Pandas.

For more feature engineering techniques from Pandas to SQL, check others sections from ML Feature Engineering: from Pandas to SQL (BigQuery).

--

--

--

Lead Data Scientist @ Prisma Media. Freelance consultant in data science and AI architectures. Computer science PhD. https://www.linkedin.com/in/vlevorato/

Love podcasts or audiobooks? Learn on the go with our new app.

Recommended from Medium

Guide to spam classification using nltk library, stemming, and bag of words.

Computational Intelligence

Introduction to Machine Learning

Super-resolution for satellite imagery analysis

Scrape & Summarize News Articles Using Python

Automatic Content Tagging using NLP and Machine Learning

7 Key Requirements For Successful MLOps Deployment

How Machine Learning is Overcoming Content Overwhelm in Streaming Video

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
Vincent Levorato

Vincent Levorato

Lead Data Scientist @ Prisma Media. Freelance consultant in data science and AI architectures. Computer science PhD. https://www.linkedin.com/in/vlevorato/

More from Medium

ML Feature Engineering: from Pandas to SQL (BigQuery) — Rolling Windows

Overview of Big Query’s Unique feature, BQML with a regression model example

BI Engineer projects: Stokcs & Coin market portfolio analytics

RFM Segmentation with Python