ML Feature Engineering: from Pandas to SQL (BigQuery) — Label Encoding

This article is one section of ML Feature Engineering: from Pandas to SQL (BigQuery).

Label Encoding

With Python, the simplest way to achieve label encoding is to use either Pandas category codes or the scikit-learn LabelEncoder class.

Category numerical encoding using Pandas
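As a minimal sketch of the category-codes approach, assuming a hypothetical DataFrame with a single `color` column (the data and column names are illustrative, not from the original article):

```python
import pandas as pd

# Hypothetical sample data: one categorical column to encode
df = pd.DataFrame({"color": ["red", "green", "blue", "green", "red"]})

# Cast to the categorical dtype, then read the integer code of each value.
# Categories are sorted, so here blue -> 0, green -> 1, red -> 2.
df["color_code"] = df["color"].astype("category").cat.codes

print(df)
```

Note that missing values receive the code -1 with this approach, so they may need to be handled before or after encoding.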

With BigQuery SQL, the numbering functions can be used in the same manner:

Category numerical encoding using BigQuery
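The query pattern could look like the sketch below. To keep it runnable without a BigQuery project, it executes the SQL against an in-memory SQLite database as a stand-in, which assumes a SQLite build with window-function support (3.25 or later). The `DENSE_RANK() OVER (ORDER BY ...)` expression is also valid BigQuery SQL. The `items` table, its columns, and the choice of `DENSE_RANK` among the numbering functions are assumptions made for illustration.

```python
import sqlite3

# In-memory SQLite database used as a local stand-in for BigQuery (assumption)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER, color TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [(1, "red"), (2, "green"), (3, "blue"), (4, "green"), (5, "red")],
)

# DENSE_RANK numbers the distinct values from 1 in sorted order;
# subtracting 1 gives zero-based codes comparable to Pandas category codes.
query = """
SELECT
  id,
  color,
  DENSE_RANK() OVER (ORDER BY color) - 1 AS color_code
FROM items
ORDER BY id
"""
rows = conn.execute(query).fetchall()

for row in rows:
    print(row)
```

`DENSE_RANK` is used rather than `RANK` because `RANK` leaves gaps after ties, which would produce non-consecutive codes when a category appears on several rows.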

A good practice is to pre-build the categorical referential (a lookup table of the distinct category values), or to reuse an existing one, instead of applying the rank function directly on the main table. Depending on the data volume, ranking the main table directly can crash the query, whereas going through a referential reduces the query's cost and improves its efficiency.
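One way to sketch the referential approach: number the distinct category values once in a small lookup table, then join that table back to the main one. As an assumption for runnability, the SQL is executed on an in-memory SQLite database (3.25 or later) standing in for BigQuery, and the `items` and `color_ref` tables are hypothetical names.

```python
import sqlite3

# In-memory SQLite database used as a local stand-in for BigQuery (assumption)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER, color TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [(1, "red"), (2, "green"), (3, "blue"), (4, "green"), (5, "red")],
)

# Step 1: build the referential from the distinct values only,
# so the numbering function runs on a small table.
conn.execute("""
CREATE TABLE color_ref AS
SELECT
  color,
  ROW_NUMBER() OVER (ORDER BY color) - 1 AS color_code
FROM (SELECT DISTINCT color FROM items)
""")

# Step 2: attach the codes to the main table with a plain join.
rows = conn.execute("""
SELECT i.id, i.color, r.color_code
FROM items AS i
JOIN color_ref AS r ON i.color = r.color
ORDER BY i.id
""").fetchall()

for row in rows:
    print(row)
```

Persisting the referential also keeps the codes stable across runs, as long as the same table is reused rather than rebuilt from changing data.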

For more feature engineering techniques from Pandas to SQL, check the other sections of ML Feature Engineering: from Pandas to SQL (BigQuery).


Vincent Levorato

Lead Data Scientist @ Prisma Media. Freelance consultant in data science and AI architectures. Computer science PhD. https://www.linkedin.com/in/vlevorato/