ML Feature Engineering: from Pandas to SQL (BigQuery) — Label Encoding
This article is one section of the article ML Feature Engineering: from Pandas to SQL (BigQuery).
Label Encoding
With Python, the simplest way to achieve label encoding is to use either Pandas category codes or the scikit-learn LabelEncoder class.
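The following is a minimal sketch of both Python options. It assumes a hypothetical `color` column with made-up values. Both approaches sort the categories, so they produce the same 0-based integer codes.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy data: hypothetical "color" column, illustrative values only
df = pd.DataFrame({"color": ["red", "green", "blue", "green", "red"]})

# Option 1: Pandas category codes (categories are sorted by default)
df["color_code_pd"] = df["color"].astype("category").cat.codes

# Option 2: scikit-learn LabelEncoder (classes_ holds the sorted categories)
encoder = LabelEncoder()
df["color_code_sk"] = encoder.fit_transform(df["color"])

print(df)
print(list(encoder.classes_))  # the category behind each code
```

In this example, "blue", "green" and "red" map to 0, 1 and 2. Note that the scikit-learn documentation describes LabelEncoder as a tool for target labels. For input features, `OrdinalEncoder` is the class it recommends.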
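With BigQuery SQL, the numbering functions can be used in the same manner. For example, DENSE_RANK() ordered by the categorical column gives each distinct value an integer code. It starts at 1, so subtract 1 to match the 0-based Pandas codes.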
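To try the numbering-function approach without a BigQuery project, the sketch below runs the same kind of query on SQLite through Python's `sqlite3` module. SQLite is only a local stand-in here: DENSE_RANK() is a standard window function in both engines. The table name `main_table` and its data are hypothetical.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for a BigQuery table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE main_table (id INTEGER, color TEXT)")
conn.executemany(
    "INSERT INTO main_table VALUES (?, ?)",
    [(1, "red"), (2, "green"), (3, "blue"), (4, "green"), (5, "red")],
)

# DENSE_RANK() starts at 1, so subtract 1 to get 0-based codes like Pandas
query = """
SELECT id, color,
       DENSE_RANK() OVER (ORDER BY color) - 1 AS color_code
FROM main_table
ORDER BY id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The SELECT statement itself should also be valid BigQuery SQL once `main_table` is replaced with a real dataset-qualified table name. Here, the ranking is applied directly to every row of the main table.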
A good practice is to pre-build the categorical referential (or reuse an existing one) instead of applying the rank function directly to the main table. A window ordered over the whole table has to rank every row, so depending on the volume, the query can fail. Ranking only the distinct categories and then joining the result back is cheaper and more efficient.
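The sketch below illustrates the referential pattern, again using SQLite as a local stand-in for BigQuery. The table names `main_table` and `color_referential` are hypothetical. First, the distinct categories are ranked once into a referential table. Then the main table is joined to it, so the window function never runs over the full table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE main_table (id INTEGER, color TEXT)")
conn.executemany(
    "INSERT INTO main_table VALUES (?, ?)",
    [(1, "red"), (2, "green"), (3, "blue"), (4, "green"), (5, "red")],
)

# Step 1: build the referential by ranking only the DISTINCT categories
conn.execute("""
CREATE TABLE color_referential AS
SELECT color, DENSE_RANK() OVER (ORDER BY color) - 1 AS color_code
FROM (SELECT DISTINCT color FROM main_table) AS categories
""")

# Step 2: encode the main table with a join instead of a full-table rank
rows = conn.execute("""
SELECT m.id, m.color, r.color_code
FROM main_table AS m
LEFT JOIN color_referential AS r USING (color)
ORDER BY m.id
""").fetchall()
for row in rows:
    print(row)
```

Because the referential is stored as a table, it can be reused to encode other tables, such as new data at prediction time, with the same codes. With a LEFT JOIN, a category that is missing from the referential comes out as NULL instead of silently receiving a new code.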
For more feature engineering techniques from Pandas to SQL, check the other sections of ML Feature Engineering: from Pandas to SQL (BigQuery).