Skip to content

BigQuery Cheat Sheet

Estimated time to read: 8 minutes

Functions

String Functions

Function Name Description
CONCAT Concatenates multiple strings into a single string.
LENGTH Returns the length of a string.
LOWER Converts a string to lowercase.
UPPER Converts a string to uppercase.
TRIM Removes leading and trailing spaces from a string.
REPLACE Replaces occurrences of a substring within a string.
SUBSTR Extracts a substring from a string.
REGEXP_REPLACE Replaces parts of a string that match a regular expression.
LPAD() Pads a string with a specified character to a given length on the left side.
RPAD() Pads a string with a specified character to a given length on the right side.
SPLIT() Splits a string into an array based on a specified delimiter.
REGEXP_CONTAINS() Tests whether a string contains a substring matching a regular expression.
REGEXP_EXTRACT() Extracts the first matching substring from a string based on a regular expression.
REPEAT() Repeats a string a specified number of times.
REVERSE() Reverses the characters in a string.

Date and Time Functions

Function Name Description
CURRENT_DATE Returns the current date.
CURRENT_TIMESTAMP Returns the current timestamp.
DATE_ADD Adds an interval to a date.
DATE_DIFF Computes the difference between two dates.
DATE_TRUNC Truncates a date to a specified granularity.
EXTRACT Extracts a specific date part from a date, timestamp, or datetime.
FORMAT_DATETIME Formats a datetime value as a string.
PARSE_DATE Parses a string into a date using a specified format.
DATETIME_ADD() Adds an interval to a datetime.
DATETIME_DIFF() Computes the difference between two datetimes.
DATETIME_TRUNC() Truncates a datetime to a specified granularity.
FORMAT_TIMESTAMP() Formats a timestamp value as a string.
PARSE_TIMESTAMP() Parses a string into a timestamp using a specified format.
TIMESTAMP_ADD() Adds an interval to a timestamp.
TIMESTAMP_DIFF() Computes the difference between two timestamps.
TIMESTAMP_TRUNC() Truncates a timestamp to a specified granularity.

Numeric Functions

Function Name Description
ABS Returns the absolute value of a number.
CEIL Returns the smallest integer greater than or equal to a given number.
FLOOR Returns the largest integer less than or equal to a given number.
MOD Returns the remainder of one number divided by another.
ROUND Rounds a number to the nearest integer or a specified number of decimal places.

Aggregate Functions

Function Name Description
AVG Calculates the average value of a set of numbers.
COUNT Counts the number of non-null values in a column.
MAX Returns the maximum value in a column.
MIN Returns the minimum value in a column.
SUM Calculates the sum of all non-null values in a column.

Window Functions

Function Name Description
ROW_NUMBER Assigns a unique number to each row within a result set.
RANK Assigns a unique rank to each row within a result set, with the same rank assigned to rows with equal values.
DENSE_RANK Assigns a unique rank to each row within a result set, without gaps in the ranking sequence for equal values.
NTILE Divides a result set into a specified number of roughly equal groups and assigns a unique group number to each row. It doesn't provide the percentile value!

Mathematical Functions

Function Name Description
RAND() Returns a random floating-point value between 0 and 1.
SQRT() Returns the square root of a number.
LOG() Computes the natural logarithm of a number.
LOG10() Computes the base-10 logarithm of a number.
POWER() Raises a number to the power of another number.
SIN(), COS(), TAN() Trigonometric functions.
ASIN(), ACOS(), ATAN() Inverse trigonometric functions.
ATAN2() Returns the angle between the X-axis and the line segment connecting the origin and the point (x, y).

Geospatial Functions

Function Name Description
ST_DISTANCE() Calculates the distance between two geospatial points.
ST_CONTAINS() Tests whether a geometry contains another geometry.
ST_INTERSECTS() Tests whether two geometries intersect.
ST_AREA() Computes the area of a polygon.
ST_LENGTH() Computes the length of a linestring.
ST_CENTROID() Computes the centroid of a geometry.
ST_BUFFER() Creates a buffer around a geometry.
ST_MAKELINE() Creates a linestring from a sequence of points.
ST_MAKEPOLYGON() Creates a polygon from a sequence of points.
ST_MAKEPOINT() Creates a point from X and Y coordinates.
ST_UNION() Computes the union of multiple geometries.

JSON Functions

Function Name Description
JSON_EXTRACT() Extracts data from a JSON string.
JSON_QUERY() Queries JSON data using JSONPath expressions.
JSON_EXTRACT_SCALAR() Extracts scalar data from a JSON string.
JSON_ARRAY_LENGTH() Returns the length of a JSON array.

Array Functions

Function Name Description
ARRAY_LENGTH() Returns the number of elements in an array.
ARRAY_CONCAT() Concatenates two or more arrays.
ARRAY_REVERSE() Reverses the order of elements in an array.
ARRAY_TO_STRING() Converts an array to a string with a specified delimiter.

Other Functions

Function Name Description
COALESCE() Returns the first non-null value in a list of expressions.
NULLIF() Returns NULL if the two given expressions are equal. Otherwise returns the first expression.
GREATEST() Returns the greatest value from a list of expressions.
LEAST() Returns the smallest value from a list of expressions.
CAST() Converts a value from one data type to another.
SAFE_CAST() Safely converts a value from one data type to another, returning NULL if the conversion fails.
IF() Evaluates a conditional expression and returns the value of one of the two specified expressions, depending on whether the condition is true or false.
CASE() Evaluates a list of conditions and returns the first true result or a default value if no conditions are met.
ARRAY_AGG() Creates an array from a set of input values.
STRING_AGG() Concatenates non-null input values into a single string, separated by a specified delimiter.
ARRAY(SELECT ...) Creates an array from the results of a subquery.
STRUCT() Creates a structure with named fields from a list of expressions.
GENERATE_ARRAY() Generates an array of integers within a specified range.

Bit Functions

Function Name Description
BIT_COUNT() Counts the number of bits set to 1 in an integer.
BIT_XOR() Computes the bitwise exclusive OR (XOR) of an integer column.
BIT_AND() Computes the bitwise AND of an integer column.
BIT_OR() Computes the bitwise OR of an integer column.

BigQuery ML supports the following machine learning models

Find below the merged table with extra columns for use cases aligned:

Model Name Description Use Cases
Linear Regression (BQML.LINEAR_REG) A regression model used to predict the value of a variable based on the value of another variable. Predicting the price of a house in relation to the square meters, the number of sales in relation to the location, or the number of clicks on an ad in relation to the time of day.
Logistic Regression (BQML.LOGISTIC_REG) A classification model used to predict the probability of an event happening based on a set of known, independent variables. Predicting whether a customer will churn, whether an email is spam, or whether a tumour is malignant based on specific predefined characteristics.
K-means Clustering (BQML.KMEANS) An unsupervised learning model used to group n observations in K clusters in which each observation belongs to the cluster with the nearest mean value. Grouping customers into segments based on their purchase history, website activity, or demographics.
Matrix Factorization (BQML.MATRIX_FACTORIZATION) A recommendation model that finds relations between items' and users' entities. Recommending products to customers, movies to viewers, or songs to listeners.
Time Series (ARIMA, ARIMA_PLUS, and BQML.AUTO_ARIMA) ARIMA stands for autoregressive integrated moving average. Models for time series forecasting or describing past events in a time period. Predicting future sales, website traffic, or customer churn.
Deep Neural Networks (BQML.DNN) A deep learning model consisting of multiple layers of interconnected neurons. Classifying images, translating languages, or generating text.
TensorFlow Model Import (BQML.TF_IMPORT) Enables you to deploy pre-trained TensorFlow models in BigQuery. Deploying pre-trained TensorFlow models in BigQuery.
Boosted Tree Models (BQML.BOOSTED_TREE and BQML.BOOSTED_TREE_CLASSIFIER) Ensemble learning models built from decision trees to improve the accuracy of classification and regression models. Improving the accuracy of classification and regression models.
XGBoost (BQML.XGB_REG and BQML.XGB_CLASSIFIER) Extreme Gradient Boosting model. The algorithm is trained on smaller decision trees, and the final outcome is the combination of these results. Regression and classification tasks, known for their high performance and flexibility.
AutoML Tables (BQML.AUTOML_TABLES) A fully managed machine learning service that automatically builds and tunes machine learning models for tabular data. Automatically building and tuning machine learning models for tabular data.
Autoencoder (BQML.AUTOENCODER) A type of neural network that can be used to reduce the dimensionality of data, denoise images, or detect fraud. Reducing the dimensionality of data, denoising images, or detecting fraud.

ML model comparison

Model Name Type Problem Type Complexity Computational Cost Ease of Use
Linear Regression Supervised Learning Regression Simple Can be expensive for large datasets Easy to understand and implement
Logistic Regression Supervised Learning Categorical Classification Simple Can be expensive for large datasets Easy to understand and implement
K-means Clustering Unsupervised Learning Clustering Simple Can be computationally expensive for large datasets Easy to understand and implement
Matrix Factorization Unsupervised Learning Recommendation Complex Can be computationally expensive for large datasets More complex to use, but can save time and effort
Time Series Supervised Learning Forecasting Complex Can be computationally expensive for large datasets More complex to use, but can save time and effort
Deep Neural Networks Supervised Learning Regression, Classification, Image Recognition, Natural Language Processing, etc. Complex Can be computationally expensive for large datasets More complex to use, but can achieve state-of-the-art results
TensorFlow Model Import Supervised Learning Regression, Classification, Image Recognition, Natural Language Processing, etc. Complex Can be computationally expensive for large datasets More complex to use, but can save time and effort
Boosted Tree Models Supervised Learning Regression, Classification Complex Can be computationally expensive for large datasets More complex to use, but can achieve state-of-the-art results
XGBoost Supervised Learning Regression, Classification Complex Can be computationally expensive for large datasets More complex to use, but can achieve state-of-the-art results
AutoML Tables Supervised Learning Regression, Classification, Clustering, Recommendation, Forecasting, etc. Complex Can be computationally expensive for large datasets More complex to use, but can save time and effort
Autoencoder Unsupervised Learning Dimensionality Reduction, Image Denoising, Fraud Detection, etc. Complex Can be computationally expensive for large datasets More complex to use, but can save time and effort