Skip to content

BigQuery ML for starters

Estimated time to read: 5 minutes

BigQuery ML enables you to create and execute machine learning models directly within BigQuery, using standard SQL queries. Simple, standard ML algorithms are already available in BigQuery ML, such as regression, logistic regression or k-means. This offers the data analysts an easy way to integrate ML algorithms in their everyday analytics work without the support of data scientists or data engineers.

Here are a few use cases involving user analytics, along with example SQL queries to create and use machine learning models in BigQuery. In the following examples, we assume that the business is an online shop and we want to predict:

  • churn (ML algo: logistic regression)
  • customer segmentation (ML algo: k-means)
  • customer clustering (ML algo: )
  • customer lifetime value (ML algo: linear regression)

Use Case 1: Churn Prediction

Customer churn is one of the main 'headaches' of businesses online or physical; as it is a lot harder to gain new customers than retain existing ones. Hence, most companies pay a lot of attention to their churn and the period before a customer starts losing interest in their product/service.

Predicting customer churn is a common use case for businesses looking to retain users and minimise lost revenue. Using BigQuery ML, you can create a binary logistic regression model to predict user churn based on historical user behaviour data.

Assuming you have a dataset called user_analytics with a table called user_data containing columns user_id, days_since_last_login, total_logins, average_session_duration, and churned (1 if the user has churned, 0 otherwise), you can create a model as follows:

SQL
CREATE OR REPLACE MODEL `user_analytics.churn_prediction_model`
OPTIONS(model_type='logistic_reg') AS
SELECT
  days_since_last_login,
  total_logins,
  average_session_duration,
  churned
FROM
  `user_analytics.user_data`;

After the model is trained, you can use it to predict user churn for new users:

SQL
SELECT
  user_id,
  predicted_churned_probs[OFFSET(1)].prob AS churn_probability
FROM
  ML.PREDICT(MODEL `user_analytics.churn_prediction_model`,
    (SELECT
      user_id,
      days_since_last_login,
      total_logins,
      average_session_duration
    FROM
      `user_analytics.new_user_data`));

Use Case 2: Customer Segmentation

When moving away from traditional marketing and advertising, i.e. away from addressing all customers that live in the same area (geographical) or have the same age (demographics) with the same message, but instead recognise that people are more multifaceted. In the latter case, the analyst will combine data from several different sources to create a behavioural profile for the customers and segment them based on a combination of attributes.

Segmentation algorithms - k means - random forest - naive Bayes - etc.

Using BigQuery ML, it becomes easy to apply the k-means clustering algorithm to group users based on similar characteristics and perform a segmentation analysis.

Assuming you have a table called user_behavior containing columns user_id, total_page_views, average_session_duration, and total_purchases, you can create a k-means clustering model as follows:

SQL
CREATE OR REPLACE MODEL `user_analytics.user_segmentation_model`
OPTIONS(model_type='kmeans',
        num_clusters=4) AS
SELECT
  total_page_views,
  average_session_duration,
  total_purchases
FROM
  `user_analytics.user_behavior`;

Once the model is trained, you can use it to assign new users to clusters based on their behavior:

SQL
SELECT
  user_id,
  nearest_centroids_distance,
  centroid_id AS cluster_id
FROM
  ML.PREDICT(MODEL `user_analytics.user_segmentation_model`,
    (SELECT
      user_id,
      total_page_views,
      average_session_duration,
      total_purchases
    FROM
      `user_analytics.new_user_behavior`));

Use Case 3: Customer Clustering

When trying to discover new ways to group your customers in your current database, e.g. when launching a new product and want to identify which customers would be interested in it, then running a clustering algorithm is the best way. Clustering is very similar to segmentation, only that clustering allows us to split our users in new ways that we hadn't thought of, while in segmentation, the groups are defined by humans based on their knowledge of the market and the business.

Clustering algorithms = unsupervised learning

  • hierarchical
  • k means
  • DBSCAN
  • Gaussian mixture
  • BIRCH
  • Mean shift
  • etc.

Clustering and segmentation are often used interchangeably, however there is value in separating the two. Especially, when we are trying to reposition our business, launching new products or new services. In that case we want to 'see' our customer base in a different perspective and instead of spending thousands or millions to acquire new users, leverage our existing base.

Use Case 4: Lifetime Value Prediction

Finally, a business should know the customer Lifetime Value (LTV), so that they can optimise marketing and advertising spend, as well as business strategy. As mentioned in the churn section, acquiring a new customer is a lot more costlier than retaining or re-enganging an existing one. That can be calculated with the LTV. Knowing the LTV, we can estimate how much is worth spending in, for example, a new campaign for acquiring new customers instead of in a campaign to re-engage.

Using BigQuery ML, you can create a linear regression model to predict LTV based on user behaviour data.

Assuming you have a table called user_lifetime_data containing columns user_id, days_since_signup, total_sessions, total_revenue, and lifetime_value, you can create a linear regression model as follows:

SQL
CREATE OR REPLACE MODEL `user_analytics.lifetime_value_model`
OPTIONS(model_type='linear_reg') AS
SELECT
  days_since_signup,
  total_sessions,
  total_revenue,
  lifetime_value
FROM
  `user_analytics.user_lifetime_data`;

After training the model, you can use it to predict the lifetime value of new users:

SQL
SELECT
  user_id,
  predicted_lifetime_value
FROM
  ML.PREDICT(MODEL `user_analytics.lifetime_value_model`,
    (SELECT
      user_id,
      days_since_signup,
      total_sessions,
      total_revenue
    FROM
      `user_analytics.new_user_lifetime_data`));

By predicting customer lifetime value using BigQuery ML, you can make more informed decisions about where to allocate resources, such as marketing and customer support. Identifying high-value customers can help you tailor your offerings to their needs, ultimately leading to increased customer satisfaction, loyalty, and revenue.

These are just a few examples of how you can leverage BigQuery ML to incorporate machine learning models into your user analytics workflow. By using BigQuery ML, you can seamlessly integrate machine learning models with your data analysis pipeline, enabling you to derive deeper insights and make data-driven decisions.