Drawing From a Random Distribution in SQL

There are different methods for updating a reinforcement learning agent's policy at each iteration. A few weeks ago we started experimenting with replacing our current method with a Bayesian inference step. Some of the data workloads within our agent are written in SQL that is executed on GCP's BigQuery engine. We use this stack because it provides scalable computational capabilities, ML packages and a straightforward SQL interface.
The Bayesian inference step we looked to implement uses one of the alternative parametrizations of a beta distribution. Which means that we will need to be able to draw from a beta distribution in SQL. While working on this, I discovered that drawing from the random distribution in SQL is a topic with very few well documented examples. So I'm writing about it here..
Step 1: How hard could it be?
BigQuery doesn't have a beta distribution. It doesn't have the capability to draw from any random distribution. So my first intuition was to take the definition of the beta distribution, write it in SQL, set the parameters using a CTA, draw a random number between 0 and 1 and compute the value of that function.
But it's 2024, so I asked ChatGPT how it would do it:
Me: "How do you create random draws from a beta distribution in BigQuery?
ChatGPT:
WITH beta_draws AS (
SELECT
(alpha - 1) / (alpha + beta - 2) AS a,
(beta - 1) / (alpha + beta - 2) AS b
FROM
(SELECT 2 AS alpha, 5 AS beta) - Set your desired alpha and beta values
)
SELECT
(a * POW(RAND(), alpha - 1) * POW(1 - RAND(), beta - 1) + b) AS beta_draw
FROM
beta_draws,
UNNEST(GENERATE_ARRAY(1, 1000)) AS rand_num
Me thinking to myself: Right, so that clearly won't work.
Do you see the problem in the code? ChatGPT draws two different x values for the presumed beta distribution PDF. I fixed this, cleaned up the query a little and sampled 1,000 values. And here's the SQL code for doing that:
WITH raw_data AS (
SELECT
'user_a' AS contact_id
, 0.674 AS Probability
, 72 AS lifetime_messages_received
), parameters AS (
SELECT
contact_id
, probability*lifetime_messages_received AS alpha
, (1.0 - probability)*lifetime_messages_received AS beta
, RAND() AS x
FROM raw_data
CROSS JOIN UNNEST(GENERATE_ARRAY(1, 1000)) AS draw_id
)
SELECT
contact_id
, ARRAY_AGG(POW(x, alpha - 1.0) * POW(1.0 - x, beta - 1)) AS beta_x
FROM parameters
GROUP BY contact_id
Thank you all, that's a wrap