請。我有兩個最常見的名字和姓氏的表。每個表基本上有兩個欄位:
表
CREATE TABLE "common_first_name" (
"first_name" text PRIMARY KEY, --The text representing the name
"ratio" numeric NOT NULL, -- the % of how many times it occurs compared to the other names.
"inserted_at" timestamp WITH time zone DEFAULT timezone('utc'::text, now()) NOT NULL,
"updated_at" timestamp WITH time zone DEFAULT timezone('utc'::text, now()) NOT NULL
);
CREATE TABLE "common_last_name" (
"last_name" text PRIMARY KEY, --The text representing the name
"ratio" numeric NOT NULL, -- the % of how many times it occurs compared to the other names.
"inserted_at" timestamp WITH time zone DEFAULT timezone('utc'::text, now()) NOT NULL,
"updated_at" timestamp WITH time zone DEFAULT timezone('utc'::text, now()) NOT NULL
);
PS:TOP 1 名稱僅出現約 1.8% 的時間。每個表有 1000 行。
函式(偽,未就緒)
CREATE OR REPLACE FUNCTION create_sample_data(p_number_of_records INT)
RETURNS VOID
AS $$
DECLARE
SUM_OF_WEIGHTS CONSTANT INT := 100;
BEGIN
FOR i IN 1..coalesce(p_number_of_records, 0) LOOP
--Get the random first and last name but taking in consideration their probability (RATIO)round(random()*SUM_OF_WEIGHTS);
--create_person (random_first_name || ' ' || random_last_name);
END LOOP;
END
$$
LANGUAGE plpgsql VOLATILE;
PS:每個名稱(每表)的所有比率總和為 100%。
我想運行一個函式 N 次并獲取一個名字和一個姓氏來創建示例資料......兩個表每個都有 1000 行。
樣本大小可以是 1000 個全名到 1000000 個名字,所以如果有一種“快速”的方法來執行這個隨機加權函式,那就更好了。
關于如何在 PL/PGSQL 中執行此操作的任何建議?
我在 SUPABASE.IO 上使用 PG 13.3。
謝謝
uj5u.com熱心網友回復:
鑒于輸入資料集較小,在純 SQL 中執行此操作很簡單。使用 CTE 為每個 common_FOO_name 表中的每一行構建下限和上限列,然后使用 generate_series() 生成亂數集。將所有內容連接在一起,并使用邊界之間的隨機值作為 WHERE 子句。
with first_names_weighted as (
select first_name,
sum(ratio) over (order by first_name) - ratio as lower_bound,
sum(ratio) over (order by first_name) as upper_bound
from common_first_name
),
last_names_weighted as (
select last_name,
sum(ratio) over (order by last_name) - ratio as lower_bound,
sum(ratio) over (order by last_name) as upper_bound
from common_last_name
),
randoms as (
select random() * (select sum(ratio) from common_first_name) as f_random,
random() * (select sum(ratio) from common_last_name) as l_random
from generate_series(1, 32)
)
select r, first_name, last_name
from randoms r
cross join first_names_weighted f
cross join last_names_weighted l
where f.lower_bound <= r.f_random and r.f_random <= f.upper_bound
and l.lower_bound <= r.l_random and r.l_random <= l.upper_bound;
更改傳遞給的值generate_series()以控制要生成的名稱數量。如果它是一個函式很重要,您可以只使用 LANGAUGE SQL 函式定義來引數化該數字:
https://www.db-fiddle.com/f/mmGQRhCP2W1yfhZTm1yXu5/3
轉載請註明出處,本文鏈接:https://www.uj5u.com/shujuku/349168.html
標籤:PostgreSQL 算法 plpgsql 概率分布 超基地
上一篇:使用Doobie映射多對多關系
