我需要根據優先級對來自多個來源的大量資料進行分組,但這些來源的資料質量不同——它們可能會丟失一些資料。任務是以盡可能完整的方式將該資料分組到一個單獨的表中。
例如:
create table grouped_data (
id serial primary key,
type text,
a text,
b text,
c int
);
create table raw_data (
id serial primary key,
type text,
a text,
b text,
c int,
priority int
);
insert into raw_data
(type, a, b, c, priority)
values
('one', null, '', 123, 1),
('one', 'foo', '', 456, 2),
('one', 'bar', 'baz', 789, 3),
('two', null, 'two-b', 11, 3),
('two', '', '', 33, 2),
('two', null, 'two-bbb', 22, 1);
現在我需要將記錄分組type,排序priority,取第一個非空和非空值,并將其放入grouped_data. 在這種情況下,afor group 的值one將是foo因為包含該值的行比具有 的行具有更高的優先級bar。并且c應該是123,因為它具有最高的優先級。與 group 相同two,對于每一列,我們采用非空、非空且具有最高優先級的null資料,如果沒有實際資料存在,則回退到。
最后,grouped_data預計會有以下內容:
('one', 'foo', 'baz', 123),
('two', null, 'two-bbb', 22)
我嘗試過分組、子選擇、合并、交叉連接……唉,我對 PostgreSQL 的了解還不夠好,無法讓它作業。我也想避免的一件事 - 是一個一個地瀏覽列,因為在現實世界中只有幾十個列可以使用......
一個指向我一直用來弄亂這個小提琴的鏈接:http : //sqlfiddle.com/#!17/76699/1
uj5u.com熱心網友回復:
你應該試試這個:
SELECT
type,
(array_agg(a ORDER BY priority ASC) FILTER (WHERE a IS NOT NULL AND a != ''))[1] as a,
(array_agg(b ORDER BY priority ASC) FILTER (WHERE b IS NOT NULL AND b != ''))[1] as b,
(array_agg(c ORDER BY priority ASC) FILTER (WHERE c IS NOT NULL))[1] as c
FROM raw_data GROUP BY type ORDER BY type;
uj5u.com熱心網友回復:
您可以使用視窗函式first_value:
select distinct
type
, first_value(a) over (partition by type order by nullif(a,'') is null, priority) as a
, first_value(b) over (partition by type order by nullif(b,'') is null, priority) as b
, first_value(c) over (partition by type order by priority) as c
from raw_data
uj5u.com熱心網友回復:
select distinct on (type) type,
first_value(a) over (partition by type order by (nullif(a, '') is null), priority) a,
first_value(b) over (partition by type order by (nullif(b, '') is null), priority) b,
first_value(c) over (partition by type order by (c is null), priority) c
from raw_data;
uj5u.com熱心網友回復:
這也應該有效。
WITH types(type) AS (
SELECT DISTINCT
type
FROM raw_data
)
SELECT
type,
(SELECT a FROM raw_data WHERE a > '' AND raw_data.type = types.type ORDER BY priority LIMIT 1) AS a,
(SELECT b FROM raw_data WHERE b > '' AND raw_data.type = types.type ORDER BY priority LIMIT 1) AS b,
(SELECT c FROM raw_data WHERE c IS NOT NULL AND raw_data.type = types.type ORDER BY priority LIMIT 1) AS c
FROM types
ORDER BY type;
轉載請註明出處,本文鏈接:https://www.uj5u.com/shujuku/391469.html
標籤:sql PostgreSQL的 分组
下一篇:參考的表中不存在鍵,但似乎是的
