我需要一些關于 RegEx 的幫助。我有一個網址:
article/life/food/woman-who-called-911-over-pink-bbq-tells-her-side-of-the-story/275-88a8538f-4c62-4204-aed8-4381d618208d
此 URL 不是固定的,并且每次都會更改。我只需要將其提取woman-who-called-911-over-pink-bbq-tells-her-side-of-the-story并存盤在表中。
這是我的嘗試:
select parse_url(c1:uri):path as PATH,
count(1) as CNT
from "OPE_RAW_DATA"."TAGGER_DEFAULT"."INTERACTION_20221107"
where c1:uri like'%www.kens5.com%'
group by PATH
order by CNT desc
uj5u.com熱心網友回復:
一種方法是使用 JavaScript UDF 將字串拆分為一個陣列并找到最長的一個:
create or replace function PARSE_LONGEST_TEXT_PATH(URL string)
returns string
language javascript
strict immutable
as
$$
let a = URL.split('/');
let longest = a.reduce(
function (a, b) {
return a.length > b.length ? a : b;
});
return longest;
$$;
select parse_longest_text_path
('article/life/food/woman-who-called-911-over-pink-bbq-tells-her-side-of-the-story/275-88a8538f-4c62-4204-aed8-4381d618208d');
如果最長的字串是十六進制值,這不會處理這種情況,但是如果您需要,我可以修改代碼。
uj5u.com熱心網友回復:
如果您想獲得第 3 項和第 4 項之間的最長字串,可以使用以下內容:
IFF( length(split(c1,'/')[2]) > length(split(c1,'/')[3]), split(c1,'/')[2], split(c1,'/')[3] )
例如:
select IFF( length(split(c1,'/')[2]) > length(split(c1,'/')[3]), split(c1,'/')[2], split(c1,'/')[3] )
from values
('article/life/food/woman-who-called-911-over-pink-bbq-tells-her-side-of-the-story/275-88a8538f-4c62-4204-aed8-4381d618208d'),
('article/life/woman-who-called-911-over-pink-bbq-tells-her-side-of-the-story/275-88a8538f-4c62-4204-aed8-4381d618208d')
tmp(c1);
uj5u.com熱心網友回復:
我建議展平 url,以便您可以更好地控制決議主題的標準。
set str='article/life/food/woman-who-called-911-over-pink-bbq-tells-her-side-of-the-story/275-88a8538f-4c62-4204-aed8-4381d618208d';
select distinct
url,
first_value(b.value) over (partition by url order by length(translate(b.value,'0123456789-','')) desc) as topic --pick the one that's longest after removing numbers and hyphens
from (select $str as url) a,
lateral split_to_table(url,'/') b
where b.index between 3 and 4 --change as needed
轉載請註明出處,本文鏈接:https://www.uj5u.com/qiye/529267.html
標籤:sql雪花云数据平台
