如何在雪花中使用正則運算式提取URL的子字串-有解無憂

我需要一些關于 RegEx 的幫助。我有一個網址：

article/life/food/woman-who-called-911-over-pink-bbq-tells-her-side-of-the-story/275-88a8538f-4c62-4204-aed8-4381d618208d

此 URL 不是固定的，并且每次都會更改。我只需要將其提取woman-who-called-911-over-pink-bbq-tells-her-side-of-the-story并存盤在表中。

這是我的嘗試：

select parse_url(c1:uri):path as PATH, 
       count(1) as CNT 
from "OPE_RAW_DATA"."TAGGER_DEFAULT"."INTERACTION_20221107" 
where c1:uri like'%www.kens5.com%'
group by PATH
order by CNT desc

uj5u.com熱心網友回復：

一種方法是使用 JavaScript UDF 將字串拆分為一個陣列并找到最長的一個：

create or replace function PARSE_LONGEST_TEXT_PATH(URL string)
returns string
language javascript
strict immutable
as
$$
    let a = URL.split('/');
    let longest = a.reduce(
        function (a, b) {
            return a.length > b.length ? a : b;
        });
    return longest;
$$;

select parse_longest_text_path
    ('article/life/food/woman-who-called-911-over-pink-bbq-tells-her-side-of-the-story/275-88a8538f-4c62-4204-aed8-4381d618208d');

如果最長的字串是十六進制值，這不會處理這種情況，但是如果您需要，我可以修改代碼。

uj5u.com熱心網友回復：

如果您想獲得第 3 項和第 4 項之間的最長字串，可以使用以下內容：

IFF( length(split(c1,'/')[2]) > length(split(c1,'/')[3]), split(c1,'/')[2], split(c1,'/')[3] )

例如：

select IFF( length(split(c1,'/')[2]) > length(split(c1,'/')[3]), split(c1,'/')[2], split(c1,'/')[3] )
from values 
('article/life/food/woman-who-called-911-over-pink-bbq-tells-her-side-of-the-story/275-88a8538f-4c62-4204-aed8-4381d618208d'),
('article/life/woman-who-called-911-over-pink-bbq-tells-her-side-of-the-story/275-88a8538f-4c62-4204-aed8-4381d618208d')
tmp(c1);

uj5u.com熱心網友回復：

我建議展平 url，以便您可以更好地控制決議主題的標準。

set str='article/life/food/woman-who-called-911-over-pink-bbq-tells-her-side-of-the-story/275-88a8538f-4c62-4204-aed8-4381d618208d';

select distinct 
       url,
       first_value(b.value) over (partition by url order by length(translate(b.value,'0123456789-','')) desc) as topic --pick the one that's longest after removing numbers and hyphens
from (select $str as url) a, 
     lateral split_to_table(url,'/') b
where b.index between 3 and 4 --change as needed

轉載請註明出處，本文鏈接：https://www.uj5u.com/qiye/529267.html

標籤：sql雪花云数据平台

上一篇：SQLRedshift-從表中符合特定條件的相似行中選擇行

下一篇：OracleSQL-將列的資料默認值更改為NULL-NULL和(null)之間有什么區別嗎？