I have a TSQL (MSSQL) table that contains records in the following format:
| ID | Column1 | Column2 |
|---|---|---|
| 1 | a/b/c | apple/banana/cucumber |
I want to split the record into the following format:
| ID | Column1 | Column2 |
|---|---|---|
| 1 | a | apple |
| 1 | b | banana |
| 1 | c | cucumber |
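Column1 and Column2 both use "/" as the delimiter, and their items correspond to each other in the same order.
I tried to split the columns with the help of CHARINDEX and SUBSTRING, but I could not maintain the relationship between the two columns.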
Answer:
You can add a function that splits a string.
Then use CROSS APPLY to join the split parts of Column1 and Column2 on their ordinal position.
create table test
(
    Id int identity primary key,
    Column1 varchar(30),
    Column2 varchar(30)
);
insert into test (Column1, Column2) values
('a/b/c', 'apple/banana/cucumber'),
('d/e/f', 'orange/prune/onion');
(UDF copied from here)
CREATE FUNCTION dbo.fnString_Split
(
    @str   nvarchar(4000),
    @delim nchar(1)
)
RETURNS TABLE WITH SCHEMABINDING
AS
RETURN
(
    WITH RCTE AS
    (
        SELECT 1 AS ordinal
             , ISNULL(NULLIF(CHARINDEX(@delim, @str), 0), LEN(@str)) AS pos
             , LEFT(@str, ISNULL(NULLIF(CHARINDEX(@delim, @str), 0) - 1, LEN(@str))) AS value
        UNION ALL
        SELECT ordinal + 1
             , ISNULL(NULLIF(CHARINDEX(@delim, @str, pos + 1), 0), LEN(@str))
             , SUBSTRING(@str, pos + 1, ISNULL(NULLIF(CHARINDEX(@delim, @str, pos + 1), 0) - pos - 1, LEN(@str) - pos))
        FROM RCTE
        WHERE pos < LEN(@str)
    )
    SELECT ordinal, value FROM RCTE
);
select t.Id
     , ca.Column1
     , ca.Column2
from test t
cross apply (
    select s1.ordinal
         , s1.value as Column1
         , s2.value as Column2
    from dbo.fnString_Split(t.Column1, '/') as s1
    join dbo.fnString_Split(t.Column2, '/') as s2
      on s1.ordinal = s2.ordinal
) ca;
| ID | Column1 | Column2 |
|---|---|---|
| 1 | a | apple |
| 1 | b | banana |
| 1 | c | cucumber |
| 2 | d | orange |
| 2 | e | prune |
| 2 | f | onion |
Demo on db<>fiddle here
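To illustrate how dbo.fnString_Split produces the ordinal that the join relies on, here is the same recursive CTE run standalone against one hard-coded string (the variable values are illustrative, not part of the answer). Each recursive step searches again from just after the previous delimiter and adds 1 to the ordinal:

```sql
DECLARE @str   nvarchar(4000) = N'apple/banana/cucumber',
        @delim nchar(1)       = N'/';

WITH RCTE AS
(
    -- Anchor: the first item ends at the first delimiter (or at the end of the string)
    SELECT 1 AS ordinal
         , ISNULL(NULLIF(CHARINDEX(@delim, @str), 0), LEN(@str)) AS pos
         , LEFT(@str, ISNULL(NULLIF(CHARINDEX(@delim, @str), 0) - 1, LEN(@str))) AS value
    UNION ALL
    -- Recursive step: search again from pos + 1 and bump the ordinal
    SELECT ordinal + 1
         , ISNULL(NULLIF(CHARINDEX(@delim, @str, pos + 1), 0), LEN(@str))
         , SUBSTRING(@str, pos + 1, ISNULL(NULLIF(CHARINDEX(@delim, @str, pos + 1), 0) - pos - 1, LEN(@str) - pos))
    FROM RCTE
    WHERE pos < LEN(@str)   -- stop once the end of the string has been reached
)
SELECT ordinal, pos, value
FROM RCTE
OPTION (MAXRECURSION 1000);  -- raises the default limit of 100 recursion levels
```

Because the split is recursive, very long lists (more than roughly 100 items) run into SQL Server's default recursion limit of 100; in a standalone query like this one, the limit can be raised with OPTION (MAXRECURSION ...).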
Answer:
Please try the following solution.
It is based on JSON and works starting with SQL Server 2016.
SQL
-- DDL and sample data population, start
DECLARE @tbl TABLE (ID INT, ColB varchar(8000), ColC varchar(8000));
INSERT INTO @tbl VALUES
(1,'a/b/c','apple/banana/cucumber');
-- DDL and sample data population, end
DECLARE @separator CHAR(1) = '/';
WITH rs AS
(
SELECT *
, ar1 = '["' + REPLACE(ColB, @separator, '","') + '"]'
, ar2 = '["' + REPLACE(ColC, @separator, '","') + '"]'
FROM @tbl
)
SELECT ID, ColB.[value] AS [ColB], ColC.[value] AS ColC
FROM rs
CROSS APPLY OPENJSON (ar1, N'$') AS ColB
CROSS APPLY OPENJSON (ar2, N'$') AS ColC
WHERE ColB.[key] = ColC.[key];
Output
+----+------+----------+
| ID | ColB |   ColC   |
+----+------+----------+
|  1 | a    | apple    |
|  1 | b    | banana   |
|  1 | c    | cucumber |
+----+------+----------+
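The WHERE ColB.[key] = ColC.[key] filter is what keeps the pairs aligned: for a JSON array, OPENJSON returns one row per element, with the element's zero-based position (as text) in [key]. A minimal sketch for a single illustrative string shows the JSON that the REPLACE builds and what OPENJSON returns for it. Like the answer, it assumes the items contain no double quotes or backslashes, since either would make the hand-built JSON invalid or change its meaning.

```sql
DECLARE @separator char(1) = '/';
DECLARE @s varchar(100) = 'apple/banana/cucumber';

-- Turn a/b/c into ["a","b","c"] by wrapping the string and replacing each separator
DECLARE @json nvarchar(max) = N'["' + REPLACE(@s, @separator, '","') + N'"]';

SELECT @json AS built_json;

-- One row per array element: [key] = zero-based index, [value] = element text, [type] 1 = string
SELECT [key], [value], [type]
FROM OPENJSON(@json);
```

The first query returns ["apple","banana","cucumber"]; the second returns keys 0, 1 and 2 for apple, banana and cucumber, which is why two arrays built from the same positions can be matched on [key].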
Answer:
- First, create the function below to split the string.
- Then run the execution code that follows the function code.
-- Function Code
CREATE FUNCTION [dbo].[udf_SplitList]
(
@InputString varchar(MAX)
, @Separator varchar(1)
)
RETURNS @ValuesList TABLE ( ID int IDENTITY(1,1), Value varchar(MAX))
AS
BEGIN
DECLARE @ListValue NVARCHAR(max)
SET @InputString = @InputString + @Separator
WHILE (LEN(@InputString) > 0)
BEGIN
SELECT @ListValue = SUBSTRING(@InputString , 1, CHARINDEX(@Separator, @InputString) - 1)
INSERT INTO @ValuesList
SELECT LTRIM(@ListValue)
SELECT @InputString = SUBSTRING(@InputString, CHARINDEX(@Separator, @InputString) + 1, LEN(@InputString) - CHARINDEX(@Separator, @InputString))
END
RETURN
END
-- Execution Code
DECLARE @YourTable TABLE (ID int, CodeList varchar(MAX), ValueList varchar(MAX));
INSERT INTO @YourTable VALUES ( 1, 'a/b/c', 'apple/banana/cucumber');
SELECT X.*
FROM @YourTable Y
CROSS APPLY
(
SELECT
Code = C.Value
, Value = V.Value
FROM dbo.udf_SplitList(Y.CodeList , '/') C
JOIN dbo.udf_SplitList(Y.ValueList, '/') V ON V.ID = C.ID
) X
;
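Like the other answers, this one pairs the two lists with an inner join on the item position, which assumes both columns always contain the same number of items. A small sketch with two hypothetical, already-split lists (standing in for the function's output) shows what happens when they do not, and how a FULL OUTER JOIN would surface the mismatch instead of hiding it:

```sql
-- Hypothetical pre-split lists: positions are explicit, and the value list is one item short
DECLARE @codes TABLE (ID int PRIMARY KEY, Value varchar(10));
DECLARE @vals  TABLE (ID int PRIMARY KEY, Value varchar(20));
INSERT INTO @codes VALUES (1, 'a'), (2, 'b'), (3, 'c');
INSERT INTO @vals  VALUES (1, 'apple'), (2, 'banana');

-- Inner join on position (as in the answer): code 'c' silently disappears
SELECT Code = C.Value, Value = V.Value
FROM @codes AS C
JOIN @vals AS V ON V.ID = C.ID;

-- FULL OUTER JOIN keeps every position and returns NULL for the missing side
SELECT Position = COALESCE(C.ID, V.ID), Code = C.Value, Value = V.Value
FROM @codes AS C
FULL OUTER JOIN @vals AS V ON V.ID = C.ID
ORDER BY Position;
```

The inner join returns two rows; the full join returns three, with NULL in Value for position 3.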
Answer:
Instead of using a function, you can do the same thing with a tally/numbers table, as shown below.
See the working demo here
; with nums as
(
select 1 as num
union all
select num + 1 as num
from
nums
where num <80
)
select
X.id,
substring(X.column1,X.b,X.e-X.b),
substring(Y.column2,Y.b,Y.e-Y.b)
from
(
select
t.*,
e=N.Num,
r=row_number() over(partition by Id order by N.num),
b=isnull(lag(N.num) over (partition by Id order by N.num),0) + 1
from t
left join nums N
on charindex('/',column1 + '/',N.num)=N.num
)X
left join
(
select
t.*,
e= N.Num,
r=row_number() over(partition by Id order by N.num),
b=isnull(lag(N.num) over (partition by Id order by N.num),0) + 1
from t
left join nums N
on charindex('/', column2 + '/',N.num)=N.num
)Y
on X.id=Y.id and X.r=Y.r
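The key step in this answer is the join condition charindex('/', column1 + '/', N.num) = N.num: it keeps exactly the numbers whose position in the string (with a '/' appended) holds a delimiter. Each kept number marks the end of one item, LAG gives that item's start, and ROW_NUMBER gives its position for pairing. A standalone sketch for one illustrative string, using a small non-recursive numbers CTE (an assumption made here, limited to 100 positions):

```sql
DECLARE @s varchar(100) = 'apple/banana/cucumber';

WITH nums AS
(
    -- 1..100 without recursion (10 x 10 cross join); covers strings up to 99 characters
    SELECT n = ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
    FROM (VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS a(x),
         (VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS b(x)
),
delims AS
(
    -- Keep only positions that hold a '/' in @s + '/'; the appended '/' closes the last item
    SELECT e = n,
           b = ISNULL(LAG(n) OVER (ORDER BY n), 0) + 1,   -- item starts after the previous delimiter
           r = ROW_NUMBER() OVER (ORDER BY n)             -- item position, used as the pairing key
    FROM nums
    WHERE n <= LEN(@s) + 1
      AND CHARINDEX('/', @s + '/', n) = n
)
SELECT r, b, e, item = SUBSTRING(@s, b, e - b)
FROM delims
ORDER BY r;
```

It returns apple, banana and cucumber with r = 1, 2 and 3; running the same query over the second column and joining on r reproduces the pairing in the answer.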
Answer:
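I know there are already several answers, one of which has been accepted, but there are some important performance factors to consider. If the number of values is always three (or always low, say fewer than 5 or 6), then the Cascading APPLY technique will be by far the fastest. This solution assumes there are always 3 items. It can easily be modified to handle a variable but small number of items.
Cascading APPLY solution: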
DECLARE @table TABLE
(
SomeId INT IDENTITY,
S1 VARCHAR(1000),
s2 VARCHAR(1000)
);
INSERT @table VALUES ('a/b/c','apple/banana/cucumber'),
('d/d2/f','dog/donkey/fish'),('x/y/z','x-ray/yo-yo/zeta');
SELECT
SomeId = f.SomeId,
Col1 = f2.C1,
Col2 = f2.C2
FROM
(
SELECT
t.SomeId,
SUBSTRING(t.S1, 1, c1.P-1),
SUBSTRING(t.S1, c1.P+1, c2.P - c1.P-1),
SUBSTRING(t.S1, c2.P+1, 8000),
SUBSTRING(t.S2, 1, c1.P2 - 1),
SUBSTRING(t.S2, c1.P2+1, c2.P2 - c1.P2-1),
SUBSTRING(t.S2, c2.P2+1, 8000)
FROM @table AS t
CROSS APPLY (VALUES(CHARINDEX('/',t.S1),CHARINDEX('/',t.S2))) AS c1(P,P2)
CROSS APPLY (VALUES(CHARINDEX('/',t.S1,c1.P+1),CHARINDEX('/',t.S2,c1.P2+1))) AS c2(P,P2)
CROSS APPLY (VALUES(CHARINDEX('/',t.S1,c2.P+1),CHARINDEX('/',t.S2,c2.P2+1))) AS c3(P,P2)
) AS f(SomeId,c1_1,c1_2,c1_3,c2_1,c2_2,c2_3)
CROSS APPLY (VALUES (c1_1, c2_1), (c1_2, c2_2),(c1_3,c2_3)) AS f2(c1,c2);
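Because the Cascading APPLY query assumes exactly three items per column, it helps to know what breaks otherwise: with fewer than three items one of the SUBSTRING lengths becomes negative, which raises an error, and with more than three the last item silently keeps the remaining delimiters. A hedged sketch of a pre-check that counts items by comparing DATALENGTH before and after removing the delimiter (the sample rows are illustrative; the 'd/e' row is deliberately short):

```sql
DECLARE @table TABLE
(
    SomeId INT IDENTITY,
    S1 VARCHAR(1000),
    S2 VARCHAR(1000)
);
INSERT @table (S1, S2) VALUES ('a/b/c', 'apple/banana/cucumber'),
    ('d/e', 'dog/elk/fox'),              -- only 2 items in S1
    ('x/y/z', 'x-ray/yo-yo/zeta');

-- Items = number of '/' characters + 1; for VARCHAR each removed '/' is one byte
SELECT t.SomeId, t.S1, t.S2, n.Items1, n.Items2
FROM @table AS t
CROSS APPLY (VALUES (DATALENGTH(t.S1) - DATALENGTH(REPLACE(t.S1, '/', '')) + 1,
                     DATALENGTH(t.S2) - DATALENGTH(REPLACE(t.S2, '/', '')) + 1)) AS n(Items1, Items2)
WHERE n.Items1 <> 3 OR n.Items2 <> 3;   -- rows the three-item query cannot handle
```

DATALENGTH is used rather than LEN because LEN ignores trailing spaces, which could miscount a delimiter followed by spaces. Here only SomeId 2 is returned, with Items1 = 2 and Items2 = 3.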
If you are using SQL Azure, you can use STRING_SPLIT with the ordinal option.
DECLARE @table TABLE
(
SomeId INT IDENTITY,
S1 VARCHAR(1000),
s2 VARCHAR(1000)
);
INSERT @table VALUES ('a/b/c','apple/banana/cucumber'),
('d/d2/f','dog/donkey/fish'),('x/y/z','x-ray/yo-yo/zeta');
SELECT TOP(1) WITH TIES
t.SomeId, t.S1, t.S2, Col1 = split1.[value], Col2 = split2.[value]
FROM @table AS t
CROSS APPLY STRING_SPLIT(t.S1 ,'/') AS split1
CROSS APPLY STRING_SPLIT(t.S2 ,'/') AS split2
ORDER BY ABS(
ROW_NUMBER() OVER (PARTITION BY t.SomeId, split1.value ORDER BY split1.value)-
ROW_NUMBER() OVER (PARTITION BY t.SomeId, split2.value ORDER BY split1.value));
^^^ This only works for items in alphabetical order (not practical). However, on Azure you can change the ORDER BY to:
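-- requires STRING_SPLIT(t.S1, '/', 1) and STRING_SPLIT(t.S2, '/', 1) so that the ordinal column is returned
ORDER BY ABS(
ROW_NUMBER() OVER (PARTITION BY t.SomeId, split1.value ORDER BY split1.ordinal) -
ROW_NUMBER() OVER (PARTITION BY t.SomeId, split2.value ORDER BY split1.ordinal));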
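Where STRING_SPLIT accepts its third argument, enable_ordinal (Azure SQL Database, and SQL Server 2022 and later; STRING_SPLIT also needs database compatibility level 130 or higher), the ordinal column can be used directly as the join key, which avoids the ROW_NUMBER ordering trick altogether. A minimal sketch on the same sample data:

```sql
DECLARE @table TABLE
(
    SomeId INT IDENTITY,
    S1 VARCHAR(1000),
    S2 VARCHAR(1000)
);
INSERT @table (S1, S2) VALUES ('a/b/c','apple/banana/cucumber'),
    ('d/d2/f','dog/donkey/fish'),('x/y/z','x-ray/yo-yo/zeta');

-- The third argument (1) makes STRING_SPLIT return an extra 1-based ordinal column
SELECT t.SomeId, Col1 = split1.[value], Col2 = split2.[value]
FROM @table AS t
CROSS APPLY STRING_SPLIT(t.S1, '/', 1) AS split1
CROSS APPLY STRING_SPLIT(t.S2, '/', 1) AS split2
WHERE split1.ordinal = split2.ordinal   -- pair items by position
ORDER BY t.SomeId, split1.ordinal;
```

Pairing on ordinal does not depend on the items being in alphabetical order, so 'd2'/'donkey' lines up the same way as 'a'/'apple'.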
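Comparing all the posted techniques
Now let's compare all the solutions posted so far to see the significant performance differences. I built a basic test harness and ran it first with 100K rows, then with one million rows. The solutions are:
- Andy3B's dbo.udf_SplitList
- LukStorms's dbo.fnString_Split
- DhruvJoshi's recursive CTE numbers/tally table solution
- An improved version of DhruvJoshi's solution with a faster tally table
- Yitzhak Khabinsky's JSON solution
- My Cascading APPLY solution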
The dbo.udf_SplitList solution is the slowest, at 246 seconds for a million rows. It is a multi-statement table-valued function, which is slow to begin with, and its row-by-row WHILE loop makes things much worse here.
The dbo.fnString_Split solution does much better at 85 seconds. To do better, we need to lose the user-defined functions.
DhruvJoshi's recursive CTE solution gets us down to 65 seconds, roughly 1.3 times faster. I improved his solution by rewriting the numbers table to not use recursion; this makes it roughly another 1.4 times faster, down to 45 seconds.
Yitzhak Khabinsky's is the first set-based solution; he is leveraging JSON. This is roughly 2.7 times faster again, down to about 16.7 seconds. The Cascading APPLY solution improves on Yitzhak's solution by roughly another 6 times, down to under three seconds.
Beware of scalar UDFs, recursion for counting, and loops. Set-based always rules.
IF OBJECT_ID('tempdb..#table') IS NOT NULL DROP TABLE #table;
GO
SELECT TOP(100000)
SomeId = ROW_NUMBER() OVER (ORDER BY (SELECT NULL)),
S1 = 'A/B/C',
S2 = REPLACE(LEFT(NEWID(),18),'-','/')
INTO #table
FROM sys.all_columns, sys.all_columns a
GO
PRINT CHAR(10) + 'dbo.udf_SplitList -Andy3B' + CHAR(10) + REPLICATE('-',90);
GO
DECLARE @st DATETIME = GETDATE(), @ID INT, @C1 VARCHAR(1000), @C2 VARCHAR(1000);
SELECT @ID = t.SomeId, @C1 = x.Code, @C2 = x.Value
FROM #table AS t
CROSS APPLY
(
SELECT Code = C.Value,
Value = V.Value
FROM dbo.udf_SplitList(t.S1, '/') AS C
JOIN dbo.udf_SplitList(t.S2, '/') AS V ON V.ID = C.ID
) AS X;
PRINT DATEDIFF(MS,@st,GETDATE())
GO 3
PRINT CHAR(10) + 'dbo.fnString_Split - LukStorms' + CHAR(10) + REPLICATE('-',90);
GO
DECLARE @st DATETIME = GETDATE(), @ID INT, @C1 VARCHAR(1000), @C2 VARCHAR(1000);
select @ID = t.SomeId, @C1 = ca.Column1, @C2 = ca.Column2
from #table AS t
cross apply (
select
s1.ordinal
, s1.value as Column1
, s2.value as Column2
from dbo.fnString_Split(t.S1,'/') as s1
join dbo.fnString_Split(t.S2,'/') as s2
on s1.ordinal = s2.ordinal) AS ca;
PRINT DATEDIFF(MS,@st,GETDATE())
GO 3
PRINT CHAR(10) + 'Recursive CTE - TALLY TABLE DhruvJoshi' + CHAR(10) + REPLICATE('-',90);
GO
DECLARE @st DATETIME = GETDATE(), @ID INT, @C1 VARCHAR(1000), @C2 VARCHAR(1000);
;with nums as
(
select 1 as num union all
select num + 1 as num
from nums
where num <80
)
select
@ID = X.SomeId,
@C1 = substring(X.S1,X.b,X.e-X.b),
@C2 = substring(Y.S2,Y.b,Y.e-Y.b)
from
(
select
t.*,
e=N.Num,
r=row_number() over (partition by t.SomeId order by N.num),
b=isnull(lag(N.num) over (partition by t.SomeId order by N.num),0) + 1
from #table AS t
left join nums AS N
on charindex('/',t.S1 + '/',N.num)=N.num
) AS X
left join
(
select
t.*,
e= N.Num,
r=row_number() over (partition by t.SomeId order by N.num),
b=isnull(lag(N.num) over (partition by t.SomeId order by N.num),0) + 1
from #table AS t
left join nums AS N
on charindex('/', t.S2 + '/',N.num)=N.num
) AS Y
ON X.SomeId = Y.SomeId and X.r=Y.r;
PRINT DATEDIFF(MS,@st,GETDATE())
GO 3
PRINT CHAR(10) + 'TALLY TABLE without Recursion DhruvJoshi' + CHAR(10) + REPLICATE('-',90);
GO
DECLARE @st DATETIME = GETDATE(), @ID INT, @C1 VARCHAR(1000), @C2 VARCHAR(1000);
;with nums as
(
SELECT num = ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM (VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS e1(x),
(VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS e2(x),
(VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS e3(x),
(VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS e4(x)
)
select
@ID = X.SomeId,
@C1 = substring(X.S1,X.b,X.e-X.b),
@C2 = substring(Y.S2,Y.b,Y.e-Y.b)
from
(
select
t.*,
e=N.Num,
r=row_number() over (partition by t.SomeId order by N.num),
b=isnull(lag(N.num) over (partition by t.SomeId order by N.num),0) + 1
from #table AS t
left join nums AS N
on charindex('/',t.S1 + '/',N.num)=N.num
WHERE n.num <= LEN(t.S1) + 1 -- + 1 keeps the appended trailing '/', which closes the last item
) AS X
left join
(
select
t.*,
e= N.Num,
r=row_number() over (partition by t.SomeId order by N.num),
b=isnull(lag(N.num) over (partition by t.SomeId order by N.num),0) + 1
from #table AS t
left join nums AS N
on charindex('/', t.S2 + '/',N.num)=N.num
WHERE n.num <= LEN(t.S2) + 1 -- + 1 keeps the appended trailing '/', which closes the last item
) AS Y
ON X.SomeId = Y.SomeId and X.r=Y.r;
PRINT DATEDIFF(MS,@st,GETDATE())
GO 3
PRINT CHAR(10) + 'JSON - Yitzhak Khabinsky' + CHAR(10) + REPLICATE('-',90);
GO
DECLARE @st DATETIME = GETDATE(), @ID INT, @C1 VARCHAR(1000), @C2 VARCHAR(1000);
DECLARE @separator CHAR(1) = '/';
WITH rs AS
(
SELECT *
, ar1 = '["' + REPLACE(t.S1, @separator, '","') + '"]'
, ar2 = '["' + REPLACE(t.S2, @separator, '","') + '"]'
FROM #table AS t
)
SELECT @id = SomeID, @C1 = ColB.[value], @C2 = ColC.[value]
FROM rs AS rs
CROSS APPLY OPENJSON (ar1, N'$') AS ColB
CROSS APPLY OPENJSON (ar2, N'$') AS ColC
WHERE ColB.[key] = ColC.[key];
PRINT DATEDIFF(MS,@st,GETDATE())
GO 3
PRINT CHAR(10) + 'CROSS APPLY TECHNIQUE' + CHAR(10) + REPLICATE('-',90);
GO
DECLARE @st DATETIME = GETDATE(), @ID INT, @C1 VARCHAR(1000), @C2 VARCHAR(1000);
SELECT
@ID = f.SomeId,
@C1 = f2.C1,
@C2 = f2.C2
FROM
(
SELECT
t.SomeId,
SUBSTRING(t.S1, 1, c1.P-1),
SUBSTRING(t.S1, c1.P+1, c2.P - c1.P-1),
SUBSTRING(t.S1, c2.P+1, 8000),
SUBSTRING(t.S2, 1, c1.P2 - 1),
SUBSTRING(t.S2, c1.P2+1, c2.P2 - c1.P2-1),
SUBSTRING(t.S2, c2.P2+1, 8000)
FROM #table AS t
CROSS APPLY (VALUES(CHARINDEX('/',t.S1),CHARINDEX('/',t.S2))) AS c1(P,P2)
CROSS APPLY (VALUES(CHARINDEX('/',t.S1,c1.P+1),CHARINDEX('/',t.S2,c1.P2+1))) AS c2(P,P2)
CROSS APPLY (VALUES(CHARINDEX('/',t.S1,c2.P+1),CHARINDEX('/',t.S2,c2.P2+1))) AS c3(P,P2)
) AS f(SomeId,c1_1,c1_2,c1_3,c2_1,c2_2,c2_3)
CROSS APPLY (VALUES (c1_1, c2_1), (c1_2, c2_2),(c1_3,c2_3)) AS f2(c1,c2);
PRINT DATEDIFF(MS,@st,GETDATE())
GO 3
100K-row test results:
dbo.udf_SplitList -Andy3B
------------------------------------------------------------------------------------------
Beginning execution loop
24593
24566
24530
Batch execution completed 3 times.
dbo.fnString_Split - LukStorms
------------------------------------------------------------------------------------------
Beginning execution loop
8147
8260
8257
Batch execution completed 3 times.
Recursive CTE - TALLY TABLE DhruvJoshi
------------------------------------------------------------------------------------------
Beginning execution loop
6867
6733
6850
Batch execution completed 3 times.
TALLY TABLE without Recursion DhruvJoshi
------------------------------------------------------------------------------------------
Beginning execution loop
4677
4630
4620
Batch execution completed 3 times.
JSON - Yitzhak Khabinsky
------------------------------------------------------------------------------------------
Beginning execution loop
1667
1653
1670
Batch execution completed 3 times.
CROSS APPLY TECHNIQUE
------------------------------------------------------------------------------------------
Beginning execution loop
283
280
284
Batch execution completed 3 times.
1-million-row test results:
dbo.udf_SplitList -Andy3B
------------------------------------------------------------------------------------------
Beginning execution loop
246057
245296
247017
Batch execution completed 3 times.
dbo.fnString_Split - LukStorms
------------------------------------------------------------------------------------------
Beginning execution loop
85340
83010
83674
Batch execution completed 3 times.
Recursive CTE - TALLY TABLE -DhruvJoshi
------------------------------------------------------------------------------------------
Beginning execution loop
67226
64910
64740
Batch execution completed 3 times.
TALLY TABLE without Recursion DhruvJoshi
------------------------------------------------------------------------------------------
Beginning execution loop
46777
44630
44623
Batch execution completed 3 times.
JSON - Yitzhak Khabinsky
------------------------------------------------------------------------------------------
Beginning execution loop
16710
16830
16520
Batch execution completed 3 times.
CROSS APPLY TECHNIQUE
------------------------------------------------------------------------------------------
Beginning execution loop
2846
2793
2850
Batch execution completed 3 times.
Completion time: 2022-01-18T22:08:50.3264912-06:00
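To put the 1-million-row timings above in proportion, the averages and speed-up factors can be computed directly from the reported numbers. A small sketch (the technique labels are shortened for the example; the milliseconds are copied from the results):

```sql
-- 1-million-row timings (ms) copied from the results above: three runs per technique
WITH runs AS
(
    SELECT Technique, Ms
    FROM (VALUES
        ('dbo.udf_SplitList', 246057), ('dbo.udf_SplitList', 245296), ('dbo.udf_SplitList', 247017),
        ('dbo.fnString_Split', 85340), ('dbo.fnString_Split', 83010), ('dbo.fnString_Split', 83674),
        ('Recursive CTE tally', 67226), ('Recursive CTE tally', 64910), ('Recursive CTE tally', 64740),
        ('Tally w/o recursion', 46777), ('Tally w/o recursion', 44630), ('Tally w/o recursion', 44623),
        ('JSON', 16710), ('JSON', 16830), ('JSON', 16520),
        ('Cascading APPLY', 2846), ('Cascading APPLY', 2793), ('Cascading APPLY', 2850)
    ) AS v(Technique, Ms)
),
avgs AS
(
    SELECT Technique, AvgMs = AVG(CAST(Ms AS decimal(12, 3)))
    FROM runs
    GROUP BY Technique
)
SELECT Technique,
       AvgSeconds         = CAST(AvgMs / 1000 AS decimal(8, 1)),
       -- how many times faster than the next-slower technique in the list
       SpeedupOverPrev    = CAST(LAG(AvgMs) OVER (ORDER BY AvgMs DESC) / AvgMs AS decimal(8, 1)),
       -- how many times faster than the slowest technique
       SpeedupOverSlowest = CAST(MAX(AvgMs) OVER () / AvgMs AS decimal(8, 1))
FROM avgs
ORDER BY AvgMs DESC;
```

Each step down the list is roughly 2.9, 1.3, 1.4, 2.7 and 5.9 times faster than the one before it, and the Cascading APPLY average is about 87 times faster than dbo.udf_SplitList.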
