我一直在互聯網上四處尋找,似乎沒有與我的情況相符的答案。
我正在努力計算 SQL Server 中的確切下四分位數和上四分位數。我知道 SQL Server 有一個便于計算四分位數的內置函式,即 NTILE 函式。但這對我的情況來說還不夠。
給出下表的值(請注意,該表包含的產品和價格比下表中的要多):
| 平均價格 | 產品編號 | 年 |
|---|---|---|
| 45.7820 | 2 | 2015年 |
| 46.0142 | 2 | 2016年 |
| 59.0133 | 2 | 2017年 |
| 60.1707 | 2 | 2018年 |
| 62.6600 | 2 | 2019年 |
我正在運行以下查詢:
SELECT
AveragePrice
,NTILE(4) OVER (
PARTITION BY ProductNumber ORDER BY AveragePrice
) AS Quartile
FROM products
這給出了以下結果:
| 平均價格 | 四分位數 |
|---|---|
| 45.7820 | 1 |
| 46.0142 | 1 |
| 59.0133 | 2 |
| 60.1707 | 3 |
| 62.6600 | 4 |
對于完整的背景關系,整個查詢如下所示:
SELECT ProductNumber
,MIN(AveragePrice) Minimum
,MAX(CASE
WHEN Quartile = 1
THEN AveragePrice
END) AS Quartile_1
,
MAX(CASE
WHEN Quartile = 3
THEN AveragePrice
END) AS Quartile_3
,MAX(AveragePrice) Maximum
,COUNT(Quartile) AS 'Number of items'
FROM (
SELECT ProductNumber
,AveragePrice
,NTILE(4) OVER (
PARTITION BY ProductNumber ORDER BY ProductNumber
) AS Quartile
FROM #temp_products
) Vals
GROUP BY ProductNumber
ORDER BY ProductNumber
但是當我手動計算四分位數時,第一個四分位數應該是:45.8981(在這種特殊情況下第一行和第二行的平均值)而不是 46.0142。
The third quartile should be 61.41535 (average of the third and second quartile in this particular case) not 60.1707 .
So to make it clear. This is a part of a stored procedure where multiple pricegroups is calculated and aggregated into groups containing average prices. I need to calculate the upper and lower quartiles from these average prices grouped by product number. The result set should contain the productnumber, lower quartile and upper quartile. Can someone help me or guide me in the correct direction?
uj5u.com熱心網友回復:
NTILE()在某些情況下,正在以一些奇怪的方式進行匯總。我寧愿使用帶有等級的整數除法進行分組。此解決方案適用于任意數量的值,并在需要時考慮平均值。
LEAD 是捕捉下一行值的魔法視窗函式
select *
,[Q] = case when [rank] in ((N 3)/4 ,(N 1)/2, (3*N 1)/4) then
case [decimal]
when 0 then AveragePrice
when 0.25 then /*pondered avg*/(3*AveragePrice LEAD(AveragePrice,1,null)over(PARTITION BY ProductNumber ORDER BY AveragePrice)) / 4
when 0.5 then /*simple avg*/( AveragePrice LEAD(AveragePrice,1,null)over(PARTITION BY ProductNumber ORDER BY AveragePrice)) / 2
when 0.75 then /*pondered avg*/( AveragePrice 3*LEAD(AveragePrice,1,null)over(PARTITION BY ProductNumber ORDER BY AveragePrice)) / 4
end
end
from(
select *
,[rank] = ROW_NUMBER()over(PARTITION BY ProductNumber ORDER BY AveragePrice)
,[N] = SUM(1)over()
,[group4] = ((ROW_NUMBER()over(PARTITION BY ProductNumber ORDER BY AveragePrice)-1 )*4 / SUM(1)over())
,[decimal] = case /*rank*/ROW_NUMBER()over(PARTITION BY ProductNumber ORDER BY AveragePrice)
when /*Q1*/ (SUM(1)over() 3)/4 then (SUM(1)over() 3)/4.0 - FLOOR((SUM(1)over() 3)/4.0)
when /*Q2*/ (SUM(1)over() 1)/2 then (SUM(1)over() 1)/2.0 - FLOOR((SUM(1)over() 1)/2.0)
when /*Q3*/(3*SUM(1)over() 1)/4 then (3*SUM(1)over() 1)/4.0 - FLOOR((3*SUM(1)over() 1)/4.0)
end
from
(values(45.7820,2,2015),(46.0142,2,2016),(59.0133,2,2017),(60.1707,2,2018),(62.6600,2,2019))a(AveragePrice,ProductNumber,Year)
)a
| 平均價格 | 產品編號 | 年 | 秩 | N | 第4組 | 十進制 | 問 |
|---|---|---|---|---|---|---|---|
| 45.7820 | 2 | 2015年 | 1 | 5 | 0 | 空值 | 空值 |
| 46.0142 | 2 | 2016年 | 2 | 5 | 0 | 0.000000 | 46.014200 |
| 59.0133 | 2 | 2017年 | 3 | 5 | 1 | 0.000000 | 59.013300 |
| 60.1707 | 2 | 2018年 | 4 | 5 | 2 | 0.000000 | 60.170700 |
| 62.6600 | 2 | 2019年 | 5 | 5 | 3 | 空值 | 空值 |
uj5u.com熱心網友回復:
好的,受這篇文章的啟發,我設法構建了一個實際計算精確四分位數的查詢:
-- ; since it is being used in a sp
;WITH quartile_data AS (
SELECT Price,
ProductNumber
FROM (values(29.4785,2,2015),(30.0000,2,2016),(33.4762,2,2017),(35.2917,2,2018),(35.8731,2,2018),(36.2475,2,2018),(37.9790,2,2018),(39.5846,2,2018),(67.4443,2,2018))sales(Price,ProductNumber)
)
--Aggregate into a single record for each group, using MAX to select the non-null
--detail value for each column
-- ISNULL check to include groups with three values as well
SELECT ProductNumber,
(Max(Q1NextVal) - MAX(Q1Val)) * Max(Q1Frac) Max(Q1Val) as [Q1],
(Max(MidVal1) Max(MidVal2)) / 2 [Median],
(ISNULL(Max(Q3NextVal),0) - MAX(Q3Val)) * Max(Q3Frac) Max(Q3Val) as [Q3]
-- save the result into a temp table
INTO #my_temp_table
FROM (
--Expose the detail values for only the records at the index values
--generated by the summary subquery. All other values are left as NULL. som NULL.
SELECT detail.ProductNumber,
CASE WHEN RowNum = Q1Idx THEN Price ELSE NULL END Q1Val,
CASE WHEN RowNum = Q1Idx 1 THEN Price ELSE NULL END Q1NextVal,
CASE WHEN RowNum = Q3Idx THEN Price ELSE NULL END Q3Val,
CASE WHEN RowNum = Q3Idx 1 THEN Price ELSE NULL END Q3NextVal,
Q1Frac,
Q3Frac,
CASE WHEN RowNum = MidPt1 THEN Price ELSE NULL END MidVal1,
CASE WHEN RowNum = MidPt2 THEN Price ELSE NULL END MidVal2
FROM
--Calculate a row number sorted by measure for each group.
(SELECT *, ROW_NUMBER() OVER (PARTITION BY ProductNumber ORDER BY Price) RowNum
FROM quartile_data) AS detail
INNER JOIN (
--Summarize to find index numbers and fractions we need to use to locate
--the values at the quartile points.
-- The modulus operator is used to sum the correct number if the number of rows in the group is even or uneven
SELECT ProductNumber,
FLOOR((COUNT(*) IIF((COUNT(*) % 2 = 0), 2,1)) / 4.0) Q1Idx,
((COUNT(*) IIF((COUNT(*) % 2 = 0), 2,1)) / 4.0) - FLOOR((COUNT(*) IIF((COUNT(*) % 2 = 0), 2,1)) / 4.0) Q1Frac,
(COUNT(*) 1) / 2 AS MidPt1,
(COUNT(*) 2) / 2 AS Midpt2,
FLOOR((COUNT(*) * 3 IIF((COUNT(*) % 2 = 0), 2,3)) / 4.0) Q3Idx,
((COUNT(*) * 3 IIF((COUNT(*) % 2 = 0), 2,3)) / 4.0) - FLOOR((COUNT(*) * 3 IIF((COUNT(*) % 2 = 0), 2,3)) / 4.0) Q3Frac
FROM quartile_data
GROUP BY ProductNumber
HAVING COUNT(*) > 1
) AS summary ON detail.ProductNumber = summary.ProductNumber
) AS step_two
GROUP BY ProductNumber
-- Include only groups with more than 2 rows
HAVING count(*) > 2
以下價格: 29.4785 30.0000 33.4762 35.2917 35.8731 36.2475 37.9790 39.5846 67.4443
給出正確的值:Q1 = 31.7381000000 和 Q3 = 38.7818000000
使用此在線工具進行驗證
轉載請註明出處,本文鏈接:https://www.uj5u.com/gongcheng/353318.html
標籤:sql sql-server algorithm tsql quartile
上一篇:SSIS如何在IN子句中使用引數
