Get the number of ids that share the same combination of distinct values in the "Location" column

I have a table of ids and the locations they have visited.

id	Location
1	Maryland
1	Iowa
2	Maryland
2	Texas
3	Georgia
3	Iowa
4	Maryland
4	Iowa
5	Maryland
5	Iowa
5	Texas

I want to run a query that gives me the number of ids for each combination of locations.

For this sample table, the output would be:

Maryland, Iowa - 2
Maryland, Texas - 1
Georgia, Iowa - 1
Maryland, Iowa, Texas - 1

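My initial idea was to add up the ASCII values of each id's distinct locations, then see how many ids share each total and which combination that total corresponds to. I couldn't get this to work because SQL Server won't let me convert nvarchar to a numeric data type. Is there another way to get the number of devices (ids) for each combination using SQL? Using Python to get the count per combination is also acceptable, but SQL is preferred.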
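A sum of character codes would not be a safe key anyway, because different combinations can add up to the same total. If the data can be pulled into Python, a minimal stdlib-only sketch is to collect each id's distinct locations into a set and use the sorted tuple of that set as an exact, order-independent key, then count the keys with collections.Counter. The rows list is just the sample table typed in by hand, and the function name count_combinations is hypothetical.

```python
from collections import Counter, defaultdict


def count_combinations(rows):
    """Count how many ids share each distinct set of locations.

    rows: iterable of (id, location) pairs.
    Returns a Counter mapping a sorted tuple of locations to the number of ids.
    """
    # Collect the distinct locations visited by each id.
    visited = defaultdict(set)
    for id_, location in rows:
        visited[id_].add(location)
    # A sorted tuple is hashable, readable and independent of visit order.
    return Counter(tuple(sorted(locs)) for locs in visited.values())


# Sample table from the question, typed in by hand.
rows = [
    (1, "Maryland"), (1, "Iowa"),
    (2, "Maryland"), (2, "Texas"),
    (3, "Georgia"), (3, "Iowa"),
    (4, "Maryland"), (4, "Iowa"),
    (5, "Maryland"), (5, "Iowa"), (5, "Texas"),
]

for combo, n in count_combinations(rows).most_common():
    print(", ".join(combo), "-", n)
```

Unlike a numeric total, two ids only end up with the same key when they visited exactly the same set of locations.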

Answer:

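If you want to solve this in SQL and you are running SQL Server 2017 or later, you can use a CTE that aggregates the locations of each id into a single string with STRING_AGG, and then count how often each aggregated string occurs: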

WITH all_locations AS (
  SELECT STRING_AGG(Location, ', ') WITHIN GROUP (ORDER BY Location) AS aloc
  FROM locations
  GROUP BY id
)
SELECT aloc, COUNT(*) AS cnt
FROM all_locations
GROUP BY aloc
ORDER BY cnt, aloc

Output:

aloc                    cnt
Georgia, Iowa           1
Iowa, Maryland, Texas   1
Maryland, Texas         1
Iowa, Maryland          2

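Note that I've applied an ordering to STRING_AGG so that someone who visited Maryland and Iowa is treated the same as someone who visited Iowa and then Maryland. If that is not the desired behaviour, just remove the WITHIN GROUP clause.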

Demo on dbfiddle
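The effect of the WITHIN GROUP ordering can be checked outside the database. Without it, SQL Server does not guarantee any particular concatenation order, so the same set of locations can produce different strings. The Python sketch below mimics both cases on a hypothetical sample in which id 1 visited Maryland then Iowa and id 2 visited Iowa then Maryland; the "unsorted" branch simply concatenates rows in the order they are encountered, which is an assumption standing in for the undefined order.

```python
from collections import Counter, defaultdict


def aggregate(rows, sort_within_group):
    """Concatenate each id's locations, optionally sorting them first,
    then count how many ids produced each string."""
    per_id = defaultdict(list)
    for id_, location in rows:
        per_id[id_].append(location)
    if sort_within_group:
        # Analogue of STRING_AGG(...) WITHIN GROUP (ORDER BY Location)
        keys = (", ".join(sorted(locs)) for locs in per_id.values())
    else:
        # Rows concatenated in the order they were encountered
        keys = (", ".join(locs) for locs in per_id.values())
    return Counter(keys)


# Hypothetical sample: the same two states, visited in opposite order.
rows = [(1, "Maryland"), (1, "Iowa"), (2, "Iowa"), (2, "Maryland")]

print(aggregate(rows, sort_within_group=False))
# Counter({'Maryland, Iowa': 1, 'Iowa, Maryland': 1})
print(aggregate(rows, sort_within_group=True))
# Counter({'Iowa, Maryland': 2})
```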

Answer:

Use groupby, agg and value_counts:

new_df = df.groupby('id')['Location'].agg(list).str.join(', ').value_counts().reset_index()

Output:

>>> new_df
                   index  Location
0         Maryland, Iowa         2
1        Maryland, Texas         1
2          Georgia, Iowa         1
3  Maryland, Iowa, Texas         1
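The column names in that output depend on the pandas version: older releases label the reset index "index" and the counts "Location", while pandas 2.x names them "Location" and "count". A minimal sketch that names the columns explicitly, assuming the data is in a DataFrame df with id and Location columns (rebuilt here from the question's sample; the names combination and n_ids are arbitrary choices):

```python
import pandas as pd

# Sample table from the question.
df = pd.DataFrame({
    "id": [1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 5],
    "Location": ["Maryland", "Iowa", "Maryland", "Texas", "Georgia", "Iowa",
                 "Maryland", "Iowa", "Maryland", "Iowa", "Texas"],
})

new_df = (
    df.groupby("id")["Location"]
    .agg(list)                   # one list of locations per id
    .str.join(", ")              # ["Maryland", "Iowa"] -> "Maryland, Iowa"
    .value_counts()              # number of ids per combination string
    .rename_axis("combination")  # name the index explicitly
    .reset_index(name="n_ids")   # same column names on any pandas version
)
print(new_df)
```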

Answer:

Let's do it with groupby, join and value_counts:

df.groupby('id')['Location'].agg(', '.join).value_counts()
Out[938]: 
join                 
Maryland, Iowa           2
Georgia, Iowa            1
Maryland, Iowa, Texas    1
Maryland, Texas          1
dtype: int64

Answer:

Aggregate with a frozenset to make sure each group is unique regardless of visit order:

df.groupby('id')['Location'].agg(frozenset).value_counts()

Output:

(Maryland, Iowa)           2
(Texas, Maryland)          1
(Georgia, Iowa)            1
(Texas, Maryland, Iowa)    1
Name: Location, dtype: int64

Or sort the locations before joining them into a string:

df.groupby('id')['Location'].agg(lambda x: ', '.join(sorted(x))).value_counts()

Output:

Iowa, Maryland           2
Maryland, Texas          1
Georgia, Iowa            1
Iowa, Maryland, Texas    1
Name: Location, dtype: int64
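If the same id can appear more than once with the same location (a repeat visit), the string-joining versions treat "Iowa, Iowa, Maryland" as a different combination from "Iowa, Maryland". Since the question is about combinations of distinct values, one option is to drop duplicate (id, Location) rows first. A sketch assuming the same id/Location DataFrame, with a hypothetical repeat visit added for id 4:

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 1, 2, 2, 3, 3, 4, 4, 4, 5, 5, 5],
    "Location": ["Maryland", "Iowa", "Maryland", "Texas", "Georgia", "Iowa",
                 "Maryland", "Iowa", "Iowa",  # id 4 visits Iowa twice (hypothetical)
                 "Maryland", "Iowa", "Texas"],
})

combos = (
    df.drop_duplicates(["id", "Location"])  # keep one row per (id, location)
    .groupby("id")["Location"]
    .agg(lambda x: ", ".join(sorted(x)))    # order-independent string key
    .value_counts()
)
print(combos)
```

The frozenset version already ignores repeats, because a set keeps each location only once.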
