How to count occurrences in a pandas DataFrame by condition

I have a DataFrame that looks like this:

 product  stars
  10717  4
  10717  5
  10717  5
  10717  5
  10717  3
  10717  2
  10717  2
  10711  2
  10711  1
  10711  5
  10711  1
  10711  1
  10711  5
  10711  2

I have thousands of rows.
For each distinct product, I want to count how many times each star rating (from 1 to 5) occurs.
How can I do this?

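I tried getting the list of distinct products with: dp = df['product'].unique() (bracket access, because df.product refers to the DataFrame.product method rather than the column).
Then I iterate over it: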

for key in dp:
    # Count the (product, stars) rows belonging to this product
    print(df[df['product'] == key].value_counts())

The result looks like this:

product  stars
10717        5         3
             4         1
             3         1
             2         2
dtype: int64

product  stars
10711        5        2
             2        2
             1        3
dtype: int64

What I need is a new DataFrame that looks like this:

product     stars    number_stars
10717         5        3
10717         4        1
10717         3        1
10717         2        2 
10717         1        0
10711         5        2
10711         4        0
10711         3        0
10711         2        2
10711         1        3

Reply from a uj5u.com user:

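Group by product and stars, then use .size to count the stars for each product. Lines 2 and 3 only reshape the result to look like what you showed in the question (the unstack/stack round trip is what adds the zero-count rows); you may not actually need them.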

df.groupby(["product", "stars"]).size() \
  .unstack(fill_value=0).stack() \
  .to_frame("number_stars").reset_index()

   product  stars  number_stars
0    10711      1             3
1    10711      2             2
2    10711      3             0
3    10711      4             0
4    10711      5             2
5    10717      1             0
6    10717      2             2
7    10717      3             1
8    10717      4             1
9    10717      5             3
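
One caveat with the unstack/stack round trip above: it only creates rows for star values that occur for at least one product, so a rating that no product received would still be missing. Below is a minimal sketch of one way to force the full 1 to 5 scale, assuming the scale is known in advance. The sample data is made up for illustration and is not taken from the question.

```python
import pandas as pd

# Hypothetical sample in which no product ever received 3 or 4 stars
df = pd.DataFrame({
    "product": [10717, 10717, 10711, 10711, 10711],
    "stars":   [5, 2, 1, 1, 5],
})

# Assumed rating scale: 1 through 5
all_stars = pd.Index(range(1, 6), name="stars")

counts = (
    df.groupby(["product", "stars"]).size()
      .unstack(fill_value=0)                    # one column per star value that occurs
      .reindex(columns=all_stars, fill_value=0) # add columns for ratings that never occur
      .stack()
      .to_frame("number_stars")
      .reset_index()
)
print(counts)
```

Without the reindex step, this sample would yield only three rows per product (for 1, 2 and 5 stars) instead of five.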

Reply from a uj5u.com user:

This should work:

import pandas as pd

df = pd.DataFrame([
  [10717,  4],
  [10717,  5],
  [10717,  5],
  [10717,  5],
  [10717,  3],
  [10717,  2],
  [10717,  2],
  [10711,  2],
  [10711,  1],
  [10711,  5],
  [10711,  1],
  [10711,  1],
  [10711,  5],
  [10711,  2]
], columns=['product', 'stars'])

# Count rows per (product, stars) pair; pairs that never occur are omitted
newdf = df.groupby(['product', 'stars']).size()
newdf = newdf.reset_index()
newdf = newdf.rename(columns={0: 'number_stars'})

Result:

>>> newdf
   product  stars  number_stars
0    10711      1             3
1    10711      2             2
2    10711      5             2
3    10717      2             2
4    10717      3             1
5    10717      4             1
6    10717      5             3
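
Note that the result above leaves out combinations with a count of zero, which the question asked for. As a sketch of one way to keep this groupby approach and still get those rows, the stars column can be declared as a categorical covering the full 1 to 5 scale (an assumption about the valid ratings) and grouped with observed=False:

```python
import pandas as pd

df = pd.DataFrame(
    [[10717, 4], [10717, 5], [10717, 5], [10717, 5], [10717, 3], [10717, 2], [10717, 2],
     [10711, 2], [10711, 1], [10711, 5], [10711, 1], [10711, 1], [10711, 5], [10711, 2]],
    columns=["product", "stars"],
)

# Declare the full rating scale (assumed to be 1..5) so unobserved ratings are known groups
rating = pd.Categorical(df["stars"], categories=[1, 2, 3, 4, 5])

newdf = (
    df.assign(stars=rating)
      .groupby(["product", "stars"], observed=False)  # keep categories with no rows
      .size()
      .reset_index(name="number_stars")
)

# Convert the categorical column back to plain integers for downstream use
newdf["stars"] = newdf["stars"].astype(int)
print(newdf)
```

Passing name= to reset_index also replaces the separate rename step used above.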

Reply from a uj5u.com user:

A possible solution, based on pandas.DataFrame.value_counts, pandas.Series.reindex (value_counts returns a Series) and pandas.MultiIndex.from_product. The reindex and MultiIndex parts are only needed to get the zero counts:

stars = list(range(1,6))
cols = ['product', 'stars']

(df.value_counts(cols)
 .reindex(pd.MultiIndex.from_product([df['product'].unique(),stars], names=cols),
          fill_value=0)
 .rename('number_stars').reset_index())

Output:

   product  stars  number_stars
0    10717      1             0
1    10717      2             2
2    10717      3             1
3    10717      4             1
4    10717      5             3
5    10711      1             3
6    10711      2             2
7    10711      3             0
8    10711      4             0
9    10711      5             2

Please credit the source when reposting. Link to this article: https://www.uj5u.com/ruanti/519494.html

Tags: python-3.x, pandas, dataframe
