Counting occurrences based on another column in a DataFrame

In my database I have data like the following (just a snippet):

itemId   userId     action       likeorDislike     timestamp
i1       u1         rate         0                 2021-06-09 10:43:57.827 UTC
i1       u1         rate         1                 2021-06-10 10:43:57.827 UTC
i1       u2         rate         1                 2021-06-09 11:43:57.827 UTC
i1       u3         rate         1                 2021-06-09 12:43:57.827 UTC
i2       u6         rate         1                 2021-06-09 10:43:57.827 UTC
i2       u6         rate         0                 2021-08-09 10:43:57.827 UTC
i2       u1         rate         0                 2021-06-12 10:43:57.827 UTC
i4       u1         rate         1                 2021-06-09 10:45:57.827 UTC
i4       u1         rate         1                 2021-06-09 10:48:57.827 UTC
i4       u3         rate         1                 2021-06-09 10:45:58.827 UTC
i1       u5         select                         2021-06-09 10:45:58.827 UTC

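I want to compute a score for each itemId: the number of 1s divided by the total number of likeorDislike ratings for that item, counting only each user's first rating of that item.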

itemId     score
i1         0.66                 
i2         0.5
i4         1
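
One possible way to express this definition in pandas, starting from the raw rows, is sketched below. It assumes the data has been loaded into a DataFrame with the column names shown in the snippet; the variable names `raw`, `first_ratings` and `scores` are illustrative and not part of the original question, and the timestamps are entered without the trailing "UTC" and parsed with `utc=True`.

```python
import pandas as pd

# Rows copied from the snippet in the question; `raw` is a hypothetical name.
# The 'select' row has no rating, so it is entered as NaN.
raw = pd.DataFrame(
    [
        ("i1", "u1", "rate", 0, "2021-06-09 10:43:57.827"),
        ("i1", "u1", "rate", 1, "2021-06-10 10:43:57.827"),
        ("i1", "u2", "rate", 1, "2021-06-09 11:43:57.827"),
        ("i1", "u3", "rate", 1, "2021-06-09 12:43:57.827"),
        ("i2", "u6", "rate", 1, "2021-06-09 10:43:57.827"),
        ("i2", "u6", "rate", 0, "2021-08-09 10:43:57.827"),
        ("i2", "u1", "rate", 0, "2021-06-12 10:43:57.827"),
        ("i4", "u1", "rate", 1, "2021-06-09 10:45:57.827"),
        ("i4", "u1", "rate", 1, "2021-06-09 10:48:57.827"),
        ("i4", "u3", "rate", 1, "2021-06-09 10:45:58.827"),
        ("i1", "u5", "select", float("nan"), "2021-06-09 10:45:58.827"),
    ],
    columns=["itemId", "userId", "action", "likeorDislike", "timestamp"],
)
raw["timestamp"] = pd.to_datetime(raw["timestamp"], utc=True)

# Keep only ratings, order them chronologically, then keep the earliest
# row for every (itemId, userId) pair, i.e. each user's first attempt.
first_ratings = (
    raw[raw["action"] == "rate"]
    .sort_values("timestamp")
    .drop_duplicates(subset=["itemId", "userId"], keep="first")
)

# With 0/1 values, the mean per item is (number of 1s) / (number of ratings).
scores = first_ratings.groupby("itemId")["likeorDislike"].mean()
print(scores)
```

On the sample rows this should give 2/3 for i1, 0.5 for i2 and 1.0 for i4, in line with the table above (which shows 2/3 as 0.66).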

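What I have done so far: I kept only the data I am interested in (sorted by timestamp, cleaned, etc.), but I don't know how to compute the score above from it: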

itemId     likeOrDislike     
i1         0      
i1         1 
i2         1
i1         1
i2         0
i4         1
i4         1

I also wrote a SQL query that computes these scores, but it does not take the timestamp into account.

%%bigquery counting_answers_new
SELECT trueAnswers.itemId, IFNULL(falseAnswers.answerCount, 0) as falseAnswerCount, trueAnswers.answerCount as trueAnswerCount, IFNULL((trueAnswers.answerCount/(falseAnswers.answerCount + trueAnswers.answerCount)), 1) as score
FROM 
 ( SELECT itemId, likeorDislike, action, COUNT(*) as answerCount
 FROM `mydata` 
 GROUP BY itemId, likeorDislike, action
 HAVING action='rate' AND likeorDislike=false
 ORDER BY itemId) as falseAnswers
 RIGHT JOIN
 ( SELECT itemId, likeorDislike, action, COUNT(*) as answerCount
 FROM `mydata` 
 GROUP BY itemId, likeorDislike, action
 HAVING action='rate' AND likeorDislike=true
 ORDER BY itemId) as trueAnswers
 ON falseAnswers.itemId = trueAnswers.itemId

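I would like to do this in pandas on the two-column DataFrame above, since I have already cleaned and filtered it. I know how to count the occurrences of 0 and 1, but how do I do that per item, and how do I compute these scores?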

Answer from a uj5u.com user:

It seems you just want the mean of likeOrDislike for each itemId. In pandas you can do it like this:

df[["itemId", "likeOrDislike"]].groupby("itemId").mean()
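
As a quick check, here is a small sketch that applies this suggestion to the cleaned two-column data from the question. Because likeOrDislike only holds 0 and 1, the per-item mean equals the number of 1s divided by the number of ratings. The DataFrame is typed in by hand, and the `counts` table and the `reset_index(name="score")` step are additions for illustration, not part of the original answer.

```python
import pandas as pd

# The cleaned two-column data from the question (first ratings only).
df = pd.DataFrame(
    {
        "itemId": ["i1", "i1", "i2", "i1", "i2", "i4", "i4"],
        "likeOrDislike": [0, 1, 1, 1, 0, 1, 1],
    }
)

# Mean per item, returned as a regular two-column frame.
scores = df.groupby("itemId")["likeOrDislike"].mean().reset_index(name="score")

# The same score spelled out as explicit counts, in case the number of
# 1s and the number of ratings are needed separately.
counts = df.groupby("itemId")["likeOrDislike"].agg(ones="sum", total="count")
counts["score"] = counts["ones"] / counts["total"]

print(scores)
print(counts)
```

Both `scores` and `counts["score"]` should hold 2/3 for i1, 0.5 for i2 and 1.0 for i4.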

Tags: Python, SQL, pandas DataFrame, Google BigQuery