In my database I have data like the following (just a snippet):
itemId userId action likeorDislike timestamp
i1 u1 rate 0 2021-06-09 10:43:57.827 UTC
i1 u1 rate 1 2021-06-10 10:43:57.827 UTC
i1 u2 rate 1 2021-06-09 11:43:57.827 UTC
i1 u3 rate 1 2021-06-09 12:43:57.827 UTC
i2 u6 rate 1 2021-06-09 10:43:57.827 UTC
i2 u6 rate 0 2021-08-09 10:43:57.827 UTC
i2 u1 rate 0 2021-06-12 10:43:57.827 UTC
i4 u1 rate 1 2021-06-09 10:45:57.827 UTC
i4 u1 rate 1 2021-06-09 10:48:57.827 UTC
i4 u3 rate 1 2021-06-09 10:45:58.827 UTC
i1 u5 select 2021-06-09 10:45:58.827 UTC
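I want to compute a score for each itemId: the number of 1's divided by the total number of likeorDislike values for that item, counting only each user's first attempt at rating that item. Expected result: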
itemId score
i1 0.66
i2 0.5
i4 1
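What I did: I kept only the data I am interested in (sorted by timestamp, cleaned, etc.), but I don't know how to compute the score described above from it: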
itemId likeOrDislike
i1 0
i1 1
i2 1
i1 1
i2 0
i4 1
i4 1
I also wrote a SQL query that computes these scores, but it does not take the timestamp into account.
%%bigquery counting_answers_new
SELECT trueAnswers.itemId, IFNULL(falseAnswers.answerCount, 0) as falseAnswerCount, trueAnswers.answerCount as trueAnswerCount, IFNULL((trueAnswers.answerCount/(falseAnswers.answerCount + trueAnswers.answerCount)), 1) as score
FROM
( SELECT itemId, likeorDislike, action, COUNT(*) as answerCount
FROM `mydata`
GROUP BY itemId, likeorDislike, action
HAVING action='rate' AND likeorDislike=false
ORDER BY itemId) as falseAnswers
RIGHT JOIN
( SELECT itemId, likeorDislike, action, COUNT(*) as answerCount
FROM `mydata`
GROUP BY itemId, likeorDislike, action
HAVING action='rate' AND likeorDislike=true
ORDER BY itemId) as trueAnswers
ON falseAnswers.itemId = trueAnswers.itemId
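I want to do this in pandas on the two-column DataFrame above, because I have already cleaned and filtered it. I know how to count the occurrences of 0 and 1, but how do I do that per item, and how do I compute these scores?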
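Answer from a uj5u.com user:
It seems you just want the mean of likeOrDislike per itemId, since the mean of a 0/1 column is the number of 1's divided by the number of rows. In pandas you can do it like this: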
df[["itemId", "likeOrDislike"]].groupby("itemId").mean()
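If the DataFrame has not been reduced to first attempts yet, the filtering step can be done in pandas as well. The following is only a sketch under assumptions: the rows are copied from the snippet in the question, the variable names raw, first_attempts and scores are made up for illustration, and the rating column is spelled likeOrDislike as in the answer above.

```python
import pandas as pd

# Sample rows copied from the question; the "select" row has no rating.
raw = pd.DataFrame(
    [
        ("i1", "u1", "rate", 0, "2021-06-09 10:43:57.827"),
        ("i1", "u1", "rate", 1, "2021-06-10 10:43:57.827"),
        ("i1", "u2", "rate", 1, "2021-06-09 11:43:57.827"),
        ("i1", "u3", "rate", 1, "2021-06-09 12:43:57.827"),
        ("i2", "u6", "rate", 1, "2021-06-09 10:43:57.827"),
        ("i2", "u6", "rate", 0, "2021-08-09 10:43:57.827"),
        ("i2", "u1", "rate", 0, "2021-06-12 10:43:57.827"),
        ("i4", "u1", "rate", 1, "2021-06-09 10:45:57.827"),
        ("i4", "u1", "rate", 1, "2021-06-09 10:48:57.827"),
        ("i4", "u3", "rate", 1, "2021-06-09 10:45:58.827"),
        ("i1", "u5", "select", None, "2021-06-09 10:45:58.827"),
    ],
    columns=["itemId", "userId", "action", "likeOrDislike", "timestamp"],
)
raw["timestamp"] = pd.to_datetime(raw["timestamp"])

# Keep only ratings, then keep the earliest rating per (itemId, userId) pair.
first_attempts = (
    raw[raw["action"] == "rate"]
    .sort_values("timestamp")
    .drop_duplicates(subset=["itemId", "userId"], keep="first")
)

# The mean of a 0/1 column equals (number of 1's) / (number of rows).
scores = first_attempts.groupby("itemId")["likeOrDislike"].mean()
print(scores)
```

On these sample rows the scores come out as 2/3 for i1, 0.5 for i2 and 1.0 for i4, which agrees with the expected table in the question.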
