My dataset has the following columns:
Voted?        Political Category
Yes           Right
No            Left
Not Answered  Center
Yes           Right
Yes           Right
No            Right
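I need to compute a chi-square test to see which category is most strongly associated with the people who voted. Both columns contain strings. How can I give each value a numeric representation so that I can apply chi-square?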
Answer from a uj5u.com user:
You can use pd.factorize to encode the categorical variables:
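import pandas as pd

# df is the DataFrame from the question; pd.factorize returns (codes, uniques) and [0] keeps the integer codes
df['nVoted?'] = pd.factorize(df['Voted?'])[0]
df['nCategory'] = pd.factorize(df['Political Category'])[0]
print(df)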
# Output
         Voted? Political Category  nVoted?  nCategory
0           Yes              Right        0          0
1            No               Left        1          1
2  Not Answered             Center        2          2
3           Yes              Right        0          0
4           Yes              Right        0          0
5            No              Right        1          0
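After that, you can use scipy.stats.chisquare. Note that chisquare is a one-way goodness-of-fit test on observed frequencies, so it should be given counts rather than the integer codes themselves. To test whether the two columns are associated, scipy.stats.chi2_contingency applied to a contingency table of counts is the usual choice.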
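As a minimal sketch of a chi-square test of independence on these two columns, assuming the six sample rows from the question (the variable names observed, chi2, p_value, dof and expected are only illustrative): pd.crosstab builds the table of counts directly from the string columns, so the factorized codes are not strictly needed for this step. With so few rows the expected counts are far too small for the p-value to be trusted, so this only shows the mechanics.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Sample data copied from the question
df = pd.DataFrame({
    "Voted?": ["Yes", "No", "Not Answered", "Yes", "Yes", "No"],
    "Political Category": ["Right", "Left", "Center", "Right", "Right", "Right"],
})

# Contingency table of observed counts (rows: Voted?, columns: Political Category)
observed = pd.crosstab(df["Voted?"], df["Political Category"])
print(observed)

# Chi-square test of independence between the two columns
chi2, p_value, dof, expected = chi2_contingency(observed)
print(f"chi2={chi2:.3f}, p={p_value:.3f}, dof={dof}")
```

To see which category contributes most to the statistic, one option is to compare observed with expected cell by cell.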