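I am trying to produce a correlation heatmap, but I noticed that something is wrong.
Below is my heatmap. As you can see, no numbers appear for the action column.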

This is my DataFrame:
all_gen_cols = steamUniqueTitleGenre[['action', 'adventure','casual', 'indie','massively_multiplayer','rpg','racing','simulation','sports','strategy']]
   action  adventure  casual  indie  massively_multiplayer  rpg  racing  simulation  sports  strategy
0       1          0       0      0                      0    0       0           0       0         0
1       1          1       0      0                      1    0       0           0       0         0
2       1          1       0      0                      0    0       0           0       0         1
3       1          1       0      0                      1    0       0           0       0         0
4       1          0       0      0                      1    1       0           0       0         1
This is the code that generates the heatmap:
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sb

def plot_correlation_heatmap(df):
    corr = df.corr()
    sb.set(style='white')
    # Mask the upper triangle (including the diagonal); np.bool was removed from NumPy, so use bool
    mask = np.zeros_like(corr, dtype=bool)
    mask[np.triu_indices_from(mask)] = True
    f, ax = plt.subplots(figsize=(11, 9))
    cmap = sb.diverging_palette(220, 10, as_cmap=True)
    sb.heatmap(corr, mask=mask, cmap=cmap, vmax=0.3, center=0,
               square=True, linewidths=.5, cbar_kws={"shrink": .5}, annot=True)
    plt.yticks(rotation=0)
    plt.show()
    plt.rcdefaults()

plot_correlation_heatmap(all_gen_cols)
I am not sure what the error is.
print(all_gen_cols.corr())
The correlation result is below. I see NaN for action, but I am not sure why it is NaN.
action adventure casual indie massively_multiplayer rpg racing simulation sports strategy
action NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
adventure NaN 1.000000 0.007138 0.135392 0.023964 0.239136 -0.039846 0.036345 -0.064489 0.001435
casual NaN 0.007138 1.000000 0.235474 0.003487 -0.057726 0.079943 0.161448 0.149549 0.084417
indie NaN 0.135392 0.235474 1.000000 -0.082661 0.023372 0.045006 0.064723 0.056297 0.076749
massively_multiplayer NaN 0.023964 0.003487 -0.082661 1.000000 0.160078 0.036685 0.139929 0.018444 0.074683
rpg NaN 0.239136 -0.057726 0.023372 0.160078 1.000000 -0.046970 0.044506 -0.051714 0.097123
racing NaN -0.039846 0.079943 0.045006 0.036685 -0.046970 1.000000 0.127511 0.308864 -0.012170
simulation NaN 0.036345 0.161448 0.064723 0.139929 0.044506 0.127511 1.000000 0.212622 0.208754
sports NaN -0.064489 0.149549 0.056297 0.018444 -0.051714 0.308864 0.212622 1.000000 0.020048
strategy NaN 0.001435 0.084417 0.076749 0.074683 0.097123 -0.012170 0.208754 0.020048 1.000000
Below is the output of print(all_gen_cols.describe()):
action adventure casual indie massively_multiplayer rpg racing simulation sports strategy
count 14570.0 14570.000000 14570.000000 14570.000000 14570.000000 14570.000000 14570.000000 14570.000000 14570.000000 14570.000000
mean 1.0 0.362663 0.232189 0.657241 0.050927 0.165202 0.040288 0.121826 0.044269 0.127111
std 0.0 0.480785 0.422244 0.474648 0.219855 0.371376 0.196641 0.327096 0.205699 0.333108
min 1.0 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
25% 1.0 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
50% 1.0 0.000000 0.000000 1.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
75% 1.0 1.000000 0.000000 1.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
max 1.0 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000
Data
Here is the DataFrame for download.
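Since action = [1, 1, ..., 1], var(action) = 0; the describe() output confirms this, with std 0.0 and min and max both 1.0 for that column. The Pearson correlation rho(X, Y) is cov(X, Y) divided by std(X) * std(Y), so the denominator of rho(action, Y), where Y is any other column, is zero, and rho(action, Y) is undefined (NaN).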
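The zero-variance argument can be checked on a small frame. The following is a minimal sketch, assuming a hypothetical five-row toy DataFrame (not the asker's data) in which 'action' is constant and the other two columns vary:

```python
import pandas as pd

# Hypothetical toy data: 'action' is constant, the other columns vary.
df = pd.DataFrame({
    "action":    [1, 1, 1, 1, 1],
    "adventure": [0, 1, 1, 1, 0],
    "rpg":       [0, 0, 0, 0, 1],
})

# The variance of a constant column is zero.
action_var = df["action"].var()

# Pearson correlation divides by the product of the two standard
# deviations, so every pair involving 'action' has a zero denominator
# and pandas reports NaN for it.
corr = df.corr()

print("var(action) =", action_var)
print(corr)
```

In the printed matrix the whole 'action' row and column are NaN, including the diagonal entry, while the adventure/rpg entries are ordinary numbers. This matches the pattern in the question's output.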
As other users have suggested, you should drop the 'action' column before computing the correlation matrix, since it adds no information.
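One way to apply this without hard-coding the column name is to filter out every column that holds a single distinct value. This is only a sketch: the helper name drop_constant_columns and the toy data are hypothetical, introduced here for illustration.

```python
import pandas as pd

def drop_constant_columns(df):
    """Return df restricted to columns with more than one distinct value."""
    # nunique() counts distinct non-null values per column; a column with
    # a single distinct value has zero variance and cannot be correlated.
    varying = df.columns[df.nunique() > 1]
    return df[varying]

# Hypothetical toy data: 'action' is constant, the other columns vary.
genres = pd.DataFrame({
    "action":    [1, 1, 1, 1, 1],
    "adventure": [0, 1, 1, 1, 0],
    "rpg":       [0, 0, 0, 0, 1],
})

cleaned = drop_constant_columns(genres)
corr = cleaned.corr()
print(corr)
```

For the frame in the question, the equivalent direct call would be all_gen_cols.drop(columns=['action']), and the result can be passed to plot_correlation_heatmap as before.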
