How to avoid getting NaN when adding the entries of two different CSV columns in Python

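I am adding two different CSV columns to make a 2D histogram plot. I have two different kinds of data, dense and collision.

In addition, each dataset contains the information for my case study, including a type column in which type=0 means big and type=1 means small. The CSV looks like this (from collision):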

TIMESTEP id     type    a      |f|     |v|  
20000   4737     0     9.81  1.31495  4.18007   
40000  11991     1     9.81  4.43794  4.17909   
50000  15725     1     9.81  4.43794  4.17810     
30000   8209     0     9.81  4.43794  4.17810     
15000   3545     0     9.81  1.31495  4.17810   
30000   8269     0     9.81  4.43794  4.17810    
10000   2077     1     9.81  1.31495  4.17712   
20000   5079     0     9.81  1.31495  4.17712

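All the data are of type float with positive entries.

When I plot a and |f| separately for the two types, type=0 (big) and type=1 (small), I have no problem, and the plots make sense. However, the plot of a and |f| for type=0 + type=1 (i.e. the sum of each entry of each part), small + big, looks strange.

I realized that about 90% of small + big is NaN, although the original data do not contain any NaN.

How can I avoid NaN when making the small + big histogram plot?

I noticed this in the Collision: Small + Big plot. I expected it to look like the Dense: Small + Big plot.

My code is here: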

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import LogNorm

df_collision_big = df_collision[df_collision['type'] == 0]
df_collision_small = df_collision[df_collision['type'] == 1]

df_dense_big =   df_dense[df_dense['type'] == 0]
df_dense_small = df_dense[df_dense['type'] == 1]


plt.subplots(figsize=(14, 6))
#make space between subplots
plt.subplots_adjust(wspace=0.5, hspace=0.6)
plt.subplot(231)
plt.hist2d(df_collision_small['a'], df_collision_small['|f|'], bins=np.linspace(0,70,15), norm=LogNorm())
plt.colorbar()
plt.xlabel('a')
plt.ylabel('|f|')
plt.title('Collision: Small')
plt.subplot(232)
plt.hist2d(df_collision_big['a'], df_collision_big['|f|'], bins=np.linspace(0,70,15), norm=LogNorm())
plt.colorbar()
plt.xlabel('a')
plt.ylabel('|f|')
plt.title('Collision: Big')
plt.subplot(233)
plt.hist2d(df_collision_big['a'] + df_collision_small['a'], df_collision_big['|f|'] + df_collision_small['|f|'], bins=np.linspace(0,70,15), norm=LogNorm())
plt.colorbar()
plt.xlabel('a')
plt.ylabel('|f|')
plt.title('Collision: Small + Big')

plt.subplot(234)
plt.hist2d(df_dense_small['a'], df_dense_small['|f|'], bins=np.linspace(0,70,15), norm=LogNorm())
plt.colorbar()
plt.xlabel('a')
plt.ylabel('|f|')
plt.title('Dense: Small')
plt.subplot(235)
plt.hist2d(df_dense_big['a'], df_dense_big['|f|'], bins=np.linspace(0,70,15), norm=LogNorm())
plt.colorbar()
plt.xlabel('a')
plt.ylabel('|f|')
plt.title('Dense: Big')
plt.subplot(236)
plt.hist2d(df_dense_big['a'] + df_dense_small['a'], df_dense_big['|f|'] + df_dense_small['|f|'], bins=np.linspace(0,70,15), norm=LogNorm())
plt.colorbar()
plt.xlabel('a')
plt.ylabel('|f|')
plt.title('Dense: Big + Small')
plt.savefig('hist2d.png', dpi=300)
plt.show()

Printing df_collision['a'] gives me this:

175761    9.810009
409899    9.810058
429591    9.810058
358086    9.810009
89079     9.810009
            ...   
243866    9.810058
125778    9.810009
185374    9.810009
496586    9.810058
234942    9.810058
Name: a, Length: 27832, dtype: float64

Most of the values in a are similar.

Printing df_collision_big['a'] + df_collision_small['a'] gives me this:

0         19.620067
1         19.620067
2         19.620067
3         19.620067
4         19.620067
            ...    
504208          NaN
504209          NaN
504210          NaN
504211          NaN
504212          NaN
Name: a, Length: 18639, dtype: float64

One more thing:

Printing the len of small and big gives me this:

print(len(df_collision_small['a']))
print(len(df_collision_big['a']))

# Output
13772
14060

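I would appreciate any suggestions for solving this problem.

Thanks.

Reply from a uj5u.com user:

You can do the sum while handling the NaN problem like this: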

pd.merge(df_collision_small["a"], df_dense_big["a"], how="outer", left_index=True, right_index=True).sum(axis=1)

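Explanation

An outer join on the index produces a DataFrame in which the "a" columns of the two DataFrames are placed side by side (matched by index/position):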

df_collision_small = pd.DataFrame({"a" : [1,2]})
>>    a
>> 0  1
>> 1  2

df_dense_big = pd.DataFrame({"a" : [10,20,30,40]})
>>     a
>> 0  10
>> 1  20
>> 2  30
>> 3  40

pd.merge(df_collision_small, df_dense_big, how="outer", left_index=True, right_index=True)
>>    a_x  a_y
>> 0  1.0   10
>> 1  2.0   20
>> 2  NaN   30
>> 3  NaN   40
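
For comparison, here is a minimal sketch of what the plain + from the question does with the same toy frames. pandas aligns the two Series on their index labels before adding, and any label that exists on only one side becomes NaN. The second half uses a made-up four-row frame (not the real data) to illustrate that boolean filtering keeps the original row labels, so the big and small subsets can end up with labels that do not match.

```python
import pandas as pd

# Same toy frames as in the answer above.
df_collision_small = pd.DataFrame({"a": [1, 2]})
df_dense_big = pd.DataFrame({"a": [10, 20, 30, 40]})

# "+" aligns on index labels; labels 2 and 3 exist only on one side.
plain_sum = df_collision_small["a"] + df_dense_big["a"]
print(plain_sum)

# Hypothetical frame: filtering on "type" keeps the original labels.
df = pd.DataFrame({"type": [0, 1, 0, 1], "a": [9.81, 9.81, 9.81, 9.81]})
big = df[df["type"] == 0]["a"]    # labels 0 and 2
small = df[df["type"] == 1]["a"]  # labels 1 and 3

# No label is shared, so every entry of the sum is NaN.
filtered_sum = big + small
print(filtered_sum)
```

In the real data some labels evidently do match (the 19.62 entries), so only part of the result is NaN, but the mechanism is the same.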

sum(axis=1) ignores NaN when summing the values:

pd.merge(df_collision_small["a"], df_dense_big["a"], how="outer", left_index=True, right_index=True).sum(axis=1)
>> 0    11.0
>> 1    22.0
>> 2    30.0
>> 3    40.0
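
As a shorter alternative, Series.add with fill_value=0 should give the same numbers as the merge-and-sum above, since it treats a label that is missing on one side as 0 before adding. This is a sketch on the same toy frames, not a drop-in for the real data.

```python
import pandas as pd

# Same toy frames as in the answer above.
df_collision_small = pd.DataFrame({"a": [1, 2]})
df_dense_big = pd.DataFrame({"a": [10, 20, 30, 40]})

# Approach from the answer: outer join on the index, then row-wise sum.
via_merge = pd.merge(
    df_collision_small["a"], df_dense_big["a"],
    how="outer", left_index=True, right_index=True,
).sum(axis=1)

# Alternative: align on the index and fill one-sided gaps with 0.
via_add = df_collision_small["a"].add(df_dense_big["a"], fill_value=0)

print(via_add)
```

Either way, a row that exists in only one of the two inputs contributes its own value unchanged, so it is worth checking that this is the intended meaning of small + big before plotting.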

Tags: Python, python-3.x, CSV, matplotlib, histogram