I have a dictionary of lists of tuples that I want to convert into a pandas DataFrame, but I'm having some trouble.
My data looks like this:
{0: [('A1', 0.0037505763997138838),
('A2', 0.0036963076240675245),
('A3', 0.0035451257931104485),
('A4', 0.003501467316849233),
('A5', 0.00343229837150675),
('A6', 0.0033731723637910062),
('A7', 0.0033713118048861465),
('A8', 0.003325231288305062),
('A9', 0.002885164987475754),
('A10', 0.0028834984584371797)],
1: [('B1', 0.011094831353420088),
('B2', 0.009526049091086916),
('B3', 0.007002935827927014),
('B4', 0.00511673700015512),
('B5', 0.004870300921667765),
('B6', 0.004496108376557714),
('B7', 0.004230892962061271),
('B8', 0.004137434850455194),
('B9', 0.003958335393193675),
('B10', 0.0038285145788315993)]}
And I want to convert it into the following in pandas:
num  label  probs
0    A1     0.0037505763997138838
0    A2     0.0036963076240675245
0    A3     0.0035451257931104485
0    A4     0.003501467316849233
0    A5     0.00343229837150675
0    A6     0.0033731723637910062
0    A7     0.0033713118048861465
0    A8     0.003325231288305062
0    A9     0.002885164987475754
0    A10    0.0028834984584371797
1    B1     0.011094831353420088
1    B2     0.009526049091086916
1    B3     0.007002935827927014
1    B4     0.00511673700015512
1    B5     0.004870300921667765
1    B6     0.004496108376557714
1    B7     0.004230892962061271
1    B8     0.004137434850455194
1    B9     0.003958335393193675
1    B10    0.0038285145788315993
Answer:
You can try the following (assuming data is the name of the dict):
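import pandas as pd

df = (pd.Series(data)      # one row per key; each value is a list of tuples
      .explode()           # one row per tuple; the key is repeated in the index
      .apply(pd.Series)    # split each (label, prob) tuple into two columns
      .reset_index()       # move the key from the index into a column
     )
df.columns = ['num', 'label', 'probs']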
Result:
print(df)
   num label     probs
0    0    A1  0.003751
1    0    A2  0.003696
2    0    A3  0.003545
3    0    A4  0.003501
4    0    A5  0.003432
5    0    A6  0.003373
6    0    A7  0.003371
7    0    A8  0.003325
8    0    A9  0.002885
9    0   A10  0.002883
10   1    B1  0.011095
11   1    B2  0.009526
12   1    B3  0.007003
13   1    B4  0.005117
14   1    B5  0.004870
15   1    B6  0.004496
16   1    B7  0.004231
17   1    B8  0.004137
18   1    B9  0.003958
19   1   B10  0.003829
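To see what each step of the chain above does, here is a minimal sketch that prints the intermediate result of explode() before the tuples are split into columns. The dict named sample is hypothetical (two keys, with values trimmed from the question's data); the steps themselves are the same ones used in the answer.

```python
import pandas as pd

# Hypothetical, trimmed-down version of the question's dict.
sample = {0: [('A1', 0.0037), ('A2', 0.0036)],
          1: [('B1', 0.0110)]}

# Step 1: one row per dict key; each value is still a whole list of tuples.
s = pd.Series(sample)

# Step 2: explode() yields one row per tuple; the dict key is repeated in the index.
exploded = s.explode()
print(exploded)

# Step 3: apply(pd.Series) splits each tuple into columns 0 and 1,
# and reset_index() turns the repeated key into an ordinary column.
df = exploded.apply(pd.Series).reset_index()
df.columns = ['num', 'label', 'probs']
print(df)
```

Note that explode() only unpacks one level: the lists become rows, but each tuple stays a single value until apply(pd.Series) splits it.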
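Alternatively, for better performance you can build the frame with pd.DataFrame() instead of the second pd.Series() call, i.e. the .apply(pd.Series) step (thanks to @anky for the suggestion):
import pandas as pd

s = pd.Series(data).explode()
df = (pd.DataFrame(s.tolist(), columns=['label', 'probs'], index=s.index)
      .rename_axis(index='num')
      .reset_index()
     )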
Result:
print(df)
   num label     probs
0    0    A1  0.003751
1    0    A2  0.003696
2    0    A3  0.003545
3    0    A4  0.003501
4    0    A5  0.003432
5    0    A6  0.003373
6    0    A7  0.003371
7    0    A8  0.003325
8    0    A9  0.002885
9    0   A10  0.002883
10   1    B1  0.011095
11   1    B2  0.009526
12   1    B3  0.007003
13   1    B4  0.005117
14   1    B5  0.004870
15   1    B6  0.004496
16   1    B7  0.004231
17   1    B8  0.004137
18   1    B9  0.003958
19   1   B10  0.003829
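If you want to check the performance claim behind @anky's suggestion on your own machine, a rough timing sketch follows. The synthetic dict and the function names via_apply and via_dataframe are hypothetical; timings depend on hardware and pandas version, so no numbers are given here. The reasoning is that apply(pd.Series) constructs one small Series per row, whereas pd.DataFrame(s.tolist()) builds the whole frame in a single call.

```python
import timeit

import pandas as pd

# Hypothetical synthetic input: 200 keys, each with 10 (label, prob) tuples.
data = {k: [(f"L{k}_{i}", i / 10) for i in range(10)] for k in range(200)}

def via_apply(data):
    # First approach: one pd.Series per exploded tuple.
    df = pd.Series(data).explode().apply(pd.Series).reset_index()
    df.columns = ['num', 'label', 'probs']
    return df

def via_dataframe(data):
    # Second approach: a single DataFrame constructor call on the list of tuples.
    s = pd.Series(data).explode()
    return (pd.DataFrame(s.tolist(), columns=['label', 'probs'], index=s.index)
            .rename_axis(index='num')
            .reset_index())

for fn in (via_apply, via_dataframe):
    secs = timeit.timeit(lambda: fn(data), number=5)
    print(f"{fn.__name__}: {secs:.3f}s for 5 runs")
```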
Answer:
We can use a list comprehension to build a list of triples (name, label, probs), from which you can then easily create the DataFrame:
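import pandas as pd
# d is the question's dict; each (label, prob) tuple becomes a (key, label, prob) row
c = ['name', 'label', 'probs']
pd.DataFrame([(k, *t) for k, v in d.items() for t in v], columns=c)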
   name label     probs
0     0    A1  0.003751
1     0    A2  0.003696
2     0    A3  0.003545
3     0    A4  0.003501
4     0    A5  0.003432
5     0    A6  0.003373
6     0    A7  0.003371
7     0    A8  0.003325
8     0    A9  0.002885
9     0   A10  0.002883
10    1    B1  0.011095
11    1    B2  0.009526
12    1    B3  0.007003
13    1    B4  0.005117
14    1    B5  0.004870
15    1    B6  0.004496
16    1    B7  0.004231
17    1    B8  0.004137
18    1    B9  0.003958
19    1   B10  0.003829
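The comprehension flattens the dict into rows before pandas is involved at all. A minimal sketch on a hypothetical two-key sample shows the intermediate list of triples that is handed to pd.DataFrame:

```python
import pandas as pd

# Hypothetical sample in the same shape as the question's dict.
d = {0: [('A1', 0.0037), ('A2', 0.0036)],
     1: [('B1', 0.0110)]}

# For every key k and every (label, prob) tuple t, build the row (k, label, prob);
# *t unpacks the tuple into the new row.
rows = [(k, *t) for k, v in d.items() for t in v]
print(rows)
# [(0, 'A1', 0.0037), (0, 'A2', 0.0036), (1, 'B1', 0.011)]

df = pd.DataFrame(rows, columns=['name', 'label', 'probs'])
print(df)
```

Because the reshaping happens in plain Python, this approach needs no explode() or per-row Series construction.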
Answer:
You need to reshape your dictionary a bit first. Here I use itertools.chain to combine the values:
from itertools import chain
import pandas as pd
import numpy as np

# d is the question's dict
df = (pd.DataFrame(list(chain(*d.values())),  # flat list of (label, prob) tuples
                   columns=['label', 'probs'],
                   # each key repeated once per tuple in its list
                   index=np.repeat(list(d), list(map(len, d.values()))))
      .rename_axis('num')
      .reset_index()
     )
Output:
   num label     probs
0    0    A1  0.003751
1    0    A2  0.003696
2    0    A3  0.003545
3    0    A4  0.003501
4    0    A5  0.003432
...
17   1    B8  0.004137
18   1    B9  0.003958
19   1   B10  0.003829
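To make the two helper expressions in this answer concrete, here is a minimal sketch on a hypothetical two-key sample that prints what chain(*d.values()) and np.repeat(...) each produce before they are passed to pd.DataFrame:

```python
from itertools import chain

import numpy as np
import pandas as pd

# Hypothetical sample in the same shape as the question's dict.
d = {0: [('A1', 0.0037), ('A2', 0.0036)],
     1: [('B1', 0.0110)]}

# chain(*d.values()) concatenates the per-key lists into one flat list of tuples.
flat = list(chain(*d.values()))
print(flat)  # [('A1', 0.0037), ('A2', 0.0036), ('B1', 0.011)]

# np.repeat repeats each key once per tuple in its list, giving the row labels.
keys = np.repeat(list(d), list(map(len, d.values())))
print(keys)  # [0 0 1]

df = (pd.DataFrame(flat, columns=['label', 'probs'], index=keys)
      .rename_axis('num')
      .reset_index())
print(df)
```

chain.from_iterable(d.values()) is an equivalent spelling of chain(*d.values()) that avoids unpacking the values into separate arguments.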
