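I am trying to label data in a table. I need to do it so that the tag sequence restarts in every row, but each column gets its own Enum class.
So far I have used the same Enum class for every column.
A solution that handles each column separately as a list would also work. What is the best way to solve this?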
import pandas as pd
from enum import Enum

df = pd.DataFrame({'first': ['product and other', 'product2 and other', 'price'], 'second':['product and prices', 'price2', 'product3 and price']})
df

class Tipos(Enum):
    B = 1
    I = 2
    L = 3

for index, row in df.iterrows():
    sentencas = row.values  # the sentences of this row, one per column
    for sentenca in sentencas:
        for pos, palavra in enumerate(sentenca.split()):
            # Enum values start at 1, so shift the 0-based word position
            print(f"{palavra} {Tipos(pos + 1).name}")
Result:
                first              second
0   product and other  product and prices
1  product2 and other              price2
2               price  product3 and price
product B
and I
other L
product B
and I
prices L
product2 B
and I
other L
price2 B
price B
product3 B
and I
price L
Expected result:
Word Ent
0 product B_first
1 and I_first
2 other L_first
3 product B_second
4 and I_second
5 prices L_second
6 product2 B_first
7 and I_first
8 other L_first
9 price2 B_second
10 price B_first
11 product3 B_second
12 and I_second
13 price L_second
# Within a cell the tags follow the sequence B_first, I_first, L_first, ...; when the column changes they become B_second, I_second, L_second, ...
Answer from a uj5u.com user:
You can use a dict mapping instead of an Enum. If you flatten the DataFrame, you can avoid the loops:
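# Assumed dict (not shown in the post): 0-based word position -> tag
Tipos = {0: 'B', 1: 'I', 2: 'L'}
# Flatten to one word per row, ordered by original row and then by column
out = df.unstack().str.split().explode().sort_index(level=1).to_frame('Word')
# Number the words within each (column, row) cell, map to tags, append the column name
out['Ent'] = out.groupby(level=[0, 1]).cumcount().map(Tipos) \
    + '_' + out.index.get_level_values(0)
out = out.reset_index(drop=True)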
Output:
>>> out
Word Ent
0 product B_first
1 and I_first
2 other L_first
3 product B_second
4 and I_second
5 prices L_second
6 product2 B_first
7 and I_first
8 other L_first
9 price2 B_second
10 price B_first
11 product3 B_second
12 and I_second
13 price L_second
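If you would rather keep the Enum and the explicit loops from the question, the only missing piece is carrying the column name along with each sentence. The following is a minimal sketch of that idea, not the answerer's method. The helper name tag_rows and the plain dict input are assumptions made for illustration so that the example runs without pandas.

```python
from enum import Enum


class Tipos(Enum):
    B = 1
    I = 2
    L = 3


def tag_rows(columns):
    """Hypothetical helper: tag each word with '<position tag>_<column name>'.

    `columns` maps a column name to a list of sentences, one per row.
    """
    n_rows = len(next(iter(columns.values())))
    records = []
    for i in range(n_rows):
        # Visit the columns in order inside each row, as in the expected result
        for col, sentences in columns.items():
            for pos, word in enumerate(sentences[i].split()):
                # Enum values start at 1; a sentence with more than three
                # words would raise ValueError here
                records.append((word, f"{Tipos(pos + 1).name}_{col}"))
    return records


data = {
    'first': ['product and other', 'product2 and other', 'price'],
    'second': ['product and prices', 'price2', 'product3 and price'],
}
records = tag_rows(data)
for word, ent in records:
    print(word, ent)
```

With the DataFrame from the question, the same helper should accept df.to_dict('list'), and pd.DataFrame(records, columns=['Word', 'Ent']) turns the list back into a two-column table. For large frames the flattened approach above is likely faster, since it avoids Python-level loops.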
