I have a DataFrame in which one column contains lists of strings.
id sentence category
0 "I love basketball and dunk to the basket" ['basketball']
1 "I am playing football and basketball tomorrow " ['football', 'basketball']
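I want to do two things:
- Transform the category column so that each element of a list becomes its own string on its own row, keeping the same id and sentence
- Have one DataFrame per category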
Expected output for step 1):
id sentence category
0 "I love basketball and dunk to the basket" 'basketball'
1 "I am playing football and basketball tomorrow" 'football'
1 "I am playing football and basketball tomorrow" 'basketball'
Expected output for step 2):
DF_1
id sentence category
0 "I love basketball and dunk to the basket" 'basketball'
1 "I am playing football and basketball tomorrow" 'basketball'
DF_2
id sentence category
1 "I am playing football and basketball tomorrow" 'football'
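How can I do this? Looping over each list and checking its len would work, but is there a faster or more elegant way?
Answer from a uj5u.com user:
You can explode the "category" column and then groupby: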
out = [g for _, g in df.explode('category').groupby('category')]
Then, if you print the items in out:
for i in out:
    print(i, end='\n\n')
you will see:
id sentence category
0 0 I love basketball and dunk to the basket basketball
1 1 I am playing football and basketball tomorrow basketball
id sentence category
1 1 I am playing football and basketball tomorrow football
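If looking up a sub-DataFrame by category name is more convenient than by position in a list, the same explode and groupby can feed a dict instead. This is only a minimal sketch: the sample data is rebuilt by hand from the question, and the name by_category is illustrative rather than part of the answer above.

```python
import pandas as pd

# Sample data rebuilt from the question; 'category' holds real Python lists.
df = pd.DataFrame({
    "id": [0, 1],
    "sentence": [
        "I love basketball and dunk to the basket",
        "I am playing football and basketball tomorrow ",
    ],
    "category": [["basketball"], ["football", "basketball"]],
})

# One row per list element, then one sub-DataFrame per distinct category.
# by_category is a hypothetical name chosen for this sketch.
by_category = {
    name: group
    for name, group in df.explode("category").groupby("category")
}

print(by_category["football"])
```

Each value is an ordinary DataFrame, so by_category["basketball"] corresponds to DF_1 in the question and by_category["football"] to DF_2.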
Answer from a uj5u.com user:
You will need two tools: explode and groupby.
First, let's prepare the data and use literal_eval to make sure explode will work:
import pandas as pd
from io import StringIO
from ast import literal_eval
# Tab-separated sample data; the category column is read in as plain strings
csvfile = StringIO(
"""id\tsentence\tcategory
0\t"I love basketball and dunk to the basket"\t["basketball"]
1\t"I am playing football and basketball tomorrow "\t["football", "basketball"]""")
df = pd.read_csv(csvfile, sep='\t', engine='python')
# Parse each string such as '["football", "basketball"]' into a real list
df['category'] = df['category'].apply(literal_eval)
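The literal_eval step matters because a column read from CSV holds strings that merely look like lists, and explode leaves non-list values untouched. A small sketch of the difference, using a hypothetical one-row frame (raw and parsed are names made up for this illustration):

```python
from ast import literal_eval

import pandas as pd

# Hypothetical one-row frame: the category value is a str, not a list.
raw = pd.DataFrame({"id": [1], "category": ['["football", "basketball"]']})

# explode has nothing to split, so the row count stays at 1.
print(len(raw.explode("category")))

# After parsing, the value is a real list and explode yields 2 rows.
parsed = raw.assign(category=raw["category"].apply(literal_eval))
print(len(parsed.explode("category")))
```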
Then explode the category column:
df = df.explode('category')
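One detail worth knowing here: explode repeats the original index label for every row it produces from the same list. If a fresh 0..n-1 index is preferred, the ignore_index parameter (available in pandas 1.1 and later) can be passed. A short sketch on a cut-down version of the sample data, with the sentence column omitted for brevity:

```python
import pandas as pd

# Cut-down sample data (sentence column left out for brevity).
df = pd.DataFrame({
    "id": [0, 1],
    "category": [["basketball"], ["football", "basketball"]],
})

# Default behaviour: index labels are duplicated for rows from the same list.
exploded = df.explode("category")
print(exploded.index.tolist())  # [0, 1, 1]

# ignore_index=True renumbers the result from 0.
renumbered = df.explode("category", ignore_index=True)
print(renumbered.index.tolist())  # [0, 1, 2]
```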
Finally, you can use the groupby object like a dictionary and store the sub-DataFrames elsewhere:
dg = df.groupby('category')
list_dg = []
for n, g in dg:
    list_dg.append(g)
IMO, I would stick with dg if possible.
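Sticking with the groupby object could look like the sketch below: groups are fetched on demand with get_group instead of being copied into a list first. The data is rebuilt by hand here so the snippet runs on its own; it is an illustration of the idea, not the answerer's exact code.

```python
import pandas as pd

# Sample data rebuilt from the question, already exploded.
df = pd.DataFrame({
    "id": [0, 1],
    "sentence": [
        "I love basketball and dunk to the basket",
        "I am playing football and basketball tomorrow ",
    ],
    "category": [["basketball"], ["football", "basketball"]],
}).explode("category")

dg = df.groupby("category")

# The category labels found by groupby.
print(sorted(dg.groups))

# Pull out a single sub-DataFrame by its label, only when it is needed.
football = dg.get_group("football")
print(football)

# Number of rows in each category.
print(dg.size())
```

Keeping dg around also leaves aggregations such as dg.size() available without rebuilding anything.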
Tags: python, python-3.x, pandas, dataframe
