How do I turn a column of string lists into one row per string, then split the DataFrame into one DataFrame per string?

I have a DataFrame in which one column contains lists of strings.

id sentence                                            category
0  "I love basketball and dunk to the basket"          ['basketball']
1  "I am playing football and basketball tomorrow "    ['football', 'basketball']

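I want to do two things:

1. Transform the category column so that each element of the list becomes its own string on its own row, keeping the same id and sentence.
2. Get one DataFrame per category.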

Expected output of step 1:

id sentence                                            category
0  "I love basketball and dunk to the basket"          'basketball'
1  "I am playing football and basketball tomorrow "    'football'
1  "I am playing football and basketball tomorrow "    'basketball'

Expected output of step 2:

DF_1

id sentence                                            category
0  "I love basketball and dunk to the basket"          'basketball'
1  "I am playing football and basketball tomorrow "    'basketball'

DF_2

id sentence                                            category
1  "I am playing football and basketball tomorrow "    'football'

How can I do this? Looping over every list and checking its len would work, but is there a faster or more elegant way?

Answer from a uj5u.com user:

You can explode the "category" column and then groupby:

out = [g for _, g in df.explode('category').groupby('category')]

Then, if you print the items of out:

for i in out:
    print(i, end='\n\n')

you will see:

   id                                        sentence    category
0   0        I love basketball and dunk to the basket  basketball
1   1  I am playing football and basketball tomorrow   basketball

   id                                        sentence  category
1   1  I am playing football and basketball tomorrow   football
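If you would rather look up each sub-DataFrame by its category name than by its position in a list, one possible variation is to collect the groups into a dictionary. This is only a sketch: the DataFrame is rebuilt by hand from the question's sample data, and the names exploded and by_category are made up for the illustration.

```python
import pandas as pd

# Sample data rebuilt by hand from the question (category holds real lists)
df = pd.DataFrame({
    "id": [0, 1],
    "sentence": [
        "I love basketball and dunk to the basket",
        "I am playing football and basketball tomorrow ",
    ],
    "category": [["basketball"], ["football", "basketball"]],
})

# One row per (id, sentence, category) combination
exploded = df.explode("category")

# Map each category name to its own sub-DataFrame
by_category = {name: group for name, group in exploded.groupby("category")}

print(sorted(by_category))
print(by_category["football"])
```

With this layout, by_category["basketball"] plays the role of DF_1 from the question and by_category["football"] the role of DF_2.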

Answer from a uj5u.com user:

You will need two tools: explode and groupby.

First, let's prepare the data and use literal_eval to make sure the category column holds real lists, so that explode will work on it:

import pandas as pd
from io import StringIO
from ast import literal_eval

csvfile = StringIO(
"""id\tsentence\tcategory
0\t"I love basketball and dunk to the basket"\t["basketball"]
1\t"I am playing football and basketball tomorrow "\t["football", "basketball"]""")

df = pd.read_csv(csvfile, sep = '\t', engine='python')

# read_csv gives us strings such as '["football", "basketball"]'; parse them into real lists
df['category'] = df['category'].apply(literal_eval)

Then explode on the category column:

df = df.explode('category')
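To see why the literal_eval step matters, here is a small sketch comparing explode on a column of list-looking strings with explode on the parsed lists. The frames raw and parsed are hypothetical names used only for this comparison; explode leaves scalar values, including strings, unchanged.

```python
import pandas as pd
from ast import literal_eval

# A cell as it arrives from a CSV file: a str that merely looks like a list
raw = pd.DataFrame({"id": [1], "category": ['["football", "basketball"]']})

# The cell is a str, so explode has nothing to expand
print(len(raw.explode("category")))  # 1

# After parsing, the cell is a real list and explode yields one row per element
parsed = raw.assign(category=raw["category"].apply(literal_eval))
print(len(parsed.explode("category")))  # 2
```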

Finally, you can use the groupby object much like a dictionary and store the sub-DataFrames elsewhere:

dg = df.groupby('category')

list_dg = []

for n, g in dg:
    list_dg.append(g)

IMO, I would stick with dg itself if possible.
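As a rough illustration of working with the groupby object directly, the sketch below pulls out a single category with get_group and counts rows per category with size, without building a list of sub-DataFrames first. The sample data is rebuilt by hand from the question, and the names football and counts are chosen only for this example.

```python
import pandas as pd

# Sample data rebuilt by hand from the question (category holds real lists)
df = pd.DataFrame({
    "id": [0, 1],
    "sentence": [
        "I love basketball and dunk to the basket",
        "I am playing football and basketball tomorrow ",
    ],
    "category": [["basketball"], ["football", "basketball"]],
})

dg = df.explode("category").groupby("category")

# Fetch the rows of one category on demand
football = dg.get_group("football")
print(football)

# Number of rows per category, computed on the grouped object
counts = dg.size()
print(counts)
```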
