Splitting a list in one column into multiple columns

I have two DataFrames, as shown below:

df1.shape = (4,2)

Text	Topic
Where is the party tonight?	Party
Let's dance	Party
Hello world	Other
It is rainy today	Weather

df2.shape = (4,2)

0	1
Where is the party tonight?	[-0.011570500209927559, -0.010117080062627792, ..., 0.062448356]
Let's dance	[-0.08268199861049652, -0.0016140303341671824, ..., 0.02094201]
Hello world	[-0.0637684240937233, -0.01590338535606861, ..., 0.02094201]
It is rainy today	[0.06379614025354385, -0.02878064103424549, ..., 0.056790903]

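Basically, df2 holds the embedding of each sentence in df1, and each sentence in df1 has a topic associated with it. The embeddings are in column 1 of df2, which contains, for each row, a list of 512 positive or negative floats.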

The output DataFrame I want is:

df_output.shape = (4,514)

Text	Topic	Feature_0	Feature_1	...	Feature_511
Where is the party tonight?	Party	-0.0115705	-0.01011708	...	0.0624484
Let's dance	Party	-0.082681999	-0.00161403	...	0.020942
Hello world	Other	-0.063768424	-0.01590338535606861	...	0.020942
It is rainy today	Weather	0.06379614	-0.028780641	...	0.056790903

How can I get this done? I tried to split the embeddings in df2 into columns, but it doesn't work for me. This is what I have done so far:

df2.join(pd.DataFrame(df2["1"].values.tolist()).add_prefix('feature_'))

It just created a duplicate of column 1 named feature_0. I haven't even reached the stage where I can join df1 and df2.

A uj5u.com user replied:

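Your attempt yields a single feature_0 column, which suggests the lists in df2["1"] are stored as strings rather than as real lists. You can map ast.literal_eval over df2["1"] to parse them, build a DataFrame from the result, and join it to df1: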

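import ast
import pandas as pd
# Parse each stringified list into a real list, then expand the lists into feature_* columns
out = df1.join(pd.DataFrame(map(ast.literal_eval, df2["1"].tolist())).add_prefix('feature_'))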

Output:

                          Text    Topic  feature_0  feature_1  feature_2
0  Where is the party tonight?    Party  -0.011571  -0.010117   0.062448
1                  Let's dance    Party  -0.082682  -0.001614   0.020942
2                  Hello world    Other  -0.063768  -0.015903   0.020942
3            It is rainy today  Weather   0.063796  -0.028781   0.056791
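
To see why the parsing step matters, here is a minimal self-contained sketch. It assumes the embeddings arrive as strings (the question does not show the dtype), and it uses two rows with three values each, taken from the sample data, in place of the 512-element lists.

```python
import ast
import pandas as pd

# Shortened stand-ins for the question's frames (2 rows, 3 values per embedding).
df1 = pd.DataFrame({
    "Text": ["Where is the party tonight?", "Let's dance"],
    "Topic": ["Party", "Party"],
})
df2 = pd.DataFrame({
    "0": df1["Text"],
    # Assumption: each embedding is a string that looks like a list.
    "1": [
        "[-0.011570500209927559, -0.010117080062627792, 0.062448356]",
        "[-0.08268199861049652, -0.0016140303341671824, 0.02094201]",
    ],
})

# Without parsing, every string lands in a single cell, so only feature_0 exists.
unparsed = pd.DataFrame(df2["1"].values.tolist()).add_prefix("feature_")

# After parsing, each list element gets its own column.
parsed = pd.DataFrame(
    df2["1"].map(ast.literal_eval).tolist(), index=df2.index
).add_prefix("feature_")

out = df1.join(parsed)
print(unparsed.columns.tolist())
print(out.columns.tolist())
```

If `type(df2["1"].iloc[0])` is already `list`, the `ast.literal_eval` step is unnecessary and would raise an error, so it is worth checking the element type first.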

A uj5u.com user replied:

How about this?

import pandas as pd

data = {
    0: ['Where is the party tonight?', "Let's dance", 'Hello world', 'It is rainy today'],
    1: [
        [-0.011570500209927559, -0.010117080062627792, 0.062448356],
        [-0.08268199861049652, -0.0016140303341671824, 0.02094201],
        [-0.0637684240937233, -0.01590338535606861, 0.02094201],
        [0.06379614025354385, -0.02878064103424549, 0.056790903],
    ],
}
df = pd.DataFrame(data)

# Note: no .values here; to_list() is called on the Series directly
exploded_df = pd.DataFrame(df[1].to_list(), index=df.index)

print(exploded_df.head())

Output:

          0         1         2
0 -0.011571 -0.010117  0.062448
1 -0.082682 -0.001614  0.020942
2 -0.063768 -0.015903  0.020942
3  0.063796 -0.028781  0.056791

Then you can join the two DataFrames.
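
The join itself is not spelled out in this answer, so here is one possible sketch. It assumes df1 and df2 list the sentences in the same row order, so the default index lines up. The frames are shortened to two rows from the sample data, the `feature_` prefix follows the question's naming, and the names `keyed` and `df_output_by_text` are illustrative only.

```python
import pandas as pd

# Shortened stand-ins for the question's frames (2 rows, 3 values per embedding).
df1 = pd.DataFrame({
    "Text": ["Where is the party tonight?", "Let's dance"],
    "Topic": ["Party", "Party"],
})
df2 = pd.DataFrame({
    0: ["Where is the party tonight?", "Let's dance"],
    1: [
        [-0.011570500209927559, -0.010117080062627792, 0.062448356],
        [-0.08268199861049652, -0.0016140303341671824, 0.02094201],
    ],
})

# Expand the list column into one column per element, keeping df2's index.
exploded_df = pd.DataFrame(df2[1].to_list(), index=df2.index).add_prefix("feature_")

# Option 1: join on the index (valid only if both frames share row order).
df_output = df1.join(exploded_df)

# Option 2: match on the sentence text, which does not depend on row order
# (assumes each sentence appears once in df2).
keyed = pd.concat([df2[[0]].rename(columns={0: "Text"}), exploded_df], axis=1)
df_output_by_text = df1.merge(keyed, on="Text", how="left")

print(df_output.shape)
```

With the full 512-element embeddings, the same steps would give the (4, 514) shape the question asks for: two original columns plus 512 feature columns.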


Tags: python pandas list dataframe data-manipulation
