I have two DataFrames, as shown below:
df1.shape = (4,2)
| Text | Topic |
|---|---|
| Where is the party tonight? | Party |
| Let's dance | Party |
| Hello world | Other |
| It is rainy today | Weather |
df2.shape = (4,2)
| 0 | 1 |
|---|---|
| Where is the party tonight? | [-0.011570500209927559, -0.010117080062627792, …, 0.062448356] |
| Let's dance | [-0.08268199861049652, -0.0016140303341671824, …, 0.02094201] |
| Hello world | [-0.0637684240937233, -0.01590338535606861, …, 0.02094201] |
| It is rainy today | [0.06379614025354385, -0.02878064103424549, …, 0.056790903] |
Basically, df2 holds the embedding of each sentence, and df1 holds the topic associated with each sentence. The embeddings are in column 1 of df2; each one is a list of 512 positive or negative floating-point numbers.
The output DataFrame I want is:
df_output.shape = (4,514)
| Text | Topic | Feature_0 | Feature_1 | … | Feature_511 |
|---|---|---|---|---|---|
| Where is the party tonight? | Party | -0.0115705 | -0.01011708 | … | 0.0624484 |
| Let's dance | Party | -0.082681999 | -0.00161403 | … | 0.020942 |
| Hello world | Other | -0.063768424 | -0.01590338535606861 | … | 0.020942 |
| It is rainy today | Weather | 0.06379614 | -0.028780641 | … | 0.056790903 |
How can I get this done? I tried to split the embeddings in df2 into separate columns, but it doesn't work for me. This is what I have done so far:
df2.join(pd.DataFrame(df2["1"].values.tolist()).add_prefix('feature_'))
It just created a duplicate of column 1 named feature_0. I haven't even reached the stage where I can join df1 and df2.
Answer from a uj5u.com user:
You can map ast.literal_eval over df2["1"], build a DataFrame from the result, and join it to df1:
import ast
import pandas as pd
out = df1.join(pd.DataFrame(map(ast.literal_eval, df2["1"].tolist())).add_prefix('feature_'))
Output:
Text Topic feature_0 feature_1 feature_2
0 Where is the party tonight? Party -0.011571 -0.010117 0.062448
1 Let's dance Party -0.082682 -0.001614 0.020942
2 Hello world Other -0.063768 -0.015903 0.020942
3 It is rainy today Weather 0.063796 -0.028781 0.056791
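The answer above implies that each cell of df2["1"] holds a string that looks like a list rather than an actual list, which would explain why the original attempt produced a single feature_0 column. The question does not confirm this, so treat it as an assumption. A minimal sketch using two of the rows and the shortened 3-element vectors; the string column labels "0" and "1" and the variable names are illustrative:

```python
import ast
import pandas as pd

df1 = pd.DataFrame({
    "Text": ["Where is the party tonight?", "Let's dance"],
    "Topic": ["Party", "Party"],
})

# Assumption: the embeddings were stored as strings (for example after a CSV round trip).
df2 = pd.DataFrame({
    "0": ["Where is the party tonight?", "Let's dance"],
    "1": [
        "[-0.011570500209927559, -0.010117080062627792, 0.062448356]",
        "[-0.08268199861049652, -0.0016140303341671824, 0.02094201]",
    ],
})

# Inspect what a cell actually contains before trying to expand it.
cell_type = type(df2["1"].iloc[0])
print(cell_type)

# Without parsing, every string becomes one cell, so only one column appears.
unparsed = pd.DataFrame(df2["1"].values.tolist()).add_prefix("feature_")
print(unparsed.shape)

# Parse each string into a real list, then expand the lists into columns.
features = pd.DataFrame(
    [ast.literal_eval(s) for s in df2["1"]],
    index=df2.index,
).add_prefix("feature_")

out = df1.join(features)
print(out.shape)
```

If `cell_type` turns out to be `list` instead of `str`, the parsing step is unnecessary and the lists can be expanded directly.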
Answer from a uj5u.com user:
How about this?
import pandas as pd
data = {
    0: ['Where is the party tonight?', "Let's dance", 'Hello world', 'It is rainy today'],
    1: [[-0.011570500209927559, -0.010117080062627792, 0.062448356],
        [-0.08268199861049652, -0.0016140303341671824, 0.02094201],
        [-0.0637684240937233, -0.01590338535606861, 0.02094201],
        [0.06379614025354385, -0.02878064103424549, 0.056790903]],
}
df = pd.DataFrame(data)
# expand each list into its own columns (assumes column 1 has no missing values)
exploded_df = pd.DataFrame(df[1].to_list(), index=df.index)
print(exploded_df.head())
Output:
0 1 2
0 -0.011571 -0.010117 0.062448
1 -0.082682 -0.001614 0.020942
2 -0.063768 -0.015903 0.020942
3 0.063796 -0.028781 0.056791
Then you can join the two DataFrames.
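The join step might look like the sketch below. It assumes df2 already holds real Python lists, as in this answer, and reuses the shortened 3-element vectors. `DataFrame.join` aligns rows by index, so the first option relies on df1 and df2 listing the sentences in the same order; the second option merges on the sentence text instead. The names `out_by_index`, `keyed`, and `out_by_text` are illustrative.

```python
import pandas as pd

texts = ["Where is the party tonight?", "Let's dance", "Hello world", "It is rainy today"]

df1 = pd.DataFrame({"Text": texts, "Topic": ["Party", "Party", "Other", "Weather"]})
df2 = pd.DataFrame({
    0: texts,
    1: [[-0.011570500209927559, -0.010117080062627792, 0.062448356],
        [-0.08268199861049652, -0.0016140303341671824, 0.02094201],
        [-0.0637684240937233, -0.01590338535606861, 0.02094201],
        [0.06379614025354385, -0.02878064103424549, 0.056790903]],
})

# Expand the list column into one column per element, keeping df2's index.
features = pd.DataFrame(df2[1].to_list(), index=df2.index).add_prefix("feature_")

# Option A: join on the index (assumes both frames share the same row order).
out_by_index = df1.join(features)

# Option B: attach the sentence text to the features and merge on it,
# which does not depend on row order.
keyed = pd.concat([df2[[0]].rename(columns={0: "Text"}), features], axis=1)
out_by_text = df1.merge(keyed, on="Text", how="left")

print(out_by_index.shape)
print(out_by_text.shape)
```

With the full 512-element embeddings, the same code would give a frame of shape (4, 514), matching the output requested in the question. Merging on text assumes the sentences are unique; duplicated sentences would produce duplicated rows.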
Please credit the source when reposting. Link to this article: https://www.uj5u.com/ruanti/440131.html
Tags: python pandas list dataframe data-manipulation
