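I am trying to write a function that is flexible enough to output a string based on search terms found in an existing DataFrame column. I do get output, but every result after the first is chained to the previous ones (the earlier output is repeated together with the new one). How can I fix this? I plan to extend the function with more for loops. Perhaps there is a more efficient way to do this.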
import pandas as pd

# declarations
search_words = ['one', 'two', 'three']
l1 = []

# Function
def concat(text):
    for i in search_words[0:1]:
        if i in text:
            a = 'four'
            l1.append(a)
    for i in search_words[1:3]:
        if i in text:
            b = 'five'
            l1.append(b)
    listToStr = ' '.join(map(str, l1))
    return listToStr

# Test Dataframe
dftest = pd.DataFrame(data=['one filler two', 'two', 'filler', 'three one'],
                      columns=['col1'])

# Test output
dftest['col2'] = dftest['col1'].apply(lambda x: concat(x))
dftest
This gives the wrong output:

             col1                      col2
0  one filler two                 four five
1             two            four five five
2          filler            four five five
3       three one  four five five four five

Expected output:

             col1       col2
0  one filler two  four five
1             two       five
2          filler
3       three one  five four
Reply from a uj5u.com user:
A new l1 has to be created on every call to concat, so define it inside the function:
def concat(text):
    l1 = []
    for i in search_words[0:1]:
        if i in text:
            l1.append('four')
    for i in search_words[1:3]:
        if i in text:
            l1.append('five')
    listToStr = ' '.join(l1)
    return listToStr
Also, when you apply concat, you do not need the lambda:
dftest['col2'] = dftest['col1'].apply(concat)
Output:

             col1       col2
0  one filler two  four five
1             two       five
2          filler
3       three one  four five
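
To see why the version in the question accumulated results, here is a minimal sketch that leaves pandas out and simply calls both variants on the four test strings. The names shared, concat_shared and concat_local are made up for this illustration; the search_words list and the test strings are the ones from the question.

```python
search_words = ['one', 'two', 'three']

# Module-level list, created once, as in the question (hypothetical name).
shared = []

def concat_shared(text):
    # Every call appends to the same list, so earlier matches are never cleared.
    for w in search_words[0:1]:
        if w in text:
            shared.append('four')
    for w in search_words[1:3]:
        if w in text:
            shared.append('five')
    return ' '.join(shared)

def concat_local(text):
    # A fresh list per call, so each result only reflects the current text.
    out = []
    for w in search_words[0:1]:
        if w in text:
            out.append('four')
    for w in search_words[1:3]:
        if w in text:
            out.append('five')
    return ' '.join(out)

texts = ['one filler two', 'two', 'filler', 'three one']
shared_results = [concat_shared(t) for t in texts]
local_results = [concat_local(t) for t in texts]

print(shared_results)
print(local_results)
```

The first list reproduces the growing strings from the question, the second one the output shown above. Note that the last row comes out as 'four five' rather than the 'five four' the question asked for, because the result follows the order of the loops and not the order in which the words appear in the text.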
Reply from a uj5u.com user:
A simpler approach might be:
dict_assignment = {
    "one": "four",
    "two": "five",
    "three": "five",
}
dftest["col2"] = dftest.col1.apply(
    lambda p: ' '.join(dict_assignment[w] for w in p.split() if w in search_words)
)
print(dftest)
#              col1       col2
# 0  one filler two  four five
# 1             two       five
# 2          filler
# 3       three one  five four
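
One difference from the function in the question is worth spelling out: the dictionary approach splits the text on whitespace and compares whole tokens, while `if i in text` is a substring test. The sketch below is only an illustration of that difference. map_words is a hypothetical helper name, and it checks membership against the dictionary keys instead of search_words, which is equivalent here because the keys and the list hold the same words.

```python
dict_assignment = {
    "one": "four",
    "two": "five",
    "three": "five",
}

def map_words(text, mapping=dict_assignment):
    # Whole-token lookup: results follow the order of the words in the text.
    return ' '.join(mapping[w] for w in text.split() if w in mapping)

print(map_words('one filler two'))   # four five
print(map_words('three one'))        # five four
print(repr(map_words('someone')))    # '' (no token equals 'one')
print('one' in 'someone')            # True (the substring test does match)
```

Which behaviour is wanted depends on the data: token matching avoids hits inside longer words, but it also misses words that are attached to punctuation, such as 'one,'.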
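Reply from a uj5u.com user:
Using apply to call a function that contains loops will be very inefficient, especially on a large DataFrame.
Use vectorized code and a dictionary that maps the words instead, then join the result back to the original DataFrame.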
d = {'one': 'four', 'two': 'five', 'three': 'five'}
df2 = dftest.join(
    dftest['col1']
    .str.extractall(f"({'|'.join(d)})")[0]
    .map(d)
    .groupby(level=0).agg(' '.join)
    .rename('col2')
)
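Note: I used a simple regular expression here, but if your search terms contain special characters you may need to use re.escape. If that is the case, please update the example.

Output:

             col1       col2
0  one filler two  four five
1             two       five
2          filler        NaN
3       three one  five four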
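
If the search terms do contain regex metacharacters, the note about re.escape could be applied along the following lines. This is a sketch under assumptions and not part of the original answer: the keys 'a.b' and '1+1' and the sample rows are invented for illustration, and the final fillna call is an optional extra that turns the NaN for rows without a match into an empty string, as in the expected output of the question.

```python
import re
import pandas as pd

# Hypothetical mapping whose keys contain regex metacharacters ('.' and '+').
d = {'a.b': 'four', '1+1': 'five'}

# Hypothetical sample data; 'axb 11' would match an unescaped pattern.
df = pd.DataFrame({'col1': ['a.b then 1+1', 'axb 11', 'filler']})

# Escape every key so it is matched literally, then build one capture group.
pattern = '(' + '|'.join(re.escape(k) for k in d) + ')'

col2 = (
    df['col1']
    .str.extractall(pattern)[0]      # one row per match, column 0 is the group
    .map(d)                          # translate each matched term
    .groupby(level=0).agg(' '.join)  # back to one string per original row
    .rename('col2')
)

# Rows without any match get NaN from the join; replace it with ''.
result = df.join(col2).fillna({'col2': ''})
print(result)
```

Without the escaping, the '.' in 'a.b' would match any character and '1+1' would match runs of the digit 1, so the second row would produce hits that were not intended.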
