How to create a function that builds a new DataFrame column from search words found in an existing column?

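I'm trying to create a function flexible enough to output a string based on search words found in an existing DataFrame column. I do get an output, but every output after the first one seems to be chained to the previous outputs (the earlier results are repeated along with the new one). How can I fix this? I plan to extend the function to include more for loops. Maybe there is a more efficient way to do this.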

import pandas as pd

# declarations
search_words = ['one', 'two', 'three']
l1 = []

#Function
def concat(text):
    for i in search_words[0:1]:
        if i in text:
            a = 'four'
            l1.append(a)
    for i in search_words[1:3]:
        if i in text:
            b = 'five'
            l1.append(b)
    listToStr = ' '.join(map(str, l1))
    return listToStr

# Test Dataframe
dftest = pd.DataFrame(data =['one filler two','two','filler','three one'], 
                      columns = ['col1'])

# Test output
dftest['col2'] = dftest['col1'].apply(lambda x: concat(x))
dftest

This gives the wrong output:

    col1               col2
0   one filler two     four five
1   two                four five five
2   filler             four five five
3   three one          four five five four five

Desired output:

    col1               col2
0   one filler two     four five
1   two                five
2   filler             
3   three one          five four

Answer:

You have to define a new l1 every time concat is called:

def concat(text):
    l1 = []
    for i in search_words[0:1]:
        if i in text:
            l1.append('four')
    for i in search_words[1:3]:
        if i in text:
            l1.append('five')
    listToStr = ' '.join(l1)
    return listToStr
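The reason is that the question's l1 lives at module level, so every call appends to the same list and the earlier results never go away. A minimal pure-Python sketch of the difference (the names concat_shared and concat_local are hypothetical, used only for illustration):

```python
# Hypothetical demo: a list defined outside the function is shared by every call.
shared = []

def concat_shared(text):
    # Appends to the module-level list, so results from earlier calls pile up.
    if 'one' in text:
        shared.append('four')
    return ' '.join(shared)

def concat_local(text):
    # A fresh list on every call, so each result depends only on `text`.
    found = []
    if 'one' in text:
        found.append('four')
    return ' '.join(found)

first = concat_shared('one')   # 'four'
second = concat_shared('one')  # 'four four' (the first call's result is still in `shared`)
print(first, '|', second)
print(concat_local('one'), '|', concat_local('one'))  # four | four
```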

Also, you don't need a lambda when you apply concat:

dftest['col2'] = dftest['col1'].apply(concat)

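Output (row 3 comes out as "four five" rather than "five four" because the words are collected in search_words order, not in the order they appear in the text):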

             col1       col2
0  one filler two  four five
1             two       five
2          filler           
3       three one  four five
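Since the question plans to add more for loops, one way to avoid writing a new loop per output word is to drive the function from a list of (search words, output word) rules. A sketch, assuming the rule table below (the rules list and the concat_rules name are hypothetical); like the function above, it uses substring matching and emits words in rule order, not text order:

```python
import pandas as pd

# Hypothetical rule table: each entry maps a group of search words to one output word.
rules = [
    (['one'], 'four'),
    (['two', 'three'], 'five'),
]

def concat_rules(text):
    found = []  # local, so nothing leaks between calls
    for words, output in rules:
        for w in words:
            if w in text:  # substring test, as in the original function
                found.append(output)
    return ' '.join(found)

dftest = pd.DataFrame({'col1': ['one filler two', 'two', 'filler', 'three one']})
dftest['col2'] = dftest['col1'].apply(concat_rules)
print(dftest)
```

Adding another output word is then a one-line change to rules rather than another loop.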

Answer:

A simpler approach might be:

dict_assignment = {
    "one": "four",
    "two": "five",
    "three": "five",
}

dftest["col2"] = dftest.col1.apply(
    lambda p: ' '.join(dict_assignment[w] for w in p.split() if w in search_words)
)

print(dftest)

#              col1       col2
# 0  one filler two  four five
# 1             two       five
# 2          filler           
# 3       three one  five four
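One behavioural difference worth knowing: this version matches whole whitespace-separated tokens, whereas `if i in text` in the loop-based function is a substring test. A small pure-Python check (the input string "someone two" is a hypothetical example):

```python
dict_assignment = {"one": "four", "two": "five", "three": "five"}

text = "someone two"

# Substring test, as in the loop-based concat: 'one' is found inside 'someone'.
substring_hits = [dict_assignment[k] for k in dict_assignment if k in text]

# Token test, as in the split-based version: only exact words match.
token_hits = [dict_assignment[w] for w in text.split() if w in dict_assignment]

print(substring_hits)  # ['four', 'five']
print(token_hits)      # ['five']
```

The token version also won't match a word with attached punctuation, such as "two,".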

Answer:

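Using apply to call a function that contains loops will be very inefficient.

Instead, use vectorized string methods and a dictionary that maps the words, then join the result back to the original DataFrame.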

d = {'one':'four', 'two':'five', 'three':'five'}

df2 = dftest.join(
 dftest['col1']
 .str.extractall(f"({'|'.join(d)})")[0]
 .map(d)
 .groupby(level=0).agg(' '.join)
 .rename('col2')
 )

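NB. I used a simple regex here, but if your search words contain special characters you may need to use re.escape. Let me know if that's the case and I'll update the example.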

Output:

             col1       col2
0  one filler two  four five
1             two       five
2          filler        NaN
3       three one  five four
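Following up on the note about special characters, here is a sketch of the same pipeline with each search word passed through re.escape and wrapped in \b word boundaries (so "one" no longer matches inside a word like "someone"), plus fillna('') so rows with no match get an empty string, as in the desired output, instead of NaN. Using word boundaries is an assumption about what should count as a match, and \b only behaves as expected when each search word starts and ends with a word character:

```python
import re
import pandas as pd

d = {'one': 'four', 'two': 'five', 'three': 'five'}
dftest = pd.DataFrame({'col1': ['one filler two', 'two', 'filler', 'three one']})

# Escape each key so regex metacharacters are matched literally,
# and use \b so only whole words match.
pattern = r'\b(' + '|'.join(map(re.escape, d)) + r')\b'

col2 = (
    dftest['col1']
    .str.extractall(pattern)[0]      # one row per match, MultiIndex (original row, match)
    .map(d)                          # search word -> output word
    .groupby(level=0).agg(' '.join)  # glue the matches back together per original row
    .rename('col2')
)
df2 = dftest.join(col2)
df2['col2'] = df2['col2'].fillna('')  # rows with no match become '' instead of NaN
print(df2)
```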
