Passing a function over a list of columns using numpy.vectorize or DataFrame.apply?

I have the following DataFrame:

df = pd.DataFrame(data= {'Product_JP': ['????- ??? C225G','????????','?????????????-','???????-?','???????????????'],
                  'Value1': [1,12313,1.123,0.112,0],
                  'Metric1_JP': ['?-???????(販売金額(x1000))','加重販売率(販売金額)','????販売店當り(販売個數)','加重販売率(販売金額)','加重販売率(販売金額)'],
                  'Type_JP': ['サルサソ?ス','ケチャップ','ケチャップ','ケチャップ','ケチャップ'],
                  'SKU': [4582152498325,4582112498325,4500152498325,4582112398325,4582152483125]},
                 )


        Product_JP     Value1              Metric1_JP Type_JP            SKU
0  ????- ??? C225G      1.000  ?-???????(販売金額(x1000))  サルサソ?ス  4582152498325
1         ????????  12313.000             加重販売率(販売金額)   ケチャップ  4582112498325
2   ?????????????-      1.123         ????販売店當り(販売個數)   ケチャップ  4500152498325
3        ???????-?      0.112             加重販売率(販売金額)   ケチャップ  4582112398325
4  ???????????????      0.000             加重販売率(販売金額)   ケチャップ  4582152483125

I can apply the following function to a single column with apply():

from deep_translator import (GoogleTranslator)
df['Product_EN'] = df['Product_JP'].apply(lambda row:GoogleTranslator(source='ja', target='en').translate(row))

        Product_JP     Value1              Metric1_JP Type_JP            SKU  \
0  ????- ??? C225G      1.000  ?-???????(販売金額(x1000))  サルサソ?ス  4582152498325   
1         ????????  12313.000             加重販売率(販売金額)   ケチャップ  4582112498325   
2   ?????????????-      1.123         ????販売店當り(販売個數)   ケチャップ  4500152498325   
3        ???????-?      0.112             加重販売率(販売金額)   ケチャップ  4582112398325   
4  ???????????????      0.000             加重販売率(販売金額)   ケチャップ  4582152483125   

             Product_EN  
0  Tomatoco-Salsa C225G  
1               Matthew  
2          Tomato miser  
3                 Catch  
4  Tomato miser premium

But what I want to do is pass a list of columns and apply the function to all of them in one go, like this:

JP_columns = [column for column in df.columns if '_JP' in column]
EN_columns = [column.replace('_JP', '_EN') for column in JP_columns]

df[EN_columns] = df[JP_columns].apply(lambda row:GoogleTranslator(source='ja', target='en').translate(row))

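This raises ValueError: "The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()."

What am I doing wrong with df.apply()?
Would np.vectorize be a better fit here?

For example (this also raises a ValueError: "The truth value of a DataFrame is ambiguous"):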

df[EN_columns] = np.vectorize(GoogleTranslator(source='ja', target='en').translate(df[JP_columns]))

Thanks

A uj5u.com user replied:

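Because a Series has only one dimension, Series.apply applies the function to each cell (row) of the Series. DataFrame.apply, however, passes an entire column to the function by default. translate expects text to be a single string, not a collection.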
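To make the difference concrete, here is a minimal sketch that needs no network access. The `describe` and `needs_single_string` functions are hypothetical stand-ins, not part of deep_translator; the second one only assumes that translate evaluates its input in a boolean context somewhere, which would explain the error in the question.

```python
import pandas as pd

df = pd.DataFrame({
    "Name_JP": ["a", "b"],
    "Type_JP": ["c", "d"],
})

def describe(arg):
    # Report what kind of object pandas handed to the function.
    return type(arg).__name__

# Series.apply calls the function once per element (a plain str here).
per_element = df["Name_JP"].apply(describe).tolist()

# DataFrame.apply (default axis=0) calls the function once per column,
# passing the whole column as a Series.
per_column = df.apply(describe).tolist()

def needs_single_string(text):
    # Hypothetical stand-in for a function that validates a single string.
    if not text:
        raise ValueError("empty input")
    return text.upper()

try:
    df.apply(needs_single_string)
    error_message = None
except ValueError as exc:
    # `not text` on a Series is ambiguous, so pandas raises ValueError.
    error_message = str(exc)
```

Here `per_element` holds `str` for every cell, while `per_column` holds `Series` for every column, and `error_message` carries the same "truth value of a Series is ambiguous" text as in the question.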

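The method that applies a function to every cell of a DataFrame is applymap (deprecated since pandas 2.1 in favor of DataFrame.map, which behaves the same way), and it can be used like this: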

JP_columns = [column for column in df.columns if '_JP' in column]
EN_columns = [column.replace('_JP', '_EN') for column in JP_columns]

# apply to all cells in the DataFrame
df[EN_columns] = df[JP_columns].applymap(
    GoogleTranslator(source='ja', target='en').translate
)
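The same pattern can be tried offline before spending translation requests. This is only a sketch: `fake_translate` is a hypothetical stand-in for `GoogleTranslator(...).translate`, the sample data is made up, and the `getattr` fallback is one possible way to support both newer pandas (DataFrame.map) and older versions (applymap).

```python
import pandas as pd

def fake_translate(text):
    # Hypothetical stand-in for GoogleTranslator(...).translate; no network.
    return f"EN({text})"

df = pd.DataFrame({
    "Product_JP": ["p1", "p2"],
    "Value1": [1.0, 2.0],
    "Type_JP": ["t1", "t2"],
})

JP_columns = [c for c in df.columns if c.endswith("_JP")]
EN_columns = [c.replace("_JP", "_EN") for c in JP_columns]

subset = df[JP_columns]
# Prefer DataFrame.map where it exists (pandas >= 2.1); else use applymap.
elementwise = getattr(subset, "map", None) or subset.applymap

# The function is called once per cell; columns are matched by position.
df[EN_columns] = elementwise(fake_translate)
```

Only the `_JP` columns are passed through the function, so numeric columns such as `Value1` are left untouched.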

np.vectorize also works. Note that it takes a pyfunc as input (translate in this case) and returns a callable, which is then called on the data:

JP_columns = [column for column in df.columns if '_JP' in column]
EN_columns = [column.replace('_JP', '_EN') for column in JP_columns]

# vectorize function then call function on DataFrame
df[EN_columns] = np.vectorize(
    GoogleTranslator(source='ja', target='en').translate
)(df[JP_columns])
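The two-step nature of np.vectorize (wrap first, call second) may be easier to see with a local function. In this sketch `fake_translate` is a hypothetical stand-in for the translator, and passing `otypes=[object]` is my own choice: without `otypes`, np.vectorize calls the function an extra time on the first element to infer the output type, which would mean one extra translation request.

```python
import numpy as np
import pandas as pd

def fake_translate(text):
    # Hypothetical stand-in for GoogleTranslator(...).translate; no network.
    return f"EN({text})"

df = pd.DataFrame({"Product_JP": ["p1", "p2"], "Type_JP": ["t1", "t2"]})

# Step 1: np.vectorize wraps the Python function and returns a new callable.
vectorized = np.vectorize(fake_translate, otypes=[object])

# Step 2: calling it on the 2-D array of cells returns an ndarray of the
# same shape, with the function applied to each element.
result = vectorized(df[["Product_JP", "Type_JP"]].to_numpy())

df[["Product_EN", "Type_EN"]] = result
```

np.vectorize is essentially a convenience loop in Python, so it should not be expected to be faster than the element-wise DataFrame method.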

Both approaches produce the following df:

Product_JP	Value1	Metric1_JP	Type_JP	SKU	Product_EN	Metric1_EN	Type_EN
????- ??? C225G	1	?-???????(販売金額(x1000))	サルサソ?ス	4582152498325	Tomatoco-Salsa C225G	Market size (sales amount (x1000))	Salsa source
????????	12313	加重販売率(販売金額)	ケチャップ	4582112498325	Matthew	Weighted sales rate (sales amount)	Ketchup
?????????????-	1.123	????販売店當り(販売個數)	ケチャップ	4500152498325	Tomato miser	Per store (number of units sold)	Ketchup
???????-?	0.112	加重販売率(販売金額)	ケチャップ	4582112398325	Catch	Weighted sales rate (sales amount)	Ketchup
???????????????	0	加重販売率(販売金額)	ケチャップ	4582152483125	Tomato miser premium	Weighted sales rate (sales amount)	Ketchup
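Several cells in the sample data repeat (for example ケチャップ and 加重販売率(販売金額)), and both approaches above call the translator once per cell. One possible refinement, sketched here as a suggestion and not as part of the original answer, is to translate each distinct string once and map the results back. `fake_translate` and the `calls` list are hypothetical, used only to count invocations.

```python
import pandas as pd

calls = []

def fake_translate(text):
    # Hypothetical stand-in for the translator; records each invocation.
    calls.append(text)
    return f"EN({text})"

df = pd.DataFrame({
    "Metric1_JP": ["m1", "m2", "m1"],
    "Type_JP": ["t1", "t2", "t2"],
})
JP_columns = ["Metric1_JP", "Type_JP"]
EN_columns = [c.replace("_JP", "_EN") for c in JP_columns]

# Collect the distinct strings across all selected columns.
unique_values = pd.unique(df[JP_columns].to_numpy().ravel())

# Translate each distinct string exactly once.
lookup = {value: fake_translate(value) for value in unique_values}

# Fill each target column by dictionary lookup instead of a new call.
for jp, en in zip(JP_columns, EN_columns):
    df[en] = df[jp].map(lookup)
```

With six cells but only four distinct strings, the stand-in translator is called four times in this example.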

Setup and imports:

import numpy as np  # only for np.vectorize
import pandas as pd
from deep_translator import GoogleTranslator

df = pd.DataFrame({
    'Product_JP': ['????- ??? C225G', '????????', '?????????????-', '???????-?',
                   '???????????????'],
    'Value1': [1, 12313, 1.123, 0.112, 0],
    'Metric1_JP': ['?-???????(販売金額(x1000))', '加重販売率(販売金額)',
                   '????販売店當り(販売個數)', '加重販売率(販売金額)',
                   '加重販売率(販売金額)'],
    'Type_JP': ['サルサソ?ス', 'ケチャップ', 'ケチャップ', 'ケチャップ', 'ケチャップ'],
    'SKU': [4582152498325, 4582112498325, 4500152498325, 4582112398325,
            4582152483125]
})

Please credit the source when reposting. Link to this article: https://www.uj5u.com/qukuanlian/310979.html
