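This problem looks simple, but I've had a lot of trouble with it and haven't seen it addressed anywhere. I have a column in which every row holds a different list, and all I want to do is create a new column based on whether specific values appear in that list. The data looks like this: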
Col1
[5,6,23,7,20,21]
[0,7,20,21]
[3,4,5,23,7,20,21]
[2,3,23,7,20,21]
[3,4,5,23,7,20,21]
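Each number corresponds to a specific value, so 0 = 'apple', 2 = 'grape', etc...
Although each list contains several values, I'm only looking for certain ones, specifically 0, 2, 4, 6, 16, 17.
So what I want to do is add a new column whose value corresponds to whichever of those numbers is found in Col1.
This is what the result should look like: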
Col1 Col2
[5,6,23,7,20,21] Pear
[0,7,20,21] Apple
[3,4,5,23,7,20,21] Watermelon
[2,3,23,7,20,21] Grape
[16,20,21] Pineapple
I tried:
df['Col2'] = np.where(0 in df['Col1'], 'Apple',
np.where(2 in df['Col1'], 'Grape',
np.where(4 in df['Col1'], 'Watermelon', )
and so on... but this defaults every value to Apple:
Col1 Col2
[5,6,23,7,20,21] Apple
[0,7,20,21] Apple
[3,4,5,23,7,20,21] Apple
[2,3,23,7,20,21] Apple
[16,20,21] Apple
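A likely explanation, shown as a minimal sketch: `in` applied to a pandas Series tests the index labels, not the values. `0 in df['Col1']` is therefore a single True for the whole column (the default index contains the label 0), and np.where then returns 'Apple' once, which is broadcast to every row. The small frame below is rebuilt from the sample data for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Col1': [[5, 6, 23, 7, 20, 21],
                            [0, 7, 20, 21],
                            [3, 4, 5, 23, 7, 20, 21]]})

# `in` on a Series looks at the index labels (0, 1, 2 here), not at the lists.
label_check = 0 in df['Col1']       # 0 is an index label
missing_label = 23 in df['Col1']    # 23 is inside the lists but is not a label

# A per-row membership test has to look inside each list.
row_check = df['Col1'].apply(lambda lst: 0 in lst).tolist()

print(label_check, missing_label, row_check)
```

Only the second row actually contains 0, which is what the per-row check reports.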
By putting the above inside a for loop I was able to get this to work, but I ran into problems. Code:
df['Col2'] = ''
for i in range(0,df.shape[0]):
df['Col2'][i] = np.where(0 in df['Col1'][i], 'Apple',
np.where(2 in df['Col1'][i], 'Grape',
np.where(4 in df['Col1'][i], 'Watermelon', )
I got the result I was looking for, but I ran into a warning:
<ipython-input-638-5dfd74b69688>:4: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
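I think the warning appears because I created the blank column beforehand, but the only reason I did that is that I get an error if I don't create it. Also, when I try to run a simple df['Col2'].value_counts(), I get an error: TypeError: unhashable type: 'numpy.ndarray'. Strangely, the value_counts() result is still displayed even though I get this error.
I'm not entirely sure how to proceed. I've tried many other ways to create this column, but none of them worked. Any suggestions are appreciated!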
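Both symptoms probably share a cause that can be sketched briefly. `df['Col2'][i] = ...` is chained indexing, which is what pandas warns about, and np.where called with a scalar condition returns a 0-d ndarray rather than a string, so each cell ends up holding an unhashable array. The helper `first_fruit` below is a hypothetical name, and it only covers the three numbers used in the attempt above; it is one possible way to get plain strings and assign the column in a single step.

```python
import numpy as np
import pandas as pd

# np.where with a scalar condition returns a 0-d ndarray, not a str.
cell = np.where(True, 'Apple', 'Grape')
print(type(cell).__name__, cell.ndim)

df = pd.DataFrame({'Col1': [[5, 6, 23, 7, 20, 21],
                            [0, 7, 20, 21],
                            [2, 3, 23, 7, 20, 21]]})

def first_fruit(lst):
    # Hypothetical helper: plain conditionals return ordinary strings,
    # checked in the same priority order as the nested np.where calls.
    if 0 in lst:
        return 'Apple'
    if 2 in lst:
        return 'Grape'
    if 4 in lst:
        return 'Watermelon'
    return None  # no number of interest in this row

# Assign the whole column at once instead of df['Col2'][i] = ...
df['Col2'] = df['Col1'].apply(first_fruit)
counts = df['Col2'].value_counts()
print(counts)
```

Because the cells now hold ordinary strings, value_counts() can hash them, and no element-by-element assignment is needed.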
A uj5u.com user replied:
Use explode:
d = {0: 'Apple', 2: 'Grape', 4: 'Watermelon', 6: 'Banana', 16: 'Pear', 17: 'Orange'}
df['Col2'] = df['Col1'].explode().map(d).dropna().groupby(level=0).apply(', '.join)
print(df)
# Output:
Col1 Col2
0 [5, 6, 23, 7, 20, 21] Banana
1 [0, 7, 20, 21] Apple
2 [3, 4, 5, 23, 7, 20, 21] Watermelon
3 [2, 3, 23, 7, 20, 21] Grape
4 [3, 4, 5, 23, 7, 20, 21] Watermelon
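To make the chained expression easier to follow, here is the same pipeline split into named steps. This is only an illustrative sketch: the intermediate names (`exploded`, `mapped`, `kept`) are made up, and the three-row frame is a shortened sample that borrows the [16, 20, 21] row from the question's expected output.

```python
import pandas as pd

df = pd.DataFrame({'Col1': [[5, 6, 23, 7, 20, 21],
                            [0, 7, 20, 21],
                            [16, 20, 21]]})
d = {0: 'Apple', 2: 'Grape', 4: 'Watermelon', 6: 'Banana', 16: 'Pear', 17: 'Orange'}

exploded = df['Col1'].explode()   # one row per list element, original index repeated
mapped = exploded.map(d)          # numbers that are not keys of d become NaN
kept = mapped.dropna()            # only the numbers of interest remain

# Elements that share an index label are joined back into one string per original row.
df['Col2'] = kept.groupby(level=0).apply(', '.join)
print(df)
```

Rows with no match drop out at the dropna step, so the final assignment leaves NaN for them through index alignment.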
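A uj5u.com user replied:
Iterate over the list values and map them to the right fruit, ignoring the ones you don't need. If there is no match, set the value to NaN. Using str.join allows for the possibility of multiple matches.
To apply this logic row by row, use Series.apply: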
import numpy as np
mapping = {0: 'Apple', 2: 'Grape', 4: 'Watermelon'}
df['Col2'] = df['Col1'].apply(lambda lst: ', '.join(mapping[n] for n in lst if n in mapping) or np.nan)
Output:
>>> df
Col1 Col2
0 [5, 6, 23, 7, 20, 21] NaN
1 [0, 7, 20, 21] Apple
2 [3, 4, 5, 23, 7, 20, 21] Watermelon
3 [2, 3, 23, 7, 20, 21] Grape
4 [3, 4, 5, 23, 7, 20, 21] Watermelon
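The sample data never has more than one match per row, so the join behaviour is not visible in the output above. The sketch below uses hypothetical rows, chosen only to show a multi-match row and a no-match row.

```python
import numpy as np
import pandas as pd

mapping = {0: 'Apple', 2: 'Grape', 4: 'Watermelon'}

# Hypothetical rows: two matches, no match, one match.
df = pd.DataFrame({'Col1': [[0, 2, 7], [5, 6, 23], [4, 20, 21]]})

def number_to_fruit(lst):
    # With no match the join yields '' (falsy), so `or` falls through to NaN.
    return ', '.join(mapping[n] for n in lst if n in mapping) or np.nan

df['Col2'] = df['Col1'].apply(number_to_fruit)
print(df)
```

Matches are joined in the order they occur in the list, not in the order of the keys in `mapping`.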
Performance
Note that this should be faster than Corralien's solution.
Setup:
import pandas as pd
df = pd.DataFrame({
'Col1': [[5, 6, 23, 7, 20, 21],
[0, 7, 20, 21],
[3, 4, 5, 23, 7, 20, 21],
[2, 3, 23, 7, 20, 21],
[3, 4, 5, 23, 7, 20, 21]]
})
mapping = {0: 'Apple', 2: 'Grape', 4: 'Watermelon'}
def number_to_fruit(lst):
return ', '.join(mapping[n] for n in lst if n in mapping) or np.nan
# Simulate a large DataFrame
n = 20000
df = pd.concat([df]*n, ignore_index=False)
>>> df.shape
(100000, 1)
Timings:
# Using apply. (I've added dropna for a more fair comparison)
>>> %timeit -n 10 df['Col1'].apply(number_to_fruit).dropna()
116 ms ± 7.74 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
# Corralien's solution
>>> %timeit -n 10 df['Col1'].explode().map(mapping).dropna().groupby(level=0).apply(', '.join)
710 ms ± 71 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
