How do I create a new column based on a condition on another column?

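I want to create a column cat_month in my expeditions dataframe. This column should contain the mountain category (small, medium, or large), assigned from the height in the highpoint_metres column (by quartile: small = height below the first quartile), but I can't get it to work.

Data: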

import pandas as pd
expeditions = pd.read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-09-22/expeditions.csv")

What I tried:

peaks[cat_monts] = 
for peak_id in expeditions : 
 if "highpoint_metres" < 6226.5 : #1er quartile 
  return "petite montagne"
elif 6226.5<"highpoint_metres" <7031.25:
  return "moyenne montagne"
else : 
 return "grande montagne"

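A uj5u.com user replied:

Use np.select, which takes a list of conditions, a list of their corresponding values, and a default ("else") value.

The conditions are evaluated in order, and the first one that is True wins, so you can use this: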

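import numpy as np

# Order matters: np.select uses the first condition that is True,
# so the narrower threshold has to come first.
conditions = {
    'petite montagne': expeditions['highpoint_metres'] < 6226.5,
    'moyenne montagne': expeditions['highpoint_metres'] < 7031.25,
}
expeditions['cat_month'] = np.select(list(conditions.values()), list(conditions.keys()), default='grande montagne')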

Output:

      expedition_id  ...  highpoint_metres  ...         cat_month
0         ANN260101  ...            7937.0  ...   grande montagne
1         ANN269301  ...            7937.0  ...   grande montagne
2         ANN273101  ...            7937.0  ...   grande montagne
3         ANN278301  ...            7000.0  ...  moyenne montagne
4         ANN279301  ...            7160.0  ...   grande montagne
...             ...  ...               ...  ...               ...
10359     PUMO19101  ...            7138.0  ...   grande montagne
10360     PUMO19102  ...            7138.0  ...   grande montagne
10361     PUTH19101  ...            6350.0  ...  moyenne montagne
10362     RATC19101  ...            6600.0  ...  moyenne montagne
10363     SANK19101  ...            6452.0  ...  moyenne montagne
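
To see why the order of the conditions matters, here is a minimal sketch on a made-up Series of heights (the values are illustrative and not taken from the dataset). It compares the narrow-first order with the reversed order, where the broader condition shadows the narrower one:

```python
import numpy as np
import pandas as pd

# Hypothetical heights chosen to straddle both thresholds.
heights = pd.Series([5000.0, 6226.5, 7000.0, 7031.25, 8000.0])

conditions = [heights < 6226.5, heights < 7031.25]
labels = ["petite montagne", "moyenne montagne"]

# For each element, np.select takes the choice of the first condition that is True.
categories = np.select(conditions, labels, default="grande montagne")

# With the order reversed, every height below 7031.25 matches the first
# condition, so "petite montagne" is never assigned.
shadowed = np.select(conditions[::-1], labels[::-1], default="grande montagne")

print(categories)
print(shadowed)
```

Note that a missing height (NaN) fails both comparisons and therefore receives the default label.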

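A uj5u.com user replied:

I think the np.select() approach above is probably better, but I was already working on this, so I thought I'd share it.

You can write a function and then pass it to df.apply() to create the new column.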

def func(row):
    height = row['highpoint_metres']  # the height column in the expeditions data
    if height < 6226.5:
        return 'petite montagne'
    elif height < 7031.25:
        return 'moyenne montagne'
    else:
        return 'grande montagne'

df = expeditions  # the dataframe loaded in the question
df['cat_monts'] = df.apply(func, axis=1)  # axis=1 calls func once per row

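Also, note that in the last line the new column, df['cat_monts'], has quotes around the column name. You want a column named by that string, not a column that takes its name from the value of a variable called cat_monts.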
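
The difference between a quoted name and a variable can be sketched on a tiny made-up dataframe (the variable `col` and the placeholder values are purely illustrative):

```python
import pandas as pd

df = pd.DataFrame({"highpoint_metres": [5000.0, 7500.0]})

# Quoted: the column is literally named "cat_monts".
df["cat_monts"] = "placeholder"

# Unquoted: the column is named by whatever string the variable holds.
col = "category"
df[col] = "placeholder"

print(list(df.columns))
```

Writing df[cat_monts] without quotes and without first defining a variable of that name raises a NameError.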

A uj5u.com user replied:

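def peaks(x):
    if x < 6226.5:
        return "petite montagne"
    elif x < 7031.25:  # also covers x == 6226.5, which a strict 6226.5 < x test would send to the else branch
        return "moyenne montagne"
    else:
        return "grande montagne"


# The function can be passed to apply directly; no lambda wrapper is needed.
expeditions['cat_month'] = expeditions['highpoint_metres'].apply(peaks)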
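
One caveat with if/elif chains like this: comparisons against NaN are always False, so a missing height falls through to the else branch and is labelled "grande montagne". A minimal sketch of one way to keep missing heights missing, assuming that is the behaviour you want (the helper name `categorise` and the sample values are hypothetical):

```python
import pandas as pd

def categorise(height):
    # Hypothetical helper: same thresholds as above, but NaN stays unlabelled.
    if pd.isna(height):
        return None
    if height < 6226.5:
        return "petite montagne"
    elif height < 7031.25:
        return "moyenne montagne"
    return "grande montagne"

heights = pd.Series([5000.0, float("nan"), 7500.0])
result = heights.apply(categorise)
print(result)
```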

A uj5u.com user replied:

One option is the case_when function from pyjanitor:

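# pip install pyjanitor
import pandas as pd
import janitor  # importing janitor registers case_when as a dataframe method

# The pairs behave like an if/elif chain: the first condition that is True wins,
# so the narrower threshold comes first.
expeditions.case_when(
    # condition, value if True
    expeditions.highpoint_metres < 6226.5, 'petite montagne',
    expeditions.highpoint_metres < 7031.25, 'moyenne montagne',
    'grande montagne', # default if False
    column_name = 'cat_month')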

For this case, a faster option than case_when or np.select is a binning approach with pd.cut:

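import numpy as np

# right=False closes each bin on the left: [0, 6226.5), [6226.5, 7031.25), [7031.25, inf)
binned_data = pd.cut(expeditions.highpoint_metres,
                     bins=[0, 6226.5, 7031.25, np.inf],
                     right=False,
                     labels=["petite montagne", "moyenne montagne", "grande montagne"])

# assign returns a new dataframe with the extra column
expeditions.assign(cat_month=binned_data)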

Note that with the binning approach, cat_month is a categorical column.
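
A small sketch of what that means in practice, using made-up heights rather than the real data: the result has a categorical dtype, values that sit exactly on a bin edge go into the bin that starts there (because right=False), and missing heights stay missing.

```python
import numpy as np
import pandas as pd

# Hypothetical heights: one inside a bin, two exactly on bin edges, one missing.
heights = pd.Series([5000.0, 6226.5, 7031.25, np.nan])

binned = pd.cut(heights,
                bins=[0, 6226.5, 7031.25, np.inf],
                right=False,
                labels=["petite montagne", "moyenne montagne", "grande montagne"])

print(binned.dtype)      # category
print(binned.tolist())

# If plain Python objects are preferred over a categorical, cast afterwards.
as_objects = binned.astype(object)
```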
