Pandas: create a new column and assign values based on a value (substring) in a string column and a value in another column

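I apologize if this is a duplicate question; I did look around before deciding I had to post.

I am trying to assign a value in a new column, devicevalue, based on the values of two other columns. My dataframe looks something like this: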

devicename           make     devicevalue
switch1               cisco        0
switch1-web100        netgear      0  
switch10              cisco        0
switch23              cisco        1
switch31-web200       netgear      0
switch31              cisco        1
switch41-new          cisco        1
switch40e             cisco        1
switch31-web200-new   netgear      0
switch40e             cisco        1
switch11-data100e     netgear      0

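I am trying to assign the value based on these criteria:

If make == netgear (set to 0)
If the value after switch is 20 or greater (set to 1, otherwise set to 0)

(If both conditions are met, set it to 0, i.e. the "make == netgear, set to 0" condition takes priority. Note that this differs from the existing code, where, if both conditions are met, the second condition overrides the first and overwrites the resulting value.)

I initially got some help, but now some devices have a -new suffix or a trailing p, a, or e, which breaks the code that looks for the number at the end of the string.

The code I am using is essentially: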

def get_number_suffix(devicename: str) -> int:
    i = 1
    while i < len(devicename) and devicename[-i:].isnumeric():
        i += 1

    return int(devicename[-(i-1):])


def compute_devicevalue(row) -> int:
    if 'netgear' in row['make']:
        return 0
    if 20 <= get_number_suffix(row['devicename']):
        return 1
    else:
        return 0

df['devicevalue'] = df.apply(compute_devicevalue, axis=1)

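This worked well until the new endings were added to some names; now it obviously breaks. I have tried various approaches, but I cannot find a decent way to ignore -new and the trailing p, a, or e.

Edit

Sorry, I completely messed up what I wanted to ask. I am trying to assign the value based on the number that follows 'switch'.

Essentially, when the existing code converts the string to an integer, it falls over on any name that has -new, or p, a, or e, following the number.

For example:

ValueError: invalid literal for int() with base 10: 'switch23-new'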

Reply from a uj5u.com user:

You can use .loc and str.extract(), as follows:

df['devicevalue'] = 0     # init value to 0

# Set to 1 if the number right after 'switch' is >= 20.
# The "otherwise" case was already set to 0 by the initialization above.
df.loc[df['devicename'].str.extract(r'switch(\d+)', expand=False).astype(float) >= 20, 'devicevalue'] = 1

# Set to 0 if `make` == 'netgear'
df.loc[df['make'] == 'netgear', 'devicevalue'] = 0 
# If you have 2 or more values of `make` to match, use, e.g.:
#df.loc[df['make'].isin(['netgear', 'dell']), 'devicevalue'] = 0

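The regular expression r'switch(\d+)', used with str.extract(), extracts the digits that follow 'switch' regardless of whether the number is at the end of the name or in the middle. It therefore handles both your earlier case, where the number was at the end, and the current one, where it is in the middle.

Result: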
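To make the extraction step easier to follow, here is a minimal sketch of what str.extract() produces on a few names before the comparison is applied. The sample names (including the non-matching "router7") and the variable names numbers and mask are assumptions for illustration only. Names that do not match the pattern come back as NaN, which is why the result is cast to float rather than int.

```python
import pandas as pd

# Hypothetical sample; "router7" is included only to show a non-match.
names = pd.Series([
    "switch1", "switch23", "switch41-new",
    "switch40e", "switch31-web200-new", "router7",
])

# Capture the digits immediately after 'switch'; non-matches become NaN.
numbers = names.str.extract(r"switch(\d+)", expand=False).astype(float)

# NaN >= 20 evaluates to False, so non-matching names are never selected.
mask = numbers >= 20

print(numbers.tolist())
print(mask.tolist())
```

The boolean mask is what .loc receives as its row selector in the answer above.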

             devicename     make  devicevalue
0               switch1    cisco            0
1        switch1-web100  netgear            0
2              switch10    cisco            0
3              switch23    cisco            1
4       switch31-web200  netgear            0
5              switch31    cisco            1
6          switch41-new    cisco            1
7             switch40e    cisco            1
8   switch31-web200-new  netgear            0
9             switch40e    cisco            1
10    switch11-data100e  netgear            0

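Reply from a uj5u.com user:

I tried using a regular expression to extract the numbers from the string.

For simplicity, I converted your dataframe to a list of dicts: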

a = [{"devicename" : "switch1","make": "cisco", "devicevalue" :0}, {"devicename" : "switch1-web100", "make" : "netgear", "devicevalue" :0}, {"devicename" : "switch10" , "make" : "cisco", "devicevalue" :0}.... ]

Then I used this function to do it:

import re

def clean_data(data):
    for i in range(len(data)): #remove this if using dataframe row
        row = data[i] #Dict
        if row["make"] == "netgear":
            row["devicevalue"] = 0
        
        tmp = -1
        if "web" in row["devicename"]:
            tmp = [int(s) for s in re.findall(r'\d+', row["devicename"].split("web")[1])][0]
        elif "data" in row["devicename"]:
            tmp = [int(s) for s in re.findall(r'\d+', row["devicename"].split("data")[1])][0]

        if tmp >= 200:
            row["devicevalue"] = 0
        elif tmp == -1:
            pass #Nothing to change

        data[i] = row 
    return data #remove this and return row

I get the following:

[{'devicename': 'switch1', 'make': 'cisco', 'devicevalue': 0}, {'devicename': 'switch1-web100', 'make': 'netgear', 'devicevalue': 0}, {'devicename': 'switch10', 'make': 'cisco', 'devicevalue': 0}, {'devicename': 'switch23', 'make': 'cisco', 'devicevalue': 1}, {'devicename': 'switch31-web200', 'make': 'netgear', 'devicevalue': 0}, {'devicename': 'switch31', 'make': 'cisco', 'devicevalue': 1}, {'devicename': 'switch40', 'make': 'cisco', 'devicevalue': 1}, {'devicename': 'switch23', 'make': 'cisco', 'devicevalue': 1}, {'devicename': 'switch31-web200-new', 'make': 'netgear', 'devicevalue': 0}, {'devicename': 'switch31-web100a', 'make': 'cisco', 'devicevalue': 1}, {'devicename': 'switch40', 'make': 'cisco', 'devicevalue': 1}, {'devicename': 'switch11-data100e', 'make': 'cisco', 'devicevalue': 1}]

Since you are passing dataframe rows, remove the outer loop and return the row instead of data in the code.
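A row-wise variant along those lines could look like the sketch below. It follows the criteria stated in the question (netgear takes priority, otherwise the number after 'switch' decides) rather than the web/data check above. The helper name get_switch_number, the -1 fallback for names without a number, and the sample rows are assumptions for illustration.

```python
import re


def get_switch_number(devicename: str) -> int:
    """Return the integer right after the 'switch' prefix, or -1 if there is none."""
    match = re.match(r"switch(\d+)", devicename)
    return int(match.group(1)) if match else -1


def compute_devicevalue(row) -> int:
    # 'netgear' takes priority and always yields 0.
    if "netgear" in row["make"]:
        return 0
    # Otherwise 1 when the number after 'switch' is 20 or greater.
    return 1 if get_switch_number(row["devicename"]) >= 20 else 0


# Plain dicts stand in for dataframe rows here.
rows = [
    {"devicename": "switch1", "make": "cisco"},
    {"devicename": "switch41-new", "make": "cisco"},
    {"devicename": "switch40e", "make": "cisco"},
    {"devicename": "switch31-web200-new", "make": "netgear"},
]
values = [compute_devicevalue(row) for row in rows]
print(values)
```

With a dataframe, the same function can be passed to df.apply(compute_devicevalue, axis=1), as in the question's original code.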

Please credit the source when reposting. Link to this article: https://www.uj5u.com/net/313618.html
