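I have a DataFrame with two columns, level1 and level2. Each account number in level1 is linked to a ParentID in level2. Some of the accounts in level1 that end in "8409" are mapped to the wrong ParentID in level2. To find the correct ParentID, take each account ending in "8409", replace that suffix with "8400", and search level1 for the result; this locates the equivalent account in the same column. If a match is found, copy its level2 value and use it in place of the level2 value of the account ending in "8409".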
import pandas as pd
import numpy as np
df = pd.DataFrame([[7854568400,489],
[9632588400,126],
[3699633691,189],
[9876543697,987],
[1111118409,987],
[7854568409,396],
[7854567893,897],
[9632588409,147]],
columns = ['level1','level2'])
df
The solution below creates a new column, new_level2, that resolves the problem described above.
maps = df.set_index('level1')['level2']
s = df['level1'].astype(str).str.replace('8409$', '8400', regex=True).astype('int64')
df['new_level2'] = s.map(maps).combine_first(df['level2']).convert_dtypes()
In the output below, the level2 of account 7854568409 changes from 396 to 489 (taken from row 0), and the level2 of account 9632588409 changes from 147 to 126 (taken from row 1).
       level1  level2  new_level2
0  7854568400     489         489
1  9632588400     126         126
2  3699633691     189         189
3  9876543697     987         987
4  1111118409     987         987
5  7854568409     396         489
6  7854567893     897         897
7  9632588409     147         126
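However, when I apply the solution above with additional variables, I run into a problem, mainly once a currency column is added to the DataFrame. The level2 replacement should apply only to USD; all other currencies need to keep their current level2 value.
Old DF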
df = pd.DataFrame([['USD',7854568400,489],
['USD',9632588400,126],
['USD',3699633691,189],
['USD',9876543697,987],
['EUR',1111118409,987],
['USD',1111118409,987],
['USD',7854568409,396],
['USD',7854567893,897],
['USD',9632588409,147]],
columns = ['cur','level1','level2'])
df
Revised DF
df = pd.DataFrame([['USD',7854568400,489],
['USD',9632588400,126],
['USD',3699633691,189],
['USD',9876543697,987],
['EUR',1111118400,120],
['EUR',1111118409,987],
['USD',1111118409,987],
['USD',7854568409,396],
['USD',7854567893,897],
['USD',9632588409,147]],
columns = ['cur','level1','level2'])
When I try to apply the solution above to the new DataFrame that includes currencies, I get the following error.
InvalidIndexError: Reindexing only valid with uniquely valued Index objects
Note that the same account number can exist under different currencies.
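That duplication is the likely cause of the error: `set_index('level1')` produces an index in which 1111118409 appears twice (once for EUR, once for USD), and `Series.map` cannot look values up in a Series whose index is not unique. A minimal check, reusing the revised frame above (the variable names `index_is_unique` and `duplicated_accounts` are only illustrative):

```python
import pandas as pd

df = pd.DataFrame([['USD', 7854568400, 489],
                   ['USD', 9632588400, 126],
                   ['USD', 3699633691, 189],
                   ['USD', 9876543697, 987],
                   ['EUR', 1111118400, 120],
                   ['EUR', 1111118409, 987],
                   ['USD', 1111118409, 987],
                   ['USD', 7854568409, 396],
                   ['USD', 7854567893, 897],
                   ['USD', 9632588409, 147]],
                  columns=['cur', 'level1', 'level2'])

# Same lookup Series as in the original solution.
maps = df.set_index('level1')['level2']

# The lookup index is no longer unique once currencies share account numbers.
index_is_unique = maps.index.is_unique
duplicated_accounts = maps.index[maps.index.duplicated()].tolist()

print(index_is_unique)
print(duplicated_accounts)
```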
The desired output is below. Only two accounts (9632588409 and 7854568409) have their level2 changed. Indexes 4 and 5 should retain their original level2 values because they are EUR and therefore out of scope. Index 6 retains its original value because no corresponding match was found for that account.
   cur      level1  level2  new_level2
0  USD  7854568400     489         489
1  USD  9632588400     126         126
2  USD  3699633691     189         189
3  USD  9876543697     987         987
4  EUR  1111118400     120         120
5  EUR  1111118409     987         987
6  USD  1111118409     987         987
7  USD  7854568409     396         489
8  USD  7854567893     897         897
9  USD  9632588409     147         126
Any help is greatly appreciated.
Answer from a uj5u.com user:
df = pd.DataFrame([['USD',7854568400,489],
['USD',9632588400,126],
['USD',3699633691,189],
['USD',9876543697,987],
['EUR',1111118400,120],
['EUR',1111118409,987],
['USD',1111118409,987],
['USD',7854568409,396],
['USD',7854567893,897],
['USD',9632588409,147]],
columns = ['cur','level1','level2'])
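# Normalize the key in place: a trailing 8409 becomes 8400 (this overwrites level1, as the output below shows).
df['level1'] = df['level1'].astype(str).str.replace('8409$', '8400', regex=True).astype('int64')
# Mask non-USD rows, then take the first level2 in each (level1, cur) group;
# masked rows and unmatched accounts fall back to their own level2.
df['new_col'] = df.where(df['cur'] == 'USD').groupby(['level1', 'cur'])['level2']\
.transform('first').fillna(df['level2']).astype(int)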
print(df)
   cur      level1  level2  new_col
0  USD  7854568400     489      489
1  USD  9632588400     126      126
2  USD  3699633691     189      189
3  USD  9876543697     987      987
4  EUR  1111118400     120      120
5  EUR  1111118400     987      987
6  USD  1111118400     987      987
7  USD  7854568400     396      489
8  USD  7854567893     897      897
9  USD  9632588400     147      126
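Note that the answer above overwrites level1 and relies on each "8400" row appearing before its "8409" counterpart, since `'first'` picks the earliest row in each group. If the original account numbers need to be kept, one possible variation (not part of the answer) is to build the lookup from USD rows only and map through a temporary key. This is a sketch that assumes account numbers are unique within USD, as they are in the sample data; `is_usd`, `usd_lookup`, `key` and `mapped` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame([['USD', 7854568400, 489],
                   ['USD', 9632588400, 126],
                   ['USD', 3699633691, 189],
                   ['USD', 9876543697, 987],
                   ['EUR', 1111118400, 120],
                   ['EUR', 1111118409, 987],
                   ['USD', 1111118409, 987],
                   ['USD', 7854568409, 396],
                   ['USD', 7854567893, 897],
                   ['USD', 9632588409, 147]],
                  columns=['cur', 'level1', 'level2'])

is_usd = df['cur'] == 'USD'

# Lookup built from USD rows only (assumption: level1 is unique within USD).
usd_lookup = df.loc[is_usd].set_index('level1')['level2']

# Temporary key: swap a trailing 8409 for 8400 without modifying df['level1'].
key = df['level1'].astype(str).str.replace('8409$', '8400', regex=True).astype('int64')

# Look up USD rows only; unmatched USD accounts and all non-USD rows become NaN
# after reindexing and then fall back to their existing level2 value.
mapped = key[is_usd].map(usd_lookup)
df['new_level2'] = mapped.reindex(df.index).fillna(df['level2']).astype('int64')

print(df)
```

If USD account numbers can repeat, the lookup would hit the same non-unique index problem, and the duplicates would have to be resolved first (for example by choosing which row to keep).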
