基于2列合并兩個資料集，或在第一個資料框中查找缺失值并從另一個資料框中填充缺失值-有解無憂

我有 2 個熊貓資料框，有 2 列，即索引和日期。第一個資料幀中缺少一些日期，這些值可以從與索引對應的第二個資料幀中獲得。我嘗試使用 pd.concat、pd.merge 和 pd.join 等，但這些似乎并沒有給我想要的結果。這是桌子。

df1 = 基于 2 列合并兩個資料集，或在第一個資料框中查找缺失值并從另一個資料框中填充缺失值

df2 = 基于 2 列合并兩個資料集，或在第一個資料框中查找缺失值并從另一個資料框中填充缺失值

uj5u.com熱心網友回復：

您是否嘗試過 df1 = df1.update(df2)？

雖然更新函式不會增加 df1 的大小，但它只會更新缺失值或已經存在的值。

uj5u.com熱心網友回復：

由于沒有可重現的資料框，我嘗試在生成的資料上運行下面的代碼，但我認為它也適用于您的代碼：

import pandas as pd
df1 = pd.DataFrame({"date": [None, None, None, "01/01/2022"], "index":[402,402,403,404]})
df2 = pd.DataFrame({"date": ["16/05/2020", "18/07/2021", "13/08/2022", "26/07/2020"], "index":[402,405,403,404]})
df1.set_index("index", inplace=True)
df2.set_index("index", inplace=True)
for index, row in df1.iterrows():
  if row["date"] != row["date"] or row["date"] == None:
    df1.loc[index , "date"] = df2.loc[index]["date"]
df1

輸出

指數	日期
402	16/05/2020
402	16/05/2020
403	13/08/2022
404	2022 年 1 月 1 日

請注意，row["date"] != row["date"]當單元格的值是nan并且具有浮點型別時使用。nan價值觀甚至不等于自己！

uj5u.com熱心網友回復：

你可以試試這個解決方案：

import pandas as pd
import numpy as np

# initialize list of lists
df1 = [[402, '15/05/2020'], [408, np.nan], [408, '14/05/2020']]
df2 = [[402, '16/05/2020'], [408, '10/05/2020'], [409, '13/05/2020']]

# Create the pandas DataFrame
df1 = pd.DataFrame(df1, columns=['index', 'date'])
df2 = pd.DataFrame(df2, columns=['index', 'date'])

df1.set_index("index", inplace=True)
df2.set_index("index", inplace=True)
for index, row in df1.iterrows():
    if row["date"] != row["date"]:
        row["date"] = df2.loc[index]["date"]

輸出：

index            
402    15/05/2020
408    10/05/2020
408    14/05/2020

使用此解決方案，只有日期已更新nan或null已使用其他資料幀上的相應值更新的行。

轉載請註明出處，本文鏈接：https://www.uj5u.com/yidong/430361.html

標籤：Python python-3.x 熊猫数据框数据科学

上一篇：在本教程中接收“TypeError:__init__()gotanunexpectedkeywordargument'basey'”

下一篇：如何在嵌套串列中搜索元素并回傳單個陳述句（如果元素）？