Pandas: resampling and interpolating time series based on a datetime index

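I have a recurring problem that I never manage to solve elegantly, and I can't find a good way to handle it. Suppose I have a dataframe with datetimes in the index, spanning every 3 hours (df1). I have another dataframe (df2) that spans every day.

I want to do two things:

Resample df1 to span every day instead of every 3 hours, by computing the mean of the 3-hour periods within each day.
Interpolate df2 for any day that is missing, and add that day where it belongs.

The problem: I am using a for loop (and would like to avoid that), and the resampling of the missing days is incomplete (it can only assign 1 value).

Here is how I do it: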

import numpy as np
import pandas as pd
from datetime import datetime, timedelta

# Create df1
rng = pd.date_range('2000-01-01', periods=365 * 8, freq='3h')  # periods must be an integer: 8 samples per day
df1 = pd.DataFrame({'Val': np.random.randn(len(rng)) }, index = rng)

# Create df2 and drop a few rows
rng2 = pd.date_range('2000-01-01', periods=365, freq='D')
df2 = pd.DataFrame({'Val': np.random.randn(len(rng2)) },index = rng2)
df2 = df2.drop([datetime(2000,1,5),datetime(2000,1,24)])

# Create reference timelist 
date_list = [datetime(2000,1,1) + timedelta(days=x) for x in range(365)]


# Calculate the daily mean of df1:
# We create an array hosting the resampled values of df1
arr = []
c = 1

# Loop that appends to the array every time we hit a new day, and calculates the mean of the day that just passed
for i in range(1,len(df1)):

    if c < 365 and df1.index[i] == date_list[c]:
        arr.append(df1['Val'].iloc[i-8:i].mean())
        c = c + 1

# Calculate the last value of the array (the final day, i.e. the last 8 samples)
arr.append(df1['Val'].iloc[i-7:i+1].mean())

# Create a new dataframe hosting the daily values from df1
df3 = pd.DataFrame({'Val': arr}, index = rng2)


# Replace missing days in df2
df2 = df2.reindex(date_list, fill_value=0)
df2 = df2.resample('D').interpolate(method='linear') # but this does not work

Answer from a uj5u.com user:

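I think both problems have simple fixes; you just need to update how you use resample for each.

Point 1: just resample

Your first point is exactly what resample is for. You can replace the entire creation of df3 with: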

df1.resample('D').mean()

This averages all the 3-hour periods within each day. To confirm, we can check that your result matches mine:

>>> df1.resample('D').mean().round(8).equals(df3.round(8))
True

Note that I had to round, because there are floating-point differences between your code and resample; but the two results are very close.
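As a self-contained illustration of the point above, here is a minimal sketch that builds a synthetic stand-in for df1 (seeded random values, not the asker's data) and cross-checks the resampled daily mean against a plain NumPy reshape. The names `daily` and `manual` are made up for this example, and the reshape check assumes exactly 8 samples per day with no gaps.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for df1: one value every 3 hours for 365 days
rng = pd.date_range("2000-01-01", periods=365 * 8, freq="3h")
values = np.random.default_rng(0).standard_normal(len(rng))
df1 = pd.DataFrame({"Val": values}, index=rng)

# Daily mean via resample: each bin covers one calendar day (8 samples here)
daily = df1.resample("D").mean()

# Cross-check without resample: reshape to (days, 8) and average each row
manual = df1["Val"].to_numpy().reshape(365, 8).mean(axis=1)

print(daily.shape)                        # (365, 1)
print(np.allclose(daily["Val"], manual))  # True
```

Comparing with np.allclose rather than exact equality sidesteps the floating-point differences mentioned above, so no rounding is needed.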

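Point 2: don't reindex first

When you interpolate in the second case to fill in the missing days, you still want the missing days to be there to fill! In other words, if you reindex first with a fill value of 0, the interpolation "fails" because the gaps already hold a value (0) and there is nothing left to interpolate. So, if I understand your problem correctly, you just want to delete the reindex line: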

# df2 = df2.reindex(date_list, fill_value=0)
df2 = df2.resample('D').interpolate(method='linear')

So if you start with a df2 like this:

>>> df2.head(10)
                 Val
2000-01-01  0.235151
2000-01-02  1.279017
2000-01-03 -1.267074
2000-01-04 -0.270182 # the fifth is missing
2000-01-06  0.382649
2000-01-07  0.120253
2000-01-08 -0.223690
2000-01-09  1.379003
2000-01-10 -0.477681
2000-01-11  0.619466

you end up with this:

>>> df2.head(10)
                 Val
2000-01-01  0.235151
2000-01-02  1.279017
2000-01-03 -1.267074
2000-01-04 -0.270182
2000-01-05  0.056233 # the fifth is here, halfway between 4th and 6th
2000-01-06  0.382649
2000-01-07  0.120253
2000-01-08 -0.223690
2000-01-09  1.379003
2000-01-10 -0.477681
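
To make the difference concrete, the following is a small sketch on made-up values (not the asker's df2) with one day dropped. It compares reindexing with fill_value=0, resampling then interpolating, and reindexing without a fill value; the variable names are hypothetical and chosen only for this example.

```python
import numpy as np
import pandas as pd

# Small daily frame with one day (2000-01-05) dropped; values are made up
idx = pd.date_range("2000-01-01", periods=7, freq="D")
df = pd.DataFrame({"Val": [1.0, 2.0, 3.0, 4.0, 5.0, 8.0, 7.0]}, index=idx)
day = pd.Timestamp("2000-01-05")
df = df.drop(day)

# Reindexing with fill_value=0 leaves no NaN, so interpolate has nothing to fill
zero_filled = df.reindex(idx, fill_value=0).interpolate(method="linear")

# Resampling to daily frequency re-creates the missing day as NaN, then fills it
via_resample = df.resample("D").interpolate(method="linear")

# Reindexing without a fill value also leaves NaN for interpolate to fill
via_reindex = df.reindex(idx).interpolate(method="linear")

print(zero_filled.loc[day, "Val"])   # 0.0
print(via_resample.loc[day, "Val"])  # 6.0, halfway between 4.0 and 8.0
print(np.allclose(via_reindex["Val"], via_resample["Val"]))  # True
```

In this sketch, keeping reindex is fine as long as no fill value is supplied; it is the 0 placeholder, not the reindexing itself, that hides the gap from interpolate.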

Please credit the source when reposting. Link to this article: https://www.uj5u.com/ruanti/333460.html

Tags: Python, pandas, datetime, pandas-resample
