How do I replace blank values and rearrange data by date?

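I have a CSV file containing values for 1988-2020. The first two columns hold the year and month (1988,1 .... 2020,12). The remaining columns hold the value for each day of that month. However, values are missing for some dates, and some months are missing entirely. Missing values need to be replaced with "NA" or "-99", and leap years need to be accounted for. I need to convert the data to the following format without skipping any date from 01-01-1988 to 31-12-2020.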

Column1,Column2
“dd-mm-yyyy”, “Value”

Sample data

A uj5u.com user replied:

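Here I tried to reshape your data into rows running from 01-01-1988 to 31-12-2020, with NaN filled in for every day of each missing month. I may have missed something, so don't hesitate to check it yourself and let me know if anything is wrong: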

import pandas as pd
import numpy as np

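# parse the first two columns (year, month) as a single date column
# note: combining columns through a nested parse_dates list is deprecated in recent pandas releases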
df = pd.read_csv('/path_to_file/Data.csv', parse_dates=[[0, 1]], header=None)

# rename columns
df.columns = range(0, 33)

# inspecting the df has shown that the row with index 2016-12-01 is invalid
# and the column 32 is empty  
df = df.drop(columns=32).set_index(0).drop(index='2016-12-01')

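# days 1-28 exist in every month, so NaN there is a real gap: fill it with -99
# (NaN in the columns for days 29-31 is kept for now; the flattening step below drops it)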
df.iloc[:, 0:28] = df.iloc[:, 0:28].fillna(value=-99, axis=0)

# reformat index
df.index = df.index.strftime('%m-%Y')

print(df)

Intermediate output:

           1     2     3     4     5     6     7     8     9    10  ...   22  \
0                                                                   ...        
01-1988   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0  0.0  ...  0.0   
02-1988   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0  0.0  ...  0.0   
03-1988   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0  0.0  ...  0.0   
04-1988   0.0   0.0   0.0   0.0   0.0   0.0  36.1   0.0   0.0  0.0  ...  0.0   
01-1990   9.8  13.0   0.0   0.0   0.0   0.0  16.8  26.0  10.0  0.0  ...  0.0   
...       ...   ...   ...   ...   ...   ...   ...   ...   ...  ...  ...  ...   
08-2020   0.0   8.0  15.0   9.0   8.0  13.0   0.0  12.0   0.0  0.0  ...  0.0   
09-2020  49.0  15.0   7.0   9.0   5.0  22.0   0.0   0.0   5.0  3.0  ...  0.0   
10-2020   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0  0.0  ...  0.0   
11-2020  32.0   4.0  26.0  20.0  10.0   0.0  10.0   0.0   0.0  0.0  ...  0.0   
12-2020   0.0   0.0   3.0   2.0   0.0   8.0   6.0  23.0   3.0  0.0  ...  0.0   

          23    24    25   26    27   28   29   30    31  
0                                                         
01-1988  0.0   0.0   0.0  0.0   0.0  0.0  0.0  0.0   0.0  
02-1988  0.0   0.0   0.0  0.0   0.0  0.0  0.0  NaN   NaN  
03-1988  0.0   0.0   0.0  0.0   0.0  0.0  0.0  0.0   0.0  
04-1988  0.0  35.3  48.0  0.0   4.0  0.0  0.0  0.0   NaN  
01-1990  0.0   0.0   0.0  0.0   0.0  0.0  0.0  0.0   0.0  
...      ...   ...   ...  ...   ...  ...  ...  ...   ...  
08-2020  0.0   0.0   0.0  0.0   0.0  0.0  0.0  0.0  36.0  
09-2020  0.0   0.0   0.0  0.0   0.0  0.0  0.0  0.0   NaN  
10-2020  0.0   0.0   0.0  0.0   0.0  0.0  0.0  0.0   0.0  
11-2020  0.0   0.0   0.0  0.0  32.0  2.0  5.0  0.0   NaN  
12-2020  0.0   0.0   0.0  0.0   0.0  0.0  0.0  0.0   0.0  

[199 rows x 31 columns]
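The flattening step that follows drops every NaN and assumes that what remains in each row matches the calendar length of that month. A value missing on day 29, 30 or 31 of a month that really has that day would silently shift all later values. Below is a minimal sketch of a check for this, run on a small made-up table; the names `toy` and `month_length_mismatches` are hypothetical and not part of the original answer.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the monthly table: index 'mm-YYYY', columns 1..31 (made-up values)
toy = pd.DataFrame(
    np.zeros((3, 31)),
    index=['02-1988', '02-1989', '04-1988'],
    columns=range(1, 32),
)
toy.loc['02-1988', [30, 31]] = np.nan      # leap-year February has 29 days
toy.loc['02-1989', [29, 30, 31]] = np.nan  # ordinary February has 28 days
toy.loc['04-1988', [30, 31]] = np.nan      # day 31 does not exist, day 30 is a real gap


def month_length_mismatches(wide):
    """Return the rows whose non-NaN count differs from the month's length."""
    months = pd.to_datetime(wide.index, format='%m-%Y')
    report = pd.DataFrame(
        {
            'expected': np.asarray(months.days_in_month),
            'actual': wide.notna().sum(axis=1).to_numpy(),
        },
        index=wide.index,
    )
    return report[report['expected'] != report['actual']]


mismatches = month_length_mismatches(toy)
print(mismatches)
```

If the returned frame is empty, dropping NaN before building the daily index should be safe; any listed row needs its gap filled (for example with -99) first.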

# flatten df into 1-d array
flattened = df.to_numpy().flatten()

# expand months in df into days
valid_days = [pd.date_range(start=m, periods=pd.Period(m).days_in_month, freq='D')
              for m in df.index]

# create datetime index by days
valid_days = pd.to_datetime([item for sublist in valid_days 
                             for item in sublist]).strftime('%d-%m-%Y')

# create new DataFrame from flattened values and daily index
df_days = pd.DataFrame(flattened[~np.isnan(flattened)], 
                       index=valid_days, 
                       columns=['value'])

# create the range of all days within the period
total_days = pd.date_range('1988-01-01', '2020-12-31', freq='D').strftime('%d-%m-%Y')

# reindex df_days with all days within the period
df_days = df_days.reindex(total_days)

print(df_days)

Output:

          value
01-01-1988  0.0
02-01-1988  0.0
03-01-1988  0.0
04-01-1988  0.0
05-01-1988  0.0
    ...     ...
27-12-2020  0.0
28-12-2020  0.0
29-12-2020  0.0
30-12-2020  0.0
31-12-2020  0.0
12054 rows × 1 columns
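The question also asks for the remaining gaps to be written as "NA" or "-99" in a two-column file. A minimal sketch of that last step is shown here on a small hypothetical stand-in for `df_days`; the helper name `to_two_column_csv` is an assumption, and the header names are taken from the format given in the question.

```python
import pandas as pd

# Hypothetical stand-in for df_days: 'dd-mm-YYYY' string index, one 'value' column
days = pd.date_range('1988-02-27', '1988-03-02', freq='D').strftime('%d-%m-%Y')
df_days = pd.DataFrame({'value': [0.0, 1.5, None, 0.0, 2.0]}, index=days)


def to_two_column_csv(frame, missing='NA'):
    """Return CSV text with a 'Column1,Column2' header, writing `missing` for NaN."""
    out = frame.rename_axis('Column1').rename(columns={'value': 'Column2'})
    # to_csv returns a string when no path is given; na_rep must be a string
    return out.to_csv(na_rep=str(missing))


csv_text = to_two_column_csv(df_days, missing=-99)
print(csv_text)
```

Passing a file path to `to_csv` instead would write the file directly; using `na_rep` leaves the numeric column untouched, whereas `fillna('NA')` would turn it into a column of mixed types.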

I would be glad to learn of a more elegant way to solve this.
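One possible alternative, offered as a sketch rather than as the answerer's method, is to melt the table into long form, build real dates from year, month and day, and let `pd.to_datetime(..., errors='coerce')` discard impossible dates such as 30 February. The column names `year`, `month`, `day` and the helper `wide_to_daily` are assumptions, and the data below is made up.

```python
import numpy as np
import pandas as pd

# Hypothetical wide table: year, month, then one column per day of the month (1..31)
raw = pd.DataFrame(
    [
        [1988, 1] + [0.0] * 31,
        [1988, 2] + [1.0] * 29 + [np.nan] * 2,
        # March 1988 is missing entirely
        [1988, 4] + [2.0] * 29 + [np.nan] * 2,  # 30 April is a real gap
    ],
    columns=['year', 'month'] + list(range(1, 32)),
)


def wide_to_daily(wide, start, end):
    """Melt a year/month/day-columns table into one value per calendar day."""
    long = wide.melt(id_vars=['year', 'month'], var_name='day', value_name='value')
    # Impossible dates (30 February, 31 April, ...) become NaT and are dropped
    dates = pd.to_datetime(long[['year', 'month', 'day']], errors='coerce')
    valid = dates.notna()
    series = pd.Series(
        long.loc[valid, 'value'].to_numpy(),
        index=pd.DatetimeIndex(dates[valid]),
        name='value',
    )
    # Reindexing over the full range makes missing days and months appear as NaN
    return series.sort_index().reindex(pd.date_range(start, end, freq='D'))


daily = wide_to_daily(raw, '1988-01-01', '1988-04-30')
daily.index = daily.index.strftime('%d-%m-%Y')
print(daily)
```

Because each value keeps its own date, a gap on day 29, 30 or 31 of a real month stays in place as NaN instead of shifting later values. This sketch assumes every year and month pair occurs at most once in the input.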


Tags: Python, pandas, numpy
