How do I select specific CSV files within a given date range from a folder in Python?

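I have a folder (in the same directory as my Python script) that contains a large number of CSV files covering January 1 through December 31. I only want to read the CSV files that fall within a specific date range into Python and then append those files to a list.

The files are named as follows, with one file for every day across several months:

BANK_NIFTY_5MINs_2020-02-01.csv, BANK_NIFTY_5MINs_2020-02-02.csv, ... BANK_NIFTY_5MINs_2020-02-28.csv, BANK_NIFTY_5MINs_2020-03-01, .... BANK_NIFTY_5MINs_2020-03-31, and so on.

At the moment I have code that uses "startswith" and "endswith" to pick up all the CSV files for March. However, that only lets me target one month at a time. I want to be able to read CSV files spanning several months within a given date range, for example October, November and December, or February and March (basically starting and ending in any month).

The code below only picks up the files for March. I then take the files from the list and merge them into a DataFrame.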

#Accessing csv files from directory
startdate  = datetime.strptime("2022-05-01", "%Y-%m-%d")
enddate = datetime.strptime("2022-06-30", "%Y-%m-%d")
all_files = []
path = os.path.realpath(os.path.join(os.getcwd(),os.path.dirname('__file__')))
for root, dirs, files in os.walk(path):
    for file in files:
        if file.startswith("/BANK_NIFTY_5MINs_") and file.endswith(".csv"):
             file_date = datetime.strptime(os.path.basename(file), "BANK_NIFTY_5MINs_%Y-%m-%d.csv")
             if startdate <= file_date <= enddate:
                  all_files.append(os.path.join(root, file))

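The output of the above looks like 'BANK_NIFTY_5MINs_2020-03-01.csv' and so on, but it should be the whole path, for example 'c:\Users\User123\Desktop\Myfolder\2020\BANK\BANK_NIFTY_5MINs_2020-03-01.csv'. The merge function needs the full paths in the list in this format for further processing.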

Answer from a uj5u.com user:

I would take a different approach to get more flexibility:

import os
from datetime import datetime
from pprint import pprint
from typing import Union


def quick_str_to_date(s: str) -> datetime:
    return datetime.strptime(s, "%Y-%m-%d")


def get_file_by_date_range(path: str, startdate: Union[datetime, str], enddate: Union[datetime, str]) -> list:
    # Accept either datetime objects or "YYYY-MM-DD" strings.
    if isinstance(startdate, str):
        startdate = quick_str_to_date(startdate)
    if isinstance(enddate, str):
        enddate = quick_str_to_date(enddate)
    result = []
    for root, dirs, files in os.walk(path):
        for filename in files:
            if filename.startswith("BANK_NIFTY_5MINs_") and filename.lower().endswith(".csv"):
                file_date = datetime.strptime(os.path.basename(filename), "BANK_NIFTY_5MINs_%Y-%m-%d.csv")
                if startdate <= file_date <= enddate:
                    # os.walk yields bare file names; join with root to get the full path.
                    result.append(os.path.join(root, filename))
    return result


print("all")
pprint(get_file_by_date_range("/full/path/to/files", "2000-01-01", "2100-12-31"))

print("\nFebruary")
pprint(get_file_by_date_range("/full/path/to/files", "2020-02-01", "2020-02-28"))

print("\none day")
pprint(get_file_by_date_range("/full/path/to/files", "2020-02-01", "2020-02-01"))

Output

all
['/full/path/to/files/BANK_NIFTY_5MINs_2020-02-02.csv',
 '/full/path/to/files/BANK_NIFTY_5MINs_2020-02-28.csv',
 '/full/path/to/files/BANK_NIFTY_5MINs_2020-03-01.csv',
 '/full/path/to/files/BANK_NIFTY_5MINs_2020-03-31.csv',
 '/full/path/to/files/BANK_NIFTY_5MINs_2020-02-01.csv']

February
['/full/path/to/files/BANK_NIFTY_5MINs_2020-02-02.csv',
 '/full/path/to/files/BANK_NIFTY_5MINs_2020-02-28.csv',
 '/full/path/to/files/BANK_NIFTY_5MINs_2020-02-01.csv']

one day
['/full/path/to/files/BANK_NIFTY_5MINs_2020-02-01.csv']
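
The listings above come back in whatever order os.walk visits the files, which may matter if the files are later concatenated chronologically. A minimal pathlib-based sketch of the same idea that returns absolute paths sorted by date could look like this. The name dated_csv_paths and the PREFIX constant are hypothetical, and names that match the prefix but carry no valid date are assumed to be safe to skip:

```python
import tempfile
from datetime import date, datetime
from pathlib import Path

PREFIX = "BANK_NIFTY_5MINs_"  # file-name prefix taken from the question


def dated_csv_paths(folder, start, end):
    """Return absolute paths of PREFIX<YYYY-MM-DD>.csv files dated within [start, end], oldest first."""
    matches = []
    for path in Path(folder).rglob(f"{PREFIX}*.csv"):
        try:
            file_date = datetime.strptime(path.stem[len(PREFIX):], "%Y-%m-%d").date()
        except ValueError:
            continue  # the name matches the glob but carries no valid date
        if start <= file_date <= end:
            matches.append((file_date, str(path.resolve())))
    # Sorting the (date, path) pairs orders the result chronologically.
    return [p for _, p in sorted(matches)]


# Demo on a throwaway folder so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    for day in ("2020-02-01", "2020-02-28", "2020-03-31", "2020-04-01"):
        Path(tmp, f"{PREFIX}{day}.csv").touch()
    selected = dated_csv_paths(tmp, date(2020, 2, 15), date(2020, 3, 31))
    print([Path(p).name for p in selected])
    # prints ['BANK_NIFTY_5MINs_2020-02-28.csv', 'BANK_NIFTY_5MINs_2020-03-31.csv']
```

The returned strings are full paths, so they can be handed directly to whatever reads or merges the CSV files.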

Answer from a uj5u.com user:

If you want to do it with a regular expression, here it is:

# replace `file.startswith(...) and file.endswith(...)`
re.match('BANK_NIFTY_5MINs_2020-(02|03|10|11|12)-[0-9]+', file)
###                             ^^^^^^^^^^^^^^^^ Feb, Mar, Oct-Dec

This is the most basic version to get you started; it can probably be improved.
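
One possible refinement, sketched here rather than taken from the answer, is to capture the year, month and day with the regular expression and compare the resulting date against the range, so the pattern does not need a hard-coded list of months. The names FILE_RE and in_range are hypothetical:

```python
import re
from datetime import date

# Capture year, month and day so the filter is not tied to a fixed month list.
FILE_RE = re.compile(r"BANK_NIFTY_5MINs_(\d{4})-(\d{2})-(\d{2})\.csv")


def in_range(filename, start, end):
    """True when filename follows the naming scheme and its date lies within [start, end]."""
    m = FILE_RE.fullmatch(filename)
    if m is None:
        return False
    try:
        file_date = date(*map(int, m.groups()))
    except ValueError:
        return False  # digits in the right places but not a real date, e.g. 2020-02-31
    return start <= file_date <= end


names = [
    "BANK_NIFTY_5MINs_2020-09-30.csv",
    "BANK_NIFTY_5MINs_2020-10-01.csv",
    "BANK_NIFTY_5MINs_2020-12-31.csv",
    "notes.txt",
]
selected = [n for n in names if in_range(n, date(2020, 10, 1), date(2020, 12, 31))]
print(selected)
# prints ['BANK_NIFTY_5MINs_2020-10-01.csv', 'BANK_NIFTY_5MINs_2020-12-31.csv']
```

Inside the os.walk loop, in_range(file, start, end) would take the place of the startswith/endswith check.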

But in your case, I would go with a simple glob:

all_files = glob.glob('./BANK_NIFTY_5MINs_2020-0[2-3]-*.csv')
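
The glob pattern above covers February and March only, and a range such as October through December does not fit a single character class. A rough alternative, assuming all files sit directly in one folder, is to build the expected file name for every day in the range and keep the ones that exist. The helper names expected_paths and existing_files are hypothetical:

```python
from datetime import date, timedelta
from pathlib import Path


def expected_paths(folder, start, end):
    """Yield one candidate path per calendar day in [start, end]."""
    day = start
    while day <= end:
        yield Path(folder) / f"BANK_NIFTY_5MINs_{day.isoformat()}.csv"
        day += timedelta(days=1)


def existing_files(folder, start, end):
    """Return the candidate paths that are actual files, in date order."""
    return [str(p) for p in expected_paths(folder, start, end) if p.is_file()]


# The candidates cross the month boundary without any special handling.
print([p.name for p in expected_paths("data", date(2020, 2, 28), date(2020, 3, 1))])
# prints ['BANK_NIFTY_5MINs_2020-02-28.csv', 'BANK_NIFTY_5MINs_2020-02-29.csv',
#         'BANK_NIFTY_5MINs_2020-03-01.csv']
```

Unlike os.walk, this does not descend into subfolders, so it only suits a flat directory layout.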

