How do I concatenate newly added files into a pandas DataFrame?

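I'm trying to write a script that picks up newly added csv files from a folder and appends them to one large file. Basically, I want every csv file that gets added to a specific folder to end up in a single generated csv file. Below is the code I have that builds the file list, from which I pick out the newly added files: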

import os
import time

def check_dir(fh,start_path='/Users/.../Desktop/files',new_cb=None,changed_cb=None):
    total_size = 0
    for dirpath, dirnames, filenames in os.walk(start_path):
        for f in filenames:
            fp = os.path.join(dirpath, f)
            if not os.path.islink(fp):
                fs = os.path.getsize(fp)
                total_size += fs
                if f in fh:
                    if fh[f] == fs:
                        # file unchanged
                        pass
                    else:
                        if changed_cb:
                            changed_cb(fp)
                else:
                    #new file
                    if new_cb:
                        new_cb(fp)
                fh[f] = fs

    return total_size

def new_file(fp):
    print("New File {0}!".format(fp))

def changed_file(fp):
    print("File {0} changed!".format(fp))

if __name__ == '__main__':
    file_history={}
    total = 0

    while(True):
        nt = check_dir(file_history,'/Users/.../Desktop/files',new_file,changed_file)
        if total and nt != total:
            print("Total size changed from {0} to {1}".format(total,nt))
        # remember the latest size so the next pass has something to compare against
        total = nt
        time.sleep(200)
        print("File list:\n{0}".format(file_history))
        print(list(dict.keys(file_history))[-1])

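I really don't know how to create the empty pandas DataFrame that the most recently added file would be appended to periodically (that's why I have time.sleep in there). In the end, I want one big csv file that contains all the files.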

Please help :(

P.S. I'm new to Python, so please don't judge if this is super simple.

Reply from a uj5u.com user:

Do you plan to process the csv data with pandas, or do you only need to concatenate the files?

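If you just want to append each csv file to the big file, why not use plain Python file I/O, which is faster and simpler? This assumes that all the csv files share the same format.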

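I've updated the new_file method to append to the big csv using file I/O. I also added an append_pandas function that isn't used, but it should help if you have to use pandas for this job. I haven't tested the pandas function, and there is more to consider, such as the format of the csv files. See the documentation for more details.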

import os
import time

import pandas  # only needed by append_pandas


def check_dir(fh,start_path='/Users/.../Desktop/files',new_cb=None,changed_cb=None,**kwargs):
    total_size = 0
    for dirpath, dirnames, filenames in os.walk(start_path):
        for f in filenames:
            fp = os.path.join(dirpath, f)
            if not os.path.islink(fp):
                fs = os.path.getsize(fp)
                total_size += fs
                if f in fh:
                    if fh[f] == fs:
                        # file unchanged
                        pass
                    else:
                        if changed_cb:
                            changed_cb(fp,**kwargs)
                else:
                    #new file
                    if new_cb:
                        new_cb(fp, **kwargs)
                fh[f] = fs

    return total_size

def is_csv(f):
    # you can add more to check here
    return f.lower().endswith('.csv')

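def append_csv(s,d,skip_header=1):
    # append the lines of s to d, skipping the first skip_header lines
    last = None
    with open(s,'r') as readcsv:
        with open(d,'a') as appendcsv:
            for line in readcsv:
                if(skip_header < 1):
                    appendcsv.write(line)
                    last = line
                else:
                    skip_header -= 1

            # end the appended data with a newline so the next file starts on its own line
            if last is not None and not last.endswith("\n"):
                appendcsv.write("\n")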

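def append_pandas(s,d):
    # i haven't tested this
    # d must already exist and have the same columns as s
    new = pandas.read_csv(s)
    big = pandas.read_csv(d)
    # DataFrame.append was removed in pandas 2.0, so use concat
    combined = pandas.concat([big, new], ignore_index=True)
    combined.to_csv(d, index=False)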

def new_file(fp, **kwargs):
    if is_csv(fp):
        print("Appending {0}!".format(fp))
        bcsv = kwargs.get('append_to_csv','/default/path/to/big.csv')
        skip = kwargs.get('skip_header',1)
        append_csv(fp,bcsv,skip)

def changed_file(fp, **kwargs):
    print("File {0} changed!".format(fp))

if __name__ == '__main__':
    file_history={}
    total = 0

    while(True):
        nt = check_dir(file_history,'/tmp/test/',new_file,changed_file, append_to_csv='/tmp/big.csv', skip_header=1)
        if total and nt != total:
            print("Total size changed from {0} to {1}".format(total,nt))
        # remember the latest size so the next pass has something to compare against
        total = nt
        time.sleep(10)
        print("File list:\n{0}".format(file_history))
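
One thing this approach leaves open is the header row: with skip_header=1 the header of every file is dropped, so the big csv only has a header if it already had one. Here is a minimal sketch of one way to handle that, assuming every csv has exactly one header line and the same columns. append_csv_with_header is a hypothetical helper, not part of the code above:

```python
import os


def append_csv_with_header(src, dest):
    """Append src to dest, keeping the header only when dest is new or empty."""
    # Hypothetical helper: assumes every csv has exactly one header line.
    dest_is_empty = not os.path.exists(dest) or os.path.getsize(dest) == 0
    with open(src, 'r') as readcsv, open(dest, 'a') as appendcsv:
        for i, line in enumerate(readcsv):
            if i == 0 and not dest_is_empty:
                continue  # dest already has a header, skip the repeated one
            # make sure every appended line ends with a newline
            appendcsv.write(line if line.endswith('\n') else line + '\n')
```

It could be called from new_file in place of append_csv, for example append_csv_with_header(fp, bcsv).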

Reply from a uj5u.com user:

I think pandas.concat() is what you're looking for.
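
A minimal sketch of how pandas.concat() could fit the polling loop from the question, assuming all csv files share the same columns. add_to_frame is a hypothetical name used only for illustration; the empty starting frame the question asks about is simply pandas.DataFrame():

```python
import pandas


def add_to_frame(big, csv_path):
    """Return a new DataFrame with the rows of csv_path appended to big."""
    new_rows = pandas.read_csv(csv_path)
    if big.empty:
        # nothing collected yet, the new file becomes the starting frame
        return new_rows
    # ignore_index=True renumbers the rows 0..n-1 in the combined frame
    return pandas.concat([big, new_rows], ignore_index=True)
```

Starting from big = pandas.DataFrame(), the new-file callback could do big = add_to_frame(big, fp) and then write the result out with big.to_csv(path, index=False), where path is wherever the big csv should live.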

Please credit the source when reposting. Link to this article: https://www.uj5u.com/net/427718.html

Tags: Python, pandas, file, directory, concatenation