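So I'm iterating through a directory and reading the JSON files in it. From each file I parse out 4 keys, and then I create a CSV file containing all the parsed data.
It turns out I have duplicate entries, so I want to remove the duplicates based on the date (keeping the newer one) and then rewrite the CSV. I'm not sure how to implement this.
Here is my code: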
```python
import csv
import json
import os
import time
from datetime import datetime


def mdy_to_ymd(d):
    # Convert e.g. "Jan 05 2022" into a time.struct_time, which supports < and > comparisons
    cor_date = datetime.strptime(d, '%b %d %Y').strftime('%d/%m/%Y')
    return time.strptime(cor_date, "%d/%m/%Y")


def date_converter(date):
    # Convert the date into a readable dd/mm/YYYY string for the CSV
    return datetime.strptime(date, '%b %d %Y').strftime('%d/%m/%Y')


def csv_generator(path):
    # Build one CSV row per JSON file in `path`
    ffresult = []
    duplicate_dict = {}
    for file in os.listdir(path):  # iterate through the directory with the files
        fresult = []
        with open(os.path.join(path, file), "r") as result:  # open the JSON file
            templates = json.load(result)
        hostname_str = file.split(".")
        site_code_str = file[:5]
        # datetime_str2 is the date string read from the JSON; that step is not shown in the question
        datetime_str3 = mdy_to_ymd(datetime_str2)  # convert the date to comparable data
        duplicate_dict[hostname_str[0]] = datetime_str3
        # ?? I create a dictionary with the hostname as key and the date as value.
        # It doesn't work: when the same hostname appears again, the existing key is simply
        # overwritten. That leaves no duplicates, but nothing guarantees the kept entry is the newest.
        fresult.append(site_code_str)
        fresult.append(hostname_str[0])
        fresult.append(templates["execution_status"])
        fresult.append(date_converter(datetime_str2))
        fresult.append(templates["protocol_name"])
        fresult.append(templates["protocol_version"])
        ffresult.append(fresult)
        # I append the values I need into 2 lists: fresult (one row) and ffresult (all rows)
    return ffresult


# directory: the folder containing the JSON files (defined elsewhere in the question)
with open("jsondicts.csv", "w", newline="") as dst:
    writetoit = csv.writer(dst)
    writetoit.writerows(csv_generator(directory))
    # This is how I write to the CSV, so right now the CSV contains duplicate values
```
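I want only one row per hostname, and that row should be the newest one by date, together with the rest of its parsed data (protocol name, site code, etc.).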
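The dictionary idea in the question can work if the code compares dates before overwriting instead of overwriting unconditionally. A minimal stdlib sketch, assuming the rows have the layout that `csv_generator` builds (hostname at index 1, `dd/mm/YYYY` date at index 3); the helper name `keep_newest_per_host` and the sample rows are made up for illustration:

```python
import csv
from datetime import datetime


def keep_newest_per_host(rows):
    """Return one row per hostname, keeping the row with the latest date.

    Assumes the row layout from csv_generator:
    [site_code, hostname, execution_status, 'dd/mm/YYYY' date, protocol_name, protocol_version]
    """
    newest = {}  # hostname -> (parsed date, row)
    for row in rows:
        hostname = row[1]
        parsed = datetime.strptime(row[3], "%d/%m/%Y")
        # Overwrite only when this row is strictly newer than the stored one
        if hostname not in newest or parsed > newest[hostname][0]:
            newest[hostname] = (parsed, row)
    return [row for _, row in newest.values()]


# Hypothetical sample rows: SITE1-rtr01 appears twice with different dates
rows = [
    ["SITE1", "SITE1-rtr01", "OK", "20/01/2022", "ssh", "2"],
    ["SITE1", "SITE1-rtr01", "FAIL", "03/02/2022", "ssh", "2"],
    ["SITE2", "SITE2-sw01", "OK", "05/01/2022", "ssh", "2"],
]
unique = keep_newest_per_host(rows)

with open("jsondicts.csv", "w", newline="") as dst:
    csv.writer(dst).writerows(unique)
```

In the question's script this would be `writetoit.writerows(keep_newest_per_host(csv_generator(directory)))`. Parsing the date matters: compared as plain strings, `"03/02/2022"` sorts before `"20/01/2022"` even though 3 February is later. When two rows share a hostname and the same date, this sketch keeps the first one it saw.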
Reply from a uj5u.com user:
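This solved it. I had to use the pandas library:
```python
# result_pan: a DataFrame of the parsed rows (how it is built is not shown in the answer)
result_pan_xls = (
    result_pan.sort_values(by="Execution_Date")
    .drop_duplicates(subset="HOSTNAME", keep="last")
)
```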
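The answer's approach sorts by date so that the newest row per hostname comes last, then `drop_duplicates(..., keep="last")` keeps only that row. One caveat: if `Execution_Date` holds `dd/mm/YYYY` strings (the format `date_converter` produces), `sort_values` sorts them alphabetically, not chronologically. A sketch that parses the column first; the column names other than `HOSTNAME` and `Execution_Date` and the sample rows are hypothetical:

```python
import pandas as pd

# Hypothetical parsed rows in the csv_generator layout
rows = [
    ["SITE1", "SITE1-rtr01", "OK", "20/01/2022", "ssh", "2"],
    ["SITE1", "SITE1-rtr01", "FAIL", "03/02/2022", "ssh", "2"],
    ["SITE2", "SITE2-sw01", "OK", "05/01/2022", "ssh", "2"],
]
columns = ["Site_Code", "HOSTNAME", "Execution_Status",
           "Execution_Date", "Protocol_Name", "Protocol_Version"]
result_pan = pd.DataFrame(rows, columns=columns)

# Parse dd/mm/YYYY strings so the sort is chronological, not lexicographic
result_pan["Execution_Date"] = pd.to_datetime(result_pan["Execution_Date"], format="%d/%m/%Y")

# Newest row per hostname ends up last, so keep="last" retains it
deduped = (
    result_pan.sort_values(by="Execution_Date")
    .drop_duplicates(subset="HOSTNAME", keep="last")
)

# Write the dates back out in the same dd/mm/YYYY format
deduped.to_csv("jsondicts.csv", index=False, date_format="%d/%m/%Y")
```

With the question's code, the DataFrame could come from `pd.DataFrame(csv_generator(directory), columns=columns)`.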
Please credit the source when reposting. Link to this article: https://www.uj5u.com/shujuku/435901.html
