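I have a long list of CSVs that I don't want to append into one. I want to keep each CSV's name unchanged and only change 2 columns, which currently hold 13-digit Unix timestamps, to a natural datetime, i.e. YYYY/MM/DD HH:MM:SS.
I'm happy to use Pandas, which seems to be the simpler approach, but I'm struggling with this. I was hoping something like the following might work. Any help is appreciated!
Here is an example Unix time: 1640227953000, which converts to Thursday, December 23, 2021 02:52:33.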
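As a quick check of that example value, here is a minimal sketch using only the standard library. It assumes the 13-digit number is milliseconds since the Unix epoch and that the quoted time is UTC; the variable names are illustrative.

```python
from datetime import datetime, timezone

epoch_ms = 1640227953000  # example value from the question

# Dividing by 1000 turns milliseconds into seconds; tz=timezone.utc avoids
# depending on the machine's local time zone.
as_utc = datetime.fromtimestamp(epoch_ms / 1000, tz=timezone.utc)
formatted = as_utc.strftime("%Y/%m/%d %H:%M:%S")
print(formatted)  # 2021/12/23 02:52:33
```

Without the tz argument, datetime.fromtimestamp returns local time, so the same value can print a different hour on machines outside UTC.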
import pandas as pd
import datetime
from pathlib import Path # available in python 3.4
dir = r'csv/' # raw string for windows.
csv_files = [f for f in Path(dir).glob('*.csv')] # finds all csvs in your folder.
print(csv_files)
for csv in csv_files: #iterate list
    df = pd.read_csv(csv) #read cs
    print(df.columns.tolist()) # used for trouble shooting
    df['values_authorTimestamp']=df['values_authorTimestamp'].apply(lambda d: datetime.datetime.fromtimestamp(int(d)/1000).strftime('%Y-%m-%d %H:%M:%S'))
    df['values_committerTimestamp']=df['values_committerTimestamp'].apply(lambda d: datetime.datetime.fromtimestamp(int(d)/1000).strftime('%Y-%m-%d %H:%M:%S'))
    #df['values_authorTimestamp'] = pd.to_datetime(df['values_authorTimestamp'], format='%Y-%m-%d %H:%M:%S')
    # print(df)
    print(f'{csv.name} saved.')
    df.to_csv(f'csv/{csv.name}')
    #values_committerTimestamp
This works and saves to CSV, but it only gets through some of the files before raising an error. Any ideas?
  File "Scripts/Audit/change-csv.py", line 16, in <module>
    df['values_authorTimestamp']=df['values_authorTimestamp'].apply(lambda d: datetime.datetime.fromtimestamp(int(d)/1000).strftime('%Y-%m-%d %H:%M:%S'))
  File "/opt/homebrew/lib/python3.9/site-packages/pandas/core/series.py", line 4433, in apply
    return SeriesApply(self, func, convert_dtype, args, kwargs).apply()
  File "/opt/homebrew/lib/python3.9/site-packages/pandas/core/apply.py", line 1082, in apply
    return self.apply_standard()
  File "/opt/homebrew/lib/python3.9/site-packages/pandas/core/apply.py", line 1137, in apply_standard
    mapped = lib.map_infer(
  File "pandas/_libs/lib.pyx", line 2870, in pandas._libs.lib.map_infer
  File "Scripts/Audit/change-csv.py", line 16, in <lambda>
    df['values_authorTimestamp']=df['values_authorTimestamp'].apply(lambda d: datetime.datetime.fromtimestamp(int(d)/1000).strftime('%Y-%m-%d %H:%M:%S'))
ValueError: invalid literal for int() with base 10: '2021-11-04 17:19:24'
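Answer from a uj5u.com user:
It seems the column contains mixed datetime formats. Try passing errors='coerce' so that values not in numeric format become missing values, then replace those missing values from another Series with Series.fillna: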
df = pd.DataFrame({'values_authorTimestamp':[1640227953000,'2021-11-04 17:19:24']})
d1 = pd.to_datetime(df['values_authorTimestamp'], unit='ms', errors='coerce')
d2 = pd.to_datetime(df['values_authorTimestamp'], errors='coerce')
df['values_authorTimestamp'] = d1.fillna(d2).dt.strftime('%Y/%m/%d %H:%M:%S')
print (df)
  values_authorTimestamp
0    2021/12/23 02:52:33
1    2021/11/04 17:19:24
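To confirm that a column really mixes the two formats before converting it, one option is to flag the rows that are not numeric. This is only a sketch on sample data mirroring the example above; the names is_epoch and already_formatted are illustrative.

```python
import pandas as pd

# Sample data mirroring the example above (illustrative, not a real file).
df = pd.DataFrame({"values_authorTimestamp": [1640227953000, "2021-11-04 17:19:24"]})

# pd.to_numeric with errors="coerce" yields NaN for anything non-numeric,
# so notna() marks the rows that still hold epoch milliseconds.
is_epoch = pd.to_numeric(df["values_authorTimestamp"], errors="coerce").notna()

# Rows that are already datetime strings; int() cannot parse these values.
already_formatted = df.loc[~is_epoch, "values_authorTimestamp"]
print(already_formatted.tolist())
```

Rows listed here are the ones that made int(d) fail in the original lambda.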
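So your solution changes to:
for csv in csv_files:  # iterate over every CSV file
    df = pd.read_csv(csv)  # read the CSV
    # first pass: epoch milliseconds; values that cannot be converted become NaT
    d1 = pd.to_datetime(df['values_authorTimestamp'], unit='ms', errors='coerce')
    # second pass: values that are already datetime strings
    d2 = pd.to_datetime(df['values_authorTimestamp'], errors='coerce')
    # prefer the first pass and fall back to the second where it is missing
    df['values_authorTimestamp'] = d1.fillna(d2).dt.strftime('%Y/%m/%d %H:%M:%S')
    # same two passes for the committer column
    d11 = pd.to_datetime(df['values_committerTimestamp'], unit='ms', errors='coerce')
    d21 = pd.to_datetime(df['values_committerTimestamp'], errors='coerce')
    df['values_committerTimestamp'] = d11.fillna(d21).dt.strftime('%Y/%m/%d %H:%M:%S')
    # print(df)
    print(f'{csv.name} saved.')
    # index=False keeps pandas from writing an extra index column into the file
    df.to_csv(f'csv/{csv.name}', index=False)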
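The same idea can be wrapped in a reusable helper. The sketch below is one possible arrangement, not the answer's exact method: it converts to numbers first with pd.to_numeric so the epoch pass stays explicit even when read_csv loads a mixed column as strings. The function names to_natural_datetime and convert_folder and the TIMESTAMP_COLUMNS constant are hypothetical, and epoch values are assumed to be milliseconds in UTC.

```python
from pathlib import Path

import pandas as pd

# Column names taken from the question.
TIMESTAMP_COLUMNS = ["values_authorTimestamp", "values_committerTimestamp"]


def to_natural_datetime(series, fmt="%Y/%m/%d %H:%M:%S"):
    """Format a column that mixes epoch milliseconds and datetime strings."""
    # Values that parse as numbers are treated as epoch milliseconds (UTC).
    numeric = pd.to_numeric(series, errors="coerce")
    from_epoch = pd.to_datetime(numeric, unit="ms", errors="coerce")
    # Everything else is parsed as a datetime string; failures become NaT.
    leftover = series.where(numeric.isna())
    from_text = pd.to_datetime(leftover, errors="coerce")
    return from_epoch.fillna(from_text).dt.strftime(fmt)


def convert_folder(folder, columns=TIMESTAMP_COLUMNS):
    """Rewrite every CSV in `folder` in place, keeping each file name."""
    for path in sorted(Path(folder).glob("*.csv")):
        df = pd.read_csv(path)
        for column in columns:
            if column in df.columns:
                df[column] = to_natural_datetime(df[column])
        # index=False avoids adding an extra index column on each rewrite.
        df.to_csv(path, index=False)
```

Calling convert_folder("csv") would process the folder from the question. Because already formatted strings go through the second pass, running it again on converted files should leave them unchanged rather than raise.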
