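In the code below, I get the fileID of a CSV file on Google Drive. Now I would like to load the file's contents directly into a pandas DataFrame instead of downloading the CSV file and then extracting the data (as the code currently does).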
import io
import os.path
import pandas as pd
from google.auth.transport.requests import Request
from google.oauth2.credentials import Credentials
from google_auth_oauthlib.flow import InstalledAppFlow
from googleapiclient.discovery import build
from googleapiclient.http import MediaIoBaseDownload
# If modifying these scopes, delete the file token.json.
SCOPES = ['https://www.googleapis.com/auth/drive.readonly']
# Login to Google Drive
def login():
    creds = None
    # The file token.json stores the user's access and refresh tokens, and is
    # created automatically when the authorization flow completes for the first
    # time.
    if os.path.exists('token.json'):
        creds = Credentials.from_authorized_user_file('token.json', SCOPES)
    # If there are no (valid) credentials available, let the user log in.
    if not creds or not creds.valid:
        print("Log in to the Google Drive account which holds/shares the file database")
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())
        else:
            flow = InstalledAppFlow.from_client_secrets_file(
                './src/credentials.json', SCOPES)
            creds = flow.run_local_server(port=0)
        # Save the credentials for the next run
        with open('token.json', 'w') as token:
            token.write(creds.to_json())
    # Return service
    service = build('drive', 'v3', credentials=creds)
    return service
# Download files from Google Drive
def downloadFile(file_name):
    # Authenticate
    service = login()
    # Search file by name
    response = service.files().list(q=f"name='{file_name}'", spaces='drive', fields='nextPageToken, files(id, name)').execute()
    file_id = None
    for file in response.get('files', []):
        file_id = file.get('id')
    # Download the file if it exists
    if file_id is not None:
        request = service.files().get_media(fileId=file_id)
        fh = io.FileIO(f"./data/{file_name}.csv", "wb")
        downloader = MediaIoBaseDownload(fh, request)
        print(f"Downloading {file_name}.csv")
        # Nothing is transferred until next_chunk() is called
        done = False
        while not done:
            status, done = downloader.next_chunk()
        fh.close()
    else:
        print(f"\033[1;31m Warning: Can't download >> {file_name} << because it is missing!!!\033[0;0m")
    return

downloadFile("NameOfFile")
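Is there a way to achieve this? Thank you very much for your help.
A uj5u.com user replied:
From "The problem is to be able to do that I need the file's URL but I'm not able to retrieve it.", I think your file might be a Google Spreadsheet. When the file is a Google Spreadsheet, webContentLink is not included in the retrieved metadata.
If my understanding of your situation is correct, how about the following modification?
Modified script:
From: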
file_id = file.get('id')
# !!! Here, I would like to get the URL of the file and download it to a pandas data frame !!!
file_url = file.get("webContentLink")
To:
file_id = file.get('id')
file_url = file.get("webContentLink")
if not file_url:
    request = service.files().export_media(fileId=file_id, mimeType='text/csv')
    fh = io.BytesIO()
    downloader = MediaIoBaseDownload(fh, request)
    done = False
    while done is False:
        status, done = downloader.next_chunk()
        print("Download %d%%" % int(status.progress() * 100))
    fh.seek(0)
    df = pd.read_csv(fh)
    print(df)
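- In this modification, the Google Spreadsheet is exported as CSV data using the Drive API, and the exported data is loaded into a DataFrame.
- For this modification, please add the imports "import io" and "from googleapiclient.http import MediaIoBaseDownload".
Note:
- In this case, the Google Spreadsheet is exported as CSV data using the Drive API, so please include the scope https://www.googleapis.com/auth/drive.readonly or https://www.googleapis.com/auth/drive. If your only scope is https://www.googleapis.com/auth/drive.metadata.readonly, an error occurs. Please be careful about this.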
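As a small illustration of the scope note above, the following sketch checks a list of granted scopes before attempting an export. The helper name can_read_file_content and the CONTENT_SCOPES set are hypothetical, and the set only contains the two scopes named in this answer; other Drive scopes are not considered here.

```python
# Scopes this answer names as sufficient for exporting/downloading content.
# Assumption: only these two are listed; other Drive scopes are not covered.
CONTENT_SCOPES = {
    "https://www.googleapis.com/auth/drive.readonly",
    "https://www.googleapis.com/auth/drive",
}


def can_read_file_content(granted_scopes):
    """Return True if any granted scope is one of the content scopes above."""
    return any(scope in CONTENT_SCOPES for scope in granted_scopes)


if __name__ == "__main__":
    # The SCOPES list from the question's script passes the check.
    print(can_read_file_content(
        ["https://www.googleapis.com/auth/drive.readonly"]))
    # A metadata-only scope does not.
    print(can_read_file_content(
        ["https://www.googleapis.com/auth/drive.metadata.readonly"]))
```

With the question's script, this could be called with SCOPES right after login() to fail early instead of hitting the API error.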
Reference:
- Files: export
Added:
When the file is CSV data, please modify the script as follows.
file_id = file.get('id')
request = service.files().get_media(fileId=file_id)
fh = io.BytesIO()
downloader = MediaIoBaseDownload(fh, request)
done = False
while done is False:
    status, done = downloader.next_chunk()
    print("Download %d%%" % int(status.progress() * 100))
fh.seek(0)
df = pd.read_csv(fh)
print(df)
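The two cases above (Google Spreadsheet and plain CSV file) could be combined into one helper, roughly as sketched below. This is not part of the original answer: the function name drive_file_to_dataframe is hypothetical, and it assumes the file's mimeType was requested in the search (for example fields='files(id, name, mimeType)'). For brevity it fetches the content in one call with request.execute() rather than in chunks with MediaIoBaseDownload, which is probably only reasonable for small files.

```python
import io

import pandas as pd

# MIME type Drive reports for native Google Spreadsheets.
GOOGLE_SHEET_MIME = "application/vnd.google-apps.spreadsheet"


def drive_file_to_dataframe(service, file_id, mime_type):
    """Load a Drive file into a DataFrame (hypothetical helper).

    service is the object returned by build('drive', 'v3', ...).
    """
    files = service.files()
    if mime_type == GOOGLE_SHEET_MIME:
        # Google Spreadsheets have no binary content, so export them as CSV.
        request = files.export_media(fileId=file_id, mimeType="text/csv")
    else:
        # Assumed to be an uploaded CSV file: download the content as-is.
        request = files.get_media(fileId=file_id)
    content = request.execute()  # raw bytes of the CSV data
    return pd.read_csv(io.BytesIO(content))
```

Inside the question's loop this might be used as df = drive_file_to_dataframe(service, file.get('id'), file.get('mimeType')).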
