I have an S3 bucket with the following folder hierarchy:
Folder 1
    Subfolder 1
        Subsubfolder 1
            Subsubsubfolder 1
            Subsubsubfolder 2
        Subsubfolder 2
            Subsubsubfolder 1
            Subsubsubfolder 2
    Subfolder 2
        Subsubfolder 1
            Subsubsubfolder 1
            Subsubsubfolder 2
        Subsubfolder 2
            Subsubsubfolder 1
            Subsubsubfolder 1
I am trying to retrieve every folder and get an overview of the structure in the bucket. I am currently using this code:
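    import boto3

    s3 = boto3.client('s3')
    bucket = "Bucket_name"
    # A single call returns at most one page of results
    response = s3.list_objects_v2(Bucket=bucket)
    # Use a separate loop variable so the bucket name is not overwritten
    for obj in response['Contents']:
        print(obj['Key'])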
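This gives me the full path of every file in the deepest subfolders, which is not what I want. Is there a way to list only the folders and all subfolders in the bucket?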
Answer from a uj5u.com user:
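If you want to mimic the behavior of the AWS CLI and other UI representations of S3, pass a delimiter to every list-objects call. The delimiter tells S3 to group objects that share a prefix and to present each group as something resembling a folder.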
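To make the grouping concrete, here is a minimal local sketch of what happens at one level of the hierarchy. It does not call S3; `group_by_delimiter` and the sample keys are hypothetical names made up for illustration, and the function only approximates the behavior described above.

```python
def group_by_delimiter(keys, prefix="", delimiter="/"):
    """Locally mimic how keys split into common prefixes and direct contents."""
    common_prefixes = []
    contents = []
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        # Look only at the part of the key after the requested prefix
        rest = key[len(prefix):]
        head, sep, _ = rest.partition(delimiter)
        if sep:
            # The key sits deeper in the hierarchy: report only its "folder"
            folder = prefix + head + delimiter
            if folder not in common_prefixes:
                common_prefixes.append(folder)
        else:
            # No further delimiter: the key is a direct child of the prefix
            contents.append(key)
    return common_prefixes, contents


# Hypothetical keys, loosely following the hierarchy in the question
keys = [
    "Folder 1/Subfolder 1/a.txt",
    "Folder 1/Subfolder 2/b.txt",
    "Folder 1/readme.txt",
    "top.txt",
]
print(group_by_delimiter(keys))
print(group_by_delimiter(keys, prefix="Folder 1/"))
```

With an empty prefix only "Folder 1/" is reported as a folder; asking again with that prefix reveals the next level down.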
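A list-objects call returns at most 1,000 items per batch. To enumerate a bucket correctly, take NextContinuationToken from the response and pass it to another call, repeating until no continuation token is returned. boto3 has a helper called get_paginator that handles this logic.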
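For reference, the continuation-token loop that the paginator hides might look roughly like this when written by hand. This is only a sketch: `list_all_keys` is a hypothetical helper name, and the client is passed in as an argument so the function itself needs no credentials to be defined.

```python
def list_all_keys(s3, bucket_name, prefix=""):
    """Collect every key by following NextContinuationToken manually."""
    keys = []
    kwargs = {"Bucket": bucket_name, "Prefix": prefix}
    while True:
        response = s3.list_objects_v2(**kwargs)
        # "Contents" is absent when a page holds no objects
        for obj in response.get("Contents", []):
            keys.append(obj["Key"])
        token = response.get("NextContinuationToken")
        if not token:
            # No token means this was the last page
            break
        # Ask for the next page on the following iteration
        kwargs["ContinuationToken"] = token
    return keys


# Usage (needs boto3 and valid credentials; the bucket name is a placeholder):
#   import boto3
#   keys = list_all_keys(boto3.client("s3"), "example-bucket")
```

In practice get_paginator is usually the simpler choice; the loop is shown only to make the mechanism visible.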
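Putting it all together, you can list the objects in an S3 bucket with something like the following. It also shows how to render the output in a format that looks a bit like what aws s3 ls prints.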
    import boto3

    def enum_s3_items(s3, bucket_name, prefix="", delimiter="/"):
        # Create a paginator to handle multiple pages from list_objects_v2
        paginator = s3.get_paginator("list_objects_v2")
        # Get each page from a call to list_objects_v2
        for page in paginator.paginate(Bucket=bucket_name, Prefix=prefix, Delimiter=delimiter):
            # Inside of each page, return the common prefixes (folders) first
            for common_prefix in page.get("CommonPrefixes", []):
                yield common_prefix
            # And, if it's present, return each item in turn
            for s3_object in page.get("Contents", []):
                yield s3_object

    s3 = boto3.client("s3")
    for obj in enum_s3_items(s3, "example-bucket"):
        # This is an example of how to process the output, in reality
        # you would no doubt want to do something application specific
        # with the results.
        if 'Prefix' in obj:
            # For common prefixes, just output the name of the prefix
            # with some padding to mimic "aws s3 ls"
            print(" " * 27 + "PRE " + obj['Prefix'])
        else:
            # Grab the interesting info out of the object to mimic
            # how the cli works.
            at = obj['LastModified']
            # Convert to local time, just to mimic what the CLI does
            # (astimezone() with no argument uses the system local zone)
            at = at.astimezone()
            # And pretty-print the datetime
            at = at.strftime("%Y-%m-%d %H:%M:%S")
            # Pull out other information
            size = obj['Size']
            key = obj['Key']
            # Output to the console
            print(f"{at} {size:10d} {key}")
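The listing above covers a single level. To get every folder and subfolder, as the question asks, one option is to recurse into each common prefix. The following is a sketch under that assumption; `walk_folders` and `indent_folder` are hypothetical helper names, and the client is passed in rather than created inside the functions.

```python
def walk_folders(s3, bucket_name, prefix="", delimiter="/"):
    """Yield every folder-like prefix below `prefix`, depth first."""
    paginator = s3.get_paginator("list_objects_v2")
    pages = paginator.paginate(Bucket=bucket_name, Prefix=prefix, Delimiter=delimiter)
    for page in pages:
        for common_prefix in page.get("CommonPrefixes", []):
            folder = common_prefix["Prefix"]
            yield folder
            # Descend into the folder that was just reported
            yield from walk_folders(s3, bucket_name, folder, delimiter)


def indent_folder(folder, delimiter="/"):
    """Render one prefix as an indented folder name for a tree-like view."""
    trimmed = folder.rstrip(delimiter)
    depth = trimmed.count(delimiter)
    return "  " * depth + trimmed.split(delimiter)[-1]


# Usage (needs boto3 and valid credentials; the bucket name is a placeholder):
#   import boto3
#   for folder in walk_folders(boto3.client("s3"), "example-bucket"):
#       print(indent_folder(folder))
```

Note that this issues at least one request per folder, so it can be slow on buckets with very many prefixes.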
Answer from a uj5u.com user:
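You work with prefixes, and you have to take care of pagination to really get all entries. Something like the code below should do the trick. It creates a generator that yields the key of every file and folder whose key starts with the prefix.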
    from typing import Iterable

    import boto3

    def get_all(bucket_name: str, prefix: str) -> Iterable[str]:
        client = boto3.client("s3")
        # The paginator follows continuation tokens across pages
        paginator = client.get_paginator("list_objects_v2")
        pages = paginator.paginate(Bucket=bucket_name, Prefix=str(prefix))
        for page in pages:
            # "Contents" is missing when a page has no objects
            for obj in page.get("Contents", []):
                yield obj["Key"]

    all_ = list(get_all("a_bucket", "base_folder"))
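Since this approach returns full keys rather than folders, the folder list can be derived locally from the keys. A minimal sketch, assuming "/" as the separator; `folders_from_keys` and the sample keys are hypothetical and not part of the answer above.

```python
def folders_from_keys(keys, delimiter="/"):
    """Return every ancestor folder prefix implied by the given keys, sorted."""
    folders = set()
    for key in keys:
        # Drop the last component, which is the file name
        # (or an empty string for a "folder/" placeholder key)
        parts = key.split(delimiter)[:-1]
        for depth in range(1, len(parts) + 1):
            folders.add(delimiter.join(parts[:depth]) + delimiter)
    return sorted(folders)


# Hypothetical keys; in practice they would come from get_all(...)
keys = [
    "Folder 1/Subfolder 1/Subsubfolder 1/file.txt",
    "Folder 1/Subfolder 2/file.txt",
]
for folder in folders_from_keys(keys):
    print(folder)
```

This trades extra listing of every object for a single pass over the bucket, which may or may not be preferable depending on how many objects it holds.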
