List every folder and subfolder in AWS S3

I have an S3 bucket with the following folder hierarchy:

Folder 1
  
  Subfolder 1
    Subsubfolder 1
      Subsubsubfolder 1
      Subsubsubfolder 2
    
    Subsubfolder 2
      Subsubsubfolder 1
      Subsubsubfolder 2
  
  Subfolder 2
    Subsubfolder 1
      Subsubsubfolder 1
      Subsubsubfolder 2
    
    Subsubfolder 2
      Subsubsubfolder 1
      Subsubsubfolder 2

I'm trying to retrieve every folder to get an overview of the bucket's structure. I'm currently using this code:

import boto3

s3 = boto3.client('s3')
bucket = "Bucket_name"

response = s3.list_objects_v2(Bucket=bucket)

for obj in response.get('Contents', []):
    print(obj['Key'])

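This gives me the path of every file in the deepest subfolders, which isn't what I want. Is there a way to list only the folders and all subfolders in the bucket?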

Answer:

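If you want to mimic the behavior of the AWS CLI and other S3 user interfaces, pass a delimiter to every list-objects call. This tells S3 to group all objects that share a prefix up to the delimiter and return those prefixes separately (as CommonPrefixes), which the tools then present as folders.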
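To make the grouping concrete, here is a small offline sketch that assumes a plain list of keys stands in for a bucket. The helper name group_by_delimiter is hypothetical and not part of boto3; it only approximates how S3 splits a listing into CommonPrefixes and Contents for a given Prefix and Delimiter, and it ignores pagination and other details of the real service.

```python
def group_by_delimiter(keys, prefix="", delimiter="/"):
    """Approximate how S3 splits a listing into (CommonPrefixes, Contents).

    Hypothetical offline helper for illustration; it ignores pagination.
    """
    common_prefixes = []
    contents = []
    seen = set()
    # S3 returns keys in lexicographic (UTF-8 binary) order.
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        # Without a delimiter, nothing is grouped.
        idx = rest.find(delimiter) if delimiter else -1
        if idx == -1:
            # No delimiter after the prefix: a plain object at this level.
            contents.append(key)
        else:
            # Everything up to and including the first delimiter is a "folder".
            common_prefix = prefix + rest[: idx + len(delimiter)]
            if common_prefix not in seen:
                seen.add(common_prefix)
                common_prefixes.append(common_prefix)
    return common_prefixes, contents
```

With prefix="Folder 1/", a key such as "Folder 1/Subfolder 1/a.txt" collapses into the single entry "Folder 1/Subfolder 1/", while "Folder 1/readme.txt" is reported as an object.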

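A list-objects call returns at most 1,000 items per request. To enumerate a bucket fully, take NextContinuationToken from each response and pass it as ContinuationToken in the next call, repeating until no continuation token is returned. boto3 provides a helper, get_paginator, that handles this logic for you.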
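For comparison, a minimal sketch of the manual loop that the paginator automates might look like this. The function name list_all_keys_manually is hypothetical; list_objects_v2 and its ContinuationToken parameter, plus the IsTruncated and NextContinuationToken response fields, are the real API. The client is passed in as an argument so the loop can also be exercised with a stub.

```python
def list_all_keys_manually(s3_client, bucket_name, prefix=""):
    """Yield every object key under `prefix`, following continuation tokens.

    Hypothetical helper: `s3_client` is expected to be a boto3 S3 client,
    e.g. boto3.client("s3").
    """
    kwargs = {"Bucket": bucket_name, "Prefix": prefix}
    while True:
        response = s3_client.list_objects_v2(**kwargs)
        # "Contents" is missing when a page contains no objects.
        for obj in response.get("Contents", []):
            yield obj["Key"]
        # IsTruncated is true when more results remain after this page.
        if not response.get("IsTruncated"):
            break
        # Ask for the next page, starting where this one left off.
        kwargs["ContinuationToken"] = response["NextContinuationToken"]
```

Called as list_all_keys_manually(boto3.client("s3"), "example-bucket"), it should yield every key in the bucket, however many pages that takes.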

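Putting it all together, you can list the objects in an S3 bucket with something like the following, which also shows one way to format the output so it looks roughly like aws s3 ls.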

import boto3
from datetime import datetime

def enum_s3_items(s3, bucket_name, prefix="", delimiter="/"):
    # Create a paginator to handle multiple pages from list_objects_v2
    paginator = s3.get_paginator("list_objects_v2")
    # Get each page from a call to list_objects_v2
    for page in paginator.paginate(Bucket=bucket_name, Prefix=prefix, Delimiter=delimiter):
        # Inside of each page, return the common prefixes (folders) first
        for common_prefix in page.get("CommonPrefixes", []):
            yield common_prefix
        # And, if it's present, return each item in turn
        for s3_object in page.get("Contents", []):
            yield s3_object

s3 = boto3.client("s3")
for obj in enum_s3_items(s3, "example-bucket"):
    # This is an example of how to process the output, in reality
    # you would no doubt want to do something application specific
    # with the results.
    if 'Prefix' in obj:
        # For common prefixes, just output the name of the prefix
        # with some padding to mimic "aws s3 ls"
        print(" " * 27 + "PRE " + obj['Prefix'])
    else:
        # Grab the interesting info out of the object to mimic
        # how the cli works.
        at = obj['LastModified']
        # Convert to local time, just to mimic what the CLI does
        at = at.astimezone(datetime.now().tzinfo)
        # And pretty-print the datetime
        at = at.strftime("%Y-%m-%d %H:%M:%S")
        # Pull out other information
        size = obj['Size']
        key = obj['Key']
        # Output to the console
        print(f"{at} {size:10d} {key}")
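The listing above only covers one level: with Delimiter set, everything inside a folder stays hidden behind its CommonPrefixes entry. To list every folder and subfolder, as the question asks, one option is to recurse into each common prefix. The sketch below assumes "/" is the folder separator, and walk_s3_folders is a hypothetical name. It makes at least one request per folder, so for large trees it may be cheaper to list all keys once and derive the folders locally.

```python
def walk_s3_folders(s3_client, bucket_name, prefix="", delimiter="/"):
    """Yield every folder-like prefix below `prefix`, depth-first.

    Hypothetical helper: `s3_client` is expected to be a boto3 S3 client.
    """
    paginator = s3_client.get_paginator("list_objects_v2")
    pages = paginator.paginate(Bucket=bucket_name, Prefix=prefix, Delimiter=delimiter)
    for page in pages:
        # CommonPrefixes holds the immediate "subfolders" of `prefix`.
        for common_prefix in page.get("CommonPrefixes", []):
            folder = common_prefix["Prefix"]
            yield folder
            # Descend into this subfolder before moving on to its siblings.
            yield from walk_s3_folders(s3_client, bucket_name, folder, delimiter)
```

For example, iterating over walk_s3_folders(boto3.client("s3"), "example-bucket") should produce "Folder 1/", then "Folder 1/Subfolder 1/", "Folder 1/Subfolder 1/Subsubfolder 1/", and so on down the tree.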

Answer:

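Use a prefix, and handle pagination so that you actually get every entry; something like the following should do the trick. It creates a generator that yields every key starting with the prefix. Without a Delimiter, S3 returns object keys only, so folders appear as path segments inside those keys (or as zero-byte objects whose keys end in "/", if they were created with the console's Create folder button).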

import boto3
from typing import Iterable

def get_all(bucket_name: str, prefix: str) -> Iterable[str]:
    client = boto3.client("s3")
    # The paginator follows continuation tokens across pages
    paginator = client.get_paginator("list_objects_v2")
    pages = paginator.paginate(Bucket=bucket_name, Prefix=str(prefix))
    for page in pages:
        # "Contents" is absent from a page with no matching objects
        for obj in page.get("Contents", []):
            yield obj["Key"]

all_ = list(get_all("a_bucket", "base_folder"))
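Once you have every key, for example from get_all above, the folder structure can also be derived locally without further requests. This is a sketch assuming "/" separates folder names; folders_from_keys is a hypothetical helper that collects each leading part of every key.

```python
def folders_from_keys(keys, delimiter="/"):
    """Return every folder-like prefix implied by a collection of object keys.

    Hypothetical helper; assumes `delimiter` separates folder names.
    """
    folders = set()
    for key in keys:
        parts = key.split(delimiter)
        # All leading parts except the last (the object name) are folders,
        # e.g. "a/b/c.txt" implies "a/" and "a/b/".
        for depth in range(1, len(parts)):
            folders.add(delimiter.join(parts[:depth]) + delimiter)
    return sorted(folders)
```

Combined with the answer's generator, folders_from_keys(get_all("a_bucket", "base_folder")) would return a sorted list of the folders under base_folder from a single listing pass.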
