Getting a list of gsutil URIs from a specific bucket folder to iterate over

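I am very new to GCS and am using it with Python. I have a GCS bucket named "my_data" that contains many folders; I am interested in the folder named "ABC" and its subfolder named "WW3". I want to get a list of gsutil URIs (not blobs) for the files in that folder, so that I can open them as pandas DataFrames and concatenate them.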

So far I have been able to get a list of blobs like this (I used this post and this video to do it):

from google.cloud import storage

# storage_client is assumed to be an already authenticated storage.Client
my_bucket = storage_client.get_bucket("my_data")

# Get blobs in a specific subdirectory
blobs_specific = list(my_bucket.list_blobs(prefix='ABC/WW3/'))

>>>
# printing blobs_specific gives me the blobs as a list like this:
[<Blob: my_data, ABC/S3/, 12231543135681432>,...,.......]

I would like to get a list of URLs like this:

["gs://my_data/ABC/WW3/tab1.csv","gs://my_data/ABC/WW3/tab2.csv","gs://my_data/ABC/WW3/tab3.csv"...]

so that I can later open them with pandas and concatenate them.

Is there a way to get a list of URLs instead of blobs?

Alternatively, is there some way to use the blobs themselves to read the CSVs into pandas and concatenate them?

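Edit: I tried to solve it by splitting the string form of each blob and then accessing the files. This seems to create a list of URLs, but the result is not what it appears to be, and it is not a very smart approach: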

urls=[]

for x,y in enumerate(blobs_specific):
    first_part="gs://my_data/WW3/"
    scnd_part=str(blobs_specific[x]).split(',')[1]
    
    url=first_part + scnd_part
    
    urls.append(url)

However, when I try to iterate over this list, it fails. It also seems that the URL that gets printed differs from what is actually stored:

urls[1]
>>>'gs://my_data/WW3/ ABC/tab1.csv'

# There seems to be a space between the "/" and "ABC", and when I try to read it with pandas I get a path-not-found error:

file_path = urls[1]

df = pd.read_csv(file_path,
                 sep=",",
                 storage_options={"token": "my_secret_token-20g8g632vsk1.json"})

>>>
# This is slightly different from the original error because I could not post the real names, but it contains "b" and "o" and odd characters that do not appear when I print the path.
FileNotFoundError: b/my_data/o/ WW3*BC/S1/ABCtab1.csv

Answer:

I have found a solution using .lstrip(). However, if anyone has a smarter solution, I would like to learn it :)

urls=[]

for x,y in enumerate(blobs_specific):
    first_part="gs://my_data/WW3/"
    scnd_part=str(blobs_specific[x]).split(',')[1].lstrip()
    
    url=first_part + scnd_part
    
    urls.append(url)

In your case, gs://my_data may be different; make sure you use the correct path.
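
A possibly simpler alternative, offered only as a sketch: the stray space most likely comes from the string form of a blob, which looks like "<Blob: bucket, name, generation>", so splitting on "," leaves a leading space in the name. Each blob returned by list_blobs also exposes the object path as blob.name and the bucket as blob.bucket.name, so the URI can be built from those attributes without any string parsing. The function names blob_to_gs_uri, list_gs_uris and load_folder are hypothetical helpers, not part of any library. load_folder assumes that google-cloud-storage, pandas and gcsfs are installed and that default credentials are available for storage.Client().

```python
def blob_to_gs_uri(blob):
    """Build a gs:// URI from a blob's bucket name and object name."""
    return f"gs://{blob.bucket.name}/{blob.name}"


def list_gs_uris(blobs, suffix=".csv"):
    """Return gs:// URIs for the blobs whose name ends with `suffix`.

    Filtering on the suffix also skips "folder" placeholder objects
    such as 'ABC/WW3/', which list_blobs can return alongside the files.
    """
    return [blob_to_gs_uri(b) for b in blobs if b.name.endswith(suffix)]


def load_folder(bucket_name, prefix, token):
    """Read every CSV under `prefix` and concatenate them into one DataFrame.

    Assumption: google-cloud-storage, pandas and gcsfs are installed,
    and `token` is the path to a service-account JSON key file.
    """
    import pandas as pd
    from google.cloud import storage

    client = storage.Client()
    blobs = client.list_blobs(bucket_name, prefix=prefix)
    uris = list_gs_uris(blobs)
    frames = [pd.read_csv(uri, storage_options={"token": token}) for uri in uris]
    # pd.concat raises ValueError if no CSV files were found under the prefix
    return pd.concat(frames, ignore_index=True)
```

With the names from the question, the call would look like load_folder("my_data", "ABC/WW3/", "my_secret_token-20g8g632vsk1.json"). Because blob.name already holds the full object path (for example "ABC/WW3/tab1.csv"), only "gs://" and the bucket name need to be prepended.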
