Converting from ANSI to UTF-8 in Python on AWS Lambda

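I receive a zip file in an S3 bucket. Its put event triggers an AWS Lambda function. My Lambda should unzip the archive and upload the files inside it to another S3 bucket.

But those files can be a mix of ANSI and UTF-8 files.

I have to convert all of them to UTF-8. Any ideas on how to do this?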

def get_utf_encoded_file(
    file,
    file_name: str
):
    is_ansi = False
    try:
        file.read().decode('utf-8')
    except:
        try:
            file.read().decode('cp1252')  # I tried to print here; it gives an empty string
            is_ansi = True
        except Exception as e:
            log.error(f"Unable to parse file {file_name}")
            raise Exception(f"Unable to parse file {file_name}")
            
    if is_ansi:
        byte_stream = None
        temp_file_name = "/tmp/" + str(uuid.uuid4()) + ".txt"
        with codecs.open(temp_file_name, "w", encoding='UTF-8') as temp_file:
            temp_file.write(file.read().decode('cp1252'))
                
        with open(temp_file_name, "rb") as temp_file:
            byte_stream = temp_file.read()  # I tried to print here; it gives an empty byte array
            print(byte_stream)
            
        os.remove(temp_file_name)
        return byte_stream
    else:
        return file

The function that calls it:

def unzip_to_temp(
    zip: ZipFile
):
    for file_name in zip.namelist():
        file_data = get_utf_encoded_file(zip.open(file_name), file_name)
        upload_to_s3(file_data)

But the ANSI files are created as empty files in S3.

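Answer from a uj5u.com user:

You are calling file.read() multiple times. The first call, made for the UTF-8 attempt, always consumes the whole stream, so the later reads for the ANSI attempt return an empty string.

You should call it once, save the result, and then decode that saved result.

Reference: https://docs.python.org/3/tutorial/inputoutput.html#methods-of-file-objects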
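
To make the effect concrete, here is a minimal sketch of what happens to the stream. It assumes io.BytesIO as a stand-in for the object returned by zip.open(), and decode_once is a hypothetical helper name, not something from the question.

```python
import io

# Stand-in for the stream returned by zip.open(); the bytes are cp1252-encoded.
stream = io.BytesIO("café".encode("cp1252"))

first = stream.read()   # consumes the whole stream
second = stream.read()  # nothing is left, so this is b""


def decode_once(raw: bytes) -> str:
    """Try UTF-8 first, then fall back to cp1252, on the same saved bytes."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("cp1252")


text = decode_once(first)
```

Because both decode attempts work on the saved bytes in first, the fallback still sees the full content.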

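Answer from a uj5u.com user:

You are trying to call read on the same file multiple times. After the first read, the pointer is at the end of the file, so nothing new is read.

Instead, read the data once and then try to decode it. Since you are decoding in memory, you can also skip writing to disk altogether and return the UTF-8 encoded bytes directly: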

import logging

log = logging.getLogger(__name__)


def get_utf_encoded_file(
    file,
    file_name: str
):
    # Read the stream exactly once; a second read() would return b"".
    data = file.read()
    try:
        data.decode('utf-8')
        # data decodes cleanly as utf-8
        return data
    except UnicodeDecodeError:
        pass

    try:
        data = data.decode('cp1252').encode("utf-8")
        # data decodes cleanly as cp1252, is now utf-8
        return data
    except UnicodeDecodeError:
        log.error(f"Unable to parse file {file_name}")
        raise Exception(f"Unable to parse file {file_name}")
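
As a rough end-to-end check of this approach, the sketch below builds a small archive in memory and converts every member. The names to_utf8 and convert_members are hypothetical, and the S3 upload is left out on purpose; in the Lambda each converted value would be handed to whatever upload function you already have.

```python
import io
import zipfile


def to_utf8(data: bytes) -> bytes:
    """Return data as UTF-8 bytes, assuming the input is UTF-8 or cp1252."""
    try:
        data.decode("utf-8")
        return data
    except UnicodeDecodeError:
        return data.decode("cp1252").encode("utf-8")


def convert_members(archive: zipfile.ZipFile) -> dict:
    """Map each member name to its UTF-8 bytes (hypothetical helper)."""
    converted = {}
    for name in archive.namelist():
        with archive.open(name) as member:
            converted[name] = to_utf8(member.read())  # read each member once
    return converted


# Build a small in-memory archive with one UTF-8 file and one cp1252 file.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("utf8.txt", "naïve".encode("utf-8"))
    archive.writestr("ansi.txt", "naïve".encode("cp1252"))

with zipfile.ZipFile(buffer) as archive:
    result = convert_members(archive)
```

Note that this kind of detection is a heuristic: bytes that happen to be valid UTF-8 are passed through unchanged, even if they were originally written as cp1252.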


Tags: python-3.x amazon-s3 aws-lambda
