UnicodeDecodeError when concatenating strings

I have the following small Python 2.7 script:

#!/usr/bin/python
# -*- coding: utf-8 -*-

import geoip2.database

def ret_country_iso(ip):
    reader = geoip2.database.Reader('/usr/local/geoip/GeoLite2-Country.mmdb')
    response = reader.country(ip)
    return response.country.iso_code.lower()

result = ret_country_iso("8.8.8.8")
print result
result += "Роман"
print result

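As you can see, I first look up the country of the IP "8.8.8.8" (this returns "us", see below), and then I append a short string containing some Russian characters to it.

Result: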

# ./script.py
us
Traceback (most recent call last):
   File "./script.py", line 12, in <module>
    result += "Роман"
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd0 in position 0: ordinal not in range(128)

Now, if I try the following instead:

#!/usr/bin/python
# -*- coding: utf-8 -*-

result = "us"
print result
result += "Роман"
print result

then everything works fine:

./script.py 
us
usРоман

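Apparently, the 'ret_country_iso()' function returns something different from the literal "us" string, though my Python is too poor to tell what.

How can I fix this?

EDIT: Following snakecharmerb's suggestion, the following works: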

#!/usr/bin/python
# -*- coding: utf-8 -*-

import geoip2.database

def ret_country_iso(ip):
    reader = geoip2.database.Reader('/usr/local/geoip/GeoLite2-Country.mmdb')
    response = reader.country(ip)
    return response.country.iso_code.lower().encode('utf-8')

result = ret_country_iso("8.8.8.8")
print result
result += "Роман"
print result

Answer:

Python 2 does not strictly separate unicode and bytes (str), so the result of concatenating the two types is inconsistent:

u'abc' + 'def'

succeeds, but

u'US' + 'Роман'

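raises an exception, because Python 2 implicitly decodes the str operand with the ASCII codec and the UTF-8 bytes of 'Роман' fall outside that range. The usual approach, the "Unicode sandwich" pattern, is to decode and encode string data at the edges of the application and to use only unicode inside the application (for applications that mainly handle bytes, use the reverse pattern).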
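To make the sandwich idea concrete, here is a minimal sketch. The helper name build_greeting is made up for illustration and is not part of geoip2; the code should run unchanged on Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
# Sketch of the "Unicode sandwich": bytes at the edges, text in the middle.
# build_greeting is a hypothetical helper, not an API of any library.

def build_greeting(country_bytes, name_bytes, encoding='utf-8'):
    # Edge (input): decode incoming bytes to text exactly once.
    country = country_bytes.decode(encoding)
    name = name_bytes.decode(encoding)

    # Middle: only text is handled here, so concatenation cannot
    # trigger an implicit ASCII decode.
    combined = country.lower() + name

    # Edge (output): encode back to bytes once, just before writing.
    return combined.encode(encoding)


raw = build_greeting(b'US', u'Роман'.encode('utf-8'))
print(repr(raw))  # ASCII-safe view of the UTF-8 encoded result
```

All the mixing happens on text; the byte form only exists on the way in and on the way out.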

So, when combining str and unicode instances, you can take either of the following options:

# unicode result
u'US ' + 'Роман'.decode('utf-8')

# str result
u'US '.encode('utf-8') + 'Роман'

But the key is to be consistent throughout the code, otherwise you will end up with a lot of bugs.
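One way to enforce that consistency, as a rough sketch, is to push every incoming value through a single normalizing function before combining anything. The name to_text is hypothetical, and UTF-8 is an assumption about how your byte strings are encoded.

```python
# -*- coding: utf-8 -*-
# to_text is a hypothetical helper: it returns text for both bytes and text input.
# On Python 2, `bytes` is an alias for `str`, so the same check applies there.

def to_text(value, encoding='utf-8'):
    if isinstance(value, bytes):
        # Byte strings are decoded explicitly, never implicitly via ASCII.
        return value.decode(encoding)
    # Already text: return unchanged.
    return value


# Mixed input: a byte string, a text string, and UTF-8 encoded bytes.
parts = [b'us', u'Роман', u'Роман'.encode('utf-8')]
joined = u''.join(to_text(p) for p in parts)
```

After normalization, joined is a single text object regardless of which type each part started as.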

Python 3 is stricter about distinguishing the two types; if possible, you should consider using it for better unicode handling, as Python 2 is no longer supported.
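As a small illustration of that strictness (Python 3 only; the variable names are just for the example), mixing the two types fails immediately with a TypeError instead of depending on what the bytes happen to contain:

```python
# Python 3 only: str (text) and bytes are never mixed implicitly.
text = 'us'
data = 'Роман'.encode('utf-8')  # bytes, e.g. as read from a file or socket

error = None
try:
    text + data  # str + bytes is rejected regardless of content
except TypeError as exc:
    error = exc

# The explicit decode is the only way to combine them.
combined = text + data.decode('utf-8')
```

Because the failure does not depend on the data, the bug shows up on the first run and not only once non-ASCII input arrives.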
