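This code uses OCR to read the text from the URLs in the list 'url_list'. I am trying to append the output, a string 'txt', to the empty pandas column 'url_text'. However, the code does not append anything to the 'url_text' column. Why?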
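import cv2
import pandas as pd
import pytesseract
from skimage import io # assumption: io.imread below is skimage's, which can read from a URL
df = pd.read_csv(r'path') # main dataframe
df['url_text'] = "" # create empty column that will later contain the text of the url_image
url_list = (df.iloc[:, 5]).tolist() # convert column with urls to a list
print(url_list)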
['https://pbs.twimg.com/media/ExwMPFDUYAEHKn0.jpg',
'https://pbs.twimg.com/media/ExuBd4-WQAMgTTR.jpg',
'https://pbs.twimg.com/media/ExuBd5BXMAU2-p_.jpg',
' ',
'https://pbs.twimg.com/media/Ext0Np0WYAEUBXy.jpg',
'https://pbs.twimg.com/media/ExsJrOtWUAMgVxk.jpg',
'https://pbs.twimg.com/media/ExrGetoWUAEhOt0.jpg',
' ',
' ']
for img_url in url_list: # loop over all urls in list url_list
    try:
        img = io.imread(img_url) # convert image/url to cv2/numpy.ndarray format
        # Preprocessing of image
        gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        (h, w) = gry.shape[:2]
        gry = cv2.resize(gry, (w*3, h*3))
        thr = cv2.threshold(gry, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]
        txt = pytesseract.image_to_string(thr) # read tweet image text
        df['url_text'].append(txt)
        print(txt)
    except: # ignore any errors. Some of the rows do not contain a URL, causing the loop to fail
        pass
print(df)
A uj5u.com user replied:
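I can't test this, but try the following: you probably need to build a list first and then add it to df as a new column (here I convert the list itself into a DataFrame and then concatenate it with the original df).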
txtlst = []
for img_url in url_list: # loop over all urls in list url_list
    try:
        img = io.imread(img_url) # convert image/url to cv2/numpy.ndarray format
        # Preprocessing of image
        gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        (h, w) = gry.shape[:2]
        gry = cv2.resize(gry, (w*3, h*3))
        thr = cv2.threshold(gry, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]
        txt = pytesseract.image_to_string(thr) # read tweet image text
        txtlst.append(txt)
        print(txt)
    except: # ignore any errors. Some of the rows do not contain a URL, causing the loop to fail
        txtlst.append("") # keep one entry per row so the list stays aligned with df
dftxt = pd.DataFrame({"url_text": txtlst})
df = pd.concat([df, dftxt], axis=1)
print(df)
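One thing to watch for when this is combined with the question's setup: df already has an empty 'url_text' column, so concatenating a second frame with the same column name keeps both columns instead of filling the first one. The following is a minimal sketch with made-up data (the 'url' column name and all values are placeholders, not taken from the question's CSV) that contrasts the concat result with a plain column assignment.

```python
import pandas as pd

# Toy stand-in for the question's dataframe; the values are made up.
df = pd.DataFrame({"url": ["a.jpg", " ", "b.jpg"]})
df["url_text"] = ""  # the empty column created in the question

# Pretend OCR output, one entry per row (placeholder strings).
txtlst = ["first text", "", "second text"]

# Side-by-side concat keeps both columns that are named "url_text".
combined = pd.concat([df, pd.DataFrame({"url_text": txtlst})], axis=1)
print(list(combined.columns))

# Assigning the list to the existing column overwrites it in place.
df["url_text"] = txtlst
print(list(df.columns))
```

If the empty column is never created beforehand, the concat approach yields a single 'url_text' column, provided df still has the default integer index that read_csv produces.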
A uj5u.com user replied:
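As the documentation for Series.append() points out, an append call only works between two Series.
A better approach is to create an empty list outside the loop, append each string to that list inside the loop, and then insert the list into the DataFrame with df["url_list"] = list_of_urls. This is also much faster at run time than repeatedly appending two Series together.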
url_list = []
for ...:
    ...
    url_list.append(url_text)
df["url_list"] = url_list
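The list-then-assign pattern can be sketched end to end as follows. This is only an illustration under stated assumptions: collect_texts and fake_read_text are hypothetical names, and fake_read_text stands in for the image download and pytesseract steps so the sketch runs without network access or an OCR install. Blank cells and failed reads both produce an empty string, which keeps the list the same length as the dataframe. Series.append was also removed in pandas 2.0, so building a list first is the variant that still works on current versions.

```python
import pandas as pd


def collect_texts(urls, read_text):
    """Return one string per URL, using "" for blank cells or failed reads."""
    texts = []
    for url in urls:
        if not url.strip():  # blank cell such as ' '
            texts.append("")
            continue
        try:
            texts.append(read_text(url))
        except Exception:  # narrower than a bare except
            texts.append("")
    return texts


def fake_read_text(url):
    # Hypothetical stand-in for the imread + threshold + image_to_string steps.
    if "broken" in url:
        raise ValueError("could not read image")
    return "text from " + url


# Made-up data; the real frame would come from pd.read_csv.
df = pd.DataFrame({"url": ["one.jpg", " ", "broken.jpg", "two.jpg"]})
df["url_text"] = collect_texts(df["url"].tolist(), fake_read_text)
print(df)
```

To use it with real OCR, pass a function that wraps the preprocessing and pytesseract.image_to_string call from the question in place of fake_read_text.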
