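I am using Python OCR to convert images to text and checking the results for duplicates. I want to check them one by one so that a duplicate is easier to locate.
For example: listA = [1, 2, 3, 4, 4, 5, 6]
So when I append to listA, it should show that 4 is a duplicate.
Problem: my list "listOfElems" is empty, and I want to save the text and detect duplicates in the list one by one.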
    from PIL import Image
    import pytesseract
    import cv2
    import numpy as np
    from os import listdir
    from os.path import isfile, join

    mypath = "/home/DC_ton/desktop/test_11_8/output02"
    onlyfiles = [f for f in listdir(mypath) if isfile(join(mypath, f))]
    print(onlyfiles)

    i = 1
    listOfElems = []
    Number_of_onlyfiles = len(onlyfiles)
    while i < Number_of_onlyfiles :
        each_file_path = '/home/DC_ton/desktop/test_11_8/output02/' + onlyfiles[i]
        image = Image.open(each_file_path)
        text = pytesseract.image_to_string(image, lang='eng')
        print(text)
        for text in listOfElems:
            if text not in listOfElems:
                listOfElems.append(text)
            else:
                print("here get duplicate")
        i += 1
    print(listOfElems)

    newlist = []
    duplist = []

    def checkIfDuplicates_1(listOfElems):
        ''' Check if given list contains any duplicates '''
        if len(listOfElems) == len(set(listOfElems)):
            return False
        else:
            return True

    result = checkIfDuplicates_1(listOfElems)
    if result:
        print('Yes, list contains duplicates')
    else:
        print('No duplicates found in list')

    for k in listOfElems:
        if k not in newlist:
            newlist.append(k)
        else:
            duplist.append(k)
    print("List of duplicates", duplist)
- Output:
My list "listOfElems" is empty; I want to compare the items one by one.
['final_output_11.png', 'final_output_6.png', 'final_output_17.png', 'final_output_8.png', 'final_output_15.png', 'final_output_14.png', 'final_output_2.png', 'final_output_12.png', 'final_output_21.png', 'final_output_3.png', 'final_output_24.png', 'final_output_18.png', 'final_output_19.png', 'final_output_10.png', 'final_output_29.png', 'final_output_9.png', 'final_output_20.png', 'final_output_7.png', 'final_output_31.png', 'final_output_30.png', 'final_output_25.png', 'final_output_1.png', 'final_output_16.png', 'final_output_5.png', 'final_output_27.png', 'final_output_13.png', 'final_output_28.png', 'final_output_4.png', 'final_output_23.png', 'final_output_26.png', 'final_output_22.png']
CA7T4B2
CAT7T4BF
CAT4B8
CAT4BE
CAT4C4
CAT4C1
CAT4B7
CA7T4CB
CAT4cs
CAT4B4
CAT4BA
CAT7T4BC
CA74B9
CAT4BD
(CAT4AF
CAT4CA
[]
No duplicates found in list
List of duplicates []
Image link: I can check the "whole set" for duplicates, I just don't know how to do it one by one.
https://imgur.com/a/RGUumoy
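I searched for similar cases but could not adapt them to mine, so I still need help. Related question: How to get array randomly in order of array in Python
A uj5u.com user replied:
You create an empty list, never add anything to it, and then iterate over it (which does nothing):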
    i = 1
    listOfElems = []  # <- empty
    Number_of_onlyfiles = len(onlyfiles)
    while i < Number_of_onlyfiles :
        each_file_path = '/home/DC_ton/desktop/test_11_8/output02/' + onlyfiles[i]
        image = Image.open(each_file_path)
        text = pytesseract.image_to_string(image, lang='eng')
        print(text)
        for text in listOfElems:  # <- still empty
            if text not in listOfElems:
                listOfElems.append(text)
            else:
                print("here get duplicate")
        i += 1
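The effect is easy to reproduce in isolation. A minimal sketch (the names `seen` and `body_runs` are made up for this illustration): a `for` loop over an empty list never executes its body, so an `append` placed inside that loop can never run.

```python
# Hypothetical stand-alone illustration; `seen` plays the role of listOfElems.
seen = []
body_runs = 0

for item in seen:          # `seen` is empty, so the body is skipped entirely
    body_runs += 1
    if item not in seen:   # would always be False anyway: item came from `seen`
        seen.append(item)

print(body_runs)  # 0
print(seen)       # []
```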
The simple solution is to add the current element to the list if it is not already there. Like this:
    while i < Number_of_onlyfiles :
        each_file_path = '/home/DC_ton/desktop/test_11_8/output02/' + onlyfiles[i]
        image = Image.open(each_file_path)
        text = pytesseract.image_to_string(image, lang='eng')
        print(text)
        if text not in listOfElems:
            listOfElems.append(text)
        else:
            print("Duplicate")
        i += 1  # advance to the next file, otherwise the loop never ends
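Also note that indexing starts at 0, so i should be 0 at the start. You also don't need to loop over the list to check whether an element is in it; just use the "in" operator.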
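As a sketch of the one-by-one check, the helper below (a hypothetical name, `split_duplicates`, not part of the original code) walks a list in order and records each repeat as soon as it is seen. It strips surrounding whitespace from strings first, on the assumption that OCR output may carry trailing newlines that would otherwise make identical readings compare unequal.

```python
def split_duplicates(items):
    """Return (unique, duplicates), each in the order encountered."""
    seen = set()        # set gives a fast membership test
    unique = []
    duplicates = []
    for raw in items:
        # Assumption: only strings need whitespace normalisation.
        item = raw.strip() if isinstance(raw, str) else raw
        if item in seen:
            duplicates.append(item)
        else:
            seen.add(item)
            unique.append(item)
    return unique, duplicates


unique, duplicates = split_duplicates([1, 2, 3, 4, 4, 5, 6])
print(unique)      # [1, 2, 3, 4, 5, 6]
print(duplicates)  # [4]
```

A set is used for the membership test only; the two lists keep the original order, which a plain `set(items)` would not guarantee.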
You can also save a few lines by iterating over onlyfiles directly:
    for file in onlyfiles:
        # join() inserts the path separator that mypath lacks at its end
        file_path = join(mypath, file)
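Putting the pieces together, a hedged sketch of the whole loop might look like the following. The function name `collect_texts` and its `read_text` parameter are assumptions introduced here so that the OCR step can be swapped out; the file names and strings in `fake_ocr` are made-up stand-ins, not real results.

```python
from os.path import join


def collect_texts(folder, filenames, read_text):
    """Read each file once; return (unique_texts, duplicate_pairs).

    `read_text` is any callable that takes a path and returns a string.
    """
    unique_texts = []
    duplicate_pairs = []            # (filename, text) for every repeat
    for name in filenames:
        text = read_text(join(folder, name)).strip()
        if text in unique_texts:
            duplicate_pairs.append((name, text))
        else:
            unique_texts.append(text)
    return unique_texts, duplicate_pairs


# Stand-in for OCR so the sketch runs without any images (made-up data).
fake_ocr = {
    join("imgs", "a.png"): "CAT4B8\n",
    join("imgs", "b.png"): "CAT4BE\n",
    join("imgs", "c.png"): "CAT4B8\n",
}
texts, dups = collect_texts("imgs", ["a.png", "b.png", "c.png"], fake_ocr.get)
print(texts)  # ['CAT4B8', 'CAT4BE']
print(dups)   # [('c.png', 'CAT4B8')]
```

With the question's setup, `read_text` could be something like `lambda p: pytesseract.image_to_string(Image.open(p), lang='eng')`, and `folder` and `filenames` would be `mypath` and `onlyfiles`.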
