roi = [[(284, 764), (996, 840), 'text', 'name'],
       [(1560, 756), (2312, 836), 'text', 'cnic'],
       [(2000, 704), (2060, 748), 'box', 'corporate'],
       [(2296, 696), (2360, 756), 'box', 'individual'],
       [(1220, 844), (2360, 920), 'text', 'email']]
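The list above holds the regions I process. If a region is 'text' I run tesseract on it; if it is a 'box' I get '0' or '1'. I want to save the results into a DataFrame that can then be saved to Excel. The desired output takes its column headers from the last item of each entry in 'roi' above, and its values from the tesseract output and the box values (1 or 0).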
# Assumed imports; the aliases cv, pd and tess are inferred from the calls below
import os
import cv2 as cv
import pandas as pd
import pytesseract as tess

myPicList = os.listdir(sof_folder)
for j, y in enumerate(myPicList):
    if 'SOF' in y:
        img = cv.imread(sof_folder + "/" + y)
        df = pd.DataFrame()
        pixelThreshold = 1100
        for x, r in enumerate(roi):
            # Crop the region: rows are y, columns are x
            section = img[r[0][1]:r[1][1], r[0][0]:r[1][0]]
            if len(df.columns) < len(roi):
                if r[2] == 'text':
                    df[r[3]] = tess.image_to_string(section)
                if r[2] == 'box':
                    imgGray = cv.cvtColor(section, cv.COLOR_BGR2GRAY)
                    imgThresh = cv.threshold(imgGray, 170, 255, cv.THRESH_BINARY_INV)[1]
                    totalPixels = cv.countNonZero(imgThresh)
                    if totalPixels > pixelThreshold: totalPixels = 1
                    else: totalPixels = 0
                    df[r[3]] = totalPixels
        df.to_excel('forms saved.xlsx')
However, the output contains only the column names (i.e. name, cnic, email, etc.) and no values.
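A likely reason for the empty output, shown here as a minimal sketch rather than a confirmed diagnosis: a DataFrame created with `pd.DataFrame()` has zero rows, so assigning a scalar to a new column has nothing to broadcast into and only the column label is created. The string below is a placeholder standing in for the tesseract output.

```python
import pandas as pd

# An empty DataFrame has no rows, so a scalar has nothing to fill.
empty = pd.DataFrame()
empty["name"] = "some OCR text"    # placeholder for tess.image_to_string(section)
rows_after_scalar = len(empty)     # the column label exists, but there are no rows

# Wrapping the value in a list gives the frame a length of one row.
fixed = pd.DataFrame()
fixed["name"] = ["some OCR text"]
rows_after_list = len(fixed)

print(rows_after_scalar, rows_after_list)
```

Once the frame has one row, later scalar assignments such as a 0 or 1 box value are broadcast into that row.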
A shorter version of the code makes the problem easier to see:
for x, r in enumerate(roi):
    section = img[r[0][1]:r[1][1], r[0][0]:r[1][0]]
    if r[2] == 'text':
        df[r[3]] = tess.image_to_string(section)
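For readers unsure about the slice, here is a small sketch of how one `roi` entry maps onto the image array. It uses a synthetic blank NumPy array in place of a real scan (the page size is an assumption), since `cv.imread` returns an array indexed as rows (y) first and columns (x) second.

```python
import numpy as np

# Synthetic blank "page": height 1000, width 2400, 3 colour channels (assumed size)
img = np.zeros((1000, 2400, 3), dtype=np.uint8)

r = [(284, 764), (996, 840), 'text', 'name']
(x1, y1), (x2, y2) = r[0], r[1]    # top-left and bottom-right corners

# Same slice as in the loop: y range first, then x range
section = img[y1:y2, x1:x2]
print(section.shape)
```

The crop has height `y2 - y1` and width `x2 - x1`.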
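I tried two solutions from here, but neither worked for me. The second solution does not work at all, and the first gives odd output: each row contains only one tesseract output, and only the output of the last image is kept.
My edited code is as follows: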
d1 = {}
d = {}
results = []
df = pd.DataFrame(data=d1)
for x, r in enumerate(roi):
    section = img[r[0][1]:r[1][1], r[0][0]:r[1][0]]
    if len(df.columns) < len(roi):
        if r[2] == 'text':
            # df[r[3]] = tess.image_to_string(section)
            readings = tess.image_to_string(section)
            d = {r[3]: [readings]}
            df = pd.DataFrame(data=d)
            results.append(df)
        if r[2] == 'box':
            imgGray = cv.cvtColor(section, cv.COLOR_BGR2GRAY)
            imgThresh = cv.threshold(imgGray, 170, 255, cv.THRESH_BINARY_INV)[1]
            totalPixels = cv.countNonZero(imgThresh)
            if totalPixels > pixelThreshold: totalPixels = 1
            else: totalPixels = 0
            df[r[3]] = totalPixels
            d = {r[3]: [totalPixels]}
            df = pd.DataFrame(data=d)
            results.append(df)
final_df = pd.concat(results, axis=0)
final_df.to_csv("final.csv")
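The "one value per row" output is consistent with how `pd.concat` stacks frames: with `axis=0` each single-column frame becomes its own row and the other columns are filled with NaN. The sketch below uses hypothetical values in place of the real OCR results and compares it with `axis=1`, which places the frames side by side in one row.

```python
import pandas as pd

# Hypothetical single-column frames, standing in for the per-region frames in results
results = [
    pd.DataFrame({"name": ["Alice"]}),
    pd.DataFrame({"corporate": [1]}),
    pd.DataFrame({"email": ["a@example.com"]}),
]

stacked = pd.concat(results, axis=0)        # one row per frame, gaps filled with NaN
side_by_side = pd.concat(results, axis=1)   # one row, one column per frame

print(stacked.shape, side_by_side.shape)
```

If the file is also written inside the per-image loop under a fixed name, each image would overwrite the previous one, which may explain why only the last image survives.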
Answer from a uj5u.com user:
df = pd.DataFrame()
for j, y in enumerate(myPicList):
    if 'SOF' in y:
        # Start a new CSV row with the file name
        with open('dataOutput.csv', 'a+') as f:
            f.write(y + ',')
        img = cv.imread(sof_folder + "/" + y)
        pixelThreshold = 1100
        myData = []
        for x, r in enumerate(roi):
            section = img[r[0][1]:r[1][1], r[0][0]:r[1][0]]
            if len(df.columns) < len(roi):
                if r[2] == 'text':
                    text = tess.image_to_string(section)
                    text = text.replace("\n", " ")
                    myData.append(text)
                if r[2] == 'box':
                    imgGray = cv.cvtColor(section, cv.COLOR_BGR2GRAY)
                    imgThresh = cv.threshold(imgGray, 170, 255, cv.THRESH_BINARY_INV)[1]
                    totalPixels = cv.countNonZero(imgThresh)
                    if totalPixels > pixelThreshold: totalPixels = 1
                    else: totalPixels = 0
                    myData.append(totalPixels)
        # Append the collected values for this image and end the row
        with open('dataOutput.csv', 'a+') as f:
            for data in myData:
                f.write((str(data) + ','))
            f.write('\n')
This does not store the data in a DataFrame exactly, but I hope it works for you.
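If a DataFrame is still wanted, one possible approach (a sketch, not the answerer's method) is to collect one dict per image and build the frame once at the end. The helper `rows_to_frame`, the callback `read_field`, the stub `fake_read_field`, and the file names are all hypothetical; the stub stands in for the tesseract and pixel-count logic so the example runs without OpenCV or Tesseract installed.

```python
import pandas as pd

roi = [[(284, 764), (996, 840), 'text', 'name'],
       [(2000, 704), (2060, 748), 'box', 'corporate']]

def rows_to_frame(file_names, roi, read_field):
    """Build one row per file; read_field(file_name, region) returns that cell's value."""
    rows = []
    for file_name in file_names:
        row = {"file": file_name}
        for region in roi:
            row[region[3]] = read_field(file_name, region)   # region[3] is the column header
        rows.append(row)
    return pd.DataFrame(rows)

def fake_read_field(file_name, region):
    # Hypothetical stand-in for the OCR / checkbox code in the loop above
    return "text from " + file_name if region[2] == 'text' else 1

df = rows_to_frame(["SOF_1.jpg", "SOF_2.jpg"], roi, fake_read_field)
print(df)
```

The resulting frame has one row per image and could then be written with `df.to_excel('forms saved.xlsx', index=False)`, which needs an Excel writer such as openpyxl to be installed.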
