How do I convert polygon coordinates to rectangles (YOLO format) for image annotation?

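I am trying to read water meter readings via OCR, and my first step is to find the ROI. I found a dataset on Kaggle with labeled ROI data. However, the regions are polygons rather than rectangles: some have 5 points, some have 8, depending on the image. How do I convert them to YOLO format? For example: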

file name | value | coordinates

id_53_value_595_825.jpg 595.825 {'type': 'polygon', 'data': [{'x': 0.30788, 'y': 0.30207}, {'x': 0.30676, 'y': 0.32731}, {'x': 0.53501, 'y': 0.33068}, {'x': 0.53445, 'y': 0.33699}, {'x': 0.56529, 'y': 0.33741}, {'x': 0.56697, 'y': 0.29786}, {'x': 0.53501, 'y': 0.29786}, {'x': 0.53445, 'y': 0.30417}]}

id_553_value_65_475.jpg 65.475 {'type': 'polygon', 'data': [{'x': 0.26133, 'y': 0.24071}, {'x': 0.31405, 'y': 0.23473}, {'x': 0.31741, 'y': 0.26688}, {'x': 0.30676, 'y': 0.26763}, {'x': 0.33985, 'y': 0.60851}, {'x': 0.29386, 'y': 0.61449}]}

id_407_value_21_86.jpg 21.86 {'type': 'polygon', 'data': [{'x': 0.27545, 'y': 0.19134}, {'x': 0.37483, 'y': 0.18282}, {'x': 0.38935, 'y': 0.76071}, {'x': 0.28185, 'y': 0.76613}]}

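I know that for YOLO format I need xmin, ymin, xmax, and ymax so that I can calculate the width and height, but I can't parse the data. Can anyone help?

Thank you.

Edit: In the end, it worked. If anyone is struggling to convert the CSV file from https://www.kaggle.com/datasets/tapakah68/yandextoloka-water-meters-dataset to YOLO format, here is my code snippet for creating a text file for each image.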

import ast

import pandas as pd


def converttoyolo(csv_file):
  df = pd.read_csv(csv_file)
  for i in range(len(df)):
    df_row = df.iloc[i]  # get each row

    photo_name = df_row['photo_name']  # image column
    stem = photo_name.rsplit('.', 1)[0]  # to get name for text file

    # the 'location' column stores a dict as a string, so parse it
    location = ast.literal_eval(df_row['location'])

    # put each x and y of the polygon in separate lists
    x = [point['x'] for point in location['data']]
    y = [point['y'] for point in location['data']]

    max_x = max(x)
    max_y = max(y)  # yolo conversion, check answer below
    min_x = min(x)
    min_y = min(y)

    width = max_x - min_x
    height = max_y - min_y
    center_x = min_x + (width / 2)
    center_y = min_y + (height / 2)

    # put in text files, one per image, using the field order from the
    # answer below: class id, center x, center y, width, height.
    # Class id 0 is an assumption (a single "ROI" class).
    out_path = '/content/drive/MyDrive/yolo/custom_data/jpeg/' + stem + '.txt'
    with open(out_path, 'w') as file:
      file.write(f'0 {center_x} {center_y} {width} {height}\n')


converttoyolo(csv_file)  # csv_file: path to the dataset's CSV file

A uj5u.com user replied:

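You need to create a contour (a list of points) for each shape. Once you have that, call cv::boundingRect() to turn each contour into a single bounding rectangle. With the rectangle in hand, you can work out X, Y, W, and H. But because the YOLO format uses CX and CY (the center of the box) rather than X and Y, you then need to do this: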

CX = X + W/2.0
CY = Y + H/2.0
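
The same bounding rectangle can also be obtained without OpenCV by taking the minimum and maximum of the polygon's coordinates. The following is a minimal Python sketch of that idea, not a definitive implementation; the function name polygon_to_center_box is hypothetical, and the sample points are copied from the id_407_value_21_86.jpg row in the question.

```python
def polygon_to_center_box(points):
    """Return (cx, cy, w, h) of the axis-aligned box enclosing a polygon.

    `points` is a list of {'x': ..., 'y': ...} dicts, as in the question.
    """
    xs = [p['x'] for p in points]
    ys = [p['y'] for p in points]
    x, y = min(xs), min(ys)          # top-left corner (X, Y)
    w, h = max(xs) - x, max(ys) - y  # width and height (W, H)
    return x + w / 2.0, y + h / 2.0, w, h


# Polygon from the id_407_value_21_86.jpg row in the question.
polygon = [
    {'x': 0.27545, 'y': 0.19134},
    {'x': 0.37483, 'y': 0.18282},
    {'x': 0.38935, 'y': 0.76071},
    {'x': 0.28185, 'y': 0.76613},
]
cx, cy, w, h = polygon_to_center_box(polygon)
print(cx, cy, w, h)
```

Because only the extreme coordinates matter, this works the same way for polygons with 4, 5, or 8 points.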

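Finally, you must normalize all 4 values. The YOLO format is space-separated, and the first value is the integer class ID. So if "dog" is your second class (and therefore id #1, because IDs are zero-based), you would output: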

1 0.234 0.456 0.123 0.111

...where the 4 coordinates are:

CX / image width
CY / image height
W / image width
H / image height
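
As a rough illustration of the normalization step, the sketch below formats a pixel-space rectangle as one YOLO annotation line. The function name to_yolo_line, the six-decimal formatting, and the example box and image size are all assumptions made for this example.

```python
def to_yolo_line(class_id, x, y, w, h, image_width, image_height):
    """Format a pixel-space box (top-left X, Y plus W, H) as a YOLO line."""
    cx = (x + w / 2.0) / image_width    # normalized center x
    cy = (y + h / 2.0) / image_height   # normalized center y
    nw = w / image_width                # normalized width
    nh = h / image_height               # normalized height
    return f"{class_id} {cx:.6f} {cy:.6f} {nw:.6f} {nh:.6f}"


# Hypothetical example: a 200x100 px box at (100, 50) in a 1000x500 px image.
line = to_yolo_line(1, 100, 50, 200, 100, 1000, 500)
print(line)  # prints "1 0.200000 0.200000 0.200000 0.200000"
```

The polygon coordinates shown in the question all lie between 0 and 1, which suggests they are already normalized. If that holds for the whole dataset, the division by image width and height would not be needed there.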

For more worked examples of the math, see the Darknet/YOLO FAQ: https://www.ccoderun.ca/programming/darknet_faq/#darknet_annotations
