根據串列值從dict中檢索-有解無憂

這是我的資料-

inp = [{'father_husband_mother_name': [['Father s Name', 0.8603670001029968],
   ['Shripati', 0.8603670001029968],
   ['Father s Name', 0.8903670001029969],
   ['Shpppati', 0.8903670001029969]],
  'doc_id': [['GGX2176', 0.8435981869697571],
   ['GGC2176', 0.8835981869697571]],
  'name': [['Elector s Name', 0.8301510810852051],
   ['Shibshankar Ghosh', 0.8301510810852051],
   ['Elector s Name', 0.8501510810852051],
   ['Shibshankar Ghosh', 0.8501510810852051]],
  'date_of_birth': [['Age as on 1.1.2000', 0.8067844915390014],
   ['15', 0.8067844915390014],
   ['Age as on 1.1.2000', 0.8267844915390015],
   ['15', 0.8267844915390015]],
  'gender_sex': [['Sex', 0.7784658074378967],
   ['M', 0.7784658074378967],
   ['Sex', 0.8784658074378967],
   ['M', 0.8784658074378967]]}]

STOPWORDS = ['Sex', 'Father s Name', 'Elector s Name', 'Address', 'Name', 'Gender', 'Mother s Name', 
             'Husband s Name']

我期望的輸出：

{'father_husband_mother_name': 'Shpppati',
 'doc_id': 'GGC2176',
 'name': 'Shibshankar Ghosh',
 'date_of_birth': 'Age as on 1.1.2000,15',
 'gender_sex': 'M'}

這是邏輯 -

檢索每個鍵中不存在的具有最高置信度得分 [float串列串列內部] 的值。STOPWORDS

我試過的 -

def process_kie_dict(voter_raw_labels, threshold=0.7):
    cleaned_dict = {}
    intermediate_dict = {}
    for entity_dict in voter_raw_labels:
        for entity, val in entity_dict.items():
            conf_val = [item[1] for item in val]
            unique_val = list(set(conf_val))
            max_conf = max(unique_val)
            if max_conf > threshold:
                if len(unique_val)==1:
                    add_val = [item[0] for item in val]
                else:
                    max_conf_index = conf_val.index(max_conf)
                    add_val = [item[0] for item in val[max_conf_index:]]

                if entity not in intermediate_dict.keys():
                    intermediate_dict[entity] = [add_val,max_conf]
                else:
                    if intermediate_dict[entity][1] < max_conf:
                        intermediate_dict[entity] = [add_val,max_conf]
    
#     print(intermediate_dict)
    for key, val in intermediate_dict.items():
        final_value = ''
        for value in val[0]:
            m = len(str.strip(value))
            edit_dist_list = []
            for word in STOPWORDS:
                n = len(word)
                edit_dist = editDistDP(value, word, m, n)
                edit_dist_list.append(edit_dist)

            if min(edit_dist_list) < 2:
                value=''
            
            final_value = final_value   value   ','
            clean_value = final_value.strip(",")
        
        cleaned_dict[key]=clean_value
    return cleaned_dict

def editDistDP(str1, str2, m, n):
    # Create a table to store results of subproblems
    dp = [[0 for x in range(n   1)] for x in range(m   1)]
 
    # Fill d[][] in bottom up manner
    for i in range(m   1):
        for j in range(n   1):
 
            # If first string is empty, only option is to
            # insert all characters of second string
            if i == 0:
                dp[i][j] = j    # Min. operations = j
 
            # If second string is empty, only option is to
            # remove all characters of second string
            elif j == 0:
                dp[i][j] = i    # Min. operations = i
 
            # If last characters are same, ignore last char
            # and recur for remaining string
            elif str1[i-1] == str2[j-1]:
                dp[i][j] = dp[i-1][j-1]
 
            # If last character are different, consider all
            # possibilities and find minimum
            else:
                dp[i][j] = 1   min(dp[i][j-1],        # Insert
                                   dp[i-1][j],        # Remove
                                   dp[i-1][j-1])    # Replace
 
    return dp[m][n]

您可以忘記編輯距離的實作，這并不重要。我想知道的是嵌套的 for 回圈，這不會大規模作業。尋找更有效的實施。

uj5u.com熱心網友回復：

這是您的資料的決議器

result = {k: sorted(v, key=lambda x: x[1] if x[0] not in STOPWORDS else 0)[-1][0] for k, v in inp[0].items()}

簡而言之，它需要一個鍵并根據置信度值對字典的其余部分進行排序，除非串列的第一個元素包含在STOPWORDS. 然后將該排序串列的第一個元素result作為值添加到字典中。

轉載請註明出處，本文鏈接：https://www.uj5u.com/shujuku/514368.html

標籤：Python列表表现字典最大限度

上一篇：使用子選擇添加過濾器后SQL查詢失去性能

下一篇：如何使用pygame以不同的速度交替顯示不同位置的物件