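Remove duplicates by key from a nested list of custom dictionary objects

I have a nested list of objects called "words". It consists of objects of a class that holds the data conf (float), end (float), start (float) and word (string). I want to remove the repeated objects that have the same "word".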

class Word:
    ''' A class representing a word from the JSON format for vosk speech recognition API '''

    def __init__(self, dict):
        '''
        Parameters:
          dict (dict) dictionary from JSON, containing:
            conf (float): degree of confidence, from 0 to 1
            end (float): end time of the pronouncing the word, in seconds
            start (float): start time of the pronouncing the word, in seconds
            word (str): recognized word
        '''

        self.conf = dict["conf"]
        self.end = dict["end"]
        self.start = dict["start"]
        self.word = dict["word"]

    def to_string(self):
        ''' Returns a string describing this instance '''
        return "{:20} from {:.2f} sec to {:.2f} sec, confidence is {:.2f}%".format(
            self.word, self.start, self.end, self.conf*100)


    def compare(self, other):
        if self.word == other.word:
            return True
        else:
            return False

I tried this, but could not get it to work:

 nr_words = []
 c = custom_Word.Word({'conf': 0.0, 'end': 0.00, 'start': 0.00, 'word': ''}) 
 nr_words.append(c)

 for w in words:
     for nr in nr_words:
         if w.compare(nr_words[nr]):
             print("same")
         else:
             print("not same")
             nr_words.append(w.word)
             nr_words.append(w.start)
             nr_words.append(w.end)

This is the collection of objects.

Each object contains data like this:

{'conf': 0.0, 'end': 0.00, 'start': 0.00, 'word': 'hello'} 

{'conf': 0.0, 'end': 1.00, 'start': 0.00, 'word': 'hello'} 

{'conf': 0.0, 'end': 2.00, 'start': 0.00, 'word': 'to'}

The compare function in my "Word" class works perfectly:

words[0].compare(words[1])
True

I also tried it this way:

for i in range(0,len(words)):
    for o in range(0,len(nr_words)):
        if words[i].compare(nr_words[o]):
            print("same")
        else:
            print("not same")
            nr_words.append(w.word)
            nr_words.append(w.start)
            nr_words.append(w.end)

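But I get the error "AttributeError: 'str' object has no attribute 'word'".

I am not sure what is wrong with the attribute word. Could some kind soul guide me on how to remove the duplicate objects by "word"? Thanks in advance!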

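A uj5u.com user replied:

Answer:

This builds the opposite of the list you have now: we currently keep only the list of unique words, whereas the repeated-words list will contain the words that occur more than once.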

frequency = {}

for w in words:
    if frequency.get(w.word, False):
        frequency[w.word].append(w)
    else:
        frequency[w.word] = [w]

repeated_words_list = []
for key in frequency:
    if len(frequency[key]) > 1:
        repeated_words_list.extend(frequency[key])

# 'repeated_words_list' is now a list containing
# all the Word objects whose `word` attribute
# appears 2 times or more.
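
The same grouping can be written a little more compactly with `collections.defaultdict`. The following is only a sketch: the simplified `Word` class and the sample data are stand-ins modelled on the question, not part of the original answer.

```python
from collections import defaultdict


class Word:
    """Simplified stand-in for the Word class from the question."""

    def __init__(self, data):
        self.conf = data["conf"]
        self.end = data["end"]
        self.start = data["start"]
        self.word = data["word"]


# Sample data modelled on the dictionaries shown in the question.
words = [
    Word({"conf": 0.0, "end": 0.0, "start": 0.0, "word": "hello"}),
    Word({"conf": 0.0, "end": 1.0, "start": 0.0, "word": "hello"}),
    Word({"conf": 0.0, "end": 2.0, "start": 0.0, "word": "to"}),
]

# Group the Word objects by their `word` attribute.
groups = defaultdict(list)
for w in words:
    groups[w.word].append(w)

# Keep every object whose word occurs at least twice.
repeated_words_list = [
    w for group in groups.values() if len(group) > 1 for w in group
]

print([(w.word, w.end) for w in repeated_words_list])
```

With this sample data, only the two "hello" objects end up in `repeated_words_list`; "to" occurs once and is left out.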

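A uj5u.com user replied:

(First answer:
Looking at this code, I guess nr_words is a list.
Can you specify what nr_words represents? Is it something like a list of "already seen" words?

I also see that you print nr.word, so I suppose nr_words is a list of Word objects.

However, the second for loop iterates over the values of the nr_words list (the Word objects), not over its indices.
So when you compare the two Word objects on line 4, I think you should simply pass nr as the other argument of compare(), rather than nr_words[nr].
)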
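
To illustrate the point about values versus indices, here is a small sketch that uses plain strings instead of Word objects. The list contents are made up for the example.

```python
nr_words = ["hello", "to"]

# Iterating over a list yields its elements, not their positions.
elements = [nr for nr in nr_words]

# enumerate() yields (position, element) pairs when an index is really needed.
pairs = [(i, nr) for i, nr in enumerate(nr_words)]

print(elements)
print(pairs)
```

Writing `nr_words[nr]` inside such a loop would use an element as an index, which raises a TypeError because list indices must be integers or slices.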

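Edit:
In reply to your comment:

nr_words is a kind of empty list, so that I can compare it with the dictionary and append the non-repeated words to nr_words. I also tried passing nr as you said, but got the error AttributeError: 'str' object has no attribute 'word'

The error occurs because, when two words are not the same, you append w.word, w.start and w.end to the nr_words list (a string, a float and a float, respectively). Try appending only the Word object, as follows:

Corrected code: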

filtered_list = []

for w in words:
    already_seen = False
    for seen in filtered_list:
        # print(seen.word)
        if w.compare(seen):
            already_seen = True
    if not already_seen:
        filtered_list.append(w)

# now filtered_list is the list
# of all your words without the duplicates
# (based on the `word` attribute)
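
For longer lists, the inner loop can be avoided by remembering the words already seen in a set. This is an alternative sketch rather than the code from the answer; the simplified `Word` class, the sample data and the name `seen_words` are assumptions made for the example.

```python
class Word:
    """Simplified stand-in for the Word class from the question."""

    def __init__(self, data):
        self.conf = data["conf"]
        self.end = data["end"]
        self.start = data["start"]
        self.word = data["word"]


# Sample data modelled on the dictionaries shown in the question.
words = [
    Word({"conf": 0.0, "end": 0.0, "start": 0.0, "word": "hello"}),
    Word({"conf": 0.0, "end": 1.0, "start": 0.0, "word": "hello"}),
    Word({"conf": 0.0, "end": 2.0, "start": 0.0, "word": "to"}),
]

seen_words = set()   # `word` strings encountered so far
filtered_list = []   # first Word object for each distinct `word`

for w in words:
    if w.word not in seen_words:
        seen_words.add(w.word)
        filtered_list.append(w)  # append the Word object itself, not its fields

print([(w.word, w.end) for w in filtered_list])
```

This keeps the first object for each word and preserves the original order. Set lookups take roughly constant time, so the whole pass is linear in the number of words.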


Tags: Python, list, dictionary, duplicates
