How do I combine consecutive strings based on the values in another list and some conditions?

I have 2 lists:

tokens = ['[CLS]', 'Thinking', 'historically', 'is', ',', 'first', ',', 'an', 'attitude', 'acknowledging', 'that', 'every', 'event', 'can', 'be', 'meaningful', '##ly', 'understood', 'only', 'in', 'relation', 'to', 'previous', 'events', ',', 'and', ',', 'second', ',', 'the', 'method', '##ical', 'application', 'of', 'this', 'attitude', ',', 'which', 'en', '##tails', 'both', 'analyzing', 'events', 'context', '##ually', '-', '-', 'as', 'having', 'occurred', 'in', 'the', 'midst', 'of', 'pre', '-', 'existing', 'circumstances', '-', '-', 'and', 'comprehend', '##ing', 'them', 'from', 'historical', 'actors', '[SEP]', '[PAD]', '[PAD]', '[PAD]']
labels = [0, 0, 0, 0, 0, 2, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 2, 0, 0, 0]

I also have a dictionary that maps the labels to their meanings:

labels_meaning = {}
labels_meaning[0] = 'subject'
labels_meaning[1] = 'relation'
labels_meaning[2] = 'object'
labels_meaning[3] = 'na'

The goal is to put each string into its corresponding labels_list (ignoring na):

subjects = []
relations = []
objects = []

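There are 3 conditions:

1. Tokens with consecutive identical labels (e.g. 0, 0, 0) should be combined into one string. For example, the first 5 labels are 0, so the first string should be "[CLS] Thinking historically is ,", and it should be appended to the corresponding labels_list: subjects.append(string)
2. If a token contains the string "##", it should be joined to the previous token without a space. For example, "meaningful", "##ly" --> "meaningfully". This assumes both tokens have the same label. Otherwise, the "##" should be removed and the string appended to the corresponding labels_list: subjects.append("ly")
3. Some tokens should be ignored: [CLS], [SEP], [PAD]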

Update:

Adding my attempt; I'm stuck on combining the consecutive tokens.

labels_meaning = {}
labels_meaning[0] = 'subject'
labels_meaning[1] = 'relation'
labels_meaning[2] = 'object'
labels_meaning[3] = 'na'
ignore = ['[CLS]', '[SEP]', '[PAD]']

def get_sentence_triples_from_token_labels(tokens, token_labels):
    for tok, label in zip(tokens, token_labels):
        current_label = label
        if tok == '[CLS]': # initialize
            previous_label = current_label
            prev = False
            current_string = ''
        if tok not in ignore:
            if previous_label != current_label and prev==True:
                current_string = f'{tok} ' 
                pass
                
            else:
                pass
            
            prev = True


        break


get_sentence_triples_from_token_labels(tokens, labels)

A helpful uj5u.com user replied:

Solution

I'm not sure whether this is what you want.

labels_meaning = { 0:'subject', 1:'relation', 2:'object', 3:'na' }
ignore = ['[CLS]', '[SEP]', '[PAD]']


tokens = ['[CLS]', 'Thinking', 'historically', 'is', ',', 'first', ',', 'an', 'attitude', 'acknowledging', 'that', 'every', 'event', 'can', 'be', 'meaningful', '##ly', 'understood', 'only', 'in', 'relation', 'to', 'previous', 'events', ',', 'and', ',', 'second', ',', 'the', 'method', '##ical', 'application', 'of', 'this', 'attitude', ',', 'which', 'en', '##tails', 'both', 'analyzing', 'events', 'context', '##ually', '-', '-', 'as', 'having', 'occurred', 'in', 'the', 'midst', 'of', 'pre', '-', 'existing', 'circumstances', '-', '-', 'and', 'comprehend', '##ing', 'them', 'from', 'historical', 'actors', '[SEP]', '[PAD]', '[PAD]', '[PAD]']
labels = [0, 0, 0, 0, 0, 2, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 2, 0, 0, 0]

subjects = []
relations = []
objects = []

def get_sentence_triples_from_token_labels(tokens, token_labels):
    # Label 3 ('na') has no target list, so its runs are dropped.
    targets = {0: subjects, 1: relations, 2: objects}
    run = []           # tokens of the current run of equal labels
    last_label = None  # label of that run

    def flush():
        # Join the finished run; a "##" piece attaches to the token before it.
        if run and last_label in targets:
            text = " ".join(run).replace(" ##", "")
            # A run that starts with a "##" piece just loses the marker.
            targets[last_label].append(text[2:] if text.startswith("##") else text)

    for tok, label in zip(tokens, token_labels):
        if tok in ignore:
            continue
        if label != last_label:
            flush()
            run = []
        run.append(tok)
        last_label = label
    flush()  # the last run is still pending after the loop

Test

get_sentence_triples_from_token_labels(tokens, labels)
print(subjects)
print(relations)
print(objects)

Output:

['Thinking historically is ,', 'attitude', 'that every event can be meaningfully understood only', 'relation', 'previous events ,', ', second', 'methodical', 'attitude', 'entails', 'events contextually -', 'as having occurred in', 'circumstances -', 'comprehend', 'them from', 'actors']
[', an', 'acknowledging', 'in', 'to', 'and', ', the', 'application of this', ', which', 'both analyzing', '-', 'the midst of pre - existing', '- and', 'ing', 'historical']
['first']
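
For comparison, here is a minimal sketch of the same grouping written with itertools.groupby. It returns a dict keyed by label instead of filling module-level lists. The function name group_tokens_by_label and the short sample lists are assumptions made for illustration; they are not part of the original question or answer. The sketch also assumes that ignored tokens should be dropped before runs are formed, so they never split a run.

```python
from itertools import groupby

IGNORE = {"[CLS]", "[SEP]", "[PAD]"}


def group_tokens_by_label(tokens, labels, ignore=IGNORE):
    """Return {label: [joined strings]} for runs of equal labels (hypothetical helper)."""
    # Drop special tokens first so they never appear in, or split, a run.
    kept = [(tok, lab) for tok, lab in zip(tokens, labels) if tok not in ignore]
    groups = {}
    for lab, run in groupby(kept, key=lambda pair: pair[1]):
        text = ""
        for tok, _ in run:
            if tok.startswith("##"):
                text += tok[2:]      # word piece: glue on without a space
            elif text:
                text += " " + tok    # ordinary token after the first one
            else:
                text = tok           # first token of the run
        groups.setdefault(lab, []).append(text)
    return groups


# Shortened sample, made up from a few of the tokens in the question.
sample_tokens = ["[CLS]", "meaningful", "##ly", "understood", "in",
                 "comprehend", "##ing", "[SEP]"]
sample_labels = [0, 0, 0, 0, 1, 0, 1, 2]
result = group_tokens_by_label(sample_tokens, sample_labels)
print(result)
# {0: ['meaningfully understood', 'comprehend'], 1: ['in', 'ing']}
```

With the full lists from the question, result.get(0, []), result.get(1, []) and result.get(2, []) would play the roles of subjects, relations and objects.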

If you repost this, please credit the source. Link to this article: https://www.uj5u.com/qiye/517889.html

Tags: Python, string, list, conditional statements
