I have two lists:
tokens = ['[CLS]', 'Thinking', 'historically', 'is', ',', 'first', ',', 'an', 'attitude', 'acknowledging', 'that', 'every', 'event', 'can', 'be', 'meaningful', '##ly', 'understood', 'only', 'in', 'relation', 'to', 'previous', 'events', ',', 'and', ',', 'second', ',', 'the', 'method', '##ical', 'application', 'of', 'this', 'attitude', ',', 'which', 'en', '##tails', 'both', 'analyzing', 'events', 'context', '##ually', '-', '-', 'as', 'having', 'occurred', 'in', 'the', 'midst', 'of', 'pre', '-', 'existing', 'circumstances', '-', '-', 'and', 'comprehend', '##ing', 'them', 'from', 'historical', 'actors', '[SEP]', '[PAD]', '[PAD]', '[PAD]']
labels = [0, 0, 0, 0, 0, 2, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 2, 0, 0, 0]
I also have a dictionary that maps the labels to their meanings:
labels_meaning = {}
labels_meaning[0] = 'subject'
labels_meaning[1] = 'relation'
labels_meaning[2] = 'object'
labels_meaning[3] = 'na'
The goal is to put each string into the list that corresponds to its label (ignoring na):
subjects = []
relations = []
objects = []
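There are 3 conditions:
- Tokens with consecutive identical labels (for example 0, 0, 0) should be combined into a single string. For example, the first 5 labels are 0, so the first string should be "[CLS] Thinking historically is ,", and it should be appended to the corresponding list: subjects.append(string)
- If a token contains the string "##", it should be joined to the previous string without a space, for example "meaningful", "##ly" --> "meaningfully". This assumes both tokens have the same label. Otherwise the "##" should be removed and the string appended to the corresponding list: subjects.append("ly")
- Some tokens should be ignored: [CLS], [SEP], [PAD]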
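To make the second condition concrete, here is a minimal sketch of only the "##" handling for tokens that already share one label. The helper name `join_wordpieces` is hypothetical and is not part of any library.

```python
def join_wordpieces(pieces):
    """Join tokens with spaces, gluing '##' continuations onto the previous token."""
    words = []
    for piece in pieces:
        if piece.startswith("##"):
            if words:
                # Continuation of the previous token: attach without a space.
                words[-1] += piece[2:]
            else:
                # Nothing to attach to: just drop the '##' prefix.
                words.append(piece[2:])
        else:
            words.append(piece)
    return " ".join(words)


print(join_wordpieces(["meaningful", "##ly", "understood"]))  # meaningfully understood
print(join_wordpieces(["##ly"]))  # ly
```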
Update:
Adding my attempt, but I am stuck on combining the consecutive tokens:
labels_meaning = {}
labels_meaning[0] = 'subject'
labels_meaning[1] = 'relation'
labels_meaning[2] = 'object'
labels_meaning[3] = 'na'
ignore = ['[CLS]', '[SEP]', '[PAD]']
def get_sentence_triples_from_token_labels(tokens, token_labels):
    for tok, label in zip(tokens, token_labels):
        current_label = label
        if tok == '[CLS]':  # initialize
            previous_label = current_label
            prev = False
            current_string = ''
        if tok not in ignore:
            if previous_label != current_label and prev == True:
                current_string = f'{tok} '
                pass
            else:
                pass
            prev = True
        break

get_sentence_triples_from_token_labels(tokens, labels)
A uj5u.com user replied:
Solution
I am not sure whether this is what you want.
labels_meaning = { 0:'subject', 1:'relation', 2:'object', 3:'na' }
ignore = ['[CLS]', '[SEP]', '[PAD]']
tokens = ['[CLS]', 'Thinking', 'historically', 'is', ',', 'first', ',', 'an', 'attitude', 'acknowledging', 'that', 'every', 'event', 'can', 'be', 'meaningful', '##ly', 'understood', 'only', 'in', 'relation', 'to', 'previous', 'events', ',', 'and', ',', 'second', ',', 'the', 'method', '##ical', 'application', 'of', 'this', 'attitude', ',', 'which', 'en', '##tails', 'both', 'analyzing', 'events', 'context', '##ually', '-', '-', 'as', 'having', 'occurred', 'in', 'the', 'midst', 'of', 'pre', '-', 'existing', 'circumstances', '-', '-', 'and', 'comprehend', '##ing', 'them', 'from', 'historical', 'actors', '[SEP]', '[PAD]', '[PAD]', '[PAD]']
labels = [0, 0, 0, 0, 0, 2, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 2, 0, 0, 0]
subjects = []
relations = []
objects = []
def get_sentence_triples_from_token_labels(tokens, token_labels):
    dicRlt = {lab: [] for lab in [0, 1, 2, 3]}
    last_label = token_labels[0]
    for tok, label in zip(tokens, token_labels):
        if tok not in ignore:
            if last_label != label:
                if label == 0:
                    subjects.append(" ".join(dicRlt[0]).replace(" ##", ""))
                elif label == 1:
                    relations.append(" ".join(dicRlt[1]).replace(" ##", ""))
                elif label == 2:
                    objects.append(" ".join(dicRlt[2]).replace(" ##", ""))
                dicRlt[label] = []
            dicRlt[label].append(tok)
        last_label = label
    subjects.append(" ".join(dicRlt[0]).replace(" ##", ""))
    relations.append(" ".join(dicRlt[1]).replace(" ##", ""))
    objects.append(" ".join(dicRlt[2]).replace(" ##", ""))
    return
Test
get_sentence_triples_from_token_labels(tokens, labels)
print(subjects)
print(relations)
print(objects)
Output:
['Thinking historically is ,', 'attitude', 'that every event can be meaningfully understood only', 'relation', 'previous events ,', ', second', 'methodical', 'attitude', 'entails', 'events contextually -', 'as having occurred in', 'circumstances -', 'comprehend', 'them from', 'actors']
['', ', an', 'acknowledging', 'in', 'to', 'and', ', the', 'application of this', ', which', 'both analyzing', '-', 'the midst of pre - existing', '- and', '##ing', 'historical']
['', 'first']
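Note that the output above contains empty strings and a stray '##ing'. The function flushes the buffer of the label that is starting rather than the one that just ended, and it only strips "##" when it follows a space. One possible alternative is sketched below using itertools.groupby, which yields each run of equal labels exactly once. The names `group_tokens_by_label`, `IGNORE` and `LABEL_NAMES` are hypothetical, and the sketch assumes that the special tokens are dropped before grouping, so they never split a run.

```python
from itertools import groupby

IGNORE = {"[CLS]", "[SEP]", "[PAD]"}
LABEL_NAMES = {0: "subject", 1: "relation", 2: "object", 3: "na"}


def group_tokens_by_label(tokens, token_labels):
    """Return one string per run of consecutive equal labels, keyed by label name."""
    result = {"subject": [], "relation": [], "object": []}
    # Assumption: special tokens are removed first and do not break a run.
    pairs = [(tok, lab) for tok, lab in zip(tokens, token_labels) if tok not in IGNORE]
    for label, run in groupby(pairs, key=lambda pair: pair[1]):
        name = LABEL_NAMES[label]
        if name not in result:  # skip 'na'
            continue
        words = []
        for tok, _ in run:
            if tok.startswith("##"):
                if words:
                    words[-1] += tok[2:]  # glue onto the previous token
                else:
                    words.append(tok[2:])  # run starts with a piece: drop '##'
            else:
                words.append(tok)
        result[name].append(" ".join(words))
    return result


# Small demo on a shortened, made-up token sequence.
demo_tokens = ["[CLS]", "be", "meaningful", "##ly", "understood", "in",
               "comprehend", "##ing", "[SEP]", "[PAD]"]
demo_labels = [0, 0, 0, 0, 0, 1, 0, 1, 2, 0]
triples = group_tokens_by_label(demo_tokens, demo_labels)
print(triples["subject"])   # ['be meaningfully understood', 'comprehend']
print(triples["relation"])  # ['in', 'ing']
print(triples["object"])    # []
```

Returning a dictionary instead of appending to module-level lists means the function can be called repeatedly without results from earlier calls leaking into later ones.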
