I have a list of the headings and sub-headings of a document.
test_list = ['heading', 'heading','sub-heading', 'sub-heading', 'heading', 'sub-heading', 'sub-sub-heading', 'sub-sub-heading', 'sub-heading', 'sub-heading', 'sub-sub-heading', 'sub-sub-heading','sub-sub-heading', 'heading']
I want to assign a unique index to each heading and sub-heading, like this:
seg_ids = ['1', '2', '2_1', '2_2', '3', '3_1', '3_1_1', '3_1_2', '3_2', '3_3', '3_3_1', '3_3_2', '3_3_3', '4']
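This is the code I wrote to produce that result, but it is messy and limited to depth 3. If a document nests headings any deeper (sub-sub-sub-headings), the code gets even more complicated. Is there a pythonic way to do this?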
seg_ids = []
for idx, an_ele in enumerate(test_list):
    head_id = 0
    subh_id = 0
    subsubh_id = 0
    if an_ele == 'heading' and idx == 0:  # if it is the first element
        head_id = '1'
        seg_ids.append(head_id)
    else:
        last_seg_ids = seg_ids[idx-1].split('_')  # find the depth of the last element
        head_id = last_seg_ids[0]
        if len(last_seg_ids) == 2:
            subh_id = last_seg_ids[1]
        elif len(last_seg_ids) == 3:
            subh_id = last_seg_ids[1]
            subsubh_id = last_seg_ids[2]
        if an_ele == 'heading':
            head_id = str(int(head_id) + 1)
            subh_id = 0  # reset sub_heading index
            subsubh_id = 0  # reset sub_sub_heading index
        elif an_ele == 'sub-heading':
            subh_id = str(int(subh_id) + 1)
            subsubh_id = 0  # reset sub_sub_heading index
        elif an_ele == 'sub-sub-heading':
            subsubh_id = str(int(subsubh_id) + 1)
        else:
            print('ERROR')
        if subsubh_id == 0:
            if subh_id != 0:
                seg_ids.append(head_id + '_' + subh_id)
            else:
                seg_ids.append(head_id)
        if subsubh_id != 0:
            seg_ids.append(str(head_id) + '_' + str(subh_id) + '_' + str(subsubh_id))
print(seg_ids)
Answer from a uj5u.com user:
def get_level(s):
    # 'heading' -> 0, 'sub-heading' -> 1, 'sub-sub-heading' -> 2, ...
    return s.count('-')

def translate(test_list):
    seg_ids = []
    levels = [0]*9  # one counter per nesting level
    last_level = 99
    for an_ele in test_list:
        level = get_level(an_ele)
        if level <= last_level:
            levels[level] += 1  # same level or back up: next number at this level
        else:
            levels[level] = 1  # one level deeper: restart numbering at 1
        seg_ids.append('_'.join(str(k) for k in levels[:level+1]))
        last_level = level
    return seg_ids

print(translate(['heading', 'heading','sub-heading', 'sub-heading', 'heading', 'sub-heading', 'sub-sub-heading', 'sub-sub-heading', 'sub-heading', 'sub-heading', 'sub-sub-heading', 'sub-sub-heading','sub-sub-heading', 'heading']))
Output:
['1', '2', '2_1', '2_2', '3', '3_1', '3_1_1', '3_1_2', '3_2', '3_3', '3_3_1', '3_3_2', '3_3_3', '4']
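This fixes the maximum number of levels at 9. You could lift that limit by starting with levels=[0] and extending the list whenever a new level runs past the end, but this illustrates the point.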
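For what it's worth, here is one way the "start small and extend" idea might look. This is only a sketch, not the answerer's code: the name translate_dynamic is made up for illustration, and it trims the counter list on the way back up instead of tracking last_level, which is a slightly different bookkeeping choice.

```python
def get_level(s):
    # 'heading' -> 0, 'sub-heading' -> 1, 'sub-sub-heading' -> 2, ...
    return s.count('-')


def translate_dynamic(test_list):
    # Hypothetical variant of translate(): no fixed maximum depth.
    seg_ids = []
    levels = []  # levels[i] is the current counter at depth i
    for an_ele in test_list:
        level = get_level(an_ele)
        # Forget counters deeper than this level so they restart at 1 later.
        del levels[level + 1:]
        # Grow the list with zeros if this level is not open yet.
        levels.extend([0] * (level + 1 - len(levels)))
        levels[level] += 1
        seg_ids.append('_'.join(map(str, levels)))
    return seg_ids
```

Note that it assumes the input never skips a level. A 'sub-sub-heading' directly under a 'heading' would get a 0 in the middle of its index.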
Answer from a uj5u.com user:
You can use the split('-') method to find the level of a heading:
subs_amount = an_ele.split('-')
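You can infer the heading's level from the length of the list subs_amount. If the length is 1, it is a "heading". If it is 3, it is a "sub-sub-heading", and so on. Then keep a list store_levels that stores the indices of the preceding higher-level headings, as Tim Roberts said in their comment: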
if len(subs_amount) > len(store_levels):
    store_levels.append(1)  # add a sub-level
elif len(subs_amount) == len(store_levels):
    store_levels[-1] += 1  # add a heading of the same level
else:
    del store_levels[len(subs_amount):]  # go back up to this heading's level
    store_levels[-1] += 1  # and count the new heading at that level
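Now, to build your output, you just need to append "_".join(map(str, store_levels)) to the output list (the counters are integers, so they have to be converted to strings before joining).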
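Sorry for not using the same variable names as you. I did not do it to confuse you or to change their purpose. I hope my code is clear enough for you to work it into your own.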
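Put together, the pieces of this answer might look like the sketch below. The wrapper name number_headings and the output list are made up for illustration. It also assumes two things the fragments do not spell out: going back up should drop every deeper counter (not just the last one) and then bump the counter at the level you land on, and the integer counters are converted to strings before joining.

```python
def number_headings(titles):
    # Hypothetical wrapper; subs_amount and store_levels follow the answer.
    output = []
    store_levels = []  # one counter per currently open heading level
    for an_ele in titles:
        subs_amount = an_ele.split('-')
        depth = len(subs_amount)  # 1 for 'heading', 2 for 'sub-heading', ...
        if depth > len(store_levels):
            store_levels.append(1)  # first heading at a new, deeper level
        elif depth == len(store_levels):
            store_levels[-1] += 1  # another heading at the same level
        else:
            del store_levels[depth:]  # close every deeper level
            store_levels[-1] += 1  # next heading at the level we returned to
        output.append('_'.join(map(str, store_levels)))
    return output
```

Calling number_headings(test_list) on the list from the question should give the same seg_ids the asker wants. Like the fragments above, it only goes one level deeper at a time, so it assumes the input never skips a level.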
Please credit the source when reposting. Link to this article: https://www.uj5u.com/net/452137.html
