- File: plain text file
- Content: YouTube transcript with timestamps

I can remove the timestamps on their own, line by line:
for count, line in enumerate(content, start=1):
    if count % 2 == 0:
        s = line.replace('\n', '')
        print(s)
If I don't remove the timestamps, I can also join the lines into sentences:
with open('file.txt') as f:
    print(" ".join(line.strip() for line in f))
But I have tried to do both together (remove the timestamps and join the lines) in various forms, without getting the right result:
with open('Russell Brand Script.txt') as m:
    for count, line in enumerate(m, start=1):
        if count % 2 == 0:
            sentence = line.replace('\n', ' ')
            print(" ".join(sentence.rstrip('\n')))
I also tried variations such as print(" ".join(sentence.rstrip('\n'))) and print(" ".join(sentence.strip())), but the result is always one of the following:

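How can I remove the timestamps and join the sentences to create a single paragraph in one go?
Reply from a uj5u.com user:
Whenever you call .join() with a string as its argument, it inserts the separator between every character of that string. You should also note that print(), by default, adds a newline after the string it prints.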
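A minimal sketch of both effects, in case it helps to see them in isolation. The variable names and sample strings here are made up for illustration and are not taken from the question's file:

```python
import io

# Joining a plain string: the separator lands between every character.
word = "hat"
per_character = " ".join(word)

# Joining a list of strings: the separator lands between whole items.
lines = ["merry christmas", "ho ho"]
per_item = " ".join(lines)

# print() appends "\n" after each call unless end= says otherwise.
buffer = io.StringIO()
print("first", file=buffer)
print("second", file=buffer)
printed = buffer.getvalue()

print(repr(per_character))
print(repr(per_item))
print(repr(printed))
```

This is why joining inside the loop spaces out the letters of each line, and why each print() call starts a new line instead of continuing the paragraph.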
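To solve this, you can save each cleaned-up sentence to a list and then join the list with ' '.join(). This avoids the newline problem described above and lets you do further processing on the paragraph afterwards, if needed.
with open('put_your_filename_here.txt') as m:
    sentences = []
    for count, line in enumerate(m, start=1):
        # even-numbered lines hold the text; odd-numbered lines hold the timestamps
        if count % 2 == 0:
            sentence = line.replace('\n', '')
            sentences.append(sentence)
    print(' '.join(sentences))
(I made a small edit to the code: the old version left a trailing space after the paragraph.)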
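As a variation on the same idea, the even-numbered lines can also be picked out with itertools.islice instead of a counter. This is only a sketch: the helper name even_lines_to_paragraph and the sample list are hypothetical, and it assumes the file strictly alternates timestamp line, text line.

```python
from itertools import islice


def even_lines_to_paragraph(lines):
    """Join the 2nd, 4th, 6th, ... lines into one space-separated string."""
    # islice(lines, 1, None, 2) starts at index 1 and steps by 2,
    # which corresponds to the even-numbered lines when counting from 1.
    return " ".join(line.strip() for line in islice(lines, 1, None, 2))


# Illustrative input; an open file object can be passed the same way.
sample = [
    "00:00\n",
    "merry christmas it's our christmas video\n",
    "00:03\n",
    "to you i already regret this hat but if\n",
]
print(even_lines_to_paragraph(sample))
```

If the transcript ever contains a blank line or two text lines in a row, the alternation breaks and timestamps would leak into the result, so this only suits files with that exact layout.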
Reply from a uj5u.com user:
TL;DR: a copy-paste solution using a list comprehension, with an if as the filter and a regular expression to match the timestamps:
' '.join([line.strip() for line in transcript if not re.match(r'\d{2}:\d{2}', line)])
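The one-liner needs import re and some iterable named transcript to run. A self-contained sketch, assuming transcript is a list of lines (the short sample below is illustrative; a file object opened with open() should work the same way):

```python
import re

# Illustrative stand-in for the lines of the transcript file.
transcript = [
    "00:00\n",
    "merry christmas it's our christmas video\n",
    "00:03\n",
    "to you i already regret this hat but if\n",
]

# Keep only lines that do not start with a two-digit:two-digit timestamp,
# strip the surrounding whitespace, and join the rest with single spaces.
paragraph = " ".join(
    line.strip() for line in transcript if not re.match(r"\d{2}:\d{2}", line)
)
print(paragraph)
```

Note that re.match only anchors at the start of the line, so a text line that happens to begin with something like 12:30 would be dropped as well.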
Explanation
Suppose the text input you gave is:
00:00
merry christmas it's our christmas video
00:03
to you i already regret this hat but if
00:05
we got some fantastic content for you a
00:07
look at the most joyous and wonderful
00:09
aspects have a very merry year ho ho ho
Then you can skip the timestamps with the regular expression \d{2}:\d{2} and append every remaining line to a list as a phrase. Trim each phrase with strip() to remove leading and trailing whitespace. Finally, join all the phrases into a paragraph, using a space as the separator:
import re

def to_paragraph(transcript_lines):
    phrases = []
    for line in transcript_lines:
        trimmed = line.strip()
        # keep non-empty lines that do not start with an mm:ss timestamp
        if trimmed != '' and not re.match(r'\d{2}:\d{2}', trimmed):
            phrases.append(trimmed)
        else:  # TODO: for debug only, remove
            print(line)  # TODO: for debug only, remove
    return " ".join(phrases)

t = '''
00:00
merry christmas it's our christmas video
00:03
to you i already regret this hat but if
00:05
we got some fantastic content for you a
00:07
look at the most joyous and wonderful
00:09
aspects have a very merry year ho ho ho
'''
paragraph = to_paragraph(t.splitlines())
print(paragraph)

with open('put_your_filename_here.txt') as f:
    print(to_paragraph(f.readlines()))
Output for the sample string (the debug print echoes each skipped line first, starting with the empty first line of t):

00:00
00:03
00:05
00:07
00:09
merry christmas it's our christmas video to you i already regret this hat but if we got some fantastic content for you a look at the most joyous and wonderful aspects have a very merry year ho ho ho
The result is the same as what youtubetranscript.com returns for the given YouTube video.
