Remove timestamps from a transcript and join the lines into a paragraph

File: plain text file
Content: YouTube transcript with timestamps

I can remove the timestamps on their own, line by line:

for count, line in enumerate(content, start=1):
    if count % 2 == 0:
        s = line.replace('\n','')
        print(s)

If I don't remove the timestamps, I can also join the sentences:

with open('file.txt') as f:
    print(" ".join(line.strip() for line in f))

But when I try to do both together (remove the timestamps and join the lines), none of the variations I tried gives the right result:

with open('Russell Brand Script.txt') as m:
    for count, line in enumerate(m, start=1):
        if count % 2 == 0:
            sentence=line.replace('\n',' ')
            print(" ".join(sentence.rstrip('\n')))

I also tried various forms of print(" ".join(sentence.rstrip('\n'))) and print(" ".join(sentence.strip())), but the result is always one of the following:

[Image: screenshots of the incorrect output]

How can I remove the timestamps and join the sentences to create a paragraph in one go?

Answer from a uj5u.com user:

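Whenever you pass a single string to .join(), it inserts the separator between every character of that string, because join iterates over its argument. You should also note that print(), by default, adds a newline after the string it prints.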
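A small sketch of that behavior, using a made-up sample line (the variable names here are only for illustration): a string passed to join is treated as a sequence of characters, while a list of strings is joined item by item.

```python
# Hypothetical sample line, standing in for one line of the transcript.
sentence = "merry christmas"

# Passing a str: join iterates over its characters,
# so the separator ends up between every pair of characters.
spaced_out = " ".join(sentence)

# Passing a list of str: the separator goes between whole items.
joined = " ".join(["merry christmas", "to you"])

print(spaced_out)
print(joined)
```

If the newline added by print() is the problem, print() also accepts an end argument (for example end=" "), although collecting the lines first and joining them once is usually easier to control.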

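To fix this, you can save each modified sentence in a list and then call ' '.join() on that list. This solves the newline problem described above and also lets you do additional processing on the paragraph afterwards, if needed.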

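with open('put_your_filename_here.txt') as m:
    sentences = []
    for count, line in enumerate(m, start=1):
        # Even-numbered lines hold the text; odd-numbered lines are timestamps.
        if count % 2 == 0:
            sentence = line.replace('\n', '')
            sentences.append(sentence)
    # Join once at the end, so the separator goes between whole sentences.
    print(' '.join(sentences))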

(Made a small edit to the code: the old version produced a trailing space after the paragraph.)

Answer from a uj5u.com user:

TL;DR: a copy-paste solution using a list comprehension, with if as a filter and a regular expression to match the timestamps: ' '.join([line.strip() for line in transcript if not re.match(r'\d{2}:\d{2}', line)]).
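As a runnable sketch of that one-liner: the sample lines are taken from the transcript in the question, and the file name mentioned in the comment is hypothetical. Note that the filter only drops timestamps. A blank line would survive as an empty string and produce a double space, and a text line that happens to start with two digits, a colon and two digits would be dropped too, since re.match only checks the start of the line.

```python
import re

# Sample lines taken from the transcript in the question; with a real file,
# something like open('transcript.txt') (a hypothetical name) would supply them.
transcript = [
    "00:00\n",
    "merry christmas it's our christmas video\n",
    "00:03\n",
    "to you i already regret this hat but if\n",
]

# Keep only the lines that do not start with an MM:SS timestamp, trimmed.
paragraph = ' '.join(
    [line.strip() for line in transcript if not re.match(r'\d{2}:\d{2}', line)]
)
print(paragraph)
```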

Explanation

Suppose the text input is the one you gave:

00:00
merry christmas it's our christmas video
00:03
to you i already regret this hat but if
00:05
we got some fantastic content for you a
00:07
look at the most joyous and wonderful
00:09
aspects have a very merry year ho ho ho

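Then you can skip the timestamps by matching them with the regular expression \d{2}:\d{2}, and append every remaining line to a list as a phrase. Trim each phrase with strip() to remove leading and trailing whitespace. Finally, join all the phrases into a paragraph, using a space as the separator: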

import re

def to_paragraph(transcript_lines):
    phrases = []
    for line in transcript_lines:
        trimmed = line.strip()
        # Keep the line unless it is empty or starts with an MM:SS timestamp.
        if trimmed != '' and not re.match(r'\d{2}:\d{2}', trimmed):
            phrases.append(trimmed)
        else:  # TODO: for debug only, remove
            print(line)  # TODO: for debug only, remove
    return " ".join(phrases)

t = '''
00:00
merry christmas it's our christmas video
00:03
to you i already regret this hat but if
00:05
we got some fantastic content for you a
00:07
look at the most joyous and wonderful
00:09
aspects have a very merry year ho ho ho
'''

paragraph = to_paragraph(t.splitlines())
print(paragraph)

with open('put_your_filename_here.txt') as f:
    print(to_paragraph(f.readlines()))

Output:


00:00
00:03
00:05
00:07
00:09
merry christmas it's our christmas video to you i already regret this hat but if we got some fantastic content for you a look at the most joyous and wonderful aspects have a very merry year ho ho ho

The result is the same as the one youtubetranscript.com returns for the given YouTube video.

Please credit the source when reposting. Link to this article: https://www.uj5u.com/caozuo/396867.html

Tags: Python, string, text, YouTube
