Scraping a sentence that spans multiple lines | Recursion error unresolved

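Goal: if a line of the PDF contains a substring, copy the entire sentence that contains it (the sentence may span multiple lines).

I am able to print() the line in which the phrase appears.

Now, once I have found this line, I want to iterate backward until I find a sentence terminator (. ! ?) belonging to the previous sentence, and then iterate forward again until the next sentence terminator.

That way I can print() the entire sentence the phrase belongs to.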

However, my scrape_sentence() runs into a recursion error and never finishes.
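
A likely reason the recursion never stops, assuming the function in the notebook below is what actually runs: every call on a line without a terminator recurses on both index - 1 and index + 1, and the call on index + 1 steps straight back to index. When two adjacent lines both lack a terminator, the calls bounce between them. The sketch below mimics only that control flow; `trace_calls`, the `limit` cap, and the sample lines are hypothetical and exist purely for illustration.

```python
TERMINATORS = ".!?"

def trace_calls(lines, index, log, limit=8):
    """Record which line indices the two-way recursion visits, capped at `limit`."""
    if len(log) >= limit:  # cap the demo so it stops instead of raising RecursionError
        return
    log.append(index)
    if any(t in lines[index] for t in TERMINATORS):
        return  # base case: this line holds a sentence terminator
    trace_calls(lines, index - 1, log, limit)  # previous line
    trace_calls(lines, index + 1, log, limit)  # following line

# Hypothetical page: lines 1 and 2 both lack a terminator.
lines = ["Intro.", "a phrase that", "continues here", "and ends."]
visited = []
trace_calls(lines, 1, visited)
print(visited)  # [1, 0, 2, 1, 0, 2, 1, 0]
```

The indices 1, 0, 2 repeat and line 3 is never reached. Without the cap, the same pattern would continue until Python raises RecursionError.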

Jupyter Notebook:

# pip install PyPDF2
# pip install pdfplumber

# ---
# import re
import glob
import PyPDF2
import pdfplumber

# ---
phrase = "Responsible Care Company"
# SENTENCE_REGEX = re.compile('^[A-Z][^?!.]*[?.!]$')

def scrape_sentence(sentence, lines, index, phrase):
    if '.' in lines[index] or '!' in lines[index] or '?' in lines[index]:
        return sentence.replace('\n', '').strip()
    sentence = scrape_sentence(lines[index-1] + sentence, lines, index-1, phrase)  # previous line
    sentence = scrape_sentence(sentence + lines[index+1], lines, index+1, phrase)  # following line
    
    sentence = sentence.replace('!', '.')
    sentence = sentence.replace('?', '.')
    sentence = sentence.split('.')
    sentence = [s for s in sentence if phrase in s]
    sentence = sentence[0]  # first occurrence
    print(sentence)
    
    return sentence
    
# ---    
    
with pdfplumber.open('../data/gri/reports/GPIC_Sustainability_Report_2020__-_40_Years_of_Sustainable_Success.pdf') as opened_pdf:
    for page in opened_pdf.pages:
        text = page.extract_text()
        lines = text.split('\n')
        i = 0
        sentence = ''
        while i < len(lines):
            if 'and Knowledge of Individuals; Behaviours; Attitudes, Perception ' in lines[i]:
                sentence = scrape_sentence('', lines, i, phrase)  # !
                print(sentence)  # !
            i += 1

Output:

connection and the linkage to the relevant UN’s 17 SDGs.and Leadership. We have long realized and recognized that there

Phrase:

Responsible Care Company

Sentence (spanning multiple lines):

"GPIC is a Responsible Care Company certified for RC 14001 
since July 2010."
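
One way to check that this sentence is recoverable from line-split text is to join the lines first and split on terminators afterwards. This is only a sketch: `sentences_containing` is a hypothetical helper, and `sample_lines` is an assumed reconstruction of the neighbouring lines, pieced together from the output and the expected sentence shown in this post rather than taken from the PDF itself.

```python
import re

def sentences_containing(lines, phrase):
    """Join extracted lines, split on sentence terminators, keep sentences with `phrase`."""
    text = " ".join(line.strip() for line in lines)
    # Split after '.', '!' or '?' when the terminator is followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text)
    return [s for s in sentences if phrase in s]

# Assumed line layout, for illustration only.
sample_lines = [
    "connection and the linkage to the relevant UN's 17 SDGs.",
    "GPIC is a Responsible Care Company certified for RC 14001 ",
    "since July 2010.",
    "We have long realized and recognized that there",
]
matches = sentences_containing(sample_lines, "Responsible Care Company")
print(matches)
# ['GPIC is a Responsible Care Company certified for RC 14001 since July 2010.']
```

Splitting on every terminator is naive: abbreviations such as "e.g." or "Inc." followed by a space would also end a sentence here.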

PDF (page 2).

Please let me know if there is anything else I should add to the post.

Answer:

I solved this here by [...] from scrape_sentence().
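
For reference, the backward-then-forward scan described in the question can also be written without any recursion. The following is a minimal sketch and not necessarily what the answer's author did: `scrape_sentence_iterative` and `has_terminator` are hypothetical names, and the sample lines are an assumed layout based on the text quoted in the question.

```python
TERMINATORS = ".!?"

def has_terminator(line):
    """True if the line contains any sentence terminator."""
    return any(t in line for t in TERMINATORS)

def scrape_sentence_iterative(lines, index, phrase):
    """Return the sentence around the first `phrase` match near lines[index], or None."""
    # Walk backward to the nearest earlier line that contains a terminator.
    start = index
    while start > 0:
        start -= 1
        if has_terminator(lines[start]):
            break
    # Walk forward to the nearest later line that contains a terminator.
    end = index
    while end < len(lines) - 1:
        end += 1
        if has_terminator(lines[end]):
            break
    window = " ".join(line.strip() for line in lines[start:end + 1])
    pos = window.find(phrase)
    if pos == -1:
        return None
    # The sentence begins just after the last terminator before the phrase...
    begin = max(window.rfind(t, 0, pos) for t in TERMINATORS) + 1
    # ...and ends at the first terminator after it (or at the end of the window).
    ends = [p for p in (window.find(t, pos + len(phrase)) for t in TERMINATORS) if p != -1]
    stop = min(ends) + 1 if ends else len(window)
    return window[begin:stop].strip()

# Assumed line layout, for illustration only.
lines = [
    "connection and the linkage to the relevant UN's 17 SDGs.",
    "GPIC is a Responsible Care Company certified for RC 14001 ",
    "since July 2010.",
    "We have long realized and recognized that there",
]
result = scrape_sentence_iterative(lines, 1, "Responsible Care Company")
print(result)
# GPIC is a Responsible Care Company certified for RC 14001 since July 2010.
```

Both loops move in one direction only and are bounded by the list length, so the scan always terminates. If the phrase also occurs earlier in the same window, the earlier occurrence is the one returned.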


Tags: Python recursion pypdf2 pypdf pdfplumber
