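Update: I am interested in multiple sentences within a single string.
I have been following this handy tutorial, which covers a variation of what I need.
How do I capitalize only the first letter of each of several sentences?
A sentence ends with one of three characters: . ! ?
Code:
PDF, page 3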
from io import StringIO
from pdfminer.pdfparser import PDFParser
from pdfminer.pdfdocument import PDFDocument
from pdfminer.pdfinterp import PDFResourceManager, PDFPageInterpreter
from pdfminer.converter import TextConverter
from pdfminer.layout import LAParams
from pdfminer.pdfpage import PDFPage
def convert_pdf_to_string(file_path):
    output_string = StringIO()
    with open(file_path, 'rb') as in_file:
        parser = PDFParser(in_file)
        doc = PDFDocument(parser)
        rsrcmgr = PDFResourceManager()
        device = TextConverter(rsrcmgr, output_string, laparams=LAParams())
        interpreter = PDFPageInterpreter(rsrcmgr, device)
        for page in PDFPage.create_pages(doc):
            interpreter.process_page(page)
    return output_string.getvalue()
text = convert_pdf_to_string('GPIC_Sustainability_Report_2016-v9_(lr).pdf')
print(text)
text:
In 2012, Gulf Petrochemical InDuStRiEs Company becomes part of \nthe global transformation for a sustainable future by committing to \nthe United Nations Global Compact’s ten principles in the realms \nof Human Rights, Labour, Environment and Anti-Corruption. \n\nGPIC becomes an organizational stakeholder of Global Reporting \nInitiative ( GRI) in 2014.
Desired text:
In 2012, Gulf Petrochemical Industries Company becomes part of the global transformation for a sustainable future by committing to the United Nations Global Compact’s ten principles in the realms of Human Rights, Labour, Environment and Anti-Corruption. GPIC becomes an organizational stakeholder of Global Reporting Initiative ( GRI) in 2014.
Updated code: these lines can be added anywhere after the text has been extracted
text = text.replace('\n', '')
text = text.replace('\x0c', '')
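Removing '\n' outright works for this extract because every line already ends with a space before the newline. As a hedged alternative, the sketch below collapses any run of whitespace (newlines, the form feed '\x0c', repeated spaces) into a single space, so words are not glued together when that trailing space happens to be missing. The function name normalize_whitespace and the sample string are made up for illustration.

```python
import re


def normalize_whitespace(text: str) -> str:
    """Collapse every run of whitespace into one space and trim the ends."""
    # \s covers spaces, tabs, newlines and the form feed character (\x0c).
    return re.sub(r"\s+", " ", text).strip()


raw = "part of \nthe global transformation. \n\nGPIC becomes a stakeholder.\x0c"
print(normalize_whitespace(raw))
# part of the global transformation. GPIC becomes a stakeholder.
```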
Please let me know if I should clarify anything else.
Answer from a uj5u.com user:
s = 'This is An ExAmplE senTENCE.'
s.capitalize()
>> 'This is an example sentence.'
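Note that capitalize() lowercases everything after the first character, so acronyms such as GPIC and names such as United Nations in the question's text would be lowercased too. One possible workaround, sketched below under my own assumptions, is to repair only words with stray capitals (like InDuStRiEs) and leave all-caps and normally capitalized words alone. The helper name fix_mixed_case_words is hypothetical, and the heuristic will mangle legitimate names such as McDonald.

```python
import re


def fix_mixed_case_words(text: str) -> str:
    """Lowercase the tail of words that mix cases after their first letter."""

    def repair(match: re.Match) -> str:
        word = match.group(0)
        has_inner_upper = any(c.isupper() for c in word[1:])
        has_lower = any(c.islower() for c in word)
        # Mixed case such as "InDuStRiEs": keep the first letter as written,
        # lowercase the rest. All-caps words (GPIC) have no lowercase letter
        # and are returned unchanged.
        if has_inner_upper and has_lower:
            return word[0] + word[1:].lower()
        return word

    # Work on runs of ASCII letters so "Anti-Corruption" is handled as two words.
    return re.sub(r"[A-Za-z]+", repair, text)


sample = "Gulf Petrochemical InDuStRiEs Company, GPIC, Anti-Corruption ( GRI)"
print(fix_mixed_case_words(sample))
# Gulf Petrochemical Industries Company, GPIC, Anti-Corruption ( GRI)
```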
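Try this:
from nltk import tokenize
paragraph = "Hello there. How are you?"
# Split the paragraph into sentences (requires NLTK's punkt tokenizer data)
sentences = tokenize.sent_tokenize(paragraph)
# capitalize() uppercases the first character and lowercases the rest
capitalized = [s.capitalize() for s in sentences]
# Rejoin with a space, since the tokenizer drops the whitespace between sentences
new_paragraph = ' '.join(capitalized)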
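Answer from a uj5u.com user:
'. '.join([i.capitalize() for i in s.split('. ')])
For many sentences ^ (this only treats a period followed by a space as a sentence boundary, not ! or ?)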
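Splitting on a period does not cover sentences that end in ! or ?, and capitalize() also lowercases the rest of each sentence. As a rough sketch of another way to do it (not something proposed in the answers above), a regular expression can uppercase just the first lowercase letter after ., ! or ? and leave everything else untouched. The names SENTENCE_START and capitalize_sentence_starts are invented for this example, and abbreviations such as "e.g. " would be treated as sentence ends.

```python
import re

# Group 1: start of the string (with optional leading whitespace) or a sentence
# terminator followed by whitespace. Group 2: the lowercase letter to uppercase.
SENTENCE_START = re.compile(r"(^\s*|[.!?]\s+)([a-z])")


def capitalize_sentence_starts(text: str) -> str:
    """Uppercase the first letter of each sentence, leaving the rest as is."""
    return SENTENCE_START.sub(lambda m: m.group(1) + m.group(2).upper(), text)


sample = "hello there. how are you? fine! GPIC joined in 2014. ok"
print(capitalize_sentence_starts(sample))
# Hello there. How are you? Fine! GPIC joined in 2014. Ok
```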
