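I am trying to count how often each word in a list occurs in text extracted from a PDF, but every count comes out as 0, which is incorrect.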
total_number_of_keywords = 0
pdf_file = "CapitalCorp.pdf"
tables=[]
words = ['blank','warrant ','offering','combination ','SPAC','founders']
count={} # is a dictionary data structure in Python
with pdfplumber.open(pdf_file) as pdf:
    pages = pdf.pages
    for i,pg in enumerate(pages):
        tbl = pages[i].extract_tables()
        for elem in words:
            count[elem] = 0
        for line in f'{i} --- {tbl}' :
            elements = line.split()
            for word in words:
                count[word] = count[word] + elements.count(word)
print (count)
Answer from a uj5u.com user:
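This will do the job. In the question's code, the loop over f'{i} --- {tbl}' walks through a string one character at a time, so elements.count(word) never sees a whole word and every count stays 0. The version below extracts the page text and counts in that instead: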
import pdfplumber
pdf_file = "CapitalCorp.pdf"
words = ['blank','warrant ','offering','combination ','SPAC','founders']
# Get text
text = ''
with pdfplumber.open(pdf_file) as pdf:
    for page in pdf.pages:
        # extract_text() may return None for a page without text, so fall back to ''
        text = text + '\n' + (page.extract_text() or '')
# Set up count dictionary
count = {}
for elem in words:
    count[elem] = 0
# Count occurrences
for el in words:
    count[el] = text.count(el)
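First, you store the contents of the PDF in the variable text, which is a string.
Then you set up the count dictionary, with one key for each element of words and each value initialized to 0.
Finally, you use the count() method to count how many times each element of words occurs in text, and store the result under the corresponding key of the dictionary.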
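One caveat with the approach above: str.count() counts substrings and is case-sensitive, so 'blank' also matches inside 'blanket', and a keyword with a trailing space such as 'warrant ' misses 'warrant,'. If whole-word, case-insensitive counts are what is wanted, a minimal sketch along the following lines may help. The function name count_keywords and the sample sentence are made up for illustration and are not part of the original answer; in practice the text extracted from the PDF would be passed in instead of the sample.

```python
import re

def count_keywords(text, keywords):
    """Count whole-word, case-insensitive matches of each keyword in text.

    Hypothetical helper for illustration; not part of the original answer.
    """
    counts = {}
    for keyword in keywords:
        term = keyword.strip()  # drop stray spaces such as in 'warrant '
        # \b anchors the match at word boundaries, so 'blank' skips 'blanket'
        pattern = re.compile(r'\b' + re.escape(term) + r'\b', re.IGNORECASE)
        counts[term] = len(pattern.findall(text))
    return counts

# Made-up sample text standing in for the text extracted from the PDF
sample = "Blank check company. The warrant, a blanket offering; SPAC founders and Founders."
print(count_keywords(sample, ['blank', 'warrant ', 'offering', 'SPAC', 'founders']))
```

Note that the keys of the returned dictionary are the stripped keywords, so 'warrant ' is reported as 'warrant'. Whether substring or whole-word matching is the right choice depends on what the counts are meant to measure.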
