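Topic: PDF table of contents with PyPDF2 in Python 3
Question: I need a program that can iterate over a Union variable (the value returned by getOutlines()) that contains several dictionaries, each followed by a list of several dictionaries: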
[
{},
[{}, {}, {}],
{},
[{}, {}, {}],
{},
[{}, {}, {}]
]
This pattern repeats many times.
Expected output: the output should look like this:
1 Title Goes Here
1.1 Title Goes Here
1.1.1 Title Goes Here
1.1.2 Title Goes Here
1.1.3 Title Goes Here
1.2 Title Goes Here
1.2.1 Title Goes Here
1.2.2 Title Goes Here
1.2.3 Title Goes Here
1.3 Title Goes Here
1.3.1 Title Goes Here
1.3.2 Title Goes Here
1.3.3 Title Goes Here
2 Title Goes Here
2.1 Title Goes Here
2.1.1 Title Goes Here
2.1.2 Title Goes Here
2.1.3 Title Goes Here
2.2 Title Goes Here
2.2.1 Title Goes Here
2.2.2 Title Goes Here
2.2.3 Title Goes Here
2.3 Title Goes Here
2.3.1 Title Goes Here
2.3.2 Title Goes Here
2.3.3 Title Goes Here
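One way to produce that numbering from the structure above is to let the nesting drive a counter. This is a minimal sketch, assuming that each list holds the children of the dictionary right before it (which is how PyPDF2 lays out nested bookmarks); number_outline is a hypothetical helper, and the plain dicts only stand in for real outline entries.

```python
def number_outline(items, prefix=""):
    """Return 'N.M Title' strings for a nested outline list.

    Assumption: a list that follows a dict holds that dict's children.
    """
    lines = []
    counter = 0
    for item in items:
        if isinstance(item, dict):
            # A new entry at this level: bump the counter.
            counter += 1
            lines.append(f"{prefix}{counter} {item.get('/Title', '')}")
        elif isinstance(item, list):
            # Children of the most recent entry use its number as a prefix.
            lines.extend(number_outline(item, f"{prefix}{counter}."))
    return lines


# Hypothetical sample data shaped like the structure in the question.
outline = [
    {"/Title": "Chapter A"},
    [
        {"/Title": "Section A1"},
        [{"/Title": "Part A1a"}, {"/Title": "Part A1b"}],
        {"/Title": "Section A2"},
    ],
    {"/Title": "Chapter B"},
]

for line in number_outline(outline):
    print(line)
```

With this sample data it prints 1 Chapter A, 1.1 Section A1, 1.1.1 Part A1a, 1.1.2 Part A1b, 1.2 Section A2 and 2 Chapter B, one per line.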
Program:
import argparse as arp
from PyPDF2 import PdfFileReader

parser = arp.ArgumentParser()
parser.add_argument("-f", "--file", help="File to analyse")
arg = parser.parse_args()
filename = arg.file

def fileread():
    doc = PdfFileReader(filename)
    ToC = doc.getOutlines()
    # ToC: Union[List[Union[Destination, list]], {__eq__}] = doc.getOutlines()
    for elements in ToC:
        #print(elements)
        #print("\n")
        try:
            if elements is {}: # If the element is a dictionary just find the Title
                print(elements['/Title']) # TODO: This is just skipped
            else: # If the element is a list go through and print out the titles
                for nest_dict in elements:
                    try:
                        print(nest_dict["/Title"])
                    except:
                        continue
        except:
            continue

fileread()
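I am testing the program with: Compilers - Principles, Techniques, and Tools-Pearson_Addison Wesley (2006).pdf
Any help is greatly appreciated.
Answer:
This line is wrong. The "is" operator checks object identity, and {} creates a brand-new empty dictionary each time, so the condition is always False. Every top-level entry therefore falls into the else branch, where looping over a dictionary only yields its keys, so those titles are never printed: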
if elements is {}: # If the element is a dictionary just find the Title
It should be changed to:
if isinstance(elements, dict):
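A quick way to see the difference between the two checks. This is a small sketch in which a plain dict stands in for one outline entry; the assumption is that PyPDF2's Destination objects are dict subclasses, which is why the isinstance check works on them.

```python
# A plain dict stands in for one PyPDF2 outline entry (assumption for illustration).
outline_item = {"/Title": "1 Introduction"}

# "is" compares identity; {} builds a new empty dict, so this is always False.
print(outline_item is {})                # False
# isinstance checks the type, which is what the loop needs to know.
print(isinstance(outline_item, dict))    # True
# A list of child entries fails the check, so it goes to the else branch.
print(isinstance([outline_item], dict))  # False
```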
Answer:
Using the code below, I get output like this from your PDF file:
Output:
1 Introduction
1.1 Language Processors
1.1.1 Exercises for Section 1.1
1.2 The Structure of a Compiler
...
2 A Simple Syntax-Directed Translator
2.1 Introduction
2.2 Syntax Definition
2.2.1 Definition of Grammars
...
Python code:
import argparse as arp
from PyPDF2 import PdfFileReader  # PyPDF2 1.x API

parser = arp.ArgumentParser()
parser.add_argument("-f", "--file", help="File to analyse")
arg = parser.parse_args()
filename = arg.file


def print_title(input_data):
    # Outline entries are dicts: print the title directly.
    if isinstance(input_data, dict):
        print(input_data['/Title'])
    # Otherwise it is a (possibly nested) list of entries: recurse into it.
    else:
        for nest_dict in input_data:
            try:
                print_title(nest_dict)
            except Exception:
                continue


def fileread():
    doc = PdfFileReader(filename)
    ToC = doc.getOutlines()
    for elements in ToC:
        try:
            print_title(elements)
        except Exception:
            continue


fileread()
I'm not a Python expert, but I hope this helps. By the way, you can read more about recursion in Python here.
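The same recursion can also be written as a generator that tracks depth, so the caller decides how to print. This is a sketch, not the answerer's code: iter_titles is a hypothetical name, and main assumes the pypdf package (the successor to PyPDF2, whose PdfFileReader and getOutlines were removed in PyPDF2 3.0), where PdfReader(path).outline returns the same nested list-of-dicts shape.

```python
import sys


def iter_titles(outline, depth=0):
    """Recursively yield (depth, title) for every entry in a nested outline.

    Dicts are entries; a list holds the children of the preceding entry.
    """
    for item in outline:
        if isinstance(item, list):
            # Child list: recurse one level deeper.
            yield from iter_titles(item, depth + 1)
        elif isinstance(item, dict) and "/Title" in item:
            yield depth, item["/Title"]


def main(path):
    # Imported here so iter_titles can be used without pypdf installed.
    from pypdf import PdfReader

    reader = PdfReader(path)
    for depth, title in iter_titles(reader.outline):
        # Indent four spaces per nesting level.
        print("    " * depth + str(title))


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

Run it as, for example, python toc.py book.pdf; the file names are placeholders.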