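1. Introduction
PyPDF2 descends from the pyPdf package, which was first released in 2005 and saw its last release in 2010. About a year later, a company named Phaseit sponsored a fork of pyPdf that was named PyPDF2. The two packages offer essentially the same functionality; the main difference is that PyPDF2 added Python 3 support. Later forks such as PyPDF3 and PyPDF4 appeared, but they are not fully backward compatible with PyPDF2 and have not been as widely used.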
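2. Installation
Run: pip install "PyPDF2<3.0"
The examples in this article use the PdfFileReader / PdfFileWriter API, which PyPDF2 3.0 removed, so the command pins an earlier release.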
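The installed major version of PyPDF2 matters because class and method names differ between major releases. As a minimal sketch, the standard library's importlib.metadata (Python 3.8+) can report it. The helper name installed_major_version is hypothetical and not part of PyPDF2.

```python
from importlib import metadata


def installed_major_version(package="PyPDF2"):
    """Return the major version of an installed package, or None if it is absent."""
    try:
        version = metadata.version(package)
    except metadata.PackageNotFoundError:
        return None
    # "2.12.1" -> 2; assumes a plain dotted version string.
    return int(version.split(".")[0])


major = installed_major_version()
if major is None:
    print("PyPDF2 is not installed")
else:
    print("PyPDF2 major version:", major)
```

If the reported major version is 3 or higher, the PdfFileReader-style calls used in this article are expected to fail.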
3. Application 1: Splitting a single PDF into multiple PDF files
#!/usr/bin/env python3
# -*- coding:utf-8 -*-
# Author : MaYi
# Blog : http://www.cnblogs.com/mayi0312/
# Date : 2022-08-19
# Name : test01
# Software : PyCharm
# Note : Split a single PDF into multiple PDF files with the PyPDF2 module

# Import the module
from PyPDF2 import PdfFileReader, PdfFileWriter

# Source PDF file
pdf_name = "test.pdf"
pdf_reader = PdfFileReader(pdf_name)
# Number of pages in the PDF
page_num = pdf_reader.getNumPages()
i_count = 0  # Counter for the output files
for i in range(0, page_num, 20):
    # Write every 20 pages to a new PDF file
    i_count += 1
    pdf_writer = PdfFileWriter()
    for j in range(i, min(i + 20, page_num)):
        pdf_writer.addPage(pdf_reader.getPage(j))
    save_pdf_name = str(i_count).zfill(3) + ".pdf"
    with open(save_pdf_name, "wb") as fo:
        pdf_writer.write(fo)
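The page arithmetic in the split script can be checked without opening any PDF. The sketch below isolates it in two hypothetical helpers, chunk_ranges and chunk_file_name, which are not part of PyPDF2. It only illustrates how range(0, page_num, 20) and min(i + 20, page_num) divide the pages and how the output names are padded.

```python
def chunk_ranges(page_count, chunk_size=20):
    """Yield (start, stop) page-index pairs; stop is exclusive."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, page_count, chunk_size):
        # The last chunk may be shorter than chunk_size.
        yield start, min(start + chunk_size, page_count)


def chunk_file_name(index, width=3):
    """Build names such as 001.pdf from a 1-based chunk index."""
    return str(index).zfill(width) + ".pdf"


# A hypothetical 45-page document split into 20-page chunks.
for number, (start, stop) in enumerate(chunk_ranges(45), start=1):
    print(chunk_file_name(number), "pages", start, "to", stop - 1)
```

For the 45-page example this yields three files covering page indexes 0 to 19, 20 to 39, and 40 to 44. In the split script, the inner loop would correspond to for j in range(start, stop).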
4. Application 2: Merging multiple PDFs into one PDF file
#!/usr/bin/env python3
# -*- coding:utf-8 -*-
# Author : MaYi
# Blog : http://www.cnblogs.com/mayi0312/
# Date : 2022-08-19
# Name : test02
# Software : PyCharm
# Note : Merge multiple PDFs into one PDF file with the PyPDF2 module

# Import the module
from PyPDF2 import PdfFileReader, PdfFileWriter

# Names of the PDF files to merge, in output order
merge_pdf_names = ["001.pdf", "002.pdf", "003.pdf", "004.pdf"]
merge_writer = PdfFileWriter()
# Process each PDF file in turn
for pdf_name in merge_pdf_names:
    curr_reader = PdfFileReader(pdf_name)
    page_num = curr_reader.getNumPages()
    for i in range(page_num):
        merge_writer.addPage(curr_reader.getPage(i))

with open("merge.pdf", "wb") as fo:
    merge_writer.write(fo)
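The merge script hardcodes its input list. If the inputs are the numbered files produced by the split step, the list could instead be built from a directory listing. The sketch below assumes file names like 001.pdf; order_numbered_pdfs is a hypothetical helper, not a PyPDF2 function.

```python
import os


def order_numbered_pdfs(names):
    """Keep names like '001.pdf' and sort them by their numeric value."""
    selected = []
    for name in names:
        stem, ext = os.path.splitext(name)
        # Skip files such as merge.pdf or test.pdf whose stem is not a number.
        if ext.lower() == ".pdf" and stem.isdecimal():
            selected.append(name)
    return sorted(selected, key=lambda n: int(os.path.splitext(n)[0]))


# Build the merge list from the current directory.
merge_pdf_names = order_numbered_pdfs(os.listdir("."))
print(merge_pdf_names)
```

Sorting by numeric value rather than by string keeps 10.pdf after 9.pdf when the names are not zero-padded.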
5. Application 3: Adding a watermark to a PDF
#!/usr/bin/env python3
# -*- coding:utf-8 -*-
# Author : MaYi
# Blog : http://www.cnblogs.com/mayi0312/
# Date : 2022-08-19
# Name : test03
# Software : PyCharm
# Note : Add a watermark to a PDF with the PyPDF2 module

# Import the module
from PyPDF2 import PdfFileReader, PdfFileWriter

# Source PDF file
pdf_name = "test.pdf"
# PDF file containing the watermark
water_mark = "watermark.pdf"
# Output PDF file with the watermark applied
new_pdf_name = "test_watermark.pdf"

# Use the first page of the watermark file as the watermark
water_mark_page = PdfFileReader(water_mark).getPage(0)
pdf_reader = PdfFileReader(pdf_name)
pdf_writer = PdfFileWriter()
page_num = pdf_reader.getNumPages()
for i in range(page_num):
    page = pdf_reader.getPage(i)
    # Merge the watermark page onto the current page
    page.mergePage(water_mark_page)
    pdf_writer.addPage(page)

with open(new_pdf_name, "wb") as fo:
    pdf_writer.write(fo)
6. Application 4: Encrypting a PDF
#!/usr/bin/env python3
# -*- coding:utf-8 -*-
# Author : MaYi
# Blog : http://www.cnblogs.com/mayi0312/
# Date : 2022-08-19
# Name : test04
# Software : PyCharm
# Note : Encrypt a PDF file with the PyPDF2 module

# Import the module
from PyPDF2 import PdfFileReader, PdfFileWriter

# Source PDF file
pdf_name = "test.pdf"
# Encrypted output PDF file
new_pdf_name = "test_encryption.pdf"

pdf_reader = PdfFileReader(pdf_name)
pdf_writer = PdfFileWriter()
page_num = pdf_reader.getNumPages()
for i in range(page_num):
    page = pdf_reader.getPage(i)
    pdf_writer.addPage(page)

# Encrypt with a user password
pdf_writer.encrypt(user_pwd="mayi", use_128bit=True)
with open(new_pdf_name, "wb") as fo:
    pdf_writer.write(fo)
7. Application 5: Decrypting a PDF
#!/usr/bin/env python3
# -*- coding:utf-8 -*-
# Author : MaYi
# Blog : http://www.cnblogs.com/mayi0312/
# Date : 2022-08-19
# Name : test05
# Software : PyCharm
# Note : Decrypt a PDF file with the PyPDF2 module

# Import the module
from PyPDF2 import PdfFileReader, PdfFileWriter

# Encrypted source PDF file
pdf_name = "test_encryption.pdf"
# Decrypted output PDF file
new_pdf_name = "test.pdf"
# Password
pass_word = "mayi"

pdf_reader = PdfFileReader(pdf_name)
# Decrypt before reading any pages
pdf_reader.decrypt(pass_word)
pdf_writer = PdfFileWriter()
page_num = pdf_reader.getNumPages()
for i in range(page_num):
    page = pdf_reader.getPage(i)
    pdf_writer.addPage(page)

with open(new_pdf_name, "wb") as fo:
    pdf_writer.write(fo)
8. Application 6: Getting basic information about a PDF file
PyPDF2 can extract some metadata and text from a PDF, which gives a rough overview of the document.
#!/usr/bin/env python3
# -*- coding:utf-8 -*-
# Author : MaYi
# Blog : http://www.cnblogs.com/mayi0312/
# Date : 2022-08-19
# Name : test06
# Software : PyCharm
# Note : Get basic information about a PDF file with the PyPDF2 module

# Import the module
from PyPDF2 import PdfFileReader

# Source PDF file
pdf_name = "test.pdf"

with open(pdf_name, 'rb') as f:
    pdf = PdfFileReader(f)
    # Get the document information of the PDF file
    information = pdf.getDocumentInfo()
    # Author
    author = information.author
    # Creator
    creator = information.creator
    # Producer
    producer = information.producer
    # Subject
    subject = information.subject
    # Title
    title = information.title
    # Number of pages
    page_num = pdf.getNumPages()

# Print the collected information
print("Author: %s\tCreator: %s\tProducer: %s\tSubject: %s\tTitle: %s\tPages: %s" % (author, creator, producer, subject, title, page_num))
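Many PDFs leave some of these fields empty, in which case the print statement above shows None. A PDF with no document information at all may make getDocumentInfo() return None. The sketch below shows one way to format the fields with a fallback. format_info and the "(not set)" placeholder are hypothetical, and SimpleNamespace stands in for the object PyPDF2 returns so the example runs without a PDF file.

```python
from types import SimpleNamespace

# (label, attribute name) pairs, in the order used by the script above.
FIELDS = [
    ("Author", "author"),
    ("Creator", "creator"),
    ("Producer", "producer"),
    ("Subject", "subject"),
    ("Title", "title"),
]


def format_info(info, page_count, missing="(not set)"):
    """Format metadata fields on one tab-separated line, filling in empty ones."""
    parts = []
    for label, attr in FIELDS:
        # getattr with a default also copes with info being None.
        value = getattr(info, attr, None)
        parts.append("%s: %s" % (label, value if value else missing))
    parts.append("Pages: %s" % page_count)
    return "\t".join(parts)


# Stand-in for a metadata object with two empty fields.
sample = SimpleNamespace(author="MaYi", creator=None, producer="PyPDF2",
                         subject=None, title="Test")
print(format_info(sample, 3))
```

In the script above, the final print call could then become print(format_info(information, page_num)).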
