呼叫Python物件時超出最大遞回深度-相同代碼適用于某些公司，但不適用于其他公司-有解無憂

我有以下代碼：

import requests
import urllib
from bs4 import BeautifulSoup
import re
  

master_data=[{'cik_number': '1556179', 'company_name': 'RMR Industrials, Inc.', 'form_id': '10-K', 'date': '20200103', 'file_url': 'https://www.sec.gov/Archives/edgar/data/1556179/0001104659-20-000861.txt'}]

sentence_regex = re.compile(r"\b[A-Z](?:[^\.!?]|\.\d)*[\.!?]")

def identify_sentences(input_text:str):
    """Returns all sentences in the input text"""
    sentences = re.findall(sentence_regex, input_text)
    return sentences 

rdterms=['research and development','R&D','product development','research, development',
          'research, engineering, and development','research and product development']

# creates a list of earnings regex expressions 
rdterms_regex=[re.compile(r'\b'   term   r'\b', re.IGNORECASE) 
                    for term in rdterms] 

def rdsentence(sentence:str):
    """Checks whether a sentence is R&D-oriented."""
    for term in rdterms_regex:    
        if term.search(sentence): 
            return True 
    return False

for entry in master_data: 
    path=entry['file_url']
    r=requests.get(path, headers={"User-Agent": "b2g"})
    content=r.content.decode('utf8')
    soup=BeautifulSoup(content, "html5lib")
    soup=str(soup)
    entry['count']=0
    sentences=identify_sentences(soup)
    for sentence in sentences:
        if rdsentence(sentence) is True:
            entry['count']=entry['count'] 1
        else:
            continue

print(master_data)
len(master_data)

這是錯誤訊息：

呼叫 Python 物件時超出最大遞回深度 - 相同代碼適用于某些公司，但不適用于其他公司

如果我將 master_data 行更改為

master_data=[{'cik_number': '1041588', 'company_name': 'ACCESS-POWER INC', 'form_id': '10-K', 'date': '20200102', 'file_url': 'https://www.sec.gov/Archives/edgar/data/1041588/0001041588-20-000001.txt'}]

一切正常。為什么代碼對某些公司有效，而對其他公司無效？我應該如何修改代碼？謝謝！

uj5u.com熱心網友回復：

問題是 str(soup) 沒有明確定義，并將 html5lib 拋出一個無限回圈。正確的是

soup = soup.text

轉載請註明出處，本文鏈接：https://www.uj5u.com/caozuo/417800.html

標籤：

下一篇：遞回函式多次列印零