You would think this had been solved at some point in all the years Python has been around, but here it is anyway:
from bs4 import BeautifulSoup

def soupstrainer(tag_element, srch_str):
    ''' take a soup element
    return a list of found items
    '''
    results = []
    ### literal search string returns results, even though two lines down,
    ### print(srch_str) returns the expected string
    souper = tag_element.find_all('a', {'data-tn-element': 'companyName'})  # srch_str)
    print(srch_str)
    for r in souper:
        if r != None:
            results.append(r.get_text(r.string, strip=True))
    return results

with open('scrapesnip.html', 'r') as the_file:
    doc4 = the_file.read()
soup = BeautifulSoup(doc4, 'html.parser')
result = soupstrainer(soup, str("'a',{'data-tn-element':'companyName'}"))
print(result, len(result))
Results:
## zero results passing the string
/PYscripts/bravosierra4.py
'a',{'data-tn-element':'companyName'} <=== these two strings *look* identical
[] 0
## with the identical string
## 'hard coded' into the function
/PYscripts/bravosierra4.py
'a',{'data-tn-element':'companyName'} <=== these two strings *look* identical
['Keysight Technologies', 'ECS Federal LLC', 'Corsica Technologies, LLC', 'Caribou', 'Collins Aerospace', 'Travelers', 'CyberCoders', 'HealthVerity', 'Circadence Corporation'] 9
Am I passing srch_string the wrong way?
Reply from a uj5u.com user:
I'm not sure exactly how you are passing srch_string, but this:
souper = tag_element.find_all('a', {'data-tn-element': 'companyName'})
is not the same as this:
srch_string = "'a', {'data-tn-element': 'companyName'}"
souper = tag_element.find_all(srch_string)
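In the first case you pass a string and a dictionary as two separate arguments. In the second case you pass a single string. Code that you put inside a string variable is not evaluated as code when that variable appears in another expression (it would be a very big problem if it were).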
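To make the difference concrete, here is a small sketch that needs no BeautifulSoup at all. show_args is a hypothetical helper (not part of any library) that only reports what it was handed:

```python
def show_args(*args):
    """Report how many positional arguments arrived and the type of each one."""
    return len(args), [type(arg).__name__ for arg in args]

# Two separate arguments: a tag name and an attribute dict.
separate = show_args('a', {'data-tn-element': 'companyName'})

# One argument: a string that merely looks like the source code above.
packed = show_args("'a', {'data-tn-element': 'companyName'}")

print(separate)  # (2, ['str', 'dict'])
print(packed)    # (1, ['str'])
```

Given that single string, find_all treats it as a tag name to look for, and no tag in the document has that name, hence the empty list.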
You could do this instead:
def soupstrainer(tag_element, *srch_args):
    """take a soup element and search args, return a list of found items"""
    # forward the search arguments to find_all exactly as they were received
    souper = tag_element.find_all(*srch_args)
    # get_text's first parameter is a separator, so only strip=True is passed
    return [r.get_text(strip=True) for r in souper if r is not None]
...
result = soupstrainer(soup, 'a', {'data-tn-element': 'companyName'})
This way soupstrainer takes the search arguments as separate arguments (rather than packed into a single string) and passes them straight through to find_all.
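Here is a self-contained sketch of that forwarding pattern, in case the goal is to keep the search criteria in a variable. The inline HTML is a made-up stand-in for scrapesnip.html, and the extra **srch_kwargs parameter is an addition of this sketch rather than part of the answer above:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; the real page is not shown in the question.
HTML = """
<div>
  <a data-tn-element="companyName" href="/c/1"> Keysight Technologies </a>
  <a data-tn-element="jobTitle" href="/j/1">Engineer</a>
  <a data-tn-element="companyName" href="/c/2">Travelers</a>
</div>
"""

def soupstrainer(tag_element, *srch_args, **srch_kwargs):
    """Forward all search arguments to find_all and return the stripped text."""
    found = tag_element.find_all(*srch_args, **srch_kwargs)
    return [tag.get_text(strip=True) for tag in found]

soup = BeautifulSoup(HTML, "html.parser")

# Keep the criteria as a tuple of real objects, then unpack it with *.
criteria = ("a", {"data-tn-element": "companyName"})
names = soupstrainer(soup, *criteria)
first_only = soupstrainer(soup, *criteria, limit=1)

print(names, len(names))
print(first_only)
```

Storing the criteria as a tuple keeps them reusable without ever turning them into source text.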
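Reply from a uj5u.com user:
It looks like find_all expects its search criteria as separate values (a tag name and an attribute dictionary), so nothing is returned when they arrive as a single string. This is the calling line that worked for me: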
result = soupstrainer(soup,['a',{'data-tn-element':'companyName'}])
and I recoded the function like this:
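def soupstrainer(tag_element, srch_str):
    ''' take a soup element
    return a list of found items
    '''
    results = []
    # srch_str is now a two-item list: [tag name, attribute dict]
    souper = tag_element.find_all(srch_str[0], srch_str[1])
    # remainder follows the original function
    for r in souper:
        results.append(r.get_text(strip=True))
    return results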
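If the criteria really do have to arrive as text (read from a config file, say), one option is to parse them explicitly instead of expecting them to be evaluated. A minimal sketch, assuming the string contains only Python literals; parse_criteria is a hypothetical helper name:

```python
import ast

def parse_criteria(text):
    """Turn a string such as "'a', {'k': 'v'}" into a (name, attrs) pair."""
    parsed = ast.literal_eval(text)
    if not (isinstance(parsed, tuple) and len(parsed) == 2):
        raise ValueError("expected a tag name and an attribute dict")
    name, attrs = parsed
    if not isinstance(name, str) or not isinstance(attrs, dict):
        raise ValueError("expected a tag name and an attribute dict")
    return name, attrs

name, attrs = parse_criteria("'a',{'data-tn-element':'companyName'}")
print(name)   # a
print(attrs)  # {'data-tn-element': 'companyName'}
```

The resulting pair can then be handed to find_all as two arguments. ast.literal_eval accepts only literals, so unlike eval it will not run arbitrary code hidden in the string.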
