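I am trying to scrape athlete.net, a website that stores track and field times, to get, for a given athlete, a list of every season, every event they competed in, and every time they recorded in each event.
So far I have printed the season titles and the name of each event. I am now trying to filter the times out of a large pile of a tags. I have tried find_next('a') and find_next_sibling('a'), but I am struggling to isolate the times.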
for text in soup.find_all('h5'):
    # write season titles and event names neatly
    if "Season" in str(text):
        text_file.write('\n\n' + str(text.contents[0]) + '\n')
    else:
        text_file.write(str(text.contents[0]) + '\n')
    # write all following siblings (up to 100 of them)
    for i in range(0, 100):
        try:
            text = text.find_next_sibling()
            text_file.write(str(text) + '\n')
        except AttributeError:
            # text became None: there are no more siblings
            print("miss")
So far, all I have managed to do is print all of the siblings, which contain all of the times. For example:
<table class="table table-sm table-responsive table-hover"><tbody><tr id="rID_162222827"><td class="text-nowrap" style="width:35px;"><a href="" ng-click="appC.edit(162222827)" ng-if="appC.params.edit=='mark' && appC.params.canEdit"><i class="far fa-pencil text-primary mr-2"></i></a><i>9 </i></td><td style="width:110px;"><span class="mRight5 m-1"><i class="far fa-fw"></i></span><a href="/result/AWi088nH1S1rZxdSN">2:10.97</a></td><td class="text-nowrap" style="width: 60px;">Mar 4</td><td><a href="meet/443782#53587">Sunset Invitational</a></td><td class="text-muted text-right text-nowrap">O F</td></tr><tr id="rID_164098252"><td class="text-nowrap" style="width:35px;"><a href="" ng-click="appC.edit(164098252)" ng-if="appC.params.edit=='mark' && appC.params.canEdit"><i class="far fa-pencil text-primary mr-2"></i></a><i>60 </i></td><td style="width:110px;"><span class="mRight5 m-1"><i class="far fa-fw"></i></span><a href="/result/R3iEDYqsnSQ48l0h8">2:05.56</a><a href="/post/R3iEDYqsnSQ48l0h8" rel="nofollow"><small class="text-muted pr-text" style="font-weight:normal; margin-left: 4px;" uib-tooltip="Personal Record">PR</small></a></td><td class="text-nowrap" style="width: 60px;">Mar 19</td><td><a href="meet/441280#53587">Dublin Distance Fiesta</a></td><td class="text-muted text-right text-nowrap">V F</td></tr><tr id="rID_164212389"><td class="text-nowrap" style="width:35px;"><a href="" ng-click="appC.edit(164212389)" ng-if="appC.params.edit=='mark' && appC.params.canEdit"><i class="far fa-pencil text-primary mr-2"></i></a><i>3 </i></td><td style="width:110px;"><span class="mRight5 m-1"><i class="far fa-fw"></i></span><a href="/result/16ik6rkINSN6VJEHy">2:18.54</a></td><td class="text-nowrap" style="width: 60px;">Mar 26</td><td><a href="meet/459101#53587">PSAL League Meet #1</a></td><td class="text-muted text-right text-nowrap">V F</td></tr><tr id="rID_174827223"><td class="text-nowrap" style="width:35px;"><a href="" ng-click="appC.edit(174827223)" 
ng-if="appC.params.edit=='mark' && appC.params.canEdit"><i class="far fa-pencil text-primary mr-2"></i></a><i>26 </i></td><td style="width:110px;"><span class="mRight5 m-1"><i class="far fa-fw"></i></span><a href="/result/gBivaaKIVSZRLp2UA">2:10.58</a></td><td class="text-nowrap" style="width: 60px;">Apr 9</td><td><a href="meet/443768#53587">Cupertino/De Anza Invite</a></td><td class="text-muted text-right text-nowrap">V F</td></tr><tr id="rID_168470829"><td class="text-nowrap" style="width:35px;"><a href="" ng-click="appC.edit(168470829)" ng-if="appC.params.edit=='mark' && appC.params.canEdit"><i class="far fa-pencil text-primary mr-2"></i></a><i>50 </i></td><td style="width:110px;"><span class="mRight5 m-1"><i class="far fa-fw"></i></span><a href="/result/vOi3ydru3SxDoBWso">2:13.20</a></td><td class="text-nowrap" style="width: 60px;">Apr 16</td><td><a href="meet/445132#53587">Granada Distance & Sprint Festival</a></td><td class="text-muted text-right text-nowrap">O F</td></tr></tbody></table>
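This output contains all of the athlete's times from the most recent season.
How do I filter to isolate only the times when there are various a tags that do not contain times?
If I use find_next_sibling('a'), it only prints None.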
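A uj5u.com user replied:
The question needs some improvement: it should be focused and should state the expected output, which is not very clear.
"How do I filter to isolate only the times when there are various 'a' tags that do not contain times?"
You can use CSS selectors to get all of the <a> elements that hold times: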
soup.select('tr [href^="/result"]')
or, more specifically:
soup.select('tr td:nth-of-type(2) [href^="/result"]')
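To make it clearer what the attribute selector filters out, here is a minimal sketch on a single, shortened row modelled on the table in the question. The names row_html, row_soup, all_links and time_links are made up for this illustration. The [href^="/result"] part keeps only elements whose href starts with /result, so the edit link (empty href), the PR link (/post/...) and the meet link (meet/...) are dropped.

```python
from bs4 import BeautifulSoup

# Shortened, hypothetical row modelled on the table from the question.
row_html = """
<table><tbody><tr>
<td><a href="" ng-click="appC.edit(1)"><i></i></a><i>60 </i></td>
<td><a href="/result/R3iEDYqsnSQ48l0h8">2:05.56</a><a href="/post/R3iEDYqsnSQ48l0h8" rel="nofollow"><small>PR</small></a></td>
<td>Mar 19</td>
<td><a href="meet/441280#53587">Dublin Distance Fiesta</a></td>
</tr></tbody></table>
"""

row_soup = BeautifulSoup(row_html, "html.parser")

# Every <a> in the row: edit link, time, PR marker, meet name.
all_links = [a.get_text(strip=True) for a in row_soup.select("tr a")]

# Only the elements whose href begins with "/result", i.e. the times.
time_links = [a.get_text(strip=True) for a in row_soup.select('tr [href^="/result"]')]

print(all_links)   # ['', '2:05.56', 'PR', 'Dublin Distance Fiesta']
print(time_links)  # ['2:05.56']
```

The td:nth-of-type(2) variant additionally requires the link to sit in the second cell of its row, which guards against a /result link showing up in some other column.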
Example
from bs4 import BeautifulSoup
html = '''<table class="table table-sm table-responsive table-hover"><tbody><tr id="rID_162222827"><td class="text-nowrap" style="width:35px;"><a href="" ng-click="appC.edit(162222827)" ng-if="appC.params.edit=='mark' && appC.params.canEdit"><i class="far fa-pencil text-primary mr-2"></i></a><i>9 </i></td><td style="width:110px;"><span class="mRight5 m-1"><i class="far fa-fw"></i></span><a href="/result/AWi088nH1S1rZxdSN">2:10.97</a></td><td class="text-nowrap" style="width: 60px;">Mar 4</td><td><a href="meet/443782#53587">Sunset Invitational</a></td><td class="text-muted text-right text-nowrap">O F</td></tr><tr id="rID_164098252"><td class="text-nowrap" style="width:35px;"><a href="" ng-click="appC.edit(164098252)" ng-if="appC.params.edit=='mark' && appC.params.canEdit"><i class="far fa-pencil text-primary mr-2"></i></a><i>60 </i></td><td style="width:110px;"><span class="mRight5 m-1"><i class="far fa-fw"></i></span><a href="/result/R3iEDYqsnSQ48l0h8">2:05.56</a><a href="/post/R3iEDYqsnSQ48l0h8" rel="nofollow"><small class="text-muted pr-text" style="font-weight:normal; margin-left: 4px;" uib-tooltip="Personal Record">PR</small></a></td><td class="text-nowrap" style="width: 60px;">Mar 19</td><td><a href="meet/441280#53587">Dublin Distance Fiesta</a></td><td class="text-muted text-right text-nowrap">V F</td></tr><tr id="rID_164212389"><td class="text-nowrap" style="width:35px;"><a href="" ng-click="appC.edit(164212389)" ng-if="appC.params.edit=='mark' && appC.params.canEdit"><i class="far fa-pencil text-primary mr-2"></i></a><i>3 </i></td><td style="width:110px;"><span class="mRight5 m-1"><i class="far fa-fw"></i></span><a href="/result/16ik6rkINSN6VJEHy">2:18.54</a></td><td class="text-nowrap" style="width: 60px;">Mar 26</td><td><a href="meet/459101#53587">PSAL League Meet #1</a></td><td class="text-muted text-right text-nowrap">V F</td></tr><tr id="rID_174827223"><td class="text-nowrap" style="width:35px;"><a href="" ng-click="appC.edit(174827223)" 
ng-if="appC.params.edit=='mark' && appC.params.canEdit"><i class="far fa-pencil text-primary mr-2"></i></a><i>26 </i></td><td style="width:110px;"><span class="mRight5 m-1"><i class="far fa-fw"></i></span><a href="/result/gBivaaKIVSZRLp2UA">2:10.58</a></td><td class="text-nowrap" style="width: 60px;">Apr 9</td><td><a href="meet/443768#53587">Cupertino/De Anza Invite</a></td><td class="text-muted text-right text-nowrap">V F</td></tr><tr id="rID_168470829"><td class="text-nowrap" style="width:35px;"><a href="" ng-click="appC.edit(168470829)" ng-if="appC.params.edit=='mark' && appC.params.canEdit"><i class="far fa-pencil text-primary mr-2"></i></a><i>50 </i></td><td style="width:110px;"><span class="mRight5 m-1"><i class="far fa-fw"></i></span><a href="/result/vOi3ydru3SxDoBWso">2:13.20</a></td><td class="text-nowrap" style="width: 60px;">Apr 16</td><td><a href="meet/445132#53587">Granada Distance & Sprint Festival</a></td><td class="text-muted text-right text-nowrap">O F</td></tr></tbody></table>'''
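# name the parser explicitly to avoid bs4's "no parser was explicitly specified" warning
soup = BeautifulSoup(html, 'html.parser')
print([t.text for t in soup.select('tr td:nth-of-type(2) [href^="/result"]')])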
Output
['2:10.97', '2:05.56', '2:18.54', '2:10.58', '2:13.20']
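As for why find_next_sibling('a') prints None in the question: in the HTML shown, the <a> tags are nested inside the table cells, so they are descendants of the table and not siblings of the h5 heading. A rough sketch of how the times could be tied back to their season and event follows. It assumes a page layout in which each event h5 is followed by its results table as a sibling; the fragment page_html, the heading texts "2022 Outdoor Season" and "800 Meters", and the names page_soup and results are all invented for illustration, and the real page may be structured differently.

```python
from bs4 import BeautifulSoup

# Hypothetical page fragment; headings are invented, rows are shortened
# versions of the rows shown in the question.
page_html = """
<div>
<h5>2022 Outdoor Season</h5>
<h5>800 Meters</h5>
<table><tbody>
<tr><td><i>9 </i></td><td><a href="/result/AWi088nH1S1rZxdSN">2:10.97</a></td><td>Mar 4</td><td><a href="meet/443782#53587">Sunset Invitational</a></td></tr>
<tr><td><i>60 </i></td><td><a href="/result/R3iEDYqsnSQ48l0h8">2:05.56</a><a href="/post/R3iEDYqsnSQ48l0h8"><small>PR</small></a></td><td>Mar 19</td><td><a href="meet/441280#53587">Dublin Distance Fiesta</a></td></tr>
</tbody></table>
</div>
"""

page_soup = BeautifulSoup(page_html, "html.parser")

results = []
season = None
for heading in page_soup.find_all("h5"):
    title = heading.get_text(strip=True)
    if "Season" in title:
        # Remember the current season and move on to the event headings.
        season = title
        continue
    # The <a> tags are inside the table, not next to the <h5>, so step to
    # the sibling table first and then search within it.
    table = heading.find_next_sibling("table")
    if table is None:
        continue
    for link in table.select('td:nth-of-type(2) [href^="/result"]'):
        results.append((season, title, link.get_text(strip=True)))

for row in results:
    print(row)
```

Note that find_next_sibling("table") returns the next table anywhere among the following siblings, so if an event heading had no table of its own this sketch would pick up the table of a later event; a stricter version would stop at the next h5.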
