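I have some HTML from which I am trying to extract specific information, but it contains repeated elements and I don't know how to work around them. I am trying to implement the following conditional logic:
- Extract the player name from the first href tag.
- Search for the next tag with the class flaggenrahmen and extract its alt attribute.
- If flaggenrahmen appears again, skip it.
- Repeat the steps.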
What I have tried:
from collections import defaultdict
from bs4 import BeautifulSoup

player_dict = defaultdict(list)
soup = BeautifulSoup(html)
player_id = soup.select('*[href]')
nation = soup.select('.flaggenrahmen')
for l,k in zip(player_id, nation):
    player_dict[l.get_text(strip=True)].append(k['alt'])
However, I can't get it to "skip" when flaggenrahmen appears again, so a player can end up with more than one country.
Output produced:
defaultdict(list,
{'': ['England', 'Spain', 'Portugal'],
'Trent Alexander-Arnold': ['Morocco'],
'Achraf Hakimi': ['England']})
Expected output:
{'Trent Alexander-Arnold':['England'],
'Achraf Hakimi':['Morocco'],
'João Cancelo':['Portugal'],
'Reece James':['England']
}
Here is the HTML data:
html='''<tbody>
<tr class="odd">
<td class="zentriert">1</td><td class=""><table class="inline-table"><tr><td rowspan="2"><a href="#"><img alt="Trent Alexander-Arnold" class="bilderrahmen-fixed" src="https://img.a.transfermarkt.technology/portrait/small/314353-1559826986.jpg?lm=1" title="Trent Alexander-Arnold"/></a></td><td class="hauptlink"><a class="spielprofil_tooltip" href="/trent-alexander-arnold/profil/spieler/314353" id="314353" title="Trent Alexander-Arnold">Trent Alexander-Arnold</a></td></tr><tr><td>Right-Back</td></tr></table></td><td class="zentriert">23</td><td class="zentriert"><img alt="England" class="flaggenrahmen" src="https://tmssl.akamaized.net/images/flagge/verysmall/189.png?lm=1520611569" title="England"/></td><td class="zentriert"><a class="vereinprofil_tooltip" href="/fc-liverpool/startseite/verein/31" id="31"><img alt="Liverpool FC" class="" src="https://tmssl.akamaized.net/images/wappen/verysmall/31.png?lm=1456567819" title=" "/></a></td><td class="rechts hauptlink"><b>£67.50m</b><span class="icons_sprite red-arrow-ten" title="£90.00m"> </span></td></tr>
<tr class="even">
<td class="zentriert">2</td><td class=""><table class="inline-table"><tr><td rowspan="2"><a href="#"><img alt="Achraf Hakimi" class="bilderrahmen-fixed" src="https://img.a.transfermarkt.technology/portrait/small/398073-1633679363.jpg?lm=1" title="Achraf Hakimi"/></a></td><td class="hauptlink"><a class="spielprofil_tooltip" href="/achraf-hakimi/profil/spieler/398073" id="398073" title="Achraf Hakimi">Achraf Hakimi</a></td></tr><tr><td>Right-Back</td></tr></table></td><td class="zentriert">22</td><td class="zentriert"><img alt="Morocco" class="flaggenrahmen" src="https://tmssl.akamaized.net/images/flagge/verysmall/107.png?lm=1520611569" title="Morocco"/><br/><img alt="Spain" class="flaggenrahmen" src="https://tmssl.akamaized.net/images/flagge/verysmall/157.png?lm=1520611569" title="Spain"/></td><td class="zentriert"><a class="vereinprofil_tooltip" href="/fc-paris-saint-germain/startseite/verein/583" id="583"><img alt="Paris Saint-Germain" class="" src="https://tmssl.akamaized.net/images/wappen/verysmall/583.png?lm=1522312728" title=" "/></a></td><td class="rechts hauptlink"><b>£63.00m</b><span class="icons_sprite green-arrow-ten" title="£54.00m"> </span></td></tr>
<tr class="odd">
<td class="zentriert">3</td><td class=""><table class="inline-table"><tr><td rowspan="2"><a href="#"><img alt="João Cancelo" class="bilderrahmen-fixed" src="https://img.a.transfermarkt.technology/portrait/small/182712-1615221629.jpg?lm=1" title="João Cancelo"/></a></td><td class="hauptlink"><a class="spielprofil_tooltip" href="/joao-cancelo/profil/spieler/182712" id="182712" title="João Cancelo">João Cancelo</a></td></tr><tr><td>Right-Back</td></tr></table></td><td class="zentriert">27</td><td class="zentriert"><img alt="Portugal" class="flaggenrahmen" src="https://tmssl.akamaized.net/images/flagge/verysmall/136.png?lm=1520611569" title="Portugal"/></td><td class="zentriert"><a class="vereinprofil_tooltip" href="/manchester-city/startseite/verein/281" id="281"><img alt="Manchester City" class="" src="https://tmssl.akamaized.net/images/wappen/verysmall/281.png?lm=1467356331" title=" "/></a></td><td class="rechts hauptlink"><b>£49.50m</b><span class="icons_sprite green-arrow-ten" title="£45.00m"> </span></td></tr>
<tr class="even">
<td class="zentriert">4</td><td class=""><table class="inline-table"><tr><td rowspan="2"><a href="#"><img alt="Reece James" class="bilderrahmen-fixed" src="https://img.a.transfermarkt.technology/portrait/small/472423-1569484519.png?lm=1" title="Reece James"/></a></td><td class="hauptlink"><a class="spielprofil_tooltip" href="/reece-james/profil/spieler/472423" id="472423" title="Reece James">Reece James</a></td></tr><tr><td>Right-Back</td></tr></table></td><td class="zentriert">21</td><td class="zentriert"><img alt="England" class="flaggenrahmen" src="https://tmssl.akamaized.net/images/flagge/verysmall/189.png?lm=1520611569" title="England"/></td><td class="zentriert"><a class="vereinprofil_tooltip" href="/fc-chelsea/startseite/verein/631" id="631"><img alt="Chelsea FC" class="" src="https://tmssl.akamaized.net/images/wappen/verysmall/631.png?lm=1628160548" title=" "/></a></td><td class="rechts hauptlink"><b>£40.50m</b><span class="icons_sprite green-arrow-ten" title="£36.00m"> </span></td></tr>
<tr class="odd">
<tbody>'''.replace('< ', '<')
A uj5u.com user replied:
This should do it:
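from bs4 import BeautifulSoup

players = {}
soup = BeautifulSoup(html, 'lxml')
# Walk the direct children of <tbody>; each player occupies one <tr> row.
for el in soup.tbody.children:
    if el.name != 'tr':
        continue  # skip the whitespace text nodes between rows
    # select_one returns only the first match, so a second flag in the same row is ignored.
    name = el.select_one('.spielprofil_tooltip')
    country = el.select_one('.flaggenrahmen')
    if name and country:
        players[name.text] = [country['title']]
print(players)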
>>> {'Trent Alexander-Arnold': ['England'], 'Achraf Hakimi': ['Morocco'], 'João Cancelo': ['Portugal'], 'Reece James': ['England']}
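Why this works, as far as I can tell: in the question's HTML every row holds several elements with an href (the portrait link, the profile link, the club link) and either one or two flags, so zipping two page-wide lists pairs them up by position and drifts out of step. Scoping the lookup to one row at a time avoids that. Below is a minimal, self-contained sketch of the same idea. The sample HTML, the player names, and the function name first_nationality_per_player are made up for illustration, and it assumes the outer rows carry the odd/even classes as in the question. It uses the built-in html.parser instead of lxml so that no extra parser is required.

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed-down sample that mirrors the row layout in the question.
sample_html = """
<table><tbody>
<tr class="odd">
  <td><a href="#"><img alt="Player A" class="bilderrahmen-fixed"/></a></td>
  <td class="hauptlink"><a class="spielprofil_tooltip" href="/player-a">Player A</a></td>
  <td class="zentriert"><img alt="England" class="flaggenrahmen" title="England"/></td>
</tr>
<tr class="even">
  <td><a href="#"><img alt="Player B" class="bilderrahmen-fixed"/></a></td>
  <td class="hauptlink"><a class="spielprofil_tooltip" href="/player-b">Player B</a></td>
  <td class="zentriert"><img alt="Morocco" class="flaggenrahmen" title="Morocco"/><br/><img alt="Spain" class="flaggenrahmen" title="Spain"/></td>
</tr>
</tbody></table>
"""


def first_nationality_per_player(html):
    """Map each player name to a one-item list holding the first flag in the row."""
    soup = BeautifulSoup(html, "html.parser")
    players = {}
    # Only the outer rows have class odd/even; nested inline-table rows do not.
    for row in soup.select("tr.odd, tr.even"):
        name = row.select_one("a.spielprofil_tooltip")
        flag = row.select_one("img.flaggenrahmen")  # first flag only, repeats are skipped
        if name is None or flag is None:
            continue  # row without a player link or without a flag
        players[name.get_text(strip=True)] = [flag["alt"]]
    return players


result = first_nationality_per_player(sample_html)
print(result)
```

On this sample, Player B has two flags (Morocco and Spain) but only the first one is kept, which is the "skip if flaggenrahmen repeats" behaviour asked for. To keep every nationality instead, row.select("img.flaggenrahmen") would return all flags in the row.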
