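This is my first time programming in Rust (I'm currently reading the Book). I recently needed to scrape a list of diseases and conditions from this website, and after trying a few guides I ended up with the small snippet below. I'm iterating over an <ol>, but instead of getting each <li> as a separate item, I get the whole list as a single element.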
use error_chain::error_chain;
use select::document::Document;
use select::predicate::Class;

error_chain! {
    foreign_links {
        ReqError(reqwest::Error);
        IoError(std::io::Error);
    }
}

// Source: https://rust-lang-nursery.github.io/rust-cookbook/web/scraping.html#extract-all-links-from-a-webpage-html
#[tokio::main]
async fn main() -> Result<()> {
    let res = reqwest::get("https://www.cdc.gov/diseasesconditions/az/a.html")
        .await?
        .text()
        .await?;

    Document::from(res.as_str())
        .find(Class("unstyled-list")) // This is returning the whole "ol"
        .for_each(|i| print!("{};", i.text()));

    Ok(())
}
Output. Note how the whole list is printed as a single item, instead of each item being printed followed by the expected separator ";":
Abdominal Aortic Aneurysm — see Aortic AneurysmAcanthamoeba InfectionACE (Adverse Childhood Experiences)Acinetobacter InfectionAcquired Immune Deficiency Syndrome (AIDS) — see HIVAcute Flaccid Myelitis (AFM)Adenovirus InfectionAdenovirus VaccinationADHD [Attention Deficit/Hyperactivity Disorder]Adult VaccinationsAdverse Childhood Experiences (ACE)AFib, AF (Atrial fibrillation)AFMAfrican Trypanosomiasis — see Sleeping SicknessAgricultural Safety — see Farm Worker InjuriesAHF (Alkhurma hemorrhagic fever)AIDS (Acquired Immune Deficiency Syndrome)Alkhurma hemorrhagic fever (AHF)ALS [Amyotrophic Lateral Sclerosis]Alzheimer's DiseaseAmebiasis, Intestinal [Entamoeba histolytica infection]American Trypanosomiasis — see Chagas DiseaseAmphibians and Fish, Infections from — see Fish and Amphibians, Infections fromAmyotrophic Lateral Sclerosis — see ALSAnaplasmosis, HumanAncylostoma duodenale Infection, Necator americanus Infection — see Human HookwormAngiostrongylus InfectionAnimal-Related DiseasesAnisakiasis — see Anisakis InfectionAnisakis Infection [Anisakiasis]Anthrax VaccinationAnthrax [Bacillus anthracis Infection]Antibiotic-resistant Infections - ListingAntibiotic and Antimicrobial ResistanceAntibiotic Use, Appropriatesee also U.S. Antibiotic Awareness Week (USAAW)Aortic AneurysmAortic Dissection — see Aortic AneurysmArenavirus InfectionsArthritisChildhood ArthritisFibromyalgiaGoutOsteoarthritis (OA)Rheumatoid Arthritis (RA)Ascariasis — see Ascaris InfectionAscaris Infection [Ascariasis]Aseptic Meningitis — see Viral MeningitisAspergillosis — see Aspergillus InfectionAspergillus Infection [Aspergillosis]AsthmaAtrial fibrillation (AFib, AF)Attention Deficit/Hyperactivity Disorder — see ADHDAutismsee also Genetics and GenomicsAvian Influenza ;
The expected output would be:
Abdominal Aortic Aneurysm — see Aortic Aneurysm;Acanthamoeba Infection;ACE (Adverse Childhood Experiences);Acinetobacter Infection; etc.
Answer:
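find() returns an iterator over the elements matching the criteria, which here is just the <ol> itself. You need to take that match and call .children() on it to get the <li>s: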
Document::from(res.as_str())
    .find(Class("unstyled-list"))
    .next() // Get the first match (the <ol> itself)
    .expect("no matching <ol>")
    .children() // Iterate over the <ol>'s child nodes
    .for_each(|i| print!("{};", i.text()));
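To see why the original code glues everything together, and why .children() may still need a filter, here is a simplified model using a tiny hand-rolled tree. The Node enum and the inner_text, list_items, and sample_list helpers are hypothetical stand-ins, not the select crate's types. The sketch assumes, as HTML parsers generally do, that the whitespace between <li> tags is kept as text nodes, so an <ol>'s children alternate between text nodes and <li> elements. Taking the text of the <ol> concatenates all descendant text (the symptom in the question), while keeping only the <li> children and joining them yields one entry per item.

```rust
// Simplified, hypothetical model of an HTML tree; NOT the select crate's types.
#[derive(Debug)]
enum Node {
    Element { name: String, children: Vec<Node> },
    Text(String),
}

impl Node {
    fn element(name: &str, children: Vec<Node>) -> Node {
        Node::Element { name: name.to_string(), children }
    }

    fn text(s: &str) -> Node {
        Node::Text(s.to_string())
    }

    /// Concatenates the text of this node and all of its descendants.
    fn inner_text(&self) -> String {
        match self {
            Node::Text(t) => t.clone(),
            Node::Element { children, .. } => children.iter().map(Node::inner_text).collect(),
        }
    }

    /// Tag name for elements, None for text nodes.
    fn name(&self) -> Option<&str> {
        match self {
            Node::Element { name, .. } => Some(name.as_str()),
            Node::Text(_) => None,
        }
    }

    /// Direct child nodes, including whitespace text nodes.
    fn children(&self) -> &[Node] {
        match self {
            Node::Element { children, .. } => children.as_slice(),
            Node::Text(_) => &[],
        }
    }
}

/// Joins the text of each direct <li> child, each followed by ';'.
fn list_items(list: &Node) -> String {
    list.children()
        .iter()
        .filter(|n| n.name() == Some("li")) // skip whitespace text nodes
        .map(|li| format!("{};", li.inner_text()))
        .collect()
}

/// A two-item <ol> with whitespace text nodes between the <li>s.
fn sample_list() -> Node {
    Node::element(
        "ol",
        vec![
            Node::text("\n  "),
            Node::element("li", vec![Node::text("Acanthamoeba Infection")]),
            Node::text("\n  "),
            Node::element("li", vec![Node::text("Acinetobacter Infection")]),
            Node::text("\n"),
        ],
    )
}

fn main() {
    let ol = sample_list();
    // Text of the <ol> itself: every item glued together.
    println!("{:?}", ol.inner_text());
    // One entry per <li>: "Acanthamoeba Infection;Acinetobacter Infection;"
    println!("{}", list_items(&ol));
}
```

With select itself, the analogous step would presumably be adding .filter(|n| n.name() == Some("li")) after .children(), or passing a combined predicate such as Class("unstyled-list").child(Name("li")) to find(); treat both as untested suggestions.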