For an assignment, I'm trying to merge 4 lists of scraped data into 1. All 4 lists are correctly ordered, as shown below.
["Een gezonde samenleving? Het belang van sporten wordt onderschat","Zo vader, zo dochter","Milieuvriendelijk vervoer met waterstof","\"Ik heb zin in wat nog komen gaat\"","Oorlog in Oekra?ne"]
["Teamsport","Carsten en Kirsten","Kennisclip","Master Mind","Statement van het CvB"]
["16 maart 2022","10 maart 2022","09 maart 2022","08 maart 2022","07 maart 2022"]
["Directie","Bot","CB","Moniek","Christian"]
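The output I want looks like this:
[["Een gezonde samenleving? Het belang van sporten wordt onderschat", "Teamsport", "16 maart 2022", "Directie"], [...], [...], [...], [...]]
I have tried several solutions I found on the internet, but I don't understand some of them, and most of them either deal with only 2 lists or give errors when I try to implement them.
For further reference, my code looks like this: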
import Text.HTML.Scalpel

urlString :: String
urlString = "https://www.example.com"

-- Main function in which we call the other functions
main :: IO ()
main = do
  resultTitle <- scrapeURL urlString scrapeHANTitle
  resultSubtitle <- scrapeURL urlString scrapeHANSubtitle
  resultDate <- scrapeURL urlString scrapeHANDate
  resultAuthor <- scrapeURL urlString scrapeHANAuthor
  print resultTitle
  print resultSubtitle
  print resultDate
  print resultAuthor

scrapeHANTitle :: Scraper String [String]
scrapeHANTitle =
  chroots ("div" @: [hasClass "card-news__body"]) scrapeTitle

scrapeHANSubtitle :: Scraper String [String]
scrapeHANSubtitle =
  chroots ("div" @: [hasClass "card-news__body"]) scrapeSubTitle

scrapeHANDate :: Scraper String [String]
scrapeHANDate =
  chroots ("div" @: [hasClass "card-article__meta__body"]) scrapeDate

scrapeHANAuthor :: Scraper String [String]
scrapeHANAuthor =
  chroots ("div" @: [hasClass "card-article__meta__body"]) scrapeAuthor

-- gets the title of news items
-- https://www.utf8-chartable.de/unicode-utf8-table.pl?start=8192&number=128&utf8=dec
-- some titles contain special characters so use this utf8 table to add conversion
scrapeTitle :: Scraper String String
scrapeTitle = do
  text $ "a" @: [hasClass "card-news__body__title"]

-- gets the subtitle of news items
scrapeSubTitle :: Scraper String String
scrapeSubTitle = do
  text $ "span" @: [hasClass "card-news__body__eyebrow"]

-- gets the date on which the news item was posted
scrapeDate :: Scraper String String
scrapeDate = do
  text $ "div" @: [hasClass "card-news__footer__body__date"]

-- gets the author of the news item
scrapeAuthor :: Scraper String String
scrapeAuthor = do
  text $ "div" @: [hasClass "card-news__footer__body__author"]
I also tried the following, but it gives me a bunch of type errors.
mergeLists :: Maybe [String] -> Maybe [String] ->Maybe [String] -> Maybe [String] -> Maybe [String]
mergeLists = \s1 -> \s2 -> \s3 -> \s4 ->s1 s2 s3 s4
Answer:
You can use the Monoid instance and combine the values with <> (this appends the four lists into one flat list):
mergeLists :: Maybe [String] -> Maybe [String] ->Maybe [String] -> Maybe [String] -> Maybe [String]
mergeLists s1 s2 s3 s4 = s1 <> s2 <> s3 <> s4
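To make the behaviour of <> on Maybe [String] concrete, here is a minimal sketch. The sample values are hypothetical stand-ins for what scrapeURL might return; the names titles, dates, missing, bothPresent and oneMissing are only for illustration.

```haskell
-- Hypothetical sample values standing in for scraper results.
titles, dates, missing :: Maybe [String]
titles  = Just ["Zo vader, zo dochter", "Milieuvriendelijk vervoer met waterstof"]
dates   = Just ["10 maart 2022", "09 maart 2022"]
missing = Nothing

-- Both operands are Just: the inner lists are appended into one flat list.
bothPresent :: Maybe [String]
bothPresent = titles <> dates

-- A Nothing operand is skipped rather than making the whole result Nothing.
oneMissing :: Maybe [String]
oneMissing = titles <> missing <> dates
```

In GHCi, bothPresent and oneMissing should both evaluate to the same flat list of four strings, which shows that <> joins the lists end to end and does not group the values per news item.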
However, since you are scraping the same page here, you can combine the data from the scrapers in a single scraper:
myScraper :: Scraper String [String]
myScraper = do
  da <- scrapeHANTitle
  db <- scrapeHANSubtitle
  dc <- scrapeHANDate
  dd <- scrapeHANAuthor
  -- append the four result lists into one flat list
  return (da <> db <> dc <> dd)
Then run it:
main :: IO ()
main = do
  result <- scrapeURL urlString myScraper
  print result
Or shorter:
main :: IO()
main = scrapeURL urlString myScraper >>= print
Answer:
You can combine the four lists using zip4 from Data.List.
import Data.List
list1 = ["Een gezonde samenleving? Het belang van sporten wordt onderschat","Zo vader, zo dochter","Milieuvriendelijk vervoer met waterstof","\"Ik heb zin in wat nog komen gaat\"","Oorlog in Oekra?ne"]
list2 = ["Teamsport","Carsten en Kirsten","Kennisclip","Master Mind","Statement van het CvB"]
list3 = ["16 maart 2022","10 maart 2022","09 maart 2022","08 maart 2022","07 maart 2022"]
list4 = ["Directie","Bot","CB","Moniek","Christian"]
result = zip4 list1 list2 list3 list4
result2 = [[x1,x2,x3,x4] | (x1,x2,x3,x4) <- zip4 list1 list2 list3 list4]
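The two results differ slightly. result creates a list of tuples. result2 creates a list of lists, as requested. The list of tuples may be the better choice because:
- A list can contain any number of values, all of the same type (Haskell lists are homogeneous).
- A tuple can contain values of different types, so it is more flexible.
- A tuple of two values is a different type from a tuple of three values, so if you want collections of exactly four values, using tuples prevents anyone from slipping in a collection of three or five values.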
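Since scrapeURL in the question returns each list wrapped in Maybe, the zip-based approach can be lifted over Maybe with the Applicative operators. The following is only a sketch under that assumption; mergeRows and mergeTuples are hypothetical helper names, not part of any library.

```haskell
import Data.List (zip4, zipWith4)

-- Hypothetical helper: one inner list per news item, as the question asks.
-- The result is Nothing if any of the four inputs is Nothing.
mergeRows :: Maybe [String] -> Maybe [String] -> Maybe [String] -> Maybe [String]
          -> Maybe [[String]]
mergeRows titles subtitles dates authors =
  zipWith4 (\t s d a -> [t, s, d, a]) <$> titles <*> subtitles <*> dates <*> authors

-- Hypothetical helper: the same idea, but with a 4-tuple per news item.
mergeTuples :: Maybe [String] -> Maybe [String] -> Maybe [String] -> Maybe [String]
            -> Maybe [(String, String, String, String)]
mergeTuples titles subtitles dates authors =
  zip4 <$> titles <*> subtitles <*> dates <*> authors
```

In the question's main this could be called as print (mergeRows resultTitle resultSubtitle resultDate resultAuthor). Note that zip4 and zipWith4 stop at the shortest input, so if one scraper finds fewer items than the others, the extra entries are silently dropped.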
