I'm scraping a website and need to remove every \n and \t from my strings.
I tried the following code:
item.post_category = [];
Array.from($doc.find('h6.link')).forEach(function(link){
  console.log(link.textContent.replace(/\t \n /gm, ""));
  item.post_category.push(link.textContent);
})
// this removes the line breaks but not the tabs
Here are several sample arrays that I have to iterate over:
["\n\t\t\t\t\tJune 15, 2021 ? \n\t\t\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\n\t\t\t\t\t\tFamily,\n\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\n\t\t\t\t\t\tGender Equality,\n\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t\t\n\t\t\t\t\t\t\n\t\t\t\t\t\t\n\t\t\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t\t\n\t\t\t\t\tIn the News\n\t\t\t\t"]
["\n\t\t\t\t\tJune 13, 2020 ? \n\t\t\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t\t\t\n\t\t\t\t\t\t\n\t\t\t\t\t\t\n\t\t\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t\t\n\t\t\t\t\tIn the News\n\t\t\t\t"]
["\n\t\t\t\t\tJuly 5, 2021 ? \n\t\t\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t\t\t\n\t\t\t\t\t\t\n\t\t\t\t\t\t\n\t\t\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t\t\n\t\t\t\t\tNews\n\t\t\t\t"]
Ideally, I'd like my arrays to look like this, with the date and every \n and \t removed:
["Family,Gender Equality,In the News"]
["In the News"]
["News"]
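A uj5u.com user replied:
There are many ways to do this; depending on your needs, you can use a regular expression, split, or both.
Here is one possible solution: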
let str = "\n\t\t\t\t\tJune 15, 2021 ? \n\t\t\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\n\t\t\t\t\t\tFamily,\n\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\n\t\t\t\t\t\tGender Equality,\n\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t\t\n\t\t\t\t\t\t\n\t\t\t\t\t\t\n\t\t\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t\t\n\t\t\t\t\tIn the News\n\t\t\t\t"
// Remove all newlines and tabs with a regex. Add '\r' to the pattern if the text may contain Windows line endings.
str = str.replace(/(\n|\t)/gm, '');
// Here we assume that the string always
// contains the date followed by this character: ?.
// So we split on that character and take
// the second item of the resulting array, which is the text without the date.
let result = str.split('?')[1].trim();
console.log(result); // prints 'Family,Gender Equality,In the News'
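To apply the same idea to every scraped element, the two steps can be wrapped in a small function. The sketch below is only one way to do it: the name cleanCategoryText is made up for illustration, the sample strings are shortened versions of the ones in the question, and it assumes the date, when present, is always followed by "?". Unlike split('?')[1], it returns the whole cleaned string when no "?" is found instead of throwing.

```javascript
// Hypothetical helper (not part of the original answer).
// Strips tabs, newlines and carriage returns, then drops the leading date.
function cleanCategoryText(text) {
  // The character class matches any single tab, newline or carriage return.
  const flattened = text.replace(/[\t\n\r]/g, "");
  // Assumption: the date, when present, ends at the first "?".
  const separatorIndex = flattened.indexOf("?");
  const withoutDate =
    separatorIndex === -1 ? flattened : flattened.slice(separatorIndex + 1);
  return withoutDate.trim();
}

// Shortened sample strings, modelled on the ones in the question.
const samples = [
  "\n\t\t\t\t\tJune 13, 2020 ? \n\t\t\t\t\t\t\n\t\t\t\t\tIn the News\n\t\t\t\t",
  "\n\t\t\t\t\tJuly 5, 2021 ? \n\t\t\t\t\t\t\n\t\t\t\t\tNews\n\t\t\t\t",
  "\n\t\tFamily,\n\t\t\tGender Equality,\n\t\tIn the News\n",
];

const cleaned = samples.map(cleanCategoryText);
console.log(cleaned);
```

Inside the forEach loop from the question, this would mean pushing cleanCategoryText(link.textContent) rather than the raw link.textContent. Note that the pattern /\t \n / in the question only matches a tab, a space, a newline and a space in exactly that order, which is why a character class such as [\t\n\r] is used here.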
