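Why is the variable body empty? How do I get the contents of the body tag?
I'm writing code for a Google Chrome extension. I only plan to use it personally, for web scraping. In the end I want to analyze the body text and play around with it.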
background.js
```javascript
chrome.tabs.onUpdated.addListener(function(tabId, changeInfo, tab) {
  if (document.readyState == "complete") {
    var url = changeInfo.url;
    // I want to save the url to a file (url.txt)
    var body = document.body.innerText;
    // why is body empty?????
    var text = "";
    var pattern = /[A-Z].*?\./g;
    // match() returns null when nothing matches, so fall back to an empty array.
    var result = body.match(pattern) || [];
    result.forEach(myFunction);
    function myFunction(item) {
      text += item + "\n";
    }
    // I want to save the text to a file (collection.txt)
  }
});
```
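The pattern `/[A-Z].*?\./g` in the code above grabs runs of text that start with an uppercase letter and end at the nearest period. Pulling that step out into a standalone function makes its behavior easy to check outside the extension. This is a minimal sketch; `extractSentences` and `sentencesToLines` are hypothetical names, not part of any API. Two things worth knowing: `String.prototype.match` returns `null` (not an empty array) when nothing matches, and `.` does not match line breaks, so a "sentence" that spans a newline is not captured.

```javascript
// Hypothetical helper: pull "sentences" out of page text with the same
// pattern as the question: an uppercase letter, then the shortest run of
// characters (excluding line breaks) up to the next period.
function extractSentences(text) {
  const pattern = /[A-Z].*?\./g;
  // match() returns null when there are no matches; normalize to [].
  return text.match(pattern) || [];
}

// Join the matches one per line, which is what the question's
// forEach loop is trying to build up in `text`.
function sentencesToLines(sentences) {
  return sentences.map((s) => s + "\n").join("");
}

const sentences = extractSentences("Hello world. this is skipped. Another one.");
console.log(sentences); // → ['Hello world.', 'Another one.']
console.log(sentencesToLines(sentences));
```

The lowercase "this is skipped." is dropped because the pattern requires an uppercase first letter; loosen `[A-Z]` if that is not what you want.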
manifest.json
```json
{
  "name": "Parser",
  "version": "1",
  "manifest_version": 2,
  "background": {
    "scripts": ["background.js"]
  },
  "permissions": [
    "tabs",
    "activeTab",
    "storage",
    "http://*/*",
    "https://*/*"
  ]
}
```
Answer:
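Based on the manifest.json you posted, it looks like you're running your code as a background script.
Background scripts can't directly access the content of the page loaded in a tab: inside a background script, `document` is the extension's own (empty) background page, not the web page. That's why `body` is empty in your code.
Instead, you need a content script to access the page content, which then sends that data to the background script for processing.
Here's an example setup using a background script and a content script that should let you retrieve and process the page content when a tab loads (untested, but it should point you in the right direction).
Credit to the ResourceOverride extension, which I used as a reference when writing the example below.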
background.js
```javascript
// background.js
chrome.runtime.onMessage.addListener(function(message, sender) {
  if (!message || typeof message !== 'object' || !sender.tab) {
    // Ignore messages that weren't sent by our content script.
    return;
  }
  switch (message.action) {
    case 'receiveBodyText': {
      processBodyText(sender.tab, message.bodyText);
      break;
    }
  }
});

function processBodyText(tab, bodyText) {
  var url = tab.url;
  // I want to save the url to a file (url.txt)
  // TODO: Process your bodyText
  var text = "";
  var pattern = /[A-Z].*?\./g;
  // match() returns null when nothing matches, so fall back to an empty array.
  var result = bodyText.match(pattern) || [];
  result.forEach(myFunction);
  function myFunction(item) {
    text += item + "\n";
  }
  // I want to save the text to a file (collection.txt)
}
```
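`processBodyText` leaves the two "save to a file" steps as comments. One possible way to fill them in, sketched here under assumptions: in Chrome, the `chrome.downloads.download()` API can save a string as a file in the user's Downloads folder if `"downloads"` is added to the `permissions` list in manifest.json (the manifest above does not include it). The helper names `buildDownloadOptions` and `saveTextFile` are hypothetical, and the text is passed as a `data:` URL so that part can be tested without a browser.

```javascript
// Hypothetical helper: build the options object for chrome.downloads.download()
// from plain text. Encoding the text as a data: URL keeps this part free of
// browser APIs.
function buildDownloadOptions(text, filename) {
  return {
    url: "data:text/plain;charset=utf-8," + encodeURIComponent(text),
    filename: filename,          // relative to the user's Downloads folder
    conflictAction: "overwrite", // replace an existing file with the same name
  };
}

// Hypothetical helper: save text to a file. Assumes the "downloads" permission
// is present. Outside an extension (where `chrome` is undefined) it only
// returns the options it would have used.
function saveTextFile(text, filename) {
  const options = buildDownloadOptions(text, filename);
  if (typeof chrome !== "undefined" && chrome.downloads) {
    chrome.downloads.download(options);
  }
  return options;
}

// Possible calls from inside processBodyText(tab, bodyText):
//   saveTextFile(tab.url + "\n", "url.txt");
//   saveTextFile(text, "collection.txt");
```

Note that a download always writes a whole file; it cannot append. To build up collection.txt across many pages, you would need to accumulate the text somewhere first (the manifest already requests `storage`, so `chrome.storage.local` is one option) and download it when you're done.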
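content.js
```javascript
// content.js
function sendBodyText() {
  let bodyText = document.body.innerText;
  chrome.runtime.sendMessage({
    action: 'receiveBodyText',
    bodyText: bodyText
  });
}

// Content scripts are injected at "document_idle" by default, which can be
// after the load event has already fired, so check the ready state first.
if (document.readyState === 'complete') {
  sendBodyText();
} else {
  window.addEventListener('load', sendBodyText);
}
```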
manifest.json
```json
// manifest.json
{
  "name": "Parser",
  "version": "1",
  "manifest_version": 2,
  "background": {
    "scripts": ["background.js"]
  },
  "content_scripts": [{
    "matches": [
      "http://*/*",
      "https://*/*"
    ],
    "js": ["content.js"]
  }],
  "permissions": [
    "tabs",
    "activeTab",
    "storage",
    "http://*/*",
    "https://*/*"
  ]
}
```
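Info and docs
A note on WebExtension API differences between Firefox and Chrome:
Chrome uses the `chrome` namespace, while Firefox uses the `browser` namespace proposed for the future standard (Firefox also accepts `chrome` for compatibility). So code written for Chrome would use `chrome.tabs.onUpdated.addListener(...)`, and the equivalent in Firefox is `browser.tabs.onUpdated.addListener(...)`. Be aware of that when reading the docs and example extensions.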
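If you want one code base to run in both browsers, a common trick is to pick whichever namespace exists at startup. A minimal sketch (the function name `getExtensionApi` is hypothetical):

```javascript
// Hypothetical helper: return the extension API namespace the current
// browser exposes. Firefox provides `browser` (and also `chrome`);
// Chrome provides `chrome`.
function getExtensionApi(globalObj) {
  if (globalObj.browser) return globalObj.browser;
  if (globalObj.chrome) return globalObj.chrome;
  return null; // not running inside an extension
}

const api = getExtensionApi(globalThis);
// In a background script you could then write, e.g.:
//   api.tabs.onUpdated.addListener(function(tabId, changeInfo, tab) { ... });
```

This only unifies the name. Firefox's `browser.*` methods return promises, while Chrome's Manifest V2 `chrome.*` methods use callbacks; Mozilla's webextension-polyfill library exists to paper over that difference as well.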
Background scripts
- do not have access to the loaded page
- have full access to the WebExtensions API
Content scripts
- have full access to the loaded page
- have only limited access to the WebExtensions API
- Chrome content scripts docs
- MDN content scripts docs
- Docs on communication between content scripts and background scripts
WebExtensions API
- Chrome WebExtensions API reference
- MDN WebExtensions API reference
other useful links
- MDN WebExtensions example Github repository
- MDN "Anatomy of a WebExtension"
- MDN detailed browser WebExtensions support tables
- ResourceOverride extension - this is a fairly complex extension that uses both background and content scripts. I used it as a reference/example to better understand how extensions are written.
