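I'm trying to extract all terms from a text correctly. It looks like when a term sits inside a sentence and contains (), the text is not split and the regular expression cannot find the term.
I'm trying to correctly split text that contains (). So instead of this: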
["What is API(Application Programming Interface) and how to use it?"]
I'm trying to get this:
["What is", "API(Application Programming Interface)", "and how to use it?"]
The JSON term is extracted correctly, and I get this:
["JSON", "is a Javascript Object Notation"]
That is exactly what I want. But for API, I don't get this:
["What is", "API(Application Programming Interface)", "and how to use it?"]
Instead I get this, which is not what I want:
["What is API(Application Programming Interface) and how to use it?"]
function getAllTextNodes(element) {
let node;
let nodes = [];
let walk = document.createTreeWalker(element,NodeFilter.SHOW_TEXT,null,false);
while (node = walk.nextNode()) nodes.push(node);
return nodes;
}
const allNodes = getAllTextNodes(document.getElementById("body"))
const terms = [
{id: 1, definition: 'API stands for Application programming Interface', expression: 'API(Application Programming Interface)'},
{id: 2, definition: 'JSON stands for JavaScript Object Notation.', expression: 'JSON'}
]
const termMap = new Map(
[...terms].sort((a, b) => b.expression.length - a.expression.length)
.map(term => [term.expression.toLowerCase(), term])
);
const regex = RegExp("\\b(" + Array.from(termMap.keys()).join("|") + ")\\b", "ig");
for (const node of allNodes) {
const pieces = node.textContent.split(regex).filter(Boolean);
console.log(pieces)
}
<div id="body">
<p>API(Application Programming Interface)</p>
<p>What is API(Application Programming Interface) and how to use it?</p>
<p>JSON is a Javascript Object Notation</p>
</div>
Answer from a uj5u.com user:
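Since your "words" can contain non-word characters, you cannot rely on word boundaries (\b). I suggest switching to unambiguous word boundaries, (?<!\w) and (?!\w), or to adaptive dynamic word boundaries.
Also, you need to escape your terms before using them in a regular expression.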
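To make the two points above concrete, here is a minimal sketch that compares three patterns against the sentence from the question. The helper name escapeRegExp is an assumption made for this illustration and does not appear in the original code.

```javascript
// Hypothetical helper (not part of the original code): escape regex
// metacharacters so that a term is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const term = "API(Application Programming Interface)";
const sentence = "What is API(Application Programming Interface) and how to use it?";

// 1. Unescaped term: the parentheses become a capture group, so the
//    pattern looks for "API" directly followed by "Application ...".
const unescaped = new RegExp("\\b(" + term + ")\\b", "i");

// 2. Escaped term, still wrapped in \b: the trailing \b would have to sit
//    between ")" and " ", two non-word characters, so it never matches.
const escapedWithWordBoundary = new RegExp("\\b(" + escapeRegExp(term) + ")\\b", "i");

// 3. Escaped term with unambiguous boundaries (?<!\w) and (?!\w).
const escapedWithLookarounds = new RegExp("(?<!\\w)(" + escapeRegExp(term) + ")(?!\\w)", "i");

console.log(unescaped.test(sentence));               // false
console.log(escapedWithWordBoundary.test(sentence)); // false
console.log(escapedWithLookarounds.test(sentence));  // true
```

Only the third pattern finds the term, which suggests that both the escaping and the boundary change are needed here.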
See the example below, which uses adaptive word boundaries:
function getAllTextNodes(element) {
let node;
let nodes = [];
let walk = document.createTreeWalker(element,NodeFilter.SHOW_TEXT,null,false);
while (node = walk.nextNode()) nodes.push(node);
return nodes;
}
const allNodes = getAllTextNodes(document.getElementById("body"))
const terms = [
{id: 1, definition: 'API stands for Application programming Interface', expression: 'API(Application Programming Interface)'},
{id: 2, definition: 'JSON stands for JavaScript Object Notation.', expression: 'JSON'}
]
const termMap = new Map(
[...terms].sort((a, b) => b.expression.length - a.expression.length)
.map(term => [term.expression.toLowerCase(), term])
);
const regex = RegExp("(?:(?!\\w)|\\b(?=\\w))(" + Array.from(termMap.keys()).map(x => x.replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&')).join("|") + ")(?:(?<=\\w)\\b|(?<!\\w))", "ig");
for (const node of allNodes) {
const pieces = node.textContent.split(regex).filter(Boolean);
console.log(pieces)
}
<div id="body">
<p>API(Application Programming Interface)</p>
<p>What is API(Application Programming Interface) and how to use it?</p>
<p>JSON is a Javascript Object Notation</p>
</div>
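The regular expression is now (?:(?!\w)|\b(?=\w))(api\(application programming interface\)|json)(?:(?<=\w)\b|(?<!\w)), where:
(?:(?!\w)|\b(?=\w)) - a left-hand adaptive word boundary (no context check is performed if the next character is a non-word character)
(api\(application programming interface\)|json) - Group 1, which matches one of your terms (note the escaped special characters)
(?:(?<=\w)\b|(?<!\w)) - a right-hand adaptive word boundary (no context check is performed if the preceding character is a non-word character)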
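For trying the pattern outside the browser, the following is a rough DOM-free sketch of the same approach, run on plain strings instead of text nodes. The function names escapeTerm, buildTermRegex and splitOnTerms are hypothetical, and the trim() call is an addition of this sketch so that the pieces come out without surrounding spaces, as in the output the question asks for.

```javascript
// Hypothetical helper: escape regex metacharacters in a term.
function escapeTerm(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: build one regex for all terms, longest first,
// wrapped in the adaptive word boundaries described above.
function buildTermRegex(expressions) {
  const alternation = [...expressions]
    .sort((a, b) => b.length - a.length)
    .map(escapeTerm)
    .join("|");
  return new RegExp(
    "(?:(?!\\w)|\\b(?=\\w))(" + alternation + ")(?:(?<=\\w)\\b|(?<!\\w))",
    "ig"
  );
}

// Split on the terms; the capture group keeps the matched term in the result.
// trim() is an assumption of this sketch, not part of the original answer.
function splitOnTerms(text, regex) {
  return text.split(regex).map(piece => piece.trim()).filter(Boolean);
}

const expressions = ["API(Application Programming Interface)", "JSON"];
const termRegex = buildTermRegex(expressions);

const samples = [
  "API(Application Programming Interface)",
  "What is API(Application Programming Interface) and how to use it?",
  "JSON is a Javascript Object Notation",
  "JSONP is different",
];

for (const sample of samples) {
  console.log(splitOnTerms(sample, termRegex));
}
// [ 'API(Application Programming Interface)' ]
// [ 'What is', 'API(Application Programming Interface)', 'and how to use it?' ]
// [ 'JSON', 'is a Javascript Object Notation' ]
// [ 'JSONP is different' ]
```

The last sample is left unsplit because JSON is followed by another word character, so the right-hand boundary rejects it.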
