我正在嘗試<span>使用 PHP 的 DOMDocument 和 XPath將某些短語的所有實體包裝起來。我的邏輯基于另一篇文章中的這個答案,但這僅允許我在需要選擇所有匹配項時選擇節點內的第一個匹配項。
一旦我修改了第一個匹配項的 DOM,我的后續回圈就會導致錯誤,Fatal error: Uncaught Error: Call to a member function splitText() on bool在帶有$after. 我很確定這是由修改標記引起的,但我一直無法弄清楚原因。
我在這里做錯了什么?
/**
* Automatically wrap various forms of CCJM in a class for branding purposes
*
* @link https://stackoverflow.com/a/6009594/654480
*
* @param string $content
* @return string
*/
function ccjm_branding_filter(string $content): string {
if (! (is_admin() && ! wp_doing_ajax()) && $content) {
$DOM = new DOMDocument();
/**
* Use internal errors to get around HTML5 warnings
*/
libxml_use_internal_errors(true);
/**
* Load in the content, with proper encoding and an `<html>` wrapper required for parsing
*/
$DOM->loadHTML("<?xml encoding='utf-8' ?><html>{$content}</html>", LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
/**
* Clear errors to get around HTML5 warnings
*/
libxml_clear_errors();
/**
* Initialize XPath
*/
$XPath = new DOMXPath($DOM);
/**
* Retrieve all text nodes, except those within scripts
*/
$text = $XPath->query("//text()[not(parent::script)]");
foreach ($text as $node) {
/**
* Find all matches, including offset
*/
preg_match_all("/(C\.? ?C\.?(?:JM| Johnson (?:&|&|&|and) Malhotra)(?: Engineers, LTD\.?|, P\.?C\.?)?)/i", $node->textContent, $matches, PREG_OFFSET_CAPTURE);
/**
* Wrap each match in appropriate span
*/
foreach ($matches as $group) {
foreach ($group as $key => $match) {
/**
* Determine the offset and the length of the match
*/
$offset = $match[1];
$length = strlen($match[0]);
/**
* Isolate the match and what comes after it
*/
$word = $node->splitText($offset);
$after = $word->splitText($length);
/**
* Create the wrapping span
*/
$span = $DOM->createElement("span");
$span->setAttribute("class", "__brand");
/**
* Replace the word with the span, and then re-insert the word within it
*/
$word->parentNode->replaceChild($span, $word);
$span->appendChild($word);
break; // it always errors after the first loop
}
}
}
/**
* Save changes, remove unneeded tags
*/
$content = implode(array_map([$DOM->documentElement->ownerDocument, "saveHTML"], iterator_to_array($DOM->documentElement->childNodes)));
}
return $content;
}
add_filter("ccjm_final_output", "ccjm_branding_filter");
示例內容(“CC Johnson & Malhotra, PC”和“CCJM”的所有實體都匹配,但只有第一個可以成功修改):
C.C. Johnson & Malhotra, P.C. (CCJM) was an integral member of a large Design Team for a 16.5-mile-long Public-Private Partnership (P3) Purple Line Project. The east-west light rail system extends from New Carrollton in PG County, MD to Bethesda in MO County, MD with 21 stations and one short tunnel. CCJM was Engineer of Record (EOR) for the design of eight (8) Bridges and design reviews for 35 transit/highway bridges and over 100 retaining walls of different lengths/types adjacent to bridges and in areas of cut/fill. CCJM designed utility structures for 42,000 LF of relocated water mains and 19,000 LF of relocated sewer mains meeting Washington Suburban Sanitary Commission (WSSC), Md Dept of Transportation (MDOT) MTA, and Local Standards.
編輯 1:做一些測驗,當我輸出時$node->textContent,我看到它在第一個回圈之后發生了變化......所以我認為發生的事情是在我做之后$node->splitText($offset),它實際上正在更新整個節點,所以后續的偏移量不起作用。
uj5u.com熱心網友回復:
首先,我認為foreach ($matches as $group)這里不正確 - 如果您檢查 $matches 包含的內容,則兩次相同的匹配項,但您可能不想將它們包裝到跨度中兩次。因此,應該洗掉 foreach 回圈,而應該$matches[0]只進行以下回圈。
其次,我認為您的偏移問題可以簡單地解決,如果您只是“向后騎馬”-不要從頭到尾替換找到的匹配項,而是以相反的順序替換。然后您將永遠只操縱當前位置“后面”的結構,因此無論那里發生什么變化,都不會影響之前匹配的位置。
/**
* Wrap each match in appropriate span
*/
//foreach ($matches as $group) {
$group = array_reverse($matches[0]);
foreach ($group as $key => $match) {
/**
* Determine the offset and the length of the match
*/
$offset = $match[1];
$length = strlen($match[0]);
/**
* Isolate the match and what comes after it
*/
$word = $node->splitText($offset);
$after = $word->splitText($length);
/**
* Create the wrapping span
*/
$span = $DOM->createElement("span");
$span->setAttribute("class", "__brand");
/**
* Replace the word with the span, and then re-insert the word within it
*/
$word->parentNode->replaceChild($span, $word);
$span->appendChild($word);
//break; // it always errors after the first loop
}
//}
結果我得到你的樣本輸入資料如下(這里是現場示例,https://3v4l.org/kbSQ8)
<p><span class="__brand">C.C. Johnson & Malhotra, P.C.</span> (<span
class="__brand">CCJM</span>) was an integral member of a large Design Team
for a 16.5-mile-long Public-Private Partnership (P3) Purple Line Project.
The east-west light rail system extends from New Carrollton in PG County,
MD to Bethesda in MO County, MD with 21 stations and one short tunnel.
<span class="__brand">CCJM</span> was Engineer of Record (EOR) for the
design of eight (8) Bridges and design reviews for 35 transit/highway
bridges and over 100 retaining walls of different lengths/types adjacent to
bridges and in areas of cut/fill. <span class="__brand">CCJM</span>
designed utility structures for 42,000 LF of relocated water mains and
19,000 LF of relocated sewer mains meeting Washington Suburban Sanitary
Commission (WSSC), Md Dept of Transportation (MDOT) MTA, and Local
Standards.</p>
轉載請註明出處,本文鏈接:https://www.uj5u.com/net/341567.html
標籤:php 正则表达式 WordPress的 路径 dom文档
