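As a follow-up to my previous question (Perl XML::LibXML: getting information from a specific node):
Given the following XML data, I can't figure out how to get the data that appears after the <tab/> tag (which has no closing tag) without also getting all the data from the child nodes in that section. See below for more details.
XML example: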
<title number="3">
<catchline>Uniform Agricultural Cooperative Association Act</catchline>
<chapter number="3-1">
<catchline>
General Provisions Relating to Agricultural Cooperative Associations
</catchline>
<section number="3-1-1">
<histories>
<history>
Amended by Chapter
<modchap sess="2010GS">378</modchap>
, 2010 General Session
</history>
<modyear>2010</modyear>
</histories>
<catchline>Declaration of policy.</catchline>
<tab/>
It is the declared policy of this state, as one means of improving the economic position of agriculture, to encourage the organization of producers of agricultural products into effective associations under the control of such producers, and to that end this act shall be liberally construed. THIS IS THE DATA THAT I WANT TO GET
</section>
<section number="3-1-1.1">
<histories>
<history>
Amended by Chapter
<modchap sess="1996GS">79</modchap>
, 1996 General Session
</history>
<modyear>1996</modyear>
</histories>
<catchline>General corporation laws do not apply.</catchline>
<tab/>
<xref depth="1" refnumber="16-10a" start="0">
Title 16, Chapter 10a, Utah Revised Business Corporation Act
</xref>
, does not apply to domestic or foreign corporations governed by this chapter, except as specifically provided in Sections
<xref depth="3" refnumber="3-1-13.4" start="0">3-1-13.4</xref>
,
<xref depth="3" refnumber="3-1-13.7" start="0">3-1-13.7</xref>
, and
<xref depth="3" refnumber="3-1-16.1" start="0">3-1-16.1</xref>
.
</section>
</chapter>
</title>
This is my current Perl script:
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;
my $dom = XML::LibXML->load_xml(location => "file.xml");
my $titleName = $dom->findvalue('/title/catchline');
print "Title $titleName\n";
my @chapters = $dom->findnodes('/title/chapter');
for my $chapter (@chapters) {
    my $chapterNo   = $chapter->getAttribute('number');
    my $chapterName = $chapter->findvalue('catchline');
    print " Chapter #$chapterNo - $chapterName\n";
    my @sections = $chapter->findnodes('section');
    for my $section (@sections) {
        my $sectionNo   = $section->getAttribute('number');
        my $sectionName = $section->findvalue('catchline');
        # textContent concatenates ALL descendant text: <histories>, <catchline>, <xref>, ...
        my $sectionData = $section->textContent;
        print " Section #$sectionNo - $sectionName\nSECDATA: $sectionData\n\n";
    }
}
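This works, but it does probably exactly what I asked it to do: $sectionData holds, and the script prints, everything inside the <section>.
What I want is only the data after the <tab/> tag, without anything that is inside other tags, such as the child tags <histories>, <history>, <xref>, etc.
For example, the string:
, does not apply to domestic or foreign corporations governed by this chapter, except as specifically provided in Sections
is not contained in any particular tag, so how do I get that data?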
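That string is a text node: it has no tag of its own, but it is still a direct child of <section>, sitting between element siblings. The following is a minimal illustration, assuming XML::LibXML is installed; the abridged section string and the describe_children helper are hypothetical, written only for this example. It walks a section's direct children and labels each one as an element or a text node.

```perl
use strict;
use warnings;
use XML::LibXML qw( XML_ELEMENT_NODE XML_TEXT_NODE );

# Hypothetical, abridged copy of section 3-1-1.1 (no indentation, so no
# whitespace-only text nodes appear between the tags).
my $xml = '<section number="3-1-1.1">'
        . '<catchline>General corporation laws do not apply.</catchline>'
        . '<tab/>'
        . '<xref refnumber="16-10a">Title 16, Chapter 10a, Utah Revised Business Corporation Act</xref>'
        . ', does not apply to domestic or foreign corporations governed by this chapter,'
        . ' except as specifically provided in Sections '
        . '<xref refnumber="3-1-13.4">3-1-13.4</xref>'
        . '.</section>';
my $section = XML::LibXML->load_xml( string => $xml )->documentElement();

# Hypothetical helper: describe each DIRECT child (not descendants).
sub describe_children {
    my ($elem) = @_;
    my @out;
    for my $child ( $elem->childNodes() ) {
        if ( $child->nodeType() == XML_ELEMENT_NODE ) {
            push @out, 'element <' . $child->nodeName() . '>';
        }
        elsif ( $child->nodeType() == XML_TEXT_NODE ) {
            push @out, 'text "' . $child->data() . '"';
        }
    }
    return @out;
}

print "$_\n" for describe_children($section);
```

In this abridged section, the loose ", does not apply ..." string appears as its own text child right after the first <xref>. In the pretty-printed file, the newlines between tags also become (whitespace-only) text nodes.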
The current output is:
Title Uniform Agricultural Cooperative Association Act
Chapter #3-1 -
General Provisions Relating to Agricultural Cooperative Associations
Section #3-1-1 - Declaration of policy.
SECDATA:
Amended by Chapter
378
, 2010 General Session
2010
Declaration of policy.
It is the declared policy of this state, as one means of improving the economic position of agriculture, to encourage the organization of producers of agricultural products into effective associations under the control of such producers, and to that end this act shall be liberally construed.
Section #3-1-1.1 - General corporation laws do not apply.
SECDATA:
Amended by Chapter
79
, 1996 General Session
1996
General corporation laws do not apply.
Title 16, Chapter 10a, Utah Revised Business Corporation Act
, does not apply to domestic or foreign corporations governed by this chapter, except as specifically provided in Sections
3-1-13.4
,
3-1-13.7
, and
3-1-16.1
.
But what I'm looking for is something more like:
Title Uniform Agricultural Cooperative Association Act
Chapter #3-1 -
General Provisions Relating to Agricultural Cooperative Associations
Section #3-1-1 - Declaration of policy.
SECDATA:
It is the declared policy of this state, as one means of improving the economic position of agriculture, to encourage the organization of producers of agricultural products into effective associations under the control of such producers, and to that end this act shall be liberally construed.
Section #3-1-1.1 - General corporation laws do not apply.
SECDATA:
, does not apply to domestic or foreign corporations governed by this chapter, except as specifically provided in Sections
Answer:
If you want the text nodes that follow the tab element, you can use
my @post_tab_text_nodes = $section_node->findnodes('tab/following-sibling::text()');
But what you want is far more complicated than that.
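Here is a runnable sketch of that XPath call. It assumes XML::LibXML is installed and uses a hypothetical, abridged copy of section 3-1-1.1 trimmed from the question's XML. Text inside <xref> is not returned, because those text nodes are children of <xref>, not siblings of <tab/>.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical, abridged copy of section 3-1-1.1 from the question.
my $xml = '<section number="3-1-1.1">'
        . '<catchline>General corporation laws do not apply.</catchline>'
        . '<tab/>'
        . '<xref refnumber="16-10a">Title 16, Chapter 10a, Utah Revised Business Corporation Act</xref>'
        . ', does not apply to domestic or foreign corporations governed by this chapter,'
        . ' except as specifically provided in Sections '
        . '<xref refnumber="3-1-13.4">3-1-13.4</xref>'
        . '.</section>';
my $section_node = XML::LibXML->load_xml( string => $xml )->documentElement();

# Text nodes that are siblings of <tab/> and come after it. Whitespace-only
# nodes (produced by pretty-printing in the real file) are dropped.
my @post_tab_text_nodes =
    grep { $_->data() =~ /\S/ }
    $section_node->findnodes('tab/following-sibling::text()');

my @pieces = map { $_->data() } @post_tab_text_nodes;
print "[$_]\n" for @pieces;
```

Note that this returns every text node after <tab/>, including trailing punctuation such as the final ".", not only the first one.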
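use List::Util qw( first );
use XML::LibXML qw( XML_ELEMENT_NODE );
# All direct children of the section: element nodes and text nodes alike.
my @child_nodes = $section_node->childNodes();
# Index of the first <tab/> element (without a namespace), or undef if there is none.
my $tab_node_idx =
   first {
      my $node = $child_nodes[$_];
      (  $node->nodeType() == XML_ELEMENT_NODE
      && !defined( $node->namespaceURI() )
      && $node->nodeName() eq 'tab'
      )
   }
   0..$#child_nodes;
# Every child node that comes after the <tab/>.
my @post_tab_children =
   defined($tab_node_idx)
      ? @child_nodes[ $tab_node_idx + 1 .. $#child_nodes ]
      : ();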
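The index-based search can be cross-checked against a single XPath step: tab/following-sibling::node() selects the same children, elements and text alike, in document order. The following sketch wraps both in helper subs (post_tab_children_by_index and post_tab_children_by_xpath are hypothetical names invented here) and compares the results on a small made-up section. It assumes List::Util and XML::LibXML are available.

```perl
use strict;
use warnings;
use List::Util qw( first );
use XML::LibXML qw( XML_ELEMENT_NODE );

# Hypothetical helper: the answer's index-based search wrapped in a sub.
# Returns every child node after the first un-namespaced <tab/>, or ().
sub post_tab_children_by_index {
    my ($section_node) = @_;
    my @child_nodes = $section_node->childNodes();
    my $tab_node_idx = first {
        my $node = $child_nodes[$_];
        $node->nodeType() == XML_ELEMENT_NODE
            && !defined( $node->namespaceURI() )
            && $node->nodeName() eq 'tab';
    } 0 .. $#child_nodes;
    return defined($tab_node_idx)
        ? @child_nodes[ $tab_node_idx + 1 .. $#child_nodes ]
        : ();
}

# Hypothetical helper: the same selection via the following-sibling axis.
sub post_tab_children_by_xpath {
    my ($section_node) = @_;
    return $section_node->findnodes('tab/following-sibling::node()');
}

# Made-up section: one text node, one <xref>, another text node after <tab/>.
my $xml = '<section><catchline>Declaration of policy.</catchline><tab/>'
        . 'Body text <xref refnumber="3-1-13.4">3-1-13.4</xref> more text.</section>';
my $section = XML::LibXML->load_xml( string => $xml )->documentElement();

my @by_index = post_tab_children_by_index($section);
my @by_xpath = post_tab_children_by_xpath($section);

# Same length, and each pair refers to the very same DOM node.
my $same = @by_index == @by_xpath
    && !grep { !$by_index[$_]->isSameNode( $by_xpath[$_] ) } 0 .. $#by_index;
print $same ? "both approaches agree\n" : "the approaches differ\n";
```

The XPath form is shorter. The explicit loop makes the namespace check visible and is easier to extend if, for example, a different marker element should be matched.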
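Rendering the resulting nodes as text is left as an exercise to the user. You appear to have a mix of element nodes (XML_ELEMENT_NODE) and text nodes (XML_TEXT_NODE), which you can tell apart using $node->nodeType.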
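One possible way to do that rendering, under an assumption the answer does not make: text nodes contribute their data, and element nodes (such as <xref>) contribute their textContent only when a flag asks for it. The render_nodes helper, its flag, and the sample section below are hypothetical. Whitespace is collapsed because the question's file is pretty-printed.

```perl
use strict;
use warnings;
use XML::LibXML qw( XML_ELEMENT_NODE XML_TEXT_NODE );

# Hypothetical helper: render a list of nodes as one string. Text nodes add
# their own data; element nodes add all of their descendant text
# (textContent) only when $include_elements is true.
sub render_nodes {
    my ( $include_elements, @nodes ) = @_;
    my $text = '';
    for my $node (@nodes) {
        my $type = $node->nodeType();
        if ( $type == XML_TEXT_NODE ) {
            $text .= $node->data();
        }
        elsif ( $type == XML_ELEMENT_NODE && $include_elements ) {
            $text .= $node->textContent();
        }
    }
    $text =~ s/\s+/ /g;    # collapse runs of whitespace (pretty-printing newlines)
    $text =~ s/^ //;       # trim a leading space
    $text =~ s/ $//;       # trim a trailing space
    return $text;
}

# Hypothetical, abridged section in the shape of 3-1-1.1.
my $xml = '<section number="3-1-1.1">'
        . '<catchline>General corporation laws do not apply.</catchline><tab/>'
        . '<xref refnumber="16-10a">Title 16, Chapter 10a</xref>'
        . ', does not apply to domestic or foreign corporations'
        . ' <xref refnumber="3-1-13.4">3-1-13.4</xref>.'
        . '</section>';
my $section = XML::LibXML->load_xml( string => $xml )->documentElement();

# Everything after <tab/>, in document order (elements and text).
my @post_tab = $section->findnodes('tab/following-sibling::node()');

print render_nodes( 0, @post_tab ), "\n";    # only text directly inside <section>
print render_nodes( 1, @post_tab ), "\n";    # also the text inside each <xref>
```

With the flag off, only the text that sits directly in <section> survives, which is closest to the SECDATA the question asks for. With the flag on, the <xref> labels are spliced back in.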
