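I am trying to parse a file in which each line consists of attributes separated by ;. Each attribute is defined as key value or key=value, where key and value can be enclosed in double quotes " so that they may contain special characters such as whitespace, the equals sign = or the semicolon ;.
To do so, I first split the line with boost::algorithm::make_split_iterator and then, to allow for the double quotes, I use boost::tokenizer.
I need to parse every key and value as a boost::iterator_range<const char*>. I tried the code below, but I cannot build it. It may be that the tokenizer definition is correct and the error comes from the iterator_range. I can provide more information if necessary.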
#include <boost/algorithm/string.hpp>
#include <boost/range/iterator_range.hpp>
#include <boost/tokenizer.hpp>
#include <iostream>
#include <string>
boost::iterator_range<const char*> line;
const auto topDelim = boost::token_finder(
[](const char c) { return (c == ';'); },
boost::token_compress_on);
for (auto attrIt = make_split_iterator(line, topDelim); !attrIt.eof() && !attrIt->empty(); ++attrIt) {
std::string escape("\\");
std::string delim(" =");
std::string quote("\"");
boost::escaped_list_separator<char> els(escape, delim, quote);
boost::tokenizer<
boost::escaped_list_separator<char>,
boost::iterator_range<const char*>::iterator, // how to define iterator for iterator_range?
boost::iterator_range<const char*>
> tok(*attrIt, els);
for (auto t : tok) {
std::cout << t << std::endl;
}
}
Build error:
/third_party/boost/boost-1_58_0/include/boost/token_functions.hpp: In instantiation of 'bool boost::escaped_list_separator<Char, Traits>::operator()(InputIterator&, InputIterator, Token&) [with InputIterator = const char*; Token = boost::iterator_range<const char*>; Char = char; Traits = std::char_traits<char>]':
/third_party/boost/boost-1_58_0/include/boost/token_iterator.hpp:70:36: required from 'void boost::token_iterator<TokenizerFunc, Iterator, Type>::initialize() [with TokenizerFunc = boost::escaped_list_separator<char>; Iterator = const char*; Type = boost::iterator_range<const char*>]'
/third_party/boost/boost-1_58_0/include/boost/token_iterator.hpp:77:63: required from 'boost::token_iterator<TokenizerFunc, Iterator, Type>::token_iterator(TokenizerFunc, Iterator, Iterator) [with TokenizerFunc = boost::escaped_list_separator<char>; Iterator = const char*; Type = boost::iterator_range<const char*>]'
/third_party/boost/boost-1_58_0/include/boost/tokenizer.hpp:86:33: required from 'boost::tokenizer<TokenizerFunc, Iterator, Type>::iter boost::tokenizer<TokenizerFunc, Iterator, Type>::begin() const [with TokenizerFunc = boost::escaped_list_separator<char>; Iterator = const char*; Type = boost::iterator_range<const char*>; boost::tokenizer<TokenizerFunc, Iterator, Type>::iter = boost::token_iterator<boost::escaped_list_separator<char>, const char*, boost::iterator_range<const char*> >]'
test.cpp:21:23: required from here
/third_party/boost/boost-1_58_0/include/boost/token_functions.hpp:188:19: error: no match for 'operator+=' (operand types are 'boost::iterator_range<const char*>' and 'const char')
188 | else tok+=*next;
| ~~~^~~~~~~
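Answer:
As I said, you want to parse, not split. Specifically, if you were to split the input into iterator ranges, you would have to repeat the work of parsing e.g. the quoted constructs to get the intended (unquoted) value.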
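To make that concrete, here is a minimal sketch, using only the standard library, of the extra pass a raw range would still need before its value can be used. The helper name unquote is hypothetical and not part of Boost, and it assumes the escaping rule described later in this answer (a backslash makes the next character literal).

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not a Boost API): turns the raw text of a field into
// the value it is meant to represent. Unquoted fields are returned unchanged.
std::string unquote(const std::string& raw) {
    // Anything that is not wrapped in double quotes is taken literally.
    if (raw.size() < 2 || raw.front() != '"' || raw.back() != '"')
        return raw;

    std::string value;
    // Walk the characters between the outer quotes.
    for (std::size_t i = 1; i + 1 < raw.size(); ++i) {
        // Drop an escaping backslash and keep the character that follows it.
        if (raw[i] == '\\' && i + 2 < raw.size())
            ++i;
        value += raw[i];
    }
    return value;
}
```

Every consumer of a raw range would have to call something like this, which duplicates logic the parser has already executed once.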
I would go by your specification, using Boost Spirit:
using Attribute = std::pair<std::string /*key*/, //
std::string /*value*/>;
using Line = std::vector<Attribute>;
using File = std::vector<Line>;
Grammar
Now, using X3, we can write expressions to define the grammar:
auto file = x3::skip(x3::blank)[ line % x3::eol ];
Within a file, blank space (std::isblank) is generally skipped.
The contents consist of one or more lines separated by newlines.
auto line = attribute % ';';
A line consists of one or more attributes separated by ';'.
auto attribute = field >> -x3::lit('=') >> field;
auto field = quoted | unquoted;
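An attribute is two fields, optionally separated by =. Note that each field is either a quoted or an unquoted value.
Now things get a little trickier: when defining the field rules, we want them to be "lexemes", i.e. no whitespace is skipped inside them.
auto unquoted = x3::lexeme[+(x3::graph - ';' - '=')];
Note how graph already excludes whitespace (see std::isgraph). In addition, we prohibit a naked ';' or '=' so that we don't run into the next attribute/field.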
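As a rough illustration of which characters an unquoted field may contain, the same test can be written with the standard library alone. The function name is_unquoted_char is made up for this sketch; it is not part of Spirit.

```cpp
#include <cctype>

// Hypothetical predicate mirroring the character set of the unquoted rule:
// graphic characters (printable, no whitespace) minus the two delimiters.
bool is_unquoted_char(char c) {
    // Cast first: passing a negative char to <cctype> functions is undefined.
    const unsigned char uc = static_cast<unsigned char>(c);
    return std::isgraph(uc) != 0 && c != ';' && c != '=';
}
```

Note that quote characters pass this test, so whether a field counts as quoted is decided by the order of the alternatives in field, not by this character set.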
For fields that may contain whitespace and/or those special characters, we define the quoted lexeme:
auto quoted = x3::lexeme['"' >> *quoted_char >> '"'];
So, that's just "" with any number of quoted characters in between, where
auto quoted_char = '\\' >> x3::char_ | ~x3::char_('"');
a quoted character can be anything escaped with \, OR any character other than the closing quote.
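For readers who want to see what these rules accept without reading Spirit syntax, the following is a hand-written sketch of the same logic for a single line, using only the standard library. It is not the Spirit code from this answer: LineParser and parse_line are hypothetical names, and the requirement that the whole line be consumed is an assumption made here.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

using Attribute = std::pair<std::string, std::string>;
using Line      = std::vector<Attribute>;

// Hypothetical hand-written counterpart of the rules above (not Spirit code).
class LineParser {
  public:
    explicit LineParser(const std::string& text) : s_(text) {}

    // line = attribute % ';' and, as an assumption, nothing may be left over.
    bool parse(Line& out) {
        do {
            Attribute attr;
            if (!field(attr.first)) return false;
            skip_blanks();
            if (at('=')) ++pos_; // the '=' between the two fields is optional
            if (!field(attr.second)) return false;
            out.push_back(attr);
            skip_blanks();
        } while (accept(';'));
        return pos_ == s_.size();
    }

  private:
    bool at(char c) const { return pos_ < s_.size() && s_[pos_] == c; }
    bool accept(char c) { return at(c) ? (++pos_, true) : false; }
    void skip_blanks() { while (at(' ') || at('\t')) ++pos_; }

    // field = quoted | unquoted; blanks are skipped before, never inside.
    bool field(std::string& out) {
        skip_blanks();
        const std::size_t start = pos_;
        if (quoted(out)) return true;
        pos_ = start; // backtrack and try the second alternative
        out.clear();
        return unquoted(out);
    }

    // quoted = '"' >> *quoted_char >> '"'
    bool quoted(std::string& out) {
        if (!accept('"')) return false;
        while (pos_ < s_.size() && s_[pos_] != '"') {
            // A backslash makes the following character literal.
            if (s_[pos_] == '\\' && pos_ + 1 < s_.size()) ++pos_;
            out += s_[pos_++];
        }
        return accept('"');
    }

    // unquoted = one or more graphic characters other than ';' and '='
    bool unquoted(std::string& out) {
        const std::size_t start = pos_;
        while (pos_ < s_.size() && s_[pos_] != ';' && s_[pos_] != '=' &&
               std::isgraph(static_cast<unsigned char>(s_[pos_])))
            out += s_[pos_++];
        return pos_ != start;
    }

    const std::string& s_;
    std::size_t pos_ = 0;
};

// Parses one line; `out` may hold partial results when false is returned.
bool parse_line(const std::string& text, Line& out) {
    out.clear();
    return LineParser(text).parse(out);
}
```

Under these assumptions, parse_line applied to a=1;two 222;three "3 3 3" fills out with three pairs, while h i j is rejected because a third field is left over. The amount of code needed for even this small grammar is a large part of why a parser library is attractive here.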
Test time
Let's exercise it: Live On Compiler Explorer
for (std::string const& str :
{
R"(a 1)",
R"(b = 2 )",
R"("c"="3")",
R"(a=1;two 222;three "3 3 3")",
R"(b=2;three 333;four "4 4 4"
c=3;four 444;five "5 5 5")",
// special cases
R"("e=" "5")",
R"("f=""7")",
R"("g="="8")",
R"("\"Hello\\ World\\!\"" '8')",
R"("h=10;i=11;" bogus;yup "nope")",
// not ok?
R"(h i j)",
// allowing empty lines/attributes?
"",
"a 1;",
";",
";;",
R"(a=1;two 222;three "3 3 3"
n=1;gjb 222;guerr "3 3 3"
)",
}) //
{
File contents;
if (parse(begin(str), end(str), parser::file, contents))
fmt::print("Parsed:\n\t- {}\n", fmt::join(contents, "\n\t- "));
else
fmt::print("Not Parsed\n");
}
Prints
Parsed:
- {("a", "1")}
Parsed:
- {("b", "2")}
Parsed:
- {("c", "3")}
Parsed:
- {("a", "1"), ("two", "222"), ("three", "3 3 3")}
Parsed:
- {("b", "2"), ("three", "333"), ("four", "4 4 4")}
- {("c", "3"), ("four", "444"), ("five", "5 5 5")}
Parsed:
- {("e=", "5")}
Parsed:
- {("f=", "7")}
Parsed:
- {("g=", "8")}
Parsed:
- {(""Hello\ World\!"", "'8'")}
Parsed:
- {("h=10;i=11;", "bogus"), ("yup", "nope")}
Not Parsed
Not Parsed
Not Parsed
Not Parsed
Not Parsed
Not Parsed
Allowing empty elements
It is as simple as replacing line:
auto line = -(attribute % ';');
To also allow redundant separators:
auto line = -(attribute % x3::lit(';')) >> *x3::lit(';');
See it Live On Compiler Explorer
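Insisting on iterator ranges
I explained above why I think this is a bad idea. Consider how you would correctly interpret the key/value from this line: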
"\"Hello\\ World\\!\"" '8'
You simply don't want to deal with the grammar outside the parser. However, maybe your data is a 10 gigabyte memory mapped file:
using Field = boost::iterator_range<std::string::const_iterator>;
using Attribute = std::pair<Field /*key*/, //
Field /*value*/>;
And then add x3::raw[] to the lexemes:
auto quoted = x3::lexeme[x3::raw['"' >> *quoted_char >> '"']];
auto unquoted = x3::lexeme[x3::raw[+(x3::graph - ';' - '=')]];
See it Live On Compiler Explorer
