Using Boost Spirit to parse fixed-width numbers

I am using Spirit to parse a Fortran-like text file filled with fixed-width numbers:

1234 0.000000000000D+001234
1234 7.654321000000D+001234
1234                   1234
1234-7.654321000000D+001234

There are parsers for signed and unsigned integers, but I cannot find a parser for fixed-width real numbers. Can anyone help?

Here is my attempt, Live On Coliru:

#include <boost/spirit/include/qi.hpp>
#include <boost/fusion/adapted.hpp>
#include <cstdint>
#include <iomanip>
#include <iostream>
#include <string>
namespace qi = boost::spirit::qi;

struct RECORD {
    uint16_t a{};
    double   b{};
    uint16_t c{};
};

BOOST_FUSION_ADAPT_STRUCT(RECORD, a,b,c)

int main() {
    using It = std::string::const_iterator;
    using namespace qi::labels;

    qi::uint_parser<uint16_t, 10, 4, 4> i4;

    qi::rule<It, double()> X19 = qi::double_ //
        | qi::repeat(19)[' '] >> qi::attr(0.0);

    for (std::string const str : {
             "1234 0.000000000000D+001234",
             "1234 7.654321000000D+001234",
             "1234                   1234",
             "1234-7.654321000000D+001234",
         }) {

        It f = str.cbegin(), l = str.cend();

        RECORD rec;
        if (qi::parse(f, l, (i4 >> X19 >> i4), rec)) {
            std::cout << "{a:" << rec.a << ", b:" << rec.b << ", c:" << rec.c
                      << "}\n";
        } else {
            std::cout << "Parse fail (" << std::quoted(str) << ")\n";
        }
    }
}

This obviously fails to parse most records:

Parse fail ("1234 0.000000000000D+001234")
Parse fail ("1234 7.654321000000D+001234")
{a:1234, b:0, c:1234}
Parse fail ("1234-7.654321000000D+001234")

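A uj5u.com user replied:

The mechanism exists, but it is hidden a little deeper, because parsing floating-point numbers involves many more details than parsing integers.

qi::double_ is actually an instance of qi::real_parser<double, qi::real_policies<double> > (and float_ is the same with float).

The policies are the key. They control all the details of which formats are accepted.

Here are the RealPolicies expression requirements: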

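Expression	Semantics
`RP::allow_leading_dot`	Allow a leading dot.
`RP::allow_trailing_dot`	Allow a trailing dot.
`RP::expect_dot`	Require a dot.
`RP::parse_sign(f, l)`	Parse the prefix sign (e.g. '-'). Returns `true` on success, otherwise `false`.
`RP::parse_n(f, l, n)`	Parse the integer to the left of the decimal point. Returns `true` on success, otherwise `false`. On success, places the result in n.
`RP::parse_dot(f, l)`	Parse the decimal point. Returns `true` on success, otherwise `false`.
`RP::parse_frac_n(f, l, n, d)`	Parse the fraction after the decimal point. Returns `true` on success, otherwise `false`. On success, places the result in n and the number of digits in d.
`RP::parse_exp(f, l)`	Parse the exponent prefix (e.g. 'e'). Returns `true` on success, otherwise `false`.
`RP::parse_exp_n(f, l, n)`	Parse the actual exponent. Returns `true` on success, otherwise `false`. On success, places the result in n.
`RP::parse_nan(f, l, n)`	Parse a NaN. Returns `true` on success, otherwise `false`. On success, places the result in n.
`RP::parse_inf(f, l, n)`	Parse an Inf. Returns `true` on success, otherwise `false`. On success, places the result in n.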

Let's implement your policy:

namespace policies {
    /* mandatory sign (or space) fixed widths, 'D+' or 'D-' exponent leader */
    template <typename T, int IDigits, int FDigits, int EDigits = 2>
    struct fixed_widths_D : qi::strict_ureal_policies<T> {
        template <typename It> static bool parse_sign(It& f, It const& l);

        template <typename It, typename Attr>
        static bool parse_n(It& f, It const& l, Attr& a);

        template <typename It> static bool parse_exp(It& f, It const& l);

        template <typename It>
        static bool parse_exp_n(It& f, It const& l, int& a);

        template <typename It, typename Attr>
        static bool parse_frac_n(It& f, It const& l, Attr& a, int& n);
    };
} // namespace policies

Notes:

I keep the attribute type generic.
I also base the implementation on the strict policies, strict_ureal_policies, to reduce the effort. The base class doesn't support signs and requires a decimal separator ('.'), which is what makes it "strict": it rejects purely integral numbers.
The format in your question expects 1 digit for the integral part, 12 digits for the fraction and 2 for the exponent, but I don't hardcode these so that we can reuse the policies for other fixed-width formats (IDigits, FDigits, EDigits).

Let's go through our overrides one-by-one:

`bool parse_sign(f, l)`

The format is fixed-width, so we want to accept

a leading space or '+' for positive
a leading '-' for negative

That way the sign always takes one input character:

template <typename It> static bool parse_sign(It& f, It const&l)
{
    if (f != l) {
        switch (*f) {
        case '+':
        case ' ': ++f; break;
        case '-': ++f; return true;
        }
    }
    return false;
}

`bool parse_n(f, l, Attr& a)`

The simplest part: we allow only a single-digit (IDigits) unsigned integer part before the separator. Luckily, integer parsing is relatively common and trivial:

template <typename It, typename Attr>
static bool parse_n(It& f, It const& l, Attr& a)
{
    return qi::extract_uint<Attr, 10, IDigits, IDigits, false, true>::call(f, l, a);
}

`bool parse_exp(f, l)`

Also trivial: we require a 'D' always:

template <typename It> static bool parse_exp(It& f, It const& l)
{
    if (f == l || *f != 'D')
        return false;
    ++f;
    return true;
}

`bool parse_exp_n(f, l, int& a)`

As for the exponent, we want it to be fixed-width meaning that the sign is mandatory. So, before extracting the signed integer of width 2 (EDigits), we make sure a sign is present:

template <typename It>
static bool parse_exp_n(It& f, It const& l, int& a)
{
    if (f == l || !(*f == '+' || *f == '-'))
        return false;
    return qi::extract_int<int, 10, EDigits, EDigits>::call(f, l, a);
}

`bool parse_frac_n(f, l, Attr& a, int& n)`

The meat of the problem, and also the reason to build on the existing parsers. The fractional digits could be treated as an integer, but there are two issues: leading zeroes are significant, and the total number of digits might exceed the capacity of any integral type we choose.

So we do a "trick" - we parse an unsigned integer, but ignoring any excess precision that doesn't fit: in fact we only care about the number of digits. We then check that this number is as expected: FDigits.

Then the accumulated value and the digit count n are handed back to the existing real_parser machinery, which computes the resulting value correctly for any generic number type T (that satisfies the minimum requirements).

template <typename It, typename Attr>
static bool parse_frac_n(It& f, It const& l, Attr& a, int& n)
{
    It savef = f;

    if (qi::extract_uint<Attr, 10, FDigits, FDigits, true, true>::call(f, l, a)) {
        n = static_cast<int>(std::distance(savef, f));
        return n == FDigits;
    }
    return false;
}

Summary

As you can see, by standing on the shoulders of existing, tested code, we are already done and ready to parse our numbers:

template <typename T>
using X19_type = qi::real_parser<T, policies::fixed_widths_D<T, 1, 12, 2>>;

Now your code runs as expected: Live On Coliru

template <typename T>
using X19_type = qi::real_parser<T, policies::fixed_widths_D<T, 1, 12, 2>>;

int main() {
    using It = std::string::const_iterator;
    using namespace qi::labels;

    qi::uint_parser<uint16_t, 10, 4, 4> i4;
    X19_type<double>                    x19;

    qi::rule<It, double()> X19 = x19 //
        | qi::repeat(19)[' '] >> qi::attr(0.0);

    for (std::string const str : {
             "1234                   1234",
             "1234 0.000000000000D+001234",
             "1234 7.065432100000D+001234",
             "1234-7.006543210000D+001234",
             "1234 0.065432100000D+031234",
             "1234 0.065432100000D-301234",
         }) {

        It f = str.cbegin(), l = str.cend();

        RECORD rec;
        if (qi::parse(f, l, (i4 >> X19 >> i4), rec)) {
            std::cout << "{a:" << rec.a << ", b:" << std::setprecision(12)
                      << rec.b << ", c:" << rec.c << "}\n";
        } else {
            std::cout << "Parse fail (" << std::quoted(str) << ")\n";
        }
    }
}

Prints

{a:1234, b:0, c:1234}
{a:1234, b:0, c:1234}
{a:1234, b:7.0654321, c:1234}
{a:1234, b:-7.00654321, c:1234}
{a:1234, b:65.4321, c:1234}
{a:1234, b:6.54321e-32, c:1234}

Decimals

Now, it's possible to instantiate this parser with precisions that exceed the precision of double. And there are always issues with the conversion from decimal numbers to inexact binary representation. To showcase how the choice for generic T already caters for this, let's instantiate with a decimal type that allows 64 significant decimal fractional digits:

Live On Coliru

#include <boost/multiprecision/cpp_dec_float.hpp>

using Decimal = boost::multiprecision::cpp_dec_float_100;

struct RECORD {
    uint16_t a{};
    Decimal  b{};
    uint16_t c{};
};

template <typename T>
using X71_type = qi::real_parser<T, policies::fixed_widths_D<T, 1, 64, 2>>;

int main() {
    using It = std::string::const_iterator;
    using namespace qi::labels;

    qi::uint_parser<uint16_t, 10, 4, 4> i4;
    X71_type<Decimal>                   x71;

    qi::rule<It, Decimal()> X71 = x71 //
        | qi::repeat(71)[' '] >> qi::attr(0.0);

    for (std::string const str : {
             "1234                                                                       6789",
             "2345 0.0000000000000000000000000000000000000000000000000000000000000000D+006789",
             "3456 7.0000000000000000000000000000000000000000000000000000000000654321D+006789",
             "4567-7.0000000000000000000000000000000000000000000000000000000000654321D+006789",
             "5678 0.0000000000000000000000000000000000000000000000000000000000654321D+036789",
             "6789 0.0000000000000000000000000000000000000000000000000000000000654321D-306789",
         }) {

        It f = str.cbegin(), l = str.cend();

        RECORD rec;
        if (qi::parse(f, l, (i4 >> X71 >> i4), rec)) {
            std::cout << "{a:" << rec.a << ", b:" << std::setprecision(65)
                      << rec.b << ", c:" << rec.c << "}\n";
        } else {
            std::cout << "Parse fail (" << std::quoted(str) << ")\n";
        }
    }
}

Prints

{a:2345, b:0, c:6789}
{a:3456, b:7.0000000000000000000000000000000000000000000000000000000000654321, c:6789}
{a:4567, b:-7.0000000000000000000000000000000000000000000000000000000000654321, c:6789}
{a:5678, b:6.54321e-56, c:6789}
{a:6789, b:6.54321e-89, c:6789}

Compare how using a binary long double representation would have lost accuracy here:

{a:2345, b:0, c:6789}
{a:3456, b:7, c:6789}
{a:4567, b:-7, c:6789}
{a:5678, b:6.5432100000000000002913506043764438647482181234694313277925965188e-56, c:6789}
{a:6789, b:6.5432100000000000000601529073044049029207066886931600941449474131e-89, c:6789}

Bonus Take: Optionals

In the current RECORD, missing doubles are silently taken to be 0.0. That's maybe not the best:

struct RECORD {
    uint16_t          a{};
    optional<Decimal> b{};
    uint16_t          c{};
};

// ...

qi::rule<It, optional<Decimal>()> X71 = x71 //
    | qi::repeat(71)[' '];

Now the output is Live On Coliru:

{a:1234, b:--, c:6789}
{a:2345, b: 0, c:6789}
{a:3456, b: 7.0000000000000000000000000000000000000000000000000000000000654321, c:6789}
{a:4567, b: -7.0000000000000000000000000000000000000000000000000000000000654321, c:6789}
{a:5678, b: 6.54321e-56, c:6789}
{a:6789, b: 6.54321e-89, c:6789}

Summary / Add Unit Tests!

That's a lot, but possibly not all you need.

Keep in mind that you still need proper unit tests for e.g. X19_type. Think of all edge cases you may encounter/want to accept/want to reject:

I have not changed any of the base policies dealing with Inf or NaN so you might want to close those gaps
You might actually have wanted to accept " 3.141 ", " .999999999999D 0 " etc.?

All these are pretty simple changes to the policies, but, as you know, code without tests is broken.

Please credit the source when reposting. Link to this article: https://www.uj5u.com/qiye/314419.html
