I would like to better understand which aspects of YAML refer to data encoding and which refer to semantics.
A simple example:
test1: dGVzdDE=
test2: !!binary |
  dGVzdDE=
test3:
- 116
- 101
- 115
- 116
- 49
test4: test1
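Which of these values, if any, are equivalent?
I think test1 encodes the text string value dGVzdDE=. test2 and test3 both encode the same byte array, just using different encodings. I'm not sure about test4: it contains the same bytes as test2 and test3, but does that make it an equivalent value, or is a string in YAML different from a byte array?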
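The byte-level part of this reasoning can be checked directly. A minimal sketch using only Python's standard library, assuming the standard base64 alphabet that the !!binary type uses:

```python
import base64

encoded = "dGVzdDE="                  # scalar content of test1 (and of test2, minus its final line break)
sequence = [116, 101, 115, 116, 49]   # the items of test3

decoded = base64.b64decode(encoded)   # raw bytes behind the base64 text
print(list(decoded))                  # the bytes as integers, like test3
print(decoded.decode("utf-8"))        # the same bytes read as UTF-8 text, like test4
print(list(decoded) == sequence)
```

So at the byte level the three encodings carry the same information; whether YAML treats them as equal depends on tags, not on bytes.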
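Different tools seem to give different answers:
- https://onlineyamltools.com/convert-yaml-to-json suggests that test2 and test3 are equivalent, but different from test4
- https://yaml-online-parser.appspot.com/ suggests that test2 and test4 are equivalent, but different from test3
- yq says all entries are different (yq < test.yml):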
{
"test1": "dGVzdDE=",
"test2": "dGVzdDE=\n",
"test3": [
116,
101,
115,
116,
49
],
"test4": "test1"
}
What is the intent of the YAML spec?
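Equality
You ask about equivalence, but that term is not used in the spec, so it cannot be discussed (at least not without a definition). I will instead discuss equality, which the spec defines as follows:
Two scalars are equal only when their tags and canonical forms are equal character-by-character. Equality of collections is defined recursively.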
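To make the quoted rule concrete, here is a minimal Python sketch of it. The Scalar, Sequence and Mapping classes and the nodes_equal function are hypothetical stand-ins for the spec's representation-graph nodes, and each scalar's content is assumed to be already in canonical form (a real implementation would have to canonicalize first).

```python
from dataclasses import dataclass

# Hypothetical node model, loosely following the YAML representation graph.
@dataclass(frozen=True)
class Scalar:
    tag: str
    canonical: str   # assumed to already be the canonical form

@dataclass(frozen=True)
class Sequence:
    tag: str
    items: tuple     # tuple of nodes

@dataclass(frozen=True)
class Mapping:
    tag: str
    pairs: tuple     # tuple of (key_node, value_node); keys are unique

def nodes_equal(a, b):
    """Scalars: equal tags and canonical forms. Collections: recursive."""
    if type(a) is not type(b) or a.tag != b.tag:
        return False
    if isinstance(a, Scalar):
        return a.canonical == b.canonical
    if isinstance(a, Sequence):
        return len(a.items) == len(b.items) and all(
            nodes_equal(x, y) for x, y in zip(a.items, b.items))
    # Mappings are unordered: every pair in a needs an equal pair in b.
    if len(a.pairs) != len(b.pairs):
        return False
    return all(
        any(nodes_equal(ka, kb) and nodes_equal(va, vb) for kb, vb in b.pairs)
        for ka, va in a.pairs)

STR = "tag:yaml.org,2002:str"
BIN = "tag:yaml.org,2002:binary"
SEQ = "tag:yaml.org,2002:seq"

print(nodes_equal(Scalar(STR, "dGVzdDE="), Scalar(STR, "dGVzdDE=")))  # True
print(nodes_equal(Scalar(STR, "dGVzdDE="), Scalar(BIN, "dGVzdDE=")))  # False: tags differ
print(nodes_equal(Sequence(SEQ, (Scalar(STR, "a"),)),
                  Sequence(SEQ, (Scalar(STR, "a"),))))                # True
```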
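One node in your example has the tag !!binary, but the other nodes have no tag. So we have to check what the spec says about the tags of nodes without an explicit tag:
Tags and Schemas
The YAML spec says that every node has a tag. Any node without an explicit tag is assigned a non-specific tag. Nodes are either scalars (created from textual content) or collections (sequences and mappings). Every non-plain scalar node without an explicit tag (i.e. every scalar in quotes or given via | or >) gets the non-specific tag !, and every other node without an explicit tag gets the non-specific tag ?.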
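During loading, the spec defines that non-specific tags are resolved to specific tags by means of a schema. The spec describes some schemas, but does not require implementations to support any particular one.
The failsafe schema, designed as the most basic schema, resolves non-specific tags as follows:
- on scalars to !!str
- on sequences to !!seq
- on mappings to !!map
That's it.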
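The two steps described above (assigning ! or ? to untagged nodes, then resolving them with the failsafe schema) can be sketched as follows. The function names and the string labels for node kinds and scalar styles are hypothetical; only the tag URIs, which the !! shorthand expands to, come from the YAML spec.

```python
# Full tag URIs behind the !! shorthand.
STR = "tag:yaml.org,2002:str"
SEQ = "tag:yaml.org,2002:seq"
MAP = "tag:yaml.org,2002:map"

def non_specific_tag(kind, style=None):
    """Non-specific tag for a node written without an explicit tag.

    kind is "scalar", "sequence" or "mapping"; for scalars, style is one of
    "plain", "single-quoted", "double-quoted", "literal" (|) or "folded" (>).
    """
    if kind == "scalar" and style != "plain":
        return "!"   # quoted scalars and block scalars
    return "?"       # plain scalars and all collections

def failsafe_resolve(kind, tag):
    """Resolve a tag with the failsafe schema: only the node kind matters."""
    if tag not in ("!", "?"):
        return tag   # explicit tags such as !!binary are already specific
    return {"scalar": STR, "sequence": SEQ, "mapping": MAP}[kind]

# The four values from the question:
print(failsafe_resolve("scalar", non_specific_tag("scalar", "plain")))  # test1, test4
print(failsafe_resolve("scalar", "tag:yaml.org,2002:binary"))           # test2
print(failsafe_resolve("sequence", non_specific_tag("sequence")))       # test3
```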
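A schema is allowed to derive specific tags from non-specific ones by considering the kind of non-specific tag, the position of the node in the document, and the content of the node. For example, the JSON schema gives the scalar true the tag !!bool because of its content.
The spec says that the non-specific tag ! should only be resolved to !!str for scalars, !!seq for sequences and !!map for mappings, but does not require this. Most implementations follow it, which means that if you quote your scalar, you get a string. This matters because it lets you quote the scalar "true" to avoid getting a boolean.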
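Content-based resolution can be sketched for scalars using the patterns of the spec's JSON schema (the regular expressions below are reproduced from the YAML 1.2 JSON schema as best understood; collections are omitted, and json_resolve_scalar is a hypothetical name). It shows why quoting "true" changes the result: a quoted scalar carries !, which is never matched against content.

```python
import re

# Scalar patterns of the YAML 1.2 JSON schema, tried in order.
JSON_SCHEMA_RULES = [
    (re.compile(r"null"), "tag:yaml.org,2002:null"),
    (re.compile(r"true|false"), "tag:yaml.org,2002:bool"),
    (re.compile(r"-?(0|[1-9][0-9]*)"), "tag:yaml.org,2002:int"),
    (re.compile(r"-?(0|[1-9][0-9]*)(\.[0-9]*)?([eE][-+]?[0-9]+)?"),
     "tag:yaml.org,2002:float"),
]

def json_resolve_scalar(content, tag):
    """Resolve a scalar's tag the way the JSON schema does (sketch)."""
    if tag == "!":
        return "tag:yaml.org,2002:str"   # quoted and block scalars: always strings
    if tag != "?":
        return tag                        # explicit tags are left alone
    for pattern, resolved in JSON_SCHEMA_RULES:
        if pattern.fullmatch(content):
            return resolved
    # The JSON schema has no fallback for other plain scalars.
    raise ValueError(f"plain scalar {content!r} matches no JSON schema pattern")

print(json_resolve_scalar("true", "?"))  # plain true
print(json_resolve_scalar("true", "!"))  # "true" in quotes
```

Note that a plain dGVzdDE= is rejected by this sketch, since the spec's JSON schema treats unmatched plain scalars as an error; the Core schema falls back to !!str instead.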
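By the way, the spec does not say that every step defined there must be carried out exactly as described; it is more of a logical description. Many implementations do not actually transition from non-specific to specific tags, but directly choose native types for the YAML data they load, based on the schema's rules.
Applying Equality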
Now that we know how tags are assigned to nodes, let's go over your example:
test1: dGVzdDE=
test2: !!binary |
  dGVzdDE=
The two values are immediately not equal because, even without the tag, their content differs: literal block scalars (introduced with |) keep the final line break, so the value of test2 is "dGVzdDE=\n" and therefore not equal to the test1 value. You can introduce the literal scalar with |- instead to chop the final line break, which I suppose is your intent. In that case, the scalar content is identical.
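The effect of the chomping indicator can be sketched in a few lines. Here chomp is a hypothetical helper that takes a block scalar's content with indentation already removed; "" stands for clip (the default for |), "-" for strip (|-) and "+" for keep (|+).

```python
def chomp(body, indicator=""):
    """Apply a block scalar chomping indicator to its raw content (sketch)."""
    stripped = body.rstrip("\n")
    if indicator == "-":        # strip: drop every trailing line break
        return stripped
    if indicator == "+":        # keep: retain all trailing line breaks
        return body
    # clip: keep exactly one final line break, if there is any content
    return stripped + "\n" if stripped and body.endswith("\n") else stripped

raw = "dGVzdDE=\n"              # what follows "test2: !!binary |" once unindented
print(repr(chomp(raw)))         # | keeps the final line break
print(repr(chomp(raw, "-")))    # |- removes it
print(chomp(raw, "-") == "dGVzdDE=")  # now identical to test1's content
```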
Now for the tag: the value of test1 is a plain scalar, so it has the non-specific tag ?. The question now is: will this be resolved to !!binary? There could be a schema that does this, but the spec doesn't define one. And think about it: a schema that assigns the tag !!binary to every scalar that looks like base64-encoded data would be a very specific one.
As for the other values: the test3 value is a sequence, so it is not equal to any of the scalar values. The test4 value has content that differs from that of every other scalar, so it is not equal to any of them either.
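As an illustration of this comparison, the sketch below writes each value after failsafe resolution as a (tag, content) pair, assuming test2 uses |- so that its content matches test1. Python's tuple equality happens to have the same shape as the spec's rule (tag and content must both match, sequences are compared item by item); this encoding is a hypothetical illustration, not a YAML library API.

```python
from itertools import combinations

T = "tag:yaml.org,2002:"  # prefix that the !! shorthand stands for

# Each node as (tag, content); a sequence's content is a tuple of item nodes.
values = {
    "test1": (T + "str", "dGVzdDE="),       # plain scalar: ? resolved to !!str
    "test2": (T + "binary", "dGVzdDE="),    # explicit !!binary, assuming |-
    "test3": (T + "seq", tuple((T + "str", s)   # failsafe: items are strings
                               for s in ["116", "101", "115", "116", "49"])),
    "test4": (T + "str", "test1"),          # plain scalar: ? resolved to !!str
}

for a, b in combinations(values, 2):
    print(a, b, values[a] == values[b])     # no pair is equal
```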
But yaml-online-parser does things!
Yes. The YAML spec explicitly states that the target of loading YAML data is native data types. Tags are thought of as generic hints that can be mapped to native data types by a specific implementation. So an !!str for example would be resolved to the target language's string type.
How this mapping to native types is done is implementation-defined (and must be, since the spec cannot cater to every language out there). yaml-online-parser uses PyYAML: it loads the YAML into Python's native data types and then dumps it again. In this process, the !!binary value is loaded into a Python binary string. During dumping, however, this binary string is interpreted as a UTF-8 string and written as a plain scalar. You can argue this is a bug, but it certainly doesn't violate the spec (as the spec doesn't know what a Python binary string is and therefore does not define how it is to be represented).
In any case, this shows that as soon as you transition to native types and back again, anything goes and nothing is certain, because native types are outside the spec. Different implementations will give you different outputs because they are allowed to. !!binary is not a tag defined in the JSON schema, so even translating your input to JSON is not well-defined.
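Because !!binary has no JSON counterpart, a converter has to pick some convention. The sketch below shows three plausible ones for the test2 value; they are assumptions chosen to mirror the results reported in the question, not a description of how any of those tools is actually implemented.

```python
import base64
import json

raw = "dGVzdDE=\n"   # content of test2 as written with |

# Line breaks inside base64 text are not part of the data; drop whitespace first.
data = base64.b64decode("".join(raw.split()))

conventions = {
    "keep the base64 text": raw,                  # like the yq output
    "list of byte values": list(data),            # matches test3's shape
    "decode bytes as UTF-8 text": data.decode("utf-8"),  # matches test4
}

for name, value in conventions.items():
    print(f"{name}: {json.dumps(value)}")
```

Each choice is defensible, and nothing in either specification picks one, which is why the tools disagree.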
If you want an online tool that shows you canonical YAML representation without loading data into native types and back, you can use the NimYAML testing ground (my work).
Conclusion
The question of whether two YAML inputs are equal is an academic one. Since YAML allows for different schemas, the question can only be answered definitively in the context of a particular schema.
However, you will find very few formal schema definitions outside of the YAML spec. Most applications that use YAML document their input structure in a less formal way, and usually without discussing YAML tags. This is fine because, as discussed before, loading YAML does not need to directly implement the logical process described in the spec.
Your answer for practical purposes should come from the documentation of the application consuming the YAML data. If the documentation is very good, it will answer this, but a lot of YAML-consuming applications just use the default settings of the YAML implementation they use without telling you about this.
So the takeaway is: Know your application and know the YAML implementation it uses.
