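I want to write a parser for a pre-existing data storage file type. There is a formal grammar, and by following pegen's grammar guide I was able to create a grammar file, compile it, and generate a parser.
My problem is that the parser produces no output because (at least I think this is the problem) I don't know how to set the correct return types in the grammar file. The examples in the data folder on GitHub weren't that helpful.
How do I create the correct return types?
My grammar file: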
# Basic CIF structure
start: Comments? WhiteSpace? ( DataBlock ( WhiteSpace DataBlock )* ( WhiteSpace )? )?
DataBlock: DataBlockHeading ( WhiteSpace ( DataItems | SaveFrame ) )*
DataBlockHeading: DATA_ ( NonBlankChar )+
SaveFrame: SaveFrameHeading ( WhiteSpace DataItems )+ WhiteSpace SAVE_
SaveFrameHeading: SAVE_ ( NonBlankChar )+
DataItems: Tag WhiteSpace Value | LoopHeader LoopBody
LoopHeader: LOOP_ ( WhiteSpace Tag )+
LoopBody: Value ( WhiteSpace Value )*
# Reserved words
DATA_: ('D' | 'd') ('A' | 'a') ('T' | 't') ('A' | 'a') '_'
LOOP_: ('L' | 'l') ('O' | 'o') ('O' | 'o') ('P' | 'p') '_'
GLOBAL_: ('G' | 'g') ('L' | 'l') ('O' | 'o') ('B' | 'b') ('A' | 'a') ('L' | 'l') '_'
SAVE_: ('S' | 's') ('A' | 'a') ('V' | 'v') ('E' | 'e') '_'
STOP_: ('S' | 's') ('T' | 't') ('O' | 'o') ('P' | 'p')'_'
# Tags and values
Tag: '_' ( NonBlankChar )+
Value: ( '.' | '?' | Numeric | CharString | TextField )
# Numeric values
Numeric: ( Number | Number '(' UnsignedInteger ')' )
Number: Integer | Float
Integer: ( '+' | '-' )? UnsignedInteger
Float: ( Integer Exponent | ( ( '+' | '-' )? ( ( Digit )* '.' UnsignedInteger ) | ( ( Digit )+ '.' ) ) ( Exponent )? )
Exponent: ( ('e' | 'E' ) | ( 'e' | 'E' ) ( '+' | '- ' ) ) UnsignedInteger
UnsignedInteger: ( Digit )+
Digit: ( '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9' )
# Strings and text fields
CharString: UnquotedString | SingleQuotedString | DoubleQuotedString
UnquotedString: EOL_UnquotedString | NOTEOL_UnquotedString
EOL_UnquotedString: EOL OrdinaryChar ( NonBlankChar )*
NOTEOL_UnquotedString: NOTEOL ( OrdinaryChar | ';' ) ( NonBlankChar )*
SingleQuotedString: single_quote ( AnyPrintChar )* single_quote WhiteSpace
DoubleQuotedString: double_quote ( AnyPrintChar )* double_quote WhiteSpace
TextField: ( SemiColonTextField )
SemiColonTextField: EOL ';' ( ( AnyPrintChar )* EOL ( ( TextLeadChar ( AnyPrintChar )* )? EOL )* ) ';'
# Whitespace and comments
WhiteSpace: ( SP | HT | EOL | TokenizedComments )+
Comments: ( '#' ( AnyPrintChar )* EOL )+
TokenizedComments: ( SP | HT | EOL )+ Comments
# Character sets
OrdinaryChar: ( '!' | '%' | '&' | '(' | ')' | '*' | '+' | ',' | '-' | '.' | '/' | '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9' | ':' | '<' | '=' | '>' | '?' | '@' | 'A' | 'B' | 'C' | 'D' | 'E' | 'F' | 'G' | 'H' | 'I' | 'J' | 'K' | 'L' | 'M' | 'N' | 'O' | 'P' | 'Q' | 'R' | 'S' | 'T' | 'U' | 'V' | 'W' | 'X' | 'Y' | 'Z' | '\\' | '^' | '`' | 'a' | 'b' | 'c' | 'd' | 'e' | 'f' | 'g' | 'h' | 'i' | 'j' | 'k' | 'l' | 'm' | 'n' | 'o' | 'p' | 'q' | 'r' | 's' | 't' | 'u' | 'v' | 'w' | 'x' | 'y' | 'z' | '{' | '|' | '}' | '~' )
NonBlankChar: ( OrdinaryChar | double_quote | '#' | '$' | single_quote | '_' | ';' | '[' | ']' )
TextLeadChar: ( OrdinaryChar | double_quote | '#' | '$' | single_quote | '_' | SP | HT | '[' | ']' )
AnyPrintChar: ( OrdinaryChar | double_quote | '#' | '$' | single_quote | '_' | SP | HT | ';' | '[' | ']' )
# Special things
EOL: NEWLINE #( '\n' | '\n\r' )
NOTEOL: !EOL
SP: ' '
HT: '\t'
double_quote: '"'
single_quote: '\''
The test file I want to parse (header_only.cif):
data_header
How I generate the parser:
python -m pegen cif.gram -o parser.py
How I use my parser:
python parser.py -vv header_only.cif
My output:
start() ... (looking at 1.0: NAME:'data_header')
Comments() ... (looking at 1.0: NAME:'data_header')
_loop1_42() ... (looking at 1.0: NAME:'data_header')
_tmp_58() ... (looking at 1.0: NAME:'data_header')
expect('#') ... (looking at 1.0: NAME:'data_header')
... expect('#') -> None
... _tmp_58() -> None
... _loop1_42() -> []
... Comments() -> None
WhiteSpace() ... (looking at 1.0: NAME:'data_header')
_loop1_41() ... (looking at 1.0: NAME:'data_header')
_tmp_57() ... (looking at 1.0: NAME:'data_header')
SP() ... (looking at 1.0: NAME:'data_header')
expect(' ') ... (looking at 1.0: NAME:'data_header')
... expect(' ') -> None
... SP() -> None
HT() ... (looking at 1.0: NAME:'data_header')
expect('\t') ... (looking at 1.0: NAME:'data_header')
... expect('\t') -> None
... HT() -> None
EOL() ... (looking at 1.0: NAME:'data_header')
expect('NEWLINE') ... (looking at 1.0: NAME:'data_header')
... expect('NEWLINE') -> None
... EOL() -> None
TokenizedComments() ... (looking at 1.0: NAME:'data_header')
_loop1_43() ... (looking at 1.0: NAME:'data_header')
_tmp_59() ... (looking at 1.0: NAME:'data_header')
SP() -> None
HT() -> None
EOL() -> None
... _tmp_59() -> None
... _loop1_43() -> []
... TokenizedComments() -> None
... _tmp_57() -> None
... _loop1_41() -> []
... WhiteSpace() -> None
_tmp_1() ... (looking at 1.0: NAME:'data_header')
DataBlock() ... (looking at 1.0: NAME:'data_header')
DataBlockHeading() ... (looking at 1.0: NAME:'data_header')
DATA_() ... (looking at 1.0: NAME:'data_header')
_tmp_8() ... (looking at 1.0: NAME:'data_header')
expect('D') ... (looking at 1.0: NAME:'data_header')
... expect('D') -> None
expect('d') ... (looking at 1.0: NAME:'data_header')
... expect('d') -> None
... _tmp_8() -> None
... DATA_() -> None
... DataBlockHeading() -> None
... DataBlock() -> None
... _tmp_1() -> None
... start() -> [None, None, None]
[None, None, None]
Total time: 0.031 sec; 1 lines (13 bytes); 32 lines/sec
Caches sizes:
token array : 1
cache : 24
Answer:
Pegen generates parsers for "Python-like" languages. As far as I know, it is not a general-purpose parser generator.
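In particular, it assumes that the lexical structure of the language being parsed is similar enough to Python's that the same tokenizer can be used. That does not seem to be the case for the language you want to parse. Your grammar has nothing that matches the NAME token, which the Python tokenizer produces automatically when it sees the input data_header, and that is why the parse fails.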
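You can check this without pegen by running Python's standard tokenize module over the test input. This is only a sketch: the helper name python_tokens is made up for illustration, and I am assuming pegen's tokenizer behaves the same way as the stdlib one here, which the NAME:'data_header' in your trace suggests.

```python
import io
import tokenize


def python_tokens(text):
    """Return (token type name, token string) pairs from Python's own tokenizer."""
    readline = io.StringIO(text).readline
    return [
        (tokenize.tok_name[tok.type], tok.string)
        for tok in tokenize.generate_tokens(readline)
    ]


# Same content as header_only.cif (assuming a plain "\n" line ending).
tokens = python_tokens("data_header\n")
for name, string in tokens:
    print(name, repr(string))
```

The first pair is NAME with the whole string 'data_header'. The grammar never sees the individual characters 'd', 'a', 't', 'a', '_', so a rule like expect('d') has nothing to match.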
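Pegen does allow you to define keywords, which are specific instances of NAME, but as far as I know it has no way to specify case-insensitive keywords. Nor does it have a mechanism for recognising a class of names that start with a prefix (such as "data_"). Both tasks are easy with regular expressions.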
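As a rough illustration of how little is needed with regular expressions, here is a sketch using Python's re module. The names DATA_HEADING, LOOP_KEYWORD and block_name are hypothetical, and \S+ is only an approximation of NonBlankChar (it also accepts non-ASCII characters, which the CIF character set does not).

```python
import re

# Case-insensitive reserved word followed by one or more non-blank characters.
# ASSUMPTION: \S+ stands in for ( NonBlankChar )+ from the grammar.
DATA_HEADING = re.compile(r"data_(\S+)", re.IGNORECASE)

# A bare case-insensitive keyword.
LOOP_KEYWORD = re.compile(r"loop_", re.IGNORECASE)


def block_name(text):
    """Return the name after the data_ prefix, or None if text is not a heading."""
    match = DATA_HEADING.fullmatch(text)
    return match.group(1) if match else None


print(block_name("data_header"))
print(block_name("DATA_Header"))
print(block_name("_some_tag"))
```

One flag (re.IGNORECASE) replaces every ('D' | 'd') style alternative, and the prefix is simply part of the pattern.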
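There are many parser generators for Python, and the vast majority allow custom tokenizers based on regular expressions, which is much more convenient than lists of individual characters. You may find one of them better suited to your purpose. As far as I can see, your language can be parsed with a simple top-down predictive parser (LL(1) or "recursive descent"), so any general-purpose parser generator should work fine, even a PEG generator.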
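To give an idea of what a regex tokenizer plus a hand-written recursive-descent parser could look like, here is a minimal sketch for a simplified subset of the format. It is not a complete CIF parser: save frames, semicolon text fields, numeric types and the exact quoting rules are left out, and all names (TOKEN_RE, tokenize, parse) and the nested-dict result shape are my own choices for illustration.

```python
import re

# One alternative per token kind; earlier alternatives win.
TOKEN_RE = re.compile(
    r"""
    (?P<COMMENT>\#[^\n]*)
  | (?P<DATA>data_\S+)
  | (?P<LOOP>loop_(?!\S))
  | (?P<TAG>_\S+)
  | (?P<QUOTED>'[^'\n]*'|"[^"\n]*")
  | (?P<VALUE>\S+)
  | (?P<WS>\s+)
    """,
    re.IGNORECASE | re.VERBOSE,
)


def tokenize(text):
    """Yield (kind, text) pairs, skipping whitespace and comments."""
    for match in TOKEN_RE.finditer(text):
        if match.lastgroup not in ("WS", "COMMENT"):
            yield match.lastgroup, match.group()


def parse(text):
    """Return {block name: {tag: value, or list of values for loop columns}}."""
    tokens = list(tokenize(text))
    pos = 0

    def peek():
        return tokens[pos][0] if pos < len(tokens) else "EOF"

    def take(kind):
        nonlocal pos
        if peek() != kind:
            raise SyntaxError(f"expected {kind}, found {peek()} at token {pos}")
        pos += 1
        return tokens[pos - 1][1]

    def parse_value():
        if peek() == "QUOTED":
            return take("QUOTED")[1:-1]  # strip the surrounding quotes
        return take("VALUE")

    def parse_items(block):
        if peek() == "LOOP":
            take("LOOP")
            tags = [take("TAG")]
            while peek() == "TAG":
                tags.append(take("TAG"))
            values = [parse_value()]
            while peek() in ("VALUE", "QUOTED"):
                values.append(parse_value())
            if len(values) % len(tags):
                raise SyntaxError("loop values do not fill whole rows")
            for i, tag in enumerate(tags):
                block[tag] = values[i::len(tags)]  # one column per tag
        else:
            tag = take("TAG")
            block[tag] = parse_value()

    blocks = {}
    while peek() != "EOF":
        name = take("DATA")[len("data_"):]
        block = blocks[name] = {}
        while peek() in ("TAG", "LOOP"):
            parse_items(block)
    return blocks


print(parse("data_header\n"))
```

Each parsing function looks only at the kind of the next token to decide what to do, which is the predictive (LL(1)) style mentioned above. For header_only.cif this yields one empty block named header.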
