Finding strings enclosed in preprocessor directives, with distracting lines

This is a follow-up to my previous question.

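I am trying to read a C source file with Python in order to extract the header files it includes.
The headers are specified between #ifdef TYPEA and either #else or #endif. If there is an #else clause, the headers are always specified before the #else clause.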

Let's assume an excerpt of the source content looks like this:

source_content = '\n'.join([
    'void abc ( int value) {',
    '  return 5 ** 2.5',
    '}',
    '',
    'abc',
    '',
    '#ifdef TYPEA',                                 # <---- begin identifier, may contain leading/trailing whitespaces
    'some_var_is_set = 42.42;',     # <---- I do not need these lines, but it's ok to get them
    '// some annoying comment',     # <---- I do not need these lines, but it's ok to get them
    '#include "some_header.h"',                     # <---- I want these lines
    '           #include "some_other_header23.h"',  # <---- I want these lines
    '// another comment',           # <---- I do not need these lines, but it's ok to get them
    '    some_other_var = 3.159;',  # <---- I do not need these lines, but it's ok to get them
    '#include"some_header.h"',                      # <---- I want these lines, even though it is a dupe
    '           #else        ',                     # optional stop identifier, may contain leading/trailing whitespaces
    'double in_fact_int = 5;',                      # some irrelevant content
    '         #endif    ',                          # final stop identifier, may contain leading/trailing whitespaces
    '',
    '#ifdef TYPEB',
    '  abc = 23.5;',
    '#endif',
])

I want to extract lines 6-14, i.e. the lines between #ifdef TYPEA and #else / #endif, so that my result is:

desired_match = 'some_var_is_set = 42.42;\n// some annoying comment\n#include "some_header.h"\n           #include "some_other_header23.h"\n// another comment\n    some_other_var = 3.159;\n#include"some_header.h"'

print(desired_match)
# Out:    some_var_is_set = 42.42;
// some annoying comment
#include "some_header.h"
           #include "some_other_header23.h"
// another comment
    some_other_var = 3.159;
#include"some_header.h"

Removing everything that is not an #include line would be nice, but is not required.

My current approach is:

import re

pattern = re.compile(
    (
        r'(\s*.*)#ifdef(\s+)TYPEA(\s*)'
        r'(?P<sources>(#include.*?)(?=((\s*)#else|(\s*)#endif)))'
    ), re.DOTALL
)
match = re.match(pattern, source_content)

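This works as long as the block enclosed by #ifdef/#else/#endif contains only #include lines. But as soon as there are comments etc., no match is returned.
The #include in the pattern is needed because there will be other blocks starting with #ifdef TYPEA that do not contain any include statements; only one TYPEA block contains includes.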

Thanks in advance!

Answer:

You can use the following regex with re.search (note that re.match only returns a match found at the start of the string, so re.search is more versatile):

#ifdef\s+TYPEA\s*(.*?)(?=\s*#(?:else|endif))
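
To illustrate the re.match remark, here is a minimal sketch. It assumes a short made-up input (the text and variable names are hypothetical, not from the question) in which the directive does not sit at the very start of the string:

```python
import re

# Hypothetical input: the #ifdef is on the second line, not at position 0.
text = 'int a = 1;\n#ifdef TYPEA\n#include "x.h"\n#endif'
regex = r'#ifdef\s+TYPEA\s*(.*?)(?=\s*#(?:else|endif))'

anchored = re.match(regex, text, re.DOTALL)   # only tries to match at position 0
anywhere = re.search(regex, text, re.DOTALL)  # scans the string for the first match

print(anchored)            # no match object, because the string starts with other code
print(anywhere.group(1))   # the text captured between the directives
```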

If you need multiple matches, you can plug this regex into re.findall.
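
A minimal sketch of the re.findall variant, assuming a made-up source string with two #ifdef TYPEA blocks (the input and the names source and blocks are illustrative only). Because the pattern has a single capturing group, re.findall returns a list holding the group 1 text of each match:

```python
import re

# Hypothetical input with two TYPEA blocks; not taken from the question.
source = '\n'.join([
    '#ifdef TYPEA',
    '#include "a.h"',
    '#else',
    'int x = 1;',
    '#endif',
    '#ifdef TYPEA',
    'y = 2;',
    '  #endif',
])

pattern = re.compile(r'#ifdef\s+TYPEA\s*(.*?)(?=\s*#(?:else|endif))', re.DOTALL)

# One list entry per TYPEA block, each ending before its #else or #endif.
blocks = pattern.findall(source)
print(blocks)
```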

Details:

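#ifdef - a fixed string
\s+ - one or more whitespace characters
TYPEA - a fixed string
\s* - zero or more whitespace characters
(.*?) - Group 1: zero or more characters, as few as possible (line breaks are matched too, because re.DOTALL is used)
(?=\s*#(?:else|endif)) - a positive lookahead that matches a position immediately followed by zero or more whitespace characters and then #else or #endif.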
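
Put together, the pattern can be tried against the sample from the question. The following is a sketch rather than a definitive implementation: the names block and include_lines are introduced here for illustration, and the final filtering step is one possible way to do the optional "keep only the #include lines" cleanup.

```python
import re

# Sample input from the question.
source_content = '\n'.join([
    'void abc ( int value) {',
    '  return 5 ** 2.5',
    '}',
    '',
    'abc',
    '',
    '#ifdef TYPEA',
    'some_var_is_set = 42.42;',
    '// some annoying comment',
    '#include "some_header.h"',
    '           #include "some_other_header23.h"',
    '// another comment',
    '    some_other_var = 3.159;',
    '#include"some_header.h"',
    '           #else        ',
    'double in_fact_int = 5;',
    '         #endif    ',
    '',
    '#ifdef TYPEB',
    '  abc = 23.5;',
    '#endif',
])

pattern = re.compile(r'#ifdef\s+TYPEA\s*(.*?)(?=\s*#(?:else|endif))', re.DOTALL)

# re.search scans the whole string; group 1 holds the lines of the TYPEA block.
match = pattern.search(source_content)
block = match.group(1) if match else ''
print(block)

# Optional cleanup (an assumption, not part of the answer): keep only the
# lines that start with #include once leading whitespace is ignored.
include_lines = [
    line.strip()
    for line in block.splitlines()
    if line.lstrip().startswith('#include')
]
print(include_lines)
```

Note that the \s* after TYPEA also consumes any indentation of the first line in the block, which does not matter for this sample.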


