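I am trying to parse a config file from a server that I initially suspected to be JSON.
After some experimenting, I was able to navigate and fold the sections in Notepad++ once I selected JavaScript as the language.
However, I am stuck on how to convert/parse this data into JSON or another format; no online tool has been able to help with this.
How can I parse this text? Ideally I would use PowerShell, but Python is also an option if I can figure out how to start the conversion.
Thanks!
For example, I am trying to parse each server, i.e. test1, test2, test3, and get the data listed in each block.
Here is a sample of the config file format: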
servername {
store {
servers {
* {
value<>
port<>
folder<C:\windows>
monitor<yes>
args<-T -H>
xrg<store>
wysargs<-t -g -b>
accept_any<yes>
pdu_length<23622>
}
test1 {
name<test1>
port<123>
root<c:\test>
monitor<yes>
}
test2 {
name<test2>
port<124>
root<c:\test>
monitor<yes>
}
test3 {
name<test3>
port<125>
root<c:\test>
monitor<yes>
}
}
senders
timeout<30>
}
}
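Edit: Added a bounty after the fact for @zett42, to reward his excellent work and effort in providing solutions to my question!
Answer from a uj5u.com user:
Here is something that converts the above config file into a dict/JSON in Python. I am just doing some regex substitutions, as @zett42 suggested.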
import re
import json
lines = open('configfile', 'r').read()
# Quotations around the keys (next 3 lines)
lines2 = re.sub(r'([a-zA-Z\d_*]+)\s?{', r'"\1": {', lines)
# Process k<v> as Key, Value pairs
lines3 = re.sub(r'([a-zA-Z\d_*]+)\s?<([^<]*)>', r'"\1": "\2"', lines2)
# Process single key word on the line as Key, value pair with empty value
lines4 = re.sub(r'^\s*([a-zA-Z\d_*]+)\s*$', r'"\1": ""', lines3, flags=re.MULTILINE)
# Replace \n with commas in lines ending with "
lines5 = re.sub(r'"\n', '",', lines4)
# Remove the comma before the closing bracket
lines6 = re.sub(r',\s*}', '}', lines5)
# Remove quotes from numerical values
lines7 = re.sub(r'"(\d+)"', r'\1', lines6)
# Remove horizontal whitespace, except a single space directly before a "-" (keeps "-T -H" intact)
lines8 = re.sub(r'[ \t\r\f]+(?!-)', '', lines7)
# Add commas after closing brackets when needed
lines9 = re.sub(r'(?<=})\n(?=")', r",\n", lines8)
# Enclose in brackets and escape backslash for json parsing
lines10 = '{' + lines9.replace('\\', '\\\\') + '}'
j = json.JSONDecoder().decode(lines10)
Edit: Here is a possibly cleaner alternative:
# Replace line with just key with key<>
lines2 = re.sub(r'^([^{<>}]+)$', r'\1<>', lines, flags=re.MULTILINE)
# Remove spaces not within <>
lines3 = re.sub(r'\s(?!.*?>)|\s(?![^<]+>)', '', lines2, flags=re.MULTILINE)
# Quotations
lines4 = re.sub(r'([^{<>}]+)(?={)', r'"\1":', lines3)
lines5 = re.sub(r'([^:{<>}]+)<([^{<>}]*)>', r'"\1":"\2"', lines4)
# Add commas
lines6 = re.sub(r'(?<=")"(?!")', ',"', lines5)
lines7 = re.sub(r'}(?!}|$)', '},', lines6)
# Remove quotes from numbers
lines8 = re.sub(r'"(\d+)"', r'\1', lines7)
# Escape \
lines9 = '{' + re.sub(r'\\', r'\\\\', lines8) + '}'
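Once the text has been converted and decoded, the per-server data the question asks about can be read from the resulting dict. The snippet below is only a sketch: the JSON string is hand-written to mimic the assumed output shape of the regex steps (it is not generated by them), and treating the "*" block as a set of defaults to skip is my assumption, not something the config format states.

```python
import json

# Hand-written stand-in for the converted text (assumed shape, shortened).
converted = r'''{"servername":{"store":{"servers":{
"*":{"value":"","monitor":"yes"},
"test1":{"name":"test1","port":123,"root":"c:\\test","monitor":"yes"},
"test2":{"name":"test2","port":124,"root":"c:\\test","monitor":"yes"}},
"senders":"","timeout":30}}}'''

config = json.loads(converted)
servers = config["servername"]["store"]["servers"]

# Keep only the named servers; "*" is assumed to be a wildcard/defaults block.
named = {name: block for name, block in servers.items() if name != "*"}

for name, block in named.items():
    print(name, block["port"], block["root"])
```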
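Answer from a uj5u.com user:
I came up with a solution that is simpler than my previous one and uses only PowerShell code.
Using the RegEx alternation operator |, we combine all token patterns into a single pattern and use named subexpressions to determine which one actually matched.
The rest of the code is structurally similar to the C#/PS version.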
using namespace System.Text.RegularExpressions
$ErrorActionPreference = 'Stop'
Function ConvertFrom-ServerData {
[CmdletBinding()]
param (
[Parameter(Mandatory, ValueFromPipeline)] [string] $InputObject
)
begin {
# Key can consist of anything except whitespace and < > { }
$keyPattern = '[^\s<>{}]+'
# Order of the patterns is important
$pattern = (
"(?<IntKey>$keyPattern)\s*<(?<IntValue>\d+)>",
"(?<TrueKey>$keyPattern)\s*<yes>",
"(?<FalseKey>$keyPattern)\s*<no>",
"(?<StrKey>$keyPattern)\s*<(?<StrValue>.*?)>",
"(?<ObjectBegin>$keyPattern)\s*{",
"(?<ObjectEnd>})",
"(?<KeyOnly>$keyPattern)",
"(?<Invalid>\S+)" # any non-whitespace sequence that didn't match the valid patterns
) -join '|'
}
process {
# Output is an ordered hashtable
$curObject = $outputObject = [ordered] @{}
# A stack is used to keep track of nested objects.
$stack = [Collections.Stack]::new()
# For each pattern match
foreach( $match in [RegEx]::Matches( $InputObject, $pattern, [RegexOptions]::Multiline ) ) {
# Get the RegEx groups that have actually matched.
$matchGroups = $match.Groups.Where{ $_.Success -and $_.Name.Length -gt 1 }
$key = $matchGroups[ 0 ].Value
switch( $matchGroups[ 0 ].Name ) {
'ObjectBegin' {
$child = [ordered] @{}
$curObject[ $key ] = $child
$stack.Push( $curObject )
$curObject = $child
break
}
'ObjectEnd' {
$curObject = $stack.Pop()
break
}
'IntKey' {
$value = $matchGroups[ 1 ].Value
$intValue = 0
$curObject[ $key ] = if( [int]::TryParse( $value, [ref] $intValue ) ) { $intValue } else { $value }
break
}
'TrueKey' {
$curObject[ $key ] = $true
break
}
'FalseKey' {
$curObject[ $key ] = $false
break
}
'StrKey' {
$value = $matchGroups[ 1 ].Value
$curObject[ $key ] = $value
break
}
'KeyOnly' {
$curObject[ $key ] = $null
break
}
'Invalid' {
Write-Warning "Invalid token at index $($match.Index): $key"
break
}
}
}
$outputObject # Implicit output
}
}
Usage example:
$sampleData = @'
test-server {
store {
servers {
* {
value<>
port<>
folder<C:\windows> monitor<yes>
args<-T -H>
xrg<store>
wysargs<-t -g -b>
accept_any<yes>
pdu_length<23622>
}
test1 {
name<test1>
port<123>
root<c:\test>
monitor<yes>
}
test2 {
name<test2>
port<124>
root<c:\test>
monitor<yes>
}
test3 {
name<test3>
port<125>
root<c:\test>
monitor<yes>
}
}
senders
timeout<30>
}
}
'@
# Call the parser
$objects = $sampleData | ConvertFrom-ServerData
# Uncomment to verify the whole result
#$objects | ConvertTo-Json -Depth 10
# The parser outputs nested hashtables, so we have to use GetEnumerator() to
# iterate over the key/value pairs.
$objects.'test-server'.store.servers.GetEnumerator().ForEach{
"[ SERVER: $($_.Key) ]"
# Convert server values hashtable to PSCustomObject for better output formatting
[PSCustomObject] $_.Value | Format-List
}
Output:
[ SERVER: * ]
value :
port :
folder : C:\windows
monitor : True
args : -T -H
xrg : store
wysargs : -t -g -b
accept_any : True
pdu_length : 23622
[ SERVER: test1 ]
name : test1
port : 123
root : c:\test
monitor : True
[ SERVER: test2 ]
name : test2
port : 124
root : c:\test
monitor : True
[ SERVER: test3 ]
name : test3
port : 125
root : c:\test
monitor : True
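Notes:
- I have relaxed the RegEx further. Keys can now consist of any character except whitespace, <, >, { and }.
- Line breaks are no longer required. This is more flexible, but you can't have strings with embedded > characters. Let me know if that is a problem.
- I have added detection of invalid tokens, which are output as warnings. Remove the "(?<Invalid>\S+)" line if you want to ignore invalid tokens instead.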
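For readers who prefer Python, the same single-pattern idea (one alternation of named groups, plus a stack for nesting) can be sketched roughly as follows. This is not part of the original answer: the function name parse_server_data and the lower-case group names are made up for illustration, and unbalanced braces are not handled.

```python
import re
import warnings

# Key: anything except whitespace and < > { } (mirrors $keyPattern above).
KEY = r'[^\s<>{}]+'

# Order matters: the more specific alternatives have to come first.
TOKEN_RE = re.compile('|'.join([
    rf'(?P<int_key>{KEY})\s*<(?P<int_value>\d+)>',
    rf'(?P<true_key>{KEY})\s*<yes>',
    rf'(?P<false_key>{KEY})\s*<no>',
    rf'(?P<str_key>{KEY})\s*<(?P<str_value>.*?)>',
    rf'(?P<object_begin>{KEY})\s*\{{',
    r'(?P<object_end>\})',
    rf'(?P<key_only>{KEY})',
    r'(?P<invalid>\S+)',  # anything else that is not whitespace
]))

def parse_server_data(text):
    """Parse the config text into nested dicts (hypothetical helper)."""
    root = {}
    current = root
    stack = []  # parents of the object currently being filled
    for match in TOKEN_RE.finditer(text):
        g = match.groupdict()  # groups that did not participate are None
        if g['object_begin'] is not None:
            child = {}
            current[g['object_begin']] = child
            stack.append(current)
            current = child
        elif g['object_end'] is not None:
            current = stack.pop()
        elif g['int_key'] is not None:
            current[g['int_key']] = int(g['int_value'])
        elif g['true_key'] is not None:
            current[g['true_key']] = True
        elif g['false_key'] is not None:
            current[g['false_key']] = False
        elif g['str_key'] is not None:
            current[g['str_key']] = g['str_value']
        elif g['key_only'] is not None:
            current[g['key_only']] = None
        else:
            warnings.warn(f'Invalid token at index {match.start()}: {g["invalid"]}')
    return root

SAMPLE = r'''
test-server {
  store {
    servers {
      * {
        value<>
        folder<C:\windows> monitor<yes>
        args<-T -H>
      }
      test1 {
        name<test1>
        port<123>
      }
    }
    senders
    timeout<30>
  }
}
'''

result = parse_server_data(SAMPLE)
for name, values in result['test-server']['store']['servers'].items():
    print(name, values)
```

Checking groupdict() for the non-None key group plays the same role as filtering $match.Groups by Success in the PowerShell version.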
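Answer from a uj5u.com user:
Edit: I have since come up with a simpler, PowerShell-only solution, which I recommend using.
I will keep this answer, as it might still be useful for other scenarios. There may also be differences in performance (I have not measured).
MYousefi has already posted a helpful answer with a Python implementation.
For PowerShell, I came up with a solution that works without a convert-to-JSON step. Instead, I adopted and generalized RegEx-based tokenizer code from Jack Vanlightly (see also the related blog post). A tokenizer (aka lexer) splits and categorizes the elements of the input text and outputs a flat stream of tokens (categories) and related data. A parser can use these as input to create a structured representation of the input text.
The tokenizer is written in generic C# and can be used for any input that can be split using RegEx. The C# code is included in PowerShell using the Add-Type command, so no C# compiler is required.
For simplicity, the parser function ConvertFrom-ServerData is written in PowerShell. You only use the parser directly, so you don't need to know anything about the tokenizer C# code. If you want to adapt the code to different input, you should only have to modify the PowerShell parser code.
Save the following file in the same directory as the PowerShell script:
"RegExTokenizer.cs":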
// Generic, precedence-based RegEx tokenizer.
// This code is based on https://github.com/Vanlightly/DslParser
// from Jack Vanlightly (https://jack-vanlightly.com).
// Modifications:
// - Interface improved for ease-of-use from PowerShell.
// - Return all groups from the RegEx match instead of just the value. This simplifies parsing of key/value pairs by requiring only a single token definition.
// - Some code simplifications, e. g. replacing "for" loops by "foreach".
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Text.RegularExpressions;
namespace DslTokenizer {
public class DslToken<TokenType> {
public TokenType Token { get; set; }
public GroupCollection Groups { get; set; }
}
public class TokenMatch<TokenType> {
public TokenType Token { get; set; }
public GroupCollection Groups { get; set; }
public int StartIndex { get; set; }
public int EndIndex { get; set; }
public int Precedence { get; set; }
}
public class TokenDefinition<TokenType> {
private Regex _regex;
private readonly TokenType _returnsToken;
private readonly int _precedence;
public TokenDefinition( TokenType returnsToken, string regexPattern, int precedence ) {
_regex = new Regex( regexPattern, RegexOptions.Multiline | RegexOptions.IgnoreCase | RegexOptions.Compiled );
_returnsToken = returnsToken;
_precedence = precedence;
}
public IEnumerable<TokenMatch<TokenType>> FindMatches( string inputString ) {
foreach( Match match in _regex.Matches( inputString ) ) {
yield return new TokenMatch<TokenType>() {
StartIndex = match.Index,
EndIndex = match.Index + match.Length,
Token = _returnsToken,
Groups = match.Groups,
Precedence = _precedence
};
}
}
}
public class PrecedenceBasedRegexTokenizer<TokenType> {
private List<TokenDefinition<TokenType>> _tokenDefinitions = new List<TokenDefinition<TokenType>>();
public PrecedenceBasedRegexTokenizer() {}
public PrecedenceBasedRegexTokenizer( IEnumerable<TokenDefinition<TokenType>> tokenDefinitions ) {
_tokenDefinitions = tokenDefinitions.ToList();
}
// Easy-to-use interface as alternative to constructor that takes an IEnumerable.
public void AddTokenDef( TokenType returnsToken, string regexPattern, int precedence = 0 ) {
_tokenDefinitions.Add( new TokenDefinition<TokenType>( returnsToken, regexPattern, precedence ) );
}
public IEnumerable<DslToken<TokenType>> Tokenize( string lqlText ) {
var tokenMatches = FindTokenMatches( lqlText );
var groupedByIndex = tokenMatches.GroupBy( x => x.StartIndex )
.OrderBy( x => x.Key )
.ToList();
TokenMatch<TokenType> lastMatch = null;
foreach( var match in groupedByIndex ) {
var bestMatch = match.OrderBy( x => x.Precedence ).First();
if( lastMatch != null && bestMatch.StartIndex < lastMatch.EndIndex ) {
continue;
}
yield return new DslToken<TokenType>(){ Token = bestMatch.Token, Groups = bestMatch.Groups };
lastMatch = bestMatch;
}
}
private List<TokenMatch<TokenType>> FindTokenMatches( string lqlText ) {
var tokenMatches = new List<TokenMatch<TokenType>>();
foreach( var tokenDefinition in _tokenDefinitions ) {
tokenMatches.AddRange( tokenDefinition.FindMatches( lqlText ).ToList() );
}
return tokenMatches;
}
}
}
Parser function written in PowerShell:
$ErrorActionPreference = 'Stop'
Add-Type -TypeDefinition (Get-Content $PSScriptRoot\RegExTokenizer.cs -Raw)
Function ConvertFrom-ServerData {
[CmdletBinding()]
param (
[Parameter(Mandatory, ValueFromPipeline)] [string] $InputObject
)
begin {
# Define the kind of possible tokens.
enum ServerDataTokens {
ObjectBegin
ObjectEnd
ValueInt
ValueBool
ValueString
KeyOnly
}
# Create an instance of the tokenizer from "RegExTokenizer.cs".
$tokenizer = [DslTokenizer.PrecedenceBasedRegexTokenizer[ServerDataTokens]]::new()
# Define a RegEx for each token where 1st group matches key and 2nd matches value (if any).
# To resolve ambiguities, most specific RegEx must come first
# (e. g. ValueInt line must come before ValueString line).
# Alternatively pass a 3rd integer parameter that defines the precedence.
$tokenizer.AddTokenDef( [ServerDataTokens]::ObjectBegin, '^\s*([\w*]+)\s*{' )
$tokenizer.AddTokenDef( [ServerDataTokens]::ObjectEnd, '^\s*}\s*$' )
$tokenizer.AddTokenDef( [ServerDataTokens]::ValueInt, '^\s*(\w+)\s*<([+-]?\d+)>\s*$' )
$tokenizer.AddTokenDef( [ServerDataTokens]::ValueBool, '^\s*(\w+)\s*<(yes|no)>\s*$' )
$tokenizer.AddTokenDef( [ServerDataTokens]::ValueString, '^\s*(\w+)\s*<(.*)>\s*$' )
$tokenizer.AddTokenDef( [ServerDataTokens]::KeyOnly, '^\s*(\w+)\s*$' )
}
process {
# Output is an ordered hashtable
$outputObject = [ordered] @{}
$curObject = $outputObject
# A stack is used to keep track of nested objects.
$stack = [Collections.Stack]::new()
# For each token produced by the tokenizer
$tokenizer.Tokenize( $InputObject ).ForEach{
# $_.Groups[0] is the full match, which we discard by assigning to $null
$null, $key, $value = $_.Groups.Value
switch( $_.Token ) {
([ServerDataTokens]::ObjectBegin) {
$child = [ordered] @{}
$curObject[ $key ] = $child
$stack.Push( $curObject )
$curObject = $child
break
}
([ServerDataTokens]::ObjectEnd) {
$curObject = $stack.Pop()
break
}
([ServerDataTokens]::ValueInt) {
$intValue = 0
$curObject[ $key ] = if( [int]::TryParse( $value, [ref] $intValue ) ) { $intValue } else { $value }
break
}
([ServerDataTokens]::ValueBool) {
$curObject[ $key ] = $value -eq 'yes'
break
}
([ServerDataTokens]::ValueString) {
$curObject[ $key ] = $value
break
}
([ServerDataTokens]::KeyOnly) {
$curObject[ $key ] = $null
break
}
}
}
$outputObject # Implicit output
}
}
Usage example:
$sampleData = @'
servername {
store {
servers {
* {
value<>
port<>
folder<C:\windows>
monitor<yes>
args<-T -H>
xrg<store>
wysargs<-t -g -b>
accept_any<yes>
pdu_length<23622>
}
test1 {
name<test1>
port<123>
root<c:\test>
monitor<yes>
}
test2 {
name<test2>
port<124>
root<c:\test>
monitor<yes>
}
test3 {
name<test3>
port<125>
root<c:\test>
monitor<yes>
}
}
senders
timeout<30>
}
}
'@
# Call the parser
$objects = $sampleData | ConvertFrom-ServerData
# The parser outputs nested hashtables, so we have to use GetEnumerator() to
# iterate over the key/value pairs.
$objects.servername.store.servers.GetEnumerator().ForEach{
"[ SERVER: $($_.Key) ]"
# Convert server values hashtable to PSCustomObject for better output formatting
[PSCustomObject] $_.Value | Format-List
}
Output:
[ SERVER: * ]
value :
port :
folder : C:\windows
monitor : True
args : -T -H
xrg : store
wysargs : -t -g -b
accept_any : True
pdu_length : 23622
[ SERVER: test1 ]
name : test1
port : 123
root : c:\test
monitor : True
[ SERVER: test2 ]
name : test2
port : 124
root : c:\test
monitor : True
[ SERVER: test3 ]
name : test3
port : 125
root : c:\test
monitor : True
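Notes:
- If you pass input from Get-Content to the parser, make sure to use the -Raw parameter, e.g. $objects = Get-Content input.cfg -Raw | ConvertFrom-ServerData. Otherwise the parser would try to parse each input line on its own.
- I have opted to convert "yes"/"no" values to bool, so they are output as "True"/"False". Remove the line $tokenizer.AddTokenDef( 'ValueBool', ... to parse them as string instead and output them as-is.
- Keys without values <> (the "senders" in the example) are stored as keys with value $null.
- The RegEx enforces that values can only be single-line (as the sample data suggests). This allows us to have embedded > characters without the need to escape them.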
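To make the precedence-based tokenizing step easier to follow, here is a rough Python sketch of the same idea: every token definition collects its matches, matches are grouped by start index, the lowest precedence wins (definition order breaks ties), and matches that overlap the previously accepted one are skipped. It is only an illustration under my own assumptions, not a port of the C# file: the names tokenize, parse and TOKEN_DEFS are invented, and the filter key in the sample is hypothetical, added to show an embedded > character.

```python
import re
from collections import defaultdict

# (token name, pattern, precedence); the most specific pattern comes first.
TOKEN_DEFS = [
    ('ObjectBegin', r'^\s*([\w*]+)\s*\{', 0),
    ('ObjectEnd',   r'^\s*\}\s*$', 0),
    ('ValueInt',    r'^\s*(\w+)\s*<([+-]?\d+)>\s*$', 0),
    ('ValueBool',   r'^\s*(\w+)\s*<(yes|no)>\s*$', 0),
    ('ValueString', r'^\s*(\w+)\s*<(.*)>\s*$', 0),
    ('KeyOnly',     r'^\s*(\w+)\s*$', 0),
]

def tokenize(text, token_defs):
    """Yield (token name, captured groups) in input order."""
    by_start = defaultdict(list)
    for name, pattern, precedence in token_defs:
        for m in re.finditer(pattern, text, re.MULTILINE | re.IGNORECASE):
            by_start[m.start()].append((precedence, name, m))
    last_end = 0
    for start in sorted(by_start):
        # min() returns the first minimal item, so definition order breaks ties.
        _, name, m = min(by_start[start], key=lambda c: c[0])
        if start < last_end:
            continue  # overlaps the previously accepted token
        yield name, m.groups()
        last_end = m.end()

def parse(text):
    """Build nested dicts from the flat token stream."""
    root = {}
    current, stack = root, []
    for name, groups in tokenize(text, TOKEN_DEFS):
        key = groups[0] if groups else None
        if name == 'ObjectBegin':
            child = {}
            current[key] = child
            stack.append(current)
            current = child
        elif name == 'ObjectEnd':
            current = stack.pop()
        elif name == 'ValueInt':
            current[key] = int(groups[1])
        elif name == 'ValueBool':
            current[key] = groups[1].lower() == 'yes'
        elif name == 'ValueString':
            current[key] = groups[1]
        elif name == 'KeyOnly':
            current[key] = None
    return root

SAMPLE = r'''servername {
  servers {
    test1 {
      name<test1>
      port<123>
      monitor<yes>
      filter<a>b>
    }
  }
  senders
}
'''

parsed = parse(SAMPLE)
print(parsed)
```

Because the value patterns are anchored to the end of the line, the greedy (.*) in ValueString captures a>b as a whole, which is the behavior described in the last note.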
