Slow removal of columns 3, 7 and 9 from a pipe-delimited txt file using PowerShell

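I have pipe-delimited data files containing a lot of data, and I want to remove columns 3, 7 and 9. The script below works 100% correctly, but it is far too slow: a 22 MB file takes 5 minutes.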

```text
Adeel|01|test|1234589|date|amount|00|123345678890|test|all|01|
Adeel|00|test|1234589|date|amount|00|123345678890|test|all|00|
Adeel|00|test|1234589|date|amount|00|123345678890|test|all|00|
Adeel|00|test|1234589|date|amount|00|123345678890|test|all|00|
Adeel|05|test|1234589|date|amount|00|123345678890|test|all|05|
Adeel|00|test|1234589|date|amount|00|123345678890|test|all|00|
Adeel|00|test|1234589|date|amount|00|123345678890|test|all|00|
Adeel|00|test|1234589|date|amount|00|123345678890|test|all|00|
Adeel|09|test|1234589|date|amount|00|123345678890|test|all|09|
Adeel|00|test|1234589|date|amount|00|123345678890|test|all|00|
Adeel|00|test|1234589|date|amount|00|123345678890|test|all|00|
Adeel|12|test|1234589|date|amount|00|123345678890|test|all|12|
```

```powershell
param
(
    # Input data file
    [string]$Path = 'O:\Temp\test.txt',
    # Zero-based indexes of the columns to be removed, any order, dupes are allowed
    [int[]]$Remove = (3,6)
)

# sort indexes descending and remove dupes
$Remove = $Remove | Sort-Object -Unique -Descending

# read input lines
Get-Content $Path | .{process{
    # split and add to ArrayList which allows to remove items
    $list = [Collections.ArrayList]($_ -split '\|')

    # remove data at the indexes (from tail to head due to descending order)
    foreach($i in $Remove) {
        $list.RemoveAt($i)
    }

    # join and output
    #$list -join '|'
    $contentUpdate=$list -join '|'
    # Add-Content opens, appends to and closes the output file on every call, i.e. once per line
    Add-Content "O:\Temp\testoutput.txt" $contentUpdate
}
}
```
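`-split` and `ArrayList.RemoveAt` both use zero-based indexes, so, assuming the columns in the question are counted from 1, columns 3, 7 and 9 correspond to indexes 2, 6 and 8 (that is, `-Remove 2,6,8`). A quick sketch that checks this mapping on the first sample record with the same split/remove/join steps as the script:

```powershell
# First sample record from the question
$record = 'Adeel|01|test|1234589|date|amount|00|123345678890|test|all|01|'

# Columns counted from 1 become zero-based indexes; sort descending so that
# removing one item does not shift the indexes still to be removed
$columnNumbers = 3, 7, 9
$indexes = $columnNumbers | ForEach-Object { $_ - 1 } | Sort-Object -Unique -Descending

$list = [Collections.ArrayList]($record -split '\|')
foreach ($i in $indexes) { $list.RemoveAt($i) }

$result = $list -join '|'
$result   # Adeel|01|1234589|date|amount|123345678890|all|01|
```

The trailing `|` in each record produces an empty last column, which the join keeps, so the output still ends with `|`.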

Answer:

`Get-Content` is rather slow, and using the pipeline adds extra overhead.
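To make that point concrete, here is a minimal pure-PowerShell middle ground (not part of the original answer): it keeps the question's split/remove/join logic but replaces `Get-Content | .{process{}}` with a `foreach` statement over `[IO.File]::ReadLines` and replaces the per-line `Add-Content` with a single `[IO.File]::WriteAllLines` call. It assumes the whole output fits in memory and that the input is UTF-8 or ASCII; `Remove-ColumnInMemory` is a hypothetical name.

```powershell
function Remove-ColumnInMemory {
    # Hypothetical helper: no pipeline, and one write call instead of one per line
    param(
        [string] $InputPath,
        [string] $OutputPath,
        [int[]] $Remove,              # zero-based column indexes
        [string] $Separator = '|'
    )

    # .NET resolves relative paths against its own current directory, so resolve
    # both paths through PowerShell first (the output file need not exist yet)
    $inFull  = Convert-Path -LiteralPath $InputPath
    $outFull = $ExecutionContext.SessionState.Path.GetUnresolvedProviderPathFromPSPath($OutputPath)

    $removeSet = [Collections.Generic.HashSet[int]] $Remove
    $pattern = [regex]::Escape($Separator)   # -split takes a regex

    # The foreach statement enumerates ReadLines lazily;
    # @() keeps $lines an array even for an empty file
    $lines = @(foreach ($line in [IO.File]::ReadLines($inFull)) {
        $columns = $line -split $pattern
        $kept = @(for ($i = 0; $i -lt $columns.Length; $i++) {
            if (-not $removeSet.Contains($i)) { $columns[$i] }
        })
        $kept -join $Separator
    })

    # WriteAllLines writes UTF-8 without a BOM
    [IO.File]::WriteAllLines($outFull, [string[]] $lines)
}
```

For example, `Remove-ColumnInMemory -InputPath 'O:\Temp\test.txt' -OutputPath 'O:\Temp\testoutput.txt' -Remove 2,6,8`. The streaming approach in the answer avoids holding all lines in memory, which matters more as files grow.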

When performance matters, `StreamReader` and `StreamWriter` may be the better choice:

```powershell
param (
    # Input data file
    [string] $InputPath = 'input.txt',
    # Output data file
    [string] $OutputPath = 'output.txt',
    # Zero-based indexes of the columns to be removed, any order, dupes are allowed
    [int[]] $Remove = (1, 2, 2),
    # Column separator
    [string] $Separator = '|',
    # Input file encoding
    [Text.Encoding] $Encoding = [Text.Encoding]::Default
)

$ErrorActionPreference = 'Stop'

# Gets rid of dupes and provides fast lookup ability
$removeSet = [Collections.Generic.HashSet[int]] $Remove

$reader = $writer = $null

try {
    $reader = [IO.StreamReader]::new(( Convert-Path -LiteralPath $InputPath ), $Encoding )

    $null = New-Item $OutputPath -ItemType File -Force  # as Convert-Path requires existing path

    # Compare with $null explicitly: an empty line is falsy and would otherwise end the loop early
    while( $null -ne ( $line = $reader.ReadLine() ) ) {

        if( -not $writer ) {
            # Construct writer only after first line has been read, so $reader.CurrentEncoding is available
            $writer = [IO.StreamWriter]::new(( Convert-Path -LiteralPath $OutputPath ), $false, $reader.CurrentEncoding )
        }

        $columns = $line.Split( $Separator )
        $isAppend = $false

        for( $i = 0; $i -lt $columns.Length; $i++ ) {
            if( -not $removeSet.Contains( $i ) ) {
                if( $isAppend ) { $writer.Write( $Separator ) }
                $writer.Write( $columns[ $i ] )
                $isAppend = $true
            }
        }

        $writer.WriteLine()  # Write (CR)LF
    }
}
finally {
    # Make sure to dispose the reader and writer so files get closed.
    if( $writer ) { $writer.Dispose() }
    if( $reader ) { $reader.Dispose() }
}
```

`Convert-Path` is used because .NET's current directory can differ from PowerShell's current location, so it is best practice to pass absolute paths to .NET APIs.

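A small sketch of the path handling described above. It prints both "current directories" and resolves paths against PowerShell's location. `GetUnresolvedProviderPathFromPSPath` is shown as an alternative for a file that does not exist yet, which would make the answer's placeholder `New-Item` unnecessary; `output.txt` is just an illustrative name.

```powershell
# PowerShell's current location and .NET's process-wide current directory are
# tracked separately; .NET APIs resolve relative paths against the latter.
"PowerShell location: $((Get-Location).ProviderPath)"
".NET current dir   : $([Environment]::CurrentDirectory)"

Push-Location ([IO.Path]::GetTempPath())
try {
    # Existing path: Convert-Path resolves it against PowerShell's location
    $existing = Convert-Path -LiteralPath '.'

    # Path that need not exist yet (e.g. an output file): resolved the same way,
    # without Convert-Path's requirement that the item already exists
    $notYet = $ExecutionContext.SessionState.Path.GetUnresolvedProviderPathFromPSPath('output.txt')
}
finally {
    Pop-Location
}
$existing
$notYet
```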
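If this is still not fast enough, consider writing it in C# instead. Especially for "low-level" code like this, C# tends to be faster. You can embed the C# code with `Add-Type -TypeDefinition $csCode`.
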
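A minimal sketch of that idea, ported from the answer's loop. The class `ColumnRemover` and its method `RemoveColumns` are hypothetical names. The C# sticks to older syntax so it should also compile under Windows PowerShell 5.1's `Add-Type`, uses `Array.IndexOf` instead of a `HashSet` to avoid extra assembly references, and assumes UTF-8 or ASCII input (the writer emits UTF-8 without a BOM).

```powershell
# Hypothetical C# helper compiled at runtime by Add-Type
$csCode = @'
using System;
using System.IO;

public static class ColumnRemover
{
    // Copies inputPath to outputPath, dropping the zero-based column indexes in 'remove'.
    // Both paths should be absolute (see the Convert-Path note above).
    public static void RemoveColumns(string inputPath, string outputPath, int[] remove, char separator)
    {
        using (StreamReader reader = new StreamReader(inputPath))
        using (StreamWriter writer = new StreamWriter(outputPath, false))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                string[] columns = line.Split(separator);
                bool isAppend = false;
                for (int i = 0; i < columns.Length; i++)
                {
                    if (Array.IndexOf(remove, i) >= 0) continue;  // skip removed column
                    if (isAppend) writer.Write(separator);
                    writer.Write(columns[i]);
                    isAppend = true;
                }
                writer.WriteLine();
            }
        }
    }
}
'@

Add-Type -TypeDefinition $csCode
```

It can then be called with absolute paths, for example `[ColumnRemover]::RemoveColumns((Convert-Path -LiteralPath 'O:\Temp\test.txt'), 'O:\Temp\testoutput.txt', [int[]] (2, 6, 8), [char] '|')`.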
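As a further optimization, you could use `String.IndexOf()` and `String.Substring()` to extract only the columns you need, instead of `String.Split()`, which creates more substrings than are actually needed.
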
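A sketch of the `IndexOf()`/`Substring()` idea for a single line. `Remove-ColumnFromLine` is a hypothetical helper: it walks the separators with `IndexOf` and only calls `Substring` for columns that are kept, building the result in a `StringBuilder`.

```powershell
function Remove-ColumnFromLine {
    # Hypothetical helper: substrings are created only for kept columns
    param(
        [string] $Line,
        [Collections.Generic.HashSet[int]] $RemoveSet,   # zero-based indexes to drop
        [char] $Separator = '|'
    )

    $sb = [Text.StringBuilder]::new($Line.Length)
    $start = 0          # start of the current column
    $index = 0          # zero-based index of the current column
    $isAppend = $false  # whether a kept column has already been written

    while ($true) {
        $end = $Line.IndexOf($Separator, $start)
        $isLast = $end -lt 0
        if ($isLast) { $end = $Line.Length }   # last column runs to the end of the line

        if (-not $RemoveSet.Contains($index)) {
            if ($isAppend) { $null = $sb.Append($Separator) }
            $null = $sb.Append($Line.Substring($start, $end - $start))
            $isAppend = $true
        }

        if ($isLast) { break }
        $start = $end + 1
        $index++
    }

    $sb.ToString()
}

$removeSet = [Collections.Generic.HashSet[int]] [int[]] (2, 6, 8)
$result = Remove-ColumnFromLine -Line 'Adeel|01|test|1234589|date|amount|00|123345678890|test|all|01|' -RemoveSet $removeSet
$result   # Adeel|01|1234589|date|amount|123345678890|all|01|
```

Calling a PowerShell function once per line has its own overhead, which may eat up the savings; in a real script the loop body would more likely be inlined into the answer's `while` loop, or written in C# as suggested above.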
Last but not least, you could try the `StreamReader` and `StreamWriter` constructors that let you specify a buffer size.
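A sketch of those constructor overloads, `StreamReader(path, encoding, detectEncodingFromByteOrderMarks, bufferSize)` and `StreamWriter(path, append, encoding, bufferSize)`. The temp file names, the UTF-8 encoding and the `1MB` value are assumptions for illustration; the buffer size is only a starting point to benchmark, not a tuned recommendation, and the loop just copies lines where the answer's column logic would go.

```powershell
# Hypothetical sample files in the temp directory (absolute paths)
$inFull  = Join-Path ([IO.Path]::GetTempPath()) 'buffer_in.txt'
$outFull = Join-Path ([IO.Path]::GetTempPath()) 'buffer_out.txt'
[IO.File]::WriteAllLines($inFull, [string[]] @('a|b|c', 'd|e|f'))

$bufferSize = 1MB                              # arbitrary value to experiment with
$encoding = [Text.UTF8Encoding]::new($false)   # UTF-8 without BOM (assumption)

$reader = $writer = $null
try {
    $reader = [IO.StreamReader]::new($inFull, $encoding, $true, $bufferSize)
    $writer = [IO.StreamWriter]::new($outFull, $false, $encoding, $bufferSize)

    while ($null -ne ($line = $reader.ReadLine())) {
        $writer.WriteLine($line)   # column-removal logic from the answer would go here
    }
}
finally {
    # Dispose flushes the writer's buffer and closes both files
    if ($writer) { $writer.Dispose() }
    if ($reader) { $reader.Dispose() }
}
```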

Please credit the source when reposting. Article link: https://www.uj5u.com/qukuanlian/490720.html

Tags: powershell, performance
