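I have pipe-delimited data files containing a large amount of data, and I want to remove columns 3, 7, and 9. The script below works 100% correctly, but it is too slow: a 22 MB file takes 5 minutes.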
Adeel|01|Test|1234589|Date|Amount|00|123345678890|Test|All|01|
Adeel|00|Test|1234589|Date|Amount|00|123345678890|Test|All|00|
Adeel|00|Test|1234589|Date|Amount|00|123345678890|Test|All|00|
Adeel|00|Test|1234589|Date|Amount|00|123345678890|Test|All|00|
Adeel|05|Test|1234589|Date|Amount|00|123345678890|Test|All|05|
Adeel|00|Test|1234589|Date|Amount|00|123345678890|Test|All|00|
Adeel|00|Test|1234589|Date|Amount|00|123345678890|Test|All|00|
Adeel|00|Test|1234589|Date|Amount|00|123345678890|Test|All|00|
Adeel|09|Test|1234589|Date|Amount|00|123345678890|Test|All|09|
Adeel|00|Test|1234589|Date|Amount|00|123345678890|Test|All|00|
Adeel|00|Test|1234589|Date|Amount|00|123345678890|Test|All|00|
Adeel|12|Test|1234589|Date|Amount|00|123345678890|Test|All|12|
param
(
    # Input data file
    [string]$Path = 'O:\Temp\test.txt',
    # Columns to be removed, any order, dupes are allowed
    [int[]]$Remove = (3,6)
)
# sort indexes descending and remove dupes
$Remove = $Remove | Sort-Object -Unique -Descending
# read input lines
Get-Content $Path | .{process{
    # split and add to ArrayList which allows to remove items
    $list = [Collections.ArrayList]($_ -split '\|')
    # remove data at the indexes (from tail to head due to descending order)
    foreach($i in $Remove) {
        $list.RemoveAt($i)
    }
    # join and output
    #$list -join '|'
    $contentUpdate=$list -join '|'
    Add-Content "O:\Temp\testoutput.txt" $contentUpdate
}
}
Answer from a uj5u.com user:
Get-Content is relatively slow, and using the pipeline adds extra overhead.
When performance matters, StreamReader and StreamWriter may be better choices:
param (
    # Input data file
    [string] $InputPath = 'input.txt',
    # Output data file
    [string] $OutputPath = 'output.txt',
    # Zero-based indexes of the columns to be removed, any order, dupes are allowed
    [int[]] $Remove = (1, 2, 2),
    # Column separator
    [string] $Separator = '|',
    # Input file encoding
    [Text.Encoding] $Encoding = [Text.Encoding]::Default
)

$ErrorActionPreference = 'Stop'

# Gets rid of dupes and provides fast lookup ability
$removeSet = [Collections.Generic.HashSet[int]] $Remove

$reader = $writer = $null
try {
    $reader = [IO.StreamReader]::new(( Convert-Path -LiteralPath $InputPath ), $Encoding )
    $null = New-Item $OutputPath -ItemType File -Force  # as Convert-Path requires existing path

    # Compare against $null explicitly so that an empty line does not end the loop early
    while( $null -ne ( $line = $reader.ReadLine() ) ) {
        if( -not $writer ) {
            # Construct writer only after first line has been read, so $reader.CurrentEncoding is available
            $writer = [IO.StreamWriter]::new(( Convert-Path -LiteralPath $OutputPath ), $false, $reader.CurrentEncoding )
        }
        $columns = $line.Split( $Separator )
        $isAppend = $false
        for( $i = 0; $i -lt $columns.Length; $i++ ) {
            if( -not $removeSet.Contains( $i ) ) {
                # Write a separator before every kept column except the first
                if( $isAppend ) { $writer.Write( $Separator ) }
                $writer.Write( $columns[ $i ] )
                $isAppend = $true
            }
        }
        $writer.WriteLine()  # Write (CR)LF
    }
}
finally {
    # Make sure to dispose the reader and writer so files get closed.
    if( $writer ) { $writer.Dispose() }
    if( $reader ) { $reader.Dispose() }
}
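Note that the script treats the values in $Remove as zero-based indexes, so columns 3, 7 and 9 (counting from 1) correspond to 2, 6 and 8. As a minimal sketch of the per-line logic only, here it is wrapped in a function; the name Remove-ColumnsFromLine is hypothetical and not part of the script above:

```powershell
# Hypothetical helper (not part of the original answer): applies the same
# split / filter / join logic as the script above to a single string.
function Remove-ColumnsFromLine {
    param(
        [string] $Line,
        [int[]]  $Remove,            # zero-based column indexes to drop
        [string] $Separator = '|'
    )
    # HashSet removes duplicate indexes and gives fast lookups
    $removeSet = [Collections.Generic.HashSet[int]] $Remove
    $columns   = $Line.Split( $Separator )
    # Collect every column whose index is not in the remove set
    $kept = for( $i = 0; $i -lt $columns.Length; $i++ ) {
        if( -not $removeSet.Contains( $i ) ) { $columns[ $i ] }
    }
    $kept -join $Separator
}

# Columns 3, 7 and 9 (counting from 1) are indexes 2, 6 and 8.
Remove-ColumnsFromLine -Line 'Adeel|01|Test|1234589|Date|Amount|00|123345678890|Test|All|01|' -Remove 2, 6, 8
# Output: Adeel|01|1234589|Date|Amount|123345678890|All|01|
```

The trailing separator of the sample line yields an empty last column, which is kept, so the output still ends with a pipe.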
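- Convert-Path is used because .NET's current directory differs from PowerShell's, so it is best to pass absolute paths to .NET APIs.
- If this still isn't fast enough, consider writing it in C# instead. Especially for "low-level" code like this, C# tends to be faster. You can embed C# code in PowerShell using Add-Type -TypeDefinition $csCode.
- As another optimization, instead of String.Split(), which creates more substrings than are actually needed, you can use String.IndexOf() and String.Substring() to extract only the necessary columns.
- Last but not least, you can experiment with the StreamReader and StreamWriter constructors that let you specify a buffer size.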
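The String.IndexOf() / String.Substring() idea could look roughly like the sketch below. The function name Select-KeptColumns and its parameters are hypothetical, and whether this actually beats String.Split() in PowerShell is not guaranteed, so it is worth timing both variants with Measure-Command on real data:

```powershell
# Hypothetical helper illustrating the IndexOf()/Substring() approach:
# walk the line separator by separator and copy only the columns to keep.
function Select-KeptColumns {
    param(
        [string] $Line,
        [Collections.Generic.HashSet[int]] $RemoveSet,  # zero-based indexes to drop
        [char] $Separator = '|'
    )
    $sb    = [Text.StringBuilder]::new()
    $start = 0       # offset where the current column begins
    $index = 0       # zero-based index of the current column
    $first = $true   # no column has been written yet
    while( $true ) {
        $end = $Line.IndexOf( $Separator, $start )
        if( $end -lt 0 ) { $end = $Line.Length }   # last column runs to end of line
        if( -not $RemoveSet.Contains( $index ) ) {
            if( -not $first ) { $null = $sb.Append( $Separator ) }
            # Substring is only created for columns that are kept
            $null = $sb.Append( $Line.Substring( $start, $end - $start ) )
            $first = $false
        }
        if( $end -ge $Line.Length ) { break }      # no more separators
        $start = $end + 1
        $index++
    }
    $sb.ToString()
}

# Build the set once and reuse it for every line
$set = [Collections.Generic.HashSet[int]] [int[]] (2, 6, 8)
Select-KeptColumns -Line 'Adeel|01|Test|1234589|Date|Amount|00|123345678890|Test|All|01|' -RemoveSet $set
```

In the script above, a call like this would take the place of the Split() call and the for loop, with the returned string passed to $writer.WriteLine().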
