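I am keen to test the performance and usability of asynchronous tasks in PowerShell with Start-ThreadJob, Start-Job and Start-Process. I have a folder with about 100 zip files, so I came up with the following test: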
New-Item "000" -ItemType Directory -Force # Move the old zip files in here
foreach ($i in $zipfiles) {
    $name = $i -split ".zip"
    Start-Job -scriptblock {
        7z.exe x -o"$name" .\$name
        Move-Item $i 000\ -Force
        7z.exe a $i .\$name\*.*
    }
}
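The problem with this is that it starts jobs for all 100 zips at once, which is probably too many. I would like to set a value $numjobs, say 5, that I can change, so that only $numjobs jobs are started at the same time and the script checks that all 5 have ended before the next block of 5 starts. I would then like to watch CPU and memory usage depending on the value of $numjobs.
How do I tell a loop to run only 5 times and then wait for the jobs to finish before continuing?
I see that it is easy to wait for jobs to finish: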
$jobs = $commands | Foreach-Object { Start-ThreadJob $_ }
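$jobs | Receive-Job -Wait -AutoRemoveJob
But how can I wait for Start-Process tasks to end?
I need this to work in PowerShell 5.1: the enterprises I work for will be tied to PowerShell 5.1 for the next 3-4 years, and I expect no chance to install PowerShell 7.x (although I am curious to test Parallel-ForEach myself on my home system to compare all the approaches).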
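Reply from a uj5u.com user:
Both ForEach-Object -Parallel and Start-ThreadJob have built-in functionality to limit the number of threads that can run at the same time. The same applies to Runspaces with a RunspacePool, which is what both cmdlets use behind the scenes.
Start-Job does not offer such functionality because each job runs in a separate process, unlike the cmdlets mentioned above, which run in different threads of the same process. Personally, I would not consider it a parallelism alternative either: it is pretty slow, and in most cases a linear loop will be faster. Serialization and deserialization can also be a problem in some cases.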
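To make the serialization point concrete, here is a minimal sketch (not from the original answer) that compares an object obtained in-process with the same kind of object returned by Start-Job. The variable names are placeholders, and the current directory is used only as a convenient sample object.

```powershell
# In-process: a live .NET object with its methods intact.
$local = Get-Item -LiteralPath .
$localType = $local.pstypenames[0]

# Via Start-Job: the object crosses a process boundary, so it is serialized
# in the child process and rehydrated here as a property bag.
$fromJob = Start-Job { Get-Item -LiteralPath . } | Receive-Job -Wait -AutoRemoveJob
$jobType = $fromJob.pstypenames[0]

"In-process type : $localType"
"Start-Job type  : $jobType"

# The deserialized copy keeps property values but loses methods such as GetFiles().
"GetFiles() available on job result: $([bool] $fromJob.PSObject.Methods['GetFiles'])"
```

The type name of the job result carries a Deserialized. prefix, which is the loss of type fidelity that thread-based approaches avoid.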
How to limit the number of running threads?
Both cmdlets offer the -ThrottleLimit parameter for this.
- https://learn.microsoft.com/en-us/powershell/module/threadjob/start-threadjob?view=powershell-7.2#-throttlelimit
- https://learn.microsoft.com/en-us/powershell/module/microsoft.powershell.core/foreach-object?view=powershell-7.2#-throttlelimit
What would the code look like?
$dir = (New-Item "000" -ItemType Directory -Force).FullName

# ForEach-Object -Parallel
$zipfiles | ForEach-Object -Parallel {
    $name = [IO.Path]::GetFileNameWithoutExtension($_)
    7z.exe x -o $name .\$name
    Move-Item $_ $using:dir -Force
    7z.exe a $_ .\$name\*.*
} -ThrottleLimit 5

# Start-ThreadJob
$jobs = foreach ($i in $zipfiles) {
    Start-ThreadJob {
        $name = [IO.Path]::GetFileNameWithoutExtension($using:i)
        7z.exe x -o $name .\$name
        Move-Item $using:i $using:dir -Force
        7z.exe a $using:i .\$name\*.*
    } -ThrottleLimit 5
}
$jobs | Receive-Job -Wait -AutoRemoveJob
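How to achieve the same when only PowerShell 5.1 is available and no new modules can be installed?
A RunspacePool offers the same functionality, either through its .SetMaxRunspaces(Int32) method or by targeting one of the RunspaceFactory.CreateRunspacePool overloads that take the limit as the maxRunspaces argument.
What would the code look like?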
$dir = (New-Item "000" -ItemType Directory -Force).FullName
$limit = 5
$iss = [initialsessionstate]::CreateDefault2()
$pool = [runspacefactory]::CreateRunspacePool(1, $limit, $iss, $Host)
$pool.ThreadOptions = [Management.Automation.Runspaces.PSThreadOptions]::ReuseThread
$pool.Open()

$tasks = foreach ($i in $zipfiles) {
    $ps = [powershell]::Create().AddScript({
        param($path, $dir)

        $name = [IO.Path]::GetFileNameWithoutExtension($path)
        7z.exe x -o $name .\$name
        Move-Item $path $dir -Force
        7z.exe a $path .\$name\*.*
    }).AddParameters(@{ path = $i; dir = $dir })
    $ps.RunspacePool = $pool

    @{ Instance = $ps; AsyncResult = $ps.BeginInvoke() }
}

foreach($task in $tasks) {
    $task['Instance'].EndInvoke($task['AsyncResult'])
    $task['Instance'].Dispose()
}
$pool.Dispose()
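The example above sets the limit through a CreateRunspacePool overload. For completeness, the .SetMaxRunspaces(Int32) route might look like the following sketch; it is an illustration rather than part of the original answer, and the variable names are placeholders.

```powershell
# Create a pool with the default limits, then raise the maximum afterwards.
$pool = [runspacefactory]::CreateRunspacePool()

# SetMaxRunspaces returns a Boolean telling whether the new limit was accepted.
$accepted = $pool.SetMaxRunspaces(5)
$pool.Open()

"Limit accepted : $accepted"
"Max runspaces  : $($pool.GetMaxRunspaces())"
$max = $pool.GetMaxRunspaces()

$pool.Dispose()
```

Passing the limit to CreateRunspacePool, as the answer does, keeps the configuration in one place; SetMaxRunspaces is mostly useful when the limit has to change after the pool exists.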
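Note that, for all examples, it is not clear whether the 7zip code is correct. This answer tries to demonstrate how asynchronous work is done in PowerShell, not how to zip files or folders.
Below is a helper function that can simplify parallel invocations. It tries to emulate ForEach-Object -Parallel and is compatible with PowerShell 5.1, though it should not be taken as a robust solution: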
using namespace System.Management.Automation
using namespace System.Management.Automation.Runspaces
using namespace System.Collections.Generic

function Invoke-Parallel {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory, ValueFromPipeline, DontShow)]
        [object] $InputObject,

        [Parameter(Mandatory, Position = 0)]
        [scriptblock] $ScriptBlock,

        [Parameter()]
        [int] $ThrottleLimit = 5,

        [Parameter()]
        [hashtable] $ArgumentList
    )

    begin {
        $iss = [initialsessionstate]::CreateDefault2()
        if($PSBoundParameters.ContainsKey('ArgumentList')) {
            # Each hashtable entry becomes a variable inside every runspace.
            foreach($argument in $ArgumentList.GetEnumerator()) {
                $iss.Variables.Add([SessionStateVariableEntry]::new($argument.Key, $argument.Value, ''))
            }
        }
        $pool = [runspacefactory]::CreateRunspacePool(1, $ThrottleLimit, $iss, $Host)
        $tasks = [List[hashtable]]::new()
        $pool.ThreadOptions = [PSThreadOptions]::ReuseThread
        $pool.Open()
    }
    process {
        try {
            # Queue one PowerShell instance per input object; the pool throttles execution.
            $ps = [powershell]::Create().AddScript({
                $args[0].InvokeWithContext($null, [psvariable]::new("_", $args[1]))
            }).AddArgument($ScriptBlock.Ast.GetScriptBlock()).AddArgument($InputObject)

            $ps.RunspacePool = $pool
            $invocationInput = [PSDataCollection[object]]::new(1)
            $invocationInput.Add($InputObject)

            $tasks.Add(@{
                Instance    = $ps
                AsyncResult = $ps.BeginInvoke($invocationInput)
            })
        }
        catch {
            $PSCmdlet.WriteError($_)
        }
    }
    end {
        try {
            # Collect the output of each task in submission order.
            foreach($task in $tasks) {
                $task['Instance'].EndInvoke($task['AsyncResult'])
                if($task['Instance'].HadErrors) {
                    $task['Instance'].Streams.Error
                }
                $task['Instance'].Dispose()
            }
        }
        catch {
            $PSCmdlet.WriteError($_)
        }
        finally {
            if($pool) { $pool.Dispose() }
        }
    }
}
An example of how it works:
# Hashtable Key becomes the Variable Name inside the Runspace!
$outsideVariables = @{ Message = 'Hello from {0}' }
0..10 | Invoke-Parallel {
    "[Item $_] - " + $message -f [runspace]::DefaultRunspace.InstanceId
    Start-Sleep 5
} -ArgumentList $outsideVariables -ThrottleLimit 3
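Reply from a uj5u.com user:
To add to Santiago Squarzon's helpful answer:
Below is the helper function Measure-Parallel, which lets you compare the speed of the following approaches to parallelism:
- Start-Job:
  - Child-process-based: creates a child PowerShell process behind the scenes, which makes this approach both slow and resource-intensive.
- Start-ThreadJob, which ships with PowerShell (Core) (v6+) and can be installed in Windows PowerShell v5.1 via Install-Module ThreadJob:
  - Thread-based: much lighter-weight than Start-Job while providing the same functionality; it also avoids the potential loss of type fidelity caused by cross-process serialization / deserialization.
- ForEach-Object -Parallel, available only in PowerShell (Core) 7.0+:
  - Thread-based: in essence a simplified wrapper around Start-ThreadJob, with support for direct pipeline input and direct output, and with invariably synchronous overall execution (all launched threads are waited for).
- Start-Process:
  - Child-process-based: invokes an external program asynchronously by default, and on Windows in a new window by default.
  - Note that this approach only makes sense if your parallel tasks consist solely of a single call to an external program, as opposed to needing to execute a block of PowerShell code.
  - Notably, the only way to capture output with this approach is redirection to a file, invariably as plain text.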
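As a small illustration of the Start-Process points above, and of how to wait for such a process, the following sketch launches the platform-native shell, waits for it with Wait-Process, and reads the redirected output back from a file. The temporary file name and the echoed text are assumptions made for the example.

```powershell
# Hypothetical output file in the temp directory.
$outFile = Join-Path ([IO.Path]::GetTempPath()) 'start-process-demo.txt'

# Pick the native shell, as the Measure-Parallel function in this answer does.
$exe, $exeArgs = if ($env:OS -eq 'Windows_NT') { 'cmd.exe', '/c "echo hello"' }
                 else { 'sh', '-c "echo hello"' }

# -PassThru returns the process object so it can be waited on;
# output can only be captured by redirecting it to a file.
$proc = Start-Process -FilePath $exe -ArgumentList $exeArgs -NoNewWindow -PassThru -RedirectStandardOutput $outFile

$proc | Wait-Process            # blocks until the process has terminated

$captured = Get-Content -LiteralPath $outFile
Remove-Item -LiteralPath $outFile
"Captured: $captured"
```

For a single process, Start-Process -Wait is an alternative; collecting the -PassThru objects and piping them to Wait-Process is what allows several processes to run side by side before waiting.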
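Note:
- Given that the tests below wrap a single call to an external executable (such as 7z.exe in your case), the Start-Process approach will perform best, because it does not have the overhead of job management. However, as noted above, this approach has fundamental limitations.
- Due to its complexity, the runspace-pool-based approach from Santiago's answer is not included; if Start-ThreadJob or ForEach-Object -Parallel is available to you, you will not need to resort to it.
Sample Measure-Parallel call, which contrasts the runtime performance of the approaches: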
# Run 20 jobs / processes in parallel, 5 at a time, comparing
# all approaches.
# Note: Omit the -Approach argument to enter interactive mode.
Measure-Parallel -Approach All -BatchSize 5 -JobCount 20
Sample output from a macOS machine running PowerShell 7.2.6 (timings vary based on many factors, but the ratios should give a sense of relative performance):
# ... output from the jobs
JobCount : 20
BatchSize : 5
BatchCount : 4
Start-Job (secs.) : 2.20
Start-ThreadJob (secs.) : 1.17
Start-Process (secs.) : 0.84
ForEach-Object -Parallel (secs.) : 0.94
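Conclusions:
- ForEach-Object -Parallel adds the least thread/job-management overhead, followed by Start-ThreadJob.
- Start-Job is noticeably slower, because it needs an extra child process for the hidden PowerShell instance that runs each task. It seems that on Windows the performance discrepancy is much more pronounced.
Measure-Parallel source code:
Important:
- The function hard-codes sample input objects as well as the external program to invoke; you will have to edit it yourself as needed. In this case, the hard-coded external program is the platform-native shell (cmd.exe on Windows, /bin/sh on Unix-like platforms), which is passed a command that simply echoes each input object.
  - It would not be too hard to modify the function to accept a script block as an argument and to receive the input objects for the jobs via the pipeline (though that would preclude the Start-Process approach, unless you explicitly call the block via the PowerShell CLI, in which case you could just use Start-Job).
- The output of the jobs / processes goes directly to the display and cannot be captured.
- The batch size, which defaults to 5, can be modified with -BatchSize. For the thread-based approaches, the batch size is also used as the -ThrottleLimit argument, i.e. the limit on how many threads are allowed to run at the same time. By default, a single batch is run, but you can request multiple batches indirectly by passing the total number of parallel runs to -JobCount.
- You can select approaches via the array-valued -Approach parameter, which supports Job, ThreadJob, Process, ForEachParallel, and All, which combines all of the preceding.
  - If -Approach is not specified, interactive mode is entered, in which you are (repeatedly) prompted for the desired approach.
- Except in interactive mode, a custom object with comparative timings is output.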
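The relationship between -JobCount, -BatchSize and the resulting number of batches can be sketched in isolation. The snippet below is a simplified stand-alone illustration with made-up values (7 items, batches of 3), not an excerpt from the function.

```powershell
$jobCount  = 7                      # assumed total number of items
$batchSize = 3                      # assumed items per batch
$items     = 0..($jobCount - 1)

# The last batch may be smaller, hence the ceiling.
$batchCount = [int] [math]::Ceiling($items.Count / $batchSize)

$batches = @(
    for ($b = 0; $b -lt $batchCount; $b++) {
        $first = $b * $batchSize
        $last  = [math]::Min($first + $batchSize, $items.Count) - 1
        # The unary comma emits each slice as one array instead of unrolling it.
        , @($items[$first..$last])
    }
)

$batches | ForEach-Object { "Batch: $($_ -join ', ')" }
```

With these values the items are split into three batches, the last of which holds the single leftover item.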
function Measure-Parallel {

    [CmdletBinding()]
    param(
        [ValidateRange(2, 2147483647)] [int] $BatchSize = 5,
        [ValidateSet('Job', 'ThreadJob', 'Process', 'ForEachParallel', 'All')] [string[]] $Approach,
        [ValidateRange(2, 2147483647)] [int] $JobCount = $BatchSize # pass a higher count to run multiple batches
    )

    $noForEachParallel = $PSVersionTable.PSVersion.Major -lt 7
    $noStartThreadJob = -not (Get-Command -ErrorAction Ignore Start-ThreadJob)

    $interactive = -not $Approach
    if (-not $interactive) {
        # Translate the approach arguments into their corresponding hashtable keys (see below).
        if ('All' -eq $Approach) { $Approach = 'Job', 'ThreadJob', 'Process', 'ForEachParallel' }
        $approaches = $Approach.ForEach({
            if ($_ -eq 'ForEachParallel') { 'ForEach-Object -Parallel' }
            else { $_ -replace '^', 'Start-' }
        })
    }

    if ($noStartThreadJob) {
        if ($interactive -or $approaches -contains 'Start-ThreadJob') {
            Write-Warning "Start-ThreadJob is not installed, omitting its test; install it with ``Install-Module ThreadJob``"
            $approaches = $approaches.Where({ $_ -ne 'Start-ThreadJob' })
        }
    }
    if ($noForEachParallel) {
        if ($interactive -or $approaches -contains 'ForEach-Object -Parallel') {
            Write-Warning "ForEach-Object -Parallel is not available in this PowerShell version (requires v7+), omitting its test."
            $approaches = $approaches.Where({ $_ -ne 'ForEach-Object -Parallel' })
        }
    }

    # Simulated input: Create 'f0.zip', 'f1.zip', ... file names.
    $zipFiles = 0..($JobCount - 1) -replace '^', 'f' -replace '$', '.zip'

    # Sample executables to run - here, the native shell is called to simply
    # echo the argument given.
    # The external program to invoke.
    $exe = if ($env:OS -eq 'Windows_NT') { 'cmd.exe' } else { 'sh' }
    # The list of its arguments *as a single string* - use '{0}' as the placeholder for where the input object should go.
    $exeArgList = if ($env:OS -eq 'Windows_NT') { '/c "echo {0}"' } else { '-c "echo {0}"' }

    # A hashtable with script blocks that implement the approaches to parallelism.
    $approachImpl = [ordered] @{}

    $approachImpl['Start-Job'] = { # child-process-based job
        param([array] $batch)
        $batch |
            ForEach-Object {
                Start-Job { Invoke-Expression ($using:exe + ' ' + ($using:exeArgList -f $args[0])) } -ArgumentList $_
            } |
            Receive-Job -Wait -AutoRemoveJob # wait for all jobs, relay their output, then remove them.
    }

    if (-not $noStartThreadJob) {
        # If Start-ThreadJob is available, add an approach for it.
        $approachImpl['Start-ThreadJob'] = { # thread-based job - requires Install-Module ThreadJob in WinPS
            param([array] $batch)
            $batch |
                ForEach-Object {
                    Start-ThreadJob -ThrottleLimit $BatchSize { Invoke-Expression ($using:exe + ' ' + ($using:exeArgList -f $args[0])) } -ArgumentList $_
                } |
                Receive-Job -Wait -AutoRemoveJob
        }
    }

    if (-not $noForEachParallel) {
        # If ForEach-Object -Parallel is supported (v7+), add an approach for it.
        $approachImpl['ForEach-Object -Parallel'] = {
            param([array] $batch)
            $batch | ForEach-Object -ThrottleLimit $BatchSize -Parallel {
                Invoke-Expression ($using:exe + ' ' + ($using:exeArgList -f $_))
            }
        }
    }

    $approachImpl['Start-Process'] = { # direct execution of an external program
        param([array] $batch)
        $batch |
            ForEach-Object {
                Start-Process -NoNewWindow -PassThru $exe -ArgumentList ($exeArgList -f $_)
            } |
            Wait-Process # wait for all processes to terminate.
    }

    # Partition the array of all indices into subarrays (batches)
    $batches = @(
        0..([math]::Ceiling($zipFiles.Count / $batchSize) - 1) | ForEach-Object {
            , $zipFiles[($_ * $batchSize)..($_ * $batchSize + $batchSize - 1)]
        }
    )

    # In interactive use, print verbose messages by default
    if ($interactive) { $VerbosePreference = 'Continue' }

    :menu while ($true) {
        if ($interactive) {
            # Prompt for the approach to use.
            $choices = $approachImpl.Keys.ForEach({
                if ($_ -eq 'ForEach-Object -Parallel') { '&' + $_ }
                else { $_ -replace '-', '-&' }
            }) + '&Quit'
            $choice = $host.ui.PromptForChoice("Approach", "Select parallelism approach:", $choices, 0)
            if ($choice -eq $approachImpl.Count) { break }
            $approachKey = @($approachImpl.Keys)[$choice]
        }
        else {
            # Use the given approach(es)
            $approachKey = $approaches
        }
        $tsTotals = foreach ($appr in $approachKey) {
            $i = 0; $tsTotal = [timespan] 0
            $batches | ForEach-Object {
                $ts = Measure-Command { & $approachImpl[$appr] $_ | Out-Host }
                Write-Verbose "$batchSize-element '$appr' batch finished in $($ts.TotalSeconds.ToString('N2')) secs."
                $tsTotal += $ts
                if (++$i -eq $batches.Count) {
                    # last batch processed.
                    if ($batches.Count -gt 1) {
                        Write-Verbose "'$appr' processing of $JobCount items overall finished in $($tsTotal.TotalSeconds.ToString('N2')) secs."
                    }
                    $tsTotal # output the overall timing for this approach
                }
                elseif ($interactive) {
                    $choice = $host.ui.PromptForChoice("Continue?", "Select action", ('&Next batch', '&Return to Menu', '&Quit'), 0)
                    if ($choice -eq 1) { continue menu }
                    if ($choice -eq 2) { break menu }
                }
            }
        }
        if (-not $interactive) {
            # Output a result object with the overall timings.
            $oht = [ordered] @{}; $i = 0
            $oht['JobCount'] = $JobCount
            $oht['BatchSize'] = $BatchSize
            $oht['BatchCount'] = $batches.Count
            foreach ($appr in $approachKey) {
                $oht[($appr + ' (secs.)')] = $tsTotals[$i++].TotalSeconds.ToString('N2')
            }
            [pscustomobject] $oht
            break # break out of the infinite :menu loop
        }
    }

}
Reply from a uj5u.com user:
You can add a counter to your foreach loop and break when the counter reaches the value you want:
$numjobs = 5
$counter = 0
foreach ($i in $zipfiles) {
    $counter++
    # Stop once $numjobs items have been processed.
    if ($counter -gt $numjobs) {
        break
    }
    <your code>
}
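Or use PowerShell's ForEach-Object:
$numjobs = 5
$zipfiles | select -first $numjobs | Foreach-Object {
    <your code>
}
If you want to process the whole array in batches and wait for each batch to finish, you have to save the objects returned by Start-Job and pass them to Wait-Job, like this: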
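$items = 1..100
$batchsize = 5
while ($true) {
    $jobs = @()
    $counter = 0
    # Start at most $batchsize jobs from the front of $items.
    foreach ($i in $items) {
        if ($counter -ge $batchsize) {
            break
        }
        $jobs += Start-Job -ScriptBlock { Start-Sleep 10 }
        $counter++
    }
    # Wait for every job in this batch before starting the next one.
    foreach ($job in $jobs) {
        $job | Wait-Job | Out-Null
    }
    # Drop the processed items; the result is empty once nothing is left.
    $items = $items[$batchsize..($items.Length)]
    if (!$items) {
        break
    }
}
Arrays have a fixed length by design, which is why I reassign the whole array with $items = $items[$batchsize..($items.Length)]. The reassignment happens after each batch rather than inside the foreach loop, so that the final batch is also removed and the outer loop terminates.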
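A possible variation on the batching above, not part of the original answer: instead of waiting for a whole batch, keep at most $numjobs jobs running and start the next one as soon as any job finishes, using Wait-Job -Any. This works in PowerShell 5.1 without extra modules. The item list and the doubling script block are placeholders for the real work.

```powershell
$items   = 1..8          # stand-in for $zipfiles
$numjobs = 3             # maximum number of jobs running at once
$running = @()
$results = @()

foreach ($i in $items) {
    # When the window is full, block until at least one job has finished.
    while ($running.Count -ge $numjobs) {
        $done     = @($running | Wait-Job -Any)
        $results += $done | Receive-Job
        $done | Remove-Job
        $running  = @($running | Where-Object { $_.Id -notin $done.Id })
    }
    $running += Start-Job -ScriptBlock { param($n) $n * 2 } -ArgumentList $i
}

# Drain the jobs that are still running after the last item was submitted.
$results += $running | Receive-Job -Wait -AutoRemoveJob

"Results: $(($results | Sort-Object) -join ', ')"
```

Results arrive in completion order rather than input order, so they are sorted here only for display.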
