I am running 4 bash scripts in parallel, all 4 at the same time:
./script1.sh & ./script2.sh & ./script3.sh & ./script4.sh
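I want to exit as soon as any one of them fails. I tried using something like the exit code, but that does not seem to work for scripts running in parallel. Is there a workaround? Any bash or Python solution is welcome.
uj5u.com user reply:
Here is a script that does this for you. I borrowed (and modified) the non_blocking_wait function from elsewhere.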
#!/bin/bash
# Run your scripts here. The sleep commands below are only placeholders.
sleep 5 &
sleep 3 &
sleep 3 &

# Collect the PID of every background job in the array "pids".
pids=( $(jobs -p) )
echo "pids = ${pids[*]}"

# Print the exit status of the given PID if it has already finished,
# or 127 if it is still running. Relies on /proc, so Linux only.
non_blocking_wait()
{
    local pid=$1 code
    if [ ! -d "/proc/$pid" ]; then
        wait "$pid"
        code=$?
    else
        code=127
    fi
    echo "$code"
}

while true; do
    # Count the jobs that are still running.
    n_running=$(jobs -l | grep -c "Running")
    if [ "${n_running}" -ne "${#pids[@]}" ]; then
        # At least one process has finished;
        # check whether any of them exited with an error.
        for pid in "${pids[@]}"; do
            ret=$(non_blocking_wait "${pid}")
            echo "non_blocking_wait ${pid} ret = ${ret}"
            if [ "${ret}" -ne 0 ] && [ "${ret}" -ne 127 ]; then
                echo "Process ${pid} exited with error ${ret}"
                # Take whatever action is appropriate here, for example
                # kill all remaining children and exit:
                kill $(jobs -p) > /dev/null 2>&1
                exit 1
            fi
        done
        # Report success only after every PID has been checked.
        if [ "${n_running}" -eq 0 ]; then
            echo "All processes finished successfully"
            exit 0
        fi
    fi
    sleep 1
done
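If you simply run it, it exits with 0 once all processes have finished. A sample run (PIDs and the exact number of status lines will vary):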
$ ./script.sh
pids = 17913 17914 17915
non_blocking_wait 17913 ret = 127
non_blocking_wait 17914 ret = 0
non_blocking_wait 17915 ret = 0
non_blocking_wait 17913 ret = 127
non_blocking_wait 17914 ret = 0
non_blocking_wait 17915 ret = 0
non_blocking_wait 17913 ret = 0
All processes finished successfully
You can remove the argument from one of the sleep commands to make it fail and see the program return immediately:
$ ./script.sh
sleep: missing operand
Try 'sleep --help' for more information.
pids = 18005 18006 18007
non_blocking_wait 18005 ret = 127
non_blocking_wait 18006 ret = 1
Process 18006 exited with error 1
uj5u.com user reply:
One solution is to use the subprocess module:
import subprocess
import sys
import time

def do_that(scripts):
    # Start every script at once.
    ps = [subprocess.Popen('./' + s, shell=True) for s in scripts]
    while True:
        done = True
        for p in ps:
            rc = p.poll()
            if rc is None:  # Script is still running
                done = False
            elif rc:
                # rc == 0 means the script succeeded;
                # any other value means it failed.
                print('This script failed:', p.args)
                running = set(ps) - {p}
                for i in running:
                    i.terminate()
                    print('Force terminate', i.args)
                return 1
        if done:
            print('All done.')
            return 0
        time.sleep(0.1)  # Avoid spinning at 100% CPU between polls

def timeit(func):
    # Decorator that reports how long the wrapped function took.
    def runner(*args, **kwargs):
        start = time.time()
        res = func(*args, **kwargs)
        end = time.time()
        print(func.__name__, 'cost:', round(end - start, 1))
        return res
    return runner

@timeit
def main():
    scripts = ('script1.sh', 'script2.sh')
    return do_that(scripts)

if __name__ == '__main__':
    # Propagate the result as the exit status of this program.
    sys.exit(main())
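As an alternative to polling in a loop, the standard-library asyncio module can wait for process exits directly. The following is only a minimal sketch under some assumptions: the function name run_until_first_failure is hypothetical, each command is passed as an argument list rather than a shell string, and a negative return code (a process killed by a signal) counts as a failure.

```python
import asyncio


async def run_until_first_failure(commands):
    """Run commands concurrently and stop the rest on the first failure.

    `commands` is a list of argument lists, e.g. [["./script1.sh"], ...].
    Returns 0 if every command exits with 0, otherwise the first
    non-zero exit status that was observed.
    """
    procs = [await asyncio.create_subprocess_exec(*argv) for argv in commands]
    # Map each "wait for exit" task back to its process.
    waiters = {asyncio.create_task(p.wait()): p for p in procs}
    pending = set(waiters)
    status = 0
    while pending and status == 0:
        done, pending = await asyncio.wait(
            pending, return_when=asyncio.FIRST_COMPLETED
        )
        for task in done:
            if task.result() != 0:
                status = task.result()
                break
    # Terminate whatever is still running, then reap it.
    for task in pending:
        proc = waiters[task]
        if proc.returncode is None:
            try:
                proc.terminate()
            except ProcessLookupError:
                pass  # The process exited in the meantime.
    if pending:
        await asyncio.wait(pending)
    return status
```

For the four scripts in the question, this could be called as asyncio.run(run_until_first_failure([["./script1.sh"], ["./script2.sh"], ["./script3.sh"], ["./script4.sh"]])) and the result passed to sys.exit. Note that terminate() only signals the script process itself; children that the script started are not covered by this sketch.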
uj5u.com user reply:
TL;DR
parallel --line-buffer --halt now,fail=1 ::: ./script?.sh
echo $?
42
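Actual answer
When running jobs in parallel, I find it useful to consider GNU Parallel, because it makes many things easier for you:
- resource allocation
- load distribution across multiple CPUs and across the network
- logging and output tagging
- error handling, which is the aspect of particular interest here
- scheduling and restarting
- deriving and renaming input and output file names
- progress reporting
So I made 4 dummy jobs, script1.sh through script4.sh, like this: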
#!/bin/bash
echo "script1.sh starting..."
sleep 5
echo "script1.sh complete"
The exception is script3.sh, which fails before the others finish:
#!/bin/bash
echo "script3.sh starting..."
sleep 2
echo "script3.sh dying"
exit 42
This is the default way to run the 4 jobs in parallel, with each job's output collected and presented one job after another:
parallel ::: ./script*.sh
script3.sh starting...
script3.sh dying
script1.sh starting...
script1.sh complete
script4.sh starting...
script4.sh complete
script2.sh starting...
script2.sh complete
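You can see that script3.sh dies first, so all of its output is gathered and shown first, followed by the grouped output of the others. In short, output is grouped by job and presented as each job completes.
Now let's do it again, but buffer the output only by line instead of waiting for each job to finish and collecting its output per job: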
parallel --line-buffer ::: ./script*.sh
script1.sh starting...
script2.sh starting...
script3.sh starting...
script4.sh starting...
script3.sh dying
script1.sh complete
script2.sh complete
script4.sh complete
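We can clearly see that script3.sh dies and exits before the others, yet they still run to completion. In short, output is presented line by line in the order it is produced.
Now we want GNU Parallel to kill all running jobs as soon as any one job dies: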
parallel --line-buffer --halt now,fail=1 ::: ./script?.sh
script2.sh starting...
script1.sh starting...
script3.sh starting...
script4.sh starting...
script3.sh dying
parallel: This job failed:
./script3.sh
You can see that script3.sh died and that none of the other jobs completed, because GNU Parallel killed them.
You also get the exit status of the failed job:
echo $?
42
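It is far more flexible than what I have shown. You can change now to soon, in which case it does not kill the other jobs but simply stops starting new ones. You can change fail=1 to success=50% so that it stops once half of the jobs have exited successfully, and so on.
You can also add --eta or --bar to get progress reports, distribute jobs across your network, and more. It is well worth reading up on, in these days when CPUs are getting fatter (more cores) rather than taller (more GHz); an excellent PDF is available.
Note: by default, GNU Parallel keeps as many jobs running in parallel as there are CPU cores. So if you have fewer than 4 cores, you should probably add -j 4 to the command I suggested, telling it to run up to 4 jobs in parallel even if only 1 or 2 cores are present.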
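If the GNU Parallel command is launched from a Python program, the options discussed above can be assembled in one place. This is only an illustrative sketch: build_parallel_argv and run_parallel are hypothetical helper names, and defaulting -j to the number of scripts is an assumption made here so that every script starts immediately even on a machine with few cores. The only GNU Parallel options used are the ones already shown in this answer (--line-buffer, --halt, -j and :::).

```python
import subprocess


def build_parallel_argv(scripts, when="now", condition="fail=1", jobs=None):
    """Build a GNU Parallel command line (hypothetical helper).

    when:      "now" kills running jobs; "soon" only stops starting new ones.
    condition: for example "fail=1" or "success=50%".
    jobs:      value for -j; defaults to one slot per script (an assumption).
    """
    if when not in ("now", "soon"):
        raise ValueError("when must be 'now' or 'soon'")
    if jobs is None:
        jobs = len(scripts)
    return [
        "parallel",
        "--line-buffer",
        "-j", str(jobs),
        "--halt", f"{when},{condition}",
        ":::",
        *scripts,
    ]


def run_parallel(scripts, **options):
    """Run the scripts through GNU Parallel and return its exit status.

    Requires the `parallel` executable to be installed and on PATH.
    """
    return subprocess.run(build_parallel_argv(scripts, **options)).returncode
```

Keeping the argument list in a separate function makes the halt policy easy to inspect or log before anything is executed; run_parallel then hands back whatever exit status GNU Parallel reports.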
