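Hello, I need to loop over pairs of files and do something with each pair.
For example, I have 4 files named AA2234_1.fastq.gz AA2234_2.fastq.gz AA3945_1.fastq.gz AA3945_2.fastq.gz
As you can tell, the pairs are AA2234_1.fastq.gz <-> AA2234_2.fastq.gz and AA3945_1.fastq.gz <-> AA3945_2.fastq.gz (they share the name before the _ sign).
I have a command whose syntax looks like this: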
initialize_of_command file1 file2 output_a output_b output_c output_d parameteres
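I want this script to count the files with the fastq.gz extension in a directory, divide that number by 2 to get the number of pairs, then match the files into pairs, possibly with a regular expression (maybe using two variables), and run command once for each pair.
I don't know how to pair these files with a regular expression, or how to iterate over the pairs so that the script knows which pairs it has already processed.
Here is my unfinished script: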
#!/bin/bash
raw_count_of_files=$(ls | grep -c "fastq.gz")
count_of_files=$((raw_count_of_files / 2))
for ((i=1;i<=count_of_files;i++));
do
java -jar /home/aa/git/trimmomatic/src/Trimmomatic/trimmomatic-0.39.jar PE -phred33 AA2234_1.fastq.gz AA2234_2.fastq.gz AA2234_forward_paired.fq.gz AA2234_forward_unpaired.fq.gz AA2234_reverse_paired.fq.gz AA2234_reverse_unpaired.fq.gz SLIDINGWINDOW:4:20 MINLEN:20;
done
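If the pair count from the question is still wanted, one hedged alternative to `ls | grep -c` is to let a glob fill an array and take its length. This is a minimal sketch: `count_pairs` is a hypothetical helper name, and the result only makes sense if every sample has exactly two files.

```bash
#!/bin/bash
# Count .fastq.gz files with a glob array instead of parsing `ls` output,
# then halve the count. Assumes every sample contributes exactly two files.
count_pairs() {
  shopt -s nullglob            # no matches -> empty array, not the literal pattern
  local files=(*.fastq.gz)
  echo $(( ${#files[@]} / 2 ))
}
```

Run in a directory holding the four example files, `count_pairs` prints 2. A count alone still does not tell the loop which files belong together, which is why the pairing itself has to come from the names.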
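Also, I want the output files to be named after the shared name of the input files, in this case AA2234 and AA3945.
The desired output of this script is 8 files, named after their pair: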
AA2234_forward_paired.fq.gz
AA2234_forward_unpaired.fq.gz
AA2234_reverse_paired.fq.gz
AA2234_reverse_unpaired.fq.gz
and
AA3945_forward_paired.fq.gz
AA3945_forward_unpaired.fq.gz
AA3945_reverse_paired.fq.gz
AA3945_reverse_unpaired.fq.gz
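A minimal sketch of the regex-based pairing the question asks about, assuming the mates are always named `<prefix>_1.fastq.gz` and `<prefix>_2.fastq.gz`. `run_pairs` is a hypothetical function name; the trimmomatic path and options are copied verbatim from the question, and `echo` keeps it a dry run. Looping only over the `_1` files means every pair is visited exactly once, so neither a count nor a record of processed pairs is needed.

```bash
#!/bin/bash
# Pair files with a bash regex: each "<prefix>_1.fastq.gz" starts one iteration
# and its mate is derived from the captured prefix. Drop `echo` to run for real.
run_pairs() {
  local f base mate
  local re='^(.+)_1\.fastq\.gz$'
  shopt -s nullglob
  for f in *_1.fastq.gz; do
    [[ $f =~ $re ]] || continue
    base=${BASH_REMATCH[1]}              # shared prefix, e.g. AA2234
    mate=${base}_2.fastq.gz
    if [[ ! -e $mate ]]; then
      printf 'missing mate for %s\n' "$f" >&2
      continue
    fi
    echo java -jar /home/aa/git/trimmomatic/src/Trimmomatic/trimmomatic-0.39.jar PE -phred33 \
      "$f" "$mate" \
      "${base}_forward_paired.fq.gz" "${base}_forward_unpaired.fq.gz" \
      "${base}_reverse_paired.fq.gz" "${base}_reverse_unpaired.fq.gz" \
      SLIDINGWINDOW:4:20 MINLEN:20
  done
}
```

The plain parameter expansion `${f%_1.fastq.gz}` yields the same prefix without a regex; the `=~`/`BASH_REMATCH` form is shown only because the question asks for one.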
Answer:
Assuming the filenames do not contain whitespace, try:
#!/bin/bash
declare -A hash # associative array mapping each basename to its files
for f in *fastq.gz; do # loop over the files with the fastq.gz suffix
base=${f%_*} # strip the last "_" and everything after it
if [[ -z ${hash[$base]} ]]; then # if nothing is stored for this basename yet
hash[$base]=$f # then store the filename
else
hash[$base]+=" $f" # else append the filename, delimited by a space
fi
done
for base in "${!hash[@]}"; do # loop over the hash keys (basename)
read -r f1 f2 <<< "${hash[$base]}" # split into filenames
echo java -jar /home/aa/git/trimmomatic/src/Trimmomatic/trimmomatic-0.39.jar PE -phred33 "$f1" "$f2" "$base"_forward_paired.fq.gz "$base"_forward_unpaired.fq.gz "$base"_reverse_paired.fq.gz "$base"_reverse_unpaired.fq.gz SLIDINGWINDOW:4:20 MINLEN:20;
done
The script prints the java command lines as a dry run. If the output looks good, drop the echo and run it.
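The answer above splits the stored string with `read`, which is why it assumes no whitespace in filenames. A hedged variant that lifts that assumption keeps the two mates in separate associative arrays keyed by the shared prefix. It needs bash 4 or later for associative arrays; `pair_fastq` is a hypothetical name, and the command is the question's trimmomatic call as a dry run.

```bash
#!/bin/bash
# Whitespace-safe sketch: _1 and _2 mates live in two associative arrays keyed
# by the shared prefix, so no filename is ever word-split.
pair_fastq() {
  local -A r1=() r2=()
  local f base
  for f in *_1.fastq.gz; do
    [[ -e $f ]] || continue              # glob matched nothing
    r1[${f%_1.fastq.gz}]=$f
  done
  for f in *_2.fastq.gz; do
    [[ -e $f ]] || continue
    r2[${f%_2.fastq.gz}]=$f
  done
  for base in "${!r1[@]}"; do
    if [[ -z ${r2[$base]-} ]]; then
      printf 'missing _2 mate for %s\n' "${r1[$base]}" >&2
      continue
    fi
    echo java -jar /home/aa/git/trimmomatic/src/Trimmomatic/trimmomatic-0.39.jar PE -phred33 \
      "${r1[$base]}" "${r2[$base]}" \
      "${base}_forward_paired.fq.gz" "${base}_forward_unpaired.fq.gz" \
      "${base}_reverse_paired.fq.gz" "${base}_reverse_unpaired.fq.gz" \
      SLIDINGWINDOW:4:20 MINLEN:20
  done
}
```

Iteration order over `${!r1[@]}` is unspecified, which is harmless here because each pair is processed independently.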
Answer:
#!/bin/bash
declare -A assoc=()
shopt -s nullglob
for f in *_?.fastq.gz; do
base=${f%_*}
assoc[$base]=${assoc[$base]-}${assoc[$base]+ }$f
done
set -f
for pair in "${assoc[@]}"; do
set -- $pair
# TODO: Check $# and do something with $1 and $2
done
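One possible way to fill in the TODO, as a sketch: when a sample is missing its mate, `$#` is 1 instead of 2, so that entry is reported and skipped. `process_pairs` is a hypothetical wrapper name, the command is the question's trimmomatic call prefixed with `echo` as a dry run, and the answer's no-whitespace assumption still applies.

```bash
#!/bin/bash
# Sketch of the TODO: $# must be 2 after `set -- $pair`.
# Like the answer, this word-splits $pair, so filenames must not contain whitespace.
process_pairs() {
  local -A assoc=()
  local f base pair
  shopt -s nullglob
  for f in *_?.fastq.gz; do
    base=${f%_*}
    # append with a space separator only if something is already stored
    assoc[$base]=${assoc[$base]-}${assoc[$base]+ }$f
  done
  set -f                                 # keep the unquoted $pair from being globbed
  for pair in "${assoc[@]}"; do
    set -- $pair
    if (( $# != 2 )); then
      printf 'skipping incomplete pair: %s\n' "$pair" >&2
      continue
    fi
    base=${1%_*}
    # glob results are sorted, so $1 is the _1 file and $2 the _2 file
    echo java -jar /home/aa/git/trimmomatic/src/Trimmomatic/trimmomatic-0.39.jar PE -phred33 \
      "$1" "$2" \
      "${base}_forward_paired.fq.gz" "${base}_forward_unpaired.fq.gz" \
      "${base}_reverse_paired.fq.gz" "${base}_reverse_unpaired.fq.gz" \
      SLIDINGWINDOW:4:20 MINLEN:20
  done
  set +f
}
```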
Answer:
One way to iterate over the arguments in pairs:
#!/usr/bin/env sh
proc_fastq_pairs() {
# loop while there are fastq files passed as argument
while [ $# -gt 0 ]; do
fq1=$1
# consume 1 argument as file 1
shift
fq2=$1
# consume 1 argument as file 2
shift
initialize_of_command "$fq1" "$fq2" output_a output_b output_c output_d parameteres
done
}
initialize_of_command() {
# dummy command to show passed arguments for debug purpose
printf 'initialize_of_command %s\n' "$*"
}
# The expansion of the globbing pattern ./*.fastq.gz
# is sorted (in the locale's collation order),
# so similarly named files stay together:
# fq1 fq2 ...
proc_fastq_pairs ./*.fastq.gz
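The sorted-glob approach misaligns every later pair if a single file is missing. A hedged POSIX `sh` variant, with the hypothetical name `proc_fastq_pairs_checked`, only consumes the second file when it equals the first one with `_1.fastq.gz` replaced by `_2.fastq.gz`. It assumes the `_1`/`_2` naming from the question and reuses the answer's dummy `initialize_of_command`.

```sh
#!/bin/sh
# Variant of proc_fastq_pairs that validates each pair before using it.
proc_fastq_pairs_checked() {
  while [ $# -gt 0 ]; do
    fq1=$1
    shift
    case $fq1 in
      *_1.fastq.gz) ;;
      *)  # a _2 (or other) file without a preceding _1 file
        printf 'skipping unpaired file: %s\n' "$fq1" >&2
        continue ;;
    esac
    expected=${fq1%_1.fastq.gz}_2.fastq.gz
    if [ $# -eq 0 ] || [ "$1" != "$expected" ]; then
      # leave $1 in place: it may be the start of the next pair
      printf 'missing mate for: %s\n' "$fq1" >&2
      continue
    fi
    fq2=$1
    shift
    initialize_of_command "$fq1" "$fq2" output_a output_b output_c output_d parameteres
  done
}

initialize_of_command() {
  # dummy command from the answer, prints the arguments it receives
  printf 'initialize_of_command %s\n' "$*"
}
```

It is called the same way as the original: `proc_fastq_pairs_checked ./*.fastq.gz`.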
Or with xargs:
printf '%s\n' ./*.fastq.gz | xargs -L 2 bash -c 'initialize_of_command "$1" "$2" output_a output_b output_c output_d parameteres' _
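For the xargs one-liner, `initialize_of_command` has to be something the child `bash -c` process can find. If it is a real executable on `PATH`, that already holds; if it is a shell function like the dummy in the answer, one bash-only option is to export it with `export -f`. A sketch, where `run_with_xargs` is a hypothetical wrapper and filenames are assumed to contain no whitespace or quotes (xargs splits and interprets those):

```bash
#!/bin/bash
# The child bash started by xargs only sees functions exported with `export -f`.
initialize_of_command() {
  # dummy command from the answer, prints the arguments it receives
  printf 'initialize_of_command %s\n' "$*"
}
export -f initialize_of_command

run_with_xargs() {
  # -L 2: hand two input lines (one sorted pair) to each bash -c invocation
  printf '%s\n' ./*.fastq.gz |
    xargs -L 2 bash -c 'initialize_of_command "$1" "$2" output_a output_b output_c output_d parameteres' _
}
```

`export -f` is a bash extension, so under a plain `sh` the function would instead need to live in its own executable script.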
