I have two lists containing the absolute paths of all files under the current working directory.
I generated each list with find "$(pwd)" -type f
List 1:
/home/ec2-user/eclipsebio_toolkit/scripts/get_gene_counts_for_miRNA_specific_chimeric.py
/home/ec2-user/eclipsebio_toolkit/scripts/get_gene_info_sample_comparisons.py
/home/ec2-user/eclipsebio_toolkit/scripts/get_miRNA_counts_for_gene_specific_chimeric.py
/home/ec2-user/eclipsebio_toolkit/scripts/get_mrna_lengths.py
/home/ec2-user/eclipsebio_toolkit/scripts/get_nonnegative_peaks.py
/home/ec2-user/eclipsebio_toolkit/scripts/get_peak_gene_ids_chimeric.py
/home/ec2-user/eclipsebio_toolkit/scripts/get_peak_gene_ids_w_output.py
/home/ec2-user/eclipsebio_toolkit/scripts/get_peaks_not_sig.py
/home/ec2-user/eclipsebio_toolkit/scripts/get_peaks_overlapping_chimeric_reads.py
/home/ec2-user/eclipsebio_toolkit/scripts/create_ribo_html_report.py
List 2:
/home/ec2-user/snakemake_eclip/scripts/count_reads_broadfeatures_frombamfi_SRmap.pl
/home/ec2-user/snakemake_eclip/scripts/create_html_report.py
/home/ec2-user/snakemake_eclip/scripts/create_idr_html_report.py
/home/ec2-user/snakemake_eclip/scripts/create_mapped_read_num.py
/home/ec2-user/snakemake_eclip/scripts/create_metagene_plot_from_saturations.py
/home/ec2-user/snakemake_eclip/scripts/create_metagene_plot_from_saturations_peaks.py
/home/ec2-user/snakemake_eclip/scripts/create_metagene_plot_from_saturations_reads.py
/home/ec2-user/snakemake_eclip/scripts/create_peak_norm_manifests.py
/home/ec2-user/snakemake_eclip/scripts/create_pureclip_html_report.py
/home/ec2-user/snakemake_eclip/scripts/create_ribo_html_report.py
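I want to find the files that are duplicated between the two lists, and then delete only the copies found in list 1 (rm them from disk).
I tried awk 'NR == FNR{ a[$0] = 1;next } !a[$0]' list1 list2 to pick out the items found only in list 1, but it compares whole lines, so it does not account for the absolute paths differing between the two lists.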
Reply from a uj5u.com user:
awk -F'/' 'NR==FNR{ a[$NF]; next } $NF in a' file2 file1 | xargs rm
Always use the key in a idiom shown above for the second part of the script, rather than a[key] as you tried to do:
awk 'NR == FNR{ a[$0] = 1;next } !a[$0]' list1 list2
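What that does is waste cycles and memory by storing 1 for every key present in the first file (a[$0] = 1), and then waste more memory because !a[key] creates an entry for every key from the second file that is not already in the array.
The following does not require you to assign any value when populating a[key], and it tests whether key exists in a with a hash lookup, without adding anything to a[]: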
key in a
!(key in a)
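By contrast, the following requires you to have assigned a non-zero value to a[key] first. It then does a hash lookup of key in a[], adds an entry to a[] indexed by key if that key does not exist, and finally tests whether that entry's value is non-zero: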
a[key]
!a[key]
So don't do that; it wastes time and memory.
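To see the difference concretely, here is a minimal sketch that counts the entries in an array after each kind of test. The keys present and missing are made up for illustration, and the counting is done with a for loop so it does not rely on any particular awk implementation:

```shell
# Count the entries in array a after each kind of membership test.
# "present" and "missing" are hypothetical example keys.
result=$(awk 'BEGIN {
    a["present"]                       # bare reference: creates the key, assigns no value
    if ("missing" in a) found = 1      # pure lookup: creates nothing
    n = 0; for (k in a) n++
    before = n                         # entries after the "in" test
    if (a["missing"]) found = 1        # reference: silently creates a["missing"]
    n = 0; for (k in a) n++
    print before, n                    # entries before and after the a[key] test
}')
echo "$result"
```

The first number is the array size after the key in a test and the second is the size after the a[key] test, so any growth between them comes from the reference alone.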
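Reply from a uj5u.com user:
Your awk script was almost correct. Split on / and compare the last field (the file name) instead of the whole line, read file2 first, and print the file1 lines that match: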
awk -F / 'NR == FNR{ a[$NF] = 1;next } a[$NF]' file2 file1
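Before deleting anything, it can help to run the matching step on small sample lists. The sketch below uses made-up paths under /a and /b and a scratch directory from mktemp. It uses the key in a form of the test, only prints the list 1 paths whose file name also occurs in list 2, and removes nothing except its own scratch directory:

```shell
# Build two small sample lists in a scratch directory (hypothetical paths).
tmp=$(mktemp -d)
printf '%s\n' /a/toolkit/get_mrna_lengths.py /a/toolkit/create_ribo_html_report.py > "$tmp/list1"
printf '%s\n' /b/eclip/create_html_report.py /b/eclip/create_ribo_html_report.py > "$tmp/list2"

# Read list2 first to collect file names, then print the list1 paths
# whose last /-separated field was seen in list2.
dupes=$(awk -F'/' 'NR==FNR { a[$NF]; next } $NF in a' "$tmp/list2" "$tmp/list1")
echo "$dupes"

# Clean up the scratch directory.
rm -r "$tmp"
```

Only the list 1 copy of create_ribo_html_report.py is printed, because that is the one file name the two sample lists share.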
Reply from a uj5u.com user:
awk -F '/' '{print $NF}' list1 list2 \
| sort \
| uniq -d \
| xargs -I {} echo rm -v /home/ec2-user/eclipsebio_toolkit/scripts/{}
Output:
rm -v /home/ec2-user/eclipsebio_toolkit/scripts/create_ribo_html_report.py
If the output looks good, remove the echo.
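One caveat: the plain xargs rm used in the first reply splits its input on blanks and treats quotes specially, so a path containing a space would be broken apart. A possible alternative, sketched here with hypothetical file names in a scratch directory, reads one path per line in a shell loop instead. It still assumes that no file name contains a newline:

```shell
# Set up two scratch directories that share one file name containing a space
# (all names here are made up for illustration).
tmp=$(mktemp -d)
mkdir "$tmp/one" "$tmp/two"
touch "$tmp/one/my report.py" "$tmp/one/keep.py" "$tmp/two/my report.py"
find "$tmp/one" -type f > "$tmp/list1"
find "$tmp/two" -type f > "$tmp/list2"

# Print the list1 paths whose file name also appears in list2,
# then remove them one line at a time so spaces stay intact.
awk -F'/' 'NR==FNR { a[$NF]; next } $NF in a' "$tmp/list2" "$tmp/list1" |
while IFS= read -r path; do
    rm -- "$path"
done

# Record what is left in each directory, then clean up.
remaining=$(ls "$tmp/one")
still_there=$(ls "$tmp/two")
echo "$remaining"
echo "$still_there"
rm -r "$tmp"
```

Only the copy under the first directory is deleted; the file with the same name under the second directory is left alone.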
