Perl: compare two files and print the matching lines

Hi all,

I have 2 files, SC and ID. ID has only 2 columns, separated by a space. SC has more columns, but a pair from ID may appear in its first two columns.

For example:

ID  

chain_0 123
chain_1 456
chain_2 789

SC  

chain_0 123 toronto ontario canada
chain_1 456 toronto New Delhi India 
chain_2 789 housing_crisis mortgage_rates first_time_buyers miserable

Now, I want to print the lines in SC that match the pairs in ID. I tried the following, but it does not work.

open(ID, '<', $id) or die $!;

while(<ID>){
   my @array = split ' ', $_;
   $output = `awk '\$1 ~ /\<$array[0]\>/' scan_cells | awk '\$2 ~ /\<$array[1]\>/'` ;
   print "$output";
}

close(ID);

Thanks!

A uj5u.com user replied:

One way is to use grep, with bash process substitution and sed massaging the lines of ID into regular expressions that match only at the start of a line:

grep -f <(sed 's/^/^/; s/[[:space:]]/[[:space:]]/; s/$/[[:space:]]/' ID) SC
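
To see what grep actually receives, it may help to run the sed program on its own. The sketch below is only an illustration: the scratch directory and the two-line ID file are made up for the demo, and the variable name `patterns` is hypothetical.

```shell
# Hypothetical scratch directory so the sketch does not touch real files.
workdir=$(mktemp -d)
printf '%s\n' 'chain_0 123' 'chain_1 456' > "$workdir/ID"

# Same sed program as in the grep command: anchor each line at its start,
# replace the first whitespace character with the [[:space:]] class, and
# require one whitespace character after the second column.
patterns=$(sed 's/^/^/; s/[[:space:]]/[[:space:]]/; s/$/[[:space:]]/' "$workdir/ID")
printf '%s\n' "$patterns"
# ^chain_0[[:space:]]123[[:space:]]
# ^chain_1[[:space:]]456[[:space:]]

rm -r "$workdir"
```

Two consequences follow from these patterns: an SC line that ends right after the second column (no trailing whitespace) will not match, and any regex metacharacters in ID, such as a dot, are interpreted by grep rather than matched literally.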

And in Perl:

#!/usr/bin/env perl
use strict;
use warnings;

# Takes the files as command line arguments
my ($id_file, $sc_file) = @ARGV;

my %ids;

open my $ID, "<", $id_file or die "Unable to open $id_file: $!\n";
while (<$ID>) {
    # Just in case there's a tab instead of a single space between columns
    $_ = join(" ", split);
    $ids{$_} = 1;
}
close $ID;

open my $SC, "<", $sc_file or die "Unable to open $sc_file: $!\n";
while (<$SC>) {
    my @cols = split;
    print if exists $ids{"@cols[0,1]"};
}
close $SC;

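The idea here is to store each line of ID as a key in a hash table, then, for each line of SC, check whether its first two columns exist as a key in that table and print the line if they do.

However, the same approach can be written more concisely in awk: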

awk 'FNR == NR { ids[$1,$2] = 1; next }
     ($1,$2) in ids' ID SC
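
A minimal sketch of that awk command in action, assuming a trimmed-down ID file with only two of the question's pairs so that the filtering is visible. The scratch directory and the variable name `matched` are made up for the demo.

```shell
# Hypothetical scratch directory holding sample data from the question.
workdir=$(mktemp -d)
cat > "$workdir/ID" <<'EOF'
chain_0 123
chain_2 789
EOF
cat > "$workdir/SC" <<'EOF'
chain_0 123 toronto ontario canada
chain_1 456 toronto New Delhi India
chain_2 789 housing_crisis mortgage_rates first_time_buyers miserable
EOF

# FNR == NR is true only while the first file (ID) is being read, so the
# first rule fills the ids array; the second rule then prints every SC line
# whose first two columns form a stored key.
matched=$(awk 'FNR == NR { ids[$1,$2] = 1; next }
               ($1,$2) in ids' "$workdir/ID" "$workdir/SC")
printf '%s\n' "$matched"
# chain_0 123 toronto ontario canada
# chain_2 789 housing_crisis mortgage_rates first_time_buyers miserable

rm -r "$workdir"
```

Because awk splits on any run of blanks by default, this version does not care whether the columns are separated by spaces or tabs.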

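A uj5u.com user replied:

Using awk inside a Perl program is almost always a mistake. Whatever you are doing with awk, you can do more easily in Perl.

Here is how I would approach your problem. Create a hash where the keys are the ID lines and the values are some true value (1 is the simplest). Then walk through the SC file and print a line only if its first two fields match a key in the hash.

Something like this: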

#!/usr/bin/perl

# Always :-)
use strict;
use warnings;

# Open the id file
open my $id, '<', 'id' or die $!;

# Read the ids in to an array
chomp( my @ids = <$id> );

# Convert the array into a hash
my %id = map { $_ => 1 } @ids;

# Read a line at a time from the file
# given on the command line.
while (<>) {
  # split the line into fields (on whitespace)
  my @data = split;
  # Print only if the first two fields match
  # a record in %id
  print if $id{"$data[0] $data[1]"};
}

This hard-codes the name of the ID file, but you pass the name of the SC file on the command line. If you call this program idfilter, you would run it like this:

$ ./idfilter sc

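A uj5u.com user replied:

Assuming the columns of interest in SC are also its first 2 columns, and the field separator (a single space) is the same in both files, you can store each whole line of ID as a key in an array, a[$0].

While processing the second file, check whether column 1 joined to column 2 by the output field separator is a key in that array, which holds every entry from ID.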

awk 'FNR == NR{a[$0]; next} $1 OFS $2 in a' ID SC

Test file contents:

$ cat ID
chain_0 123
chain_1 456
chain_2 789
chain_9 999

$ cat SC
chain_0 123 toronto ontario canada
chain_1 456 toronto New Delhi India
chain_2 789 housing_crisis mortgage_rates first_time_buyers miserable
chain_3 999 housing_crisis mortgage_rates first_time_buyers miserable

Output:

chain_0 123 toronto ontario canada
chain_1 456 toronto New Delhi India
chain_2 789 housing_crisis mortgage_rates first_time_buyers miserable

If the output field separator differs from the separator used in ID, you can also use a multidimensional array:

awk 'FNR==NR{a[$1, $2];next} 
{
  for (pair in a) {
    split(pair, sep, SUBSEP);
    if ($1 == sep[1] && $2 == sep[2]) print
  }
}
' ID SC
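
The loop above compares every SC line against every stored pair. Since a[$1, $2] stores its keys joined by SUBSEP, the same test can probably be written as a direct lookup, ($1, $2) in a, as in the earlier awk answer. The sketch below runs both forms on the test files shown above and compares the results; the scratch directory and the variable names `loop_out` and `lookup_out` are made up for the demo.

```shell
# Hypothetical scratch directory with the test files from this answer.
workdir=$(mktemp -d)
cat > "$workdir/ID" <<'EOF'
chain_0 123
chain_1 456
chain_2 789
chain_9 999
EOF
cat > "$workdir/SC" <<'EOF'
chain_0 123 toronto ontario canada
chain_1 456 toronto New Delhi India
chain_2 789 housing_crisis mortgage_rates first_time_buyers miserable
chain_3 999 housing_crisis mortgage_rates first_time_buyers miserable
EOF

# The loop from the answer: split each stored key and compare both parts.
loop_out=$(awk 'FNR == NR { a[$1, $2]; next }
{
  for (pair in a) {
    split(pair, sep, SUBSEP)
    if ($1 == sep[1] && $2 == sep[2]) print
  }
}' "$workdir/ID" "$workdir/SC")

# Direct membership test on the same SUBSEP-joined key.
lookup_out=$(awk 'FNR == NR { a[$1, $2]; next }
                  ($1, $2) in a' "$workdir/ID" "$workdir/SC")

# Report whether the two forms agree, then show the matching lines.
if [ "$loop_out" = "$lookup_out" ]; then echo "same output"; fi
printf '%s\n' "$lookup_out"

rm -r "$workdir"
```

On this data both forms print the chain_0, chain_1 and chain_2 lines. The lookup avoids rescanning the whole array for each line of SC, which matters once ID grows large.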

Please credit the source when reposting. Link to this article: https://www.uj5u.com/gongcheng/455988.html

Tags: bash perl awk compare
