Idiomatic way to slice a Perl array by value rather than by index

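I have a piece of code that extracts a slice from one of two large, sorted integer arrays representing stopping points in the program's workflow. I'll include one of the two here.

The basic idea is that I'm trying to cut a range of work out of this big array to serve as the starting point for this run of the program. $min is pulled from the task object and represents the task's current progress. $limit is an optional user override that defaults to -1 (ignored).

Currently I'm using the firstidx function from the CPAN module List::MoreUtils to retrieve the start and end indices, and then I use them to slice the @steps array in the usual way. Is there a faster and/or more idiomatic way to do this? In particular, is there a good way to do it using the $min and $limit values directly (with a separate code path for the $limit == -1 case)?

Here is the code: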

my @steps = (
    0, 1, 5, 10,
    20, 30, 40, 50, 60, 70, 80, 90, 100,
    200, 300, 400, 500, 600, 700, 800, 900, 1000,
    1200, 1400, 1600, 1800, 2000,
    2500, 3000, 3500, 4000, 4500, 5000,
    6000, 7000, 8000, 9000, 10000,
    11000, 12000, 13000, 14000, 15000, 17500, 20000,
    25000, 30000, 35000, 40000, 45000, 50000,
    60000, 70000, 80000, 90000, 100000
);
my $min_index = firstidx { $_ > $min } @steps;
my $max_index;
if ($limit == -1) {
    $max_index = @steps - 1;
} else {
    $max_index = firstidx { $_ >= $limit } @steps;
}
my @steps_todo = @steps[ $min_index .. $max_index ];

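Answer:

You can do this with a simple foreach loop. The flow-control keywords (next and last) decide when to start and stop consuming the @steps array.

I changed the no-limit case to a defined check, because that is more efficient.

This is not the most efficient algorithm, but it is a simple, idiomatic way to solve the problem.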

use diagnostics; # Very verbose warnings
($min, $max, @out) = (1000, 9999);

@steps = (
    0, 1, 5, 10,
    20, 30, 40, 50, 60, 70, 80, 90, 100,
    200, 300, 400, 500, 600, 700, 800, 900, 1000,
    1200, 1400, 1600, 1800, 2000,
    2500, 3000, 3500, 4000, 4500, 5000,
    6000, 7000, 8000, 9000, 10000,
    11000, 12000, 13000, 14000, 15000, 17500, 20000,
    25000, 30000, 35000, 40000, 45000, 50000,
    60000, 70000, 80000, 90000, 100000
);


foreach (@steps) {
    next if $_ < $min;
    last if defined $max and $_ > $max;
    push @out, $_;
}


print "@out";

## output
## 1000 1200 1400 1600 1800 2000 2500 3000 3500 4000 4500 5000 6000 7000 8000 9000

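This is also a possible use case for the flip-flop operator (..). It looks like the list range operator, but in scalar context it is the flip-flop. Think of it as something like the three-state cmp operator, but for arbitrary expressions. It returns a value that tells you where you are in a range running from "this" to "that". Call the two operands the flappers. Take a look at this sample. For details, see the Range Operators section of perlop. The operands can be anything that returns a boolean value.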

use Data::Dump qw/pp/;
($min, $max) = (1000, 9999);
foreach (@steps) {
    my $x = ($_ >= $min .. $_ < $max);
    printf "%-9s %s\n", $_, pp $x;
}

## Output
0         ""
1         ""
5         ""
10        ""
20        ""
30        ""
40        ""
50        ""
60        ""
70        ""
80        ""
90        ""
100       ""
200       ""
300       ""
400       ""
500       ""
600       ""
700       ""
800       ""
900       ""
1000      "1E0"
1200      "1E0"
1400      "1E0"
1600      "1E0"
1800      "1E0"
2000      "1E0"
2500      "1E0"
3000      "1E0"
3500      "1E0"
4000      "1E0"
4500      "1E0"
5000      "1E0"
6000      "1E0"
7000      "1E0"
8000      "1E0"
9000      "1E0"
10000     1
11000     2
12000     3
13000     4
14000     5
15000     6
17500     7
20000     8
25000     9
30000     10
35000     11
40000     12
45000     13
50000     14
60000     15
70000     16
80000     17
90000     18
100000    19

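Note the values that return "1E0": those are the points where the flip-flop resets. In a real loop, that transition point is where the loop would terminate. You can adjust the operands to choose which set of values gets "", "1E0", or a sequence number, and act accordingly.

This version gives sequence numbers for the desired values, and then starts returning the "E0" reset values once it moves past them.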

($_ >= $min .. $_ > $max);

...
800       ""
900       ""
1000      1
1200      2
...
8000      15
9000      16
10000     "17E0"
11000     "1E0"
12000     "1E0"
...

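Here is how to solve it with the flip-flop. The advantage of this approach is that the lower-bound condition is no longer evaluated once it has been met. The main drawback is that it is quite cryptic and may be incomprehensible to non-Perl programmers. But it is about as compact and efficient as it gets.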

foreach (@steps) {
    ($_ >= $min .. ($_ > $max and last)) or next;
    push @out, $_;
}

print "@out\n";
## 1000 1200 1400 1600 1800 2000 2500 3000 3500 4000 4500 5000 6000 7000 8000 9000

While the left flapper is false, the whole expression is false and we go to the next item. When the left flapper becomes true, the whole expression is now true and we go on and start checking the right flapper. While the right flapper is false (the whole expression is still true), we go on to save the item in @out. When the right flapper becomes true, we terminate the loop. (This is also where it would reset back to false.)

One more edit. :)

If there is no max, here's a good way to do it.


for (my $i = 0; $i <= $#steps; $i++) {
    next unless $steps[$i] >= $min;
    @out = @steps[$i .. $#steps];
    # Or, if you don't need the original array anymore you can consume it 
    # and be a bit more efficient and save memory
    # @out = splice @steps, $i;
    last;

}

Great question!

Answer:

Not necessarily idiomatic or high-performance, but you could construct an iterator:

#!/usr/bin/env perl

use strict;
use warnings;

use feature 'say';
use List::MoreUtils qw( firstidx );

sub work_steps {
    my ($steps, $min, $max) = @_;

    my $start = firstidx { $_ > $min } @$steps;
    return sub {} if $start == -1;

    my $end;

    if ($max != -1) {
        $end = firstidx { $_ >= $max } @$steps;
        return sub {} if $end == -1;
    }
    else {
        $end = $#$steps;
    }

    return sub {} if $start > $end;

    my $k = $start;

    return sub {
        return unless ($k >= $start) && ($k <= $end);
        return $steps->[$k++];
    }
}

sub main {
    my @steps = (
        0, 1, 5, 10,
        20, 30, 40, 50, 60, 70, 80, 90, 100,
        200, 300, 400, 500, 600, 700, 800, 900, 1000,
        1200, 1400, 1600, 1800, 2000,
        2500, 3000, 3500, 4000, 4500, 5000,
        6000, 7000, 8000, 9000, 10000,
        11000, 12000, 13000, 14000, 15000, 17500, 20000,
        25000, 30000, 35000, 40000, 45000, 50000,
        60000, 70000, 80000, 90000, 100000
    );

    my $it = work_steps(\@steps, 222, 9300);

    while (defined(my $step = $it->())) {    # defined() so a step value of 0 does not end the loop
        say $step;
    }
}

main();

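This avoids building an extra array, which reduces memory pressure. That may matter if the list you are working with is large enough.

You can pass the iterator around instead of the range. This may make the program easier to read.

On the other hand, the extra work done on each iteration may also matter.

Answer:

One idiomatic way is to use grep to select the range. The downside is that it scans the entire array, which could become a performance concern if the array is large and the grep is executed often.

Since the list is sorted, one possibility for better performance (though certainly not shorter or simpler) is to use the binary search functions in List::MoreUtils to find the boundaries of the range.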
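The grep approach might look something like the sketch below. The helper name steps_between is hypothetical, and the boundary rules are an assumption: values strictly greater than $min and, unless $limit is -1, no greater than $limit. This differs slightly from the question's firstidx code, which also includes the first step above $limit when $limit itself is not in the array.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): select the steps that
# are strictly above $min and, unless $limit is -1 ("no limit"), at or
# below $limit. grep visits every element, so this is O(n) per call.
sub steps_between {
    my ($steps, $min, $limit) = @_;
    return grep { $_ > $min && ($limit == -1 || $_ <= $limit) } @$steps;
}

my @steps = (0, 1, 5, 10, 20, 30, 40, 50);
my @todo  = steps_between(\@steps, 5, 30);
print "@todo\n";    # prints "10 20 30"
```

Because the selection is by value, no index bookkeeping is needed, at the cost of always scanning the whole array.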
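To illustrate the binary search idea without relying on a particular List::MoreUtils call, here is a hand-rolled sketch. The helpers first_index_where and steps_todo are hypothetical names, not part of any module. The sketch assumes the array is sorted in ascending order and follows the question's rules: start at the first value greater than $min, end at the first value greater than or equal to $limit. As a further assumption, it falls back to the last element when no value reaches $limit.

```perl
use strict;
use warnings;

# Hypothetical helper: index of the first element of the sorted array
# @$sorted for which $pred returns true, or scalar(@$sorted) if none does.
# Assumes $pred is false for a prefix of the array and true for the rest.
sub first_index_where {
    my ($sorted, $pred) = @_;
    my ($lo, $hi) = (0, scalar @$sorted);
    while ($lo < $hi) {
        my $mid = int(($lo + $hi) / 2);
        if ($pred->($sorted->[$mid])) { $hi = $mid }
        else                          { $lo = $mid + 1 }
    }
    return $lo;
}

# Hypothetical helper: slice from the first value > $min through the first
# value >= $limit. A $limit of -1 means "no limit"; if no value reaches
# $limit, the slice runs to the end of the array (an assumption).
sub steps_todo {
    my ($steps, $min, $limit) = @_;
    my $first = first_index_where($steps, sub { $_[0] > $min });
    my $last  = $limit == -1
        ? $#$steps
        : first_index_where($steps, sub { $_[0] >= $limit });
    $last = $#$steps if $last > $#$steps;
    return @{$steps}[ $first .. $last ];    # empty when $first > $last
}

my @steps = (0, 1, 5, 10, 20, 30, 40, 50);
print join(' ', steps_todo(\@steps, 5, 30)), "\n";    # prints "10 20 30"
```

Each boundary is found in O(log n) comparisons instead of a linear scan, which only pays off when the array is large or the lookup is done often.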
