PerlHash：如果值在范圍之間，則獲取鍵-有解無憂

我有一個帶有 100K 鍵（主）的二維散列，像這樣，我需要獲取主鍵 - 只有在滿足特定條件時才獲取水果的名稱；

喜歡 - 如果價格在 35 到 55 之間；所需的輸出是Orange 和 Grape。

并且有一個唯一價格范圍的串列（數百個），我需要每個范圍內的水果串列。

為每個價格范圍一次又一次地遍歷哈希需要很多時間。有沒有一種方法可以快速完成，而不是遍歷每個價格范圍的整個哈希？

哈希格式：

$fruits{"Mango"}{color}=Yellow

$fruits{"Mango"}{price}=80

$fruits{"Orange"}{color}=Orange

$fruits{"Orange"}{price}=40

$fruits{"Grape"}{color}=Green

$fruits{"Grape"}{price}=50

uj5u.com熱心網友回復：

下面是一個示例，說明如何通過按數字順序對價格進行排序來對水果進行單次掃描。這應該比為每個價格范圍掃描一次整個哈希更快：

package Main;
use v5.20.0;
use feature qw(say);
use strict;
use warnings;
use experimental qw(signatures);
{
    my %fruits;
    $fruits{Mango}{color}  = "Yellow";
    $fruits{Mango}{price}  = 80;
    $fruits{Orange}{color} = "Orange";
    $fruits{Orange}{price} = 40;
    $fruits{Grape}{color}  = "Green";
    $fruits{Grape}{price}  = 50;
    my @ranges = ( [35, 55], [45, 55], [2, 85] );
    my $self = Main->new(
        fruits => \%fruits,
        ranges => \@ranges
    );
    $self->init_mapping_arrays();
    my $names = $self->get_fruit_names_for_all_ranges();
}


sub init_mapping_arrays( $self ) {
    my @prices;
    my @names;
    for my $fruit (keys %{ $self->{fruits} }) {
        push @names, $fruit;
        push @prices, $self->{fruits}{$fruit}{price};
    }
    my @idx = map { $_->[0] }
      sort { $a->[1] <=> $b->[1] } map { [$_, $prices[$_]] } 0..$#prices;
    $self->{prices} = [@prices[@idx]];
    $self->{names} = [@names[@idx]];
}

sub get_fruit_names_for_all_ranges ($self) {
    my @names;
    my $prices = $self->{prices};
    my $ranges = $self->{ranges};
    for my $i (0..$#$prices) {
        for my $range (0..$#$ranges) {
            if (   ($ranges->[$range][0] <= $prices->[$i])
                && ($ranges->[$range][1] >= $prices->[$i]))
            {
                push @{$names[$range]}, $self->{names}[$i];
            }
        }
    }
    return \@names;
}

sub new( $class, %args ) { bless \%args, $class }

如果這還不夠快，get_fruit_names_for_all_ranges()還可以通過對范圍進行排序來進一步優化 sub。

uj5u.com熱心網友回復：

如果對水果進行排序，則兩次二分搜索將很快找到水果。

sub search_cmp 

my @fruits = (
   { name => "Orange", price => 40, ... },
   ...
);

my @ranges = (
   [ 35, 55 ],
   ...   
);

my @sorted_fruits = sort { $a->{price} <=> $b->{price} } @fruits;

for my $range (@ranges) {
   my $i = binsearch { $a <=> $b->{price} } $range[0], @sorted_fruits, 0;
   $i = ~$i if $i < 0;

   my $j = binsearch { $a <=> $b->{price} } $range[1], @sorted_fruits, $i;
   $j = ~$j - 1 if $j < 0;

   say "[$range->{min}, $range->{max}]: @fruits[$i..$j]";
}

sub _unsigned_to_signed { unpack('j', pack('J', $_[0])) }

sub binsearch(&$\@;$$) {
   my  $compare = $_[0];
   #my $value   = $_[1];
   my  $array   = $_[2];
   my  $min     = $_[3] // 0;
   my  $max     = $_[4] // $#$array;

   return -1 if $max == -1;

   my $ap = do { no strict 'refs'; \*{caller().'::a'} };  local *$ap;
   my $bp = do { no strict 'refs'; \*{caller().'::b'} };  local *$bp;

   *$ap = \($_[1]);
   while ($min <= $max) {
      my $mid = int(($min $max)/2);
      *$bp = \($array->[$mid]);

      my $cmp = $compare->()
         or return $mid;

      if ($cmp < 0) {
         $max = $mid - 1;
      } else {
         $min = $mid   1;
      }
   }

   return _unsigned_to_signed(~$min);
}

性能分析

最好的最壞情況是 O(R * F)，因為每個范圍都可以匹配每個水果。

The naive approach described by the OP asked to replace is O(R * F). So is it as fast as it can be? No, because the naive approach always adopts its worse case.

In practice, if we can assume that each range matches just a few fruits, we can get far far better results on average from the above: O( ( F R ) log F )

轉載請註明出處，本文鏈接：https://www.uj5u.com/gongcheng/330337.html

標籤：循环 perl 散列文学编程

上一篇：外部注入的全域變數

下一篇：從第二個欄位中逐行搜索模式并列印出與找到模式的行相對應的第一個欄位