我有一個帶有 100K 鍵(主)的二維散列,像這樣,我需要獲取主鍵 - 只有在滿足特定條件時才獲取水果的名稱;
喜歡 - 如果價格在 35 到 55 之間;所需的輸出是Orange 和 Grape。
并且有一個唯一價格范圍的串列(數百個),我需要每個范圍內的水果串列。
為每個價格范圍一次又一次地遍歷哈希需要很多時間。有沒有一種方法可以快速完成,而不是遍歷每個價格范圍的整個哈希?
哈希格式:
$fruits{"Mango"}{color}=Yellow
$fruits{"Mango"}{price}=80
$fruits{"Orange"}{color}=Orange
$fruits{"Orange"}{price}=40
$fruits{"Grape"}{color}=Green
$fruits{"Grape"}{price}=50
uj5u.com熱心網友回復:
下面是一個示例,說明如何通過按數字順序對價格進行排序來對水果進行單次掃描。這應該比為每個價格范圍掃描一次整個哈希更快:
package Main;
use v5.20.0;
use feature qw(say);
use strict;
use warnings;
use experimental qw(signatures);
{
my %fruits;
$fruits{Mango}{color} = "Yellow";
$fruits{Mango}{price} = 80;
$fruits{Orange}{color} = "Orange";
$fruits{Orange}{price} = 40;
$fruits{Grape}{color} = "Green";
$fruits{Grape}{price} = 50;
my @ranges = ( [35, 55], [45, 55], [2, 85] );
my $self = Main->new(
fruits => \%fruits,
ranges => \@ranges
);
$self->init_mapping_arrays();
my $names = $self->get_fruit_names_for_all_ranges();
}
sub init_mapping_arrays( $self ) {
my @prices;
my @names;
for my $fruit (keys %{ $self->{fruits} }) {
push @names, $fruit;
push @prices, $self->{fruits}{$fruit}{price};
}
my @idx = map { $_->[0] }
sort { $a->[1] <=> $b->[1] } map { [$_, $prices[$_]] } 0..$#prices;
$self->{prices} = [@prices[@idx]];
$self->{names} = [@names[@idx]];
}
sub get_fruit_names_for_all_ranges ($self) {
my @names;
my $prices = $self->{prices};
my $ranges = $self->{ranges};
for my $i (0..$#$prices) {
for my $range (0..$#$ranges) {
if ( ($ranges->[$range][0] <= $prices->[$i])
&& ($ranges->[$range][1] >= $prices->[$i]))
{
push @{$names[$range]}, $self->{names}[$i];
}
}
}
return \@names;
}
sub new( $class, %args ) { bless \%args, $class }
如果這還不夠快,get_fruit_names_for_all_ranges()還可以通過對范圍進行排序來進一步優化 sub。
uj5u.com熱心網友回復:
如果對水果進行排序,則兩次二分搜索將很快找到水果。
sub search_cmp
my @fruits = (
{ name => "Orange", price => 40, ... },
...
);
my @ranges = (
[ 35, 55 ],
...
);
my @sorted_fruits = sort { $a->{price} <=> $b->{price} } @fruits;
for my $range (@ranges) {
my $i = binsearch { $a <=> $b->{price} } $range[0], @sorted_fruits, 0;
$i = ~$i if $i < 0;
my $j = binsearch { $a <=> $b->{price} } $range[1], @sorted_fruits, $i;
$j = ~$j - 1 if $j < 0;
say "[$range->{min}, $range->{max}]: @fruits[$i..$j]";
}
sub _unsigned_to_signed { unpack('j', pack('J', $_[0])) }
sub binsearch(&$\@;$$) {
my $compare = $_[0];
#my $value = $_[1];
my $array = $_[2];
my $min = $_[3] // 0;
my $max = $_[4] // $#$array;
return -1 if $max == -1;
my $ap = do { no strict 'refs'; \*{caller().'::a'} }; local *$ap;
my $bp = do { no strict 'refs'; \*{caller().'::b'} }; local *$bp;
*$ap = \($_[1]);
while ($min <= $max) {
my $mid = int(($min $max)/2);
*$bp = \($array->[$mid]);
my $cmp = $compare->()
or return $mid;
if ($cmp < 0) {
$max = $mid - 1;
} else {
$min = $mid 1;
}
}
return _unsigned_to_signed(~$min);
}
性能分析
最好的最壞情況是 O(R * F),因為每個范圍都可以匹配每個水果。
The naive approach described by the OP asked to replace is O(R * F). So is it as fast as it can be? No, because the naive approach always adopts its worse case.
In practice, if we can assume that each range matches just a few fruits, we can get far far better results on average from the above: O( ( F R ) log F )
轉載請註明出處,本文鏈接:https://www.uj5u.com/gongcheng/330337.html
上一篇:外部注入的全域變數
