我正在尋找 Haskell 上 MergeSort 的規范實作以移植到HOVM,我找到了這個StackOverflow 答案。在移植演算法時,我意識到有些事情看起來很愚蠢:該演算法有一個“減半”函式,它只會在遞回和合并之前使用一半的長度將串列一分為二。所以我想:為什么不更好地利用這個傳球,并使用一個樞軸,讓每一半分別比那個樞軸更小和更大呢?這將增加遞回合并呼叫應用于已排序串列的幾率,這可能會加速演算法!
我已經完成了這個更改,產生了以下代碼:
import Data.List
import Data.Word
randomList :: Word32 -> Word32 -> [Word32]
randomList seed 0 = []
randomList seed size = seed : randomList (seed * 1664525 1013904223) (size - 1)
quacksort :: [Word32] -> [Word32]
quacksort [] = []
quacksort [x] = [x]
quacksort (p : x : xs) = split p (p : x : xs) [] [] where
-- Splits the list in two halves of elements smaller/bigger than a pivot
split p [] as bs = merge (quacksort as) (quacksort bs)
split p (x : xs) as bs = quack p (p < x) x xs as bs
-- Helper function for `split`
quack p False x xs as bs = split p xs (x : as) bs
quack p True x xs as bs = split p xs as (x : bs)
-- Merges two lists as a sorted one
merge [] ys = ys
merge xs [] = xs
merge (x : xs) (y : ys) = place (x < y) x xs y ys
-- Helper function for `merge`
place False x xs y ys = y : merge (x : xs) ys
place True x xs y ys = x : merge xs (y : ys)
main :: IO ()
main = do
let l = randomList 0 2000000
let b = quacksort l
print $ sum b
I then benchmarked it and, to my surprise, it was, indeed, 2x faster than Haskell's official Data.List sort. So I wondered why this isn't used in practice, and, suddenly, I realized the obvious: mergesort does NOT perform better on already sorted lists. D'oh. So the whole assumption behind quacksort was failed. Not only that, it would perform terribly for reversely sorted lists, since it would fail to produce two halves of similar size (except if we could guess a really good pivot). So, quacksort is wack in all cases and should never be used in practice. But, then...
Why the hell does it perform 2x faster than Data.List's sort for random lists?
I can't think of a good reason this should be the case. Making each half smaller/bigger than a pivot doesn't change how many times the merge call must be called, so it shouldn't have any positive effect. But reverting it back to a conventional mergesort does make it 2x slower, so, for some reason, the ordered split helps.
uj5u.com熱心網友回復:
您split將串列分成兩個有序的部分,因此merge首先使用它的第一個引數,然后只生成完整的后半部分。換句話說,它相當于 在前半部分進行冗余比較,結果總是如此True。
在真正的合并排序中,合并實際上對隨機資料做了兩倍的作業,因為這兩個部分沒有排序。
盡管split在磁區上花費了一些作業,而在線自下而上的歸并排序根本不會在那里花費任何作業。但是內置排序試圖檢測輸入中的有序運行,顯然額外的作業是不可忽略的。
轉載請註明出處,本文鏈接:https://www.uj5u.com/qukuanlian/422804.html
標籤:
