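Please read the latest edit of this question.
Question: I need to write a correct benchmark that compares doing work with different thread pool implementations (including ones from external libraries) and different submission methods, both against each other and against doing the same work without any threading.
For example, I have 24 tasks to complete, and the benchmark state holds 10000 random strings: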
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Fork(1)
@BenchmarkMode(Mode.AverageTime)
@Warmup(iterations = 3)
@Measurement(iterations = 3)
@State(Scope.Benchmark)
public class ThreadPoolSamples {
    @Param({"24"})
    int amountOfTasks;

    private static final int tts = Runtime.getRuntime().availableProcessors() * 2;
    private String[] strs = new String[10000];

    @Setup
    public void setup() {
        for (int i = 0; i < strs.length; i++) {
            strs[i] = String.valueOf(Math.random());
        }
    }
}
and two states as inner classes, representing the work (string concatenation) and the ExecutorService setup and shutdown:
@State(Scope.Thread)
public static class Work {
    public String doWork(String[] strs) {
        StringBuilder conc = new StringBuilder();
        for (String str : strs) {
            conc.append(str);
        }
        return conc.toString();
    }
}

@State(Scope.Benchmark)
public static class ExecutorServiceState {
    ExecutorService service;

    @Setup(Level.Iteration)
    public void setupMethod() {
        service = Executors.newFixedThreadPool(tts);
    }

    @TearDown(Level.Iteration)
    public void downMethod() {
        service.shutdownNow();
        service = null;
    }
}
A stricter version of the question: how do I write a correct benchmark that measures the average time of doWork(), first without any threading, second using the .execute() method, and third using the .submit() method and collecting the results of the futures later? The implementation I tried to write:
@Benchmark
public void noThreading(Work w, Blackhole bh) {
    for (int i = 0; i < amountOfTasks; i++) {
        bh.consume(w.doWork(strs));
    }
}

@Benchmark
public void executorService(ExecutorServiceState e, Work w, Blackhole bh) {
    for (int i = 0; i < amountOfTasks; i++) {
        e.service.execute(() -> bh.consume(w.doWork(strs)));
    }
}

@Benchmark
public void noThreadingResult(Work w, Blackhole bh) {
    String[] strss = new String[amountOfTasks];
    for (int i = 0; i < amountOfTasks; i++) {
        strss[i] = w.doWork(strs);
    }
    bh.consume(strss);
}

@Benchmark
public void executorServiceResult(ExecutorServiceState e, Work w, Blackhole bh) throws ExecutionException, InterruptedException {
    Future[] strss = new Future[amountOfTasks];
    for (int i = 0; i < amountOfTasks; i++) {
        strss[i] = e.service.submit(() -> {return w.doWork(strs);});
    }
    for (Future future : strss) {
        bh.consume(future.get());
    }
}
After benchmarking this implementation on my PC (2 Cores, 4 threads) I got:
Benchmark (amountOfTasks) Mode Cnt Score Error Units
ThreadPoolSamples.executorService 24 avgt 3 255102,966 ± 4460279,056 ns/op
ThreadPoolSamples.executorServiceResult 24 avgt 3 19790020,180 ± 7676762,394 ns/op
ThreadPoolSamples.noThreading 24 avgt 3 18881360,497 ± 340778,773 ns/op
ThreadPoolSamples.noThreadingResult 24 avgt 3 19283976,445 ± 471788,642 ns/op
noThreading and executorService may be correct (but I am still unsure), and noThreadingResult and executorServiceResult don't look correct at all.
EDIT:
I found out some new details, but I think the result is still incorrect. As user17280749 pointed out in this answer, the thread pool wasn't waiting for the submitted tasks to complete. That wasn't the only issue, though: javac also somehow optimises the doWork() method in the Work class (probably the JVM could predict the result of that operation), so for simplicity I used Thread.sleep() as "work". I also added two new values for the amountOfTasks param, "1" and "128", to demonstrate that with 1 task threading is slower than noThreading, while with 24 and 128 tasks it should be approximately four times faster than noThreading. Finally, for the correctness of the measurement, I moved thread pool startup and shutdown into the benchmark methods:
package io.denery;

import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.infra.Blackhole;
import java.util.concurrent.*;

@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Fork(1)
@BenchmarkMode(Mode.AverageTime)
@Warmup(iterations = 3)
@Measurement(iterations = 3)
@State(Scope.Benchmark)
public class ThreadPoolSamples {
    @Param({"1", "24", "128"})
    int amountOfTasks;

    private static final int tts = Runtime.getRuntime().availableProcessors() * 2;

    @State(Scope.Thread)
    public static class Work {
        public void doWork() {
            try {
                Thread.sleep(1);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
    }

    @Benchmark
    public void noThreading(Work w) {
        for (int i = 0; i < amountOfTasks; i++) {
            w.doWork();
        }
    }

    @Benchmark
    public void fixedThreadPool(Work w)
            throws ExecutionException, InterruptedException {
        ExecutorService service = Executors.newFixedThreadPool(tts);
        Future[] futures = new Future[amountOfTasks];
        for (int i = 0; i < amountOfTasks; i++) {
            futures[i] = service.submit(w::doWork);
        }
        for (Future future : futures) {
            future.get();
        }
        service.shutdown();
    }

    @Benchmark
    public void cachedThreadPool(Work w)
            throws ExecutionException, InterruptedException {
        ExecutorService service = Executors.newCachedThreadPool();
        Future[] futures = new Future[amountOfTasks];
        for (int i = 0; i < amountOfTasks; i++) {
            futures[i] = service.submit(() -> {
                w.doWork();
            });
        }
        for (Future future : futures) {
            future.get();
        }
        service.shutdown();
    }
}
And the result of this benchmark is:
Benchmark (amountOfTasks) Mode Cnt Score Error Units
ThreadPoolSamples.cachedThreadPool 1 avgt 3 1169075,866 ± 47607,783 ns/op
ThreadPoolSamples.cachedThreadPool 24 avgt 3 5208437,498 ± 4516260,543 ns/op
ThreadPoolSamples.cachedThreadPool 128 avgt 3 13112351,066 ± 1905089,389 ns/op
ThreadPoolSamples.fixedThreadPool 1 avgt 3 1166087,665 ± 61193,085 ns/op
ThreadPoolSamples.fixedThreadPool 24 avgt 3 4721503,799 ± 313206,519 ns/op
ThreadPoolSamples.fixedThreadPool 128 avgt 3 18337097,997 ± 5781847,191 ns/op
ThreadPoolSamples.noThreading 1 avgt 3 1066035,522 ± 83736,346 ns/op
ThreadPoolSamples.noThreading 24 avgt 3 25525744,055 ± 45422,015 ns/op
ThreadPoolSamples.noThreading 128 avgt 3 136126357,514 ± 200461,808 ns/op
We see that the error isn't really huge, and that the thread pools with 1 task are slower than noThreading. But if you compare 25525744,055 and 4721503,799, the speedup is 5.406, which is somehow faster than the expected ~4, and if you compare 136126357,514 and 18337097,997, the speedup is 7.4. This fake speedup grows with amountOfTasks, so I think the result is still incorrect. I plan to look at this with PrintAssembly to find out whether there are any JVM optimisations.
EDIT:
As user17294549 mentioned in this answer, I used Thread.sleep() as an imitation of real work, and that isn't correct because:
- for real work: only 2 tasks can run simultaneously on a 2-core system
- for Thread.sleep(): any number of tasks can run simultaneously on a 2-core system
I remembered the JMH method Blackhole.consumeCPU(long tokens), which "burns cycles" and imitates work; it is covered by a JMH sample and the documentation. So I changed the work to:
@State(Scope.Thread)
public static class Work {
    public void doWork() {
        Blackhole.consumeCPU(4096);
    }
}
And the benchmark results after this change:
Benchmark (amountOfTasks) Mode Cnt Score Error Units
ThreadPoolSamples.cachedThreadPool 1 avgt 3 301187,897 ± 95819,153 ns/op
ThreadPoolSamples.cachedThreadPool 24 avgt 3 2421815,991 ± 545978,808 ns/op
ThreadPoolSamples.cachedThreadPool 128 avgt 3 6648647,025 ± 30442,510 ns/op
ThreadPoolSamples.cachedThreadPool 2048 avgt 3 60229404,756 ± 21537786,512 ns/op
ThreadPoolSamples.fixedThreadPool 1 avgt 3 293364,540 ± 10709,841 ns/op
ThreadPoolSamples.fixedThreadPool 24 avgt 3 1459852,773 ± 160912,520 ns/op
ThreadPoolSamples.fixedThreadPool 128 avgt 3 2846790,222 ± 78929,182 ns/op
ThreadPoolSamples.fixedThreadPool 2048 avgt 3 25102603,592 ± 1825740,124 ns/op
ThreadPoolSamples.noThreading 1 avgt 3 10071,049 ± 407,519 ns/op
ThreadPoolSamples.noThreading 24 avgt 3 241561,416 ± 15326,274 ns/op
ThreadPoolSamples.noThreading 128 avgt 3 1300241,347 ± 148051,168 ns/op
ThreadPoolSamples.noThreading 2048 avgt 3 20683253,408 ± 1433365,542 ns/op
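We see that fixedThreadPool is somehow slower than the example without threading, and that the difference between the fixedThreadPool and noThreading examples gets smaller as amountOfTasks grows. What is happening inside? I saw the same phenomenon with string concatenation at the beginning of this question, but I didn't report it. (By the way, thanks to everyone who read this novel and tried to answer the question, you really helped me.)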
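Answer from a uj5u.com user:
Here is what I get on my machine (maybe this helps you understand where the problem is).
This is the benchmark (I modified it slightly):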
package io.denery;

import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.infra.Blackhole;
import org.openjdk.jmh.Main;
import java.util.concurrent.*;

@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Fork(1)
@Threads(1)
@BenchmarkMode(Mode.AverageTime)
@Warmup(iterations = 5)
@Measurement(iterations = 5)
@State(Scope.Benchmark)
public class ThreadPoolSamples {
    @Param({"1", "24", "128"})
    int amountOfTasks;

    private static final int tts = Runtime.getRuntime().availableProcessors() * 2;

    private static void doWork() {
        Blackhole.consumeCPU(4096);
    }

    public static void main(String[] args) throws Exception {
        Main.main(args);
    }

    @Benchmark
    public void noThreading() {
        for (int i = 0; i < amountOfTasks; i++) {
            doWork();
        }
    }

    @Benchmark
    public void fixedThreadPool(Blackhole bh) throws Exception {
        runInThreadPool(amountOfTasks, bh, Executors.newFixedThreadPool(tts));
    }

    @Benchmark
    public void cachedThreadPool(Blackhole bh) throws Exception {
        runInThreadPool(amountOfTasks, bh, Executors.newCachedThreadPool());
    }

    private static void runInThreadPool(int amountOfTasks, Blackhole bh, ExecutorService threadPool)
            throws Exception {
        Future<?>[] futures = new Future[amountOfTasks];
        for (int i = 0; i < amountOfTasks; i++) {
            futures[i] = threadPool.submit(ThreadPoolSamples::doWork);
        }
        // Wait for every task to finish before the benchmark method returns.
        for (Future<?> future : futures) {
            bh.consume(future.get());
        }
        // Stop the pool and wait for its threads to exit so they cannot leak into the next invocation.
        threadPool.shutdownNow();
        threadPool.awaitTermination(5, TimeUnit.MINUTES);
    }
}
Specs and versions:
JMH version: 1.33
VM version: JDK 17.0.1, OpenJDK 64-Bit Server
Linux 5.14.14
CPU: Intel(R) Core(TM) i5-2320 CPU @ 3.00GHz, 4 Cores, No Hyper-Threading
Results:
Benchmark (amountOfTasks) Mode Cnt Score Error Units
ThreadPoolSamples.cachedThreadPool 1 avgt 5 92968.252 ± 2853.687 ns/op
ThreadPoolSamples.cachedThreadPool 24 avgt 5 547558.977 ± 88937.441 ns/op
ThreadPoolSamples.cachedThreadPool 128 avgt 5 1502909.128 ± 40698.141 ns/op
ThreadPoolSamples.fixedThreadPool 1 avgt 5 97945.026 ± 435.458 ns/op
ThreadPoolSamples.fixedThreadPool 24 avgt 5 643453.028 ± 135859.966 ns/op
ThreadPoolSamples.fixedThreadPool 128 avgt 5 998425.118 ± 126463.792 ns/op
ThreadPoolSamples.noThreading 1 avgt 5 10165.462 ± 78.008 ns/op
ThreadPoolSamples.noThreading 24 avgt 5 245942.867 ± 10594.808 ns/op
ThreadPoolSamples.noThreading 128 avgt 5 1302173.090 ± 5482.655 ns/op
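Answer from a uj5u.com user:
See the answers to this question to learn how to write benchmarks in Java.
... executorService may be correct (but I am still unsure) ...
Benchmark (amountOfTasks) Mode Cnt Score Error Units
ThreadPoolSamples.executorService 24 avgt 3 255102,966 ± 4460279,056 ns/op
This doesn't look like a correct result: the error 4460279,056 is about 17 times larger than the base value 255102,966.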
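As a quick sanity check of that ratio, here is a minimal sketch in plain Java. The class and method names are made up for illustration; the two numbers are copied from the quoted result row.

```java
final class ErrorRatio {
    /** Ratio of the reported error margin to the reported score. */
    static double errorToScore(double score, double error) {
        return error / score;
    }

    public static void main(String[] args) {
        // Values from the executorService row quoted above, in ns/op.
        double ratio = errorToScore(255_102.966, 4_460_279.056);
        System.out.println("error / score = " + ratio);
    }
}
```

A ratio far above 1 means the confidence interval is much wider than the measurement itself, so the score carries almost no information.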
You also have a bug here:
@Benchmark
public void executorService(ExecutorServiceState e, Work w, Blackhole bh) {
    for (int i = 0; i < amountOfTasks; i++) {
        e.service.execute(() -> bh.consume(w.doWork(strs)));
    }
}
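You submit the tasks to the ExecutorService, but you don't wait for them to complete, so the benchmark only measures the time it takes to enqueue them.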
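One way to make an execute()-based benchmark wait is a CountDownLatch that every task counts down when it finishes. The following is only a sketch in plain Java without JMH: the class name WaitForExecute, the method runAndWait and the counter that stands in for doWork() are assumptions for illustration. In a JMH benchmark method, the await() call would go right before the method returns.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

final class WaitForExecute {
    /** Runs `tasks` jobs via execute() and returns how many had finished when this method returns. */
    static int runAndWait(int tasks, int poolThreads) {
        ExecutorService pool = Executors.newFixedThreadPool(poolThreads);
        AtomicInteger finished = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(tasks);
        try {
            for (int i = 0; i < tasks; i++) {
                pool.execute(() -> {
                    try {
                        finished.incrementAndGet(); // stand-in for the real work
                    } finally {
                        done.countDown(); // signal completion even if the work throws
                    }
                });
            }
            done.await(); // block until every submitted task has run
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            pool.shutdown();
        }
        return finished.get();
    }

    public static void main(String[] args) {
        System.out.println("finished tasks: " + runAndWait(24, 4));
    }
}
```

Without the await() call the method could return while tasks are still queued, which is the situation in the original executorService benchmark.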
Answer from a uj5u.com user:
Look at this code:
@TearDown(Level.Iteration)
public void downMethod() {
    service.shutdownNow();
    service = null;
}
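You don't wait for the threads to stop. Read the documentation for details.
As a result, some of your benchmarks might run in parallel with the other 128 threads that cachedThreadPool spawned in a previous benchmark.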
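ExecutorService.shutdownNow() returns immediately and does not wait for running tasks to terminate; awaitTermination() is the call that blocks until the pool is really done. A minimal sketch of a teardown helper, assuming a hypothetical class PoolShutdown and a timeout chosen only for illustration:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class PoolShutdown {
    /** Stops the pool and waits up to timeoutSeconds for it to terminate. */
    static boolean stopAndWait(ExecutorService pool, long timeoutSeconds) {
        pool.shutdownNow(); // interrupts running tasks and drops queued ones
        try {
            // true only if the pool terminated within the timeout
            return pool.awaitTermination(timeoutSeconds, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newCachedThreadPool();
        pool.execute(() -> { });
        System.out.println("terminated: " + stopAndWait(pool, 5));
    }
}
```

Calling something like this from the @TearDown method (or at the end of the benchmark method) should keep threads from one measurement out of the next one.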
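so for simplicity I used Thread.sleep() as "work"
Are you sure?
There is a big difference between real work and Thread.sleep():
- for real work: only 2 tasks can run simultaneously on a 2-core system
- for Thread.sleep(): any number of tasks can run simultaneously on a 2-core system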
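The difference can be made concrete with a rough upper bound on the speedup. This is a sketch under simplifying assumptions: all tasks take equal time, thread pool overhead is ignored, and the pool size of 8 assumes that availableProcessors() reports 4 on the asker's 2-core, 4-thread machine, so that tts is 8. The class and method names are hypothetical.

```java
final class IdealSpeedup {
    /**
     * Upper bound on the speedup over a sequential loop when `tasks`
     * equal-length tasks run with at most `parallelism` of them making
     * progress at the same time. Ignores all scheduling overhead.
     */
    static double idealSpeedup(int tasks, int parallelism) {
        int rounds = (tasks + parallelism - 1) / parallelism; // ceil(tasks / parallelism)
        return (double) tasks / rounds;
    }

    public static void main(String[] args) {
        int cores = 2;       // physical cores on the asker's machine
        int poolThreads = 8; // assumed value of tts: 4 logical processors * 2
        for (int tasks : new int[] {1, 24, 128}) {
            // Sleeping tasks need no CPU, so only the pool size limits them.
            double sleeping = idealSpeedup(tasks, poolThreads);
            // CPU-bound tasks are additionally limited by the number of cores.
            double cpuBound = idealSpeedup(tasks, Math.min(poolThreads, cores));
            System.out.println(tasks + " tasks: sleep <= " + sleeping
                    + ", cpu-bound <= " + cpuBound);
        }
    }
}
```

Under these assumptions the fixed pool can reach up to 8x with sleeping tasks, so the measured 5.4x and 7.4x for fixedThreadPool are below the bound rather than a "fake" speedup. A cached pool has no fixed size, so with sleeping tasks it is not limited to 8x at all.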
