How to Reasonably Estimate Thread Pool Size? · Java Study Notes

[How to Reasonably Estimate Thread Pool Size?](http://ifeve.com/how-to-calculate-threadpool-size/) Thanks to reader [蔣小強](http://weibo.com/u/1761654130) for the submission.

**How should one reasonably estimate the size of a thread pool?**

This question looks small, but it is not easy to answer. If you have a better method, please share it. Let's start with a naive estimate: suppose a system must sustain a TPS (Transactions Per Second, or Tasks Per Second) of at least 20, each transaction is handled by one thread, and a thread takes 4 s on average to process one transaction. The question then becomes:

**How large must the thread pool be so that the system completes 20 transactions per second?**

The calculation is simple: each thread delivers 0.25 TPS, so reaching 20 TPS takes 20 / 0.25 = 80 threads. This estimate is clearly naive, because it ignores the number of CPUs. A typical server has 16 or 32 CPU cores, so 80 threads are likely to cause a lot of unnecessary thread context switching.

Here is a second simple method, whose validity I am not sure of (N is the total number of CPU cores):

* For a CPU-bound application, set the thread pool size to N+1
* For an IO-bound application, set the thread pool size to 2N+1

If a server runs only this one application, and the application has only this one thread pool, the estimate may be reasonable, but you still need to test and verify it yourself.

Next, in the document "服務器性能IO優化" (Server Performance and IO Optimization), I found an estimation formula:

```
optimal thread count = ((thread wait time + thread CPU time) / thread CPU time) * number of CPUs
```

For example, if each thread spends on average 0.5 s of CPU time and 1.5 s waiting (non-CPU time, such as IO), and there are 8 CPU cores, the formula gives ((0.5 + 1.5) / 0.5) * 8 = 32. The formula can be rewritten as:

```
optimal thread count = (thread wait time / thread CPU time + 1) * number of CPUs
```

One conclusion follows:

**The larger the share of thread wait time, the more threads are needed. The larger the share of thread CPU time, the fewer threads are needed.**

The previous estimation method (N+1 versus 2N+1) is consistent with this conclusion.

The fastest part of a system is the CPU, so the CPU determines the upper bound of the system's throughput. Increasing CPU processing power raises that upper bound. By the weakest-link ("short board") effect, however, real throughput cannot be computed from the CPU alone. To raise throughput, you have to work on the system's short boards (for example network latency and IO):

* Increase the degree of parallelism of the bottleneck operations as much as possible, for example with multi-threaded downloading
* Strengthen the bottleneck itself, for example by replacing IO with NIO

The first item relates to Amdahl's law, which defines the speedup obtained when a serial system is parallelized:

```
speedup = system time before optimization / system time after optimization
```

The larger the speedup, the more effective the parallelization. Amdahl's law also relates the degree of parallelism, the number of CPUs, and the speedup. With speedup Speedup, serial fraction F (the fraction of code that executes serially), and N CPUs:

```
Speedup <= 1 / (F + (1 - F) / N)
```

When N is large enough, the smaller the serial fraction F, the larger the speedup.

Writing this, a question occurred to me.

**Is using a thread pool always more efficient than using a single thread?**

The answer is no. Redis, for example, is single-threaded, yet it is very efficient: basic operations reach the order of 100,000 per second. From the threading point of view, part of the reason is:

* Multithreading incurs thread context-switching overhead; a single thread has none
* Locks

Of course, the more fundamental reason why "Redis is fast" is that Redis operations are almost all in-memory, and in that case a single thread can use the CPU very efficiently. Multithreading generally suits workloads with a substantial proportion of IO and network operations.

So even though the simple estimation methods above may look reasonable, they are not necessarily so in practice. You have to combine the real characteristics of the system (IO-bound, CPU-bound, or purely in-memory) with the hardware environment (CPU, memory, disk read/write speed, network conditions, and so on) and keep experimenting until you reach an estimate that matches reality.

Finally, here is a "Dark Magic" estimation method (so called because I do not yet understand how it works). It uses the following class:

```java
package pool_size_calculate;

import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.BlockingQueue;

/**
 * A class that calculates the optimal thread pool boundaries. It takes the
 * desired target utilization and the desired work queue memory consumption as
 * input and returns thread count and work queue capacity.
 *
 * @author Niklas Schlimm
 *
 */
public abstract class PoolSizeCalculator {

    /**
     * The sample queue size to calculate the size of a single {@link Runnable}
     * element.
     */
    private final int SAMPLE_QUEUE_SIZE = 1000;

    /**
     * Accuracy of test run. It must finish within 20ms of the testTime
     * otherwise we retry the test. This could be configurable.
     */
    private final int EPSYLON = 20;

    /**
     * Control variable for the CPU time investigation.
     */
    private volatile boolean expired;

    /**
     * Time (millis) of the test run in the CPU time calculation.
     */
    private final long testtime = 3000;

    /**
     * Calculates the boundaries of a thread pool for a given {@link Runnable}.
     *
     * @param targetUtilization
     *            the desired utilization of the CPUs (0 <= targetUtilization <= 1)
     * @param targetQueueSizeBytes
     *            the desired maximum work queue size of the thread pool (bytes)
     */
    protected void calculateBoundaries(BigDecimal targetUtilization,
            BigDecimal targetQueueSizeBytes) {
        calculateOptimalCapacity(targetQueueSizeBytes);
        Runnable task = creatTask();
        start(task);
        start(task); // warm up phase
        long cputime = getCurrentThreadCPUTime();
        start(task); // test interval
        cputime = getCurrentThreadCPUTime() - cputime;
        // elapsed wall-clock time (nanos) minus CPU time = time spent waiting
        long waittime = (testtime * 1000000) - cputime;
        calculateOptimalThreadCount(cputime, waittime, targetUtilization);
    }

    private void calculateOptimalCapacity(BigDecimal targetQueueSizeBytes) {
        long mem = calculateMemoryUsage();
        BigDecimal queueCapacity = targetQueueSizeBytes.divide(new BigDecimal(
                mem), RoundingMode.HALF_UP);
        System.out.println("Target queue memory usage (bytes): "
                + targetQueueSizeBytes);
        System.out.println("createTask() produced "
                + creatTask().getClass().getName() + " which took " + mem
                + " bytes in a queue");
        System.out.println("Formula: " + targetQueueSizeBytes + " / " + mem);
        System.out.println("* Recommended queue capacity (bytes): "
                + queueCapacity);
    }

    /**
     * Brian Goetz' optimal thread count formula, see 'Java Concurrency in
     * Practice' (chapter 8.2)
     *
     * @param cpu
     *            cpu time consumed by considered task
     * @param wait
     *            wait time of considered task
     * @param targetUtilization
     *            target utilization of the system
     */
    private void calculateOptimalThreadCount(long cpu, long wait,
            BigDecimal targetUtilization) {
        BigDecimal waitTime = new BigDecimal(wait);
        BigDecimal computeTime = new BigDecimal(cpu);
        BigDecimal numberOfCPU = new BigDecimal(Runtime.getRuntime()
                .availableProcessors());
        // threads = N_cpu * U_cpu * (1 + wait / compute)
        BigDecimal optimalthreadcount = numberOfCPU.multiply(targetUtilization)
                .multiply(
                        new BigDecimal(1).add(waitTime.divide(computeTime,
                                RoundingMode.HALF_UP)));
        System.out.println("Number of CPU: " + numberOfCPU);
        System.out.println("Target utilization: " + targetUtilization);
        System.out.println("Elapsed time (nanos): " + (testtime * 1000000));
        System.out.println("Compute time (nanos): " + cpu);
        System.out.println("Wait time (nanos): " + wait);
        System.out.println("Formula: " + numberOfCPU + " * "
                + targetUtilization + " * (1 + " + waitTime + " / "
                + computeTime + ")");
        System.out.println("* Optimal thread count: " + optimalthreadcount);
    }

    /**
     * Runs the {@link Runnable} over a period defined in {@link #testtime}.
     * Based on Heinz Kabutz' ideas
     * (http://www.javaspecialists.eu/archive/Issue124.html).
     *
     * @param task
     *            the runnable under investigation
     */
    public void start(Runnable task) {
        long start = 0;
        int runs = 0;
        do {
            if (++runs > 5) {
                throw new IllegalStateException("Test not accurate");
            }
            expired = false;
            start = System.currentTimeMillis();
            Timer timer = new Timer();
            timer.schedule(new TimerTask() {
                @Override
                public void run() {
                    expired = true;
                }
            }, testtime);
            // run the task repeatedly until the timer flips the flag
            while (!expired) {
                task.run();
            }
            start = System.currentTimeMillis() - start;
            timer.cancel();
        } while (Math.abs(start - testtime) > EPSYLON);
        collectGarbage(3);
    }

    private void collectGarbage(int times) {
        for (int i = 0; i < times; i++) {
            System.gc();
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
    }

    /**
     * Calculates the memory usage of a single element in a work queue. Based on
     * Heinz Kabutz' ideas
     * (http://www.javaspecialists.eu/archive/Issue029.html).
     *
     * @return memory usage of a single {@link Runnable} element in the thread
     *         pools work queue
     */
    public long calculateMemoryUsage() {
        BlockingQueue<Runnable> queue = createWorkQueue();
        for (int i = 0; i < SAMPLE_QUEUE_SIZE; i++) {
            queue.add(creatTask());
        }
        long mem0 = Runtime.getRuntime().totalMemory()
                - Runtime.getRuntime().freeMemory();
        long mem1 = Runtime.getRuntime().totalMemory()
                - Runtime.getRuntime().freeMemory();
        queue = null;
        collectGarbage(15);
        // baseline heap usage after the first sample queue has been collected
        mem0 = Runtime.getRuntime().totalMemory()
                - Runtime.getRuntime().freeMemory();
        queue = createWorkQueue();
        for (int i = 0; i < SAMPLE_QUEUE_SIZE; i++) {
            queue.add(creatTask());
        }
        collectGarbage(15);
        // heap usage with a filled sample queue still reachable
        mem1 = Runtime.getRuntime().totalMemory()
                - Runtime.getRuntime().freeMemory();
        return (mem1 - mem0) / SAMPLE_QUEUE_SIZE;
    }

    /**
     * Create your runnable task here.
     *
     * @return an instance of your runnable task under investigation
     */
    protected abstract Runnable creatTask();

    /**
     * Return an instance of the queue used in the thread pool.
     *
     * @return queue instance
     */
    protected abstract BlockingQueue<Runnable> createWorkQueue();

    /**
     * Calculate current cpu time. Various frameworks may be used here,
     * depending on the operating system in use. (e.g.
     * http://www.hyperic.com/products/sigar). The more accurate the CPU time
     * measurement, the more accurate the results for thread count boundaries.
     *
     * @return current cpu time of current thread
     */
    protected abstract long getCurrentThreadCPUTime();

}
```

You then extend this abstract class and implement its three abstract methods. Below is an example I wrote (the task requests data over the network), in which I specify a target CPU utilization of 1.0 (that is, 100%) and a total work queue size of at most 100,000 bytes:

```java
package pool_size_calculate;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.lang.management.ManagementFactory;
import java.math.BigDecimal;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class SimplePoolSizeCaculatorImpl extends PoolSizeCalculator {

    @Override
    protected Runnable creatTask() {
        return new AsyncIOTask();
    }

    @Override
    protected BlockingQueue<Runnable> createWorkQueue() {
        return new LinkedBlockingQueue<Runnable>(1000);
    }

    @Override
    protected long getCurrentThreadCPUTime() {
        // CPU time of the calling thread, in nanoseconds
        return ManagementFactory.getThreadMXBean().getCurrentThreadCpuTime();
    }

    public static void main(String[] args) {
        PoolSizeCalculator poolSizeCalculator = new SimplePoolSizeCaculatorImpl();
        poolSizeCalculator.calculateBoundaries(new BigDecimal(1.0), new BigDecimal(100000));
    }

}

/**
 * Custom asynchronous IO task
 * @author Will
 *
 */
class AsyncIOTask implements Runnable {

    @Override
    public void run() {
        HttpURLConnection connection = null;
        BufferedReader reader = null;
        try {
            String getURL = "http://baidu.com";
            URL getUrl = new URL(getURL);

            connection = (HttpURLConnection) getUrl.openConnection();
            connection.connect();
            reader = new BufferedReader(new InputStreamReader(
                    connection.getInputStream()));

            String line;
            while ((line = reader.readLine()) != null) {
                // empty loop: the response is read and discarded
            }
        } catch (IOException e) {
            // ignored: the task only generates IO load for the measurement
        } finally {
            if (reader != null) {
                try {
                    reader.close();
                } catch (Exception e) {
                    // ignored
                }
            }
            // connection stays null if the URL could not be opened
            if (connection != null) {
                connection.disconnect();
            }
        }
    }

}
```

The output is as follows:

```
Target queue memory usage (bytes): 100000
createTask() produced pool_size_calculate.AsyncIOTask which took 40 bytes in a queue
Formula: 100000 / 40
* Recommended queue capacity (bytes): 2500
Number of CPU: 4
Target utilization: 1
Elapsed time (nanos): 3000000000
Compute time (nanos): 47181000
Wait time (nanos): 2952819000
Formula: 4 * 1 * (1 + 2952819000 / 47181000)
* Optimal thread count: 256
```

The recommended work queue capacity is 2500 and the recommended thread count is 256, which is somewhat surprising. I can construct a thread pool as follows:

```java
ThreadPoolExecutor pool =
    new ThreadPoolExecutor(256, 256, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<Runnable>(2500));
```

**Original article; when reposting, please credit:** reposted from [并發編程網 – ifeve.com](http://ifeve.com/) **Article link:** [How to Reasonably Estimate Thread Pool Size?](http://ifeve.com/how-to-calculate-threadpool-size/)
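As a quick numerical check of the two formulas discussed in this article (the wait-time/CPU-time thread count estimate and the Amdahl speedup bound), here is a minimal sketch in plain Java. The class `PoolSizeFormulas` and its method names are hypothetical, introduced only for illustration. Rounding the thread count up to a whole thread is an assumption of this sketch, not something the article prescribes.

```java
/** Illustrative helper; the class and method names are hypothetical. */
class PoolSizeFormulas {

    /**
     * Thread count estimate: cpus * utilization * (1 + waitTime / cpuTime),
     * rounded up to a whole thread (the rounding is this sketch's assumption).
     * waitTime and cpuTime only need to share the same unit.
     */
    static int optimalThreads(int cpus, double utilization, double waitTime, double cpuTime) {
        if (cpus <= 0 || cpuTime <= 0 || waitTime < 0
                || utilization <= 0 || utilization > 1) {
            throw new IllegalArgumentException("invalid input");
        }
        return (int) Math.ceil(cpus * utilization * (1 + waitTime / cpuTime));
    }

    /** Amdahl upper bound on speedup: 1 / (F + (1 - F) / N). */
    static double amdahlSpeedupBound(double serialFraction, int cpus) {
        if (cpus <= 0 || serialFraction < 0 || serialFraction > 1) {
            throw new IllegalArgumentException("invalid input");
        }
        return 1.0 / (serialFraction + (1.0 - serialFraction) / cpus);
    }

    public static void main(String[] args) {
        // Worked example from the article: 0.5 s CPU, 1.5 s wait, 8 cores.
        System.out.println(optimalThreads(8, 1.0, 1.5, 0.5));
        // Amdahl bound for a 10% serial fraction on 8 and on 64 CPUs.
        System.out.println(amdahlSpeedupBound(0.1, 8));
        System.out.println(amdahlSpeedupBound(0.1, 64));
    }
}
```

With the measured values from the sample output (4 CPUs, wait time 2952819000 ns, compute time 47181000 ns), this sketch returns 255 rather than 256. The difference appears to come from `PoolSizeCalculator` dividing two scale-0 `BigDecimal` values, which rounds the wait/compute ratio to 63 before 1 is added.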