8.2 異步計算 · 動手學深度學習--pytorch

# 8.2 異步計算此節內容對應PyTorch的版本本人沒怎么用過，網上參考資料也比較少，所以略:)，有興趣的可以去看看[原文](https://zh.d2l.ai/chapter_computational-performance/async-computation.html)。關于PyTorch的異步執行我只在[官方文檔](https://pytorch.org/docs/stable/notes/cuda.html)找到了一段: > By default, GPU operations are asynchronous. When you call a function that uses the GPU, the operations are enqueued to the particular device, but not necessarily executed until later. This allows us to execute more computations in parallel, including operations on CPU or other GPUs. In general, the effect of asynchronous computation is invisible to the caller, because (1) each device executes operations in the order they are queued, and (2) PyTorch automatically performs necessary synchronization when copying data between CPU and GPU or between two GPUs. Hence, computation will proceed as if every operation was executed synchronously. You can force synchronous computation by setting environment variable CUDA_LAUNCH_BLOCKING=1. This can be handy when an error occurs on the GPU. (With asynchronous execution, such an error isn’t reported until after the operation is actually executed, so the stack trace does not show where it was requested.) 大致翻譯一下就是: 默認情況下，PyTorch中的 GPU 操作是異步的。當調用一個使用 GPU 的函數時，這些操作會在特定的設備上排隊但不一定會在稍后立即執行。這就使我們可以并行更多的計算，包括 CPU 或其他 GPU 上的操作。一般情況下，異步計算的效果對調用者是不可見的，因為（1）每個設備按照它們排隊的順序執行操作，（2）在 CPU 和 GPU 之間或兩個 GPU 之間復制數據時，PyTorch會自動執行必要的同步操作。因此，計算將按每個操作同步執行的方式進行。可以通過設置環境變量`CUDA_LAUNCH_BLOCKING = 1`來強制進行同步計算。當 GPU 產生error時，這可能非常有用。（異步執行時，只有在實際執行操作之后才會報告此類錯誤，因此堆棧跟蹤不會顯示請求的位置。）