# Extending PyTorch
This note covers how to extend `torch.nn` and `torch.autograd`, and how to write custom `C` extensions using our `C` libraries.
## Extending torch.autograd
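To add a new `Operation` to `autograd`, your `Operation` needs to subclass `Function`. `autograd` uses a `Function` to compute results and gradients, and to encode the history of operations. Every new operation (function) needs to implement three methods:
- `__init__` (optional) - if your operation takes non-`Variable` arguments, pass them to the operation as arguments of `__init__`. For example, an `AddConstant` function takes the constant to add, and a `Transpose` function needs to know which two dimensions to swap. If your operation needs no extra arguments, you can skip `__init__`.
- `forward()` - the code that performs the operation. It can take any number of arguments; arguments for which you specify a default value are optional. Keep in mind that the arguments of `forward()` can only be `Variable`s. The return value can be either a single `Variable` or a `tuple` of `Variable`s. Also, refer to the docs of `Function` to find out which methods can be called only from `forward()`.
- `backward()` - the gradient formula. It receives as many arguments as `forward()` returned outputs, each one being the gradient passed back to this operation for the corresponding output. It should return as many values as the operation had inputs, each one being the gradient with respect to the corresponding input. If an input does not need a gradient, or is not differentiable, you can return `None` for it. If `forward()` has optional arguments, you may return more gradients than there were inputs, as long as the extra ones are `None`.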
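The contract between these three methods can be sketched without PyTorch at all. The plain-Python classes below are hypothetical stand-ins: they do not inherit from `Function` and they operate on floats rather than `Variable`s. They only illustrate where a non-`Variable` argument goes and how many gradients `backward()` returns.

```python
class AddConstant:
    """Toy op: y = x + constant. The constant is not a differentiable input."""

    def __init__(self, constant):
        # Non-differentiable configuration is passed through __init__.
        self.constant = constant

    def forward(self, x):
        return x + self.constant

    def backward(self, grad_output):
        # One input, so one returned gradient (dy/dx = 1).
        return grad_output


class Mul:
    """Toy op: y = a * b, with two differentiable inputs."""

    def forward(self, a, b):
        # Remember the inputs; the gradient formula needs them.
        self.saved = (a, b)
        return a * b

    def backward(self, grad_output):
        a, b = self.saved
        # Two inputs, so two returned gradients, in the same order.
        return grad_output * b, grad_output * a


add_three = AddConstant(3.0)
y = add_three.forward(2.0)
grad_x = add_three.backward(1.0)

mul = Mul()
z = mul.forward(2.0, 4.0)
grad_a, grad_b = mul.backward(1.0)
```

`Mul` has one output, so its `backward()` takes a single `grad_output`, and it has two inputs, so it returns two gradients.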
Below is the code for a `Linear` function:
```
from torch.autograd import Function


# Inherit from Function
class Linear(Function):

    # bias is an optional argument
    def forward(self, input, weight, bias=None):
        self.save_for_backward(input, weight, bias)
        output = input.mm(weight.t())
        if bias is not None:
            output += bias.unsqueeze(0).expand_as(output)
        return output

    # This function has only a single output, so it gets only one gradient
    def backward(self, grad_output):
        # This is a pattern that is very convenient - at the top of backward
        # unpack saved_tensors and initialize all gradients w.r.t. inputs to
        # None. Thanks to the fact that additional trailing Nones are
        # ignored, the return statement is simple even when the function has
        # optional inputs.
        input, weight, bias = self.saved_tensors
        grad_input = grad_weight = grad_bias = None

        # These needs_input_grad checks are optional and there only to
        # improve efficiency. If you want to make your code simpler, you can
        # skip them. Returning gradients for inputs that don't require it is
        # not an error.
        if self.needs_input_grad[0]:
            grad_input = grad_output.mm(weight)
        if self.needs_input_grad[1]:
            grad_weight = grad_output.t().mm(input)
        if bias is not None and self.needs_input_grad[2]:
            grad_bias = grad_output.sum(0).squeeze(0)

        return grad_input, grad_weight, grad_bias
```
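The gradient expressions in `backward()` follow from the chain rule. The notation here is introduced for illustration and is not part of the original code: $X$ stands for `input`, $W$ for `weight`, $b$ for `bias`, $Y$ for `output`, $L$ for some scalar loss computed downstream, $G = \partial L / \partial Y$ for `grad_output`, and $\mathbf{1}$ for a column of ones with one entry per row of $X$.

$$
\begin{aligned}
Y &= X W^{\top} + \mathbf{1}\, b^{\top}, \\
\frac{\partial L}{\partial X} &= G\, W, \\
\frac{\partial L}{\partial W} &= G^{\top} X, \\
\frac{\partial L}{\partial b} &= G^{\top} \mathbf{1}.
\end{aligned}
$$

These correspond to `grad_output.mm(weight)`, `grad_output.t().mm(input)` and `grad_output.sum(0)` respectively. The shapes line up: with $X$ of size $N \times d_{in}$ and $W$ of size $d_{out} \times d_{in}$, $G$ is $N \times d_{out}$, so each gradient has the same shape as the input it belongs to.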
To make the custom operation easier to use, we recommend wrapping it in a small helper function:
```
def linear(input, weight, bias=None):
    # The first pair of parentheses creates a Function object. Any arguments
    # given here will be passed to __init__. The second pair invokes the
    # __call__ operator, which then uses forward() to compute the result and
    # return it.
    return Linear()(input, weight, bias)
```
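You may want to check whether the `backward` method you just implemented computes the gradients correctly. You can do so by comparing it against numerical approximations obtained with small finite differences: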
```
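import torch
from torch.autograd import Variable
from torch.autograd.gradcheck import gradcheck

# gradcheck takes a tuple of tensors as input, checks whether the gradients
# evaluated with these tensors are close enough to numerical
# approximations, and returns True if they all satisfy this condition.
# Linear needs both an input and a weight; for a 20x20 input the weight
# must have 20 columns (30 output features are used here).
input = (Variable(torch.randn(20, 20).double(), requires_grad=True),
         Variable(torch.randn(30, 20).double(), requires_grad=True))
test = gradcheck(Linear(), input, eps=1e-6, atol=1e-4)
print(test)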
```
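The idea behind such a check can be sketched in plain Python, without PyTorch. The names `f`, `analytic_grad`, `numerical_grad` and `check` are hypothetical, chosen for illustration, and this is not how `gradcheck` is implemented internally. The sketch only shows a central finite difference being compared with a hand-written gradient under an absolute tolerance.

```python
def f(x):
    """Scalar test function: f(x) = sum_i x_i^3."""
    return sum(v ** 3 for v in x)


def analytic_grad(x):
    """Hand-derived gradient of f: df/dx_i = 3 * x_i^2."""
    return [3.0 * v ** 2 for v in x]


def numerical_grad(func, x, eps=1e-6):
    """Central-difference estimate of the gradient of func at x."""
    grad = []
    for i in range(len(x)):
        plus = list(x)
        minus = list(x)
        plus[i] += eps
        minus[i] -= eps
        grad.append((func(plus) - func(minus)) / (2.0 * eps))
    return grad


def check(func, grad_func, x, eps=1e-6, atol=1e-4):
    """Return True if every analytic component is within atol of the estimate."""
    numeric = numerical_grad(func, x, eps)
    analytic = grad_func(x)
    return all(abs(n - a) <= atol for n, a in zip(numeric, analytic))


point = [0.5, -1.25, 2.0]
ok = check(f, analytic_grad, point)

# A deliberately wrong gradient (2 * x^2 instead of 3 * x^2) should be rejected.
wrong = check(f, lambda x: [2.0 * v ** 2 for v in x], point)
print(ok, wrong)
```

Double precision matters for this kind of test: with `eps` as small as `1e-6`, single-precision rounding error would swamp the difference being measured, which is why the `gradcheck` inputs above are converted with `.double()`.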
## Extending torch.nn
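`nn` exports two kinds of interfaces: modules and their functional versions. You can extend it either way, but we recommend using modules for layers of any kind, because modules hold parameters and buffers. For operations that need no parameters (activation functions, pooling and the like), we recommend the functional form.
Adding a functional version of an operation is already fully covered in the section above.
Adding a `Module`. Since `nn` relies heavily on `autograd`, adding a new `Module` requires implementing a `Function` that performs the operation and can compute the gradient. From here on, assume that we want to implement a `Linear` module and that we already have the `Linear` function implemented as in the listing above. Very little code is needed to add it. Two methods need to be implemented:
- `__init__` (optional) - takes arguments such as kernel sizes, numbers of features, and so on, and initializes parameters and buffers.
- `forward()` - instantiates a `Function` and uses it to perform the operation. It is very similar to the functional wrapper shown above.
This is how a `Linear` module can be implemented: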
```
import torch
import torch.nn as nn

# Keep a handle on the autograd Function defined earlier, because the
# module below reuses the name Linear.
LinearFunction = Linear


class Linear(nn.Module):
    def __init__(self, input_features, output_features, bias=True):
        super(Linear, self).__init__()
        self.input_features = input_features
        self.output_features = output_features

        # nn.Parameter is a special kind of Variable, that will get
        # automatically registered as Module's parameter once it's assigned
        # as an attribute. Parameters and buffers need to be registered, or
        # they won't appear in .parameters() (doesn't apply to buffers), and
        # won't be converted when e.g. .cuda() is called. You can use
        # .register_buffer() to register buffers.
        # nn.Parameters can never be volatile and, different than Variables,
        # they require gradients by default.
        # forward() computes input.mm(weight.t()), so the weight is stored
        # as (output_features, input_features).
        self.weight = nn.Parameter(torch.Tensor(output_features, input_features))
        if bias:
            self.bias = nn.Parameter(torch.Tensor(output_features))
        else:
            # You should always register all possible parameters, but the
            # optional ones can be None if you want.
            self.register_parameter('bias', None)

        # Not a very smart way to initialize weights
        self.weight.data.uniform_(-0.1, 0.1)
        if self.bias is not None:
            self.bias.data.uniform_(-0.1, 0.1)

    def forward(self, input):
        # See the autograd section for explanation of what happens here.
        return LinearFunction()(input, self.weight, self.bias)
```
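The comments above mention that a parameter is registered automatically when it is assigned as an attribute, and that an optional parameter can be registered as `None`. The plain-Python sketch below is a simplified, hypothetical model of that behaviour, not PyTorch's actual implementation: `Param`, `MiniModule` and `TinyLinear` are invented names, and only the registration bookkeeping is modelled.

```python
class Param:
    """Hypothetical stand-in for nn.Parameter: it just wraps a value."""

    def __init__(self, value):
        self.value = value


class MiniModule:
    """Toy container that registers Param attributes on assignment."""

    def __init__(self):
        # Bypass our own __setattr__ while creating the registry itself.
        object.__setattr__(self, "_parameters", {})

    def __setattr__(self, name, value):
        # Assigning a Param as an attribute also records it in the registry.
        if isinstance(value, Param):
            self._parameters[name] = value
        object.__setattr__(self, name, value)

    def register_parameter(self, name, param):
        # Optional parameters may be registered as None.
        self._parameters[name] = param
        object.__setattr__(self, name, param)

    def parameters(self):
        # Skip optional parameters that were registered as None.
        return [p for p in self._parameters.values() if p is not None]


class TinyLinear(MiniModule):
    def __init__(self, use_bias=True):
        super().__init__()  # must run first so the registry exists
        self.weight = Param([[0.1, 0.2]])
        if use_bias:
            self.bias = Param([0.0])
        else:
            self.register_parameter("bias", None)


with_bias = TinyLinear(use_bias=True)
without_bias = TinyLinear(use_bias=False)
n_with = len(with_bias.parameters())
n_without = len(without_bias.parameters())
```

In this toy model, registering `bias` as `None` keeps the attribute defined, so code such as `forward()` can read `self.bias` in both configurations, while `parameters()` reports only the entries that actually exist.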
## Writing custom `C` extensions
Coming soon. For now you can find an example at [GitHub](https://github.com/pytorch/extension-ffi).