Cell, the basic unit of neural network construction, corresponds to the concept of a neural network layer: an abstract encapsulation of Tensor computation operations that represents the neural network structure more accurately and clearly. Beyond defining the basic Tensor computation flow, a neural network layer also provides functions such as parameter management and state management. Parameters are the core of neural network training and are usually held as internal member variables of a neural network layer. In this section, we systematically introduce parameters, neural network layers, and their usage.
Parameter is a special class of Tensor, namely a variable whose value can be updated during model training. MindSpore provides the mindspore.Parameter class for Parameter construction. In order to distinguish between Parameters used for different purposes, two categories of Parameter are defined:

- Trainable parameters: Tensors whose values are updated through backward propagation during model training; requires_grad needs to be set to True.
- Untrainable parameters: Tensors that do not participate in backward propagation but whose values still need to be updated (e.g. the mean and var variables in BatchNorm); requires_grad needs to be set to False.

Parameter is set to requires_grad=True by default.
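For instance, an untrainable parameter can be declared by passing requires_grad=False at construction time. A minimal sketch (the name moving_mean is purely illustrative):

import numpy as np
import mindspore
from mindspore import Tensor, Parameter

# Excluded from backpropagation; its value must be maintained manually (e.g. via ops.assign).
moving_mean = Parameter(Tensor(np.zeros(3), mindspore.float32), name='moving_mean', requires_grad=False)
print(moving_mean.requires_grad)

False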
We construct a simple fully-connected layer as follows:
import numpy as np
import mindspore
from mindspore import nn
from mindspore import ops
from mindspore import Tensor, Parameter
class Network(nn.Cell):
    def __init__(self):
        super().__init__()
        self.w = Parameter(Tensor(np.random.randn(5, 3), mindspore.float32), name='w')  # weight
        self.b = Parameter(Tensor(np.random.randn(3,), mindspore.float32), name='b')  # bias

    def construct(self, x):
        z = ops.matmul(x, self.w) + self.b
        return z

net = Network()
In the __init__ method of Cell, we define the two parameters w and b and configure their name for namespace management. In the construct method, they are used directly as self.attr to participate in the Tensor operations.
After constructing a neural network layer with Cell and Parameter, we can obtain the Parameters managed by the Cell in several ways.

To get a particular parameter individually, simply access the corresponding member variable of the Python class directly.
print(net.b.asnumpy())
[-1.2192779 -0.36789745 0.0946381 ]
Trainable parameters can be obtained by using the Cell.trainable_params method, and this interface is usually called when configuring the optimizer.
print(net.trainable_params())
[Parameter (name=w, shape=(5, 3), dtype=Float32, requires_grad=True), Parameter (name=b, shape=(3,), dtype=Float32, requires_grad=True)]
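For example, the returned list is typically passed straight to an optimizer. A minimal sketch (the learning rate 0.01 is an arbitrary illustrative choice):

# The optimizer will update exactly the parameters returned by trainable_params().
optimizer = nn.SGD(net.trainable_params(), learning_rate=0.01)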
Use the Cell.get_parameters() method to get all parameters; a Python iterator is returned.
print(type(net.get_parameters()))
<class 'generator'>
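Because a generator is returned, the parameters are obtained by traversing it. A small sketch:

# Traverse the iterator; each element is a Parameter object.
for param in net.get_parameters():
    print(param.name, param.shape)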
Alternatively, Cell.parameters_and_names can be called to return both the parameter names and the parameters.
for name, param in net.parameters_and_names():
    print(f"{name}:\n{param.asnumpy()}")
w:
[[ 4.15680408e-02 -1.20311625e-01 5.02573885e-02]
[ 1.22175144e-04 -1.34980649e-01 1.17642188e+00]
[ 7.57667869e-02 -1.74758151e-01 -5.19092619e-01]
[-1.67846107e+00 3.27240258e-01 -2.06452996e-01]
[ 5.72323874e-02 -8.27963874e-02 5.94243526e-01]]
b:
[-1.2192779 -0.36789745 0.0946381 ]
Parameter is a special kind of Tensor, so its value can be modified by Tensor index assignment.
net.b[0] = 1.
print(net.b.asnumpy())
[ 1. -0.36789745 0.0946381 ]
The Parameter.set_data method can be called to overwrite the Parameter with a Tensor of the same shape. This method is commonly used to initialize a Cell's parameters by traversal with Initializer.
net.b.set_data(Tensor([3, 4, 5]))
print(net.b.asnumpy())
[3. 4. 5.]
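A sketch of the traversal-initialization pattern mentioned above (the 'normal' initializer string is one illustrative choice among MindSpore's registered initializers):

from mindspore.common.initializer import initializer

# Re-initialize every parameter in the Cell, keeping its shape and dtype.
for name, param in net.parameters_and_names():
    param.set_data(initializer('normal', param.shape, param.dtype))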
The main role of parameters is to have their values updated during model training, which involves modifying parameters at runtime: after backward propagation obtains the gradients, or when untrainable parameters need to be updated. Because of the compiled design of MindSpore's computational graph, the mindspore.ops.assign interface must be used at this point to assign values to parameters. This method is commonly used in custom optimizer scenarios. The following is a simple example of modifying parameter values at runtime:
import mindspore as ms
@ms.jit
def modify_parameter():
    b_hat = ms.Tensor([7, 8, 9])
    ops.assign(net.b, b_hat)
    return True

modify_parameter()
print(net.b.asnumpy())
[7. 8. 9.]
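To illustrate the custom optimizer scenario mentioned above, here is a minimal sketch of an SGD-style optimizer that writes its updates back with ops.assign (simplified for clarity; a production optimizer would typically use MindSpore's HyperMap and handle learning-rate schedules and weight decay):

class SimpleSGD(nn.Optimizer):
    def __init__(self, params, learning_rate=0.01):
        super().__init__(learning_rate, params)

    def construct(self, gradients):
        lr = self.get_lr()
        # Write each updated value back into the corresponding Parameter.
        for param, grad in zip(self.parameters, gradients):
            ops.assign(param, param - lr * grad)
        return True

opt = SimpleSGD(net.trainable_params())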
ParameterTuple, a tuple of variables, is used to store multiple Parameters. It inherits from tuple and additionally provides a clone function.

The following example shows how to create a ParameterTuple:
from mindspore.common.initializer import initializer
from mindspore import ParameterTuple
# Creation
x = Parameter(default_input=ms.Tensor(np.arange(2 * 3).reshape((2, 3))), name="x")
y = Parameter(default_input=initializer('ones', [1, 2, 3], ms.float32), name='y')
z = Parameter(default_input=2.0, name='z')
params = ParameterTuple((x, y, z))
# Clone from params and change the name to "params_copy"
params_copy = params.clone("params_copy")
print(params)
print(params_copy)
(Parameter (name=x, shape=(2, 3), dtype=Int64, requires_grad=True), Parameter (name=y, shape=(1, 2, 3), dtype=Float32, requires_grad=True), Parameter (name=z, shape=(), dtype=Float32, requires_grad=True))
(Parameter (name=params_copy.x, shape=(2, 3), dtype=Int64, requires_grad=True), Parameter (name=params_copy.y, shape=(1, 2, 3), dtype=Float32, requires_grad=True), Parameter (name=params_copy.z, shape=(), dtype=Float32, requires_grad=True))
Some Tensor operations in neural networks do not behave the same during training and inference. For example, nn.Dropout performs random dropout during training but not during inference, and nn.BatchNorm updates the mean and var variables during training but keeps their values fixed during inference. We can therefore set the state of the neural network through the Cell.set_train interface.

When set_train is set to True, the neural network state is train; True is also the default value of the set_train interface:
net.set_train()
print(net.phase)
train
When set_train is set to False, the neural network state is predict:
net.set_train(False)
print(net.phase)
predict
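The difference between the two states can be observed with a Dropout layer. A small sketch (this assumes a MindSpore version in which nn.Dropout takes keep_prob, consistent with the ops.Dropout2D usage below):

drop = nn.Dropout(keep_prob=0.5)
x = Tensor(np.ones((2, 4)), mindspore.float32)

drop.set_train(True)
print(drop(x))  # train state: elements randomly zeroed, survivors scaled by 1/keep_prob

drop.set_train(False)
print(drop(x))  # predict state: identity, returns x unchanged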
Normally, the neural network layer interfaces and function interfaces provided by MindSpore can meet model construction requirements, but since the AI field is constantly evolving, it is possible to encounter new network structures that have no built-in modules. In this case, we can customize the neural network layer through the function interfaces and Primitive operators provided by MindSpore, and we can use the Cell.bprop method to customize the backward computation. Each of these customization methods is detailed below.

MindSpore provides a large number of basic function interfaces, which can be used to construct complex Tensor operations encapsulated as neural network layers. The following takes Threshold as an example, defined by the equation:
$$ y =\begin{cases} x, &\text{ if } x > \text{threshold} \\ \text{value}, &\text{ otherwise } \end{cases} $$
It can be seen that Threshold checks whether each value of the Tensor is greater than the threshold value, keeping the values for which the check is True and replacing with value those for which it is False. The corresponding implementation is as follows:
class Threshold(nn.Cell):
    def __init__(self, threshold, value):
        super().__init__()
        self.threshold = threshold
        self.value = value

    def construct(self, inputs):
        cond = ops.gt(inputs, self.threshold)
        value = ops.fill(inputs.dtype, inputs.shape, self.value)
        return ops.select(cond, inputs, value)
Here ops.gt, ops.fill, and ops.select implement the comparison, the construction of the replacement values, and the element-wise selection, respectively. The custom Threshold layer behaves as follows:
m = Threshold(0.1, 20)
inputs = mindspore.Tensor([0.1, 0.2, 0.3], mindspore.float32)
m(inputs)
Tensor(shape=[3], dtype=Float32, value= [ 2.00000000e+01, 2.00000003e-01, 3.00000012e-01])
It can be seen that inputs[0] = threshold, so the condition x > threshold does not hold and the value is replaced with 20.
In special scenarios, we not only need to customize the forward logic of a neural network layer but also want to control the computation of its backward pass manually, which we can define through the Cell.bprop interface. This capability is useful in scenarios such as designing new network structures and optimizing backward propagation speed. In the following, we take Dropout2d as an example to introduce custom Cell backward computation.
class Dropout2d(nn.Cell):
    def __init__(self, keep_prob):
        super().__init__()
        self.keep_prob = keep_prob
        self.dropout2d = ops.Dropout2D(keep_prob)

    def construct(self, x):
        # ops.Dropout2D returns an (output, mask) tuple.
        return self.dropout2d(x)

    def bprop(self, x, out, dout):
        _, mask = out
        dy, _ = dout
        if self.keep_prob != 0:
            dy = dy * (1 / self.keep_prob)
        dy = mask.astype(mindspore.float32) * dy
        return (dy.astype(x.dtype),)

dropout_2d = Dropout2d(0.8)
dropout_2d.bprop_debug = True
The bprop method has three input parameters:

- x: the forward input.
- out: the forward output.
- dout: the gradient passed back from the downstream backward propagation.
Generally, we need to compute the backward result from the forward output and the gradient propagated from the downstream layer according to the backward derivative formula, and return it. The backward computation of Dropout2d masks the incoming gradient with the mask matrix from the forward output and then scales it according to keep_prob; the implementation above yields the correct result.
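As a quick check, bprop can be invoked directly as an ordinary Python method to inspect the masking-and-scaling logic. A sketch in which the all-ones dout merely stands in for the upstream gradient (during real training the framework calls bprop automatically):

x = Tensor(np.ones((1, 2, 3, 3)), mindspore.float32)
out = dropout_2d(x)                      # (output, mask) tuple from the forward pass
dout = (ops.ones_like(out[0]), out[1])   # stand-in upstream gradient; second element is unused
(dx,) = dropout_2d.bprop(x, out, dout)
print(dx.shape)  # same shape as the input: (1, 2, 3, 3)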