Keras is a deep learning framework that lets you assemble a neural network like building blocks. The workflow is divided into seven main parts, each of which can be implemented with just a few Keras API functions, so users can build a neural network model layer by layer.
1. Create the model
2. Add layers
3. Compile the model
4. Fill in data and fit
5. Evaluate the model
6. Predict with the model
7. Save the model
The following sections introduce each part in detail.
Keras has three main kinds of models: the Sequential model, the functional (Model) API, and subclassed models.
Before creating a model, you need to import the tensorflow and keras modules, and then create a Sequential model.
The Sequential API is defined as follows:
The layers argument can be left empty; layers are then added to the model with the add method and removed from the model with the pop method.
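A minimal sketch of this workflow (the layer sizes and shapes here are illustrative, not taken from the original):

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential()                 # layers argument left empty
model.add(layers.Dense(64, activation="relu", input_shape=(100,)))
model.add(layers.Dense(10, activation="softmax"))
model.pop()                                # removes the last layer again
print(len(model.layers))                   # 1
```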
To create a functional API model, you call keras.Model, which lets you specify a model with multiple inputs and multiple outputs.
keras.Model definition:
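A hedged sketch of a functional-API model with two inputs and two outputs; the input names, shapes, and output heads are made up for illustration:

```python
from tensorflow import keras
from tensorflow.keras import layers

text_in = keras.Input(shape=(32,), name="text")
meta_in = keras.Input(shape=(8,), name="meta")

# Shared trunk over the concatenated inputs.
x = layers.concatenate([text_in, meta_in])
x = layers.Dense(64, activation="relu")(x)

# Two output heads: a regression score and a 3-way classification.
score_out = layers.Dense(1, name="score")(x)
class_out = layers.Dense(3, activation="softmax", name="label")(x)

model = keras.Model(inputs=[text_in, meta_in], outputs=[score_out, class_out])
model.summary()
```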
A layer is the basic building block of a neural network. A layer encapsulates a tensor-in/tensor-out computation together with some state, which is held in TensorFlow variables (the layer's weights).
Layers fall into six main categories: base layers, core layers, convolutional layers, pooling layers, recurrent layers, and merge layers.
A derived layer class can be implemented through the following methods:
__init__(): define the layer's attributes and create its input-independent state.
build(self, input_shape): create variables that depend on the input shape; add_weight() can be called here.
call(self, *args, **kwargs): performs the layer's computation; it is invoked only after build() is guaranteed to have been called.
get_config(self): returns a dictionary containing the configuration used to initialize this layer.
Create a SimpleDense derived class and add trainable weights in the build() function; the layer computes y = inputs * w + b.
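A sketch of such a subclass, following the pattern just described (the unit count and test input are arbitrary):

```python
import tensorflow as tf
from tensorflow import keras

class SimpleDense(keras.layers.Layer):
    def __init__(self, units=32):
        super().__init__()
        self.units = units

    def build(self, input_shape):
        # Trainable weight w and bias b are created once the input shape is known.
        self.w = self.add_weight(shape=(input_shape[-1], self.units),
                                 initializer="random_normal", trainable=True)
        self.b = self.add_weight(shape=(self.units,),
                                 initializer="zeros", trainable=True)

    def call(self, inputs):
        # y = inputs * w + b
        return tf.matmul(inputs, self.w) + self.b

layer = SimpleDense(4)
y = layer(tf.ones((2, 2)))     # build() is called automatically on first use
print(y.shape)                 # (2, 4)
print(len(layer.weights))      # 2 trainable weights: w and b
```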
Result output:
Create a ComputeSum derived class and add a non-trainable weight in the __init__ function. The layer sums the elements of the input along axis 0 and accumulates the result into self.total.
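A sketch of this subclass (the input dimension and test input are illustrative):

```python
import tensorflow as tf
from tensorflow import keras

class ComputeSum(keras.layers.Layer):
    def __init__(self, input_dim):
        super().__init__()
        # Non-trainable weight created in __init__, accumulated across calls.
        self.total = tf.Variable(initial_value=tf.zeros((input_dim,)),
                                 trainable=False)

    def call(self, inputs):
        # Sum the input along axis 0 and add it into the running total.
        self.total.assign_add(tf.reduce_sum(inputs, axis=0))
        return self.total

layer = ComputeSum(2)
x = tf.ones((2, 2))
print(layer(x).numpy())   # [2. 2.]
print(layer(x).numpy())   # [4. 4.]  (the total keeps accumulating)
```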
Result output:
The core layers are the most commonly used layers for data transformation and processing.
The Dense layer is the so-called fully connected neural network layer, or fully connected layer for short. Each neuron in a fully connected layer is connected to every neuron in the previous layer.
Dense implements the operation output = activation(dot(input, kernel) + bias), where activation is the element-wise activation function, kernel is the weight matrix created by the layer, and bias is the bias vector it creates (only applicable when use_bias is True).
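A small illustrative use of Dense (the batch size and feature sizes are arbitrary):

```python
import tensorflow as tf
from tensorflow.keras import layers

x = tf.random.normal((4, 16))                # batch of 4 samples, 16 features
dense = layers.Dense(32, activation="relu", use_bias=True)
y = dense(x)
print(y.shape)                               # (4, 32)
print(dense.kernel.shape, dense.bias.shape)  # (16, 32) (32,)
```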
Activation applies an activation function to the output; it performs the arithmetic processing of the signal after it enters a neuron.
The comparison curves of sigmoid, tanh, ReLU and softplus are shown in the following figure:
The activation function can be applied either through a separate Activation layer or through the activation argument passed when constructing a layer object:
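A sketch of the two equivalent styles (layer sizes are illustrative):

```python
from tensorflow import keras
from tensorflow.keras import layers

# Style 1: a separate Activation layer after the Dense layer.
model_a = keras.Sequential([
    layers.Dense(64, input_shape=(20,)),
    layers.Activation("relu"),
])

# Style 2: the activation argument of the Dense layer itself.
model_b = keras.Sequential([
    layers.Dense(64, activation="relu", input_shape=(20,)),
])
```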
Dropout randomly sets input units to 0 with frequency rate at each update during training, which helps prevent overfitting. Inputs not set to 0 are scaled up by 1/(1 - rate), so that the sum over all inputs is unchanged.
Note that the Dropout layer only applies when training is set to True, so no values are dropped during inference. When using model.fit, training is automatically set to True.
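A sketch showing the training/inference difference (the input data is arbitrary):

```python
import tensorflow as tf
from tensorflow.keras import layers

data = tf.reshape(tf.range(10, dtype=tf.float32), (5, 2))
dropout = layers.Dropout(rate=0.2)

# training=True: roughly 20% of units are zeroed, the rest scaled by 1/(1 - 0.2).
print(dropout(data, training=True).numpy())
# training=False (inference): the input passes through unchanged.
print(dropout(data, training=False).numpy())
```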
Flatten flattens the input and does not affect the batch size. Note: if the input shape is (batch,) with no feature axis, flattening adds a channel dimension and the output shape is (batch, 1).
Reshape adjusts the input to a given target shape.
Lambda encapsulates an arbitrary expression as a Layer object, so that any TensorFlow function can be used when building the model. The Lambda layer is best suited to simple operations or quick experiments. Lambda layers are saved by serializing the Python bytecode.
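A minimal sketch wrapping an arbitrary expression (element-wise squaring, chosen only for illustration) in a Lambda layer:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Lambda(lambda x: x ** 2, input_shape=(3,)),
])

x = np.array([[1.0, 2.0, 3.0]], dtype="float32")
print(model.predict(x))   # [[1. 4. 9.]]
```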
Masking masks a sequence with a mask value in order to skip time steps.
For each time step of the input tensor (axis 1 of the tensor, the time axis), if all values of the input tensor at that time step are equal to mask_value, the time step will be masked (skipped) in all downstream layers. If any downstream layer does not support masking but still receives such a mask, an exception will be raised.
For example:
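A sketch along the lines of the standard masking pattern; the shapes, the masked time steps, and the downstream LSTM are illustrative assumptions:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

samples, timesteps, features = 32, 10, 8
inputs = np.random.random((samples, timesteps, features)).astype("float32")
# Set time steps 3 and 5 of every sample to the mask value 0.
inputs[:, 3, :] = 0.0
inputs[:, 5, :] = 0.0

model = tf.keras.Sequential([
    layers.Masking(mask_value=0.0, input_shape=(timesteps, features)),
    layers.LSTM(32),                 # the LSTM skips the masked time steps
])
output = model(inputs)
print(output.shape)                  # (32, 32)
```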
Embedding is a way of converting discrete variables into continuous vector representations. This layer can only be used as the first layer in a model.
Embedding has three main uses: finding nearest neighbors in the embedding space, which can be used to make recommendations based on user interests; serving as input to a supervised learning task; and visualizing the relationships between different discrete variables.
For example:
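A sketch of an Embedding layer as the first layer of a model (vocabulary size, embedding dimension, and sequence length are illustrative):

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

# Embeds a vocabulary of 1000 token ids into 64-dimensional vectors.
model = tf.keras.Sequential([
    layers.Embedding(input_dim=1000, output_dim=64),
])

# Batch of 32 sequences, each 10 integer ids in [0, 1000).
x = np.random.randint(1000, size=(32, 10))
y = model(x)
print(y.shape)   # (32, 10, 64)
```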
Output result:
From the Wikipedia definition, convolution is an operation on two functions f and g that produces a new function. The convolution of f and g is written f ∗ g, and its mathematical definition is:
(f ∗ g)(t) = ∫ f(τ) g(t − τ) dτ
Convolution can be interpreted in different ways: g can be regarded as the kernel in deep learning, or the filter in signal processing; f can be what we call the feature in machine learning, or the signal in signal processing. The convolution f ∗ g is a weighted sum of f, with the weights given by g.
One-dimensional time domain convolution operation;
Two-dimensional image convolution operation;
The purpose of convolution operation is to extract different features of input. The first convolution layer may only extract some low-level features such as edges, straight lines and corners, and more layers of networks can iteratively extract more complex features from low-level features.
A one-dimensional convolution layer (temporal convolution) performs neighborhood filtering on a one-dimensional input signal.
For example:
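A sketch of Conv1D (the batch size, sequence length, and channel counts are illustrative):

```python
import tensorflow as tf
from tensorflow.keras import layers

# 4 sequences, 10 time steps, 128 channels each.
x = tf.random.normal((4, 10, 128))
y = layers.Conv1D(filters=32, kernel_size=3, activation="relu")(x)
print(y.shape)   # (4, 8, 32) -- 'valid' padding shortens the time axis by 2
```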
Result output:
2D convolution layer (e.g. spatial convolution of images).
For example:
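A sketch of Conv2D (the image size and channel counts are illustrative):

```python
import tensorflow as tf
from tensorflow.keras import layers

# 4 images of size 28x28 with 3 channels.
x = tf.random.normal((4, 28, 28, 3))
y = layers.Conv2D(filters=2, kernel_size=3, activation="relu")(x)
print(y.shape)   # (4, 26, 26, 2)
```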
Result output:
3D convolution layer (for example, spatial convolution over volumes).
For example:
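A sketch of Conv3D (the volume size and channel counts are illustrative):

```python
import tensorflow as tf
from tensorflow.keras import layers

# 4 volumes of shape 28x28x28 with a single channel.
x = tf.random.normal((4, 28, 28, 28, 1))
y = layers.Conv3D(filters=2, kernel_size=3, activation="relu")(x)
print(y.shape)   # (4, 26, 26, 26, 2)
```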
Result output:
Depthwise separable 1D convolution. This layer performs a depthwise convolution that acts on the channels separately, followed by a pointwise convolution that mixes the channels. If use_bias is True and a bias initializer is provided, it adds a bias vector to the output. It then optionally applies an activation function to produce the final output.
Depthwise separable 2D convolution. Separable convolution first performs a depthwise spatial convolution (which acts on each input channel separately), followed by a pointwise convolution that mixes the resulting output channels. The depth_multiplier argument controls how many output channels are generated per input channel in the depthwise step.
Intuitively, separable convolution can be understood as a way of factoring a convolution kernel into two smaller kernels, or as an extreme version of an Inception block.
Transposed convolution layer (sometimes called deconvolution). The need for transposed convolution usually arises from wanting to use a transformation in the opposite direction of a normal convolution, i.e. to go from something that has the shape of a convolution's output to something that has the shape of its input, while maintaining a connectivity pattern compatible with that convolution.
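A sketch contrasting the separable and transposed variants (all shapes and filter counts are illustrative):

```python
import tensorflow as tf
from tensorflow.keras import layers

x = tf.random.normal((4, 28, 28, 3))

# Depthwise-separable convolution: depthwise step, then 1x1 pointwise mixing.
y_sep = layers.SeparableConv2D(filters=16, kernel_size=3,
                               depth_multiplier=1, padding="same")(x)
print(y_sep.shape)      # (4, 28, 28, 16)

# Transposed convolution: maps toward the "input" size of a strided convolution.
y_up = layers.Conv2DTranspose(filters=8, kernel_size=3,
                              strides=2, padding="same")(x)
print(y_up.shape)       # (4, 56, 56, 8) -- spatial size doubled
```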
The pooling layer imitates the human visual system in reducing the dimensionality of the data and representing the image with higher-level features. The purposes of pooling are to reduce information redundancy, to improve the scale and rotation invariance of the model, and to prevent overfitting.
The most common variants are the max pooling layer and the average pooling layer.
Pooling layers come in three forms: 1D for one-dimensional data, 2D for two-dimensional image data, and 3D for volumetric or time-sequenced image data.
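A sketch of max and average pooling in 2D (the input shape is illustrative):

```python
import tensorflow as tf
from tensorflow.keras import layers

x = tf.random.normal((4, 28, 28, 3))

print(layers.MaxPooling2D(pool_size=2)(x).shape)       # (4, 14, 14, 3)
print(layers.AveragePooling2D(pool_size=2)(x).shape)   # (4, 14, 14, 3)
```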
The Recurrent Neural Network (RNN) is based on the idea of a memory model: the network is expected to remember earlier features and use them to infer later results, and its overall structure loops back on itself, hence the name RNN.
Long Short-Term Memory (LSTM) was first published in 1997. Because of its unique design, LSTM is well suited to processing and predicting important events separated by long intervals and delays in a time series.
For example:
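A sketch of the LSTM layer (batch size, sequence length, feature count, and unit count are illustrative):

```python
import tensorflow as tf
from tensorflow.keras import layers

# 32 sequences of 10 time steps with 8 features each.
inputs = tf.random.normal((32, 10, 8))
lstm = layers.LSTM(4)
print(lstm(inputs).shape)   # (32, 4)

# return_sequences/return_state also expose per-step outputs and final states.
lstm2 = layers.LSTM(4, return_sequences=True, return_state=True)
whole_seq, final_h, final_c = lstm2(inputs)
print(whole_seq.shape, final_h.shape, final_c.shape)   # (32, 10, 4) (32, 4) (32, 4)
```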
Result output:
GRU (Gated Recurrent Unit) - Cho et al., 2014.
LSTM introduces three gate functions: the input gate, the forget gate, and the output gate, which control the input value, the memory value, and the output value. The GRU model has only two gates: the update gate and the reset gate. Compared with LSTM, GRU has one fewer gate and fewer parameters, yet it can achieve comparable results. Considering the compute power and time cost of the hardware, the more "practical" GRU is therefore often chosen.
For example:
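A sketch of the GRU layer, mirroring the LSTM example above (shapes are illustrative):

```python
import tensorflow as tf
from tensorflow.keras import layers

inputs = tf.random.normal((32, 10, 8))
gru = layers.GRU(4)
print(gru(inputs).shape)   # (32, 4)

# With only an update gate and a reset gate, GRU carries fewer parameters
# than an LSTM with the same number of units.
gru2 = layers.GRU(4, return_sequences=True, return_state=True)
whole_seq, final_state = gru2(inputs)
print(whole_seq.shape, final_state.shape)   # (32, 10, 4) (32, 4)
```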
Result output:
Base class for recurrent neural network layers.
Note on specifying the initial state of RNN layers:
RNN layers can be given a symbolic initial state by calling them with the keyword argument initial_state. The value of initial_state should be a tensor or a list of tensors representing the initial state of the RNN layer.
The initial state of an RNN layer can be specified numerically by calling reset_states with the keyword argument states. The value of states should be a NumPy array or a list of NumPy arrays representing the initial state of the RNN layer.
Note on passing external constants to an RNN:
"External" constants can be passed to the cell through the constants keyword argument of RNN.__call__ (and RNN.call). This requires that the cell.call method accepts the same keyword argument constants. Such constants can be used to condition the cell transformation on additional static input (one that does not change over time), and can also be used for attention mechanisms.
For example:
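A hedged sketch of the initial_state mechanism described above (passing constants requires a custom cell whose call accepts a constants argument and is not shown here); the encoder-decoder shapes and names are illustrative:

```python
from tensorflow import keras
from tensorflow.keras import layers

encoder_in = keras.Input(shape=(None, 16))
decoder_in = keras.Input(shape=(None, 16))

# The encoder returns its final hidden and cell states alongside its output.
encoder_out, state_h, state_c = layers.LSTM(32, return_state=True)(encoder_in)

# The decoder starts from the encoder's final states via initial_state.
decoder_out = layers.LSTM(32)(decoder_in, initial_state=[state_h, state_c])

model = keras.Model([encoder_in, decoder_in], decoder_out)
model.summary()
```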
Before training a model, you need to configure the learning process, which is done with the compile method.
It accepts three main arguments: an optimizer, a loss function, and a list of metrics.
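A minimal sketch of compiling a model with these three arguments (the model architecture, optimizer choice, loss, and metric are illustrative):

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Dense(64, activation="relu", input_shape=(20,)),
    layers.Dense(10, activation="softmax"),
])

model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-3),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```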