
【TF Tutorial】Chapter 3: Deep Dive into Keras Layers

Published on 2024/07/16

3. Deep Dive into Keras Layers

3.1 Dense Layers

Dense layers are also known as fully connected layers. They are the basic building block of neural networks: each neuron is connected to every neuron in the previous layer. The layer performs a linear transformation followed by an activation function.

Formula: y = f(Wx + b)
W: weights matrix
x: input vector
b: bias vector
f: activation function (e.g. ReLU, sigmoid)

Typically, the input to a Dense layer in Keras (and most other deep learning frameworks) is a 2D array of shape (batch_size, input_dim), or a 3D array that also includes a timesteps dimension for sequence data.

・2D array as input

# Example: 2D input to a Dense layer
import tensorflow as tf
from tensorflow.keras.layers import Dense

# Input shape: (batch_size, input_dim)
input_data = tf.random.normal((32, 64))  # 32 batch size, each of 64 dimensions

dense_layer = Dense(units=128, activation='relu')
output_data = dense_layer(input_data)

print(output_data.shape)  # Output shape: (32, 128)

・3D array as input

# Example: 3D input to a Dense layer
import tensorflow as tf
from tensorflow.keras.layers import Dense

# Input shape: (batch_size, timesteps, input_dim)
input_data = tf.random.normal((32, 10, 64))  # 32 batch size, each with 10 timesteps of 64 dimensions

# Merge the batch and timesteps dimensions so the data becomes 2D
flattened_input = tf.reshape(input_data, (32 * 10, 64))

dense_layer = Dense(units=128, activation='relu')
output_data = dense_layer(flattened_input)

# Reshape the output back to (batch_size, timesteps, units)
output_data = tf.reshape(output_data, (32, 10, 128))

print(output_data.shape)  # Output shape: (32, 10, 128)

Flatten Layer

A classifier's final Dense layers usually expect 2D input of shape (batch_size, features), so higher-dimensional inputs are commonly flattened before being passed to them.

The Flatten layer converts the input to a 2D array by keeping the batch dimension and merging all remaining dimensions.

import tensorflow as tf
from tensorflow.keras.layers import Flatten

# Assuming input_data is of shape (batch_size, timesteps, input_dim)
input_data_3d = tf.random.normal((32, 10, 64))  # 32 batch size, each with 10 timesteps of 64 dimensions
input_data_4d = tf.random.normal((32, 10, 10, 64))

flattened_input_3d = Flatten()(input_data_3d)
flattened_input_4d = Flatten()(input_data_4d)

print(flattened_input_3d.shape)
print(flattened_input_4d.shape)

# output
# (32, 640)
# (32, 6400)

If you want to apply a Dense layer while keeping the input shape, use the TimeDistributed wrapper. It applies the Dense layer to each time step without collapsing the extra dimensions.

import tensorflow as tf
from tensorflow.keras.layers import TimeDistributed, Dense

input_data_3d = tf.random.normal((32, 10, 64))
input_data_4d = tf.random.normal((32, 10, 10, 64))

time_distributed_dense = TimeDistributed(Dense(128, activation='relu'))
output_data_3d = time_distributed_dense(input_data_3d)  # Keeps the timesteps dimension
output_data_4d = time_distributed_dense(input_data_4d)  # Keeps the timesteps dimension

print(output_data_3d.shape)
print(output_data_4d.shape)

# output
# (32, 10, 128)
# (32, 10, 10, 128)

3.2 Convolutional Layers

A convolutional layer consists of multiple filters (kernels) that slide over the input data, performing a convolution operation.
Each filter extracts local spatial features, such as edges and textures, from the input.

Formula: y = f(conv(x, W) + b)
conv: convolution operation
W: filter/kernel
x: input data
b: bias
f: activation function

・Conv2D in keras

from tensorflow.keras.layers import Conv2D

# Example Conv2D layer
conv_layer = Conv2D(filters=32,          # Number of filters
                    kernel_size=(3, 3),  # Size of the filter (3x3)
                    strides=(1, 1),      # Stride of the convolution
                    padding='same',      # Padding type
                    activation='relu')   # Activation function

・padding='same': This means the input will be padded with zeros so that the output feature map has the same width and height as the input.
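
For a quick check that padding='same' keeps the spatial size, here is a minimal usage sketch; the 28x28 RGB input size is an assumption for illustration:

# Applying the Conv2D layer above to a random batch of images
import tensorflow as tf
from tensorflow.keras.layers import Conv2D

conv_layer = Conv2D(filters=32, kernel_size=(3, 3), strides=(1, 1),
                    padding='same', activation='relu')

images = tf.random.normal((8, 28, 28, 3))  # (batch_size, height, width, channels)
feature_maps = conv_layer(images)

print(feature_maps.shape)  # (8, 28, 28, 32): same height/width, one feature map per filter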

A convolutional layer is typically followed by a pooling layer to make the model more robust and stable.

3.3 Pooling Layers

Pooling layers are used to reduce the spatial dimensions (width and height) of the input volume, which reduces the number of parameters and computations in subsequent layers and makes the model more robust.

Basic Types:
・Max Pooling: Selects the maximum value from the region covered by the filter.
・Average Pooling: Computes the average of all values from the region covered by the filter.

Besides these, many other types of pooling exist; I'll also write an article about those.
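
Here is a minimal sketch of the two basic pooling types in Keras, assuming a random feature map of shape (batch_size, height, width, channels):

import tensorflow as tf
from tensorflow.keras.layers import MaxPooling2D, AveragePooling2D

feature_maps = tf.random.normal((8, 28, 28, 32))  # (batch_size, height, width, channels)

max_pooled = MaxPooling2D(pool_size=(2, 2))(feature_maps)      # keeps the max of each 2x2 region
avg_pooled = AveragePooling2D(pool_size=(2, 2))(feature_maps)  # averages each 2x2 region

print(max_pooled.shape)  # (8, 14, 14, 32): width and height halved, channels unchanged
print(avg_pooled.shape)  # (8, 14, 14, 32)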

3.4 Recurrent Layers

Recurrent layers are used for sequential data, such as time series or natural language processing. They maintain a state (memory) that captures information about previous elements in the sequence.

The intuition behind an RNN:
An RNN is a single block containing two weight matrices, one for the current input and one for the previous output (hidden state); both matrices are updated during training.
The same block is applied repeatedly, so the two weight matrices learn the characteristics of the sequence and produce an appropriate output for each time step's input.

Basic Types:
Simple RNN, LSTM (Long Short-Term Memory), GRU (Gated Recurrent Unit).

Simple RNN Formula: h_t = tanh(W_h h_{t-1} + W_x x_t + b)
x_t: input at time t.
h_{t-1}: hidden state from previous time step.
W_h, W_x: weight matrices for the previous hidden state and the current input.
b: bias.
tanh: hyperbolic tangent as activation function.
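
To make the formula concrete, here is a single recurrence step written by hand; the sizes input_dim=64 and hidden_dim=128 are assumptions for illustration:

import tensorflow as tf

# One step of a simple RNN: h_t = tanh(W_h h_{t-1} + W_x x_t + b)
x_t = tf.random.normal((1, 64))      # input at time t
h_prev = tf.zeros((1, 128))          # previous hidden state h_{t-1}
W_x = tf.random.normal((64, 128))    # input-to-hidden weights
W_h = tf.random.normal((128, 128))   # hidden-to-hidden weights
b = tf.zeros((128,))                 # bias

h_t = tf.tanh(tf.matmul(x_t, W_x) + tf.matmul(h_prev, W_h) + b)
print(h_t.shape)  # (1, 128)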

LSTM and GRU are improved RNN variants that add gating units to cope with the vanishing/exploding gradient problem.

Key Differences Between LSTM and GRU

・Complexity:
LSTMs are more complex with separate cell states and hidden states, while GRUs have a simpler structure with fewer gates.
・Performance:
GRUs often perform similarly to LSTMs but can train faster and require fewer computational resources due to their simpler structure.
・Use Cases:
The choice between LSTM and GRU often depends on the specific problem and computational constraints. LSTMs are generally preferred for tasks requiring learning long-term dependencies, while GRUs are used when computational efficiency is a priority.
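
For reference, here is a minimal sketch of the three layer types in Keras applied to the same random sequence batch; the shapes are assumptions for illustration:

import tensorflow as tf
from tensorflow.keras.layers import SimpleRNN, LSTM, GRU

x = tf.random.normal((32, 10, 64))  # (batch_size, timesteps, features)

print(SimpleRNN(128)(x).shape)                    # (32, 128): only the last hidden state
print(LSTM(128, return_sequences=True)(x).shape)  # (32, 10, 128): hidden state at every time step
print(GRU(128)(x).shape)                          # (32, 128)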

3.5 Custom Layers

Custom layers allow you to create layers with unique functionalities that are not provided by standard layers in Keras.

To define a custom layer, you need to inherit from tf.keras.layers.Layer and override the build and call methods.

・build(input_shape): called once when the layer is first used. It defines and initializes the layer's weights.
・call(inputs): defines the forward pass of the layer.

・Example: a Dense-like custom layer (linear transformation without an activation)

import tensorflow as tf

class MyCustomLayer(tf.keras.layers.Layer):
    def __init__(self, units=32):
        super(MyCustomLayer, self).__init__()
        self.units = units

    def build(self, input_shape):
        self.w = self.add_weight(shape=(input_shape[-1], self.units),
                                 initializer='random_normal',
                                 trainable=True)
        self.b = self.add_weight(shape=(self.units,),
                                 initializer='random_normal',
                                 trainable=True)

    def call(self, inputs):
        return tf.matmul(inputs, self.w) + self.b

# Usage
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(4,)),
    MyCustomLayer(units=10)
])

・self.units:
This specifies the number of output units for the layer.
・self.w:
This creates a weight matrix with a shape of (input_shape[-1], self.units), where input_shape[-1] is the size of the last dimension of the input tensor. The weights are initialized with a random normal distribution and are trainable.
・self.b:
This creates a bias vector with a shape of (self.units,), also initialized with a random normal distribution and is trainable.
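
As a quick sanity check, calling the model defined above on a small random batch shows the expected output shape (the input values here are arbitrary):

x = tf.random.normal((2, 4))  # batch of 2 samples with 4 features, matching Input(shape=(4,))
print(model(x).shape)         # (2, 10): one output per unit of MyCustomLayer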
