Thursday, November 10, 2022

LSTM: Understanding Output Types

INTRODUCTION

In this tutorial, we will focus on the outputs of the LSTM layer in Keras. LSTM is a key layer for building powerful models, especially for solving Seq2Seq learning problems. To use LSTM effectively, we need to understand how it generates different results with respect to the given parameters. Therefore, in this tutorial, we will learn and use 3 important parameters (units, return_sequences, and return_state).

At the end of the tutorial, you will be able to configure the LSTM layer to meet your model's requirements correctly.

If you would like to follow up on Deep Learning tutorials, please subscribe to my YouTube Channel or follow my blog on muratkarakaya.net.  Thank you!



Before starting, I would like to mention that I have already prepared several tutorials to help you better understand LSTM. You can access these videos by following the playlists below:

  • All About LSTM
  • Seq2Seq Learning Problem
  • Applied Machine Learning with Keras

NOTE: You can watch this blog on YouTube

NOTE: You can access the full code on Colab or GitHub Pages.

INPUT

Let’s generate a sample input with a time dimension, as below:

Generated sequences are as follows:

One Sample Input Sequence in raw format:
X[0]=[5, 0, 9, 9]

In one_hot_encoded format:
X[0]=[[0 0 0 0 0 1 0 0 0 0]
[1 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 1]
[0 0 0 0 0 0 0 0 0 1]]

Shape of an input to LSTM (X[0].shape): (1, 4, 10)

Shape of Input Batch to LSTM (X_train.shape): (100, 4, 10)
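The generation code itself is not shown above; a minimal NumPy sketch that produces batches of this shape might look as follows (the function name and seed are my own, and the actual values will differ from the sample above):

```python
import numpy as np

def generate_batch(n_samples, n_timesteps, n_features, seed=0):
    # Draw random integer sequences, then one-hot encode them so each
    # time step becomes a vector of length n_features.
    rng = np.random.default_rng(seed)
    seqs = rng.integers(0, n_features, size=(n_samples, n_timesteps))
    X = np.eye(n_features)[seqs]  # shape: (n_samples, n_timesteps, n_features)
    return seqs, X

seqs, X_train = generate_batch(100, 4, 10)
print(X_train.shape)  # (100, 4, 10)
```

Indexing the identity matrix `np.eye(n_features)` with the integer sequences is a compact way to one-hot encode without a loop.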

QUICK RECAP OF LSTM

Internal Structure

Roll-Out Representation of LSTM for each Time Step

LSTM OUTPUTS

LSTM can return 4 different sets of results/states according to the given parameters:

  1. Default: Last Hidden State (hidden state of the last time step)
  2. return_sequences=True: All Hidden States (hidden states of ALL the time steps)
  3. return_state=True: Last Hidden State + Last Hidden State (again!) + Last Cell State (cell state of the last time step)
  4. return_sequences=True + return_state=True: All Hidden States (hidden states of ALL the time steps) + Last Hidden State + Last Cell State (cell state of the last time step)

Using these 4 different results/states, we can stack LSTM layers in various ways.
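As a quick reference, the four combinations and the output shapes they produce (for units=16 and 4 time steps, with the batch dimension shown as None) can be summarized in plain Python. This is only a sketch; the helper name is my own:

```python
def lstm_output_shapes(units, timesteps, return_sequences=False, return_state=False):
    # Shape of the "main" output: all hidden states if return_sequences=True,
    # otherwise just the last hidden state.
    main = (None, timesteps, units) if return_sequences else (None, units)
    if return_state:
        # return_state=True appends the last hidden state and the last cell state.
        return [main, (None, units), (None, units)]
    return [main]

print(lstm_output_shapes(16, 4))                         # [(None, 16)]
print(lstm_output_shapes(16, 4, return_sequences=True))  # [(None, 4, 16)]
print(lstm_output_shapes(16, 4, return_state=True))      # [(None, 16), (None, 16), (None, 16)]
print(lstm_output_shapes(16, 4, True, True))             # [(None, 4, 16), (None, 16), (None, 16)]
```

Each of these four cases is worked through with real Keras code in the sections below.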

LSTM Default return value:

The output is only the hidden state at the last time step.

This is because the return_sequences and return_state parameters default to False.

The output is a 2D array of real numbers.

The first dimension indicates the number of samples in the batch given to the LSTM layer.

The second dimension is the dimensionality of the output space, defined by the units parameter of the Keras LSTM layer.

Example Code:

Since, in the following examples, the LSTM units parameter (dimensionality of the output space) is set to 16, the last hidden state will have a dimension of 16.

Therefore, the output shape becomes (None, 16): the output is a tensor of 16 real numbers for each sample in the batch!

None is a placeholder for the batch size.

# define model
# (imports added for completeness; assumes TensorFlow 2.x Keras)
from tensorflow.keras.layers import Input, LSTM
from tensorflow.keras.models import Model

numberOfLSTMunits = 16

input = Input(shape=(n_timesteps_in, n_features))
state_h = LSTM(numberOfLSTMunits)(input)
model1 = Model(inputs=input, outputs=state_h)
model1.summary()
Model: "functional_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) [(None, 4, 10)] 0
_________________________________________________________________
lstm (LSTM) (None, 16) 1728
=================================================================
Total params: 1,728
Trainable params: 1,728
Non-trainable params: 0
_________________________________________________________________
result=model1.predict(X_train)
print('input shape: ', X_train.shape)
print('state_h shape: ', result.shape)
print('result for the first sample/input: \n', result[0])
input shape: (100, 4, 10)
state_h shape: (100, 16)
result for the first sample/input:
[-0.00609508 -0.02659022 0.05976189 0.04919129 0.03741886 0.05084493
-0.15495221 -0.11518779 -0.06793577 0.11113258 0.00511333 -0.08341488
-0.12279458 0.04131598 -0.05322764 0.03711022]
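The 1,728 parameters reported in the summary can be checked by hand: an LSTM has four gates, and each gate has an input kernel, a recurrent kernel, and a bias. A quick sanity-check sketch (the helper name is my own):

```python
def lstm_param_count(units, n_features):
    # Each of the 4 gates has: a kernel (n_features x units), a recurrent
    # kernel (units x units), and a bias vector of length units.
    return 4 * (n_features * units + units * units + units)

print(lstm_param_count(16, 10))  # 1728, matching the model summary above
```

Note that the parameter count does not depend on the number of time steps, because the same weights are reused at every step.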

LSTM return_sequences=True value:

When the return_sequences parameter is True, the layer outputs the hidden states of all the time steps.

The output is a 3D array of real numbers.

The first dimension indicates the number of samples in the batch given to the LSTM layer.

The second dimension is the number of time steps in the input sequence. By indexing the second dimension, you can access the hidden states of all the units at a given time step.

The third dimension is the dimensionality of the output space, defined by the units parameter of the Keras LSTM layer.

The content of the array is the hidden state of the LSTM layer at each time step.

Example Code:

Since we have 4 time steps and units (dimensionality of the output space) is set to 16, the output shape will be (None, 4, 16), because the LSTM returns one hidden state per time step.

numberOfLSTMunits = 16

input = Input(shape=(n_timesteps_in, n_features))
all_state_h = LSTM(numberOfLSTMunits, return_sequences=True)(input)
model1 = Model(inputs=input, outputs=all_state_h)
model1.summary()
Model: "functional_3"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_2 (InputLayer) [(None, 4, 10)] 0
_________________________________________________________________
lstm_1 (LSTM) (None, 4, 16) 1728
=================================================================
Total params: 1,728
Trainable params: 1,728
Non-trainable params: 0
_________________________________________________________________
result=model1.predict(X_train)

print('input shape: ', X_train.shape)
print('all_state_h shape: ', result.shape)
print('\nhidden states for the first sample: \n', result[0])
print('\nhidden states for the first sample at the second time step: \n', result[0][1])
input shape: (100, 4, 10)
all_state_h shape: (100, 4, 16)

hidden states for the first sample:
[[ 0.06709018 0.00891957 -0.01000023 0.07295389 0.01851347 -0.07033152
0.05792924 0.04002878 0.01420666 -0.0249275 -0.01301446 0.06948114
0.00695276 -0.01454245 -0.07097618 0.07586081]
[-0.02606905 -0.0483641 -0.01385659 0.04102324 0.0630736 -0.05909642
0.08160708 -0.01253194 -0.04479993 0.03183461 -0.08493855 0.03182492
0.01251994 -0.05334771 -0.02792171 0.04365619]
[ 0.04251021 -0.03666312 -0.01969296 0.10111081 0.07083635 -0.10021838
0.11462061 0.03320621 -0.01611687 -0.0081213 -0.0701735 0.092085
0.01042888 -0.05635907 -0.09346859 0.1107368 ]
[ 0.06886856 0.01675864 -0.01337116 0.01318128 0.08766495 0.00020673
0.0516593 -0.00284591 0.04314535 -0.08270847 -0.03351395 0.10928006
-0.00974036 -0.06649923 -0.09381317 0.05247972]]

hidden states for the first sample at the second time step:
[-0.02606905 -0.0483641 -0.01385659 0.04102324 0.0630736 -0.05909642
0.08160708 -0.01253194 -0.04479993 0.03183461 -0.08493855 0.03182492
0.01251994 -0.05334771 -0.02792171 0.04365619]
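The indexing used above generalizes to any axis. With a NumPy array standing in for the (batch, timesteps, units) result of return_sequences=True (random values, purely for illustration):

```python
import numpy as np

# Stand-in for the result of an LSTM with return_sequences=True.
all_state_h = np.random.randn(100, 4, 16)

sample0 = all_state_h[0]      # hidden states of all 4 time steps for sample 0: (4, 16)
step2 = all_state_h[0, 1]     # hidden state at the second time step: (16,)
last = all_state_h[:, -1, :]  # last-step hidden state for every sample: (100, 16)
print(sample0.shape, step2.shape, last.shape)
```

For a real LSTM, `all_state_h[:, -1, :]` is exactly the tensor the layer would return with return_sequences=False (the default case above).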

LSTM return_state=True value:

When the return_state parameter is True, the layer outputs the last hidden state twice and the last cell state.

The output is three 2D arrays of real numbers.

The first dimension of each array indicates the number of samples (batch size) given to the LSTM layer.

The second dimension is the dimensionality of the output space, defined by the units parameter of the Keras LSTM layer.

The layer returns 3 arrays in the result:

  1. The hidden state of the last time step: (None, 16). It is 16 because units (the dimensionality of the output space) is set to 16.
  2. The hidden state of the last time step, again: (None, 16).
  3. The cell state of the last time step: (None, 16).

Example Code:

Since we set the units parameter (dimensionality of the output space) to 16, the output shape will be (None, 16) for all 3 tensors.

# define model
numberOfLSTMunits = 16

input = Input(shape=(n_timesteps_in, n_features))
LSTM_output, state_h, state_c = LSTM(numberOfLSTMunits, return_state=True)(input)
model1 = Model(inputs=input, outputs=[LSTM_output, state_h, state_c])
model1.summary()
Model: "functional_13"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_9 (InputLayer) [(None, 4, 10)] 0
_________________________________________________________________
lstm_8 (LSTM) [(None, 16), (None, 16), 1728
=================================================================
Total params: 1,728
Trainable params: 1,728
Non-trainable params: 0
_________________________________________________________________
model1.get_layer(index=1).output_shape
[(None, 16), (None, 16), (None, 16)]
print("Input layer output shape: ", model1.get_layer(index=0).output_shape)
print("LSTM layer output shape: ", model1.get_layer(index=1).output_shape)
results=model1.predict(X_train)
results = np.array(results)  # stack the 3 returned arrays (assumes numpy imported as np)

print("\nWith batch of data:")
print('input shape: ', X_train.shape)
print('result is 3 2D-array: ', results.shape)
print('\nLSTM_output is in the first array: ', results[0].shape)
print('\nstate_h which is exactly the same with LSTM_output is in the second array: ', results[1].shape)
print('\nIs the content of LSTM_output and state_h exactly the same?\n ', results[0]==results[1])
print('\nstate_c is in the third array: ', results[2].shape)
Input layer output shape: [(None, 4, 10)]
LSTM layer output shape: [(None, 16), (None, 16), (None, 16)]

With batch of data:
input shape: (100, 4, 10)
result is 3 2D-array: (3, 100, 16)

LSTM_output is in the first array: (100, 16)

state_h which is exactly the same with LSTM_output is in the second array: (100, 16)

Is the content of LSTM_output and state_h exactly the same?
[[ True True True ... True True True]
[ True True True ... True True True]
[ True True True ... True True True]
...
[ True True True ... True True True]
[ True True True ... True True True]
[ True True True ... True True True]]

state_c is in the third array: (100, 16)

LSTM return_state=True + return_sequences=True value:

The return_state and return_sequences parameters can both be True at the same time.

In this case, the LSTM layer returns 3 results:

  1. the hidden states of every input time step (because return_sequences=True),
  2. the hidden state of the last time step, and
  3. the cell state of the last time step (because return_state=True).

# define model
numberOfLSTMunits = 16

input = Input(shape=(n_timesteps_in, n_features))
all_state_h, state_h, state_c = LSTM(numberOfLSTMunits, return_sequences=True, return_state=True)(input)
model1 = Model(inputs=input, outputs=[all_state_h, state_h, state_c])
model1.summary()
Model: "functional_15"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_10 (InputLayer) [(None, 4, 10)] 0
_________________________________________________________________
lstm_9 (LSTM) [(None, 4, 16), (None, 16 1728
=================================================================
Total params: 1,728
Trainable params: 1,728
Non-trainable params: 0
_________________________________________________________________
print("Input layer output shape: ", model1.get_layer(index=0).output_shape)
print("LSTM layer output shape: ", model1.get_layer(index=1).output_shape)
Input layer output shape: [(None, 4, 10)]
LSTM layer output shape: [(None, 4, 16), (None, 16), (None, 16)]
results=model1.predict(X_train)
print("\nWith batch of data:")
print('input shape: ', X_train.shape)
print('result is 3 2D-array len (results): ', len (results))
print('\nall_state_h is in the first array: ', results[0].shape)
print('\nstate_h is in the second array: ', results[1].shape)
print('\nstate_c is in the third array: ', results[2].shape)
With batch of data:
input shape: (100, 4, 10)
result is 3 2D-array len (results): 3

all_state_h is in the first array: (100, 4, 16)

state_h is in the second array: (100, 16)

state_c is in the third array: (100, 16)
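As mentioned earlier, these returned states let us stack LSTM layers in various ways. A minimal sketch (assuming TensorFlow 2.x Keras; the layer names are illustrative) of the core Seq2Seq idea, where the encoder's last hidden and cell states initialize a decoder LSTM via the initial_state argument:

```python
from tensorflow.keras.layers import Input, LSTM
from tensorflow.keras.models import Model

units = 16

# Encoder: keep only the last hidden and cell states.
enc_in = Input(shape=(4, 10))
_, state_h, state_c = LSTM(units, return_state=True)(enc_in)

# Decoder: start from the encoder's final states.
dec_in = Input(shape=(4, 10))
dec_out = LSTM(units, return_sequences=True)(dec_in, initial_state=[state_h, state_c])

model = Model([enc_in, dec_in], dec_out)
print(model.output_shape)  # (None, 4, 16)
```

This pattern is covered in depth in the Seq2Seq Learning Problem playlist mentioned above.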

CONCLUSION

  • There are 4 possible sets of outputs from the LSTM layer
  • The important parameters are:
      • units (dimensionality of the output space)
      • return_sequences
      • return_state
  • The return_sequences and return_state parameters both default to False
  • Different combinations of True and False for return_sequences and return_state generate different sets of outputs
  • The units parameter (dimensionality of the output space) defines how many numbers each resulting tensor (representing a hidden or cell state) contains

MORE

  • If you want to learn more about LSTM or how to use these outputs for solving problems, please check out my YouTube channel, especially the following playlists:
  • All About LSTM
  • Seq2Seq Learning Problem
  • Applied Machine Learning with Keras

You can access Murat Karakaya Akademi via:

YouTube

Facebook

Instagram

LinkedIn

Github

Kaggle

muratkarakaya.net