Part E: Text Classification with an Embedding Layer in a Feed-Forward Network

Multi-Topic Text Classification with Various Deep Learning Models

Author: Murat Karakaya
Date created: 17 09 2021
Date published: 16 03 2022
Last modified: 16 03 2022

Description: This is Part E of the tutorial series “Multi-Topic Text Classification with Various Deep Learning Models”, which covers all the phases of multi-class text classification:

  • Exploratory Data Analysis (EDA),
  • Text preprocessing
  • TF Data Pipeline
  • Keras TextVectorization preprocessing layer
  • Multi-class (multi-topic) text classification
  • Deep Learning model design & end-to-end model implementation
  • Performance evaluation & metrics
  • Generating classification report
  • Hyper-parameter tuning
  • etc.

We will design various Deep Learning models by using

  • the Keras Embedding layer,
  • Convolutional (Conv1D) layer,
  • Recurrent (LSTM) layer,
  • Transformer Encoder block, and
  • pre-trained transformer (BERT).

We will cover all the topics related to solving Multi-Class Text Classification problems with sample implementations in Python / TensorFlow / Keras environment.

We will use a Kaggle Dataset in which there are 32 topics and more than 400K total reviews.

If you would like to learn more about Deep Learning with practical coding examples,

  • please subscribe to the Murat Karakaya Akademi YouTube channel,
  • turn on notifications so that you will be notified when new parts are uploaded, and
  • follow my blog on muratkarakaya.net.

You can access all the codes, videos, and posts of this tutorial series from the links below.




Photo by Alfred Schrock on Unsplash

PART E: TEXT CLASSIFICATION WITH AN EMBEDDING LAYER IN A FEED-FORWARD NETWORK

In this part, we will use a Keras embedding layer in a Feed-Forward Network (FFN). During training, we will train this embedding layer as well.

As you may remember, we introduced the embedding concept in Part A. You can review the details of embedding and how Deep Learning models use it for text classification in the following Murat Karakaya Akademi YouTube playlists:

  • Word Embedding in Keras
  • Word Embedding Hakkında Herşey (“Everything About Word Embedding”, in Turkish)

If you are not familiar with the “classification with Deep Learning” topic, you can find the 5-part tutorials in the Murat Karakaya Akademi YouTube playlists below:

  • How to solve Classification Problems in Deep Learning with Tensorflow & Keras
  • Derin Öğrenmede Sınıflandırma Çeşitleri Nelerdir? Keras ve TensorFlow ile Nasıl Yapılmalı? (“What Are the Types of Classification in Deep Learning? How Should It Be Done with Keras and TensorFlow?”, in Turkish)

A basic DL model can use an embedding layer as its initial layer followed by a couple of dense layers for classification, as in the model below.

Please remember that in Part D,

  • we built a TF data pipeline
  • we configured a Keras TextVectorization layer for text preprocessing and tokenization
  • we adapted the Keras TextVectorization layer to the training dataset
  • we applied the Keras TextVectorization layer to the train, validation, and test datasets
  • we finalized the TF data pipeline by configuring it

In this part, we will use these train, validation, and test datasets.

Keras Embedding Layer

The Keras Embedding layer turns positive integers (token indexes) into dense vectors of fixed size.

Important Arguments:

  • input_dim: Integer. Size of the vocabulary, i.e. maximum integer index + 1.
  • output_dim: Integer. Dimension of the dense embedding.
  • input_length: Length of input sequences, when it is constant. This argument is required if you are going to connect Flatten then Dense layers upstream (without it, the shape of the dense outputs cannot be computed).

Input shape: 2D tensor with shape: (batch_size, input_length).

Output shape: 3D tensor with shape: (batch_size, input_length, output_dim).

For more information, visit the Keras official website.
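
As a quick illustration (the sizes below are toy values, not the ones we will use in our model), the layer simply maps each integer token id to a trainable dense vector:

import tensorflow as tf
from tensorflow.keras import layers

# A toy embedding: a vocabulary of 1,000 tokens, each mapped to an 8-dim vector
toy_embedding = layers.Embedding(input_dim=1000, output_dim=8)

# A batch of 2 sequences, each 4 token ids long -> input shape (2, 4)
token_ids = tf.constant([[5, 42, 7, 0], [13, 2, 99, 1]])
vectors = toy_embedding(token_ids)
print(vectors.shape)  # (2, 4, 8) = (batch_size, input_length, output_dim)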

Let’s build a simple model with an embedding layer

I will use the Keras Functional API to build the model. You can learn the details of this API here.

In Part D, thanks to the Keras TextVectorization layer, we converted each review into a fixed-size vector of max_len integers.

Therefore, the input layer of the model would expect a sequence of max_len integers.

Moreover, in Part D, we also decided on the size of the dictionary: vocab_size. We will use this parameter to set up the embedding layer.
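
The exact values were computed in Part D; for reference, the ones below are inferred from the model summary printed later in this part (an input length of 40 tokens, a 100,000-token vocabulary, and 32 categories):

# Inferred from the model summary below; set in Part D in the actual notebook
max_len = 40                # fixed review length after TextVectorization
vocab_size = 100000         # dictionary size (1,600,000 embedding params / 16 dims)
number_of_categories = 32   # number of topics in the dataset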

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Embedding size for each token
embed_dim = 16
# Hidden layer size in feed-forward network
feed_forward_dim = 64

def create_model_FFN():
    inputs_tokens = layers.Input(shape=(max_len,), dtype=tf.int32)
    # Map each token id to a trainable embed_dim-dimensional vector
    embedding_layer = layers.Embedding(input_dim=vocab_size,
                                       output_dim=embed_dim,
                                       input_length=max_len)
    x = embedding_layer(inputs_tokens)
    # Flatten (max_len, embed_dim) into a single vector per review
    x = layers.Flatten()(x)
    dense_layer = layers.Dense(feed_forward_dim, activation='relu')
    x = dense_layer(x)
    x = layers.Dropout(.5)(x)
    # Output raw logits, one per category
    outputs = layers.Dense(number_of_categories)(x)

    model = keras.Model(inputs=inputs_tokens, outputs=outputs, name='model_FFN')
    loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
    metric_fn = tf.keras.metrics.SparseCategoricalAccuracy()
    model.compile(optimizer="adam", loss=loss_fn, metrics=[metric_fn])
    return model

model_FFN = create_model_FFN()
model_FFN.summary()

Model: "model_FFN"
_________________________________________________________________
 Layer (type)                Output Shape              Param #
=================================================================
 input_2 (InputLayer)        [(None, 40)]              0
 embedding (Embedding)       (None, 40, 16)            1600000
 flatten (Flatten)           (None, 640)               0
 dense (Dense)               (None, 64)                41024
 dropout (Dropout)           (None, 64)                0
 dense_1 (Dense)             (None, 32)                2080
=================================================================
Total params: 1,643,104
Trainable params: 1,643,104
Non-trainable params: 0
_________________________________________________________________
tf.keras.utils.plot_model(model_FFN, show_shapes=True)

[Figure: model architecture diagram generated by plot_model]

Train

As you know, in Part D, we have the train, validation, and test datasets ready to input any ML/DL models. Now, we will use the train and validation datasets to train the model.

history = model_FFN.fit(train_ds, validation_data=val_ds, verbose=2, epochs=7)

Epoch 1/7
95/95 - 6s - loss: 3.4562 - sparse_categorical_accuracy: 0.0531 - val_loss: 3.4359 - val_sparse_categorical_accuracy: 0.1328 - 6s/epoch - 60ms/step
Epoch 2/7
95/95 - 1s - loss: 3.2865 - sparse_categorical_accuracy: 0.2319 - val_loss: 3.1461 - val_sparse_categorical_accuracy: 0.2937 - 519ms/epoch - 5ms/step
Epoch 3/7
95/95 - 1s - loss: 2.6160 - sparse_categorical_accuracy: 0.4597 - val_loss: 2.3218 - val_sparse_categorical_accuracy: 0.5766 - 512ms/epoch - 5ms/step
Epoch 4/7
95/95 - 1s - loss: 1.6017 - sparse_categorical_accuracy: 0.6891 - val_loss: 1.5879 - val_sparse_categorical_accuracy: 0.7250 - 503ms/epoch - 5ms/step
Epoch 5/7
95/95 - 1s - loss: 0.9447 - sparse_categorical_accuracy: 0.8222 - val_loss: 1.2255 - val_sparse_categorical_accuracy: 0.7578 - 519ms/epoch - 5ms/step
Epoch 6/7
95/95 - 0s - loss: 0.5667 - sparse_categorical_accuracy: 0.9066 - val_loss: 1.0433 - val_sparse_categorical_accuracy: 0.7812 - 493ms/epoch - 5ms/step
Epoch 7/7
95/95 - 0s - loss: 0.3656 - sparse_categorical_accuracy: 0.9444 - val_loss: 0.9470 - val_sparse_categorical_accuracy: 0.7922 - 498ms/epoch - 5ms/step
time: 9.25 s (started: 2022-03-01 12:16:44 +00:00)

Let’s observe the accuracy and the loss values during the training at each epoch:
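
The plotting code is not shown in the notebook output; a minimal matplotlib sketch, assuming the history object returned by fit() above, could look like this:

import matplotlib.pyplot as plt

# Plot training vs. validation curves from the History object
for metric in ['sparse_categorical_accuracy', 'loss']:
    plt.plot(history.history[metric], label='train')
    plt.plot(history.history['val_' + metric], label='validation')
    plt.xlabel('epoch')
    plt.ylabel(metric)
    plt.legend()
    plt.show()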

[Figures: training and validation accuracy and loss per epoch]

You can observe the underfitting and overfitting behaviors of the model and decide how to handle these situations by tuning the hyper-parameters.


Save the trained model

You can save a Keras model for future use. If you want to learn about saving a Keras model, you can check out my tutorial in English or Turkish.

  • TF SavedModel vs HDF5: How to save TF & Keras Models with Custom Objects?
  • TF SavedModel vs HDF5: TF & Keras Modellerini Nasıl Saklamalıyız? (“How Should We Save TF & Keras Models?”, in Turkish)

tf.keras.models.save_model(model_FFN, 'MultiClassTextClassification_FFN')
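
Later on, the saved model can be loaded back with load_model(); a quick sketch (the path must match the one used above):

loaded_FFN = tf.keras.models.load_model('MultiClassTextClassification_FFN')
loaded_FFN.summary()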

Test

We can test the trained model with the test dataset:

loss, accuracy = model_FFN.evaluate(test_ds)
print("Test accuracy: ", accuracy)
1319/1319 [==============================] - 20s 14ms/step - loss: 0.9454 - sparse_categorical_accuracy: 0.7822
Test accuracy: 0.7822332382202148
time: 19.6 s (started: 2022-03-01 12:16:54 +00:00)

Predictions

We can use the trained model's predict() method to predict the classes of the given reviews as follows:

preds = model_FFN.predict(test_ds)
preds = preds.argmax(axis=1)
time: 2.08 s (started: 2022-03-01 12:17:14 +00:00)
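
Note that predict() returns raw logits because we compiled the loss with from_logits=True. If you also need class probabilities, you can push the logits through a softmax; a small sketch (recomputing the predictions for clarity):

import tensorflow as tf

# predict() returns logits of shape (num_samples, number_of_categories);
# softmax turns each row into a probability distribution over the 32 topics
logits = model_FFN.predict(test_ds)
probs = tf.nn.softmax(logits, axis=-1).numpy()
print(probs[0].sum())  # ~1.0 for each review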

We can also get the actual (true) class of the given reviews as follows:

actuals = test_ds.unbatch().map(lambda x,y: y)  
actuals=list(actuals.as_numpy_iterator())
time: 21.5 s (started: 2022-03-01 12:17:16 +00:00)

By comparing the preds with the actuals values, we can measure the model's performance, as below.

Classification Report

Since we are dealing with multi-class text classification, it is a good idea to generate a classification report to observe the performance of the model for each class. We can use the SKLearn classification_report() method to build a text report showing the main classification metrics.

The report is the summary of the precision, recall, and F1 scores for each class.

The reported averages include:

  • macro average (averaging the unweighted mean per label),
  • weighted average (averaging the support-weighted mean per label),
  • sample average (only for multilabel classification),
  • micro average (averaging the total true positives, false negatives, and false positives); it is shown only for multi-label problems or for multi-class problems with a subset of classes, because otherwise it corresponds to accuracy and would be the same for all metrics.

from sklearn import metrics
print(metrics.classification_report(actuals, preds, digits=4))
precision recall f1-score support

0 0.7651 0.7397 0.7522 2708
1 0.9261 0.7760 0.8444 2455
2 0.7431 0.8073 0.7739 2730
3 0.7478 0.6846 0.7148 2559
4 0.8234 0.8254 0.8244 2767
5 0.9853 0.7942 0.8795 2619
6 0.7741 0.8040 0.7888 2724
7 0.3566 0.5044 0.4178 2379
8 0.9659 0.8850 0.9237 2756
9 0.9601 0.7219 0.8241 2233
10 0.8999 0.8864 0.8931 2738
11 0.4577 0.8471 0.5943 2603
12 0.6731 0.8191 0.7390 2687
13 0.7845 0.6696 0.7225 2267
14 0.9442 0.8515 0.8955 2681
15 0.4817 0.6544 0.5549 2682
16 0.8431 0.8029 0.8225 2770
17 0.9079 0.9556 0.9312 2725
18 0.7222 0.5961 0.6531 2508
19 0.7111 0.8067 0.7559 2716
20 0.8709 0.8655 0.8682 2751
21 0.7579 0.6581 0.7045 2735
22 0.9314 0.8113 0.8672 2661
23 0.9468 0.7955 0.8646 2572
24 0.8791 0.8062 0.8410 2750
25 0.6670 0.7359 0.6998 2651
26 0.8484 0.8128 0.8303 2693
27 0.9519 0.9057 0.9282 2663
28 0.9488 0.7520 0.8390 2661
29 0.6972 0.7515 0.7234 2592
30 0.9252 0.7779 0.8452 2625
31 0.9441 0.8403 0.8892 2755

accuracy 0.7822 84416
macro avg 0.8076 0.7795 0.7877 84416
weighted avg 0.8088 0.7822 0.7898 84416

time: 670 ms (started: 2022-03-01 12:17:38 +00:00)

In multi-class classification, you need to be careful about the number of samples in each class (the support column in the above table). If there is an imbalance among the classes, you need to take corrective action; one common option is sketched below.

Moreover, observe the precision, recall, and F1 scores of each class and compare them with the average values of these metrics.

If you would like to learn more about these metrics and how to handle imbalanced datasets, please refer to the following tutorials on the Murat Karakaya Akademi YouTube channel :)
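
As one concrete example of such an action, you can weight the loss by inverse class frequency via the class_weight argument of fit(). This is only a sketch, assuming integer labels like our actuals list; in practice, you would compute the weights from the training labels:

import numpy as np
from sklearn.utils import class_weight

# The test labels (actuals) are used here only because they are at hand;
# compute the weights from the *training* labels in a real run.
labels = np.array(actuals)
weights = class_weight.compute_class_weight(class_weight='balanced',
                                            classes=np.unique(labels),
                                            y=labels)
class_weights = dict(enumerate(weights))

# Rare classes then contribute proportionally more to the training loss:
# model_FFN.fit(train_ds, validation_data=val_ds, epochs=7, class_weight=class_weights)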


Confusion Matrix

A confusion matrix is used to assess the performance of a machine learning model at classification. The results are presented in matrix form: the confusion matrix compares the actual and predicted values, and the numbers on the diagonal are the counts of correct predictions.
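
The plotting code is not shown here; a minimal sketch with scikit-learn and matplotlib, assuming the preds and actuals computed above, could be:

import matplotlib.pyplot as plt
from sklearn.metrics import ConfusionMatrixDisplay

# Raw counts; pass normalize='true' to get the per-class ratios shown
# in the second figure below
ConfusionMatrixDisplay.from_predictions(actuals, preds)
plt.show()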

Below, you can observe the distribution of predictions over the classes.

[Figure: confusion matrix of prediction counts per class]

We can also visualize the confusion matrix as the ratios of the predictions over the classes. The ratios on the diagonal are the ratios of the correct predictions.

[Figure: confusion matrix of per-class prediction ratios]

Please observe the ratio of correctly classified samples for each class.

You may need to take action for classes whose correctly predicted sample ratios are relatively low.

Summary

In this part,

  • we created a Deep Learning model with an Embedding layer to classify text into multiple classes,
  • we trained and tested the model,
  • we observed the model performance using Classification Report and Confusion Matrix,
  • and we pointed out the importance of a balanced dataset and of examining the performance on each class.

In the next part, we will design another Deep Learning model by using a convolutional layer (Conv1D).

Do you have any questions or comments? Please share them in the comment section.

Thank you for your attention!