Tuesday, November 8, 2022

Building an Efficient TensorFlow Input Pipeline for Character-Level Text Generation

This tutorial is the second part of the “Text Generation in Deep Learning with Tensorflow & Keras” series.

In this tutorial series, we cover the topics related to text generation with sample implementations in Python. This tutorial focuses on how to build an efficient TensorFlow input pipeline for character-level text generation.

First, we will download a sample corpus (a text file). After opening the file and reading it line by line, we will join the lines into a single line of text. Then, we will split the text into pairs of an input character sequence (X) and an output character (y).

Using tf.data.Dataset and Keras TextVectorization methods, we will

preprocess the text,
convert the characters into integer representation,
prepare the training dataset,
and optimize the data pipeline.

Thus, in the end, we will be ready to train a Language Model for character-level text generation.
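The splitting step described above can be illustrated with a minimal sketch. It uses plain Python rather than tf.data.Dataset and TextVectorization, and the tiny corpus, the function name make_char_pairs, and the window length of 4 are assumptions chosen only for illustration; they do not come from the tutorial.

```python
def make_char_pairs(text, seq_len, step=1):
    """Slide a fixed-size window over the text.

    Each window is an input sequence (X); the character that
    immediately follows the window is the target (y).
    """
    X, y = [], []
    for start in range(0, len(text) - seq_len, step):
        X.append(text[start:start + seq_len])
        y.append(text[start + seq_len])
    return X, y


# Hypothetical two-line corpus standing in for the downloaded text file.
lines = ["hello\n", "world\n"]

# Join the lines into a single line of text.
text = " ".join(line.strip() for line in lines)

# Build the (input sequence, output character) pairs.
X, y = make_char_pairs(text, seq_len=4)

# Convert characters into an integer representation using a simple
# lookup table built from the sorted set of characters in the corpus.
vocab = sorted(set(text))
char_to_id = {ch: i for i, ch in enumerate(vocab)}
X_ids = [[char_to_id[ch] for ch in seq] for seq in X]
y_ids = [char_to_id[ch] for ch in y]

print(X[0], "->", y[0])
print(X_ids[0], "->", y_ids[0])
```

In the tutorial itself, these steps are carried out with tf.data.Dataset and the Keras TextVectorization layer, which also make it possible to batch and optimize the pipeline; the sketch only shows the shape of the data that such a pipeline produces.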

If you would like to learn more about Deep Learning with practical coding examples, please subscribe to Murat Karakaya Akademi YouTube Channel or follow my blog on muratkarakaya.net. Do not forget to turn on notifications so that you will be notified when new parts are uploaded.

Photo by Harry Grout on Unsplash

Building an Efficient TensorFlow Input Pipeline for Word-Level Text Generation

This tutorial is the third part of the “Text Generation in Deep Learning with Tensorflow & Keras” series.

In this series, we cover the topics related to text generation with sample implementations in Python. This tutorial focuses on how to build an efficient TensorFlow input pipeline for word-level text generation. First, we will download a sample corpus (a text file). After opening the file and reading it line by line, we will split the text into words. Then, we will generate pairs, each consisting of an input word sequence (X) and an output word (y).

Using the tf.data API and Keras TextVectorization methods, we will

preprocess the text,
convert the words into integer representation,
prepare the training dataset from the pairs,
and optimize the data pipeline.

Thus, in the end, we will be ready to train a Language Model for word-level text generation.
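As a rough illustration of the pair generation and batching described above, here is a minimal plain-Python sketch. It does not use the tf.data API; the sample sentence, the helper names make_word_pairs and batched, the sequence length of 3, and the batch size of 4 are all assumptions made for the example.

```python
def make_word_pairs(words, seq_len):
    """Return (input word sequence, output word) pairs.

    Each input is seq_len consecutive words; the output is the
    word that follows them in the corpus.
    """
    pairs = []
    for i in range(len(words) - seq_len):
        pairs.append((words[i:i + seq_len], words[i + seq_len]))
    return pairs


def batched(items, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]


# Hypothetical corpus standing in for the downloaded text file.
corpus = "the quick brown fox jumps over the lazy dog"

# Split the text into words.
words = corpus.split()

# Generate the training pairs and group them into batches.
pairs = make_word_pairs(words, seq_len=3)
batches = list(batched(pairs, batch_size=4))

for X, y in pairs[:2]:
    print(X, "->", y)
```

A tf.data pipeline expresses the same idea with dataset transformations instead of Python loops, which allows the batching and prefetching work to be optimized by TensorFlow.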

If you are ready, let’s get started!

Photo by Quinten de Graaf on Unsplash

Bookmarks to the selected Deep Learning / Machine Learning Resources on the Web

Author: Murat Karakaya
Date created: 19 May 2020
Last modified: 15 Dec 2021
Description: In this post, I share my bookmarks, classified by specific topics in Deep Learning / Machine Learning, so that you can save time searching for similar information on the web. If you have any comments or updates, please feel free to share them with me!

If you are interested in Deep Learning / Machine learning, you can find hundreds of video tutorials with Python code samples in Jupyter notebooks at the following links:

Multi-Class Text Classification with a GPT3 Transformer block: An End-to-End Example

Author: Murat Karakaya & Cansen Çağlayan
Date created: 05 Oct 2021
Last modified: 19 Oct 2021
Description: This tutorial has two parts: Part A, Data Analysis & Text Preprocessing, and Part B, Text Classification.

Photo by Håkon Grimstad on Unsplash

Keras Text Vectorization Layer: Configure, Adapt, Use, Save, Load, and Deploy

Author: Murat Karakaya
Date created: 05 Oct 2021
Last modified: 18 March 2023
Description: This is a tutorial about how to build, adapt, use, save, load, and deploy the Keras TextVectorization layer. You can access this tutorial on YouTube in English (“TensorFlow Keras Text Vectorization Layer”) and in Turkish (“TensorFlow Keras Text Vectorization Katmanı”).

In this tutorial, we will download a Kaggle dataset with 32 topics and more than 400K reviews in total. We will use this dataset for a multi-class text classification task.

Our main aim is to learn how to effectively use the Keras TextVectorization layer in Text Processing and Text Classification.

The tutorial has 5 parts:

PART A: BACKGROUND
PART B: KNOW THE DATA
PART C: USE KERAS TEXT VECTORIZATION LAYER
PART D: BUILD AN END-TO-END MODEL
PART E: DEPLOY END-TO-END MODEL TO HUGGINGFACE SPACES USING GRADIO
SUMMARY

By the end of this tutorial, we will have covered:

What a Keras TextVectorization layer is
Why we need to use a Keras TextVectorization layer in Natural Language Processing (NLP) tasks
How to employ a Keras TextVectorization layer in Text Preprocessing
How to integrate a Keras TextVectorization layer into a trained model
How to save and load a Keras TextVectorization layer and a model with a Keras TextVectorization layer
How to integrate a Keras TextVectorization layer with TensorFlow Data Pipeline API (tf.data)
How to design, train, save, and load an End-to-End model using Keras TextVectorization layer
How to deploy the End-to-End model, which includes a Keras TextVectorization layer with a custom standardization function (custom_standardization), using the Gradio library and HuggingFace Spaces
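To make the role of the layer more concrete, the sketch below imitates in plain Python what a text vectorization step does conceptually: standardize the text, split it into tokens, learn a vocabulary from sample data, and map tokens to integers. This is not the Keras implementation. The class TinyVectorizer, the body of custom_standardization, and the sample reviews are hypothetical. The conventions that index 0 is used for padding and index 1 for out-of-vocabulary tokens follow the documented defaults of the Keras layer in integer output mode, while the alphabetical tie-breaking between equally frequent tokens is an assumption of this sketch.

```python
import re
import string
from collections import Counter


def custom_standardization(text):
    """Lowercase the text and strip punctuation (an assumed rule)."""
    text = text.lower()
    return re.sub(f"[{re.escape(string.punctuation)}]", "", text)


class TinyVectorizer:
    """A conceptual stand-in for a text vectorization layer."""

    def __init__(self, max_tokens, sequence_length):
        self.max_tokens = max_tokens
        self.sequence_length = sequence_length
        self.vocab = ["", "[UNK]"]  # 0: padding, 1: out-of-vocabulary
        self.index = {tok: i for i, tok in enumerate(self.vocab)}

    def adapt(self, texts):
        """Learn the vocabulary from sample texts, most frequent first."""
        counts = Counter(
            tok for t in texts for tok in custom_standardization(t).split()
        )
        # Sort by descending frequency; break ties alphabetically.
        ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
        kept = [tok for tok, _ in ranked[: self.max_tokens - 2]]
        self.vocab = ["", "[UNK]"] + kept
        self.index = {tok: i for i, tok in enumerate(self.vocab)}

    def __call__(self, text):
        """Map a text to a fixed-length list of integer token ids."""
        tokens = custom_standardization(text).split()
        ids = [self.index.get(tok, 1) for tok in tokens]
        ids = ids[: self.sequence_length]
        return ids + [0] * (self.sequence_length - len(ids))


# Hypothetical reviews standing in for the dataset.
reviews = ["Good movie, good plot!", "Bad movie."]

vectorizer = TinyVectorizer(max_tokens=5, sequence_length=6)
vectorizer.adapt(reviews)

print(vectorizer.vocab)
print(vectorizer("A good, bad plot"))
```

Keeping standardization inside the vectorization step is what allows an end-to-end model to accept raw strings at inference time, which is why the custom standardization function must be available again when such a model is loaded or deployed.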


Multi-Topic (Multi-Class) Text Classification With Various Deep Learning Models Tutorial Series

Index Page

This is the index page of the “Multi-Topic (Multi-Class) Text Classification With Various Deep Learning Models” tutorial series.

Author: Murat Karakaya
Date created: 17 Sept 2021
Date published: 11 March 2022
Last modified: 09 April 2023

Description: This is a tutorial series that covers all the phases of text classification: Exploratory Data Analysis (EDA) of text, text preprocessing, and multi-class (multi-topic) text classification using the TF Data Pipeline and the Keras TextVectorization preprocessing layer.

We will design various Deep Learning models by using the Keras Embedding layer, Convolutional (Conv1D) layer, Recurrent (LSTM) layer, Transformer Encoder block, and pre-trained transformer (BERT).

We will use a Kaggle Dataset with 32 topics and more than 400K reviews.

We will cover all the topics related to solving Multi-Class Text Classification problems with sample implementations in Python using TensorFlow and Keras.

You can access the code, videos, and posts using the links below.

If you would like to learn more about Deep Learning with practical coding examples, please subscribe to the Murat Karakaya Akademi YouTube Channel or follow my blog on muratkarakaya.net. Remember to turn on notifications so that you will be notified when new parts are uploaded.

PARTS

This tutorial series covers “Text Classification with various Deep Learning Models” in detail in the following parts.

You can access all these parts on YouTube in ENGLISH or TURKISH!

You can access the complete code as Colab Notebooks using the links given in each video description (Eng/TR), or you can visit the Murat Karakaya Akademi GitHub Repo.

PART A: A PRACTICAL INTRODUCTION TO TEXT CLASSIFICATION
PART B: EXPLORATORY DATA ANALYSIS (EDA) OF THE DATASET
PART C: PREPARE THE DATASET
PART D: PREPROCESSING TEXT WITH TF DATA PIPELINE AND KERAS TEXT VECTORIZATION LAYER
PART E: MULTI-CLASS TEXT CLASSIFICATION WITH A FEED-FORWARD NETWORK (FFN) USING AN EMBEDDING LAYER
PART F: MULTI-CLASS TEXT CLASSIFICATION WITH A FEED-FORWARD NETWORK (FFN) USING A 1-DIMENSIONAL CONVOLUTION (CONV1D) LAYER
PART G: MULTI-CLASS TEXT CLASSIFICATION WITH A FEED-FORWARD NETWORK (FFN) USING A RECURRENT (LSTM) LAYER
PART H: MULTI-CLASS TEXT CLASSIFICATION WITH A TRANSFORMER ENCODER BLOCK
PART I: MULTI-CLASS TEXT CLASSIFICATION WITH A PRE-TRAINED (BERT) TRANSFORMER
PART J: THE IMPACT OF TRAIN DATA SIZE ON THE PERFORMANCE OF MULTI-CLASS TEXT CLASSIFIERS
PART K: HYPERPARAMETER OPTIMIZATION (TUNING), UNDERFITTING, AND OVERFITTING

Comments or Questions?

Please share your Comments or Questions.

Thank you in advance.

Do not forget to check out the following parts!

Take care!

You can access Murat Karakaya Akademi via:

YouTube
Facebook
Instagram
LinkedIn
GitHub
Kaggle
muratkarakaya.net

Part A: A Practical Introduction to Text Classification

Multi-Topic Text Classification with Various Deep Learning Models

Author: Murat Karakaya
Date created: 17 Sept 2021
Date published: 11 March 2022
Last modified: 12 March 2022

Description: This is Part A of the tutorial series that covers all the phases of text classification:

Exploratory Data Analysis (EDA)
Text preprocessing
TF Data Pipeline
Keras TextVectorization preprocessing layer
Multi-class (multi-topic) text classification
Deep Learning model design & end-to-end model implementation
Performance evaluation & metrics
Generating classification report
Hyper-parameter tuning
etc.

We will design various Deep Learning models by using

the Keras Embedding layer,
Convolutional (Conv1D) layer,
Recurrent (LSTM) layer,
Transformer Encoder block, and
pre-trained transformer (BERT).

We will cover all the topics related to solving Multi-Class Text Classification problems with sample implementations in a Python / TensorFlow / Keras environment.

We will use a Kaggle dataset with 32 topics and more than 400K reviews in total.

If you would like to learn more about Deep Learning with practical coding examples,

Please subscribe to the Murat Karakaya Akademi YouTube Channel or
Follow my blog on muratkarakaya.net
Do not forget to turn on notifications so that you will be notified when new parts are uploaded.

PARTS

This tutorial series covers Text Classification with various Deep Learning Models in several parts. You can access all the parts from the index page.

