Building an Efficient TensorFlow Input Pipeline for Character-Level Text Generation
This tutorial is the second part of the “Text Generation in Deep Learning with Tensorflow & Keras” series.
Throughout this series, we cover topics related to text generation with sample implementations in Python. In this part, we will focus on how to build an efficient TensorFlow input pipeline for character-level text generation.
First, we will download a sample corpus (a text file). After opening the file and reading it line by line, we will join the lines into a single string of text. Then, we will split that text into input character sequences (X) and, for each sequence, the output character that follows it (y).
Using tf.data.Dataset and the Keras TextVectorization layer, we will:
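To make the splitting step concrete, here is a minimal sketch in plain Python. The two-line corpus, the helper name make_pairs, and the window length of 10 are assumptions chosen for illustration; they do not come from the tutorial's actual code, which works on a downloaded file.

```python
# Hypothetical stand-in for the downloaded corpus. With a real file,
# `lines` would come from reading the file line by line.
lines = ["To be, or not to be,\n", "that is the question.\n"]

# Join the lines into one string, replacing line breaks with single spaces.
text = " ".join(line.strip() for line in lines)


def make_pairs(text, seq_length, step=1):
    """Slide a window over `text`.

    Each window of `seq_length` characters is an input sequence (X);
    the single character right after the window is its target (y).
    """
    X, y = [], []
    for start in range(0, len(text) - seq_length, step):
        X.append(text[start:start + seq_length])
        y.append(text[start + seq_length])
    return X, y


# seq_length=10 is an arbitrary illustrative choice.
X, y = make_pairs(text, seq_length=10)
print(repr(X[0]), "->", repr(y[0]))
```

Every input sequence has the same length, and moving the window one character at a time yields one training pair per position. A larger step would produce fewer, less overlapping pairs.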
- preprocess the text,
- convert the characters into integer representation,
- prepare the training dataset,
- and optimize the data pipeline.
Thus, in the end, we will be ready to train a Language Model for character-level text generation.
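The steps listed above can be sketched roughly as follows. This is an illustrative outline under stated assumptions, not the tutorial's own implementation: the tiny in-memory corpus, the window length, the batch size, and the helper split_input_target are all hypothetical, and the split="character" option assumes a reasonably recent TensorFlow release.

```python
import tensorflow as tf

# Hypothetical tiny corpus standing in for the downloaded text file.
text = "to be, or not to be, that is the question."
seq_length = 10  # illustrative window length
batch_size = 8   # illustrative batch size

# One string per window: seq_length input characters plus 1 target character.
windows = [text[i:i + seq_length + 1] for i in range(len(text) - seq_length)]

# Character-level vectorizer: no standardization, one token per character.
vectorizer = tf.keras.layers.TextVectorization(
    standardize=None, split="character", output_mode="int"
)
vectorizer.adapt(windows)  # builds the character vocabulary


def split_input_target(ids):
    """Split a batch of id sequences into inputs (X) and next-character targets (y)."""
    return ids[:, :-1], ids[:, -1]


dataset = (
    tf.data.Dataset.from_tensor_slices(windows)
    .batch(batch_size, drop_remainder=True)
    # Convert characters to integer ids, then separate X from y.
    .map(lambda batch: split_input_target(vectorizer(batch)),
         num_parallel_calls=tf.data.AUTOTUNE)
    .cache()                     # keep the vectorized batches in memory
    .shuffle(16)                 # shuffles whole batches in this sketch
    .prefetch(tf.data.AUTOTUNE)  # overlap data preparation with training
)

for X_batch, y_batch in dataset.take(1):
    print(X_batch.shape, y_batch.shape)
```

Caching after the map step avoids re-vectorizing the text on every epoch, and prefetching lets the pipeline prepare the next batch while the model trains on the current one. For a real corpus, shuffling individual samples before batching is usually preferable to shuffling batches.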
If you would like to learn more about Deep Learning with practical coding examples, please subscribe to the Murat Karakaya Akademi YouTube channel or follow my blog at muratkarakaya.net. Do not forget to turn on notifications so that you are notified when new parts are uploaded.