Saturday, November 5, 2022

Fundamentals of Controllable Text Generation


Fundamentals of Controllable Text Generation

Author: Murat Karakaya
Date created: 21 April 2021
Last modified: 24 May 2021
Description: This is an introductory tutorial on Controllable Text Generation in Deep Learning which is the second part of the “Controllable Text Generation with Transformers” series. This series will focus on developing TensorFlow (TF) / Keras implementation of Controllable Text Generation from scratch. You can access all these parts from my blog at

Before getting started, I assume that you have already reviewed:

Please ensure that you have completed the above tutorial series to easily follow the below discussions.

Accessible on:

Photo by Chris Leipelt on Unsplash

PART A2: Fundamentals of Controllable Text Generation

What is the difference between Text Generation and Controllable Text Generation?

When generating samples from a Language Model (LM) by iteratively sampling the next token, we do not have much control over attributes of the output text, such as the topic, the style, the sentiment, etc.

However, many applications would demand good control over the LM output.

For example, if we plan to use an LM to generate reading materials for kids, we would like to guide the output stories to be safe, educational, and easily understood by children.

The main difference between Text Generation and Controllable Text Generation is the ability to control the LM output attributes.

What is Controllable Text Generation?

Controllable text generation is the task of generating natural sentences whose attributes can be controlled. For example, we can define some attributes of the text to be generated such as:

  • stylistic: politeness, sentiment, formality, etc.
  • demographic: attributes of the person writing the text such as gender, age, etc.
  • content: topics, information, keywords, entities, tense, etc.
  • ordering: information, events, etc. in summaries or stories

For example, in this work, the authors train an LM such that it can control the tense (present or past) and attitude (positive or negative) of the generated text. Some experiment results are given below:

Below, you see another sample for generating poems from this work.

Application Areas

Neural Machine Translation: Machine translation is automatically translating from one language to another.

Image Captioning: When an image is given, the model generates a text describing the image. You can access the below demo from this link.

Dialogue response generation (Chatbot) task controlling

  • persona,
  • various aspects of the response such as politeness, formality, authority etc,
  • grounding the responses in an external source of information
  • and controlling the topic sequence

The below example is taken from Andrea Madotto, Zhaojiang Lin, Yejin Bang, Pascale Fung, The Adapter-Bot: All-In-One Controllable Conversational Model

Story generation where you can control

  • the ending
  • the persona
  • the plot
  • the topic sequence

The below example is taken from Nanyun Peng, Marjan Ghazvininejad, Jonathan May, Kevin Knight, Towards Controllable Story Generation

Modify text such as emails, reports, etc. written by a human

  • formality
  • politeness
  • pulling disparate source documents into a coherent unified one

The below example is taken from Zhiqiang Hu, Roy Ka-Wei Lee, Charu C. Aggarwal, Aston Zhang, Text Style Transfer: A Review and Experimental Evaluation

How can we design and train a Controllable Text Generator?

Actually, there are many proposals to design a Controllable Text Generator. However, we first need to understand the guidelines and principles.

Below, I will share the best practices to design a Controllable Text Generation model. But first, let’s learn an important Machine Learning approach.

What is Semi-Supervised Learning?

According to Wikipedia:

Semi-supervised learning is an approach to machine learning that combines a small amount of labeled data with a large amount of unlabeled data during training. Semi-supervised learning falls between unsupervised learning (with no labeled training data) and supervised learning (with only labeled training data).

We can use the Semi-supervised learning approach to train the Controllable Text Generator as discussed below.

A Generic Methodology: Semi-Supervised Learning

Labeled data are crucial for today’s machine learning. Thanks to the Internet, collecting data is easy, but scalably labeling that data is very hard, time-consuming, and expensive.

Labeling the collected data is only feasible to generate labels for important problems where the reward is worth the effort, like machine translation, speech recognition, self-driving, etc.

Therefore, Machine learning researchers aim at developing unsupervised learning algorithms to learn a good representation of a dataset, which can then be used to solve tasks using only a few labeled examples.

In designing our training a Controllable Text Generator, we can follow a 2-step approach:

  • Step 1: We train a large unsupervised next-step-prediction Language model on large amounts of unlabelled data.
    We expect that the language model will have a good representation of the unlabelled data.
  • Step 2: we can use these language models with smaller labeled data to re-purpose them to generate controllable text.

The above approach is an example of Semi-supervised learning.

Moreover, in Step 1, we can use a pre-trained language model instead of training from scratch.

In practice, we often have access to a pre-trained generative language model such as Bert or GPT. These models have already learned the distribution over token sequences by optimizing for the next token prediction.

The most crucial part of designing a controllable text generator is to decide how to use, modify, or re-purpose these pre-trained models.

We can now talk about some details about these steps.

How to train an unsupervised language model with unlabelled text data?

In general, we have two options:

  • Option A Causal Language Models
  • Option B Masked Language Models

Option A: Causal Language Models

Causal language modeling is the task of predicting the next token with high accuracy for a given input token sequence. In this way, the model only attends to the left context (tokens on the left of the mask). Such training is particularly interesting for generation / predicting the next (rightmost) token.

For these kinds of models, we mostly use the Decoder part of a transformer since it has a masked self-attention layer that masks the right-side of a token while training.

The below image indicates the masked self-attention layer on a token sequence.

The most famous application of this approach is the GPT family of language models.

The above image is taken from the Jay Alammar blog which I strongly recommend you if you want to understand how a transformer works in detail.


Observing the above image, one can ask the question:

You mentioned that we are using unlabeled data in training but in the image we have the labels! Are there any mistakes?

Nope! There is no mistake. However, we need to clarify what is “labeling” in the context of supervised learning.

In my definition,

labeling is attaching a metadata to the data which does not exist in the raw data

For instance, in an image data set, each image file actually has RGB values for each pixel. We can attach to an image file a label according to the content of the file, e.g. dog. The raw data (image file) does not have this metadata. Someone should label the data using outside information!

On the other hand, in our example, the corpus has the text as the raw data. We can tokenize the raw data and use the same token as the label to predict the next one. However, this label does exist in the raw data. Thus, we do not consider this kind of process as “labeling”.

Option B: Masked Language Models

We can train a language model with unlabeled data to predict a token given at any position, not only at the end of the sequence (rightmost)!

The most famous application of this approach is the Bert family of language models.

For these kinds of models, we can use the Encoder part of a transformer since it can apply self-attention to the whole token sequence during training.

If you want to go deeper about the Transformer architecture, please visit this blog by Jay Alammar, The Illustrated GPT-2 (Visualizing Transformer Language Models)

How to tune an unsupervised Language Model as a Controllable Language Model?

Below, I will briefly explain the methods to fine-tune an unsupervised Language Model as a Controllable Language Model. If you need a more comprehensive explanation, take a look at this video: Plug and Play Language Models: A Simple Approach to Controlled Text Generation and this blog post.

In the upcoming parts, we will implement these approaches in TensorFlow & Keras.

What are the strategies to employ a pre-trained Language Model as a Controllable Language Model?

There many proposals in the literature to apply some transformation on pre-trained Language Models to have a Controllable Language Model. The classification of these proposals is not straightforward yet.

One way to classify these strategies suggested by this post as follows:

  1. Apply guided decoding strategies and select desired outputs at test time.
  2. Optimize for the most desired outcomes via good prompt design.
  3. Fine-tune the base model or steerable layers to do conditioned content generation.

Which parts of a Language Model can be modified to control the generated text?

In this work (videopaper), the authors provide a new schema of the pipeline of the generation process by classifying it into five modules. The control of attributes in the generation process requires modification of these five modules (see below).

According to this work, we have five modules that can be used for controlling the generation process:

  1. External Input module is responsible for the initialization $h_0 $, of the generation process.
  2. Sequential Input module is the input $x_t$ at each time step of the generation.
  3. Generator Operations module performs consistent operations or calculations on all the input at each time step.
  4. Output module is the output of which is further projected onto the vocabulary space to predict the token $o_t$ at each time step.
  5. Training Objective module takes care of the loss functions used for training the generator.

Don’t Worry!

We will delve into these strategies when we implement them by using Python Tensorflow Keras in upcoming parts.

You can check the above references to the sample studies for each strategy. In addition, you can visit Lilian Weng’s blog post and watch this video for a summary of the related work.

Moreover, please note that controlling (conditioning) a language model is still an open research question. Each strategy above has certain pros & cons.

What kinds of Deep Learning Models can be used as a Controllable Text Generator?

You can implement the above strategies by modifying some existing Neural Network paradigms.

One famous approach in Neural NLP is to apply sequence-to-sequence learning (Seq2Seq). The implementation of Seq2Seq learning mostly depends on the encoder-decoder paradigm. To apply the encoder-decoder paradigm, we can use different methods such as:

  • Recurrent Neural Networks (RNN: e.g., LSTM / GRU)
  • Transformers (e.g., GPT / BERT)

We have also other paradigms in Artificial Neural Networks such as:

  • Generative Adversarial Methods (GAN),
  • Reinforcement Learning (RL),
  • Variational Autoencoder (VAE),
  • etc.

We can adopt these architectures to build a Controllable Text Generation model. We can decide the location of the modification on an existing model following the above two classifications.

Don’t Worry!

We will discuss the above concepts and models when we implement sample Controllable Text Generation models by using Python Tensorflow Keras in the upcoming parts.

Controllable Text Generation Summary

  • So far, we have reviewed the important conceptsapproaches, and methods related to the Controllable Text Generation in Deep Learning.
  • We will design and build a TensorFlow Data Pipeline for training a Controllable Text Generator in the next part.
  • After then, we will implement a Controllable Text Generator with a Recurrent Neural Network (LSTM) model in Python, TensorFlow & Keras. We will adopt the LSTM Language Model model’s Sequential Input to control the generated text.

Thus, if you want to learn how to code a Controllable Text Generator with various Deep Learning models, please follow up on all the parts of the “Controllable Text Generation Tutorial series.

Comments or Questions?

Please share your Comments or Questions.

Thank you in advance.

Do not forget to check out the next parts!

Take care!


Language Models:

Text Generation:

Controllable Text Generation:

Controllable Text Generation with Guided Decoding (Sampling) Strategy:

Controllable Text Generation with Smart Prompt Design Strategy:

Controllable Text Generation with Fine-tuning Strategy:

You can access Murat Karakaya Akademi via: