
T5: Text-to-Text Transformers (Part One)

Creating a unified framework for language modeling

Cameron R. Wolfe, Ph.D.
(Photo by Patrick Tomasso on Unsplash)

The transfer learning paradigm consists of two main stages. First, we pre-train a deep neural network over a large amount of data. Then, we fine-tune this model (i.e., train it some more) over a more specific, downstream dataset. The exact implementation of these stages may take many different forms. In computer vision, for example, we often pre-train models on the ImageNet dataset using a supervised learning objective, then fine-tune these models in a supervised manner on the downstream dataset (i.e., the task that we are actually trying to solve). Alternatively, in natural language processing (NLP), we often perform self-supervised pre-training over an unlabeled textual corpus.
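To make these two stages concrete, here is a minimal sketch of the computer vision example above: we load an ImageNet-pre-trained ResNet from torchvision (the pre-training stage has already been done for us), swap its classification head, and fine-tune on a downstream task. The downstream data and the number of classes are placeholder assumptions, not part of the original article.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50, ResNet50_Weights

# Stage 1 (pre-training): reuse weights pre-trained on ImageNet with a supervised objective.
model = resnet50(weights=ResNet50_Weights.IMAGENET1K_V2)

# Stage 2 (fine-tuning): replace the classification head for the downstream task
# and continue training on the downstream dataset.
num_classes = 10  # placeholder: number of classes in the downstream task
model.fc = nn.Linear(model.fc.in_features, num_classes)

# Placeholder downstream data; in practice this would be a real labeled DataLoader.
downstream_loader = [(torch.randn(8, 3, 224, 224), torch.randint(0, num_classes, (8,)))]

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

model.train()
for images, targets in downstream_loader:
    optimizer.zero_grad()
    loss = loss_fn(model(images), targets)
    loss.backward()
    optimizer.step()
```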

Combining large, deep neural networks with massive (pre-)training datasets often leads to impressive results. This is especially true in NLP. Given that raw textual data is freely available on the internet, we can simply download a massive textual corpus, pre-train a large neural net on this data, and then fine-tune the model on a variety of downstream tasks (or just use zero/few-shot learning techniques). This large-scale transfer learning approach was initially explored by BERT [2], which pre-trained a transformer encoder over unlabeled data using a masking objective, then…
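As a rough illustration of this masking objective, the sketch below corrupts a batch of token IDs and builds labels so that the loss is computed only on masked positions. It is a simplified version of what BERT actually does (BERT also replaces some selected tokens with random tokens or leaves them unchanged), and the token ID 103 for [MASK] assumes a BERT-style vocabulary.

```python
import torch

def mask_tokens(input_ids, mask_token_id, mask_prob=0.15):
    """Corrupt a batch of token IDs for a BERT-style masked language modeling objective."""
    labels = input_ids.clone()
    # Sample the positions to mask.
    mask = torch.rand(input_ids.shape) < mask_prob
    # Compute the loss only on masked positions (-100 is ignored by nn.CrossEntropyLoss).
    labels[~mask] = -100
    # Replace the selected positions with the [MASK] token.
    corrupted = input_ids.clone()
    corrupted[mask] = mask_token_id
    return corrupted, labels

# Example: a batch of two token ID sequences (values are arbitrary).
input_ids = torch.randint(5, 1000, (2, 16))
corrupted, labels = mask_tokens(input_ids, mask_token_id=103)
```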
