
T5: Text-to-Text Transformers (Part One)

Creating a unified framework for language modeling

Cameron R. Wolfe, Ph.D.
(Photo by Patrick Tomasso on Unsplash)

The transfer learning paradigm consists of two main stages. First, we pre-train a deep neural network over a large amount of data. Then, we fine-tune this model (i.e., train it some more) over a more specific, downstream dataset. The exact implementation of these stages may take many different forms. In computer vision, for example, we often pre-train models on the ImageNet dataset using a supervised learning objective, then fine-tune these models in a supervised manner on the downstream dataset (i.e., the task that we are actually trying to solve). Alternatively, in natural language processing (NLP), we often perform self-supervised pre-training over an unlabeled textual corpus.
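To make these two stages concrete, here is a minimal sketch of the computer vision example above: we load an ImageNet-pre-trained ResNet from torchvision (the pre-training stage has already been done for us), swap its classification head, and fine-tune on a downstream task. The downstream data and the number of classes are placeholder assumptions, not part of the original article.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50, ResNet50_Weights

# Stage 1 (pre-training): reuse weights pre-trained on ImageNet with a supervised objective.
model = resnet50(weights=ResNet50_Weights.IMAGENET1K_V2)

# Stage 2 (fine-tuning): replace the classification head for the downstream task
# and continue training on the downstream dataset.
num_classes = 10  # placeholder: number of classes in the downstream task
model.fc = nn.Linear(model.fc.in_features, num_classes)

# Placeholder downstream data; in practice this would be a real labeled DataLoader.
downstream_loader = [(torch.randn(8, 3, 224, 224), torch.randint(0, num_classes, (8,)))]

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

model.train()
for images, targets in downstream_loader:
    optimizer.zero_grad()
    loss = loss_fn(model(images), targets)
    loss.backward()
    optimizer.step()
```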

Combining large, deep neural networks with massive (pre-)training datasets often leads to impressive results. This is especially true in NLP. Given that raw textual data is freely available on the internet, we can simply download a massive textual corpus, pre-train a large neural net on this data, and then fine-tune the model on a variety of downstream tasks (or just use zero/few-shot learning techniques). This large-scale transfer learning approach was initially explored by BERT [2], which pre-trained a transformer encoder over unlabeled data using a masking objective, then…
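As a rough illustration of this masking objective, the sketch below corrupts a batch of token IDs and builds labels so that the loss is computed only on masked positions. It is a simplified version of what BERT actually does (BERT also replaces some selected tokens with random tokens or leaves them unchanged), and the token ID 103 for [MASK] assumes a BERT-style vocabulary.

```python
import torch

def mask_tokens(input_ids, mask_token_id, mask_prob=0.15):
    """Corrupt a batch of token IDs for a BERT-style masked language modeling objective."""
    labels = input_ids.clone()
    # Sample the positions to mask.
    mask = torch.rand(input_ids.shape) < mask_prob
    # Compute the loss only on masked positions (-100 is ignored by nn.CrossEntropyLoss).
    labels[~mask] = -100
    # Replace the selected positions with the [MASK] token.
    corrupted = input_ids.clone()
    corrupted[mask] = mask_token_id
    return corrupted, labels

# Example: a batch of two token ID sequences (values are arbitrary).
input_ids = torch.randint(5, 1000, (2, 16))
corrupted, labels = mask_tokens(input_ids, mask_token_id=103)
```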
