Using Transformers for Computer Vision

Are Vision Transformers actually useful?

Cameron R. Wolfe, Ph.D.
Towards Data Science
13 min read · Oct 5, 2022


A basic depiction of a vision transformer architecture (created by author)

What are Vision Transformers?

Transformers are a type of deep learning architecture, based primarily upon the self-attention module, that were originally proposed for sequence-to-sequence tasks (e.g., translating a sentence from one language to another). Recent deep learning research…
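To make the self-attention module a bit more concrete, here is a minimal sketch of single-head, scaled dot-product self-attention in plain Python. It is an illustration under stated assumptions, not the implementation used by any particular model: the helper names (`softmax`, `matmul`, `self_attention`), the projection matrices `w_q`, `w_k`, `w_v`, and the random inputs are all hypothetical. In a vision transformer, the rows of `x` would typically be embedded image patches rather than word embeddings.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention.

    x: (seq_len, d_model) token vectors
    w_q, w_k, w_v: (d_model, d_head) projection matrices (illustrative)
    Returns the attended outputs and the attention weights.
    """
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(k[0])
    k_t = [list(col) for col in zip(*k)]  # transpose of k
    # Similarity of every token with every other token, scaled by sqrt(d_k).
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(q, k_t)]
    # Each row becomes a probability distribution over the sequence.
    weights = [softmax(row) for row in scores]
    # Each output is a weighted average of the value vectors.
    return matmul(weights, v), weights


# Toy usage with random data (assumed sizes: 4 tokens, 8 dimensions).
random.seed(0)
seq_len, d_model, d_head = 4, 8, 8
x = [[random.gauss(0, 1) for _ in range(d_model)] for _ in range(seq_len)]
w_q, w_k, w_v = (
    [[random.gauss(0, 1) for _ in range(d_head)] for _ in range(d_model)]
    for _ in range(3)
)
out, weights = self_attention(x, w_q, w_k, w_v)
```

Every output row mixes information from all tokens in the sequence, with the mixing proportions given by the corresponding row of `weights`.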
