Quick guide to Transformer Architectures

Quick Refresher

Rest assured, we’re not revisiting the Transformer model architecture and paper for the 100th time. However, this model is so versatile that it’s easy to forget the breadth of its applications. Here’s a quick refresher: it is an architecture based on the self-attention mechanism. Despite its drawbacks, like difficult training and high …
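
For readers who want the self-attention mechanism in concrete terms, here is a minimal sketch of single-head scaled dot-product self-attention in plain Python. It is an illustration under stated assumptions, not a full Transformer layer: the helper names (`matmul`, `softmax`, `self_attention`), the toy input, and the identity projection matrices are all hypothetical and chosen only to keep the numbers easy to follow.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention.

    x: token vectors, shape (n, d_model).
    w_q, w_k, w_v: hypothetical projection matrices, shape (d_model, d_k).
    Returns (output, weights), where weights[i][j] is how much
    token i attends to token j.
    """
    q = matmul(x, w_q)  # queries
    k = matmul(x, w_k)  # keys
    v = matmul(x, w_v)  # values
    d_k = len(k[0])
    k_t = [list(col) for col in zip(*k)]  # transpose of the keys
    # Similarity of every query with every key, scaled by sqrt(d_k).
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(q, k_t)]
    # Each row of scores becomes a probability distribution over tokens.
    weights = [softmax(row) for row in scores]
    # Each output vector is a weighted average of the value vectors.
    return matmul(weights, v), weights

# Toy input (assumed): 3 tokens with model width 2.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]  # identity projections, for readability only
output, weights = self_attention(x, identity, identity, identity)
```

Every token produces one output vector that mixes information from all tokens in the sequence, which is the property that makes the architecture so broadly applicable. Real implementations learn the projection matrices and add multiple heads, masking, and batching, none of which is shown here.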