Winter 2024

EE 046010 -  Advanced Topics in Deep Learning: Transformers

📣 Please note that this course will be conducted in English. Students' presentations of the papers must also be delivered in English.

The Transformer is a deep learning architecture that represents sequential data, such as text and time series, in a highly efficient manner. It has played a major role in the success of language models such as ChatGPT, in recent state-of-the-art automatic speech recognition systems, and in computer vision.

The course will begin by covering, in detail, the fundamental concepts that led to the development of the Transformer, as well as the Transformer model itself. We will then delve into the most important research that has been conducted in this field, both theoretical and practical. Lastly, we will cover how Transformers are applied in natural language processing (NLP), speech processing and recognition, and computer vision. During the second part of the course, students will present papers selected from the field and prepare a mini-project.

The topics that will be covered include:

  1. Introduction
  2. RNN, LSTM, sequence-to-sequence models, the attention mechanism, LAS; Transformers as database queries; the self-attention mechanism
  3. Components of Transformers and their usage
  4. Transformers implementation in NLP
  5. Transformers implementation in speech
  6. Transformers implementation in vision
  7. Prompts
  8. In-context learning
  9. The alignment problem and its limitations
  10. Tuning Transformers using reinforcement learning from human feedback (RLHF)
  11. Flash Networks; RWKV; interpretability
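To give a flavor of the self-attention mechanism and the "Transformers as database query" view listed in topic 2, here is a minimal sketch of scaled dot-product self-attention in NumPy. The variable names (`Wq`, `Wk`, `Wv`) and dimensions are illustrative assumptions, not taken from the course material:

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention.

    X: (seq_len, d_model) input sequence; Wq/Wk/Wv: (d_model, d_k) projections.
    Each position issues a query against the keys of all positions (the
    database-query analogy) and retrieves a weighted mix of their values.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv              # queries, keys, values
    scores = Q @ K.T / np.sqrt(K.shape[-1])       # (seq_len, seq_len) similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                            # weighted sum of values

rng = np.random.default_rng(0)
d_model, d_k, seq_len = 8, 4, 5
X = rng.standard_normal((seq_len, d_model))
Wq, Wk, Wv = (rng.standard_normal((d_model, d_k)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (5, 4)
```

Note that the output has one row per input position, each a convex combination of the value vectors; the softmax guarantees the attention weights in every row sum to one.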


There will be 12 meetings: the first 7-8 will be given by the lecturer (me), and the remaining meetings will be given by the course participants (see the note about working in pairs under the grade composition). Each student will present a paper together with possible future directions. The paper presented in class, along with the proposed continuation directions, will form the basis of the project. Discussions will take place during all the meetings, so attendance at the lectures is mandatory.

Grade composition:

  • Active participation and task execution related to the study material: ~10%
  • Presentation of an article from a list (or another article with the lecturer's approval): 20%-25%
  • A project based on the article chosen for presentation in class: 65%-70%
    The purpose of the project is to thoroughly examine a scientific article and propose a specific extension. Possible extensions include improving the performance of the algorithm proposed in the article, applying the algorithm to different data or to another domain, etc.
    - A deadline for submitting a report summarizing the project results will be announced later.
    - The project will be submitted after the end of the semester and the exam period, although it is recommended to work on it during the semester.
    - Depending on the number of registrants, we recommend working in pairs, both for the purpose of presenting an article and for executing the project.