BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

📌 Summary

BERT pre-trains a deep Transformer encoder with two objectives: masked language modeling (MLM) and next sentence prediction (NSP). The key innovation is that MLM lets every layer condition on both left and right context, which unidirectional language models cannot do.

📃 Citation

Devlin et al., 2018 – https://arxiv.org/abs/1810.04805

🧠 Key Ideas

  • Masked Language Modeling (MLM): mask a random 15% of input tokens and predict them from both left and right context (see the sketch after this list)
  • Next Sentence Prediction (NSP): given a sentence pair, classify whether the second sentence actually follows the first in the corpus
  • Fine-tuning: add a small task-specific output layer and update all pre-trained parameters on each downstream task
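A minimal sketch (in Python, not from the paper's codebase) of how the two pre-training objectives build their training examples: the 15% selection with the 80/10/10 corruption rule for MLM, and the 50/50 IsNext/NotNext pairing for NSP. The token strings, TOY_VOCAB, and the helper names mask_tokens / make_nsp_pair are illustrative placeholders.

```python
import random

MASK, CLS, SEP = "[MASK]", "[CLS]", "[SEP]"
TOY_VOCAB = ["the", "cat", "sat", "on", "mat", "dog", "ran", "fast"]

def mask_tokens(tokens, mask_prob=0.15):
    """MLM corruption: select ~15% of tokens as prediction targets, then
    replace 80% of them with [MASK], 10% with a random token, 10% unchanged."""
    corrupted, labels = list(tokens), [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if random.random() < mask_prob:
            labels[i] = tok                              # original token is the target
            r = random.random()
            if r < 0.8:
                corrupted[i] = MASK                      # 80%: replace with [MASK]
            elif r < 0.9:
                corrupted[i] = random.choice(TOY_VOCAB)  # 10%: replace with random token
            # else: 10% keep the original token unchanged
    return corrupted, labels

def make_nsp_pair(sent_a, next_sent, corpus):
    """NSP example: 50% of the time B is the true next sentence (IsNext),
    otherwise B is a random sentence from the corpus (NotNext)."""
    if random.random() < 0.5:
        sent_b, label = next_sent, "IsNext"
    else:
        sent_b, label = random.choice(corpus), "NotNext"
    return [CLS] + sent_a + [SEP] + sent_b + [SEP], label

corpus = [["the", "dog", "ran", "fast"], ["the", "cat", "sat"]]
print(mask_tokens(["the", "cat", "sat", "on", "the", "mat"]))
print(make_nsp_pair(["the", "cat", "sat"], ["on", "the", "mat"], corpus))
```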

🔍 What's New?

  • Deeply bidirectional pre-training: every layer attends to both left and right context, unlike left-to-right LMs or a shallow concatenation of two unidirectional models
  • New state of the art at publication time on GLUE and SQuAD via simple fine-tuning (sketched below)
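A minimal fine-tuning sketch, assuming the Hugging Face transformers and PyTorch packages rather than the paper's original TensorFlow release; the toy texts, labels, and hyperparameters are placeholders, not the paper's setup.

```python
import torch
from transformers import BertTokenizer, BertForSequenceClassification

# Load the pre-trained encoder plus a randomly initialised classification head.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# Toy labelled batch (placeholder data).
texts = ["a great movie", "a dull movie"]
labels = torch.tensor([1, 0])
batch = tokenizer(texts, padding=True, return_tensors="pt")

# One fine-tuning step: all BERT parameters and the new head are updated together.
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
outputs = model(**batch, labels=labels)
outputs.loss.backward()
optimizer.step()
```

The same pattern applies across the GLUE-style tasks reported in the paper: only the small output layer is new per task, while everything else starts from the pre-trained checkpoint.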

💬 Discussion

  • Why MLM instead of a standard left-to-right LM objective?
  • Is NSP still necessary today?

👥 Notes by Group