Day 48 - BERT Bidirectional Context

Date: 2025-11-12 (Wednesday)
Status: Done


How BERT Learns

BERT pre-trains with bidirectional self-attention, so each token's representation can draw on both its left and right context, not just the tokens that came before it.
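
A quick way to see the bidirectional context at work is a fill-mask query. This is a minimal sketch assuming the Hugging Face transformers package is installed and the bert-base-uncased checkpoint is available:

from transformers import pipeline  # assumes the `transformers` package is installed

fill = pipeline("fill-mask", model="bert-base-uncased")

# The words to the right of [MASK] ("with my best") only help the prediction
# because self-attention looks in both directions.
sentence = "learning from deep learning is like watching the sunset with my best [MASK]"
for pred in fill(sentence)[:3]:
    print(pred["token_str"], round(pred["score"], 3))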

Masked Language Modeling (MLM)

  • Randomly select ~15% of tokens as prediction targets; of those, 80% are replaced with [MASK], 10% with a random token, and 10% are left unchanged.
  • The model must recover the original token, so the loss pushes it toward contextual embeddings that use the surrounding words (toy sketch after the example below).
Input:  learning from deep learning is like watching the sunset with my best [MASK]
Target: friend
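
A toy sketch of the 80/10/10 corruption rule in plain Python over whitespace tokens. MASK and TOY_VOCAB here are made-up stand-ins, not BERT's wordpiece vocabulary:

import random

MASK = "[MASK]"
TOY_VOCAB = ["friend", "dog", "coffee", "sunset"]   # made-up stand-in for the wordpiece vocab

def mask_tokens(tokens, mask_prob=0.15):
    """BERT-style MLM corruption: select ~15% of positions; of those,
    80% become [MASK], 10% become a random token, 10% stay unchanged.
    Labels hold the original token at selected positions, None elsewhere."""
    corrupted, labels = [], []
    for tok in tokens:
        if random.random() < mask_prob:
            labels.append(tok)                              # model must predict the original
            roll = random.random()
            if roll < 0.8:
                corrupted.append(MASK)                      # 80%: replace with [MASK]
            elif roll < 0.9:
                corrupted.append(random.choice(TOY_VOCAB))  # 10%: random token
            else:
                corrupted.append(tok)                       # 10%: keep as-is
        else:
            corrupted.append(tok)
            labels.append(None)                             # ignored by the loss
    return corrupted, labels

tokens = "learning from deep learning is like watching the sunset with my best friend".split()
print(mask_tokens(tokens))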

Next Sentence Prediction (NSP)

  • Task: given a sentence pair, predict whether sentence B actually follows sentence A in the corpus (IsNext) or is a randomly sampled sentence (NotNext); training pairs are split roughly 50/50 (pairing sketch below).
  • Encourages sentence-level coherence, which helps pair-based tasks like QA and classification.
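
A sketch of how NSP pairs could be built and packed into BERT's input format. It assumes the Hugging Face transformers package and the bert-base-uncased tokenizer; the docs list is toy data:

import random
from transformers import AutoTokenizer   # assumes the `transformers` package is installed

tok = AutoTokenizer.from_pretrained("bert-base-uncased")

# Toy corpus: each inner list is one document's sentences.
docs = [
    ["The sun was setting.", "We sat on the porch.", "Nobody said a word."],
    ["Gradient descent updates the weights.", "The learning rate controls the step size."],
]

def make_nsp_example(docs):
    """Pick sentence A, then with 50% probability pair it with its true
    successor (label 1, IsNext) or a sentence from another document (label 0, NotNext)."""
    doc = random.choice(docs)
    i = random.randrange(len(doc) - 1)
    a = doc[i]
    if random.random() < 0.5:
        return a, doc[i + 1], 1
    other = random.choice([d for d in docs if d is not doc])
    return a, random.choice(other), 0

a, b, label = make_nsp_example(docs)
enc = tok(a, b, return_tensors="pt")                 # packs as [CLS] A [SEP] B [SEP]
print(label, tok.convert_ids_to_tokens(enc["input_ids"][0]))
print(enc["token_type_ids"][0])                      # 0 = sentence A, 1 = sentence B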

Downstream Use

  • Start from pre-trained weights.
  • Option A: freeze the encoder and train a lightweight task head (feature-based).
  • Option B: fine-tune the encoder + head together with a small learning rate; both options are sketched below.
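
Both options, sketched with PyTorch and a Hugging Face sequence-classification head. The checkpoint name, 2-class task, and learning rates are placeholder assumptions:

from torch.optim import AdamW
from transformers import AutoModelForSequenceClassification

# Hypothetical 2-class task; checkpoint and learning rates are assumptions, not from the notes.
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

FREEZE_ENCODER = True   # True -> Option A (feature-based), False -> Option B (full fine-tune)

if FREEZE_ENCODER:
    # Option A: freeze the BERT encoder, train only the new classification head.
    for param in model.bert.parameters():
        param.requires_grad = False
    optimizer = AdamW(model.classifier.parameters(), lr=1e-3)
else:
    # Option B: fine-tune encoder + head together with a small learning rate.
    optimizer = AdamW(model.parameters(), lr=2e-5)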

Tips

  • Keep max_seq_length aligned to task data; long docs may need chunking.
  • Watch for catastrophic forgetting; gradual unfreezing can help.
  • Small batch? Use gradient accumulation to approximate a larger effective batch and stabilize updates (loop sketched below).
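
A minimal gradient-accumulation loop, assuming model, loader, and optimizer already exist (e.g. the model and optimizer from the sketch above plus a DataLoader over tokenized batches):

accum_steps = 8                      # effective batch = per-step batch size * accum_steps

model.train()
optimizer.zero_grad()
for step, batch in enumerate(loader):
    loss = model(**batch).loss       # HF models return .loss when labels are in the batch
    (loss / accum_steps).backward()  # scale so accumulated gradients match one big batch
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()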

Practice Targets for Today

  • Prepare a BERT QA fine-tune plan (dataset, max length, lr, epochs); a placeholder sketch follows below.
  • Decide whether to freeze lower layers for your data size.
  • Add evaluation checkpoints (dev set EM/F1) to catch overfitting early.
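
A placeholder plan to fill in while working through the targets; every value here is a guess to revisit, not a recommendation from these notes:

# Placeholder QA fine-tune plan; all values are assumptions to revisit.
qa_plan = {
    "dataset": "squad",             # or your own QA set
    "max_seq_length": 384,          # typical for SQuAD-style contexts
    "doc_stride": 128,              # overlap between chunks of long contexts
    "learning_rate": 3e-5,
    "num_train_epochs": 2,
    "freeze_lower_layers": False,   # revisit based on data size
    "eval_every_steps": 500,        # dev-set EM/F1 checkpoints to catch overfitting
}
print(qa_plan)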