Week 10 - Transfer Learning, BERT & T5
Week: 2025-11-10 to 2025-11-14
Status: In progress
Week 10 Overview
This week dives into transfer learning for NLP and how modern QA systems benefit from pre-trained transformers. We will contrast classical training with feature-based reuse and fine-tuning, then study two flagship models: BERT for bidirectional context and T5 for text-to-text multitask learning, along with practical QA setups (context-based vs. closed-book).
Key Topics
Transfer Learning Fundamentals
- Classical vs. transfer learning pipelines
- Reusing pre-trained weights to speed convergence
- Feature-based representations vs. fine-tuning (see the sketch after this list)
- Benefits: faster training, better predictions, less labeled data
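A minimal sketch of the feature-based path, assuming the Hugging Face `transformers` library and the public `bert-base-uncased` checkpoint: the pre-trained encoder stays frozen and only a small task head on top of its [CLS] embeddings is trained. Swapping the `torch.no_grad()` block for a normal forward pass (and giving the optimizer both the encoder's and the head's parameters, usually at a much smaller learning rate such as 2e-5) turns the same sketch into full fine-tuning.

```python
import torch
from torch import nn
from transformers import AutoTokenizer, AutoModel

# Load a pre-trained encoder and its tokenizer (checkpoint name is an example).
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")
encoder.eval()  # feature-based reuse: the backbone stays frozen

# A tiny task head trained on top of the frozen features.
classifier = nn.Linear(encoder.config.hidden_size, 2)
optimizer = torch.optim.AdamW(classifier.parameters(), lr=1e-3)

texts = ["great movie", "terrible plot"]   # toy labeled examples
labels = torch.tensor([1, 0])

batch = tokenizer(texts, padding=True, return_tensors="pt")
with torch.no_grad():                      # no gradients flow into the encoder
    features = encoder(**batch).last_hidden_state[:, 0]  # [CLS] embeddings

logits = classifier(features)
loss = nn.functional.cross_entropy(logits, labels)
loss.backward()
optimizer.step()
```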
Question Answering Modes
- Context-based QA (span extraction with provided context)
- Closed-book QA (generate answers without context; both modes are sketched after this list)
- How pre-training quality shapes QA performance
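To make the two modes concrete, here is a minimal sketch using the `transformers` pipelines; the checkpoint names and the example question are illustrative. The extractive model points at a span inside the supplied context, while the text-to-text model is asked the same question with no context and can only rely on whatever it absorbed during pre-training.

```python
from transformers import pipeline

question = "Where was Marie Curie born?"
context = "Marie Curie was born in Warsaw and later moved to Paris to study physics."

# Context-based (extractive) QA: the model selects a span inside the context.
extractive_qa = pipeline("question-answering",
                         model="distilbert-base-cased-distilled-squad")
span = extractive_qa(question=question, context=context)
print(span["answer"], span["score"])  # e.g. "Warsaw" plus a confidence score

# Closed-book QA: a text-to-text model must answer from pre-trained knowledge alone.
closed_book = pipeline("text2text-generation", model="t5-small")
print(closed_book(f"question: {question}")[0]["generated_text"])
```

A small checkpoint like `t5-small` will often answer the closed-book question poorly, which is exactly the point of the last bullet above: closed-book quality tracks the scale and coverage of pre-training.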
BERT Bidirectional Context
- Masked language modeling for contextual embeddings (see the sketch after this list)
- Next sentence prediction for sentence-level coherence
- Using both left and right context to predict tokens
- Typical downstream uses: QA, sentiment, classification
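A short sketch of masked language modeling at inference time, assuming `transformers` and the `bert-base-uncased` checkpoint: the `fill-mask` pipeline predicts the hidden token, and because BERT attends to both sides of the mask, the words after it influence the prediction as much as the words before it.

```python
from transformers import pipeline

# fill-mask runs a masked-LM head on top of the bidirectional encoder.
unmasker = pipeline("fill-mask", model="bert-base-uncased")

# The right-hand context ("to Paris") steers the prediction of the masked token.
for pred in unmasker("She flew [MASK] to Paris last summer.")[:3]:
    print(f'{pred["token_str"]:>10}  {pred["score"]:.3f}')
```

Dropping the trailing "to Paris last summer" clause and rerunning the call is a quick way to see how much the right context contributes.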
T5 Text-to-Text Multitask
- Unified text-to-text framing for multiple tasks
- Prompting the same model for rating, QA, summarization, translation (see the sketch after this list)
- Scaling with large corpora (e.g., C4 vs. Wikipedia)
- Multitask transfer to improve generalization
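The text-to-text framing means one model, one input string, one output string for every task; the task is selected purely by the text prefix. A minimal sketch, assuming `transformers` and the public `t5-small` checkpoint, using prefixes from the original T5 multitask setup:

```python
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# Different tasks, same model: only the text prefix changes.
prompts = [
    "translate English to German: The house is wonderful.",
    "summarize: Transfer learning reuses weights from large pre-trained models "
    "so that downstream tasks need far less labeled data and training time.",
    "cola sentence: The car drived fast.",   # grammatical-acceptability task
]

for prompt in prompts:
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=40)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

A prefix the checkpoint was never trained on would need its own fine-tuning pass before the outputs become meaningful.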
Training & Data Strategy
- Labeled vs. unlabeled data mix; self-supervised masking (see the sketch after this list)
- Freezing backbone vs. adding task heads
- Fine-tuning recipes for downstream tasks (QA, summarization, translation)
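The self-supervised side of this mix needs no labels at all: training pairs are manufactured by hiding random tokens and asking the model to restore them. A hedged sketch of how such masked batches are typically built, assuming `transformers`' `DataCollatorForLanguageModeling`, which masks roughly 15% of tokens by default and sets the labels of unmasked positions to -100 so the loss ignores them:

```python
from transformers import AutoTokenizer, DataCollatorForLanguageModeling

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# mlm_probability controls what fraction of tokens gets masked or corrupted.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer,
                                           mlm=True,
                                           mlm_probability=0.15)

sentences = ["Transfer learning reuses pre-trained weights.",
             "Masked language modeling needs no labeled data."]
encoded = [tokenizer(s) for s in sentences]

batch = collator(encoded)
print(batch["input_ids"][0])   # some ids replaced by the [MASK] token id
print(batch["labels"][0])      # original ids at masked positions, -100 elsewhere
```

The other half of the recipe (freezing the backbone vs. adding a fresh task head) follows the same pattern as the feature-based sketch earlier: set `requires_grad = False` on the backbone parameters you want frozen and give the optimizer only the parameters you intend to update.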
Learning Objectives
- ✅ Explain transfer learning and when to prefer it over training from scratch
- ✅ Distinguish feature-based reuse from full fine-tuning
- ✅ Compare context-based QA and closed-book QA setups
- ✅ Summarize how BERT and T5 pre-train and transfer across tasks
- ✅ Identify why transfer learning reduces data needs and training time
Daily Breakdown
| Day | Focus | Topics |
| --- | --- | --- |
| 46 | Transfer Learning Intro | Classical vs. transfer pipeline, reuse weights, feature-based vs. fine-tuning, benefits |
| 47 | Question Answering | Context-based span QA vs. closed-book QA, data needs, evaluation cues |
| 48 | BERT Bidirectionality | Masked LM, next sentence prediction, leveraging both contexts for token prediction |
| 49 | T5 Multitask Model | Text-to-text prompts, multitask sharing, scaling data (C4 vs. Wikipedia) |
| 50 | Fine-tuning Practice | Freezing layers vs. adding heads, downstream tasks: QA, summarization, translation |
Prerequisites
- Solid grasp of transformer architecture from Week 9
- Comfortable with attention mechanisms and encoder-decoder flow
- Basic familiarity with PyTorch or TensorFlow for fine-tuning
Next Steps
- Read the BERT and T5 papers to internalize pre-training objectives
- Fine-tune a pre-trained BERT QA model (e.g., SQuAD-style span extraction)
- Experiment with T5 prompts for QA, summarization, and sentiment tasks
- Compare feature-based vs. fine-tuned performance on your own dataset