Day 50 - Fine-Tuning Practice
Date: 2025-11-14 (Friday)
Status: "Done"
Fine-Tuning Recipes
Today focuses on practical knobs: which layers to freeze, how to schedule learning rates, and how to evaluate transfer setups.
Freezing vs. Training
- Freeze the lower layers when the dataset is small or the domain is close to the pre-training corpus.
- Unfreeze progressively (top -> bottom) if accuracy plateaus.
- Add a task-specific head (classification, span QA, seq2seq) and train it first; see the sketch after this list.
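A minimal PyTorch sketch of this freezing setup, assuming a BERT-style encoder loaded with Hugging Face Transformers; the attribute path `encoder.encoder.layer`, the number of unfrozen layers, and the 3-label head are illustrative assumptions, not a fixed recipe.

```python
import torch.nn as nn
from transformers import AutoModel

# Assumes a BERT-style model; the layer attribute path differs across architectures.
encoder = AutoModel.from_pretrained("bert-base-uncased")

# Freeze everything first, then selectively unfreeze the top N transformer layers.
for param in encoder.parameters():
    param.requires_grad = False

N_UNFROZEN = 2  # widen this (top -> bottom) if accuracy plateaus
for layer in encoder.encoder.layer[-N_UNFROZEN:]:
    for param in layer.parameters():
        param.requires_grad = True

class Classifier(nn.Module):
    """Task-specific head on top of the (partially frozen) encoder."""

    def __init__(self, encoder, num_labels):
        super().__init__()
        self.encoder = encoder
        self.head = nn.Linear(encoder.config.hidden_size, num_labels)  # always trainable

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]  # [CLS] token representation
        return self.head(cls)

model = Classifier(encoder, num_labels=3)  # hypothetical label count
```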
Hyperparameter Basics
- Learning rate: 1e-5 to 3e-5 when the full encoder is trainable; slightly higher when most layers are frozen.
- Warmup steps: ~5% to 10% of total steps to stabilize early training (scheduler sketch below).
- Max sequence length: match the task; chunk long documents for QA.
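A sketch of the optimizer and warmup schedule under these settings, reusing `model` from the freezing sketch above; the step counts, the 2e-5 learning rate, and the 6% warmup fraction are placeholder values inside the ranges listed.

```python
import torch
from transformers import get_linear_schedule_with_warmup

# Hypothetical training-size numbers; replace with the real dataset's values.
num_epochs = 3
steps_per_epoch = 500
total_steps = num_epochs * steps_per_epoch
warmup_steps = int(0.06 * total_steps)  # ~5-10% of total steps

# Only optimize the parameters left trainable by the freezing strategy.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=2e-5, weight_decay=0.01)

# Linear decay after a short warmup; step the scheduler once per optimizer step.
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=warmup_steps,
    num_training_steps=total_steps,
)
```

Inside the training loop the usual order is optimizer.step(), then scheduler.step(), then optimizer.zero_grad().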
Evaluation Loop
- Track loss plus task metrics (EM/F1 for QA, ROUGE for summarization, accuracy/F1 for classification).
- Early stop on the dev set; keep the best checkpoint, not just the last (loop sketched below).
- Compare a feature-based baseline against a full fine-tune on a small subset.
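One possible shape for that loop on a classification task; `train_one_epoch`, `train_loader`, and `dev_loader` are hypothetical helpers, and accuracy stands in for whichever task metric (EM/F1, ROUGE) actually applies.

```python
import copy
import torch

def evaluate_accuracy(model, dev_loader, device="cpu"):
    """Dev-set accuracy for a classification head; swap in EM/F1 or ROUGE per task."""
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for input_ids, attention_mask, labels in dev_loader:
            logits = model(input_ids.to(device), attention_mask.to(device))
            correct += (logits.argmax(dim=-1) == labels.to(device)).sum().item()
            total += labels.size(0)
    return correct / max(total, 1)

best_score, best_state = float("-inf"), None
patience, bad_epochs = 2, 0

for epoch in range(num_epochs):
    train_one_epoch(model, train_loader, optimizer, scheduler)  # hypothetical helper
    score = evaluate_accuracy(model, dev_loader)
    if score > best_score:
        best_score, bad_epochs = score, 0
        best_state = copy.deepcopy(model.state_dict())  # keep the best checkpoint
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            break  # early stop on the dev set

model.load_state_dict(best_state)  # restore the best checkpoint, not the last
```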
Deployment Considerations
- Distill or quantize for latency if accuracy holds (quantization sketch below).
- Cache the tokenizer and truncation rules to avoid drift between training and serving.
- Log prompts/inputs to debug closed-book vs. context-grounded behavior.
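A rough sketch of post-training dynamic quantization plus saving the tokenizer alongside the model so training and serving share the same preprocessing; the artifact path, MAX_LENGTH, and padding choice are assumptions, and quantized accuracy still has to be re-checked on the task metrics.

```python
import torch
from transformers import AutoTokenizer

# Dynamic int8 quantization of Linear layers (CPU inference); verify metrics before shipping.
model.eval()
quantized_model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

# Save the exact tokenizer used at training time and reload it at serving time.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
tokenizer.save_pretrained("artifacts/day50-tokenizer")  # hypothetical path
serve_tokenizer = AutoTokenizer.from_pretrained("artifacts/day50-tokenizer")

MAX_LENGTH = 256  # must match the training-time truncation rule
batch = serve_tokenizer(
    ["example request text"],
    truncation=True,
    max_length=MAX_LENGTH,
    padding="max_length",
    return_tensors="pt",
)
with torch.no_grad():
    logits = quantized_model(batch["input_ids"], batch["attention_mask"])
```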
Practice Targets for Today
- Run (or plan) a small grid over learning rate, freezing strategy, and max length (sweep sketch below).
- Evaluate on a held-out set and record EM/F1 or ROUGE.
- Decide post-training steps: distillation, quantization, or caching.
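The grid could be organized roughly like this; `run_experiment` is a hypothetical wrapper around the training and evaluation sketches above, and the grid values are placeholders to adjust.

```python
import itertools

# Placeholder grid; keep it small enough to finish today.
learning_rates = [1e-5, 2e-5, 3e-5]
freeze_strategies = ["head_only", "unfreeze_top_2", "full_fine_tune"]
max_lengths = [128, 256]

results = []
for lr, freeze, max_len in itertools.product(learning_rates, freeze_strategies, max_lengths):
    config = {"lr": lr, "freeze": freeze, "max_length": max_len}
    score = run_experiment(config)  # hypothetical: trains, then returns dev EM/F1 or ROUGE
    results.append((score, config))

# Rank configurations by held-out score and note the top few in the log.
results.sort(key=lambda r: r[0], reverse=True)
for score, config in results[:3]:
    print(f"{score:.3f}  {config}")
```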