Day 39 - NMT & Text Summarization
Date: 2025-10-30 (Thursday)
Status: “Done”
Neural Machine Translation (NMT)
Architecture Overview
The input sentence is first converted into a numerical representation, then encoded by a six-layer encoder into a deep representation, which a six-layer decoder turns into the translation in the target language.
Encoder and Decoder Layers
Layers consist of:
- Self-attention: Helps the model focus on different parts of the input
- Feed-forward layers: Process the information
- Encoder-decoder attention layer (decoder only): Uses the deep representation from the last encoder layer
Attention Mechanism Example
Translation Task: “The woman took the empty magazine out of her gun”
Target Language: Czech
Visualization of Self-Attention
When translating “magazine”, the attention mechanism:
- Creates a strong attention link between “magazine” and “gun”
- This link helps translate “magazine” correctly as “zásobník” (gun magazine)
- Rather than as “časopis” (news magazine)
Why Attention Matters
Attention = mechanism that helps the model focus on the most important parts of the input when generating output
In other words: Attention = selective information processing instead of consuming everything at once
In NLP, attention allows the model to decide which words most strongly influence understanding another word in the sentence.
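As a minimal sketch of how this focusing works, the scaled dot-product attention below (plain NumPy, with made-up toy vectors for the “magazine” example) weights the value vectors by how well each key matches the query:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Weight the rows of V by how well each row of K matches each row of Q."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                           # query-key similarity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)   # softmax over keys
    return weights @ V, weights                               # context vectors + attention weights

# Toy example: 1 query ("magazine"), 3 keys/values ("woman", "magazine", "gun")
Q = np.array([[0.9, 0.1]])
K = np.array([[0.1, 0.9], [0.9, 0.1], [0.8, 0.2]])
V = np.eye(3, 2)
context, weights = scaled_dot_product_attention(Q, K, V)
print(weights)   # largest weights fall on the "magazine" and "gun" positions
```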
NMT Implementation Details
Model Architecture Components:
- Input tokens (source language)
- Target tokens (target language)
Step 1: Make Copies
Create two copies each of input and target tokens (needed in different places of model)
Step 2: Encoder
- One copy of input tokens → encoder
- Transform into key and value vectors
- Go through embedding layer → LSTM
Step 3: Pre-attention Decoder
- One copy of target tokens → pre-attention decoder
- Shift sequence right + add start-of-sentence token (teacher forcing)
- Go through embedding layer → LSTM
- Output becomes query vectors
Note: Encoder and pre-attention decoder can run in parallel (no dependencies)
Step 4: Prepare for Attention
- Get query, key, value vectors
- Create padding mask to identify padding tokens
- Use copy of input tokens for this step
Step 5: Attention Layer
Pass queries, keys, values, and mask to attention layer
- Outputs context vectors and mask
Step 6: Post-attention Decoder
Drop mask, pass context vectors through:
- LSTM
- Dense layer
- LogSoftmax
Step 7: Output
Model returns:
- Log probabilities
- Copy of target tokens (for loss computation)
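The seven steps above fit together roughly as in the sketch below. This is an illustrative PyTorch version, not the course’s exact implementation; the layer sizes, single-head attention, and token IDs are assumptions.

```python
import torch
import torch.nn as nn

class NMTWithAttention(nn.Module):
    """Illustrative seq2seq model following Steps 1-7 above (sizes are assumptions)."""
    def __init__(self, src_vocab, tgt_vocab, d_model=256, pad_id=0, sos_id=1):
        super().__init__()
        self.pad_id, self.sos_id = pad_id, sos_id
        # Step 2: encoder = embedding -> LSTM; its outputs act as keys and values
        self.src_emb = nn.Embedding(src_vocab, d_model, padding_idx=pad_id)
        self.encoder = nn.LSTM(d_model, d_model, batch_first=True)
        # Step 3: pre-attention decoder = embedding -> LSTM; its outputs act as queries
        self.tgt_emb = nn.Embedding(tgt_vocab, d_model, padding_idx=pad_id)
        self.pre_decoder = nn.LSTM(d_model, d_model, batch_first=True)
        # Step 5: attention over the encoder outputs
        self.attention = nn.MultiheadAttention(d_model, num_heads=1, batch_first=True)
        # Step 6: post-attention decoder = LSTM -> dense -> log-softmax
        self.post_decoder = nn.LSTM(d_model, d_model, batch_first=True)
        self.dense = nn.Linear(d_model, tgt_vocab)

    def forward(self, src_tokens, tgt_tokens):
        # Step 1: the same token tensors are reused in several places below
        # Step 2: encode the source into keys/values
        keys_values, _ = self.encoder(self.src_emb(src_tokens))
        # Step 3: teacher forcing - shift the target right and prepend start-of-sentence
        sos = torch.full_like(tgt_tokens[:, :1], self.sos_id)
        shifted = torch.cat([sos, tgt_tokens[:, :-1]], dim=1)
        queries, _ = self.pre_decoder(self.tgt_emb(shifted))
        # Step 4: padding mask marks source positions to ignore
        pad_mask = src_tokens.eq(self.pad_id)
        # Step 5: attention produces context vectors
        context, _ = self.attention(queries, keys_values, keys_values,
                                    key_padding_mask=pad_mask)
        # Step 6: post-attention decoder, dense layer, log-softmax
        hidden, _ = self.post_decoder(context)
        log_probs = torch.log_softmax(self.dense(hidden), dim=-1)
        # Step 7: return log probabilities (targets are kept alongside for the loss)
        return log_probs, tgt_tokens
```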
Text Summarization
Summarization = condensing content while preserving main ideas
Two Types:
1. Extractive Summarization
Concept: Select the most important sentences from the original text
Characteristics:
- Doesn’t rewrite text
- Preserves original wording
- Like “highlighting key sentences”
Process (Classical TextRank):
- Split into sentences
- Convert sentences to embeddings
- Calculate similarity (cosine)
- Create graph (sentences as nodes)
- Rank using TextRank
- Select top-ranked sentences
Result: Subset of original text
2. Abstractive Summarization
Concept: Rewrite main ideas in new sentences
Characteristics:
- Creates sentences that never appeared in original
- Understands content → paraphrases
- Requires strong models (seq2seq, Transformer)
Example:
Original article discusses prosecutor’s investigation process…
Generated summary:
“Prosecutor: So far no videos were used in the crash investigation.”
This sentence doesn’t exist in original but captures the main idea.
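Summaries like this can be generated with a pretrained seq2seq model. A minimal sketch with the Hugging Face transformers pipeline follows; the input text, model choice (pipeline default), and length limits are assumptions for illustration.

```python
from transformers import pipeline

# Pretrained seq2seq summarizer; the default model is downloaded on first use
summarizer = pipeline("summarization")

article = (
    "The prosecutor said the investigation into the crash is still in its early "
    "stages and that no video footage has been used so far..."
)

summary = summarizer(article, max_length=40, min_length=10, do_sample=False)
print(summary[0]["summary_text"])  # a newly generated sentence, not copied from the article
```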
| Feature | Extractive | Abstractive |
| --- | --- | --- |
| Approach | Select existing sentences | Generate new sentences |
| Creativity | Low | High |
| Complexity | Simpler | More complex |
| Accuracy | More faithful to source | May introduce errors |
| Model | TextRank, graph-based | Seq2seq, Transformer |
Step-by-step extractive summarization:
- Combine articles → full text
- Split sentences
- Convert sentences → vectors (embeddings)
- Create similarity matrix
- Build graph (sentences = nodes, edges = similarity)
- Rank nodes using TextRank algorithm
- Select top-ranked sentences → Summary
This is the classical algorithm that dominated before deep learning!
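A compact version of that pipeline is sketched below, with TF-IDF sentence vectors standing in for embeddings; it assumes scikit-learn, networkx, and nltk are installed.

```python
import networkx as nx
import nltk
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def extractive_summary(text, n_sentences=2):
    """TextRank-style extractive summary: rank sentences, return the top ones."""
    sentences = nltk.sent_tokenize(text)                  # split into sentences (needs the punkt data)
    vectors = TfidfVectorizer().fit_transform(sentences)  # sentence -> vector
    sim_matrix = cosine_similarity(vectors)               # pairwise cosine similarity
    graph = nx.from_numpy_array(sim_matrix)               # sentences = nodes, edges = similarity
    scores = nx.pagerank(graph)                           # TextRank ranking
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    keep = sorted(ranked[:n_sentences])                   # keep top sentences in original order
    return " ".join(sentences[i] for i in keep)
```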
Syntax and Semantics Review
Syntax – Sentence Structure
Syntax examines how words combine to form grammatically correct sentences.
Includes:
- Word order: English uses S–V–O (Subject–Verb–Object)
- Phrase structure: NP (Noun Phrase), VP (Verb Phrase), PP (Prepositional Phrase)
- Dependency relations: How words relate to each other
NLP Relevance:
- POS tagging
- Parsing
- Named Entity Recognition
- Machine translation
- Question answering
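A quick way to see word order, phrase structure, and dependency relations in practice is a parser such as spaCy; the sketch below assumes the en_core_web_sm model is installed.

```python
import spacy

nlp = spacy.load("en_core_web_sm")   # small English pipeline (tagger + parser)
doc = nlp("The woman took the empty magazine out of her gun")

for token in doc:
    # POS tag, dependency relation, and the head word each token attaches to
    print(f"{token.text:<10} {token.pos_:<6} {token.dep_:<10} -> {token.head.text}")

# Noun phrases (NP chunks) found by the parser
print([chunk.text for chunk in doc.noun_chunks])
```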
Semantics – Meaning of Words and Sentences
Semantics focuses on meaning independent of external context.
Includes:
- Lexical semantics: Word meaning
- Compositional semantics: Sentence meaning
- Synonymy / antonymy: Similar/opposite meanings
- Hypernymy / hyponymy: General/specific relationships
NLP Relevance:
- Word embeddings
- Similarity measures
- Semantic search
- Text classification
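Lexical semantics shows up directly in word embeddings: words with related meanings sit close together under cosine similarity. A toy sketch with made-up 3-dimensional vectors (a real system would load pretrained embeddings such as GloVe or word2vec):

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity: 1.0 for identical direction, near 0 for unrelated vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy embeddings (illustrative values only)
embeddings = {
    "gun":      np.array([0.9, 0.1, 0.0]),
    "zásobník": np.array([0.8, 0.2, 0.1]),   # gun magazine
    "časopis":  np.array([0.1, 0.9, 0.2]),   # news magazine
}

print(cosine(embeddings["gun"], embeddings["zásobník"]))  # high: related senses
print(cosine(embeddings["gun"], embeddings["časopis"]))   # low: different domain
```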
Pragmatics – Intended Meaning in Context
Pragmatics studies meaning that arises from context, speaker intention, and real-world knowledge.
Covers:
- Implicature: Hidden meaning
- Deixis: Context-dependent references (this/that/here/you)
- Speech acts: Promises, requests, apologies
- Politeness, formality, sarcasm: Tone and intention
NLP Relevance:
- Dialogue systems
- Chatbots
- Sentiment and irony detection
- Contextual language models (BERT, GPT)