Day 39 - NMT & Text Summarization

Date: 2025-10-30 (Thursday)
Status: “Done”


Neural Machine Translation (NMT)

Architecture Overview

The input sentence is first converted into a numerical representation. A six-layer encoder maps it to a deep representation, which a six-layer decoder then turns into the translation in the target language.

Encoder and Decoder Layers

Each layer consists of:

  • Self-attention: Helps the model focus on different parts of the input
  • Feed-forward layers: Process the information
  • Encoder-decoder attention layer (decoder only): Uses the deep representation from the last encoder layer

Attention Mechanism Example

Translation Task: “The woman took the empty magazine out of her gun”
Target Language: Czech

Visualization of Self-Attention

When translating “magazine”, the attention mechanism:

  • Creates a strong attention link between “magazine” and “gun”
  • This helps translate “magazine” correctly as “zásobník” (gun magazine) instead of “časopis” (news magazine)

Why Attention Matters

Attention = mechanism that helps the model focus on the most important parts of the input when generating the output

In other words: Attention = selective information processing instead of consuming everything at once

In NLP, attention allows the model to decide which words most strongly influence understanding another word in the sentence.
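
As a concrete illustration, here is a minimal NumPy sketch of scaled dot-product attention, the standard formulation used in Transformers; the matrices and sizes below are made up for the example.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d)) V -- each query receives a weighted mix of the values."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                              # relevance of every key to every query
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)    # softmax over the keys
    return weights @ V, weights

# 3 "words" represented by 4-dimensional toy vectors
Q = K = V = np.random.rand(3, 4)
context, attn = scaled_dot_product_attention(Q, K, V)
print(attn.round(2))   # each row sums to 1: how much one word attends to the others
```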


NMT Implementation Details

Model Architecture Components:

Inputs:

  1. Input tokens (source language)
  2. Target tokens (target language)

Step 1: Make Copies

Create two copies each of the input and target tokens (they are needed in different places in the model)

Step 2: Encoder

  • One copy of input tokens → encoder
  • Goes through embedding layer → LSTM
  • The LSTM outputs become the key and value vectors

Step 3: Pre-attention Decoder

  • One copy of target tokens → pre-attention decoder
  • Shift sequence right + add start-of-sentence token (teacher forcing)
  • Go through embedding layer → LSTM
  • Output becomes query vectors

Note: Encoder and pre-attention decoder can run in parallel (no dependencies)
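
A tiny sketch of the shift-right step under teacher forcing, with an assumed start-of-sentence id:

```python
SOS_ID = 1                               # assumed id of the start-of-sentence token
target = [4, 7, 9, 12]                   # gold target token ids
decoder_input = [SOS_ID] + target[:-1]   # [1, 4, 7, 9]
# At position t the pre-attention decoder sees the gold token from position t-1
# and the model is trained to predict target[t].
```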

Step 4: Prepare for Attention

  • Get query, key, value vectors
  • Create padding mask to identify padding tokens
  • Use copy of input tokens for this step
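
A minimal sketch of the padding mask, assuming the padding token has id 0:

```python
import numpy as np

PAD_ID = 0                                             # assumed padding token id
input_tokens = np.array([[5, 8, 3, PAD_ID, PAD_ID]])   # one padded source sentence
padding_mask = input_tokens != PAD_ID                  # [[ True  True  True False False]]
# Attention weights at the False positions are later forced to ~0 (masked out).
```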

Step 5: Attention Layer

Pass queries, keys, values, and mask to attention layer

  • Outputs context vectors and mask

Step 6: Post-attention Decoder

Drop the mask and pass the context vectors through:

  • LSTM
  • Dense layer
  • LogSoftmax

Step 7: Output

Model returns:

  • Log probabilities
  • Copy of target tokens (for loss computation)
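
Putting Steps 1–7 together, here is a compact PyTorch sketch of the architecture. The layer sizes, single-layer LSTMs, and token ids are assumptions for illustration, not the exact course implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

PAD_ID, SOS_ID = 0, 1   # assumed special-token ids

class NMTAttn(nn.Module):
    def __init__(self, vocab_src, vocab_tgt, d_model=256):
        super().__init__()
        # Step 2: encoder = embedding -> LSTM; its outputs act as keys/values
        self.src_emb = nn.Embedding(vocab_src, d_model, padding_idx=PAD_ID)
        self.encoder = nn.LSTM(d_model, d_model, batch_first=True)
        # Step 3: pre-attention decoder = embedding -> LSTM; its outputs act as queries
        self.tgt_emb = nn.Embedding(vocab_tgt, d_model, padding_idx=PAD_ID)
        self.pre_decoder = nn.LSTM(d_model, d_model, batch_first=True)
        # Step 6: post-attention decoder = LSTM -> Dense -> LogSoftmax
        self.post_decoder = nn.LSTM(d_model, d_model, batch_first=True)
        self.out = nn.Linear(d_model, vocab_tgt)

    def forward(self, src_tokens, tgt_tokens):
        # Step 1: the "copies" are implicit -- src_tokens is reused for the mask,
        # tgt_tokens is returned alongside the log probabilities.
        # Step 2: encode the source into keys/values
        keys_values, _ = self.encoder(self.src_emb(src_tokens))
        # Step 3: teacher forcing -- shift the target right and prepend <sos>
        sos = torch.full_like(tgt_tokens[:, :1], SOS_ID)
        shifted = torch.cat([sos, tgt_tokens[:, :-1]], dim=1)
        queries, _ = self.pre_decoder(self.tgt_emb(shifted))
        # Step 4: padding mask built from the copy of the input tokens
        pad_mask = (src_tokens != PAD_ID).unsqueeze(1)          # (batch, 1, src_len)
        # Step 5: scaled dot-product attention -> context vectors
        scores = queries @ keys_values.transpose(1, 2) / keys_values.size(-1) ** 0.5
        scores = scores.masked_fill(~pad_mask, float("-inf"))
        context = torch.softmax(scores, dim=-1) @ keys_values
        # Step 6: post-attention decoder (the mask is dropped here)
        hidden, _ = self.post_decoder(context)
        log_probs = F.log_softmax(self.out(hidden), dim=-1)
        # Step 7: return log probabilities and the target copy for the loss
        return log_probs, tgt_tokens
```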

Text Summarization

Summarization = condensing content while preserving main ideas

Two Types:

1. Extractive Summarization

Concept: Select the most important sentences from original text

Characteristics:

  • Doesn’t rewrite text
  • Preserves original wording
  • Like “highlighting key sentences”

Process (Classical TextRank):

  1. Split into sentences
  2. Convert sentences to embeddings
  3. Calculate similarity (cosine)
  4. Create graph (sentences as nodes)
  5. Rank using TextRank
  6. Select top-ranked sentences

Result: Subset of original text


2. Abstractive Summarization

Concept: Rewrite main ideas in new sentences

Characteristics:

  • Creates sentences that never appeared in original
  • Understands content → paraphrases
  • Requires strong models (seq2seq, Transformer)

Example: Original article discusses prosecutor’s investigation process…

Generated summary:

“Prosecutor: So far no videos were used in the crash investigation.”

This sentence doesn’t appear anywhere in the original, but it captures the main idea.
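
As a hedged sketch, a pretrained seq2seq model loaded through the Hugging Face transformers summarization pipeline produces this kind of abstractive summary; the length limits below are arbitrary choices and the article text is a placeholder.

```python
from transformers import pipeline

# Downloads a pretrained seq2seq summarization model on first use.
summarizer = pipeline("summarization")

article = "..."  # full article text goes here
summary = summarizer(article, max_length=40, min_length=10, do_sample=False)
print(summary[0]["summary_text"])
```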


Extractive vs Abstractive Summary

Feature       Extractive                  Abstractive
Approach      Select existing sentences   Generate new sentences
Creativity    Low                         High
Complexity    Simpler                     More complex
Accuracy      More faithful to source     May introduce errors
Model         TextRank, graph-based       Seq2seq, Transformer

TextRank Pipeline

Step-by-step extractive summarization:

  1. Combine articles → full text
  2. Split sentences
  3. Convert sentences → vectors (embeddings)
  4. Create similarity matrix
  5. Build graph (sentences = nodes, edges = similarity)
  6. Rank nodes using TextRank algorithm
  7. Select top-ranked sentences → Summary

This is the classical algorithm that dominated before deep learning!
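
A minimal sketch of this pipeline, using TF-IDF vectors as a simple stand-in for sentence embeddings and networkx for the PageRank step at the heart of TextRank; the example sentences are made up.

```python
import networkx as nx
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def extractive_summary(sentences, top_n=2):
    # 3. sentences -> vectors (TF-IDF here; sentence embeddings in the lecture version)
    vectors = TfidfVectorizer().fit_transform(sentences)
    # 4. cosine similarity matrix
    sim = cosine_similarity(vectors)
    # 5. graph: sentences = nodes, edges weighted by similarity
    graph = nx.from_numpy_array(sim)
    # 6. rank the nodes with PageRank (the core of TextRank)
    scores = nx.pagerank(graph)
    # 7. keep the top-ranked sentences, in their original order
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:top_n]
    return [sentences[i] for i in sorted(top)]

sentences = [
    "The prosecutor said no videos were used in the crash investigation.",
    "Officials met reporters on Tuesday afternoon.",
    "Investigators are still collecting evidence from the crash site.",
]
print(extractive_summary(sentences, top_n=2))
```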


Syntax and Semantics Review

Syntax – Sentence Structure

Syntax examines how words combine to form grammatically correct sentences.

Includes:

  • Word order: English uses S–V–O (Subject–Verb–Object)
  • Phrase structure: NP (Noun Phrase), VP (Verb Phrase), PP (Prepositional Phrase)
  • Dependency relations: How words relate to each other

NLP Relevance:

  • POS tagging
  • Parsing
  • Named Entity Recognition
  • Machine translation
  • Question answering
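
For example, a quick dependency parse with spaCy (assuming the small English model has been installed with `python -m spacy download en_core_web_sm`) makes word order, phrase structure, and dependency relations explicit:

```python
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("The woman took the empty magazine out of her gun")
for token in doc:
    # token.pos_ = part-of-speech tag, token.dep_ = dependency relation,
    # token.head = the word this token attaches to in the parse tree
    print(f"{token.text:10} {token.pos_:6} {token.dep_:10} -> {token.head.text}")
```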

Semantics – Meaning of Words and Sentences

Semantics focuses on meaning independent of external context.

Includes:

  • Lexical semantics: Word meaning
  • Compositional semantics: Sentence meaning
  • Synonymy / antonymy: Similar/opposite meanings
  • Hypernymy / hyponymy: General/specific relationships

NLP Relevance:

  • Word embeddings
  • Similarity measures
  • Semantic search
  • Text classification
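
A toy sketch of how embeddings plus a similarity measure capture lexical semantics; the 3-dimensional vectors below are hand-made purely for illustration (real embeddings are learned and much larger):

```python
import numpy as np

vectors = {
    "car":        np.array([0.90, 0.10, 0.20]),
    "automobile": np.array([0.85, 0.15, 0.25]),
    "banana":     np.array([0.10, 0.90, 0.30]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(vectors["car"], vectors["automobile"]))  # high -> near-synonyms
print(cosine(vectors["car"], vectors["banana"]))      # lower -> unrelated words
```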

Pragmatics – Intended Meaning in Context

Pragmatics studies meaning that arises from context, speaker intention, and real-world knowledge.

Covers:

  • Implicature: Hidden meaning
  • Deixis: Context-dependent references (this/that/here/you)
  • Speech acts: Promises, requests, apologies
  • Politeness, formality, sarcasm: Tone and intention

NLP Relevance:

  • Dialogue systems
  • Chatbots
  • Sentiment and irony detection
  • Contextual language models (BERT, GPT)