Hernando Abella's Website

Understanding these building blocks helps developers move beyond simply calling AI APIs and see how AI models actually process and generate language.

When you send a message to an AI model, it doesn't understand words the way humans do. Instead, it converts text into numerical representations, analyzes relationships between those numbers, and predicts the most likely next piece of the response, one token at a time.

Why These Concepts Matter

The AI processing pipeline follows this pattern:

📝 Text Input: user prompt

↓

🔤 Tokens: text split into pieces

↓

📊 Embeddings: tokens turned into vectors

↓

🎯 Attention: relationships weighed

↓

🧠 AI Processing: neural layers

↓

✨ Generated Output: response

Each step plays a critical role in helping the model understand and generate language.

What Are Tokens?

Tokens are the smallest pieces of text that an AI model processes.

A token can be a complete word, part of a word, a punctuation mark, a number, or a symbol.

Example Tokenization

Input: "Artificial Intelligence is amazing."
Tokens: ["Artificial", " Intelligence", " is", " amazing", "."]
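The split shown above can be reproduced with a small sketch. This is a toy tokenizer written for illustration, not how production models tokenize: real tokenizers typically use learned subword vocabularies (for example byte-pair encoding), so their output can differ. The function name `toy_tokenize` is hypothetical.

```python
import re

def toy_tokenize(text):
    """Split text into word and punctuation tokens.

    Each word keeps its leading space, mirroring the example above.
    This is a toy rule, not a learned subword tokenizer.
    """
    # " ?\w+"   -> a word, optionally preceded by one space
    # "[^\w\s]" -> a single punctuation mark or symbol
    return re.findall(r" ?\w+|[^\w\s]", text)

tokens = toy_tokenize("Artificial Intelligence is amazing.")
print(tokens)
print(len(tokens), "tokens")
```

Counting the resulting list is also the simplest way to see why longer text costs more tokens.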

Token Limits

Content	Approx. Tokens
One sentence	10–30
One paragraph	100–300
One page	500–1000
Large article	Several thousand

What Are Embeddings?

Once text is converted into tokens, the model transforms those tokens into numerical vectors called embeddings — a mathematical representation of meaning.

Word → Vector

"cat" → [0.12, -0.44, 0.91, 0.03, -0.67, 0.42, ...]

"kitten" → [0.10, -0.41, 0.88, 0.05, -0.64, 0.45, ...]
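One common way to measure how close two embeddings are is cosine similarity. The sketch below applies it to the six values shown for "cat" and "kitten"; the real vectors are longer (the "..." is not filled in here), and the `car` vector is a made-up value added only for contrast.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# First six dimensions from the example above (truncated vectors).
cat = [0.12, -0.44, 0.91, 0.03, -0.67, 0.42]
kitten = [0.10, -0.41, 0.88, 0.05, -0.64, 0.45]

# Hypothetical vector for an unrelated word, invented for this illustration.
car = [-0.80, 0.35, 0.05, 0.60, 0.20, -0.10]

print("cat vs kitten:", round(cosine_similarity(cat, kitten), 3))
print("cat vs car:   ", round(cosine_similarity(cat, car), 3))
```

A score near 1 means the vectors point in nearly the same direction; a score near 0 or below suggests the words are unrelated in this space.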

Similarity Map (semantic space): 🐱 cat, 🐕 dog, 🚗 car, 🏠 house

Similar words sit closer together.

Why Embeddings Are Powerful

Query vs Document matching:

Query: "How do I learn Python?"

Document: "Best ways to study Python programming."

→ Embeddings reveal the meanings are closely related, even with different wording.

Embeddings in Real Applications: RAG

User Question

↓

Generate Embedding

↓

Search Vector DB

↓

Find Similar Docs

↓

Send to AI

↓

Generate Response

This flow is the foundation of Retrieval-Augmented Generation (RAG).
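The retrieval steps in this flow can be sketched with an in-memory stand-in for a vector database. Everything here is illustrative: the three-dimensional vectors are hand-written placeholders for what an embedding model would return, and `search` and `build_prompt` are hypothetical helper names, not part of any library.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Toy "vector DB": document text -> hand-written placeholder embedding.
documents = {
    "Best ways to study Python programming.": [0.9, 0.1, 0.0],
    "How to bake sourdough bread.": [0.0, 0.2, 0.9],
    "Beginner guide to Python syntax.": [0.8, 0.3, 0.1],
}

def search(query_vector, store, top_k=2):
    """Return the top_k document texts most similar to the query vector."""
    scored = [(cosine_similarity(query_vector, vec), text) for text, vec in store.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [text for _, text in scored[:top_k]]

def build_prompt(question, context_docs):
    """Combine retrieved documents and the question into one prompt string."""
    context = "\n".join(f"- {doc}" for doc in context_docs)
    return f"Answer using the context below.\n\nContext:\n{context}\n\nQuestion: {question}"

question = "How do I learn Python?"
query_vector = [0.85, 0.2, 0.05]  # placeholder for the question's embedding

top_docs = search(query_vector, documents)
prompt = build_prompt(question, top_docs)
print(prompt)  # this string is what would be sent to the AI model
```

In a real system the placeholder vectors would come from an embedding model and the dictionary would be replaced by a vector database, but the shape of the flow stays the same.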

What Is Attention?

Attention is the mechanism that allows AI models to determine which words matter most when processing language. Before attention, sequence models such as recurrent networks struggled to keep track of context across long sentences.

"The cat climbed the tree because it was scared."

Tokens: The | cat | climbed | the | tree | because | it | was | scared | .

↓ Attention on "it" ↓

cat: ████████

tree: ██

"it" most likely refers to "cat", so "cat" receives the higher attention score.

Self-Attention

Modern transformer models use self-attention, allowing every token to examine every other token:

Token A ↔ Token B

Token A ↔ Token C

Token B ↔ Token C

→ Creates a rich network of relationships across the entire sentence.
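A minimal sketch of the idea, assuming scaled dot-product scoring: each token's vector is compared with every other token's vector, the scores are turned into weights with softmax, and the weights are used to blend the vectors. This is deliberately simplified. Real transformers first apply learned query, key, and value projections and run several attention heads in parallel, and the two-dimensional vectors below are invented for illustration.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors):
    """Simplified self-attention: the input vectors act as queries, keys, and values."""
    dim = len(vectors[0])
    outputs, all_weights = [], []
    for query in vectors:
        # Score this token against every token (including itself).
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim) for key in vectors]
        weights = softmax(scores)
        # Blend all token vectors according to the attention weights.
        mixed = [sum(w * value[i] for w, value in zip(weights, vectors)) for i in range(dim)]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights

tokens = ["cat", "tree", "it"]
vectors = [[1.0, 0.2], [0.1, 1.0], [0.9, 0.3]]  # hypothetical embeddings

outputs, weights = self_attention(vectors)
for token, row in zip(tokens, weights):
    print(token, [round(w, 2) for w in row])
```

With these made-up vectors, the row for "it" puts more weight on "cat" than on "tree", matching the example sentence above.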

Why Transformers Were Revolutionary

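📄 The 2017 paper "Attention Is All You Need" introduced the transformer architecture, built around attention rather than recurrence.

Instead of processing words one after another, transformers analyze the relationships between all words in the input simultaneously, which makes them easier to parallelize and better at capturing long-range context.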

Bringing It All Together

When you send a prompt to an AI model:

User Prompt

↓

Tokenization

↓

Tokens

↓

Embeddings

↓

Attention Layers

↓

Neural Processing

↓

Predicted Next Token

↓

Generated Response
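The last two steps can be sketched as follows, assuming greedy decoding: the model produces one raw score (logit) per vocabulary entry, softmax converts the scores to probabilities, and the most probable token is chosen. The three-word vocabulary and the logit values are made up for illustration; real models score tens of thousands of tokens and often sample instead of always taking the top one.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical vocabulary and scores for the prompt "Artificial Intelligence is".
vocabulary = [" amazing", " boring", "."]
logits = [2.5, 0.3, 1.0]

probabilities = softmax(logits)

# Greedy decoding: pick the single most probable token.
best_index = probabilities.index(max(probabilities))
next_token = vocabulary[best_index]

for token, p in zip(vocabulary, probabilities):
    print(repr(token), round(p, 3))
print("next token:", repr(next_token))
```

Generating a full response repeats this step: the chosen token is appended to the input and the model predicts again until it emits a stop signal or reaches its length limit.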

A Real-World Analogy

📖 Tokens: the individual words on the page

🧠 Embeddings: your understanding of each word's meaning

🎯 Attention: focusing on the important words to understand the story

Key Takeaways

🔤 Tokens break text into manageable pieces that AI models can process.

📊 Embeddings convert tokens into numerical representations of meaning.

🎯 Attention helps the model determine which words are most relevant to one another.

🏗️ Transformers combine all three concepts to understand and generate language.

⚙️ Real-world tech like semantic search, RAG systems, AI assistants, and chatbots all rely heavily on embeddings and attention.

These three concepts form the foundation of modern AI. Once you understand them, topics such as transformers, vector databases, prompt engineering, and large language models become much easier to learn and apply in real-world projects.


Tokens, Embeddings, and Attention: The Building Blocks of AI
