Hernando Abella
Deep Dive · AI Fundamentals · LLMs · Architecture

Tokens, Embeddings, and Attention: The Building Blocks of AI

Modern AI systems can write articles, answer questions, and generate code. Behind these capabilities are three fundamental concepts: tokens, embeddings, and attention.

📖 12 min read · 🧑‍💻 Hernando Abella · 🧠 Foundational
Key Technologies: Python, Transformers, PyTorch, Hugging Face

Knowing these building blocks helps developers move beyond simply calling AI APIs and see how AI models actually process and generate language.

When you send a message to an AI model, it doesn't understand words the way humans do. Instead, it converts text into numerical representations, analyzes relationships between those numbers, and predicts the most likely next token, one token at a time, until the response is complete.


Why These Concepts Matter

The AI processing pipeline follows this pattern:

📝 Text Input: User prompt
🔤 Tokens: Text → pieces
📊 Embeddings: Tokens → vectors
🎯 Attention: Weigh relationships
🧠 AI Processing: Neural layers
Generated Output: Response

Each step plays a critical role in helping the model understand and generate language.


What Are Tokens?

Tokens are the smallest pieces of text that an AI model processes.

A token can be a complete word, part of a word, a punctuation mark, a number, or a symbol.

Example Tokenization
Input: "Artificial Intelligence is amazing."
Tokens: ["Artificial", " Intelligence", " is", " amazing", "."]
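
The split above can be approximated in a few lines of Python. This is only a minimal sketch: production tokenizers rely on learned subword vocabularies (byte-pair encoding, for instance), and the `toy_tokenize` function, its regular expression, and the ID assignment are illustrative assumptions rather than any model's actual tokenizer.

```python
import re

def toy_tokenize(text):
    """Split text into word and punctuation pieces, keeping leading spaces.

    Assumption: a simple regex stands in for a learned subword tokenizer.
    """
    return re.findall(r" ?\w+|[^\w\s]", text)

def build_vocab(tokens):
    """Assign each distinct token an integer ID in order of first appearance."""
    vocab = {}
    for tok in tokens:
        vocab.setdefault(tok, len(vocab))
    return vocab

tokens = toy_tokenize("Artificial Intelligence is amazing.")
vocab = build_vocab(tokens)
token_ids = [vocab[tok] for tok in tokens]

print(tokens)     # ['Artificial', ' Intelligence', ' is', ' amazing', '.']
print(token_ids)  # [0, 1, 2, 3, 4]
```

The integer IDs, not the strings, are what a model receives as input.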

Token Limits

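Every model can process only a limited number of tokens at once, so it helps to know roughly how many tokens a piece of text uses:

Content          Approx. Tokens
One sentence     10–30
One paragraph    100–300
One page         500–1000
Large article    Several thousand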
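
To budget a prompt against a token limit without running a tokenizer, a rough estimate is often enough. The sketch below assumes the common rule of thumb of about four characters per token for English text. The constant, the function names, and the `context_limit` of 8192 are all hypothetical, and an exact count requires the tokenizer of the specific model.

```python
import math

CHARS_PER_TOKEN = 4  # assumption: rough rule of thumb for English text

def estimate_tokens(text):
    """Estimate the token count of text from its character length."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)

def fits_in_context(text, context_limit, reserved_for_output=0):
    """Check whether the estimated prompt size leaves room for the reply."""
    return estimate_tokens(text) + reserved_for_output <= context_limit

sentence = "Tokens are the smallest pieces of text that an AI model processes."
print(estimate_tokens(sentence))
print(fits_in_context(sentence, context_limit=8192, reserved_for_output=512))
```

Reserving part of the limit for the output matters because the prompt and the generated response typically share the same context window.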

What Are Embeddings?

Once text is converted into tokens, the model maps each token to a numerical vector called an embedding, a mathematical representation of its meaning.

Word → Vector

"cat" → [0.12, -0.44, 0.91, 0.03, -0.67, 0.42, ...]
"kitten" → [0.10, -0.41, 0.88, 0.05, -0.64, 0.45, ...]

Similarity Map

[Figure: a semantic space plotting 🐱 cat, 🐕 dog, 🚗 car, and 🏠 house. Similar words sit closer together.]
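
"Closer together" is usually measured with cosine similarity. The sketch below applies it to the first six components of the "cat" and "kitten" example vectors shown earlier. The `car` vector is a hypothetical value invented for contrast, and real embeddings have hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: near 1.0 means similar direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# First six components of the example vectors; real embeddings are far longer.
cat = [0.12, -0.44, 0.91, 0.03, -0.67, 0.42]
kitten = [0.10, -0.41, 0.88, 0.05, -0.64, 0.45]
# Hypothetical vector for an unrelated word, invented for this sketch.
car = [-0.80, 0.35, 0.02, 0.71, 0.28, -0.15]

print(round(cosine_similarity(cat, kitten), 3))  # close to 1: related words
print(round(cosine_similarity(cat, car), 3))     # much lower: unrelated words
```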

Why Embeddings Are Powerful

Query vs Document matching:

Query: "How do I learn Python?"
Document: "Best ways to study Python programming."
→ Embeddings reveal the meanings are closely related, even with different wording.
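
To see why this matters, compare it with plain keyword matching. The sketch below measures word overlap between the same query and document. It is a baseline for contrast, not an embedding method, and the helper names are illustrative.

```python
import re

def word_set(text):
    """Lowercase the text and return its set of distinct words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def jaccard_overlap(a, b):
    """Share of distinct words that two texts have in common (0.0 to 1.0)."""
    words_a, words_b = word_set(a), word_set(b)
    union = words_a | words_b
    if not union:
        return 0.0
    return len(words_a & words_b) / len(union)

query = "How do I learn Python?"
document = "Best ways to study Python programming."

print(word_set(query) & word_set(document))  # {'python'}
print(jaccard_overlap(query, document))      # 0.1
```

Only one word in ten is shared, so a keyword matcher would rank this document poorly even though it answers the question. Embedding similarity is meant to close that gap.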

Embeddings in Real Applications: RAG

1. User Question
2. Generate Embedding
3. Search Vector DB
4. Find Similar Docs
5. Send to AI
6. Generate Response

This flow is the foundation of Retrieval-Augmented Generation (RAG).
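
The retrieval half of this flow (steps 1 to 5) can be sketched with an in-memory dictionary standing in for the vector database. Everything here is an assumption for illustration: the three-dimensional vectors are hand-written rather than produced by an embedding model, and `retrieve` and `build_prompt` are hypothetical helpers, not the API of any particular library.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical precomputed embeddings; a real system would call an embedding model.
DOCUMENTS = {
    "Best ways to study Python programming.": [0.9, 0.1, 0.0],
    "How to change a car tire.": [0.0, 0.2, 0.9],
    "A beginner's guide to baking bread.": [0.1, 0.9, 0.1],
}

def retrieve(query_vector, store, top_k=1):
    """Steps 3-4: rank stored documents by similarity to the query vector."""
    ranked = sorted(
        store,
        key=lambda doc: cosine_similarity(query_vector, store[doc]),
        reverse=True,
    )
    return ranked[:top_k]

def build_prompt(question, context_docs):
    """Step 5: combine the retrieved text with the question for the model."""
    context = "\n".join(f"- {doc}" for doc in context_docs)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

question = "How do I learn Python?"   # step 1
question_vector = [0.8, 0.2, 0.1]     # step 2: hypothetical embedding of the question

top_docs = retrieve(question_vector, DOCUMENTS, top_k=1)
prompt = build_prompt(question, top_docs)
print(prompt)
```

Step 6 would pass `prompt` to a language model. That call is omitted because it depends on the provider.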


What Is Attention?

Attention is the mechanism that allows AI models to determine which words matter most when processing language. Before attention, sequence models such as recurrent neural networks struggled to carry information across long sentences.

"The cat climbed the tree because it was scared."

The | cat | climbed | the | tree | because | it | was | scared | .
↓ Attention on "it" ↓
cat:  ████████
tree: ██

"it" most likely refers to "cat", so "cat" receives the higher attention score.
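
Attention scores like these are typically produced by a softmax, which turns raw relevance scores into positive weights that sum to 1. In the sketch below, the raw scores for "cat" and "tree" are hypothetical numbers chosen for illustration. In a trained model they come from learned parameters.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw relevance scores of two candidate words for the token "it".
candidates = ["cat", "tree"]
raw_scores = [2.0, 0.6]

weights = softmax(raw_scores)
for word, weight in zip(candidates, weights):
    # Draw one block per 10% of attention weight.
    print(f"{word}: {'█' * round(weight * 10)} {weight:.2f}")
```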

Self-Attention

Modern transformer models use self-attention, allowing every token to examine every other token:

Token A ↔ Token B
Token A ↔ Token C
Token B ↔ Token C

→ Creates a rich network of relationships across the entire sentence.
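
A minimal sketch of scaled dot-product self-attention in plain Python follows. It makes one deliberate simplification, labeled in the code: the queries, keys, and values are the input vectors themselves, whereas real transformers first multiply the inputs by learned projection matrices. The two-dimensional embeddings for tokens A, B, and C are hypothetical.

```python
import math

def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors):
    """Scaled dot-product self-attention with Q = K = V = the input vectors.

    Simplification: no learned projections, a single head, no masking.
    """
    dim = len(vectors[0])
    scale = math.sqrt(dim)
    all_weights, outputs = [], []
    for query in vectors:
        # Score this token against every token, itself included.
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in vectors]
        weights = softmax(scores)
        # The output is a weighted average of all token vectors.
        mixed = [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]
        all_weights.append(weights)
        outputs.append(mixed)
    return all_weights, outputs

# Hypothetical 2-dimensional embeddings for three tokens.
tokens = ["A", "B", "C"]
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]

weights, outputs = self_attention(embeddings)
for name, row in zip(tokens, weights):
    print(name, [round(w, 2) for w in row])
```

Each printed row shows how strongly one token attends to A, B, and C. Because A and B point in similar directions, they attend to each other more than to C.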

Why Transformers Were Revolutionary

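📄 The transformer architecture was introduced in the 2017 paper "Attention Is All You Need".

Its central claim is in the title: attention alone, without recurrence, is enough to model language.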

Instead of processing words sequentially, transformers analyze relationships between all words simultaneously.


Bringing It All Together

When you send a prompt to an AI model:

1. User Prompt
2. Tokenization
3. Tokens
4. Embeddings
5. Attention Layers
6. Neural Processing
7. Predicted Next Token
8. Generated Response
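
The steps above can be traced end to end in a toy model. This sketch only shows the data flow: the vocabulary is hypothetical, the weights are random and untrained, and a single attention step stands in for the many layers of a real transformer, so the predicted token carries no meaning.

```python
import math
import random
import re

random.seed(0)  # fixed seed so the untrained toy weights are reproducible

VOCAB = ["The", " cat", " sat", " on", " the", " mat", "."]  # hypothetical vocabulary
TOKEN_TO_ID = {tok: i for i, tok in enumerate(VOCAB)}
DIM = 4

# Untrained random parameters: one embedding and one output row per token.
EMBEDDINGS = [[random.uniform(-1, 1) for _ in range(DIM)] for _ in VOCAB]
OUTPUT_WEIGHTS = [[random.uniform(-1, 1) for _ in range(DIM)] for _ in VOCAB]

def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def tokenize(text):
    return re.findall(r" ?\w+|[^\w\s]", text)

def attend(vectors):
    """Attention for the last position: mix all vectors by similarity to it."""
    query = vectors[-1]
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(DIM) for key in vectors]
    weights = softmax(scores)
    return [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(DIM)]

def predict_next_token(prompt):
    tokens = tokenize(prompt)                    # steps 2-3: tokenization, tokens
    ids = [TOKEN_TO_ID[tok] for tok in tokens]
    vectors = [EMBEDDINGS[i] for i in ids]       # step 4: embeddings
    context = attend(vectors)                    # step 5: attention
    logits = [sum(w * c for w, c in zip(row, context)) for row in OUTPUT_WEIGHTS]  # step 6
    probs = softmax(logits)
    best = max(range(len(VOCAB)), key=lambda i: probs[i])  # step 7: greedy choice
    return VOCAB[best], probs

next_token, probs = predict_next_token("The cat sat on the")
print(repr(next_token))
```

Step 8 repeats this loop: the predicted token is appended to the prompt and the model runs again until the response is complete.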

A Real-World Analogy

Think of reading a book:

📖 Tokens: Individual words on the page
🧠 Embeddings: Your understanding of each word's meaning
🎯 Attention: Focusing on important words to understand the story

Key Takeaways

🔤 Tokens break text into manageable pieces that AI models can process.

📊 Embeddings convert tokens into numerical representations of meaning.

🎯 Attention helps the model determine which words are most relevant to one another.

🏗️ Transformers combine all three concepts to understand and generate language.

⚙️ Real-world technologies such as semantic search, RAG systems, AI assistants, and chatbots all rely heavily on embeddings and attention.

These three concepts form the foundation of modern AI. Once you understand them, topics such as transformers, vector databases, prompt engineering, and large language models become much easier to learn and apply in real-world projects.

