How DeepSeek R1 Was Trained: A Simple Guide
1/22/2025 · 1 min read
DeepSeek AI has created a new AI model called DeepSeek R1 that's really good at solving complex problems. Let's break down how they built it in simple terms.
What's New About Their Approach?
DeepSeek trained their AI with a reinforcement learning method called Group Relative Policy Optimization (GRPO), which they first introduced in their earlier DeepSeekMath work.
Think of GRPO like a teacher who:
Gives the same question to multiple students
Looks at all their answers
Compares them to find out which approaches worked best
Uses this information to help everyone improve
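This is different from older methods such as PPO, which train a separate "critic" model to judge each answer. GRPO instead scores each answer against the average of its group, so it is simpler and uses less computer memory while still being effective.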
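To make the teacher analogy concrete, here is a minimal sketch of the "compare answers within a group" idea. The function name is hypothetical, and dividing by the population standard deviation is an assumption made for illustration. This is not DeepSeek's actual training code, only the scoring step in isolation.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Score each answer relative to the other answers to the same question.

    rewards: one number per sampled answer (higher is better).
    Returns one advantage per answer: positive means better than the
    group average, negative means worse.
    """
    baseline = mean(rewards)   # the group's average score
    spread = pstdev(rewards)   # how much the scores vary (assumed: population std)
    # eps avoids division by zero when every answer gets the same reward
    return [(r - baseline) / (spread + eps) for r in rewards]


# Four answers to one question: two correct (1.0) and two wrong (0.0).
advantages = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
```

In this toy case the correct answers get a positive advantage and the wrong ones a negative advantage. If every answer scores the same, all advantages are zero, so that question gives the model nothing to learn from.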
The Training Process
DeepSeek trained their AI in four main steps:
Step 1: Learning from Examples
They started by showing their AI lots of well-written solutions to problems. This is like giving a student good examples to learn from.
Step 2: Practice Makes Perfect
They then had the AI solve lots of math and coding problems. The AI got feedback on its answers and learned from its mistakes. They made sure the AI stuck to using one language at a time to avoid confusion.
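The feedback in this step can be pictured as a score that rewards correct answers and adds a small bonus for staying in one language. The sketch below is a simplified illustration. The function names, the 0.1 weight, and the ASCII-based language check are all assumptions, not details taken from DeepSeek.

```python
def accuracy_reward(answer: str, expected: str) -> float:
    """1.0 if the final answer matches the known solution, else 0.0."""
    return 1.0 if answer.strip() == expected.strip() else 0.0


def language_consistency(text: str) -> float:
    """Fraction of words made only of ASCII characters.

    A crude stand-in (assumption) for "the reasoning stays in English".
    """
    words = text.split()
    if not words:
        return 0.0
    return sum(w.isascii() for w in words) / len(words)


def total_reward(reasoning: str, answer: str, expected: str,
                 language_weight: float = 0.1) -> float:
    """Combine correctness with a small language-consistency bonus.

    language_weight is a hypothetical value chosen for illustration.
    """
    return (accuracy_reward(answer, expected)
            + language_weight * language_consistency(reasoning))


consistent = total_reward("first add two and two", "4", "4")
mixed = total_reward("first 加 two 和", "4", "4")
```

Both example answers are correct, but the one that mixes languages scores lower, which nudges the model toward keeping its reasoning in a single language.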
Step 3: Creating More Practice Material
The team created a huge collection of practice problems. They used their AI to generate new examples and had another AI check if these examples were good enough to use for training.
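The generate-then-check loop described above can be sketched as follows. Here `generate` and `judge` are hypothetical stand-ins for the two AI models, and the toy versions exist only to make the example runnable. How DeepSeek actually sampled and filtered examples is not specified here.

```python
import itertools


def build_training_set(problems, generate, judge, samples_per_problem=4):
    """Keep only the generated solutions that the judge accepts.

    generate(problem) -> candidate solution (stand-in for the AI being trained)
    judge(problem, candidate) -> bool (stand-in for the checking AI)
    """
    kept = []
    for problem in problems:
        for _ in range(samples_per_problem):
            candidate = generate(problem)
            if judge(problem, candidate):
                kept.append((problem, candidate))
    return kept


# Toy stand-ins: the "generator" alternates between a right and a wrong answer.
_fake_outputs = itertools.cycle(["4", "5"])


def toy_generate(problem):
    return next(_fake_outputs)


def toy_judge(problem, candidate):
    return candidate == "4"


dataset = build_training_set(["2 + 2"], toy_generate, toy_judge)
```

Of the four candidates generated for the single toy problem, only the two that pass the judge end up in `dataset`. The rejected ones are simply discarded.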