How DeepSeek R1 Was Trained: A Simple Guide

1/22/2025 · 1 min read

DeepSeek AI has released a new AI model called DeepSeek R1 that is built for reasoning through complex problems, such as math and coding tasks. Let's break down how they built it in simple terms.

What's New About Their Approach?

DeepSeek trained their AI with a reinforcement learning method called Group Relative Policy Optimization (GRPO), which the company introduced in its earlier DeepSeekMath work.

Think of GRPO like a teacher who:

Gives the same question to multiple students
Looks at all their answers
Compares them to find out which approaches worked best
Uses this information to help everyone improve

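This is different from older methods such as PPO, which train a second "critic" model to estimate how good each answer is. GRPO skips the critic and uses the average score of the group as its yardstick, so it is simpler and uses less computer memory while still being effective.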
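To make the "compare the students" idea concrete, here is a minimal sketch of how a group-relative score could be computed. The function name and the use of the population standard deviation are assumptions made for illustration, not DeepSeek's actual code.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """Score each answer relative to the other answers to the same question.

    Hypothetical helper: an answer that beats the group average gets a
    positive advantage, one that falls below it gets a negative advantage.
    """
    baseline = mean(rewards)   # the group average stands in for a critic model
    spread = pstdev(rewards)   # assumption: population standard deviation
    return [(r - baseline) / (spread + eps) for r in rewards]

# Four answers to the same question: two correct (1.0), two wrong (0.0).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

In training, answers with positive advantages would be made more likely and answers with negative advantages less likely. If every answer in a group gets the same reward, all advantages are zero and that group teaches the model nothing.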

The Training Process

DeepSeek trained their AI in four main steps:

Step 1: Learning from Examples

They started by showing their AI lots of well-written solutions to problems. This is like giving a student good examples to learn from.

Step 2: Practice Makes Perfect

They then had the AI solve lots of math and coding problems using reinforcement learning. The AI earned a reward when its answer was correct, so it gradually learned from its mistakes. An extra reward for language consistency encouraged the AI to reason in one language at a time instead of mixing languages.
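The feedback in this step can be pictured as a simple scoring function. The sketch below is an illustration only: the function names, the exact-match check, the ASCII test used as a rough stand-in for "written in English", and the weight are all assumptions, not DeepSeek's implementation.

```python
def accuracy_reward(model_answer: str, reference: str) -> float:
    """Return 1.0 if the final answer matches the reference, else 0.0."""
    return 1.0 if model_answer.strip() == reference.strip() else 0.0

def language_consistency_reward(reasoning: str) -> float:
    """Fraction of words in the target language.

    Assumption: English is the target language, and a word counts as
    English if it is pure ASCII. A real system would need a proper
    language detector.
    """
    words = reasoning.split()
    if not words:
        return 0.0
    in_target = sum(1 for word in words if word.isascii())
    return in_target / len(words)

def total_reward(model_answer, reference, reasoning, language_weight=0.1):
    """Add the two signals together; the weight is a made-up example value."""
    return (accuracy_reward(model_answer, reference)
            + language_weight * language_consistency_reward(reasoning))

print(total_reward("42", "42", "the answer is 42"))
```

Because both signals are checked by simple rules rather than by another model, scoring of this kind is cheap to run on a very large number of practice problems.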

Step 3: Creating More Practice Material

The team then built a large collection of new training examples. They used their AI to generate many candidate solutions, and had another AI judge which ones were good enough to keep for training.
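The generate-then-check loop can be sketched in a few lines. Everything named here (build_training_set, the generate and judge callables, the threshold, and the toy stand-ins) is hypothetical and only shows the shape of the idea.

```python
import itertools
from typing import Callable, List, Tuple

def build_training_set(
    prompts: List[str],
    generate: Callable[[str], str],      # stand-in for the AI writing a solution
    judge: Callable[[str, str], float],  # stand-in for the second AI scoring it
    samples_per_prompt: int = 4,
    threshold: float = 0.5,
) -> List[Tuple[str, str]]:
    """Keep only the generated solutions that the judge scores highly enough."""
    kept = []
    for prompt in prompts:
        for _ in range(samples_per_prompt):
            candidate = generate(prompt)
            if judge(prompt, candidate) >= threshold:
                kept.append((prompt, candidate))
    return kept

# Toy stand-ins so the sketch runs without any real model.
_toy_answers = itertools.cycle(["4", "5"])

def toy_generate(prompt: str) -> str:
    return next(_toy_answers)

def toy_judge(prompt: str, candidate: str) -> float:
    return 1.0 if candidate == "4" else 0.0

dataset = build_training_set(["What is 2 + 2?"], toy_generate, toy_judge)
print(dataset)
```

The rejected candidates are simply thrown away, so the quality of the final collection depends on how strict the judge is.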

Let's Build the Future Together

Prcept AI

Prcept AI LLP, 5/10 A,
Ashok Vihar Phase 3 Extn,
Gurugram, Haryana,
India 122001

CONTACT US

+91 9860934576

info@prcept.com
praveen@prcept.com
