How Was ChatGPT Trained? Understanding AI's Data Learning Process Simply
How did ChatGPT become so smart? I'll explain AI's learning process in simple terms. Understanding the principles helps you use it better!
Hello!
Have you ever asked ChatGPT "How did you become so smart?"
It does answer, but the answer is full of difficult terms...
Today I'll explain how AI is trained so even elementary students can understand!
AI Grows by Eating Data
Humans learn by reading books and through experiences, right?
AI is similar.
Except it grows by eating data!
💡 What is data?
- Text (news articles, blogs, Wikipedia, etc.)
- Images (cat photos, landscape photos, etc.)
- Audio (recordings of people talking)
- Video (YouTube videos)
AI looks at billions of pieces of data like these and finds patterns!
How Did ChatGPT Learn?
Stage 1: Reading Internet Text
1. Reading Massive Amounts of Text
ChatGPT read an enormous amount of text from the internet:
- Entire Wikipedia
- Millions of news articles
- Blog posts, forum posts
- Books, papers, code
While reading all this, it learned patterns like "this kind of answer usually follows this kind of question!"
Stage 2: Getting Human Feedback
2. Improving Through Feedback
But an AI that has only read data sometimes gives strange answers.
So people evaluate its answers: this one is good, this one is bad.
The AI keeps improving based on this feedback!
Stage 3: Testing and Fixing
3. Repeated Testing
Through tens of thousands of tests, bad answers are reduced and helpful ones increase.
It's like building skills by working through practice problems over and over!
How is Image AI Trained?
Image-generation AIs like DALL-E and Midjourney are trained in a similar way!
💡 Image AI Learning Process
Stage 1: Sees hundreds of millions of images with text descriptions
- Photo: 🐱
- Text: "An orange cat sitting on a sofa"
Stage 2: Learns the relationship between the words "orange cat" and actual cat shapes
Stage 3: Creates images even for requests it has never seen, by combining what it learned
- "Cat in a spacesuit" → Never seen it, but can combine "cat" and "spacesuit"!
Does More Data Always Make AI Better?
Benefits of Having Lots of Data
- ✅ Can handle more diverse situations
- ✅ Higher accuracy
- ✅ Improved creative combination ability
For example, ChatGPT can answer various questions because it read an enormous amount of internet text.
But There Are Problems Too
1. Learns Bad Data Too
The internet has good information but also misinformation and biased content.
AI can't always tell the two apart and learns from both, so sometimes it gives wrong or strange answers.
2. Privacy Issues
If training data includes personal information, problems can arise.
That's why developers try to filter out or mask personal information before training, although no filter catches everything.
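To give a rough feel for what "removing personal info" can mean, here is a minimal Python sketch. It is not any company's actual pipeline: the function name `redact_personal_info` and both patterns are made up for illustration, and they only catch simple email addresses and phone-like numbers.

```python
import re

# Simplified, hypothetical patterns for illustration only.
# Real data-cleaning pipelines need far more robust detection.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_PATTERN = re.compile(r"\b\d{2,3}-\d{3,4}-\d{4}\b")


def redact_personal_info(text: str) -> str:
    """Replace email addresses and phone-like numbers with placeholder tags."""
    text = EMAIL_PATTERN.sub("[EMAIL]", text)
    text = PHONE_PATTERN.sub("[PHONE]", text)
    return text


sample = "Contact Mina at mina@example.com or 010-1234-5678."
print(redact_personal_info(sample))
# Contact Mina at [EMAIL] or [PHONE].
```

Notice that the name "Mina" is still there. Names, addresses, and other details are much harder to detect than emails, which is one reason privacy filtering is never perfect.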
3. Requires Enormous Computing Resources
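Training a large AI model can keep thousands of high-performance chips (GPUs) running for weeks or months.
The total bill for computing power, including electricity, can reach billions of won or more!
💡 GPT-3 Training Cost
By one widely cited outside estimate, training GPT-3 once would cost approximately $4.6 million (about 5 billion won) in cloud computing fees!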
Why Human Feedback is Important
Data alone isn't enough for AI. Human feedback is essential!
Reinforcement Learning
When AI gives an answer, humans evaluate it:
- 👍 "This answer is good" → AI learns to respond this way
- 👎 "This answer is bad" → AI learns to avoid such responses
Repeating this process tens of thousands of times makes AI smarter!
RLHF (Reinforcement Learning from Human Feedback)
This is a big part of why ChatGPT works so well.
Process:
- AI generates various responses
- Humans rank "which answer is better"
- AI learns the style of highly-ranked responses
- Repeat!
This way AI gives responses humans prefer.
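The ranking step above can be sketched in a few lines of Python. This only shows how one human ranking can be turned into "A is better than B" pairs, which is the kind of data a reward model is typically trained on. The function names and the example responses are made up, and a real RLHF system would train a neural network on these pairs instead of simply counting wins.

```python
from collections import Counter
from itertools import combinations


def ranking_to_pairs(ranked_responses):
    """Turn a best-to-worst ranking into (preferred, rejected) pairs."""
    # combinations() keeps the original order, so the first item
    # of every pair is always the one the human ranked higher.
    return list(combinations(ranked_responses, 2))


def count_wins(pairs):
    """Toy stand-in for a reward model: count how often each response was preferred."""
    wins = Counter()
    for preferred, rejected in pairs:
        wins[preferred] += 1
        wins[rejected] += 0  # make sure every response shows up in the tally
    return wins


# Hypothetical ranking from one human evaluator, best first.
ranking = ["clear and polite answer", "correct but curt answer", "off-topic answer"]
pairs = ranking_to_pairs(ranking)
print(len(pairs))         # 3
print(count_wins(pairs))
```

One ranking of three responses already gives three comparisons, so a modest number of human rankings can produce a lot of training signal.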
Does AI Keep Learning?
When you use an AI like ChatGPT, its training has already finished.
Training vs Usage
Training Stage:
- Learns patterns by looking at enormous data
- Takes several months
- Massive cost
Usage Stage (Inference):
- We use the trained AI
- Doesn't learn new things
- Only responds based on what it learned
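💡 ChatGPT remembers conversations but doesn't learn from them!
Your conversations are only remembered "during the session"; the AI model itself doesn't change while you chat. (Depending on the service's settings, conversations may later be used to help train future versions.)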
How Do Updates Work?
Companies like OpenAI create new versions.
- GPT-3 → GPT-3.5 → GPT-4 → GPT-4o
New versions are trained with newer data. Some are built from scratch, while others are trained further on top of an earlier model!
Three Types of AI Learning
1. Supervised Learning
Learning from data with correct answers
Example: Training on 1000 cat photos labeled "cat"
Use cases:
- Email spam filters (spam / not spam)
- Translation (English → Korean answer pairs)
- Speech recognition (sound → text)
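Here is a minimal sketch of supervised learning in Python, using the spam filter idea. The four training messages and the function names `train` and `predict` are made up for illustration; real spam filters use far more data and much smarter models than word counting.

```python
from collections import Counter

# Made-up labeled examples: each message comes with its correct answer.
TRAINING_DATA = [
    ("win a free prize now", "spam"),
    ("free money click now", "spam"),
    ("lunch meeting at noon", "not spam"),
    ("see you at the meeting", "not spam"),
]


def train(examples):
    """Count how often each word appears under each label."""
    word_counts = {"spam": Counter(), "not spam": Counter()}
    for message, label in examples:
        word_counts[label].update(message.split())
    return word_counts


def predict(word_counts, message):
    """Pick the label whose training messages share more words with this one."""
    scores = {
        label: sum(counts[word] for word in message.split())
        for label, counts in word_counts.items()
    }
    return max(scores, key=scores.get)


model = train(TRAINING_DATA)
print(predict(model, "free prize inside"))      # spam
print(predict(model, "meeting moved to noon"))  # not spam
```

The key point is that the "correct answers" (the labels) are what the model learns from. Without them, this approach has nothing to count.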
2. Unsupervised Learning
Finding patterns without answers
Example: Automatically grouping customers by analyzing customer data
Use cases:
- Recommendation systems (finding similar movies)
- Anomaly detection (finding unusual patterns)
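The customer grouping example can be sketched with a very small version of a well-known method called k-means clustering. The spending numbers, the starting centers, and the function name `kmeans_1d` are all assumptions chosen for illustration.

```python
def kmeans_1d(values, centers, rounds=10):
    """Group numbers around the nearest center, then move each center to its group's average."""
    groups = [[] for _ in centers]
    for _ in range(rounds):
        groups = [[] for _ in centers]
        for value in values:
            # Find the index of the center closest to this value.
            nearest = min(range(len(centers)), key=lambda i: abs(value - centers[i]))
            groups[nearest].append(value)
        # Move each center to the average of its group (keep it if the group is empty).
        centers = [
            sum(group) / len(group) if group else center
            for group, center in zip(groups, centers)
        ]
    return centers, groups


monthly_spending = [10, 12, 11, 95, 100, 105]  # made-up customer data, no labels
centers, groups = kmeans_1d(monthly_spending, centers=[0, 50])
print(groups)   # [[10, 12, 11], [95, 100, 105]]
print(centers)  # [11.0, 100.0]
```

Nobody told the program which customers are "light spenders" or "heavy spenders". It found the two groups just from the numbers.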
3. Reinforcement Learning
Learning through trial and error
Example: Reward for winning a game, penalty for losing
Use cases:
- Game AI (AlphaGo, chess AI)
- Autonomous driving (reward for safe driving)
- Improving ChatGPT's conversation quality
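Trial and error can also be shown in a small Python sketch. This is a classic toy problem (often called a "multi-armed bandit"), not how AlphaGo or ChatGPT is actually trained. The function name `run_bandit`, the win chances, and the other settings are assumptions for illustration.

```python
import random


def run_bandit(win_chances, rounds=2000, explore_rate=0.1, seed=0):
    """Learn by trial and error which action earns a reward most often."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    tries = [0] * len(win_chances)
    value_estimates = [0.0] * len(win_chances)
    for _ in range(rounds):
        if rng.random() < explore_rate:
            # Explore: sometimes try a random action.
            action = rng.randrange(len(win_chances))
        else:
            # Exploit: otherwise pick the action that looks best so far.
            action = max(range(len(win_chances)), key=lambda a: value_estimates[a])
        reward = 1.0 if rng.random() < win_chances[action] else 0.0
        tries[action] += 1
        # Update the running average reward for the chosen action.
        value_estimates[action] += (reward - value_estimates[action]) / tries[action]
    return value_estimates, tries


# Action 0 wins 20% of the time, action 1 wins 80% of the time.
estimates, tries = run_bandit([0.2, 0.8])
print("Estimated value of each action:", estimates)
print("Times each action was tried:", tries)
```

The program is never told which action is better. It only sees rewards, and over many rounds it ends up choosing the better action most of the time.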
Data Quality Matters More
"Garbage In, Garbage Out"
No matter how much data you have, the AI turns out badly if the quality is poor.
Good Data Requirements
- Must be accurate: AI gets things wrong if incorrect information is mixed in
- Must be diverse: Can't speak English if only Korean data is used
- Must not be biased: Data with only one perspective creates biased AI
- Must be current: Training only on old data means not knowing current information
Peek at the Actual Training Process
Let's see how ChatGPT was made, step by step.
Stage 1: Pre-training
- Data: Hundreds of billions of words from internet text
- Goal: Learn basic language patterns
- Duration: Several months
- Cost: Billions of won
At this stage, AI endlessly practices "predicting the next word."
Example:
Input: "Today's weather is really"
AI prediction: "nice" (70%), "bad" (20%), "strange" (10%)
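Here is a minimal Python sketch of "predicting the next word". It just counts which word follows which in a tiny made-up set of sentences. Real models like ChatGPT use large neural networks and look at much more than the previous word, so treat this as an illustration of the idea only. The corpus and function names are assumptions.

```python
from collections import Counter, defaultdict

# A tiny made-up corpus standing in for "hundreds of billions of words".
CORPUS = [
    "the weather is really nice",
    "the weather is really nice today",
    "the weather is really bad",
    "the food is really nice",
]


def train_next_word_counts(sentences):
    """Count which word follows each word in the training text."""
    follow_counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for current_word, next_word in zip(words, words[1:]):
            follow_counts[current_word][next_word] += 1
    return follow_counts


def predict_next_word(follow_counts, word):
    """Return each possible next word with its estimated probability."""
    counts = follow_counts.get(word, Counter())
    total = sum(counts.values())
    return {next_word: count / total for next_word, count in counts.items()}


model = train_next_word_counts(CORPUS)
print(predict_next_word(model, "really"))
# {'nice': 0.75, 'bad': 0.25}
```

In the training text, "nice" followed "really" three times and "bad" followed it once, so the sketch guesses "nice" with 75% probability. More text gives better guesses, which is why pre-training uses so much data.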
Stage 2: Supervised Fine-tuning
- Data: Tens of thousands of high-quality conversation examples written by humans
- Goal: Learn helpful response styles
- Duration: Several weeks
At this stage, AI learns "what a good answer is."
Stage 3: RLHF (Reinforcement Learning from Human Feedback)
- Data: Hundreds of thousands of human evaluations
- Goal: Generate responses humans prefer
- Duration: Several weeks
At this stage, AI learns "what kind of responses people like."
Ethical Concerns
AI training involves many ethical considerations.
1. Copyright Issues
If an AI learned from text on the internet, is that copyright infringement?
This question is still under debate.
2. Bias Problems
If training data has biases, AI becomes biased too.
Example: Stereotypes about certain genders or races
3. Environmental Issues
AI training uses enormous amounts of electricity.
There are concerns about environmental impact.
Wrapping Up
Do you now have a sense of how AI learns?
Key Summary:
- AI learns patterns by looking at enormous data
- Continuously improves with human feedback
- Training requires massive cost and time
- Data quality determines AI quality
In the next article, I'll clearly distinguish "what AI is good at and what it's not"!
To utilize AI properly, you need to know its limitations.
Next Article Preview: AI's Strengths and Limits: Understanding Expectations and Reality