
How Was ChatGPT Trained? Understanding AI's Data Learning Process Simply

How did ChatGPT become so smart? I'll explain AI's learning process in simple terms. Understanding the principles helps you use it better!

Hello!

Have you ever asked ChatGPT "How did you become so smart?"

AI answers, but it's full of difficult terms...

Today I'll explain how AI is trained so even elementary students can understand!

AI Grows by Eating Data

Humans learn by reading books and through experiences, right?

AI is similar.

Except it grows by eating data!

💡 What is data?

  • Text (news articles, blogs, Wikipedia, etc.)
  • Images (cat photos, landscape photos, etc.)
  • Audio (recordings of people talking)
  • Video (YouTube videos)

AI looks at billions of these data pieces and finds patterns!

How Did ChatGPT Learn?

Stage 1: Reading Internet Text

1. 📚 Reading Massive Amounts of Text

ChatGPT read an enormous amount of text from the internet:

  • Entire Wikipedia
  • Millions of news articles
  • Blog posts, forum posts
  • Books, papers, code

While reading all this, it learned patterns like "this kind of answer usually follows this kind of question!"

Stage 2: Getting Human Feedback

2. 👍 Improving Through Feedback

But learning from data alone sometimes leads AI to give strange answers.

So people evaluate it: this is a good answer / this is a bad answer.

AI continuously improves based on this feedback!

Stage 3: Testing and Fixing

3. 🔄 Repeated Testing

Through tens of thousands of tests, it reduces bad answers and increases helpful ones.

Like building skills by continuously practicing test problems!

How is Image AI Trained?

Image-generation AIs like DALL-E and Midjourney are trained in a similar way!

💡 Image AI Learning Process

Stage 1: Sees hundreds of millions of images with text descriptions

  • Photo: 🐱
  • Text: "An orange cat sitting on a sofa"

Stage 2: Learns the relationship between the words "orange cat" and actual cat shapes

Stage 3: Creates images even for requests it has never seen, by combining what it learned

  • "Cat in a spacesuit" → never seen one, but it can combine the two!

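If you like code, here's a tiny, purely illustrative sketch of that "combining" idea in Python. Real image AIs learn their feature vectors from hundreds of millions of images; the hand-made vectors and the `combine` helper below are made up just to show the intuition:

```python
# Toy sketch of "combining concepts it has seen". Real image models learn
# embeddings from data; these hand-made feature vectors are purely illustrative.
features = {
    "cat":       {"furry": 1.0, "whiskers": 1.0, "metallic": 0.0, "helmet": 0.0},
    "spacesuit": {"furry": 0.0, "whiskers": 0.0, "metallic": 1.0, "helmet": 1.0},
}

def combine(*concepts):
    """Average the feature vectors: a crude stand-in for concept blending."""
    keys = features[concepts[0]].keys()
    return {k: sum(features[c][k] for c in concepts) / len(concepts) for k in keys}

# "Cat in a spacesuit": never seen together, but both parts are known.
print(combine("cat", "spacesuit"))
# {'furry': 0.5, 'whiskers': 0.5, 'metallic': 0.5, 'helmet': 0.5}
```

The point is that the model never needs a photo of a cat in a spacesuit; it only needs to know "cat" and "spacesuit" separately.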
Does More Data Always Make AI Better?

Benefits of Having Lots of Data

  • ✅ Can handle more diverse situations
  • ✅ Higher accuracy
  • ✅ Improved creative combination ability

For example, ChatGPT can answer various questions because it read an enormous amount of internet text.

But There Are Problems Too

1. Learns Bad Data Too

The internet has good information but also misinformation and biased content.

AI can't tell the difference and learns everything, so it sometimes gives strange answers.

2. Privacy Issues

If training data includes personal information, problems can arise.

That's why developers nowadays try to remove personal info from training data.
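As a tiny illustration, here's roughly what the simplest kind of personal-info scrubbing looks like in Python. This is a minimal sketch with just two regular expressions; real data pipelines use far more thorough detection:

```python
import re

def scrub_pii(text: str) -> str:
    """Mask emails and simple phone numbers. A minimal sketch only;
    real pipelines detect many more kinds of personal information."""
    text = re.sub(r"[\w.+-]+@[\w-]+\.[\w.]+", "[EMAIL]", text)
    text = re.sub(r"\b\d{2,3}-\d{3,4}-\d{4}\b", "[PHONE]", text)
    return text

sample = "Contact me at jane@example.com or 010-1234-5678."
print(scrub_pii(sample))  # Contact me at [EMAIL] or [PHONE].
```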

3. Requires Enormous Computing Resources

AI training needs thousands of high-performance computers running for months.

Electricity costs alone can reach tens of billions of won!

💡 GPT-3 Training Cost

Training GPT-3 once cost approximately $4.6 million (5 billion won)!


Why Human Feedback is Important

Data alone isn't enough for AI. Human feedback is essential!

Reinforcement Learning

When AI gives an answer, humans evaluate it:

  • ๐Ÿ‘ "This answer is good" โ†’ AI learns to respond this way
  • ๐Ÿ‘Ž "This answer is bad" โ†’ AI learns to avoid such responses

Repeating this process tens of thousands of times makes AI smarter!

RLHF (Reinforcement Learning from Human Feedback)

This technique is a big part of why ChatGPT's answers are so good.

Process:

  1. AI generates various responses
  2. Humans rank "which answer is better"
  3. AI learns the style of highly-ranked responses
  4. Repeat!

This way AI gives responses humans prefer.
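The four steps above can be sketched as a toy loop in Python. Real RLHF trains a separate reward model and updates a neural network; the score table and the pretend human rater below are made up just to show the shape of the process:

```python
import random

# Toy RLHF loop: the "model" keeps a score per response style and learns to
# prefer styles that humans rank higher. Real RLHF trains a reward model and
# updates a neural network, not a score table.
styles = {"helpful and polite": 0.0, "curt one-liner": 0.0}

def human_ranks_higher(a, b):
    # Stand-in for a human rater who always prefers helpful answers.
    return a if a == "helpful and polite" else b

random.seed(0)
for _ in range(1000):
    a, b = random.sample(list(styles), 2)  # 1. generate two candidate responses
    winner = human_ranks_higher(a, b)      # 2. a human ranks them
    styles[winner] += 1.0                  # 3. reinforce the preferred style
    # 4. repeat!

best = max(styles, key=styles.get)
print(best)  # helpful and polite
```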


Does AI Keep Learning?

AI like ChatGPT comes to you in a finished, already-trained state.

Training vs Usage

Training Stage:

  • Learns patterns by looking at enormous data
  • Takes several months
  • Massive cost

Usage Stage (Inference):

  • We use the trained AI
  • Doesn't learn new things
  • Only responds based on what it learned

💡 ChatGPT remembers conversations but doesn't learn from them!

Your conversations are only remembered "during the session"; the AI itself doesn't learn from them.
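Here's a toy sketch of why that is: the whole chat history is re-sent with every request, and the model's weights never change. The `ask_model` function below is a hypothetical stand-in for a real API call, just to show the shape of it:

```python
# Why ChatGPT "remembers" within a session: the conversation so far is re-sent
# with every request. `ask_model` is a made-up stand-in for a real API call.
def ask_model(messages):
    # Pretend model: it "remembers" your name only if it's in the messages.
    for m in messages:
        if "My name is Amy" in m["content"]:
            return "Nice to meet you, Amy!"
    return "I don't know your name."

history = []
history.append({"role": "user", "content": "My name is Amy."})
history.append({"role": "assistant", "content": ask_model(history)})

history.append({"role": "user", "content": "What's my name?"})
print(ask_model(history))  # Nice to meet you, Amy!

# A fresh session = empty history = no "memory" at all.
print(ask_model([{"role": "user", "content": "What's my name?"}]))
# I don't know your name.
```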

How Do Updates Work?

Companies like OpenAI create new versions.

  • GPT-3 → GPT-3.5 → GPT-4 → GPT-4o

Each version is retrained from scratch with new data!


Three Types of AI Learning

1. Supervised Learning

Learning from data with correct answers

Example: Training on 1000 cat photos labeled "cat"

Use cases:

  • Email spam filters (spam / not spam)
  • Translation (English → Korean answer pairs)
  • Speech recognition (sound → text)
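In miniature, supervised learning looks like this toy Python spam filter: labeled examples go in, a classifier comes out. The tiny word-count approach below is made up for illustration; real filters use far more data and smarter math:

```python
from collections import Counter

# Supervised learning in miniature: learn from examples that come with answers.
training_data = [
    ("win a free prize now", "spam"),
    ("free money click now", "spam"),
    ("meeting moved to friday", "not spam"),
    ("lunch on friday?", "not spam"),
]

# "Study" the labeled answers: count which words appear under each label.
word_counts = {"spam": Counter(), "not spam": Counter()}
for text, label in training_data:
    word_counts[label].update(text.split())

def classify(text):
    """Pick the label whose training words best match the new text."""
    scores = {label: sum(counts[w] for w in text.split())
              for label, counts in word_counts.items()}
    return max(scores, key=scores.get)

print(classify("free prize now"))   # spam
print(classify("see you friday"))   # not spam
```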

2. Unsupervised Learning

Finding patterns without answers

Example: Automatically grouping customers by analyzing customer data

Use cases:

  • Recommendation systems (finding similar movies)
  • Anomaly detection (finding unusual patterns)
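And here's unsupervised learning in miniature: a tiny one-dimensional k-means sketch in Python that groups made-up spending numbers into clusters with no labels at all:

```python
# Unsupervised learning in miniature: no labels, just raw numbers, and the
# algorithm finds the groups itself. A tiny 1-D k-means sketch.
spending = [12, 15, 11, 14, 95, 102, 98, 99]  # made-up monthly spending data

def kmeans_1d(values, k=2, steps=10):
    centers = [min(values), max(values)]  # naive starting centers (fine for k=2)
    for _ in range(steps):
        clusters = [[] for _ in range(k)]
        for v in values:
            # assign each value to its nearest center
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # move each center to the mean of its cluster
        centers = [sum(c) / len(c) for c in clusters]
    return clusters

low, high = kmeans_1d(spending)
print(low)   # [12, 15, 11, 14]
print(high)  # [95, 102, 98, 99]
```

Nobody told the algorithm "these are light spenders and these are heavy spenders"; it discovered the two groups on its own.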

3. Reinforcement Learning

Learning through trial and error

Example: Reward for winning a game, penalty for losing

Use cases:

  • Game AI (AlphaGo, chess AI)
  • Autonomous driving (reward for safe driving)
  • Improving ChatGPT's conversation quality
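And reinforcement learning in miniature: a toy Python loop that tries two actions, collects rewards, and ends up preferring the one that pays better. It's deliberately simplified to stay deterministic; real RL systems (AlphaGo, driving) are far richer:

```python
# Reinforcement learning in miniature: trial, reward, and preference.
actions = ["left", "right"]
total_reward = {"left": 0.0, "right": 0.0}
tries = {"left": 0, "right": 0}

def environment(action):
    """Hidden rule the learner must discover: 'right' pays better."""
    return 1.0 if action == "right" else 0.2

# Trial and error: explore both actions, then keep whatever earned more.
for step in range(100):
    action = actions[step % 2]  # alternate between the two actions
    total_reward[action] += environment(action)
    tries[action] += 1

average = {a: total_reward[a] / tries[a] for a in actions}
best = max(average, key=average.get)
print(best)  # right
```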

Data Quality Matters More

"Garbage In, Garbage Out"

No matter how much data there is, AI turns out badly if the quality is poor.

Good Data Requirements

  1. Must be accurate

    • AI gets things wrong if incorrect information is mixed in
  2. Must be diverse

    • Can't speak English if only Korean data is used
  3. Must not be biased

    • Data with only one perspective creates biased AI
  4. Must be current

    • Training only on old data means not knowing current information

Peek at the Actual Training Process

Let's walk through how ChatGPT was made, step by step.

Stage 1: Pre-training

  • Data: Hundreds of billions of words from internet text
  • Goal: Learn basic language patterns
  • Duration: Several months
  • Cost: Billions of won

At this stage, AI endlessly practices "predicting the next word."

Example:

Input: "Today's weather is really"
AI prediction: "nice" (70%), "bad" (20%), "strange" (10%)
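That next-word game can be played in a few lines of Python. A toy bigram counter over three made-up sentences stands in for a real model trained on hundreds of billions of words, but the prediction task has the same shape:

```python
from collections import Counter

# Toy next-word prediction: count what follows "really" in a tiny corpus.
corpus = [
    "today's weather is really nice",
    "today's weather is really nice",
    "today's weather is really bad",
]

followers = Counter()
for sentence in corpus:
    words = sentence.split()
    for i in range(len(words) - 1):
        if words[i] == "really":        # collect what follows "really"
            followers[words[i + 1]] += 1

# Turn counts into prediction percentages, like the example above.
total = sum(followers.values())
for word, count in followers.most_common():
    print(f"{word}: {count / total:.0%}")
# nice: 67%
# bad: 33%
```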

Stage 2: Supervised Fine-tuning

  • Data: Tens of thousands of high-quality conversation examples written by humans
  • Goal: Learn helpful response styles
  • Duration: Several weeks

At this stage, AI learns "what a good answer is."

Stage 3: RLHF (Reinforcement Learning from Human Feedback)

  • Data: Hundreds of thousands of human evaluations
  • Goal: Generate responses humans prefer
  • Duration: Several weeks

At this stage, AI learns "what kind of responses people like."


Ethical Concerns

AI training involves many ethical considerations.

1. Copyright Issues

If AI learned from internet text, is that copyright infringement?

Still under debate.

2. Bias Problems

If training data has biases, AI becomes biased too.

Example: Stereotypes about certain genders or races

3. Environmental Issues

AI training uses enormous amounts of electricity.

There are concerns about environmental impact.


Wrapping Up

Do you now have a sense of how AI learns?

Key Summary:

  • AI learns patterns by looking at enormous data
  • Continuously improves with human feedback
  • Training requires massive cost and time
  • Data quality determines AI quality

In the next article, I'll clearly distinguish "what AI is good at and what it's not"!

To utilize AI properly, you need to know its limitations.


Next Article Preview: 📌 AI's Strengths and Limits – Understanding Expectations and Reality