How Does ChatGPT Work Under the Hood?

A Deep Dive for Software Engineers

Tech Truths
5 min read · Apr 1, 2024


ChatGPT has been around for a while now, but its ability to hold human-like conversations still amazes us. This powerful tool leverages the cutting-edge world of Large Language Models (LLMs) to create a truly engaging user experience.

Photo by Andrew Neel on Unsplash

But for software engineers curious about the inner workings, the question remains: how exactly does ChatGPT function under the hood? Let’s find out.

Under the Hood of LLMs: Demystifying GPT-3.5

At the heart of ChatGPT lies a powerhouse LLM known as GPT-3.5. This isn’t your average program; it’s a complex neural network architecture meticulously trained on a colossal dataset of text and code.

Imagine a web of interconnected artificial neurons, mimicking the human brain’s structure. This network is then bombarded with a staggering amount of text data, encompassing everything from books and articles to the ever-evolving landscape of online conversations.

As OpenAI’s researchers appropriately put it:

“GPT-3.5 is a descendant of a line of increasingly powerful language models”


The Power of Data: Diving Deeper into Training

One of the key factors behind ChatGPT’s remarkable abilities is the sheer volume and variety of text data it’s been trained on. We’re talking about a staggering 500 billion tokens, which translates to hundreds of billions of words.

This isn’t just a random collection of text; the data is carefully curated and preprocessed to ensure the model ingests high-quality information.

Software engineers will recognize techniques like tokenization, where text is broken down into smaller units, and word embedding, where words are converted into numerical vectors to facilitate learning by the neural network.
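To make these two ideas concrete, here is a toy sketch in Python: a hypothetical word-level tokenizer and a randomly initialized embedding table. Real LLMs use subword schemes such as byte-pair encoding and learn their embeddings during training; the vocabulary and dimensions below are purely illustrative.

```python
import numpy as np

# Hypothetical toy vocabulary mapping words to integer token IDs.
vocab = {"<unk>": 0, "chatgpt": 1, "is": 2, "a": 3, "language": 4, "model": 5}

def tokenize(text):
    # Break text into word-level tokens and map each to its ID;
    # unknown words fall back to the <unk> token.
    return [vocab.get(word, vocab["<unk>"]) for word in text.lower().split()]

# Toy embedding table: each token ID indexes one 8-dimensional vector.
rng = np.random.default_rng(0)
embedding_table = rng.normal(size=(len(vocab), 8))

token_ids = tokenize("ChatGPT is a language model")
embeddings = embedding_table[token_ids]

print(token_ids)         # [1, 2, 3, 4, 5]
print(embeddings.shape)  # (5, 8)
```

The embedding lookup turns a discrete token sequence into a matrix of numbers the neural network can operate on.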

“The amount of data you train a language model on is critically important,” says Ilya Sutskever, co-founder and Chief Scientist at OpenAI.

This exposure allows GPT-3.5 to identify intricate statistical patterns and relationships between words. By understanding these connections, the model can not only predict the next word in a sequence but also generate human-quality text that flows naturally and adheres to grammatical rules.
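The statistical intuition behind next-word prediction can be illustrated with a toy bigram model over a tiny made-up corpus. GPT-3.5 uses a deep transformer network rather than raw counts, but the core objective is the same: given what came before, predict the most likely next token.

```python
from collections import Counter, defaultdict

# Tiny illustrative corpus; real training data spans hundreds of billions of words.
corpus = "the cat sat on the mat the cat ate the fish".split()

# Count how often each word follows each other word (bigram statistics).
bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1

def predict_next(word):
    # Return the statistically most frequent follower of `word`.
    return bigrams[word].most_common(1)[0][0]

print(predict_next("the"))  # "cat" -- appears twice after "the"
```

An LLM does this at vastly greater scale, with learned representations standing in for raw counts, which is what lets it generalize to sequences it has never seen.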

Fine-tuning the Machine: Human Feedback Shapes ChatGPT’s Responses

Although LLMs represent remarkable achievements in engineering, their outputs can occasionally be inaccurate or even offensive.

To address this, ChatGPT incorporates a revolutionary technique called Reinforcement Learning from Human Feedback (RLHF).

Here’s how RLHF adds a human touch into ChatGPT’s responses, utilizing techniques familiar to many software engineers:

  • Active Learning and Human Evaluation: Human evaluators play a crucial role in shaping the model’s behavior. People like you and me provide valuable input by evaluating the quality and appropriateness of responses generated by the LLM for specific prompts. This process helps the model identify its weaknesses and areas for improvement.
  • Implementing a Reward Model with Policy Gradients: Based on human input, a reward system is established. This system assigns higher rewards to responses that are deemed superior by human evaluators. Software engineers will recognize similarities to policy gradients used in reinforcement learning algorithms. Here, the policy being optimized is the behavior of the LLM itself.
  • Learning Through Play — Proximal Policy Optimization (PPO) in Action: Imagine playing a game where you’re constantly refining your skills. That’s essentially what PPO does. The LLM is presented with slightly modified versions of prompts and receives feedback on the generated responses. Through this iterative process, the LLM progressively improves its ability to craft responses that resonate with human preferences. PPO, a type of policy gradient method, ensures that the learning process is stable and the model doesn’t forget what it has already learned. As OpenAI’s researchers explain, “PPO ensures that the updates we make to the policy are safe and don’t result in the model forgetting everything it has learned so far”.
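To give a flavor of the “safe update” idea, here is a minimal sketch of PPO’s clipped surrogate objective. The `ratio` compares the updated policy’s probability of a response to the old policy’s; clipping caps how much any single update can change the model. The epsilon value and the numbers are illustrative only, not from a real training run.

```python
import numpy as np

def ppo_clipped_objective(ratio, advantage, epsilon=0.2):
    # Clip the policy ratio into [1 - epsilon, 1 + epsilon] so one
    # update cannot move the policy too far from the old one.
    clipped = np.clip(ratio, 1 - epsilon, 1 + epsilon)
    # Take the more pessimistic (smaller) of the two surrogate terms.
    return np.minimum(ratio * advantage, clipped * advantage)

# A response the reward model favored (positive advantage):
# the large ratio is clipped, limiting the size of the update.
print(ppo_clipped_objective(ratio=1.5, advantage=2.0))   # 2.4, i.e. 1.2 * 2.0

# A disfavored response (negative advantage): the penalty is not clipped away.
print(ppo_clipped_objective(ratio=1.5, advantage=-2.0))  # -3.0
```

This asymmetry, where gains are capped but penalties are not, is what keeps RLHF training stable without erasing what the model already knows.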
Photo by Mariia Shalabaieva on Unsplash

The User Experience: Putting it All Together

Interacting with ChatGPT is a breeze. You simply provide a prompt or question, and ChatGPT works its magic. But behind the scenes, a fascinating process unfolds, involving techniques software engineers will find familiar:


  1. Context is King: Prompt Integration with Attention Mechanisms
    The prompt you enter, along with the history of your conversation (to provide context), is fed into the fine-tuned GPT-3.5 model. This leverages attention mechanisms, where the model focuses on the relevant parts of the input sequence when generating a response. This ensures that ChatGPT understands the nuances of your request and responds accordingly, maintaining a coherent conversation flow.
  2. The Art of Conversation: Prompt Engineering
    Additional instructions, a practice known as prompt engineering, are included to guide the model towards generating a response that feels natural and conversational. Software engineers might recognize similarities to techniques used in natural language generation tasks, where specific priming text can influence the style and content of the output. Here, these priming instructions provide subtle cues about the desired tone and formality of the response.
  3. Keeping it Safe: Moderation Takes the Stage with Safety Filters
    Finally, the generated response undergoes a safety check through a moderation API. This API leverages techniques like regular expression matching and pre-trained filtering models to identify and flag potentially harmful or unsafe content. This ensures that the response adheres to safety guidelines before being presented to you.

Key Takeaway

Essentially, ChatGPT seamlessly merges the capabilities of LLMs such as GPT-3.5 with the invaluable knowledge derived from human feedback via RLHF.

This powerful blend, combined with advanced software engineering methods, enables ChatGPT not only to produce informative and thorough responses but also to present them in a manner that is both captivating and conversational.

So, when you engage with ChatGPT, keep in mind the remarkable technical innovation happening behind the scenes, showcasing the potential of artificial intelligence and the creativity of software engineers!

Hello, we write about the latest trends, updates, and innovations in the tech space. To stay on top of our latest articles, follow us on Tech Truths. And, to have stories sent directly to you, subscribe to the newsletter.👇
