How LLMs Work: The Magical Kitchen
Picture an LLM as a chef in a magical kitchen, crafting responses as if creating your very own custom dish.
For business leaders, generative AI is like a super-smart conversationalist who knows a lot of things. The Large Language Models (LLMs) that power generative AI are like the chef in a magical kitchen: they create responses by combining small pieces of information (called tokens) based on patterns they’ve learned.
When you give a prompt (your order in our analogy), the chef doesn’t follow a single recipe but draws on a massive pantry of ingredients and a vast library of recipes: the words, phrases, and patterns learned from huge datasets. The ingredients are tokens, the building blocks of language.
The chef picks one token at a time, predicting what comes next based on your prompt and their training. For example, if you ask, “How do I improve sales?”, the chef might start with the tokens To ⇒ improve ⇒ sales, building a coherent response token by token. The result is a tailored answer, like a dish made just for you.
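To make the “one token at a time” idea concrete, here is a toy sketch. It is not how a real LLM works internally: it assumes a tiny invented corpus and simply counts which word tends to follow which, whereas an LLM uses a transformer trained on vast datasets. The loop has the same shape, though: look at what has been written so far, pick the most likely next token, append it, repeat.

```python
from collections import Counter, defaultdict

# A tiny, made-up "training corpus". Real LLMs learn from vast datasets
# with a transformer, not from simple word-pair counts like this.
corpus = (
    "to improve sales focus on customers . "
    "to improve sales track conversions . "
    "to improve margins reduce costs ."
).split()

# "Training": for each word, count which words follow it.
next_counts = defaultdict(Counter)
for current, following in zip(corpus, corpus[1:]):
    next_counts[current][following] += 1

def generate(start, max_tokens=6):
    """Greedily append the most frequent next word, one token at a time."""
    output = [start]
    for _ in range(max_tokens):
        candidates = next_counts.get(output[-1])
        if not candidates:
            break  # nothing was learned about what follows this word
        word, _count = candidates.most_common(1)[0]
        output.append(word)
        if word == ".":
            break  # stop at the end of a sentence
    return " ".join(output)

print(generate("to"))
```

With this corpus, `generate("to")` produces “to improve sales focus on customers .”: “sales” is followed equally often by “focus” and “track”, and the first one seen wins the tie.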
Key point:
The generated output depends on the clarity of your input.
A prompt like “Help with sales” might yield generic output that applies equally to a lemonade stand or a high-tech firm, which we can affectionately call “AI slop”. By contrast, “Suggest three strategies to boost online shopping cart conversions for a retail business” gets a precise, actionable response that you can do something about.
Sometimes the results get mixed up to hilarious effect, but it’s not so hilarious when your risk, revenue, or cost outcomes are on the line.
Understanding this process sets the stage for mastering prompts. This article will give you a working understanding of several of the ways that LLMs work under the hood.
The next sections go into more detail on three important concepts. I will:
- Demystify tokens, which are the building blocks of LLMs,
- Explore how LLMs use context windows to manage information,
- Look at the concept of fine-tuning and augmentation with external data.
Tokens demystified
In some of your conversations, especially if billing or commercials are involved, you may have seen prices expressed like "$15 per 1M tokens".
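The arithmetic behind a price like that is simple. The sketch below assumes a single flat rate of $15 per 1M tokens; in practice, providers often charge different rates for input (prompt) and output (response) tokens, so check your provider’s price list. The function name is illustrative.

```python
def token_cost(num_tokens, price_per_million=15.0):
    """Dollar cost of num_tokens at a price quoted per 1 million tokens."""
    return num_tokens * price_per_million / 1_000_000

# Hypothetical usage: a single 2,000-token prompt and a 40-million-token month.
print(f"${token_cost(2_000):.2f}")        # one prompt
print(f"${token_cost(40_000_000):,.2f}")  # one month
```

At that rate, a 2,000-token prompt costs about $0.03, and 40 million tokens in a month cost $600.00.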
But what exactly is a token? If you asked someone technical, they would likely say something along the lines of “it depends…”.
Under the covers, LLMs can only work on numbers because the heart of the LLM transformer architecture is mathematical. Matrix multiplication, if you want to get technical.
So, the first step in the LLM process is to convert words into numbers. This is where tokens come in. Tokens are like a map between groups of letters, words, and their corresponding numerical representations.
For example, let’s use the words “fantastic” and “fantastically” to compare how they map to sets of numbers (a set can contain just one number). Note, too, that capitalisation makes a difference.
Using the GPT-4o tokenizer (different models use different tokenizers), we get the following sets of numbers:
- Fantastic ⇒ 104343
- fantastic ⇒ 47824, 5620
- Fantastically ⇒ 62430, 629, 2905
- fantastically ⇒ 47824, 629, 2905
These are the numbers that the LLM does math calculations on. Once the calculations are complete, the numbers get turned back into words for you to read, which you may then copy into your email message, slide, or document.
Notice how the longer word breaks down into multiple tokens. Capitalisation matters a lot, and so does position in the sentence. The tokenizer is designed to break words into smaller parts when it needs to, especially for less common words or variations.
Now, if you make a typo, the tokenizer breaks the word down even further: a simple typo like “fantatsic” is tokenized as 47824, 1838, 291. Apart from the first token, these are completely different numbers!
Tokens generated from typos or unusual formatting can lead to unexpected results, because the LLM is no longer working on the actual word but on fragments whose loosely associated meanings may be shared with other, unrelated things.
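The sketch below imitates what a tokenizer does, under clearly stated assumptions: the vocabulary and IDs are invented, and it uses a simple greedy longest-match rule, whereas GPT-style tokenizers use byte-pair encoding (BPE) with much larger learned vocabularies. It still reproduces the pattern described above: capitalisation changes the tokens, longer words split into more pieces, and a typo fragments the word further.

```python
import string

# Hypothetical mini vocabulary. Real tokenizers learn many thousands of
# pieces from data; these pieces and IDs are invented for illustration.
PIECES = ["Fantastic", "fant", "astic", "ally", "at", "sic"]
CHARS = list(string.ascii_letters + " ")  # fallback: single characters
VOCAB = {piece: i for i, piece in enumerate(PIECES + CHARS)}
ID_TO_PIECE = {i: piece for piece, i in VOCAB.items()}

def encode(text):
    """Turn text into token IDs by greedily taking the longest known piece."""
    ids, i = [], 0
    while i < len(text):
        for end in range(len(text), i, -1):  # try the longest candidate first
            piece = text[i:end]
            if piece in VOCAB:
                ids.append(VOCAB[piece])
                i = end
                break
        else:
            raise ValueError(f"no token for {text[i]!r}")
    return ids

def decode(ids):
    """Turn token IDs back into text."""
    return "".join(ID_TO_PIECE[t] for t in ids)

for word in ["Fantastic", "fantastic", "Fantastically", "fantastically", "fantatsic"]:
    print(f"{word} => {encode(word)}")
```

It prints `Fantastic => [0]`, `fantastic => [1, 2]`, `Fantastically => [0, 3]`, `fantastically => [1, 2, 3]` and `fantatsic => [1, 4, 5]`: the typo shares only its first token with the correctly spelled word. `decode` turns the IDs back into text.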
Context Windows
Every LLM has a limit on the number of tokens it can handle at once, called the “context window”. This is the amount of information the LLM can consider at any one time when generating a response.
In many chat applications, new tokens are added to the end of the context window, so the oldest tokens get pushed out when the limit is reached. This means that if your prompt plus the generated response exceeds the context window size, some of the earlier information is lost.
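Here is a minimal sketch of that “oldest tokens fall out” behaviour. It assumes a hypothetical `fit_to_context` helper, words standing in for tokens, and a deliberately tiny 8-token window (real context windows hold many thousands of tokens). Not every product behaves this way; some APIs reject an over-long request instead of trimming it.

```python
def fit_to_context(tokens, context_window):
    """Keep only the most recent tokens that fit; the oldest fall out first."""
    if context_window <= 0:
        return []
    return tokens[-context_window:]

# Words stand in for tokens here; the 8-token limit is tiny on purpose.
prompt = "Summarise our refund policy for enterprise customers".split()
response = "Refunds are approved by the finance team".split()
visible = fit_to_context(prompt + response, context_window=8)
print(visible)
```

The visible window starts at “customers”: the instruction “Summarise our refund policy for enterprise” has already been pushed out, so the model no longer sees what it was asked to do.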
This matters if you are dropping in large documents (those 50+ page policy documents come to mind!) or have long conversations with the LLM.
As the context window is exceeded, the LLM loses alignment with the original prompt, earlier parts of the conversation, or important details. Getting generic “slop” after a long conversation or when working with large documents is often a sign that the context window has been exceeded.
Every LLM provider publishes the context window size for their models. Check the latest documentation for the model you are using to understand its limits.
As an end user of LLMs, your main remedy is to break the problem down into smaller chunks that fit within the context window and try again. For developers, there are techniques to manage this, but they are outside the scope of this article.
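For illustration, here is a simple way to break a long document into chunks that each fit a token budget. It is a sketch that assumes words stand in for tokens and uses a hypothetical `chunk_document` helper. Because one word can be several tokens (as with “fantastically”), a real implementation would count tokens with the model’s own tokenizer and leave room for the response.

```python
def chunk_document(text, max_tokens):
    """Split text into chunks of at most max_tokens words (words stand in for tokens)."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    words = text.split()
    return [" ".join(words[i:i + max_tokens]) for i in range(0, len(words), max_tokens)]

policy = ("Section one covers refunds. Section two covers returns. "
          "Section three covers warranties.")
for chunk in chunk_document(policy, max_tokens=4):
    print(chunk)
```

Each chunk can then be sent in its own prompt, so no single request overflows the context window.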
Fine-tuning and augmentation
For business leaders, fine-tuning and augmentation are important because they are ways to get better results from an LLM by tailoring it to the specific needs of you, your team, or your business.
Without getting into the technical weeds, fine-tuning is a process where an existing LLM is further trained on specific datasets to make it more knowledgeable about, or better aligned with, specific information. This is like giving the chef in our magical kitchen a special cookbook that focuses on your industry, products, or services.
Similarly, you can use retrieval-augmented generation (RAG) techniques to give the LLM access to external data sources while it generates a response. To use our kitchen analogy for the last time, it’s akin to the chef having access to a special pantry stocked with your business data, documents, or knowledge bases.
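The retrieval step of RAG can be sketched in a few lines. This is a toy with explicit assumptions: the knowledge base is invented, documents are ranked by simple word overlap with the question, and the prompt wording is hypothetical. Production RAG systems typically rank by embedding similarity, often with a vector database, but the flow is the same: find the most relevant business content, then add it to the prompt before the LLM answers.

```python
import re

# Hypothetical snippets from a company knowledge base.
KNOWLEDGE_BASE = [
    "Refunds for enterprise customers are approved by the finance team.",
    "Our online store offers free shipping on orders over 50 dollars.",
    "Cart abandonment emails are sent one hour after checkout is left incomplete.",
]

def words(text):
    """Lowercase word set, ignoring punctuation."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question, documents, top_k=1):
    """Rank documents by how many words they share with the question."""
    q = words(question)
    ranked = sorted(documents, key=lambda d: len(q & words(d)), reverse=True)
    return ranked[:top_k]

def build_prompt(question, documents, top_k=1):
    """Augment the user's question with the retrieved business context."""
    context = "\n".join(f"- {doc}" for doc in retrieve(question, documents, top_k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

print(build_prompt("When are cart abandonment emails sent?", KNOWLEDGE_BASE))
```

For the cart-abandonment question, the third snippet shares the most words with the question, so it is the one placed in the prompt.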
Both fine-tuning and augmentation can improve the relevance and accuracy of an LLM’s responses, making them more useful for specific business applications.
Knowing about these concepts is important, especially when it comes to how you might implement an LLM-based solution. It is the difference between a generic AI assistant that earns little trust because it misses important details and an AI assistant that “just gets it”. That’s not magic; it takes dedication and investment.
The tag #ai-for-busy-people marks a series of articles designed to guide business executives through a learning journey about AI, Large Language Models (LLMs), and prompting.
My aim here is to empower you to understand AI and apply it effectively in your business.