Understanding LLMs, RAG and Apple Intelligence
How does Siri know when my mom’s flight lands anyway?
With the launch of the new iPhones last week, Apple has been talking up Apple Intelligence far more than it did during WWDC. It’s everywhere - in the promotional material, in three new ads and even in the official tagline for this year’s iPhones - “Hello, Apple Intelligence!”
If you have watched any of the interviews Apple executives have given about Apple Intelligence, by now you must have heard the words “personal context” and “semantic index” at least once. I spent some time wondering what these words mean, and it made me curious about how Apple Intelligence works behind the scenes. Around the same time, I was building an AI product where I came across some very similar terms. That helped me piece together how Apple Intelligence probably works, and I am here to share it with you.
This is AI explained, for the rest of us!
But wait, what is an LLM?
Let’s start from the very basics. If you haven’t been living under a rock, you will have heard people call ChatGPT and similar products an LLM, or Large Language Model.
A Large Language Model is a model (or algorithm) that is trained on large sets of data so that it can understand and generate human language. The data set it’s trained on helps the model understand questions, hold conversations and produce text much the way a human would. The more data an LLM is trained on, the better it tends to get at this.
If you used the earlier versions of ChatGPT, like GPT-3.5, you would have noticed that while the LLM was great at answering existential questions like “What do I do with my life?”, it would fail miserably at answering questions like “Which is the latest iPhone?” This was because these LLMs had a knowledge cutoff date, i.e. the LLM was trained on real-world data only up to a certain point - in the case of the initial version of GPT-3.5, only till September 2021. So if you asked it for the latest iPhone in 2024, it would tell you it’s the iPhone 13 Pro Max.
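To make this concrete, here is a minimal sketch of what talking to a bare LLM looks like, assuming the OpenAI Python SDK is installed and an API key is configured - the model name is just an example of an older model with a cutoff.

```python
from openai import OpenAI

client = OpenAI()  # assumes the OPENAI_API_KEY environment variable is set

# Ask a model a question it can only answer from its frozen training data.
response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # an older model with a 2021-era knowledge cutoff
    messages=[{"role": "user", "content": "Which is the latest iPhone?"}],
)

# The reply reflects whatever existed before the cutoff,
# so an older model will confidently name an older iPhone.
print(response.choices[0].message.content)
```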
Now imagine if Apple Intelligence were built on this basic LLM. If you asked Siri when your mom’s flight lands, instead of the frustrating “I found some web results”, it would probably just reply with an even more frustrating “I only have knowledge of events till 2021”.
But if you have used ChatGPT any time recently, you would know that it can accurately tell you that the latest iPhone is the iPhone 16 Pro Max and not the 13 Pro Max. So how does that work?
With recent generations of ChatGPT, GPT-4 and later, OpenAI has added a layer of web-scraping functionality to ChatGPT. Whenever the model figures out that it doesn’t have the answer within its training data, it immediately searches the web for the same query, fetches the top results, scrapes the relevant textual information (remember that an LLM is very good at processing text) and pieces that together into a coherent answer for you.
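In rough pseudocode, that flow looks something like the sketch below. search_web(), scrape_text() and llm() are hypothetical stand-ins, not real APIs - the point is just the order of operations.

```python
# Rough sketch of the "search the web, then answer" pattern described above.
# search_web(), scrape_text() and llm() are hypothetical stand-ins, not real APIs.

def answer_with_web_search(question, llm, search_web, scrape_text):
    # 1. Search the web for the user's question and take the top few hits.
    urls = search_web(question, top_k=3)

    # 2. Scrape the text from those pages (LLMs are very good with text).
    context = "\n\n".join(scrape_text(url) for url in urls)

    # 3. Hand the scraped text to the LLM and ask it to answer from it.
    prompt = (
        "Answer the question using only the web results below.\n\n"
        f"{context}\n\nQuestion: {question}"
    )
    return llm(prompt)
```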
This is great and helps when you want to know about the latest iPhone or the newest song from Sabrina Carpenter. But let’s come back to the question of when your mom’s flight is landing. If Apple Intelligence were built to scrape the web and your phone’s content on the fly every single time, it would not only cause a massive battery drain, it would also be so slow that you might as well call your mom and find out yourself.
So what is the best way to do this?
Enter RAG
While building whatever I was building (more on that some other day), I was trying to overcome some very similar limitations of LLMs. That’s when I came across RAG, and I strongly suspect Apple has done something similar (we won’t really know for sure unless someone from Apple tells us!).
So what is RAG?
RAG stands for Retrieval-Augmented Generation. The main purpose of RAG is to let an LLM use data outside of its training dataset. Whenever you have new data, instead of re-training the whole model (a very expensive, time-consuming process), you can simply process and embed these new data points into a “vector database”. When the model needs to look beyond its training data, it queries this vector database, retrieves the relevant content and pieces it together into a coherent response.
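Here is a minimal sketch of that retrieve-then-generate flow, assuming numpy for the vector math; embed() and llm() are hypothetical stand-ins for a real embedding model and a real LLM.

```python
import numpy as np

# Minimal sketch of the RAG flow. embed() and llm() are hypothetical
# stand-ins for a real embedding model and a real LLM.

def build_index(documents, embed):
    # Turn each document into a vector and keep the original text alongside it.
    vectors = np.array([embed(doc) for doc in documents])
    return vectors, documents

def retrieve(question, vectors, documents, embed, top_k=3):
    # Embed the question and find the most similar stored vectors
    # (cosine similarity: closer in meaning = higher score).
    q = embed(question)
    scores = vectors @ q / (np.linalg.norm(vectors, axis=1) * np.linalg.norm(q))
    best = np.argsort(scores)[::-1][:top_k]
    return [documents[i] for i in best]

def answer(question, vectors, documents, embed, llm):
    # Retrieval-Augmented Generation: retrieve first, then generate.
    context = "\n".join(retrieve(question, vectors, documents, embed))
    return llm(f"Answer using this context:\n{context}\n\nQuestion: {question}")
```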
Huh, how is this any different from reading it in real-time?
When data is embedded into a vector database, we are not simply storing text in a file. The model starts forming “semantic relationships” and “context” - in other words, it tries to correlate and link pieces of information together. This would be much harder to do on the fly, where time is a huge constraint.
These “semantic relationships” play a huge role in helping the model understand the important parameters. For example, when you ask about your mom’s flight, the model already knows who your “mom” is - something derived from the relationships it established earlier.
This also means there is more tolerance for vocabulary mismatch - even if you phrase something similarly but not identically, the model can still understand what you mean - which is similar to what we saw in Apple’s ads.
A vector database also makes it possible to combine data from multiple sources quickly - so in this case, the model can pull together data from Contacts, Messages, Notes and so on.
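To picture what such an index might hold, here is a toy illustration - the entries, sources and flight details are made up purely for the example, and embed() is again a hypothetical stand-in.

```python
# Toy illustration: one index, many sources. Every entry remembers where it
# came from, and every entry's text goes into the same vector database.
personal_data = [
    {"source": "Contacts", "text": "Mom - mobile number saved last year"},
    {"source": "Messages", "text": "Mom: my flight lands at 6:40 pm on Friday"},
    {"source": "Notes",    "text": "Pick Mom up from Terminal 2"},
]

# index = [(embed(item["text"]), item) for item in personal_data]
#
# Asking "When does my mom's flight land?" can now match entries from
# Contacts, Messages and Notes in a single query over the same index.
```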
Similar to how Spotlight indexes content today, I believe Apple Intelligence will constantly index your on-device content, building up this semantic index so that every time you make a request, the model can quickly retrieve the relevant data.
The biggest advantage of RAG is that you can keep updating this data over and over. So if your mom’s flight details change, that new data can be fed into the index too, with no retraining overhead involved.
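As a rough sketch of why that update is cheap: replacing an entry only means re-embedding one piece of text - the model itself never changes. The index structure and embed() here are, again, just hypothetical.

```python
# Sketch: updating the index is cheap because only the changed entry
# gets re-embedded; the underlying model is never retrained.
def update_entry(index, key, new_text, embed):
    index[key] = {"text": new_text, "vector": embed(new_text)}

# e.g. Mom's flight got delayed - only this one entry is touched:
# update_entry(index, "messages/mom-flight",
#              "Mom: new landing time is 8:15 pm", embed)
```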
This way, RAG is the perfect middle ground between a basic LLM and on-the-fly web scraping.
A word of caution - all of the ideas here are based on my understanding and study of how LLMs and RAG work. Apple has no official documentation on how Apple Intelligence works behind the scenes, so it’s possible that I am completely wrong in my explanation. Like people say about ChatGPT, this whole article could be a complete hallucination! But hey, at least I learnt something new about the world of ML and AI. Until next time!


