
    THE FIRST VIEW SERIES:

    DeepSeek – What is all the fuss about?

    DeepSeek certainly put the cat among the AI pigeons last week when they released their R1 model, trained at a fraction of the cost of comparable models from OpenAI and Anthropic. This challenged the prevailing assumption that you need a huge budget to make progress in model training. Did people really think that the field of Artificial General Intelligence wasn’t going to get cheaper and more efficient? Surely that is a good thing.

    In business-to-business contexts, human oversight remains crucial. Generative AI excels as an augmentative tool, empowering human capabilities rather than replacing them entirely. This shift is evident in the growing adoption of domain-specific models like Moody’s Research Assistant. These specialized models, designed for specific tasks within a particular industry or domain, demonstrate the increasing value of AI as a powerful tool that enhances human expertise and decision-making.

    DeepSeek has done something pretty clever and brought new thinking to the space, not only in their model design but also in their use of infrastructure – the latter forced on them by their lack of access to the latest Nvidia chips. We will see increasing use of small language models (50 – 100 billion parameters) for specific tasks, e.g. the rapid sentiment analysis of financial news.
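    To make that concrete, here is a minimal sketch of sentiment scoring over financial headlines using a compact, task-specific model via the Hugging Face transformers pipeline. The model choice (FinBERT, far smaller still than the 50 – 100 billion parameter range above) and the headlines are illustrative assumptions on my part, not anything DeepSeek has built.

```python
# Minimal sketch: scoring the sentiment of financial headlines with a
# small, task-specific model. The model name and headlines are
# illustrative assumptions, not anything DeepSeek has released.
from transformers import pipeline

# FinBERT is one example of a compact model fine-tuned for financial
# sentiment; any similar small model could be swapped in.
sentiment = pipeline("text-classification", model="ProsusAI/finbert")

headlines = [
    "Chipmaker shares slide after export restrictions tighten",
    "Bank beats earnings expectations and raises guidance",
]

for headline in headlines:
    result = sentiment(headline)[0]
    print(f"{result['label']:>8}  {result['score']:.2f}  {headline}")
```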

    The R1 model broke away from the generally accepted approach by combining two techniques – generation and retrieval. It’s a bit like using ChatGPT and a standard Google search at the same time. This addresses one of the obvious flaws of these big models – the aging of their training data. If a model was trained in 2023, it cannot answer a question about events in 2024 because they do not feature in its knowledge; the R1 model recognises the gap and ‘googles’ the answer (of course it doesn’t actually use Google).

    If I ask ChatGPT ‘what is DeepSeek?’, I get:
    “It looks like there might be some confusion around the term DeepSeek, as it doesn’t refer to a widely known or standardised product in the way that ChatGPT does”

    If I ask Gemini the same question, I get:
    “DeepSeek is a Chinese artificial intelligence company that has gained attention for developing efficient and powerful large language models (LLMs)”

    So you can see Google is integrating its LLM technology with its retrieval technology.
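    To make the generation-plus-retrieval pattern concrete, here is a minimal sketch in Python. The web_search helper, the OpenAI-style client and the model name are assumptions for illustration only; this shows the general retrieval-augmented pattern, not DeepSeek’s or Google’s actual implementation.

```python
# Minimal sketch of retrieval-augmented generation: fetch fresh documents
# first, then let the language model answer with them as context.
# `web_search` and the model choice are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def web_search(query: str) -> list[str]:
    """Placeholder retrieval step: in practice this would call a search
    API or query a vector index and return relevant, up-to-date snippets."""
    return [f"<snippet about: {query}>", f"<another snippet about: {query}>"]

def answer_with_retrieval(question: str) -> str:
    # 1. Retrieval: fetch fresh context the model's training data may lack.
    context = "\n".join(web_search(question))
    # 2. Generation: let the LLM answer with that context in the prompt.
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[
            {"role": "system",
             "content": "Answer using the provided context; it may be newer "
                        "than your training data."},
            {"role": "user",
             "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return response.choices[0].message.content

print(answer_with_retrieval("What is DeepSeek?"))
```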

    DeepSeek also used a technique called Mixture of Experts – a way of activating only the parts of the model best suited to a specific question. DeepSeek’s model has around 671 billion parameters (GPT-4 is reported to have around 1.8 trillion). Ask DeepSeek a question and it routes it to the most appropriate experts to answer.

    Google takes a slightly different approach. Asked how many parameters Gemini uses, it responds with:
    “So, while I’m a very large model, and it’s accurate to say I have a massive number of “parameters” in the broader sense, giving a specific number isn’t really possible or meaningful in this context. It’s more about the overall capacity and how that capacity is used dynamically.”

    The value of Mixture of Experts is that only a fraction of the model is active for any one query, so it is cheaper and faster to respond, making it more scalable.
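    As a rough illustration of how that routing works, here is a toy Mixture of Experts layer in Python (PyTorch): a small gating network scores all the experts but only the top two are run for each input, so most of the parameters sit idle on any given query. The sizes and the choice of two experts are illustrative assumptions, not DeepSeek’s actual architecture.

```python
# Toy Mixture of Experts layer: a gate scores all experts but only the
# top-k are run per input, so compute per query stays small even when
# the total parameter count is huge. Sizes and top_k=2 are illustrative
# assumptions, not DeepSeek's actual configuration.
import torch
import torch.nn as nn

class ToyMoE(nn.Module):
    def __init__(self, dim=64, num_experts=8, top_k=2):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.ReLU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        )
        self.gate = nn.Linear(dim, num_experts)  # routing network
        self.top_k = top_k

    def forward(self, x):  # x: (batch, dim)
        scores = self.gate(x)                           # (batch, num_experts)
        weights, indices = scores.topk(self.top_k, dim=-1)
        weights = weights.softmax(dim=-1)               # normalise the chosen experts' weights
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for i, expert in enumerate(self.experts):
                mask = indices[:, slot] == i            # inputs routed to expert i in this slot
                if mask.any():
                    out[mask] += weights[mask, slot:slot + 1] * expert(x[mask])
        return out

moe = ToyMoE()
print(moe(torch.randn(4, 64)).shape)  # torch.Size([4, 64]); only 2 of 8 experts ran per input
```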

    Why all of this came as a shock to the AI world, or rather the investing world, is a bit of a mystery. Actually I have a theory about that, but that’s for another time. In my view R1 is another step forward in realising the potential of AI, and it reinforces what we have been saying for a while – ‘AI’ is not just about LLMs but a combination of methods, where the best results come from a composite of techniques.
