Sunday, January 26, 2025

LLMs, AI, Parameters, And Tokens -- Not Ready For Prime Time -- January 26, 2025

Locator: 44777AI.

Updates

January 26, 2025: Link here.

From the linked article:

SINGAPORE—A Chinese artificial-intelligence company has Silicon Valley marveling at how its programmers nearly matched American rivals despite using inferior chips.
AI models from DeepSeek, the Chinese company, have zoomed to the global top 10 in performance, according to a popular ranking, suggesting Washington’s export curbs are having difficulty blocking rapid advances in China.
On January 20, 2025, DeepSeek introduced R1, a specialized model designed for complex problem-solving.
“Deepseek R1 is one of the most amazing and impressive breakthroughs I’ve ever seen,” said Marc Andreessen, the Silicon Valley venture capitalist who has been advising President Trump, in an X post on Friday.
DeepSeek’s development was led by a Chinese hedge-fund manager, Liang Wenfeng, who has become the face of the country’s AI push. On January 20, 2025, Liang met China’s premier and discussed how homegrown companies could narrow the gap with the U.S.
Specialists said DeepSeek’s technology still trails that of OpenAI and Google. But it is a close rival despite using fewer and less-advanced chips, and in some cases skipping steps that U.S. developers considered essential.

Original Post

Seemingly out of nowhere, my Twitter feed is now filled with references to DeepSeek.

It may be time to buy the book LLM for Dummies, assuming there is such a book, that it's readable, and that it wasn't written by AI. Good luck, LOL.

Having said that, it may be time for folks to take all of this more seriously.

Start with DeepSeek at wiki, and then follow that rabbit hole through Altman, parameters and tokens.

DeepSeek, wiki

DeepSeek is a Chinese artificial intelligence company which develops open-source large language models. DeepSeek is solely funded by Chinese hedge fund High-Flyer, which was also founded by Liang Wenfeng, with both based in Hangzhou, Zhejiang.

So far, DeepSeek is focused solely on research and has no detailed plans for commercialization.

The code for the model was made open-source under the MIT license, with an additional license agreement regarding "open and responsible downstream usage" for the model itself.

In December 2024, DeepSeek launched DeepSeek-V3. It has 671 billion parameters and was trained in around 55 days at a cost of US$5.58 million, using significantly fewer resources than its peers. It was trained on a dataset of 14.8 trillion tokens.

Benchmark tests showed it outperformed Llama 3.1 and Qwen 2.5 while matching GPT-4o and Claude 3.5 Sonnet.

DeepSeek's optimization under limited resources highlighted the potential limits of U.S. sanctions on China's AI development.

An opinion piece by The Hill described the release as American AI reaching its Sputnik moment.
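
Those quoted figures invite a quick back-of-envelope check. A minimal Python sketch, using only the numbers above (US$5.58 million, 14.8 trillion tokens, roughly 55 days of training):

    # Back-of-envelope arithmetic from the quoted DeepSeek-V3 figures.
    cost_usd = 5.58e6   # reported training cost, US dollars
    tokens = 14.8e12    # reported training tokens
    days = 55           # reported training time

    print(f"cost per million tokens: ${cost_usd / (tokens / 1e6):.2f}")
    print(f"tokens processed per second: {tokens / (days * 86_400):.2e}")

If the reported figures hold, that is roughly 38 cents of compute per million training tokens, which is the kind of efficiency the coverage is marveling at.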

Meanwhile, other sources:

Parameters: GPT-4 is estimated to have 1.8 trillion parameters. If true, that greatly exceeds all other LLMs; the operative word is "greatly." Compare with DeepSeek-V3 above, at 671 billion parameters. Loosely, think of a "parameter" as one way for a machine to "consider" a token. Think of all the ways a "period" might be used in a document: to end a sentence; to end a phrase; to end an abbreviation; as part of an ellipsis; as a divider within a URL; sometimes used with abbreviations, sometimes not (think state abbreviations); as a dot in Morse code; the dot on top of an "i"; the dot on top of a semicolon; two dots making a colon; dots on top of letters (diacritics).
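
To make "parameter" concrete: in a neural network, a parameter is a single learned number, a weight. A toy Python sketch, with invented sizes that belong to no real model:

    import numpy as np

    # Hypothetical sizes, for illustration only.
    vocab_size, embed_dim = 50_000, 4_096

    embedding = np.random.randn(vocab_size, embed_dim)  # one weight per entry
    dense = np.random.randn(embed_dim, embed_dim)       # one dense layer

    total = embedding.size + dense.size
    print(f"{total:,} parameters in just two matrices")  # 221,577,216

A 671-billion-parameter model like DeepSeek-V3 is that idea repeated across many layers; GPT-4's rumored 1.8 trillion would be nearly three times that again.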

A "space" is even more difficult: what defines a "space." How can AI tell the difference between one space and two spaces -- why isn't it one long space? 

Tokens are the smallest units of text a model processes, such as the "space" between words or the "period" at the end of a sentence (more often, a word or a fragment of a word). GPT-4 was trained on roughly 13 trillion tokens, which is roughly 10 trillion words. The sketch below shows a tokenizer handling exactly these cases.
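
You can watch a tokenizer make these decisions with OpenAI's open-source tiktoken library. A sketch, assuming the cl100k_base encoding used by GPT-4; the exact splits vary by tokenizer:

    import tiktoken  # pip install tiktoken

    enc = tiktoken.get_encoding("cl100k_base")

    for text in ["The U.S. grew.", "one space", "one  space"]:
        ids = enc.encode(text)
        pieces = [enc.decode([i]) for i in ids]
        print(f"{text!r} -> {pieces}")

The same period can land in different tokens depending on whether it ends a sentence or an abbreviation, and "one space" and "one  space" (two spaces) encode differently, which is exactly the ambiguity described above.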

Then, think about this: LLMs train on existing documents, and the bulk of those documents are in English. If one connects the dots, it's just a matter of time before "English" becomes the one and only global language.

Sam Altman: The Atlantic ran an incredibly good article on AI within the last 24 months. I'll see if I can find the link.
