The past weeks have been heavily tilted towards artificial intelligence (AI) news. Before I cover some of it, a reminder that generative AI (gAI) is not the same as General AI (G-AI). The former generates content and can make some limited inferences from its training data; the latter is an AI system that can perform as well as a human across multiple subject areas.
- I recently watched a YouTube video that highlighted the flaws across a range of Large Language Models (LLMs). To see this in action, search for "Climate Misinformation From AI" from Tony Heller and watch it. The content isn't as important as the different results provided by the range of LLMs used in the example. Each one provided a different result, and all were incorrect when compared with actual data sets on the subject. It's important to understand that LLMs are typically trained on what is available on the internet and what has been said or posted there. The example above is an excellent case, because the presenter ran the same query against actual data and posted the result. Not long afterwards, ChatGPT returned the new, correct information, where its earlier answer had been incorrect. The lesson is that any result from an LLM should be questioned, because the actual underlying data may not be part of the LLM's current training set.
- Keeping that in mind, the first shake-up of the market was driven by the announcement that DeepSeek from China was better than other LLMs, cost much less to train and ran on a smaller hardware platform than others like ChatGPT. There have already been questions around these claims and some have scoffed at them. This didn't stop the stock market, and in particular Nvidia stock, from dropping. To be fair, the cost of using DeepSeek was a fraction of that charged by others. Before the dust had even settled, Alibaba, with a much better distribution and hardware platform, announced its new AI model Qwen 2.5-Max, which it claims has outperformed its competitors in certain benchmark tests, including Arena-Hard and LiveBench. Alibaba also dramatically dropped its pricing to match DeepSeek. OpenAI claims its GPT-4o and Meta's Llama-3.1-405B both beat Qwen 2.5-Max in certain benchmark tests.
- Competition is a good thing. China and the US seem to be in an AI war. In the background, some commentators are saying that generative AI results are improving, while others claim that G-AI is just around the corner. I would remind readers that fusion energy was just 10 years away 50 years ago. The story of AI is an evolving one, but many managers don't seem to understand the current status and limitations, nor just how long it takes to train a model to a good level of performance. You will of course continue to hear the term AI being dropped everywhere this year.
- About two hours after I wrote the previous paragraph, I learned that UI-TARS (User-Interface Task Automation and Reasoning System), with two open-sourced versions, 7B and 72B, had been released for PC, Mac and mobile devices. It comes from ByteDance, the people behind TikTok, and is described as a task-oriented AI that is claimed to be faster than GPT-4o and Claude. Given the source and the observation that it can take over your device and perform tasks for you, it's a little worrying, but the open-source community may be able to allay those concerns over time. The software can "see" what's on your screen and react to requests. It can interpret images, navigate web pages and install a programming extension. Some readers may remember a while back, when I wrote about these capabilities being missing but coming in the future. If the marketing is correct, then UI-TARS can do this and sets a new benchmark for the next AI release cycle. The training for this model involved a large number of annotated screens. I find it interesting that three new AIs have appeared out of China in the past two weeks.
- Today's lesson is the Microsoft definition of "deprecation": the end of active development, which is what Windows 10 faces in the near future. This is different from "end of support", which means no more updates, so from that point on any bugs and security issues are yours to deal with. Deprecation, then, is Microsoft's "save the date" notification of an impending end of service. For Windows 10, end of support arrives on Oct 14 this year, so most of us have until then to upgrade to Windows 11. The problem with Windows 11 is the hardware requirements I've covered in the past. Another issue is that many of the new on-by-default features of Windows 11 can already be found in Windows 10. There are still Wi-Fi issues for some Windows 11 users, and the latest OS doesn't have any really enticing features to encourage a move from Windows 10 and earlier versions.
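For readers planning an upgrade, a quick way to keep an eye on the deadline is a few lines of date arithmetic. This is a minimal sketch, assuming only the Oct 14, 2025 end-of-support date mentioned above; the function name is my own invention for illustration.

```python
from datetime import date

# Windows 10 end-of-support date as given in the column (Oct 14 this year, i.e. 2025)
END_OF_SUPPORT = date(2025, 10, 14)

def days_remaining(today: date) -> int:
    """Days left before Windows 10 end of support; negative once the date has passed."""
    return (END_OF_SUPPORT - today).days

if __name__ == "__main__":
    print(f"Days until Windows 10 end of support: {days_remaining(date.today())}")
```

Swap in `date.today()` for a fixed date to get a live countdown; once the number goes negative, security updates have stopped.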
James Hein is an IT professional with over 30 years' standing. You can contact him at jclhein@gmail.com.