It’s the end of the year at Radar! We hope all our readers have a happy holiday season. Here is our prediction for 2025:
Is this the end of the road for improving LLM performance by scaling up the number of parameters or the amount of training data? No one knows yet. Regardless of the answer, we expect interest to shift toward smaller models. We will grudgingly accept a 70B parameter model as “small,” but in practice we mean 20B parameters or fewer. These models will likely prove easier to work with for companies developing AI-enabled applications: they won’t cost as much to run, and they will be simpler to fine-tune for specialized applications. Few applications require a completely general language model.
artificial intelligence
- The OpenGPT-X project has released Teuken-7B, an open large language model. Importantly, this model supports 24 European languages and is designed to comply with European law. You can find it at Hugging Face.
- OLMo 2 is a newly released, fully open, small language model available in 7B and 13B sizes. Both versions are claimed to be the best performers in their size class.
- NVIDIA announced Fugatto, a new generative text-to-audio model that can create entirely new kinds of sounds. They position it as a tool for creators.
- Anthropic has announced the developer preview of its Model Context Protocol (MCP). MCP allows Claude Desktop to communicate securely with other resources. The MCP server limits the services exposed to Claude, filters Claude’s requests, and prevents data from being exposed across the Internet.
- OpenScholar is an open source language model designed to support scientific research. It is much more accurate than GPT-4o and more economical to run. It uses RAG to access a large database of open access scientific papers, which helps ensure that its citations are accurate.
- Meta has partnered with VSParticle to create new materials from AI-generated instructions. They are focusing on nanoporous materials that could be catalysts for breaking down CO2 into useful products.
- Perplexity has introduced in-app shopping. Users can search for something and then have Perplexity purchase it. This is the first widely available example of an AI agent changing the state of the physical world.
- Research has shown that generative AI models have distinctive styles, not unlike human writers. Stylistic analysis can identify which model generated a given text.
- Mistral has launched Pixtral Large, a 124B parameter multimodal model with benchmark performance on par with the latest versions of other frontier models.
- Mozilla’s Common Voice project collects speech samples in languages other than British and American English to help developers build speech-enabled applications using other languages and dialects. This project is open source.
- Mechanistic interpretability is an area of research that uses AI to investigate what is happening within each layer of a large language model. It provides a path toward AI interpretability: the ability to understand why an AI produces the output it produces, and to control that output.
- Google’s Pixel phones can detect fraud in real time by monitoring phone conversations. Processing takes place entirely on the phone. This feature is off by default and can be enabled on a per-call basis. Another new feature detects stalkerware, apps that collect data without the user’s consent or knowledge.
- The Common Corpus dataset for training large language models is now publicly available on Hugging Face. The dataset contains over 2T tokens from “permissively licensed” sources and documents the provenance of every source.
- OpenAI’s latest model, Orion, is an improvement over GPT-4. But is it a significant improvement? Apparently not. This may be the end of the road for improving LLMs by making them bigger. (And is Orion GPT-5?)
- FrontierMath is a new AI benchmark based on extremely difficult math problems. At this point, no language model scores higher than 2% (Gemini 1.5 Pro).
- Separating the instruments in a musical performance is difficult, but possible. Here is a masterpiece of AI-free signal processing that attempts to do just that. Can a performance be converted back into sheet music?
- Standard Intelligence has launched hertz-dev, a new model for real-time speech synthesis. It is trained purely on audio and can engage in unscripted conversations without using text.
- Microsoft’s Magentic-One is a general-purpose agent system capable of performing complex tasks. Magentic-One is open source for researchers and developers. Microsoft has also released AutoGenBench, an open source tool for evaluating the performance of agent systems.
- ChainForge is a new visual tool for prompt engineering. You can use it to test prompts against multiple models and evaluate response quality.
- AI was used to de-age Tom Hanks and Robin Wright in a new film, allowing the actors to play their characters across a 60-year time span.
- Anthropic has launched a new version of its smallest and fastest model, Claude 3.5 Haiku. The company claims that it outperforms Claude 3 Opus, its previous flagship model, on many benchmarks. Anthropic has also significantly increased the price of using Haiku.
- OpenAI has introduced predicted outputs. If the output of a prompt is mostly known in advance (for example, if you ask GPT to modify a file), you can upload the expected result along with the prompt and GPT will make the necessary changes. Predicted outputs reduce latency; apparently they do not reduce cost.
- Fortunately, AI Psychiatry has nothing to do with psychoanalyzing human patients. It is a forensic tool for postmortem analysis of AI failures that allows investigators to recover the exact model that was in use when the failure occurred.
- SmolLM2 is a new small language model designed to run on devices. It is available in 135M, 360M, and 1.7B parameter versions. Early reports suggest its performance is impressive.
- vLLM is a framework for serving LLMs. It works with most of the language models on Hugging Face. It claims not only to be simpler, but also to offer significant performance and cost benefits by using a key-value store to cache input tokens.
- AI Flame Graph gives developers a detailed view of what their models are doing. If you care about performance or energy usage, this product is revolutionary.
- Google’s Project Jarvis is reported to be the company’s answer to Anthropic’s computer use API. Jarvis takes over a browser (presumably Chrome) to perform tasks on your behalf.
- NotebookLM’s ability to create podcasts from documents is impressive. Can other models do the same? NotebookLlama is an open source project that uses Llama models to generate podcasts.
programming
- bpftune is a utility that uses observability data from BPF to continuously tune Linux system performance. It has “zero configurables” (no configuration) and low overhead, and it is smart enough to leave alone any settings that a system administrator has made. It doesn’t appear to use AI.
- Kyanos is a new open source network analysis tool based on eBPF. Because it has access to eBPF data, it can filter packets by process or service and provide accurate information about packet latency.
- VMware Fusion and VMware Workstation are now free to all users, including commercial users. Broadcom will continue to develop the products, but will stop providing troubleshooting support to users.
- OpenCoder is a suite of language models for code generation. It is completely open source, and the training data, data pipeline, training results, and training protocol are all available in addition to the code. The purpose is to encourage further experimentation and research into code generation.
- Mergiraf is a tool for resolving Git merge conflicts using its understanding of common programming languages (including Java, Rust, and Go) and file formats (including JSON, HTML, XML, and YAML). The authors claim that new languages can be added easily.
- A proposal has been published for Safe C++, a new version of C++ that incorporates memory safety features.
- DataChain is a Python library for working with structured data in the context of artificial intelligence. It is designed to build data pipelines and manipulate data at scale.
- No-code GitHub? GitHub Spark allows users to create small “micro apps,” or sparks, without writing any code. More important than no code is no deployment: sparks are deployed on GitHub’s infrastructure and accessed over the web.
- Using Git to back up the /etc directory on Linux is obvious once you think about it.
- Ractor is an actor framework for Rust, which means you can program in Rust as if it were Erlang. I’m impressed by the longest and most complicated “Hello, World” I’ve ever seen.
- Kubernetes is a platform for building platforms. And platforms must support both development and operations teams.
- GitHub Copilot can now use models other than GPT. In addition to various OpenAI models, users can choose Claude Sonnet or Gemini. Other new features include automatic code review, upgrade support for Java, multi-file editing, and a feature called Spark, similar to Claude’s Artifacts.
- Is your AI-generated code secure? No. It’s unlikely that you’ll stop using tools like Copilot and Cursor, but you should understand the issue. AI models were trained on publicly available code, and most publicly available code has vulnerabilities. Those vulnerabilities are reflected in the models’ output.
- Does Java need another build tool? Mill is waiting to take over. Mill claims to be 5-10x faster than Maven and 2-4x faster than Gradle.
- Amphion is an open source toolkit for creating all forms of audio, including music and speech.
security
robotics
- Grasso is an AI-powered trashbot: a mobile robot made from trash. It uses Llava-v1.6-mistral-7B to understand visual input from its camera and Mistral-7B for prompts and responses. (It does not understand or produce speech.)
- Meta has launched several new projects for touch perception, a critical element in building AI-powered robots that can interact with the real world. Digit 360 is a tactile digital fingertip, Sparsh is an encoder for tactile data, and Digit Plexus is a platform for building artificial hands.
- Tie two non-intelligent microrobots (bristlebots) together with a short, flexible tether, and they gain the ability to solve simple problems.
web
- Want to run Linux in your browser? You can. WebVM is a virtual machine that runs in the browser. Linux in the browser may not be very interesting in itself; it matters more as another example of what Wasm can do.
virtual reality
- Want to chat with Rosa Parks or Abraham Lincoln? Try ENGAGE XR, a tool that combines VR and generative AI. Whether this is actually history is an interesting question. The bus in Rosa Parks’ example looks more like a modern European bus than a 1950s American bus.
quantum computing
- Google DeepMind has developed AlphaQubit, an AI system that detects errors in quantum systems. Error correction has made tremendous progress in the past year, but it remains a major problem in quantum computing.