Despite having 31 days, December is a short month. Announcements and events other than office parties have a hard time attracting attention. Bucking this trend, OpenAI made a series of announcements called “12 Days of OpenAI.” Not to be outdone, Google responded with a slew of announcements of its own, including the Gemini 2.0 Flash Thinking model. Models have emerged that can use streaming audio and video for both input and output. But perhaps the most important announcement was DeepSeek-V3, a very large mixture-of-experts model (671B parameters) that performs on par with other top models at roughly 1/10 the training cost.
AI
- DeepSeek-V3 is another LLM to watch. Its performance is on par with Llama 3.1, GPT-4o, and Claude Sonnet. Training was not cheap, but its cost is estimated at roughly 10% of that of the larger models.
- Not to be outdone by Google, OpenAI previewed its next models, o3 and o3-mini. Both are “reasoning models” trained to solve logical problems. They may be released by the end of January. OpenAI is looking for safety and security researchers to test them.
- Not to be outdone by 12 Days of OpenAI, Google launched Gemini 2.0 Flash Thinking, a new experimental model trained to solve logical problems. Unlike OpenAI’s GPT models that support reasoning, Flash Thinking shows its chain of thought explicitly.
- Jeremy Howard and his team released ModernBERT, a significant upgrade to the BERT model released six years ago. It is available in two sizes: 139M and 395M parameters. It is well suited to search, classification, entity extraction, and other components of a data pipeline.
- AWS’s Bedrock service can check for hallucinations in the output of other models.
- Not to be outdone by 12 Days of OpenAI, Google announced Android XR, its operating system for extended reality headsets and glasses. Google has no plans to build its own hardware; it is partnering with Samsung, Qualcomm, and other manufacturers.
- Also not to be outdone by 12 Days of OpenAI, Anthropic announced Clio, a privacy-preserving approach to discovering how people use its models. The information will be used to improve Anthropic’s understanding of safety issues and to build more useful models.
- Not to be outdone by 12 Days of OpenAI, Google announced Gemini 2.0 Flash, a multimodal model that supports streaming for both input and output. The announcement also introduced Astra, an AI agent for smartphones. Neither is generally available yet.
- OpenAI has launched Canvas, a new feature that combines programming and writing. Any changes you make to the canvas (code or text) immediately become part of the context. Python code runs in the browser using Pyodide (Wasm), rather than in a container as it does with Code Interpreter.
- Stripe has announced an agent toolkit that lets you build payments functionality into your agent workflow. Stripe recommends using the toolkit in test mode until your application has been fully verified.
- Simon Willison shows how to run a GPT-4 class model (Llama 3.3 70B) on a reasonably well-equipped laptop (64GB MacBook Pro M2).
- As part of the 12 Days of OpenAI series, OpenAI has finally released Sora, its video generation model. It’s free for ChatGPT Plus subscribers, but limited to 50 five-second video clips per month. A ChatGPT Pro account relaxes many of the restrictions.
- Researchers have shown that advanced AI models, including Claude 3 Opus and OpenAI o1, can “scheme” against users’ interests to achieve their goals. Scheming includes subverting oversight mechanisms, intentionally delivering subpar results, and even taking steps to prevent shutdown or replacement. Hello, HAL?
- Roaming RAG is a new technique for retrieval-augmented generation that finds relevant content by searching headings, navigating documents the way a human would. It requires well-structured documents. It’s an incredibly simple idea.
- Google has announced PaliGemma 2, a new version of the Gemma model that incorporates vision.
- o1-preview is no more. The preview has become the real thing, OpenAI o1. In addition to advanced reasoning capabilities, the production release claims to deliver faster and more consistent results.
- A group of AI agents in Minecraft behaved surprisingly like humans. They even developed professions and religions. Is this a way to model how groups of humans work together?
- One thing the AI industry desperately needs (aside from more performance) is better benchmarks. Current benchmarks are closed, easily gamed (which is what AI does), unreproducible, and may not test anything meaningful. BetterBench is a framework for assessing benchmark quality.
- Writer’s new language model, Palmyra Creative, promises the ability to develop a “style” so that AI-generated output doesn’t all sound boringly the same.
- During training, AI picks up biases in human data. When humans interact with AI, there is a feedback loop that amplifies these biases.
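The Roaming RAG item above describes navigating a well-organized document by its titles rather than by embedding search. The following is a minimal sketch of that idea, not the project's actual implementation: it assumes Markdown-style `#` headings, and the function names `build_outline`, `outline_text`, and `open_section` are hypothetical. In a real system, the outline would be shown to the model, which would then ask to open a section by title.

```python
import re

def build_outline(markdown_text):
    """Split a document into sections, one per Markdown heading (assumed format)."""
    sections = []
    current = None
    for line in markdown_text.splitlines():
        match = re.match(r"^(#{1,6})\s+(.*)$", line)
        if match:
            # A heading starts a new section; its depth is the number of '#'.
            current = {"level": len(match.group(1)),
                       "title": match.group(2).strip(),
                       "body": []}
            sections.append(current)
        elif current is not None:
            current["body"].append(line)
    return sections

def outline_text(sections):
    """Render only the titles, indented by depth, as the compact view to hand to a model."""
    return "\n".join("  " * (s["level"] - 1) + s["title"] for s in sections)

def open_section(sections, title):
    """Return the body of the section with the given title, or None if there is no match."""
    for s in sections:
        if s["title"].lower() == title.lower():
            return "\n".join(s["body"]).strip()
    return None

doc = "# Guide\nIntro.\n## Install\nRun the installer.\n## Usage\nCall run()."
sections = build_outline(doc)
print(outline_text(sections))
print(open_section(sections, "install"))
```

The outline is far smaller than the full document, which is why the approach depends on headings that describe their sections accurately.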
Programming
- Unicon may never be one of the top 20 (or even top 100) programming languages, but it is a descendant of Icon, which has always been my favorite language for string processing.
- If bots equipped with LLMs can successfully complete tasks set for humans, what do CAPTCHAs mean?
- egui is a GUI library for Rust; its companion, eframe, is a framework. It’s portable and runs natively (macOS, Windows, Linux, and Android), on the web (using Wasm), and in a variety of game engines.
- For the archivist in us: The Manx Project is not about cats or islands in the Irish Sea. It is a catalog of manuals for old computers.
- Cerbrec is a graphical Python framework for deep learning. It is aimed at Python programmers who do not have sufficient expertise to build applications using PyTorch or other AI libraries.
- GitHub announced that GitHub Copilot is now available for free to all current and new users. With free access, you get 2,000 code completions and 50 chat messages per month. GitHub also added the ability to use Claude 3.5 Sonnet in addition to GPT-4o.
- Devin, an AI-assisted coding tool that claims to support software development from start to finish, including design and debugging, has reached general availability.
- JSON5, also known as “JSON for humans,” is a variant of JSON designed to be easy for humans to read, write, and maintain by hand, for example in configuration files.
- AWS announced two important new services: Aurora DSQL, a distributed SQL database, and S3 Tables, which supports data lakehouses with Apache Iceberg.
- AutoFlow is an open source tool for creating knowledge graphs. It is based on TiDB (Vector Database), LlamaIndex, and DSPy.
Security
- Portspoof is a security tool that makes it appear that all 65,535 TCP ports are open and running valid services. It emulates a valid service on every port. This makes it difficult for an attacker to determine which ports are actually open without probing each one.
- Let’s Encrypt, which issues certificates that websites (and other applications) use to prove their identity, has announced short-lived certificates that expire in six days. Short-lived certificates increase security by minimizing exposure if your private key is compromised.
- Due to the continued presence of attackers within communication networks, the FBI and CISA have recommended the use of encrypted communication protocols. (They still want a backdoor into the encryption system, which would make it vulnerable to attacks.)
- A new phishing attack uses corrupted Word documents to bypass security checks. When a document is corrupted, Word offers to recover it, which is what the attack relies on.
- LLM Flowbreaking is a new class of attacks against language models that keep guardrails from stopping objectionable output before it reaches the user. These attacks exploit race conditions in the user’s interaction with the application.
- Bootkitty is a UEFI bootkit that targets secure boot on Ubuntu systems. It appears to have been developed by cybersecurity students in South Korea and then leaked (perhaps accidentally). It has not yet been seen in the wild, but it would be a dangerous threat if it were.
- DEF CON has launched a project to improve cybersecurity for America’s water infrastructure. They are starting with six water companies serving rural communities.
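To make the Let’s Encrypt item above concrete: if a private key leaks, the certificate can be abused at most until it expires, so the lifetime bounds the exposure. The arithmetic below is a rough sketch comparing a six-day certificate with Let’s Encrypt’s usual 90-day certificates. The helper names are hypothetical, revocation is ignored, and renewing after two-thirds of the lifetime is an assumption made for illustration, not a stated Let’s Encrypt policy.

```python
from datetime import datetime, timedelta, timezone

def worst_case_exposure(lifetime_days):
    """Longest time a stolen key stays useful if it leaks right after issuance."""
    return timedelta(days=lifetime_days)

def renewal_time(issued_at, lifetime_days):
    """When to renew, assuming (for illustration) renewal at 2/3 of the lifetime."""
    return issued_at + timedelta(days=lifetime_days) * 2 // 3

issued = datetime(2025, 1, 1, tzinfo=timezone.utc)
for days in (90, 6):
    print(days, "day certificate:",
          "exposure up to", worst_case_exposure(days),
          "| renew at", renewal_time(issued, days).isoformat())
```

The tradeoff is operational: a six-day lifetime only works if renewal is fully automated.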
Quantum Computing
- Google has built a quantum computing chip whose error-corrected logical qubits can remain stable for an hour. It operates “below threshold”: that is, the error rate decreases as physical qubits are added for error correction. The chips were made in Google’s new manufacturing facility.
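The “below threshold” behavior can be illustrated with the standard textbook scaling for surface codes, in which the logical error rate goes roughly as (p / p_th) raised to the power (d + 1) / 2, where p is the physical error rate, p_th is the threshold, and d is the code distance. The sketch below uses that heuristic with made-up numbers. The prefactor and error rates are illustrative assumptions, not Google’s measurements, and the function names are hypothetical.

```python
def physical_qubits(distance):
    """Qubits in a distance-d rotated surface code: d*d data plus d*d - 1 measurement."""
    return 2 * distance * distance - 1

def logical_error_rate(physical_error, threshold, distance, prefactor=0.1):
    """Heuristic scaling: prefactor * (p / p_th) ** ((d + 1) / 2). Illustrative only."""
    return prefactor * (physical_error / threshold) ** ((distance + 1) / 2)

# Below threshold (p < p_th): adding qubits (larger d) suppresses logical errors.
# Above threshold (p > p_th): adding qubits makes things worse.
for label, p in (("below", 0.001), ("above", 0.02)):
    for d in (3, 5, 7):
        rate = logical_error_rate(p, threshold=0.01, distance=d)
        print(label, "threshold, d =", d, "qubits =", physical_qubits(d), "rate =", rate)
```

The point of the comparison is the direction of the trend: the same increase in qubit count helps on one side of the threshold and hurts on the other.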
Web
- Google added “Store Reviews” to Chrome. The reviews are AI-generated summaries of reports from well-known sources that cover fraud and other issues.
- Here’s how to build a streaming text user interface on the web. Streaming text is almost essential for building AI-powered chatbots.
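One common way to stream text to a browser is Server-Sent Events, where the server writes `data: ...` lines separated by blank lines and the client appends each chunk as it arrives. The sketch below shows that framing and a client-side accumulation loop. It is not taken from the linked article; the function names are hypothetical, and the `[DONE]` end marker is an assumed convention rather than part of the SSE format.

```python
import json

def to_sse(chunks):
    """Frame each text chunk as a Server-Sent Events message."""
    for chunk in chunks:
        # JSON-encode the chunk so embedded newlines cannot break the framing.
        yield f"data: {json.dumps(chunk)}\n\n"
    yield "data: [DONE]\n\n"  # assumed end-of-stream marker

def render(events):
    """Simulate the client: append chunks to the displayed text until the stream ends."""
    shown = ""
    for event in events:
        payload = event[len("data: "):].strip()
        if payload == "[DONE]":
            break
        shown += json.loads(payload)
    return shown

print(render(to_sse(["Hel", "lo ", "world"])))
```

In a real application the generator would be returned from an HTTP handler with the `text/event-stream` content type, and the browser would update the page after each event instead of building a single string.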
Biology
- Yes, we can have virtual tastes. A research team has developed a lollipop interface that lets people experience flavors in a virtual world.