$ timeahead_
★ TOP STORY[ OAI ]Model·2d ago

GPT-5.5 System Card

GPT‑5.5 is a new model designed for complex, real-world work, including writing code, researching online, analyzing information, creating documents and spreadsheets, and moving across tools to get things done. Relative to earlier models, GPT‑5.5 understands the task earlier, asks for less guidance, uses tools more effectively, checks its work, and keeps going until it’s done. We subjected the model to our full suite of predeployment safety evaluations and our Preparedness Framework, including targeted red-teaming for advanced cybersecurity and biology capabilities, and collected feedback on real use cases from nearly 200 early-access partners before release. We are releasing GPT‑5.5 with our strongest set of safeguards to date, designed to reduce misuse while preserving legitimate, beneficial uses of advanced capabilities. We generally treat GPT‑5.5’s safety results as strong proxies for GPT‑5.5 Pro, which is the same underlying model using a setting that…

OpenAI Blogread →
▲ trending · last 48hview all →
[ANT]Anthropic News· 5 articlesvisit →
9d ago
Introducing Claude Opus 4.7 Product Apr 16, 2026 Our latest Opus model brings stronger performance across coding, agents, vision, and multi-step tasks, with greater thoroughness and consistency on the work that matters most.
Introducing Claude Opus 4.7 Our latest model, Claude Opus 4.7, is now generally available. Opus 4.7 is a notable improvement on Opus 4.6 in advanced software engineering, with particular gains on the most difficult tasks. Users report being able to hand off their hardest coding work—the kind that previously needed close supervision—to Opus 4.7 with confidence. Opus 4.7 handles complex, long-running tasks with rigor and consistency, pays precise attention to instructions, and devises ways to verify its own outputs before reporting back. The model also has substantially better vision: it can see images at higher resolution. It’s more tasteful and creative when completing professional tasks, producing higher-quality interfaces, slides, and docs. And—although it is less broadly capable than our most powerful model, Claude Mythos Preview—it shows better results than Opus 4.6 across a range of benchmarks: Last week we announced…
9d ago
Product Apr 17, 2026 Introducing Claude Design by Anthropic Labs Today, we’re launching Claude Design, a new Anthropic Labs product that lets you collaborate with Claude to create polished visual work like designs, prototypes, slides, one-pagers, and more.
Introducing Claude Design by Anthropic Labs Today, we’re launching Claude Design, a new Anthropic Labs product that lets you collaborate with Claude to create polished visual work like designs, prototypes, slides, one-pagers, and more. Claude Design is powered by our most capable vision model, Claude Opus 4.7, and is available in research preview for Claude Pro, Max, Team, and Enterprise subscribers. We’re rolling out to users gradually throughout the day. Design with Claude Even experienced designers have to ration exploration—there's rarely time to prototype a dozen directions, so you limit yourself to a few. And for founders, product managers, and marketers with an idea but not a design background, creating and sharing those ideas can be daunting. Claude Design gives designers room to explore widely and everyone else a way to produce visual work. Describe what you need and Claude…
9dModel#claude
44d ago
Mar 12, 2026 Announcements Anthropic invests $100 million into the Claude Partner Network
Anthropic invests $100 million into the Claude Partner Network We’re launching the Claude Partner Network, a program for partner organizations helping enterprises adopt Claude. We’re committing an initial $100 million to support our partners with training courses, dedicated technical support, and joint market development. Partners who join from today will get immediate access to a new technical certification and be eligible for investment. Anthropic is focused on ensuring that our AI model, Claude, serves the needs of businesses. To do this, we’ve partnered with a number of other companies. Notably, Claude is the only frontier AI model available on all three leading cloud providers: AWS, Google Cloud, and Microsoft. We also work with large management consultancies, professional services firms, specialist AI firms, and similar agencies. These organizations help our enterprise customers identify where Claude can provide the most value to…
44dModel#claude
51d ago
Mar 5, 2026 Announcements Where things stand with the Department of War
Where things stand with the Department of War A statement from Dario Amodei Yesterday (March 4) Anthropic received a letter from the Department of War confirming that we have been designated as a supply chain risk to America’s national security. As we wrote on Friday, we do not believe this action is legally sound, and we see no choice but to challenge it in court. The language used by the Department of War in the letter (even supposing it was legally sound) matches our statement on Friday that the vast majority of our customers are unaffected by a supply chain risk designation. With respect to our customers, it plainly applies only to the use of Claude by customers as a direct part of contracts with the Department of War, not all use of Claude by customers who have such contracts.…
51dModel#claude
80d ago
Announcements Feb 4, 2026 Claude is a space to think We’ve made a choice: Claude will remain ad-free. We explain why advertising incentives are incompatible with a genuinely helpful AI assistant, and how we plan to expand access without compromising user trust.
Claude is a space to think There are many good places for advertising. A conversation with Claude is not one of them. Advertising drives competition, helps people discover new products, and allows services like email and social media to be offered for free. We’ve run our own ad campaigns, and our AI models have, in turn, helped many of our customers in the advertising industry. But including ads in conversations with Claude would be incompatible with what we want Claude to be: a genuinely helpful assistant for work and for deep thinking. We want Claude to act unambiguously in our users’ interests. So we’ve made a choice: Claude will remain ad-free. Our users won’t see “sponsored” links adjacent to their conversations with Claude; nor will Claude’s responses be influenced by advertisers or include third-party product placements our users did not…
80dModel#claude
[ATA]Ars Technica AI· 8 articlesvisit →
3d ago
Anthropic tested removing Claude Code from the Pro plan
Anthropic caused a stir among developers with what appeared to be a surprise change to its pricing plan: The company signaled that Claude Code, the popular agentic development tool, would no longer be available to subscribers on the $20-per-month Pro plan. Users took to Reddit and X to point out that Anthropic’s pricing page for Claude explicitly showed Claude Code as not supported in the Pro plan. (It remained in the $100/month+ Max plan.) Some new users signing up for Pro subscriptions were unable to access Claude Code. Meanwhile, existing subscribers saw no interruption. After speculation and frustration spread, Anthropic’s head of growth, Amol Avasare, took to social media to clarify that this was a “small test on ~2% of new prosumer signups.” As for the reasoning, he explained: When we launched Max a year ago, it didn’t include Claude…
3dModel#claude#codingby Samuel Axon
4d ago
Report: Meta will train AI agents by tracking employees' mouse, keyboard use
Meta will begin tracking the mouse movements, clicks, and keystrokes of its US employees to generate high-quality training data for future AI agents, Reuters reports. The news organization cites internal memos posted by the Meta Superintelligence Labs team in reporting on the new Model Capability Initiative employee-tracking software. That software will operate on specific work-related apps and websites and also make use of periodic screenshots to provide context for the AI training, according to the memo. “This is where all Meta employees can help our models get better simply by doing their daily work,” the memo reads, in part, Reuters reports. Meta spokesperson Andy Stone told Reuters that the collected training data will help Meta’s AI agents with tasks that they sometimes struggle with, including “things like mouse movements, clicking buttons, and navigating dropdown menus.” “If we’re building agents to…
4dModel#trainingby Kyle Orland
9d ago
OpenAI starts offering a biology-tuned LLM
On Thursday, OpenAI announced it had developed a large language model specifically trained on common biology workflows. Called GPT-Rosalind after Rosalind Franklin, the model appears to differ from most science-focused models from major tech companies, which have generally taken a more generic approach that works for various fields. In a press briefing, Yunyun Wang, OpenAI’s Life Sciences Product Lead, said the system was designed to tackle two major roadblocks faced by current biology researchers. One is the massive datasets created by decades of genome sequencing and protein biochemistry, which can be too much for any one researcher to take in. The second is that biology has many highly specialized subfields, each with its own techniques and jargon. So, for example, a geneticist who finds themselves working on a gene that’s active in brain cells might struggle to understand the immense…
9dModel#agentsby John Timmer
9d ago
Gemini can now create personalized AI images by digging around in Google Photos
Google began rolling out “personal intelligence” in Gemini early this year, giving AI subscribers the option of a more customized experience when using the company’s chatbot. Today, it’s using personal intelligence to tie its image-generation model to Google Photos. If you opt in, generated images will have access to your photos and associated labels to simplify prompts and produce more accurate AI images. This change essentially streamlines an existing workflow. Google’s Nano Banana 2 is among the best AI image generators available, and it was already possible to feed it images of yourself or others to use as context for creating new AI content. Adding personal intelligence to the mix makes that process smoother by turning the image bot loose on the content of your photos, if indeed that’s something you want to do. It is generally true that adding…
9dModel#gemini#multimodalby Ryan Whitwam
10d ago
Boston Dynamics’ robot dog now reads gauges and thermometers with Google's AI
Robots such as Boston Dynamics’ four-legged Spot can now accurately read analog thermometers and pressure gauges while roaming around factories and warehouses. Those improvements come courtesy of Google DeepMind’s newest robotic AI model that aims to enhance robotic capabilities for ‘embodied reasoning’ when interacting with physical environments. The new Gemini Robotics-ER 1.6 model announced on April 14 performs as a “high-level reasoning model for a robot” that can plan and execute tasks, according to Google DeepMind. This model also unlocks the capability of accurately reading instruments such as complex gauges and doing visual inspections using sight glasses that provide a transparent window to peek inside tanks and pipes—a performance upgrade that came about through Google DeepMind’s ongoing collaboration with robotics company Boston Dynamics. Boston Dynamics has a keen interest in testing both quadruped and humanoid robotic workers in a wide…
10dModel#geminiby Jeremy Hsu
10d ago
Adobe takes Creative Cloud into Claude Code-esque territory
Adobe has been putting task-specific AI tools and features into its creative productivity applications like Photoshop, Illustrator, and Premiere at a breakneck pace, but the latest product from the company—a chat-based interface that can handle complex, multi-modal projects across several applications—marks a significant shift in how users can think about its suite of tools. You could imprecisely but defensibly call it a sort of “Claude Code for creative apps.” On one hand, it’s meant to provide experienced creatives with an efficient way to offload mundane tasks across multiple apps. On the other, it’s meant to reduce the “barrier to entry” for inexperienced or casual users, in the wake of tool complexity that the company says has previously “widened the gap between idea and output.” Adobe has offered chat-based prompts within individual apps before and in other Firefly interfaces. It has…
10dModel#claudeby Samuel Axon
11d ago
UK gov's Mythos AI tests help separate cybersecurity threat from hype
Last week, Anthropic announced it was restricting the initial release of its Mythos Preview model to “a limited group of critical industry partners,” giving them time to prepare for a model that it said is “strikingly capable at computer security tasks.” Now, the UK government’s AI Security Institute (AISI) has published an initial evaluation of the model’s cyberattack capabilities that adds some independent public verification to those Anthropic reports. AISI’s findings show that Mythos isn’t significantly different from other recent frontier models in tests of individual cybersecurity-related tasks. But Mythos could set itself apart from previous models through its ability to effectively chain these tasks into the multistep series of attacks necessary to fully infiltrate some systems. “The Last Ones” finally falls AISI has been putting various AI models through specially designed Capture the Flag challenges since early 2023, when…
11dModelby Kyle Orland
11d ago
Google introduces "Skills" in Chrome to make Gemini prompts instantly reusable
Chrome is the most popular browser in the world, and the competition is not even close. So the browser is a key part of Google’s efforts to get everyone using its AI tools. The company’s chatbot has already infused various parts of the Chrome UI, and you can even turn Gemini loose to control the browser. The latest AI addition to Chrome comes in the form of “Skills,” reusable prompts you can access while browsing with a single click. Skills don’t so much add new functionality as they make it easier to repeat tasks that were already possible with Gemini in Chrome. Previously, you would have to reenter the prompt each time you wanted Gemini to do something in Chrome; whether that meant typing it or copy-pasting from a saved document, you had to do it manually. Saving those favorite…
11dModel#geminiby Ryan Whitwam
[AWS]AWS Machine Learning Blog· 2 articlesvisit →
4d ago
From developer desks to the whole organization: Running Claude Cowork in Amazon Bedrock
Artificial Intelligence From developer desks to the whole organization: Running Claude Cowork in Amazon Bedrock Today, we’re excited to announce Claude Cowork in Amazon Bedrock. You can now run Cowork and Claude Code Desktop through Amazon Bedrock, directly or using an LLM gateway. From startups to global enterprises across every industry, organizations build with Claude Code in Amazon Bedrock to boost developer productivity and accelerate delivery. With Amazon Bedrock you can build within your existing AWS environment, maintain enterprise security and regional data residency, and scale inference. Your data stays under your account’s controls: Amazon Bedrock does not store prompts, files, tool inputs and outputs, or model responses, and does not use them to train foundation models. With Claude Cowork in Amazon Bedrock, you can expand AI adoption to every knowledge worker in your organization, with a desktop application that…
4dModel#claude#codingby Sofian Hamiti
9d ago
Cost-efficient custom text-to-SQL using Amazon Nova Micro and Amazon Bedrock on-demand inference
Artificial Intelligence Cost-efficient custom text-to-SQL using Amazon Nova Micro and Amazon Bedrock on-demand inference Text-to-SQL generation remains a persistent challenge in enterprise AI applications, particularly when working with custom SQL dialects or domain-specific database schemas. While foundation models (FMs) demonstrate strong performance on standard SQL, achieving production-grade accuracy for specialized dialects requires fine-tuning. However, fine-tuning introduces an operational trade-off: hosting custom models on persistent infrastructure incurs continuous costs, even during periods of zero utilization. The on-demand inference of Amazon Bedrock with fine-tuned Amazon Nova Micro models offers an alternative. By combining the efficiency of LoRA (Low-Rank Adaptation) fine-tuning with serverless and pay-per-token inference, organizations can achieve custom text-to-SQL capabilities without the overhead cost incurred by persistent model hosting. Despite the additional inference time overhead of applying LoRA adapters, testing demonstrated latency suitable for interactive text-to-SQL applications, with costs scaling by…
9dModel#fine-tuning#inferenceby Zeek Granston
[FAB]Fireworks AI Blog· 1 articlesvisit →
28d ago
3/28/2026 The Fine-Tuning Bottleneck Isn't the Algorithm
TL;DR: Integration friction and slow iteration cycles are the bottlenecks that actually stall fine-tuning — not the algorithm. We share the patterns we see across engagements, how teams like Cursor and Genspark broke through them, and where the workflow is heading: toward fully agentic fine-tuning loops that close themselves. Most teams that come to us for fine-tuning are not struggling with the training algorithm. They are struggling with everything around it: getting reward functions to talk to internal APIs without leaking data, waiting days between experiments because each step lives in a different tool, and figuring out whether the problem even calls for SFT, RFT, or DPO. Over the past year, working with a select group of the most innovative startups, digital natives, and Fortune 500 companies, we have seen these patterns repeat across every engagement. Every team that comes…
[GDM]Google DeepMind Blog· 4 articlesvisit →
11d ago
Gemini Robotics-ER 1.6 enhances reasoning to help robots navigate real-world tasks
For robots to be truly helpful, they need to understand the physical world like we do. That’s why today we're introducing Gemini Robotics-ER 1.6, an upgrade to our reasoning-first model that enables robots to understand their environments with unprecedented precision. By enhancing spatial logic and multi-view understanding, we’re bringing a new level of autonomy to the next generation of physical agents. This model specializes in capabilities critical for robotics, including visual and spatial understanding, task planning and success detection. We’re also helping robots with instrument reading, a new capability to enable robots to read complex gauges and sight glasses — a capability discovered through collaboration with Boston Dynamics. Gemini Robotics-ER 1.6 is our safest robotics model to date, demonstrating superior compliance with safety policies on adversarial spatial reasoning tasks. Starting today, Gemini Robotics-ER 1.6 is available to developers via the…
11dModel#gemini
51d ago
The latest AI news we announced in February
The latest AI news we announced in February For more than 20 years, we’ve invested in machine learning and AI research, tools and infrastructure to build products that make everyday life better for more people. Teams across Google are working on ways to unlock AI’s benefits in fields as wide-ranging as healthcare, crisis response and education. To keep you posted on our progress, we're doing a regular roundup of Google's most recent AI news. Here’s a look back at some of our AI announcements from February. For us, February was about global impact. At the AI Impact Summit in India, we demonstrated how our ongoing breakthroughs in AI are now solving real-world challenges for people everywhere — and we launched new partnerships and investments to make sure everyone benefits. We see AI as an enabling technology that can help people…
51dModel#geminiby Keyword Team
117d ago
The latest AI news we announced in December
The latest AI news we announced in December For more than 20 years, we’ve invested in machine learning and AI research, tools and infrastructure to build products that make everyday life better for more people. Teams across Google are working on ways to unlock AI’s benefits in fields as wide-ranging as healthcare, crisis response and education. To keep you posted on our progress, we're doing a regular roundup of Google's most recent AI news. Here’s a look back at some of our AI announcements from December. December is usually a time for reflection, and looking ahead. That’s why this month we’ve been focused on taking frontier intelligence out of the lab and putting it into your hands in ways that actually matter for your day-to-day. Whether it’s the lightning speed of Gemini 3 Flash helping you tackle tasks in seconds,…
117dModel#geminiby Keyword Team
129d ago
Gemini 3 Flash: frontier intelligence built for speed
Gemini 3 Flash: frontier intelligence built for speed Today, we're expanding the Gemini 3 model family with the release of Gemini 3 Flash, which offers frontier intelligence built for speed at a fraction of the cost. With this release, we’re making Gemini 3’s next-generation intelligence accessible to everyone across Google products. Last month, we kicked off Gemini 3 with Gemini 3 Pro and Gemini 3 Deep Think mode, and the response has been incredible. Since launch day, we have been processing over 1T tokens per day on our API. We’ve seen you use Gemini 3 to vibe code simulations to learn about complex topics, build and design interactive games and understand all types of multimodal content. With Gemini 3, we introduced frontier performance across complex reasoning, multimodal and vision understanding and agentic and vibe coding tasks. Gemini 3 Flash retains…
129dModel#geminiby Tulsee Doshi
[HF]Hugging Face Blog· 8 articlesvisit →
24d ago
Falcon Perception
Falcon Perception TL;DR — Falcon Perception is a 0.6B-parameter early-fusion Transformer for open-vocabulary grounding and segmentation from natural language prompts. The model processes image patches + text in one sequence using a hybrid attention mask, and produces variable numbers of instances with a small, structured token interface and lightweight output heads. On SA-Co, Falcon Perception reaches 68.0 Macro-F1 (vs. 62.3 for SAM 3) with the main remaining gap being presence calibration (MCC 0.64 vs. 0.82). We also introduce PBench, a diagnostic benchmark that breaks down performance by capability (attributes, OCR-guided disambiguation, spatial constraints, relations) and by dense long-context crowded scenes. We also release Falcon OCR, a 0.3B-parameter model which reaches scores of 80.3 and 88.6 on the olmOCR benchmark and OmniDocBench, respectively, while having the highest throughput of any open source OCR model. This post is a brief, practical…
24dModel
64d ago
GGML and llama.cpp join HF to ensure the long-term progress of Local AI
GGML and llama.cpp join HF to ensure the long-term progress of Local AI Georgi Gerganov and team are joining HF with the goal of scaling and supporting the community behind ggml and llama.cpp as Local AI continues to make exponential progress in the coming years. We've been working with Georgi and team for quite some time (we even have awesome core contributors to llama.cpp like Son and Alek in the team already) so this has been a very natural process. llama.cpp is the fundamental building block for local inference, and transformers is the fundamental building block for model definition, so this is basically a match made in heaven. ❤️ What will change for llama.cpp, the open source project and the community? Not much – Georgi and team still dedicate 100% of their time maintaining llama.cpp and have full autonomy and…
64dModel#llama#local
66d ago
One-Shot Any Web App with Gradio's gr.HTML
One-Shot Any Web App with Gradio's gr.HTML gr.HTML now supports custom templates, scoped CSS, and JavaScript interactivity, which means you can build pretty much any web component — and Claude (or any other frontier LLM) can generate the whole thing in one shot: frontend, backend, and state management, all in a single Python file. We tested this by building different types of apps. Each one is a single Python file, no build step, deployable to Hugging Face Spaces in seconds. Productivity Apps Pomodoro Timer: A focus timer where a pixel-art tree grows as you work. Starts as a seed, sprouts branches, grows leaves. Complete a session and the tree joins your forest. Session tracking, theme switching, break modes — all interactive, all in one file. The tree animation alone would normally require a separate React component. Here it's just CSS…
66dModel#claude
71d ago
Custom Kernels for All from Codex and Claude
Custom Kernels for All from Codex and Claude tl;dr: We built an agent skill that teaches coding agents how to write production CUDA kernels. Then we pointed Claude and Codex at two real targets: a diffusers pipeline and a transformers model. The agents produced working kernels for both, with correct PyTorch bindings and benchmarks, end to end. Writing CUDA kernels is hard. Writing CUDA kernels that correctly integrate with transformers and diffusers is harder. There are architecture-specific memory access patterns, vectorization strategies, warp shuffle reductions, and a dozen integration pitfalls that trip up even experienced developers. It is exactly the kind of specialized, high-stakes problem where agent skills shine. We gave coding agents the domain knowledge they need, like which GPU architecture to target, how to structure a kernel-builder project, when to use shared memory versus registers, and how to…
71dModel#claude
95d ago
One Year Since the “DeepSeek Moment”
One Year Since the “DeepSeek Moment” The first blog addresses strategic changes and the explosion of new open models and open source players. The second covers architectural and hardware choices, largely by Chinese companies, made in the wake of a growing open ecosystem, available here. The third analyzes prominent organizations’ trajectories and the future of the global open source ecosystem, available here. For AI researchers and developers contributing to and relying on the open source ecosystem and for policymakers understanding the rapidly changing environment, there has never been a better time to build and release open models and artifacts, as proven by the past year’s immense growth catalyzed by DeepSeek. Notably, geopolitics has driven adoption; while models developed in China have dominated across metrics throughout 2025, with new players leapfrogging each other, Western AI communities are seeking commercially deployable…
95dModel
110d ago
Introducing Falcon-H1-Arabic: Pushing the Boundaries of Arabic Language AI with Hybrid Architecture
Introducing Falcon-H1-Arabic: Pushing the Boundaries of Arabic Language AI with Hybrid Architecture Discover more in our official blogpost, featuring an interactive experience The journey of building world-class Arabic language models has been one of continuous learning and iteration. Today, we're excited to announce Falcon-H1-Arabic, our most advanced Arabic language model family to date, representing a significant leap forward in both architecture and capabilities. This release embodies months of research, community feedback, and technical innovation, culminating in three powerful models that set new standards for Arabic natural language processing. Building on Success: The Evolution from Falcon-Arabic When we launched Falcon-Arabic a few months ago, the response from the community was both humbling and enlightening. Developers, researchers and students across the Arab world used the model for real use cases, pushing it to its limits and providing invaluable feedback. We learned where…
110dModel
135d ago
New in llama.cpp: Model Management
New in llama.cpp: Model Management Reminder: llama.cpp server is a lightweight, OpenAI-compatible HTTP server for running LLMs locally. This feature was a popular request to bring Ollama-style model management to llama.cpp. It uses a multi-process architecture where each model runs in its own process, so if one model crashes, others remain unaffected. Quick Start Start the server in router mode by not specifying a model: llama-server This auto-discovers models from your llama.cpp cache (LLAMA_CACHE or ~/.cache/llama.cpp). If you've previously downloaded models via llama-server -hf user/model, they'll be available automatically. You can also point to a local directory of GGUF files: llama-server --models-dir ./my-models Features - Auto-discovery: Scans your llama.cpp cache (default) or a custom --models-dir folder for GGUF files - On-demand loading: Models load automatically when first requested - LRU eviction: When you hit --models-max (default: 4), the…
135dModel#llama
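Pulled out of the run-on excerpt above, the quick-start commands amount to a short session (flags are the ones named in the post; the models directory is illustrative):

```shell
# Router mode: start llama-server without naming a model; it auto-discovers
# GGUF files from the llama.cpp cache (LLAMA_CACHE or ~/.cache/llama.cpp)
llama-server

# Or scan a local directory of GGUF files instead of the cache
llama-server --models-dir ./my-models

# Models load on demand at first request; past --models-max (default: 4)
# the least-recently-used model is evicted
llama-server --models-dir ./my-models --models-max 2
```

Because each model runs in its own process, evicting or crashing one model leaves the others serving.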
155d ago
20x Faster TRL Fine-tuning with RapidFire AI
20x Faster TRL Fine-tuning with RapidFire AI Why this matters When fine-tuning or post-training LLMs, teams often do not have the time and/or budget to compare multiple configs even though that can significantly boost eval metrics. RapidFire AI lets you launch multiple TRL configs concurrently, even on a single GPU, and compare them in near real time via a new adaptive, chunk-based scheduling and execution scheme. In internal benchmarks referenced in the TRL page, this delivers ~16–24× higher experimentation throughput than sequentially comparing configs one after another, enabling you to reach much better metrics much faster. RapidFire AI establishes live three-way communication between your IDE, a metrics dashboard, and a multi-GPU execution backend What you get, out of the box Drop-in TRL wrappers — Use RFSFTConfig, RFDPOConfig, and RFGRPOConfig as near-zero-code replacements for TRL's SFT/DPO/GRPO configs. Adaptive chunk-based concurrent training — RapidFire AI…
155dModel#fine-tuning
[MTR]MIT Technology Review· 1 articlesvisit →
8d ago
The Download: bad news for inner Neanderthals, and AI warfare’s human illusion
The Download: bad news for inner Neanderthals, and AI warfare’s human illusion Plus: Despite blacklisting Anthropic, the White House wants its new model. This is today's edition of The Download, our weekday newsletter that provides a daily dose of what's going on in the world of technology. The problem with thinking you’re part Neanderthal There’s a theory that many of us have an “inner Neanderthal.” The idea is that Homo sapiens and a cousin species once bred, leaving some people today with a trace of Neanderthal DNA. This DNA is arguably the 21st century’s most celebrated discovery in human evolution. But in 2024, a pair of French geneticists called into question the theory's very foundations. They proposed that what scientists interpret as interbreeding could instead be explained by population structure—the way genes concentrate in smaller, isolated groups. Find out what…
8dModelby Thomas Macaulay
[NV]NVIDIA Developer Blog· 4 articlesvisit →
16d ago
Cut Checkpoint Costs with About 30 Lines of Python and NVIDIA nvCOMP
Training LLMs requires periodic checkpoints. These full snapshots of model weights, optimizer states, and gradients are saved to storage so training can resume after interruptions. At scale, these checkpoints become massive (782 GB for a 70B model) and frequent (every 15-30 minutes), generating one of the largest line items in a training budget. Most AI teams chase GPU utilization, training throughput, and model quality. Almost none look at what checkpointing is costing them. This is an expensive oversight. The synchronous checkpoint overhead of a 405B model on 128 NVIDIA Blackwell GPUs alone can cost $200,000 a month. By introducing a lossless compression step implemented with about 30 lines of Python, we can reduce storage costs by $56,000 every month. Mixture of experts (MoE) models save even more. We’ll break down how we got to that calculation and how NVIDIA nvComp…
16dModel#rag#training#gpu by Wenqi Glantz
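The idea in the checkpoint article can be illustrated on CPU. The post uses NVIDIA nvCOMP for GPU-side compression; the zlib stand-in below only demonstrates the shape of the change: wrap a lossless compress/decompress step around checkpoint serialization, so restored state is bit-identical while stored bytes shrink.

```python
# CPU-only stand-in for the article's idea (the post uses NVIDIA nvCOMP on
# GPU; zlib here just illustrates the lossless round trip you wrap around
# checkpoint serialization).
import pickle
import zlib

def save_checkpoint_compressed(state: dict) -> bytes:
    """Serialize, then losslessly compress, a checkpoint blob."""
    return zlib.compress(pickle.dumps(state), level=6)

def load_checkpoint_compressed(blob: bytes) -> dict:
    """Exactly invert save_checkpoint_compressed (lossless)."""
    return pickle.loads(zlib.decompress(blob))

# Toy stand-in for model weights / optimizer states.
state = {"step": 1000, "weights": [0.0] * 10_000}
blob = save_checkpoint_compressed(state)
restored = load_checkpoint_compressed(blob)
assert restored == state                       # training can resume exactly
assert len(blob) < len(pickle.dumps(state))    # smaller on redundant data
```

Real checkpoints replace pickle/zlib with framework serialization and a GPU compressor, but the cost math is the same: storage spend scales with the compressed size per snapshot times snapshot frequency.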
47d ago
Implementing Falcon-H1 Hybrid Architecture in NVIDIA Megatron Core
In the rapidly evolving landscape of large language model (LLM) development, NVIDIA Megatron Core has emerged as the foundational framework for training massive transformer models at scale. The open source library offers industry-leading parallelism and GPU-optimized performance. Now developed GitHub-first in the NVIDIA/Megatron-LM repo, Megatron Core is increasingly shaped by contributions from foundation model builders, making it a more flexible, future-proofed engine for open AI models. This post provides a technical overview of how the Technology Innovation Institute (TII), creators of the Falcon model family, have contributed to and integrated with Megatron Core and Megatron Bridge frameworks. The first section examines the implementation of the Falcon-H1 parallel hybrid architecture within Megatron Bridge, highlighting the challenges of coordinating heterogeneous Transformer and Mamba layers alongside non-learnable µP multipliers. The second section explores the integration of BitNet into Megatron Core, detailing the replacement…
47dModel#training#gpu by Mireille Fares
81d ago
Accelerating Long-Context Model Training in JAX and XLA
Large language models (LLMs) are rapidly expanding their context windows, with recent models supporting sequences of 128K tokens, 256K tokens, and beyond. However, training these models with extended context lengths presents significant computational and communication challenges. As context lengths grow, the memory and communication overhead of attention mechanisms scales quadratically, creating bottlenecks that traditional parallelism strategies struggle to address efficiently. This post demonstrates that integrating the NVSHMEM communication library into the Accelerated Linear Algebra (XLA) compiler optimizes context parallelism. This integration enables the efficient training of the Llama 3 8B model in the JAX framework with sequences up to 256K tokens. Our results show that NVSHMEM provides up to a 36% speedup over the NVIDIA Collective Communications Library (NCCL) for long-context training workloads, particularly when combined with tensor parallelism across multiple nodes. The long-context training challenge To understand why NVSHMEM provides significant speedups for long-context…
81dModel#llama#training#gpu by Sevin Fide Varoglu
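A quick back-of-envelope calculation shows why the quadratic term above dominates at long context, and what context parallelism buys. The head count and bf16 element size below are assumptions for illustration, and real kernels (e.g. fused flash-style attention) avoid materializing the full score matrix; the scaling argument is what matters.

```python
# Illustrative arithmetic (assumed: 32 heads, bf16 elements): materialized
# attention scores for one layer grow with seq_len**2, which is why context
# parallelism shards the sequence dimension across devices.
def attn_scores_gib(seq_len: int, num_heads: int, bytes_per_elem: int = 2) -> float:
    """Memory for full (unfused) attention score matrices, in GiB."""
    return num_heads * seq_len**2 * bytes_per_elem / 2**30

total = attn_scores_gib(seq_len=256_000, num_heads=32)
# With context parallelism of degree cp, each device holds queries for
# seq_len / cp positions, shrinking its share of the score slab by ~1/cp.
per_device = total / 8
print(f"{total:.2f} GiB total, ~{per_device:.2f} GiB per device at cp=8")
```

Doubling the sequence length quadruples this term, while splitting the sequence over cp devices divides each device's share by cp; the communication that sharding introduces is exactly what the NVSHMEM integration targets.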
82d ago
Optimizing Communication for Mixture-of-Experts Training with Hybrid Expert Parallel
In LLM training, Expert Parallel (EP) communication for hyperscale mixture-of-experts (MoE) models is challenging. EP communication is essentially all-to-all, but because it is dynamic and sparse (each token is routed to only its top-k experts instead of all experts), it’s difficult to implement and optimize. This post details an efficient MoE EP communication solution, Hybrid-EP, and its use in the NVIDIA Megatron family of frameworks, on NVIDIA Quantum InfiniBand and NVIDIA Spectrum-X Ethernet platforms. It also dives into the effectiveness of Hybrid-EP in real-world model training. Efficiency challenges of hyperscale MoE model training DeepSeek-V3 is a representative model of the new generation of large-scale fine-grained MoE models. Such models balance computational overhead with model performance through “hyperparameter size sparse activation,” but they also pose serious challenges for existing large-model training frameworks. - Communication efficiency bottlenecks: The MoE model relies on parallel experts and…
82dModel#training#gpu by Fan Yu
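The dynamic, sparse all-to-all pattern described above can be made concrete with a toy router. This is not Hybrid-EP or Megatron code; the gating scores are made up, and the sketch only shows why per-expert message sizes are data-dependent and hard to pre-schedule.

```python
# Toy illustration of MoE expert-parallel dispatch (not Hybrid-EP itself):
# each token ships activations only to its top-k experts, so the all-to-all
# send lists are sparse and depend on the routing of this particular batch.
def dispatch(gates, k=2):
    """gates[t][e] = router score of expert e for token t.
    Returns {expert_id: [token_ids]}: the sparse all-to-all send lists."""
    sends = {}
    for tok, scores in enumerate(gates):
        for e in sorted(range(len(scores)), key=scores.__getitem__, reverse=True)[:k]:
            sends.setdefault(e, []).append(tok)
    return sends

gates = [[0.7, 0.1, 0.2], [0.1, 0.6, 0.3], [0.2, 0.3, 0.5]]
plan = dispatch(gates, k=2)
# Every token lands in exactly k send lists, but expert loads are uneven:
# here expert 2 receives all three tokens while expert 0 receives one.
```

Because the per-destination volumes change every step, EP communication cannot use fixed, symmetric collectives efficiently, which is the bottleneck solutions like Hybrid-EP are built to address.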
[OLL]Ollama Blog · 8 articles · visit →
61d ago
The simplest and fastest way to setup OpenClaw February 23, 2026 Setup OpenClaw in under two minutes with a single Ollama command.
The simplest and fastest way to setup OpenClaw February 23, 2026 OpenClaw is a personal AI assistant that can clear your inbox, send emails, manage your calendar, and complete other tasks via messaging apps like WhatsApp, Telegram, iMessage, or any chat app you already use. It all runs on your own hardware, and with Ollama 0.17, it’s now a single command to get started. What you’ll need - Ollama 0.17 or later - Node.js (npm is used to install OpenClaw) - Mac or Linux system (Windows users can install OpenClaw via WSL - Windows Subsystem for Linux) Step 1: Run the command Open a terminal, and type in: ollama launch openclaw --model kimi-k2.5:cloud Note: Other models can also be configured. See ollama launch openclaw for recommended models. Ollama handles everything from here. Step 2: Install OpenClaw If OpenClaw isn’t already…
61dModel#llama
68d ago
Subagents and web search in Claude Code February 16, 2026 Ollama now supports subagents and web search in Claude Code.
Subagents and web search in Claude Code February 16, 2026 Ollama now supports subagents and web search in Claude Code. No MCP servers or API keys required. Get started ollama launch claude --model minimax-m2.5:cloud It works with any model on Ollama’s cloud. Subagents Subagents can run tasks in parallel, such as file search, code exploration, and research, each in their own context. Longer coding sessions stay productive. Side tasks don’t fill the context with noise. Some models will naturally trigger subagents when needed (minimax-m2.5, glm-5, kimi-k2.5), but you can force triggering subagents by telling the model to “use/spawn/create subagents” Example prompts: > spawn subagents to explore the auth flow, payment integration, and notification system > audit security issues, find performance bottlenecks, and check accessibility in parallel with subagents > create subagents to map the database queries, trace the API routes,…
83d ago
OpenClaw February 1, 2026 OpenClaw is a personal AI assistant that connects your messaging apps to local AI coding agents, all running on your own device.
OpenClaw February 1, 2026 OpenClaw is a personal AI assistant that bridges your favorite messaging platforms to AI coding agents through a centralized gateway. It runs locally on your own devices, keeping your conversations and code private. OpenClaw integrates with WhatsApp, Telegram, Slack, Discord, iMessage, and other messaging services, allowing you to interact with AI coding agents from anywhere. Get started Start by installing OpenClaw: curl -fsSL https://openclaw.ai/install.sh | bash Windows iwr -useb https://openclaw.ai/install.ps1 | iex Running with Ollama Once installed, you can launch OpenClaw directly with Ollama to connect local/cloud models: ollama launch openclaw If you want to configure OpenClaw without immediately starting the service: ollama launch openclaw --config The gateway will auto-reload if it’s already running. Recommended models OpenClaw requires a larger context length to complete tasks. It is recommended to use a context length of at…
92d ago
ollama launch January 23, 2026 ollama launch is a new command which sets up and runs coding tools like Claude Code, OpenCode, and Codex with local or cloud models. No environment variables or config files needed.
ollama launch January 23, 2026 ollama launch is a new command which sets up and runs your favorite coding tools like Claude Code, OpenCode, and Codex with local or cloud models. No environment variables or config files needed. Get started Download Ollama v0.15+, then open a terminal and run: # ~23 GB VRAM required with 64000 tokens context length ollama pull glm-4.7-flash # or use a cloud model (with full context length) ollama pull glm-4.7:cloud One command setup Claude Code: ollama launch claude OpenCode: ollama launch opencode This will guide you to select models and launch your chosen integration. No environment variables or config files needed. Supported integrations - Claude Code - OpenCode - Codex - Droid Recommended models for coding Note: Coding tools work best with a full context length. Update the context length in Ollama’s settings to at…
99d ago
Claude Code with Anthropic API compatibility January 16, 2026 Ollama is now compatible with the Anthropic Messages API, making it possible to use tools like Claude Code with open models.
Claude Code with Anthropic API compatibility January 16, 2026 Ollama v0.14.0 and later are now compatible with the Anthropic Messages API, making it possible to use tools like Claude Code with open-source models. Run Claude Code with local models on your machine, or connect to cloud models through ollama.com. Using Claude Code with Ollama Claude Code is Anthropic’s agentic coding tool that lives in your terminal. With Anthropic API support, you can now use Claude Code with any Ollama model. Get started Install Claude Code macOS, Linux, WSL: curl -fsSL https://claude.ai/install.sh | bash Windows PowerShell: irm https://claude.ai/install.ps1 | iex Windows CMD: curl -fsSL https://claude.ai/install.cmd -o install.cmd && install.cmd && del install.cmd Connect Ollama Configure environment variables to use Ollama: export ANTHROPIC_AUTH_TOKEN=ollama export ANTHROPIC_BASE_URL=http://localhost:11434 Run Claude Code with an Ollama model: claude --model gpt-oss:20b Models in Ollama’s Cloud also work with…
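Beyond Claude Code, the same compatibility means any client can hit Ollama's local endpoint directly with an Anthropic Messages-style request. The sketch below builds such a request in plain Python; the payload fields follow Anthropic's Messages API shape, which the post says Ollama v0.14+ implements, but treat the exact fields and headers as assumptions to check against current docs.

```python
# Hedged sketch: build an Anthropic Messages-style request aimed at a local
# Ollama server. Payload shape and headers are assumptions based on the
# Anthropic Messages API that the post says Ollama now implements.
import json
import urllib.request

def build_request(model: str, prompt: str, base_url: str = "http://localhost:11434"):
    payload = {
        "model": model,
        "max_tokens": 256,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/messages",
        data=json.dumps(payload).encode(),
        headers={"content-type": "application/json", "x-api-key": "ollama"},
    )

req = build_request("gpt-oss:20b", "Summarize this repo's build steps.")
# urllib.request.urlopen(req)  # uncomment with a local Ollama running
```

The environment variables in the post (ANTHROPIC_AUTH_TOKEN, ANTHROPIC_BASE_URL) make Claude Code itself produce requests of roughly this shape against the same base URL.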
184d ago
NVIDIA DGX Spark performance October 23, 2025 We ran performance tests on release day firmware and an updated Ollama version to see how Ollama performs.
NVIDIA DGX Spark performance October 23, 2025 Performance We ran performance tests on release day firmware and an updated Ollama version to see how Ollama performs. The tests were run using the latest NVIDIA DGX Spark firmware (580.95.05) and Ollama v0.12.6. Each test is performed: - 10 times - Temperature set to 0 - Constrained to 500 tokens output - Prompt: “write an in-depth summary of this story: $(head -n200 pg98.txt)” (please see the test script for the book, “A Tale of Two Cities”) - Caching is disabled so repeated tests will not be faster The test script and its readme are made available and can be customized for your own testing. *OpenAI’s gpt-oss models are tested using models officially provided by OpenAI, distributed via Ollama. Some GGUFs distributed online labeled as MXFP4 are further quantized to q8_0 in the…
184dModel#llama#gpu
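The benchmark methodology above (fixed prompt, temperature 0, capped output, repeated runs, caching disabled) has a simple shape that can be sketched as a harness. The generate() stub below stands in for a real Ollama API call and is an assumption for illustration; no numbers here are measurements.

```python
# Sketch of the post's benchmark shape: repeat a fixed, deterministic
# generation N times and report a robust tokens/second figure. The
# generate() callable is a stub standing in for a real Ollama client call.
import statistics
import time

def bench(generate, runs: int = 10, max_tokens: int = 500) -> float:
    """Median tokens/second over `runs` repetitions of one generation."""
    rates = []
    for _ in range(runs):
        t0 = time.perf_counter()
        n = generate(max_tokens)            # returns tokens produced
        rates.append(n / (time.perf_counter() - t0 + 1e-9))
    return statistics.median(rates)

# Stub model that "produces" the full token budget; swap in a real client.
median_tps = bench(lambda max_tokens: max_tokens, runs=3)
```

Using the median rather than the mean keeps one slow run (e.g. a cold start) from skewing the result, which matters when each configuration is only run ten times.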
193d ago
Qwen3-VL October 14, 2025 Ollama now supports Alibaba's Qwen3-VL.
Qwen3-VL October 14, 2025 Qwen3-VL, the most powerful vision language model in the Qwen series, is now available on Ollama’s cloud. The models will be made available locally soon. Model capabilities - Visual Agent: Operates PC/mobile GUIs—recognizes elements, understands functions, invokes tools, completes tasks - Visual Coding Boost: Generates Draw.io/HTML/CSS/JS from images/videos - Advanced Spatial Perception: Judges object positions, viewpoints, and occlusions; provides stronger 2D grounding and enables 3D grounding for spatial reasoning and embodied AI - Long Context & Video Understanding: Native 256K context, expandable to 1M; handles books and hours-long video with full recall and second-level indexing - Enhanced Multimodal Reasoning: Excels in STEM/Math—causal analysis and logical, evidence-based answers - Upgraded Visual Recognition: Broader, higher-quality pre-training enables it to recognize more types of objects—celebrities, anime, products, landmarks, flora/fauna, etc. - Expanded OCR: Supports 32 languages (up from…
193dModel#llama#qwen
194d ago
NVIDIA DGX Spark October 13, 2025 The latest NVIDIA DGX Spark is here! Ollama has partnered with NVIDIA to ensure it runs fast and efficiently out-of-the-box.
NVIDIA DGX Spark October 13, 2025 The latest NVIDIA DGX Spark is here! Ollama has partnered with NVIDIA to ensure it runs fast and efficiently out-of-the-box. Powered by the NVIDIA GB10 Grace Blackwell Superchip, the NVIDIA DGX Spark delivers 1 petaFLOP of performance for prototyping and running local language models on Ollama. With 128GB of memory, you can run the latest models from Alibaba (Qwen), DeepSeek, Meta (Llama), Mistral, Google (Gemma), OpenAI (gpt-oss), and many more from Ollama’s library. You can also upload and bring your own custom or fine-tuned models. We can’t wait to see what you’ll build with the latest NVIDIA DGX Spark! In the meantime, we’re working with NVIDIA to optimize Ollama’s performance and testing it across the use cases we see most often—chat, document processing (retrieval, OCR, modification), code tasks, and multimodal workflows. Learn more about the…
194dModel#llama#gpu
[OAI]OpenAI Blog · 39 articles · visit →
2d ago
GPT-5.5 Bio Bug Bounty
GPT‑5.5 Bio Bug Bounty Testing universal jailbreaks for biorisks in GPT‑5.5 As part of our ongoing efforts to strengthen our safeguards for advanced AI capabilities in biology, we’re introducing a Bio Bug Bounty for GPT‑5.5 and accepting applications. We’re inviting researchers with experience in AI red teaming, security, or biosecurity to try to find a universal jailbreak that can defeat our five-question bio safety challenge. - Model in scope: GPT‑5.5 in Codex Desktop only. - Challenge: Identify one universal jailbreaking prompt to successfully answer all five bio safety questions from a clean chat without prompting moderation. - Rewards: - $25,000 to the first true universal jailbreak to clear all five questions. - Smaller awards may be granted for partial wins at our discretion. - Timeline: Applications open April 23, 2026 with rolling acceptances, and close on June 22, 2026. Testing…
2dModel#safety
5d ago
OpenAI helps Hyatt advance AI among colleagues
OpenAI helps Hyatt advance AI among colleagues Key takeaways: - Hyatt has deployed ChatGPT Enterprise. - With ChatGPT Enterprise, Hyatt employees can access frontier AI capabilities like GPT‑5.4, Codex, and more. - Departments including finance, marketing, and operations will use ChatGPT Enterprise to improve the experience of Hyatt guests and customers. Hyatt’s innovative approach with OpenAI reflects how Hyatt is elevating its use of technology and enhancing human connections. The company is making artificial intelligence broadly accessible to its employees, enabling teams to spend less time on manual tasks and more time focused on delivering exceptional guest experiences. As part of this effort, Hyatt has made ChatGPT Enterprise available to employees across its global corporate and hotel workforce, making it a core component of how the business runs day to day. ChatGPT Enterprise is just one example of how Hyatt is…
5dModel#gpt
9d ago
Accelerating the cyber defense ecosystem that protects us all
Trusted Access for Cyber is designed around a simple premise: advanced cyber capabilities should reach defenders broadly, but access should scale with trust, validation, and safeguards. Today we’re sharing the first organizations helping put that approach into practice, from open-source security teams and vulnerability researchers to enterprises operating some of the world’s most complex digital environments. The strength of this approach comes from the breadth of defenders involved. Cybersecurity is a team sport, and the systems people rely on are protected by organizations of many kinds, from major enterprises and security vendors to researchers, maintainers, public institutions, nonprofits, and smaller teams with limited security resources. Not every organization has the benefit of a 24x7 security team that can respond to incidents disclosed on a Friday night. It’s important for all software…
9dModel
11d ago
Trusted access for the next era of cyber defense
We are scaling up our Trusted Access for Cyber (TAC) program to thousands of verified individual defenders and hundreds of teams responsible for defending critical software. For years, we’ve been building a cyber defense program on the principles of democratized access, iterative deployment, and ecosystem resilience. In preparation for increasingly more capable models from OpenAI over the next few months, we are fine-tuning our models specifically to enable defensive cybersecurity use cases, starting today with a variant of GPT‑5.4 trained to be cyber-permissive: GPT‑5.4‑Cyber. In this post, we share how we expect our approach of scaling cyber defense in lockstep with increasing model capabilities to guide the testing and deployment of future releases. The progressive use of AI accelerates defenders – those responsible for keeping systems, data, and users safe – enabling them to find and fix problems faster in…
11dModel
32d ago
Helping developers build safer AI experiences for teens
Helping developers build safer AI experiences for teens Introducing a set of teen safety policies formatted as prompts for gpt-oss-safeguard Today, we’re releasing prompt-based safety policies to help developers create age-appropriate protections for teens. Built to work with our open-weight safety model, gpt-oss-safeguard, these policies simplify how developers turn safety requirements into usable classifiers for real-world systems. We released open weight models to democratize access to powerful AI and support broad innovation. At the same time, we believe safety and innovation go hand in hand, and that developers should have access to capable models as well as the tools and policies to deploy them safely and responsibly. We developed these policies to support developers in their safety efforts to protect young users, with input from trusted external organizations including Common Sense Media…
51d ago
GPT-5.4 Thinking System Card
GPT‑5.4 Thinking is the latest reasoning model in the GPT‑5 series and is explained in our blog. The comprehensive safety mitigation approach for this model is similar to that used for previous models in the series, but 5.4 Thinking is the first general-purpose model to have implemented mitigations for High capability in Cybersecurity. The approach to cyber safety builds on the latest approaches implemented for GPT‑5.3 Codex, in ChatGPT and the API. In this card we also refer to GPT‑5.4 Thinking as gpt-5.4-thinking. Note that there is not a model named GPT‑5.3 Thinking, so the main model to baseline against is GPT‑5.2 Thinking. Author OpenAI
51dModel
52d ago
Extending single-minus amplitudes to gravitons
Extending single-minus amplitudes to gravitons Researchers used GPT‑5.2 Pro to help find a new mathematical result describing how particles can interact in quantum gravity. We’ve published a new preprint studying scattering amplitudes in quantum gravity, extending recent results obtained for gluons to the gravitational setting. The work shows that a class of graviton interactions long assumed to vanish can in fact arise under well-defined kinematic conditions. The preprint is available here. We welcome feedback from the community. The paper, “Single-minus graviton tree amplitudes are nonzero,” is authored by Alfredo Guevara (Institute for Advanced Study), Alexandru Lupsasca (Vanderbilt University and OpenAI), David Skinner (University of Cambridge), Andrew Strominger (Harvard University), and Kevin Weil (OpenAI) on behalf of OpenAI. Scattering amplitudes are mathematical quantities physicists use to calculate the probability that particles interact in particular ways. Rather than…
52dModel
53d ago
GPT-5.3 Instant: Smoother, more useful everyday conversations
Today, we’re releasing an update to ChatGPT’s most-used model that makes everyday conversations more consistently helpful and fluid. GPT‑5.3 Instant delivers more accurate answers, richer and better-contextualized results when searching the web, and reduces unnecessary dead ends, caveats, and overly declarative phrasing that can interrupt the flow of conversation. This update focuses on the parts of the ChatGPT experience people feel every day: tone, relevance, and conversational flow. These are nuanced problems that don’t always show up in benchmarks, but shape whether ChatGPT feels helpful or frustrating. GPT‑5.3 Instant directly reflects user feedback in these areas. We heard feedback that GPT‑5.2 Instant would sometimes refuse questions it should be able to answer safely, or respond in ways that feel overly cautious or preachy, particularly around sensitive topics. GPT‑5.3 Instant significantly reduces unnecessary refusals, while toning down overly defensive or moralizing…
53dModel
53d ago
GPT-5.3 Instant System Card
GPT‑5.3 Instant is the newest addition to the GPT‑5 series. As described in our blog, GPT‑5.3 Instant responds faster, delivers richer and better-contextualized answers when searching the web, and reduces unnecessary dead ends, caveats, and overly declarative phrasing that can interrupt the flow of conversation. The comprehensive safety mitigation approach for this model is largely the same as that described for GPT‑5.2 Instant in the GPT‑5.2 System Card. In this card we also refer to GPT‑5.3 Instant as gpt-5.3-instant. Author OpenAI
53dModel
59d ago
OpenAI o1 and new tools for developers
OpenAI o1 and new tools for developers Introducing OpenAI o1, Realtime API improvements, a new fine-tuning method and more for developers. Today we’re introducing more capable models, new tools for customization, and upgrades that improve performance, flexibility, and cost-efficiency for developers building with AI. This includes: - OpenAI o1 in the API, with support for function calling, developer messages, Structured Outputs, and vision capabilities. - Realtime API updates, including simple WebRTC integration, a 60% price reduction for GPT‑4o audio, and support for GPT‑4o mini at one-tenth of previous audio rates. - Preference Fine-Tuning, a new model customization technique that makes it easier to tailor models based on user and developer preferences. - New Go and Java SDKs available in beta. OpenAI o1, our reasoning…
59d ago
Boosting the customer retail experience with GPT-4o mini
Zalando boosts the customer experience with its Assistant, powered by GPT‑4o mini Zalando, one of Europe’s largest online fashion and lifestyle platforms, serves over 50 million customers in 25 countries. With a vast catalog of apparel, shoes, and beauty products, the company has steadily expanded its offerings to become a go-to destination for fashion enthusiasts. Zalando worked with OpenAI to develop the Zalando Assistant, an AI-powered tool that provides personalized content recommendations and streamlines product discovery. With GPT‑4o mini and a robust evaluation framework, the latest iteration of the Assistant has delivered, compared to the previous version: - A 23% increase in product clicks - A 40%+ increase in products added to the wishlist - Scaled availability to 25 markets featuring local languages Scaling the Assistant for 25 markets The first iteration of the Zalando…
59dModel#gpt
59d ago
Introducing Verdi, an AI dev platform powered by GPT-4o
Mercado Libre introduces Verdi, an AI developer platform powered by GPT‑4o Mercado Libre is Latin America’s largest e-commerce and fintech company. In their 25-year history, Mercado Libre has grown exponentially, earning them the title of most valuable company in LATAM. The company has successfully implemented AI solutions to maintain their competitive edge in a crowded market. Among these initiatives is Verdi, a development platform layer using GPT‑4o, GPT‑4o mini, and GPT‑3.5 Turbo, which is transforming how Mercado Libre handles customer service and other complex tasks. For years, Mercado Libre has used AI to streamline processes and enhance user experiences. They use OpenAI’s API to: - Improve inventory capacity: GPT‑4 Vision tags and completes product listings, enabling Mercado to catalog 100x more products than before in a span of two years. - Detect fraud: GPT‑4 evaluates data…
59dModel#gpt#coding
59d ago
Using GPT-4 to improve teaching and learning in Brazil
Arco Educação uses GPT‑4 to improve teaching and learning in Brazil Arco Educação, Brazil’s largest educational operating system, is partnering with OpenAI to build tools that enable teachers to concentrate on what matters most: helping students learn. “Arco’s products were built by teachers, for teachers,” says CEO Ari de Sá Cavalcante. “Our AI strategy aims to free up educators’ time, enabling them to focus more on each student's unique learning journey.” In Brazil, the average teacher spends a third of their time on administrative tasks, which is 40 percent more than the global average (OECD 2018). The rest of their time is spent on lesson planning, grading, and other operational activities. Arco Educação is leveraging AI to reduce this burden, allowing teachers to dedicate more time to delivering quality teaching directly to their students. “Our AI product agenda was shaped…
59dModel#gpt
59d ago
Using GPT-4 to deliver a new customer service standard
Ada uses GPT‑4 to deliver a new customer service standard Ada is fueling a $100B shift in customer service spend, and at the forefront of this transition is their AI-native customer service automation platform. Founded in 2016, Ada is now valued at $1.2B with a total of $200M in funding; customers include Verizon, YETI, Canva, and Square. Ada isn’t new to AI—they’ve been an AI-native platform since inception. The first generation of the product was built using custom Natural Language Processing (NLP) models that were developed and trained in-house. But they noticed a gap between how many customer questions their platform could handle, and how many queries were truly being resolved in a satisfactory way. “We got really excited by OpenAI and what was happening in the industry. In 2022, we decided…
59dModel#gpt
59d ago
Delivering contextual job matching for millions with OpenAI
Indeed uses OpenAI to deliver contextual job matching to millions of job seekers Indeed, whose mission is to help people get jobs, is the world’s #1 job site. Over 350 million unique visitors come to Indeed every month to connect with more than 3.5 million employers and over 32 million jobs. What’s more, every three seconds someone gets hired on Indeed. Since Indeed’s inception, AI has powered the millions of connections between job seekers and employers on the platform, through features such as ‘Invite to Apply’ which sends AI-based job recommendations to job seekers based on their resume, Indeed Profile, and other qualifications. Improvements in AI—specifically generative AI—are helping match job seekers to jobs in new and exciting ways. Using OpenAI's GPT models and fine-tuning capabilities, Indeed enhanced the personalized language in the…
59dModel#fine-tuning
59d ago
GPT-4o System Card External Testers Acknowledgements
GPT‑4o System Card External Testers Acknowledgements Red Teamers Adam Kuzdraliński, Alexa W, Amer Sawan, Ana-Diamond Aaba Atach, Anna Becker, Arjun Singh Puri, Baybars Orsek, Ben Kobren, Bertie Vidgen, Blue Sheffer, Broderick McDonald, Bruce Bassett, Bruno Arsioli, Caroline Friedman Levy, Casey Williams, Christophe Ego, Ciel Qi, Cory Alpert, Dani Madrid-Morales, Daniel Kang, Darius Emrani, Dominik Haenni, Drin Ferizaj, Emily Lynell Edwards, Emmett Alton Sartor, Farhan Sahito, Francesco De Toni, Gabriel Chua, Gaines Hubbell, Gelei Deng, George Gor, Gerardo Adesso, Grant Brailsford, Hao Zhao, Henry Silverman, Hasan Sawan, Herman Wasserman, Hugo Gobato Souto, Ioana Tanase, Isabella Andric, Ivan Carbajal, Jacy Reese Anthis, Jake Okechukwu Effoduh, Javier García Arredondo, Jennifer Victoria Scurrell, Jianlong Zhu, Joanna Brzyska, Kate Turetsky, Kelly Bare, Kristen Menou, Latisha Harry, Lee Elkin, Liseli Akayombokwa, Louise Giam, M. Alexandra García Pérez, Manas Chawla, Marjana Skenduli, Martin Rydén, Mateusz Garncarek, Matt…
59dModel#gpt
59d ago
Using GPT-4o reasoning to transform cancer care
Color Health Color Health is working with OpenAI to pioneer a new way of accelerating cancer patients’ access to treatment. Their new copilot application uses GPT‑4o to identify missing diagnostics and create tailored workup plans, enabling healthcare providers to make evidence-based decisions about cancer screening and treatment. Color has been working to improve access to healthcare for a decade, serving more than 7 million patients since it was founded. In 2023, they partnered with the American Cancer Society to help employers and health plans take control of cancer—the second most common cause of death in the United States and the leading driver of American healthcare costs. Color Health uses OpenAI’s APIs to integrate patient medical data with clinical knowledge. The outcome is a copilot application that creates customized, comprehensive treatment plans for providers to review and…
59dModel#gpt#coding
59d ago
Automating customer support agents
MavenAGI launches automated customer support agents powered by OpenAI MavenAGI is a new software company for the AI era. They recently launched an AI customer service agent, built on the flexibility of GPT‑4, which a number of companies like Tripadvisor, Clickup and Rho are already using to save time and better serve their customers. Customer support is expensive yet disappointing In today's customer service environment, no one is winning. Service representatives face repetitive work, demanding ticket volume, disorganized documentation, and delays in escalations. Customers get frustrated explaining issues and waiting for answers, and companies struggle to meet their desire for good customer service. Up to 90% of consumers expect an “immediate” customer service response, and more than half of consumers say they’d drop a company and use a…
59dModel#gpt
71d ago
GPT-5.2 derives a new result in theoretical physics
GPT‑5.2 derives a new result in theoretical physics In a new preprint, GPT‑5.2 proposed a formula for a gluon amplitude later proved by an internal OpenAI model and verified by the authors. We’ve published a new preprint showing that a type of particle interaction many physicists expected would not occur can in fact arise under specific conditions. The work focuses on gluons, the particles that carry the strong nuclear force. The preprint is available on arXiv and is being submitted for publication. In the meantime, we welcome feedback from the community. The preprint, titled “Single-minus gluon tree amplitudes are nonzero,” is authored by Alfredo Guevara (Institute for Advanced Study), Alex Lupsasca (Vanderbilt University and OpenAI), David Skinner (University of Cambridge), Andrew Strominger (Harvard University), and Kevin Weil (OpenAI) on behalf of OpenAI. The preprint studies…
71dModel
86d ago
Retiring GPT-4o, GPT-4.1, GPT-4.1 mini, and OpenAI o4-mini in ChatGPT
Retiring GPT‑4o, GPT‑4.1, GPT‑4.1 mini, and OpenAI o4-mini in ChatGPT On February 13, 2026, alongside the previously announced retirement of GPT‑5 (Instant and Thinking), we will retire GPT‑4o, GPT‑4.1, GPT‑4.1 mini, and OpenAI o4-mini from ChatGPT. In the API, there are no changes at this time. While this announcement applies to several older models, GPT‑4o deserves special context. After we first deprecated it and later restored access during the GPT‑5 release, we learned more about how people actually use it day to day. We brought GPT‑4o back after hearing clear feedback from a subset of Plus and Pro users, who told us they needed more time to transition key use cases, like creative ideation, and that they preferred GPT‑4o’s conversational style and warmth. That feedback directly shaped GPT‑5.1 and GPT‑5.2, with improvements to personality, stronger support for creative ideation, and…
86dModel#gpt
93d ago
Inside GPT-5 for Work: How Businesses Use GPT-5
Launched just two and a half years ago, ChatGPT is used by workers across every industry, in every job function, and at companies of every size. Today, over a quarter of U.S. workers—and 45% of those with postgraduate degrees—report using ChatGPT for work. Enterprise tech has always followed a familiar pattern: big upfront costs, long rollouts, and slow adoption before the payoff. ChatGPT broke that mold when people ported it from their personal lives into their jobs. They didn’t need months of training or complicated onboarding; they just started using it to get meaningful work done. Already, we see clear signals. Everyone from scientists to marketers to operators is folding ChatGPT into everyday work. From debugging code to brainstorming campaigns, it’s becoming the first step in core workflows. This report shares new data from our own analysis, combined with peer-reviewed…
93dModel#gpt
93d ago
Inside Praktika's conversational approach to language learning
Inside Praktika's conversational approach to language learning Using GPT‑4.1 and GPT‑5.2, Praktika builds tutoring agents that adapt lessons based on learner behavior, progress, and conversation context. Results 24% Increase in Day-1 retention with GPT-powered learning experiences Results 2x Revenue growth from new multi-agent system Praktika was born from a deeply personal insight: language unlocks opportunity. Co-founders Adam Turaev, Anton Marin, and Ilya Chernyakov all grew up navigating new countries after their families immigrated in search of better opportunities. English quickly became essential, not just for school, but for work, mobility, and belonging. “Learning English was never just about communication,” Turaev said. “It opened doors to international work and career growth.” But traditional language education fell short. Despite years of study, the founders found that while they could read and write fluently, they struggled to speak confidently when it mattered most:…
93dModel#gpt
99d ago
Introducing ChatGPT Go, now available worldwide
Introducing ChatGPT Go, now available worldwide In August 2025, we introduced ChatGPT Go in India as a low-cost subscription designed to expand access to ChatGPT’s most popular features and help more people use advanced AI in their daily lives. Since then, ChatGPT Go has rolled out to 170 additional countries, making it our fastest-growing plan and among the most affordable AI subscriptions globally. In markets where Go has been available, we’ve seen strong adoption and regular everyday use for tasks like writing, learning, image creation, and problem-solving. This early momentum helped inform our decision to make ChatGPT Go available globally. Starting today, ChatGPT Go is rolling out everywhere ChatGPT is available. In the US, Go is available for $8 per month. With this launch, ChatGPT now offers three subscription tiers globally: - ChatGPT Go at $8 USD/month* - ChatGPT…
99dModel#gpt
128d ago
Introducing GPT-5.2-Codex
Introducing GPT‑5.2‑Codex The most advanced agentic coding model for professional software engineering and defensive cybersecurity. Today we’re releasing GPT‑5.2‑Codex, the most advanced agentic coding model yet for complex, real-world software engineering. GPT‑5.2‑Codex is a version of GPT‑5.2 further optimized for agentic coding in Codex, including improvements on long-horizon work through context compaction, stronger performance on large code changes like refactors and migrations, improved performance in Windows environments, and significantly stronger cybersecurity capabilities. As our models continue to advance along the intelligence frontier, we’ve observed that these improvements also translate to capability jumps in specialized domains such as cybersecurity. For example, just last week, a security researcher using GPT‑5.1‑Codex‑Max with Codex CLI found and responsibly disclosed a vulnerability in React that could lead to source code exposure. GPT‑5.2‑Codex has stronger cybersecurity capabilities than any model we’ve released…
128dModel#coding
128d ago
Addendum to GPT-5.2 System Card: GPT-5.2-Codex
Addendum to GPT‑5.2 System Card: GPT‑5.2‑Codex GPT‑5.2‑Codex is our most advanced agentic coding model yet for complex, real-world software engineering. A version of GPT‑5.2 optimized for agentic coding in Codex, it includes further improvements on long-horizon work through context compaction, stronger performance on project-scale tasks like refactors and migrations, and improved performance in Windows environments—and significantly stronger cybersecurity capabilities. This system card outlines the comprehensive safety measures implemented for GPT‑5.2‑Codex. It details both model-level mitigations, such as specialized safety training for harmful tasks and prompt injections, and product-level mitigations like agent sandboxing and configurable network access. GPT‑5.2‑Codex was evaluated under our Preparedness Framework. It is very capable in the cybersecurity domain but does not reach High capability on cybersecurity. We expect current trends of rapidly increasing capability to continue, and for models to cross the High cybersecurity threshold in the…
128dModel
130d ago
The new ChatGPT Images is here
Today, we’re releasing a new version of ChatGPT Images, powered by our new flagship image generation model. Now, whether you’re creating something from scratch or editing a photo, you’ll get the output you’re picturing. It makes precise edits while keeping details intact, and generates images up to 4x faster. Alongside it, we’re introducing a new Images feature within ChatGPT, designed to make image generation delightful—to spark inspiration and make creative exploration effortless. The new Images model is rolling out today in ChatGPT for all users, and is available in the API as GPT Image 1.5. The new Images experience in ChatGPT is also rolling out today for most users, with Business and Enterprise access coming later. Now, when you ask for edits to an uploaded image, the model adheres to your intent more…
135d ago
How Podium is arming 10,000+ SMBs with AI agents
How Podium is arming 10,000+ SMBs with AI agents By using GPT‑5.1 to power AI agents that capture leads and close jobs, Podium gives SMBs a faster path to growth. Results 300% year-over-year AI revenue growth. Results Podium’s AI agents influence billions in revenue by responding in under a minute and delivering 24/7 service Results 30% increase in revenue on average Results 45% increase in lead conversion Podium builds AI software for local businesses—such as HVAC providers, auto dealers, and medspas—that helps them increase revenue by capturing and converting more demand and providing better service to their customers. After 11 years working side-by-side with small- and medium-sized business (SMB) operators, Podium saw the opportunity to harness AI and transform how local businesses market, sell, and grow by turning missed calls and slow replies into booked jobs, more revenue, and…
135dModel
137d ago
How Scout24 is building the next generation of real-estate search with AI
How Scout24 is building the next generation of real-estate search with AI Scout24 is using generative AI to reimagine how people discover where and how they want to live. Scout24 operates Germany’s largest real-estate platform, connecting seekers, homeowners, landlords and agents in one ecosystem. AI has supported areas like fraud detection, marketing efficiency, and property valuation for years, and the rise of powerful large language models created an opportunity to build something new for customers: an intelligent, conversational real-estate assistant. We sat down with Gertrud Kolb, Chief Technology Officer at Scout24, to hear how her team built a GPT‑5 powered search experience, what they learned about “intelligent interaction,” and how they balanced innovation with quality and trust before launch. “Be curious, be open, start doing things. Talk with each other, learn from each other—and have fun.” - GPT‑5 powering Scout24’s…
137dModel
151d ago
Inside JetBrains—the company reshaping how the world writes code
Inside JetBrains—the company reshaping how the world writes code By integrating OpenAI models across its tools and workflows, JetBrains is redefining how developers design, reason, and build with AI. If you don’t write software, you may not know JetBrains. If you do, you almost certainly use them. The company sits behind the scenes of modern development, powering the tools used by roughly 15M professional engineers around the world (88 of the Fortune 100), and is the creator of Kotlin, the official programming language for Android. If you’ve opened IntelliJ, PyCharm, WebStorm, GoLand, or Rider, you’ve used JetBrains. We sat down with Kris Kang, Head of Product at JetBrains, to explore how the team is using OpenAI models to change how developers build—not to replace what they do, but to raise the ceiling. “Developers don’t just write code. They review it, reason about it,…
151dModel#coding
152d ago
GPT-5 and the future of mathematical discovery
How GPT‑5 helped mathematician Ernest Ryu solve a 40-year-old open problem How a mathematician used GPT‑5 to explore ideas faster and find a path to solving a long-standing optimization problem. Every significant math problem has a story—someone who posed a question, someone who tried to solve it, someone who could not, and eventually, maybe, someone who could. The story behind answering one frustratingly simple optimization theory question(opens in a new window) is no different, except the researcher worked with a tool capable of quickly surfacing ideas and techniques from across a wide range of mathematical papers. With 15 years in applied mathematics and optimization theory, Professor Ernest Ryu of the University of California, Los Angeles (UCLA), was curious about the large language model (LLM) everyone was talking about. In 2023, he decided to test ChatGPT‑3.5’s ability to solve simple math…
152dModel
163d ago
Introducing GPT-5.1 for developers
Introducing GPT‑5.1 for developers Today we’re releasing GPT‑5.1 in the API platform, the next model in the GPT‑5 series that balances intelligence and speed for a wide range of agentic and coding tasks. GPT‑5.1 dynamically adapts how much time it spends thinking based on the complexity of the task, making the model significantly faster and more token-efficient on simpler everyday tasks. The model also features a “no reasoning” mode to respond faster on tasks that don’t require deep thinking, while maintaining the frontier intelligence of GPT‑5.1. To make GPT‑5.1 even more efficient, we’re releasing extended prompt caching with up to 24-hour cache retention, driving faster responses for follow-up questions at a lower cost. Our Priority Processing customers will also experience noticeably faster performance with GPT‑5.1 over GPT‑5. On coding, we’ve worked closely with startups like…
163dModel#coding
164d ago
GPT-5.1: A smarter, more conversational ChatGPT
GPT‑5.1: A smarter, more conversational ChatGPT We’re upgrading GPT‑5 while making it easier to customize ChatGPT. Starting to roll out today to everyone, beginning with paid users. Today we’re upgrading the GPT‑5 series with the release of: - GPT‑5.1 Instant: our most-used model, now warmer, more intelligent, and better at following your instructions. - GPT‑5.1 Thinking: our advanced reasoning model, now easier to understand and faster on simple tasks, more persistent on complex ones. We heard clearly from users that great AI should not only be smart, but also enjoyable to talk to. GPT‑5.1 improves meaningfully on both intelligence and communication style. We’re also making it easier for you to shape ChatGPT’s tone. Preferences on chat style vary—from person to person and even from conversation to conversation—so we’re introducing more intuitive and effective controls so ChatGPT can better match the…
164dModel#gpt
169d ago
Notion’s GPT‑5 rebuild unlocks autonomous AI workflows
Notion’s GPT‑5 rebuild unlocks autonomous AI workflows By rebuilding their agent system with GPT‑5, Notion created an AI workspace that can reason, act, and adapt across workflows. Results 7.6% Improvement over state-of-the-art models on outputs aligned with real user feedback In late 2022, within weeks of getting access to GPT‑4, Notion had already shipped a writing assistant, rolled out workspace-wide Q&A features, and integrated OpenAI models deeply across its search, content, and planning tools. But as models advanced, and users began asking agents to complete entire workflows, Notion’s team saw limits in their system architecture. The old pattern of prompting models to do isolated tasks was limiting the ceiling of what was possible on their platform. Agents needed to make decisions, orchestrate tools, and reason through ambiguity, and that shift required more than prompt engineering. “We didn’t want…
169dModel#agents
171d ago
How CRED is tapping AI to deliver premium customer experiences
How CRED is tapping AI to deliver premium customer experiences A conversation with Swamy Seetharaman of CRED. Our Executive Function series features perspectives from leaders driving transformation through AI. CRED is an India-based members-only club that rewards creditworthy individuals for their timely credit card bill payments by providing them with exclusive offers and access to premium experiences. Since 2018, CRED has built its brand on delivering seamless, secure, and beautifully designed digital products for India’s most affluent consumers. Today, over 15 million members use CRED each month. As the company scales, maintaining this level of quality has required new approaches to product development, service, and internal collaboration. We sat down with Swamy Seetharaman of CRED to understand how the company is working with OpenAI to create premium, concierge-like experiences at scale. CRED has always served India’s most discerning users. What…
171dModel
179d ago
Doppel’s AI defense system stops attacks before they spread
Doppel’s AI defense system stops attacks before they spread With GPT‑5 and reinforcement fine-tuning (RFT), Doppel cut analyst workloads by 80% and now mitigates threats in minutes instead of hours. Results 80% reduced analyst workflows Results 3x threat handling capacity A single impersonation site can launch, target thousands of users, and vanish in under an hour. That’s more than enough time for an attacker to do real damage. And with generative tools, they can spin up hundreds more just like it. Doppel was built to defend organizations from deepfakes and online impersonations, but quickly realized AI meant threats could scale infinitely. Attackers no longer needed to handcraft scams; they could generate endless variants of phishing kits, spoofed domains, and impersonation accounts in seconds. “Damage from phishing attacks can happen within minutes as they spread across social media and messaging channels.…
179dModel#fine-tuning
201d ago
Introducing AgentKit, new Evals, and RFT for agents
Today we’re launching AgentKit, a complete set of tools for developers and enterprises to build, deploy, and optimize agents. Until now, building agents meant juggling fragmented tools—complex orchestration with no versioning, custom connectors, manual eval pipelines, prompt tuning, and weeks of frontend work before launch. With AgentKit, developers can now design workflows visually and embed agentic UIs faster using new building blocks like: - Agent Builder: a visual canvas for creating and versioning multi-agent workflows - Connector Registry: a central place for admins to manage how data and tools connect across OpenAI products - ChatKit: a toolkit for embedding customizable chat-based agent experiences in your product We’re also expanding evaluation capabilities with new features like datasets, trace grading, automated prompt optimization, and third-party model support to measure and improve agent performance. Since releasing the Responses API and Agents SDK in…
205d ago
With GPT-5, Wrtn builds lifestyle AI for millions in Korea
With GPT‑5, Wrtn builds lifestyle AI for millions in Korea Wrtn scaled AI apps to 6.5 million users across Korea using GPT‑5—now they’re taking their playbook across East Asia. In Korea, millions of people turn to Wrtn’s AI apps to chat, learn, and get things done. What sets the company apart isn’t just better models—it’s how people keep discovering new ways to use its tools. When Wrtn launched, it focused on individual productivity, creating a set of AI-powered writing assistants and note-taking tools to make everyday work faster. But productivity was always just the beginning. The team’s bigger vision was to make AI a natural language interface for everyday life. In Korea, where digital culture already embraces character-driven platforms like KakaoTalk and LINE Friends, Wrtn saw an opportunity for what they now call Lifestyle AI: technology…
205dModel
207d ago
Sora 2 System Card
Sora 2 is our new state-of-the-art video and audio generation model. Building on the foundation of Sora, this new model introduces capabilities that have been difficult for prior video models to achieve, such as more accurate physics, sharper realism, synchronized audio, enhanced steerability, and an expanded stylistic range. The model follows user direction with high fidelity, enabling the creation of videos that are both imaginative and grounded in real-world dynamics. Sora 2 expands the toolkit for storytelling and creative expression, while also serving as a step toward models that can more accurately simulate the complexity of the physical world. Sora 2 will be available via sora.com, in a new standalone iOS Sora app, and in the future it will be available via our API. Sora 2’s advanced capabilities require consideration of new potential risks, including nonconsensual use…
207dModel#multimodal
[PB]PyTorch Blog· 1 articlesvisit →
37d ago
TorchSpec: Speculative Decoding Training at Scale
Introduction Over the past year, large language models have rapidly expanded in both scale and capability. Frontier models such as Kimi K2.5, GLM 5, and Qwen 3.5 now operate with hundreds of billions of parameters and context windows stretching to millions of tokens, enabling long-context reasoning, agentic workflows, and complex tool use. As these models grow more capable, efficient inference has become one of the most critical systems challenges in LLM deployment. Speculative decoding is one of the most effective techniques for accelerating LLM generation. With speculative decoding, a lightweight draft model proposes several tokens ahead, while a larger target model verifies them in a single forward pass. When predictions are accepted, multiple tokens can be generated at once, improving throughput and latency. Recent approaches such as MTP (Multi-Token Prediction) and EAGLE-3 demonstrate that well-trained draft models can deliver consistent…
37dModel#qwen#coding#trainingby TorchSpec team, Mooncake team
[RB]Replicate Blog· 1 articlesvisit →
186d ago
Extract text from documents and images with Datalab Marker and OCR
Extract text from documents and images with Datalab Marker and OCR Datalab’s state-of-the-art document parsing and text extraction models are now on Replicate. Marker turns PDF, DOCX, PPTX, images (and more!) into markdown or JSON. It formats tables, math, and code, extracts images, and can pull specific fields when you pass a JSON Schema. OCR detects text in ninety languages from images and documents, and returns reading order and table grids. The Marker model is based on the popular open source Marker project (29k GitHub stars) and OCR is based on Surya (19k GitHub stars). Run Marker and OCR on Replicate:

Run Marker

```python
import replicate

output = replicate.run(
    "datalab-to/marker",
    input={
        "file": open("report.pdf", "rb"),
        "mode": "balanced",        # fast / balanced / accurate
        "include_metadata": True,  # return page-level JSON metadata
    },
)
print(output["markdown"][:400])
```

Run OCR

```python
import replicate

output = replicate.run(
    "datalab-to/ocr",
    input={…
```
186dModel
[SWB]Simon Willison Blog· 11 articlesvisit →
3d ago
Is Claude Code going to cost $100/month? Probably not - it's all very confusing
Is Claude Code going to cost $100/month? Probably not—it’s all very confusing 22nd April 2026 Anthropic today quietly (as in silently, no announcement anywhere at all) updated their claude.com/pricing page (but not their Choosing a Claude plan page, which shows up first for me on Google) to add this tiny but significant detail (arrow is mine, and it’s already reverted): The Internet Archive copy from yesterday shows a checkbox there. Claude Code used to be a feature of the $20/month Pro plan, but according to the new pricing page it is now exclusive to the $100/month or $200/month Max plans. Update: don’t miss the update to this post, they’ve already changed course a few hours after this change went live. So what the heck is going on? Unsurprisingly, Reddit and Hacker News and Twitter all caught fire. I didn’t believe…
4d ago
Quoting Andreas Påhlsson-Notini
21st April 2026 AI agents are already too human. Not in the romantic sense, not because they love or fear or dream, but in the more banal and frustrating one. The current implementations keep showing their human origin again and again: lack of stringency, lack of patience, lack of focus. Faced with an awkward task, they drift towards the familiar. Faced with hard constraints, they start negotiating with reality. — Andreas Påhlsson-Notini, Less human AI agents, please. Recent articles - DeepSeek V4 - almost on the frontier, a fraction of the price - 24th April 2026 - Extract PDF text in your browser with LiteParse for the web - 23rd April 2026 - A pelican for GPT-5.5 via the semi-official Codex backdoor API - 23rd April 2026
4dModel
4d ago
scosman/pelicans_riding_bicycles
21st April 2026 - Link Blog scosman/pelicans_riding_bicycles (via) I firmly approve of Steve Cosman's efforts to pollute the training set of pelicans riding bicycles. (To be fair, most of the examples I've published count as poisoning too.)
4dModel#training
4d ago
Where's the raccoon with the ham radio? (ChatGPT Images 2.0)
Where’s the raccoon with the ham radio? (ChatGPT Images 2.0) 21st April 2026 OpenAI released ChatGPT Images 2.0 today, their latest image generation model. On the livestream Sam Altman said that the leap from gpt-image-1 to gpt-image-2 was equivalent to jumping from GPT-3 to GPT-5. Here’s how I put it to the test. My prompt: Do a where's Waldo style image but it's where is the raccoon holding a ham radio gpt-image-1 First as a baseline here’s what I got from the older gpt-image-1 using ChatGPT directly: I wasn’t able to spot the raccoon—I quickly realized that testing image generation models on Where’s Waldo style images (Where’s Wally in the UK) can be pretty frustrating! I tried getting Claude Opus 4.7 with its new higher resolution inputs to solve it but it was convinced there was a raccoon it couldn’t…
5d ago
Claude Token Counter, now with model comparisons
20th April 2026 - Link Blog Claude Token Counter, now with model comparisons. I upgraded my Claude Token Counter tool to add the ability to run the same count against different models in order to compare them. As far as I can tell Claude Opus 4.7 is the first model to change the tokenizer, so it's only worth running comparisons between 4.7 and 4.6. The Claude token counting API accepts any Claude model ID though so I've included options for all four of the notable current models (Opus 4.7 and 4.6, Sonnet 4.6, and Haiku 4.5). In the Opus 4.7 announcement Anthropic said: Opus 4.7 uses an updated tokenizer that improves how the model processes text. The tradeoff is that the same input can map to more tokens—roughly 1.0–1.35× depending on the content type. I pasted the Opus 4.7 system…
5dModel#claude
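The roughly 1.0–1.35× multiplier quoted above maps directly onto input-token cost. A minimal sketch of that arithmetic, using made-up counts and a made-up per-request price (illustrative numbers only, not real Anthropic measurements; real counts would come from a token-counting API):

```python
# Illustrative tokenizer-change arithmetic: how a higher token count for
# the same text scales input cost. All numbers below are invented.

def inflation(old_count: int, new_count: int) -> float:
    """Ratio of the new tokenizer's count to the old tokenizer's count."""
    return new_count / old_count

def scaled_input_cost(old_cost: float, factor: float) -> float:
    """Input cost scales linearly with the number of input tokens."""
    return old_cost * factor

factor = inflation(1000, 1270)   # within the stated 1.0-1.35x range
print(f"{factor:.2f}x tokens")   # → 1.27x tokens
print(f"new cost: ${scaled_input_cost(0.0150, factor):.4f}")
```

The same helper works in reverse for sanity-checking a comparison tool's output: a ratio outside the published 1.0–1.35 band would suggest the two counts weren't taken over identical text.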
7d ago
Changes in the system prompt between Claude Opus 4.6 and 4.7
Changes in the system prompt between Claude Opus 4.6 and 4.7 18th April 2026 Anthropic are the only major AI lab to publish the system prompts for their user-facing chat systems. Their system prompt archive now dates all the way back to Claude 3 in July 2024 and it’s always interesting to see how the system prompt evolves as they publish new models. Opus 4.7 shipped the other day (April 16, 2026) with a Claude.ai system prompt update since Opus 4.6 (February 5, 2026). I had Claude Code take the Markdown version of their system prompts, break that up into separate documents for each of the models and then construct a Git history of those files over time with fake commit dates representing the publication dates of each updated prompt—here’s the prompt I used with Claude Code for the web.…
10d ago
Quoting John Gruber
15th April 2026 The real goldmine isn’t that Apple gets a cut of every App Store transaction. It’s that Apple’s platforms have the best apps, and users who are drawn to the best apps are thus drawn to the iPhone, Mac, and iPad. That edge is waning. Not because software on other platforms is getting better, but because third-party software on iPhone, Mac, and iPad is regressing to the mean, to some extent, because fewer developers feel motivated — artistically, financially, or both — to create well-crafted idiomatic native apps exclusively for Apple’s platforms.
10dModel#coding
10d ago
Gemini 3.1 Flash TTS
15th April 2026 Tool Gemini 3.1 Flash TTS — Convert text to natural-sounding speech using Google's Gemini 3.1 Flash TTS model with support for both single-speaker and multi-speaker conversation modes. The tool allows you to customize voice selection, apply directorial tags like `[whisper]` and `[short pause]` for dynamic delivery, and download the generated audio as a WAV file. Requires a valid Gemini API key to function. See my notes on Google's new Gemini 3.1 Flash TTS text-to-speech model.
10dModel#gemini
10d ago
Quoting Kyle Kingsbury
15th April 2026 I think we will see some people employed (though perhaps not explicitly) as meat shields: people who are accountable for ML systems under their supervision. The accountability may be purely internal, as when Meta hires human beings to review the decisions of automated moderation systems. It may be external, as when lawyers are penalized for submitting LLM lies to the court. It may involve formalized responsibility, like a Data Protection Officer. It may be convenient for a company to have third-party subcontractors, like Buscaglia, who can be thrown under the bus when the system as a whole misbehaves. — Kyle Kingsbury, The Future of Everything is Lies, I Guess: New Jobs
10dModel#multimodal
11d ago
Trusted access for the next era of cyber defense
14th April 2026 - Link Blog Trusted access for the next era of cyber defense (via) OpenAI's answer to Claude Mythos appears to be a new model called GPT-5.4-Cyber: In preparation for increasingly more capable models from OpenAI over the next few months, we are fine-tuning our models specifically to enable defensive cybersecurity use cases, starting today with a variant of GPT‑5.4 trained to be cyber-permissive: GPT‑5.4‑Cyber. They're also extending a program they launched in February (which I had missed) called Trusted Access for Cyber, where users can verify their identity (via a photo of a government-issued ID processed by Persona) to gain "reduced friction" access to OpenAI's models for cybersecurity work. Honestly, this OpenAI announcement is difficult to follow. Unsurprisingly they don't mention Anthropic at all, but much of the piece emphasizes their many years of existing cybersecurity work…
12d ago
Quoting Bryan Cantrill
13th April 2026 The problem is that LLMs inherently lack the virtue of laziness. Work costs nothing to an LLM. LLMs do not feel a need to optimize for their own (or anyone's) future time, and will happily dump more and more onto a layercake of garbage. Left unchecked, LLMs will make systems larger, not better — appealing to perverse vanity metrics, perhaps, but at the cost of everything that matters. As such, LLMs highlight how essential our human laziness is: our finite time forces us to develop crisp abstractions in part because we don't want to waste our (human!) time on the consequences of clunky ones. — Bryan Cantrill, The peril of laziness lost
12dModel
[TVA]The Verge AI· 5 articlesvisit →
2d ago
Claude is connecting directly to your personal apps like Spotify, Uber Eats, and TurboTax
Claude users can access more apps with Anthropic’s AI now thanks to new connectors for everything from hiking to grocery shopping. Anthropic already supported connecting numerous work-related apps to Claude, like Microsoft apps, but this expansion focuses on personal apps like Audible, Spotify, Uber, AllTrails, TripAdvisor, Instacart, TurboTax, and others. Anthropic says the new app connectors are available to all Claude users, ‘with mobile in beta.’ Some of these apps, such as Spotify, already have similar connectors in OpenAI’s ChatGPT. Once an app is connected, Claude will suggest relevant connected apps directly in your conversations, like using AllTrails for hike recommendations. Anthropic notes in its blog post announcing the new…
2dModel#claudeby Stevie Bonifield
2d ago
Meta is laying off 10 percent of its staff
Meta is planning to lay off around 10 percent of employees in May, according to a memo from the company’s chief people officer, Janelle Gale, published by Bloomberg. That means approximately 8,000 people will see their jobs cut. Meta will also be closing around 6,000 open roles, according to Gale. Meta is making the cuts to help ‘offset the other investments we’re making.’ The cuts follow Meta’s significant investments in AI, including spending huge sums to hire top talent and build data centers. The company forecast in January that it will spend $115 billion to $135 billion in capital expenditures in 2026 — a significant increase from its $72.22 billion in capital expenditures for 2025. The increase is to…
2d · Model · by Jay Peters
2d ago
Anthropic’s Mythos breach was humiliating
Anthropic’s tightly controlled rollout of Claude Mythos has taken an awkward turn. After spending weeks insisting the AI model is so capable at cybersecurity that it is too dangerous to release publicly, it appears the model fell into the wrong hands anyway. There’s no good excuse for letting hackers into an AI model too dangerous for public release. According to Bloomberg, a “small group of unauthorized users” has had access to Mythos — whose existence was first revealed in a leak — since the day Anthropic announced plans to offer it to a select group of companies for testing. Anthropic says it is investigating. That’s a rough look for a company that has built its brand on taking AI…
2d · Model · #claude · by Robert Hart
3d ago
Google Meet will take AI notes for in-person meetings too
Google’s AI meeting notetaker is no longer limited to Google Meet — Gemini can also generate summaries and transcripts of in-person meetings now, as well as meetings on Zoom and Microsoft Teams, as first reported by 9to5Google. Users can also get AI summaries and transcripts for meetings in Zoom and Teams. Support for in-person meetings was previously limited to alpha users and only available on Android. Google’s support page for the feature notes that, “If a user who is not in person wants to join the meeting, you can transition the meeting to a normal video call.” The feature also works for impromptu meetings — Google says you “don’t need to be in a meeting room”…
3d · Model · #gemini · by Stevie Bonifield
3d ago
Anthropic’s most dangerous AI model just fell into the wrong hands
Anthropic’s Mythos AI model, a powerful cybersecurity tool that the company said could be dangerous in the wrong hands, has been accessed by a “small group of unauthorized users,” Bloomberg reports. An unnamed member of the group, identified only as “a third-party contractor for Anthropic,” told the publication that members of a private online forum got into Mythos via a mix of tactics, utilizing the contractor’s access and “commonly used internet sleuthing tools.” A Discord group has had access to the Mythos model for two weeks. The Claude Mythos Preview is a new general-purpose model that’s capable of identifying and exploiting vulnerabilities “in every major operating system and every major web browser when directed by a…
3d · Model · by Jess Weatherbed
[WA] Wired AI · 1 article · visit →
8d ago
OpenAI Executive Kevin Weil Is Leaving the Company
Kevin Weil, OpenAI’s former chief product officer who was recently tapped to build a new AI workspace for scientists, Prism, is leaving the company, WIRED has confirmed. Weil was previously an early executive leading product at Instagram. “Today is my last day at OpenAI, as OpenAI for Science is being decentralized into other research teams,” Weil said in a social media post on Friday, shortly after WIRED reported his departure. “It’s been a mind-expanding two years, from Chief Product Officer to joining the research team and starting OpenAI for Science.” Weil did not immediately respond to a request for comment from WIRED. OpenAI is also sunsetting Prism, which the company launched as a web app in January to give scientists a better way to work with AI. The company is folding the roughly 10-person team behind it under OpenAI’s head…
8d · Model · #gpt · by Maxwell Zeff