$ timeahead_
★ TOP STORY · [TVA] · Agents · 2d ago

You’re about to feel the AI money squeeze

Earlier this month, millions of OpenClaw users woke up to a sweeping mandate: The viral AI agent tool, which this year took the worldwide tech industry by storm, had been severely restricted by Anthropic. Anthropic, like other leading AI labs, was under immense pressure to lessen the strain on its systems and start turning a profit. So if the users wanted its Claude AI to power their popular agents, they’d have to start paying handsomely for the privilege. “Our subscriptions weren’t built for the usage patterns of these third-party tools,” wrote Boris Cherny, head of Claude Code, on X. “We want to be intentional in managing our growth to continue to serve our customers sustainably long-term. This change is a step toward that.” The announcement was a sign of the times. Investors have poured hundreds of billions of dollars into…

The Verge AI · read →
▲ trending · last 48h · view all →
[AOA] Ahead of AI (Sebastian Raschka) · 1 article · visit →
21d ago
Components of A Coding Agent
Components of A Coding Agent How coding agents use tools, memory, and repo context to make LLMs work better in practice In this article, I want to cover the overall design of coding agents and agent harnesses: what they are, how they work, and how the different pieces fit together in practice. Readers of my Build a Large Language Model (From Scratch) and Build a Large Reasoning Model (From Scratch) books often ask about agents, so I thought it would be useful to write a reference I can point to. More generally, agents have become an important topic because much of the recent progress in practical LLM systems is not just about better models, but about how we use them. In many real-world applications, the surrounding system, such as tool use, context management, and memory, plays as much of a…
21d · Agents · #agents #coding · by Sebastian Raschka, PhD
[AWS] AWS Machine Learning Blog · 3 articles · visit →
8d ago
From hours to minutes: How Agentic AI gave marketers time back for what matters
Artificial Intelligence From hours to minutes: How Agentic AI gave marketers time back for what matters Your marketing team loses hours to page assembly, coordination emails, and review cycles. These manual workflows keep teams from their most important work: identifying what problems customers face, crafting messages that resonate, and building campaigns that drive meaningful engagement. In this post, we share how AWS Marketing’s Technology, AI, and Analytics (TAA) team worked with Gradial to build an agentic AI solution on Amazon Bedrock for accelerating content publishing workflows. The solution reduced webpage assembly time from up to four hours to approximately ten minutes (a reduction of over 95%) while maintaining quality standards across enterprise content management systems (CMS). Our marketing teams can now publish content faster and more consistently, freeing them to focus on finding more effective ways to reach and serve…
8d · Agents · #agents · by Ishara Premadasa
10d ago
Rede Mater Dei de Saúde: Monitoring AI agents in the revenue cycle with Amazon Bedrock AgentCore
Artificial Intelligence Rede Mater Dei de Saúde: Monitoring AI agents in the revenue cycle with Amazon Bedrock AgentCore This post is cowritten by Renata Salvador Grande, Gabriel Bueno and Paulo Laurentys at Rede Mater Dei de Saúde. The growing adoption of multi-agent AI systems is redefining critical operations in healthcare. In large hospital networks, where thousands of decisions directly impact cash flow, service delivery times, and the risk of claim denials, the ability to monitor, track, and govern AI agents has become essential for operational sustainability. This is the journey of Rede Mater Dei de Saúde, which is implementing its suite of 12 AI agents using Amazon Bedrock AgentCore, a comprehensive service that provides agent runtime, tool integration, memory management, and built-in observability for production AI agents. About Rede Mater Dei de Saúde With 45 years of history, Rede Mater…
10d · Agents · #agents #observability · by Renata Salvador Grande
11d ago
Spring AI SDK for Amazon Bedrock AgentCore is now Generally Available
Artificial Intelligence Spring AI SDK for Amazon Bedrock AgentCore is now Generally Available Agentic AI is transforming how organizations use generative AI, moving beyond prompt-response interactions to autonomous systems that can plan, execute, and complete complex multi-step tasks. While early proof of concepts in Agentic AI spaces excite business stakeholders, scaling them to production requires addressing scalability, governance, and security challenges. Amazon Bedrock AgentCore is an Agentic AI platform to build, deploy, and operate agents at scale using any framework and any model. Java developers want to build AI agents using known Spring patterns, but production deployment requires infrastructure that’s complex to implement from scratch. Amazon Bedrock AgentCore provides building blocks like managed runtime infrastructure (scalability, reliability, security, observability), short- and long-term memory, browser automation, sandboxed code execution, and evaluations. Integrating these capabilities into a Spring application currently requires writing…
11d · Agents · #agents #fine-tuning #coding #open-source · by Andrei Shakirin
[CB] CrewAI Blog · 12 articles · visit →
3d ago
How a Healthcare Provider Cuts Nurse Intake Work by 80% with Agentic AI · by Alex Clay · Apr 22, 2026
How a Healthcare Provider Cuts Nurse Intake Work by 80% with Agentic AI Discover how healthcare automates patient intake using agentic AI, cutting nurse intake time by up to 80% to improve efficiency and patient experience. Manual Intake Overloads Nurses and Patients Three nurses spend four hours daily on patient intake at many healthcare providers. These clinicians spend a third of their shift reading, assessing insurance eligibility, and routing forms instead of delivering care. It exhausts staff and costs money. When eligibility checks lag or forms are misrouted, patient satisfaction falls and costs rise. Intake Bottlenecks Waste Thousands of Nursing Hours On average, nurses at large health systems spend 4 hours each day on intake forms. When handling thousands of patients, this wastes thousands of nursing hours weekly. Manual workflows cause insurance verification errors above 20%, triggering denials and delayed…
3d · Agents · #agents
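The intake arithmetic in the excerpt above can be made concrete with a quick back-of-the-envelope sketch. The numbers below are illustrative assumptions, not figures from the article:

```python
# Hypothetical illustration of the intake-time math described above.
# All numbers are assumptions for the sketch, not figures from the article.
nurses_per_site = 3      # nurses on intake duty per site
hours_per_day = 4        # intake hours per nurse per day (matches the article)
sites = 200              # sites in a large health system (assumed)
days_per_week = 5

weekly_hours = nurses_per_site * hours_per_day * sites * days_per_week
print(weekly_hours)      # 12000 nursing hours per week
```

Even with modest assumed numbers, the total lands in the "thousands of nursing hours weekly" range the excerpt describes.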
8d ago
How One E-Commerce Giant Automates Returns and Refunds with Agentic AI · by Alex Clay · Apr 17, 2026
How One E-Commerce Giant Automates Returns and Refunds with Agentic AI E-commerce returns automation cuts costs and boosts refunds with multi-agent AI handling classification, verification, and response drafting efficiently. Returns Were a Cost Black Hole This major e-commerce company handles 50,000 orders daily. That means thousands of returns and refund requests overwhelm customer support. Historically this has been slow, inconsistent, and exhausting. They spent $2 million yearly on staff just to keep up, yet still faced errors, delays, and frustrated shoppers. Returns aren’t a side problem. Automated solutions struggle because returns involve order history, shifting policies, fraud risk, and customer nuances. Manual or rigid systems collapse under complexity. Return rates of 15% to 30% in some sectors drive up costs, often $20 per return just to process. The pressure to cut costs while improving service is intense. A Crew of Specialist…
8d · Agents · #agents
11d ago
Agent Harnesses Are Dead. Long Live Agent Harnesses. · by João (Joe) Moura · Apr 14, 2026
Agent Harnesses Are Dead. Long Live Agent Harnesses. I said it at Dev Day 2025 (DeepLearning.AI conference) last year: Frameworks are cheap. A few people in the audience looked uncomfortable. But I think the statement aged well. Now it's not just frameworks. Harnesses are getting the same treatment, and the cycle is only getting faster. Building anything is getting cheaper by the month. Vibe-code an app over the weekend. Spin up an agent with a few API calls. The distance between idea and working prototype has collapsed to almost nothing, and will collapse to nothing faster than most expect. And yet the industry is spending most of its energy debating what to call the building tools. We built our harness back in 2023 when we launched, around what people now agree is the right abstraction: multi-agentic…
11d · Agents · #agents
11d ago
How a Global CPG Automates Supply Chain Demand Forecasting with Agentic AI · by Alex Clay · Apr 14, 2026
How a Global CPG Automates Supply Chain Demand Forecasting with Agentic AI Discover how CPG supply chains use agentic AI to automate demand forecasting, boosting accuracy and speed while cutting manual effort. Excel-Built Forecasts Stall Supply Chains In a leading global beverage company, weekly demand forecasting felt like a spreadsheet battlefield. Demand planners stitched data from SAP, Databricks, and scattered Excel files. The manual cycle was slow, clunky, and error-prone. Forecast accuracy hovered around 70%. A 30% SKU-level error causes costly stockouts or bloated inventory. This problem spans the industry: CPG forecasting errors cost billions, erode margins, and hurt customer satisfaction. Why Manual Forecasting Fails Supply Chains This multinational brewer, with dozens of brands, struggled with Excel and siloed data. The fractured picture stifled supply chain agility. Forecasting was guesswork masked as science, with 25-35% errors annually. Global inventory distortion costs…
11d · Agents · #agents
19d ago
How Enterprise AI SaaS Closes Adoption Gaps with Multi-Agent Crews · by João (Joe) Moura · Apr 6, 2026
How Enterprise AI SaaS Closes Adoption Gaps with Multi-Agent Crews Enterprise AI SaaS automates customer enablement with a 5-agent workflow to close adoption gaps, reduce churn, and scale training across industries. Most enterprise AI customers barely use what they paid for. Here's a pattern I keep seeing: company buys an AI platform, gets through a painful onboarding, builds one or two use cases, then stalls. The FDE team is stretched across too many accounts to go deep on any of them. Adoption takes six months when it should take weeks. By the time the customer sees real value, if they ever do, renewal is already at risk. The instinct is to throw more training at it. More docs, more workshops, more enablement calls. It doesn't work. You can't train your way out of a product that's hard to adopt. Manual…
23d ago
You're building agent security in the wrong order · by João (Joe) Moura · Apr 2, 2026
You're building agent security in the wrong order The agent security market woke up. In two weeks, I've seen companies shipping runtime identity enforcement for autonomous agents, a full platform for discovering shadow agents, revoking permissions in real time, and much more. Serious teams behind them. I respect the work. But they're all solving step three. The sequence problem I've spent the last two years watching enterprise teams deploy agents, and we've processed billions of executions at CrewAI; the sequencing pattern is almost always the same. A team gets the mandate: "deploy AI agents." The CISO immediately asks about security. The board wants compliance answers. So the team starts there: IAM policies, authorization scopes, runtime monitoring. They build a security stack around their agents. Then they actually try to run the agents. And the agents can't reliably find…
23d · Agents · #agents
25d ago
CrewAI Selected for the Enterprise Tech 30 · by João (Joe) Moura · Mar 31, 2026
CrewAI Selected for the Enterprise Tech 30 Year One: Vision. Year Two: Proof. For the second year in a row, CrewAI has been selected for Enterprise Tech 30! Exciting! If you're not familiar with the ET30, here's what makes it different from every other "top companies" list: it's not editorial picks, it's not pay-to-play. It's a structured vote by 98 investors across 85 firms managing a combined $2.6 trillion in assets, from small to extremely large organizations. These are the investors who fund the companies building the next decade of enterprise technology, and they vote based on what they're seeing in the market. This year, CrewAI was listed under Agent Development within the AI Infrastructure & Development category. That category now represents 43% of the entire ET30; in 2019, it was 0%. Let that land for a second: zero to…
39d ago
Orchestrating Self-Evolving Agents with CrewAI and NVIDIA NemoClaw · by João (Joe) Moura · Mar 17, 2026
Orchestrating Self-Evolving Agents with CrewAI and NVIDIA NemoClaw The technology landscape in early 2026 is undergoing a shift similar to the rise of the modern web stack. AI is moving beyond simple prompt-based interactions toward autonomous, continuously evolving agents. These long-running agents, which many experienced recently on a more personal level through “claws”, can break down goals, execute code, and operate independently for extended periods. While this unlocks major productivity gains, it also creates new trust and security challenges for enterprises. Running autonomous agents safely requires strong orchestration, secure runtimes, and high governance levels. The combination of CrewAI for agent orchestration and NVIDIA NemoClaw for secure execution provides the capability, autonomy, and safety needed to run these systems in production. A key moment in this shift came with the OpenClaw project in early 2026, which demonstrated that self-evolving agents could…
39d · Agents · #agents #gpu
187d ago
CrewAI OSS 1.0 - We are going GA
1.4 Billion Agentic Automations, 60% of the Fortune 500, 40k GitHub stars. CrewAI OSS is going GA. During our first year at CrewAI, we set ourselves a goal: build the framework capable of orchestrating a billion autonomous agents — securely, reliably, at scale. A little over a year later, we’ve done more than that. Today we’re announcing CrewAI OSS v1.0, the same open-source core now powering 1.4 billion Agentic Automations across the world’s largest enterprises, at all degrees of complexity. Two years ago we bet that large language models shouldn’t be micromanaged—they should be delegated real work. Last week, Andrew Ng called us visionaries for betting early on one idea: As models get better, engineers will remove the scaffolding [all rules and forced structure] and delegate more, not less. Andrew Ng invested in CrewAI over a year ago — joined by…
187d · Agents · #agents
338d ago
Creating a center of gravity for the Agentic AI ecosystem
Creating a center of gravity for the Agentic AI ecosystem First things first, thank you to everyone who attended our very first launch week webinar! It was absolutely incredible to see 2,600+ people register for a webinar in a matter of days, and even more so to see everyone immediately introducing themselves, connecting with other community members and asking our speakers both technical and non-technical questions. It was a testament to the excitement and enthusiasm we all share. It would be the understatement of the century to say there is a lot going on when it comes to agents. There are a lot of frameworks, tools and platforms for everything from orchestration and memory to monitoring and evaluation – and everything in between for that matter. At CrewAI, we want to help by creating a center of gravity for everyone…
338d · Agents · #agents
341d ago
CrewAI Factory
Unlocking agent-native transformation with CrewAI Factory and NVIDIA Day 1 of CrewAI Launch Week 01 (May 2025) CrewAI Enterprise makes it easy for organizations to build, run and monitor Agentic AI workflows in the cloud – and to automate business processes on day one. However, in practice, large enterprises often need to run AI applications on their own infrastructure…
341d · Agents · #agents
[FAB] Fireworks AI Blog · 3 articles · visit →
157d ago
11/19/2025 50 Trillion Tokens Per Day: The State of Agent Environments
TL;DR — Agents and LLMs are processing 1.5 quadrillion tokens per month, and reached a massive scale over the past year. But the real story for the next 12 months isn't about which models are smartest—it's about the complex production environments where agents actually do work, optimizing not only the underlying models but the tools, workflows, and data in their environments. What emerges is a clear hierarchy where the ability to create high-quality environments is a determinant of market success—the companies building complete environments rather than just LLM wrappers are capturing the most value. For the last two years, the conversation around AI agents has been dominated by potential. Today, that conversation has fundamentally shifted from potential to production. Businesses have moved beyond prototyping, shipping agents that handle customer support, write enterprise-quality code, and manage complex workflows at scale. The…
157d · Agents · #agents
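The headline's daily figure and the TL;DR's monthly figure are consistent with each other, which a one-line check confirms:

```python
# Consistency check between the headline and TL;DR figures:
# 50 trillion tokens/day over a 30-day month is 1.5 quadrillion tokens/month.
tokens_per_day = 50e12
tokens_per_month = tokens_per_day * 30
print(tokens_per_month)  # 1.5e+15, i.e. 1.5 quadrillion
```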
231d ago
6/9/2025 Reinforcement Fine Tuning (Beta): Train expert open models to surpass closed frontier models
Today, we’re excited to announce the beta release of Reinforcement Fine-Tuning (RFT), a powerful new technique to create expert models for complex tasks across agentic reasoning, function calling, coding, and more. RFT can improve model quality with just a few examples. Compared with closed frontier models, our alpha users have been able to train open models to: Fireworks makes it easy to train expert models with RFT, by specifying an evaluator function that grades model outputs, with no infrastructure setup required! RFT on Fireworks supports frontier open models like Llama, Phi3/4, Qwen 2.5/3 and even DeepSeek V3 and R1. You can get started here. Training models using RFT on Fireworks is free of charge for the next 2 weeks! RFT works best for tasks with clear answers that can be graded or verified for correctness, by building on the concept…
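The post's core idea is specifying an evaluator function that grades model outputs. A minimal sketch of what such a grader could look like for a verifiable task is below; the function name and signature are illustrative assumptions, not Fireworks' actual API:

```python
# Hypothetical evaluator in the spirit described above: grade a model
# output against a reference answer, returning a reward in [0, 1].
# Name and signature are illustrative, not Fireworks' actual API.
import re

def evaluate(model_output: str, expected_answer: str) -> float:
    # Treat the last number in the output as the model's final answer.
    numbers = re.findall(r"-?\d+(?:\.\d+)?", model_output)
    if not numbers:
        return 0.0
    return 1.0 if numbers[-1] == expected_answer else 0.0

print(evaluate("The total is 42.", "42"))   # 1.0
print(evaluate("I am not sure.", "42"))     # 0.0
```

Binary exact-match rewards like this suit "tasks with clear answers that can be graded or verified," which is exactly the regime the post says RFT works best in.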
341d ago
5/19/2025 Agentic AI Systems
AI is evolving from passive responders into proactive agents that can perceive, reason, and act autonomously. We’re witnessing the rise of agentic systems - AI that goes beyond generating text responses to planning, executing, and learning across complex, multi-step tasks. Unlike traditional models, which respond to prompts or follow hardcoded scripts, agentic AI systems possess a sense of initiative. They can independently interpret goals, decide next actions, and iteratively refine their behavior over time. The result? AI that behaves less like a static program and more like a self-directed assistant or collaborator. This transformation isn’t theoretical. Today’s agents can book meetings, debug code, orchestrate workflows, and even collaborate with other agents - all with minimal human intervention. It’s a shift that promises not just increased productivity, but a fundamentally different way to build software. At the core, agentic AI systems…
341d · Agents · #agents
[HB] Haystack (deepset) Blog · 1 article · visit →
201d ago
User Story · Bilge Yücel, DevRel Engineer · Kelsey Sorrels, Data Scientist at Telus AG · How TAC Built an Agentic Chatbot with Haystack to Transform Trade Promotions Workflows · See how TELUS Agriculture & Consumer Goods (TAC) gives users unprecedented access to their data with safety in mind · October 6, 2025
How TAC Built an Agentic Chatbot with Haystack to Transform Trade Promotions Workflows See how TELUS Agriculture & Consumer Goods (TAC) gives users unprecedented access to their data with safety in mind October 6, 2025When a leading company like TELUS Agriculture & Consumer Goods (TAC), with a strong presence in agriculture and consumer goods, turns to AI to streamline complex processes, it’s worth taking a closer look. TELUS Agriculture & Consumer Goods helps businesses optimize everything from supply chains to retail operations. One of their latest innovations: an agentic chatbot powered by Haystack that simplifies how users interact with their trade promotions platform. We sat down with the team behind this project to learn how they built it, why they chose Haystack, and what advice they have for other teams looking to implement Retrieval-Augmented Generation (RAG) and agent-based AI solutions…
201d · Agents · #agents #safety
[HF] Hugging Face Blog · 17 articles · visit →
88d ago
Unlocking Agentic RL Training for GPT-OSS: A Practical Retrospective
Unlocking Agentic RL Training for GPT-OSS: A Practical Retrospective LinkedIn is an AI-first company that's built agents to help professionals be more successful. In this setting, models must reason over incomplete information, interact with structured services, and adapt to evolving user intent across multiple steps rather than produce a single static response. These capabilities are especially critical for agents that support the goals of end users such as recruiters, job and knowledge seekers, and learners: retrieving information, refining queries, coordinating tools, and executing multi-step workflows. By learning robust decision policies through interaction, agentic RL provides a principled foundation for building scalable, reliable, and adaptable AI systems through end-to-end optimization. The GPT-OSS model has shown comparable performance to OpenAI o3-mini and o4-mini [ref], but its suitability for agentic reinforcement learning training has not yet been validated. Most recent work focuses on…
100d ago
Open Responses: What you need to know
Open Responses: What you need to know The era of the chatbot is long gone, and agents dominate inference workloads. Developers are shifting toward autonomous systems that reason, plan, and act over long time horizons. Despite this shift, much of the ecosystem still uses the Chat Completion format, which was designed for turn-based conversations and falls short for agentic use cases. The Responses format was designed to address these limitations, but it is closed and not as widely adopted. The Chat Completion format is still the de facto standard despite the alternatives. This mismatch between agentic workflow requirements and entrenched interfaces motivates the need for an open inference standard. Over the coming months, we will collaborate with the community and inference providers to implement and adapt Open Responses to a shared format, practically capable of replacing chat completions. Open Responses…
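For context, the turn-based shape the post is contrasting against is the familiar chat-completions request: a flat list of role/content messages, which fits conversations but strains under multi-step agent runs with tool calls and intermediate state:

```python
# The turn-based chat-completions shape the post refers to: a flat list
# of role/content messages. Model name is a placeholder.
chat_request = {
    "model": "some-model",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"},
    ],
}

# Every message is just a role plus content — there is no first-class
# notion of tool invocations, reasoning items, or long-horizon state.
assert all({"role", "content"} <= m.keys() for m in chat_request["messages"])
```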
110d ago
NVIDIA brings agents to life with DGX Spark and Reachy Mini
NVIDIA brings agents to life with DGX Spark and Reachy Mini Today at CES 2026, NVIDIA unveiled a world of new open models to enable the future of agents, online and in the real world. From the recently released NVIDIA Nemotron reasoning LLMs to the new NVIDIA Isaac GR00T N1.6 open reasoning VLA and NVIDIA Cosmos world foundation models, all the building blocks are here today for AI Builders to build their own agents. But what if you could bring your own agent to life, right at your desk? An AI buddy that can be useful to you and process your data privately? In the CES keynote today, Jensen Huang showed us how we can do exactly that, using the processing power of NVIDIA DGX Spark with Reachy Mini to create your own little office R2D2 you can talk to…
110d · Agents · #agents #gpu
142d ago
DeepMath: A lightweight math reasoning Agent with smolagents
DeepMath: A lightweight math reasoning Agent with smolagents By Intel AI Software Group DeepMath is an aligned math reasoning agent built on Qwen3-4B Thinking and fine-tuned with GRPO (Group Relative Policy Optimization). Instead of verbose text, the model emits tiny Python snippets for intermediate steps, runs them in a secure sandbox, and folds the results back into its reasoning, reducing errors and output length. The agent is implemented using the smolagents library. We evaluate DeepMath on four math datasets: MATH500, AIME, HMMT, and HLE, and show that: 🤖 The math agent alone reduces output lengths by up to 66%, while often improving accuracy. ⚡ GRPO training improves the agent performance even further, in almost all benchmarks. 👉 Code and evaluation scripts: https://github.com/IntelLabs/DeepMath 👉 Model: https://huggingface.co/Intel/deepmath-v1 Why DeepMath? Large language models (LLMs) have advanced reasoning capabilities, but mathematical problem-solving remains challenging;…
142d · Agents · #agents
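The emit-execute-fold loop the excerpt describes can be sketched in a few lines. This is illustrative only: DeepMath's real sandbox is far more robust than a restricted `exec()`:

```python
# Minimal sketch of the loop described above: the model emits a tiny
# Python snippet for an intermediate step, it runs in a restricted
# namespace, and the result is folded back into the reasoning trace.
# Illustrative only; DeepMath's actual sandbox is more robust than exec().

def run_snippet(snippet: str) -> str:
    scope = {}                       # empty namespace stands in for a sandbox
    exec(snippet, {"__builtins__": {}}, scope)
    return str(scope.get("result"))

reasoning = ["Compute the sum of the first 100 integers."]
snippet = "result = 100 * 101 // 2"
reasoning.append(f"[tool] {snippet} -> {run_snippet(snippet)}")
print(reasoning[-1])                 # [tool] result = 100 * 101 // 2 -> 5050
```

Replacing verbose chain-of-thought arithmetic with short executed snippets is what lets the agent cut output length while reducing arithmetic slips.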
177d ago
Aligning to What? Rethinking Agent Generalization in MiniMax M2
Aligning to What? Rethinking Agent Generalization in MiniMax M2 The Real Agent Alignment Problem: Benchmarks or Reality? If you've worked with LLM Agents, you've felt this pain: the same model can feel brilliant in one framework and useless in another. An agent might crush a tool-use leaderboard but fail spectacularly at a simple, real-world task. This gap between benchmark performance and practical usability is one of the biggest challenges in the field. When we designed M2, we knew we had to tackle this problem head-on. This led us to two core, and sometimes conflicting, objectives: - Excel on Open-Source Benchmarks. Benchmarks are essential for measuring "pure" capabilities. A benchmark like BrowseComp, for instance, tests for sophisticated search skills. While users will rarely ask a question as contrived as, "Find the paper where the third letter of the nth author's name…
177d · Agents · #agents
184d ago
Building the Open Agent Ecosystem Together: Introducing OpenEnv
Building the Open Agent Ecosystem Together: Introducing OpenEnv Agentic environments define everything an agent needs to perform a task: the tools, APIs, credentials, execution context, and nothing else. They bring clarity, safety, and sandboxed control to agent behavior. These environments can be used for both training and deployment, and serve as the foundation for scalable agentic development. The Problem Modern AI agents can act autonomously across thousands of tasks. However, a large language model isn’t enough to get those tasks to actually run — it needs access to the right tools. Exposing millions of tools directly to a model isn’t reasonable (or safe). Instead, we need agentic environments: secure, semantically clear sandboxes that define exactly what’s required for a task, and nothing more. These environments handle the critical details: - Clear semantics about what a task needs - Sandboxed execution…
184d · Agents · #agents
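The "everything a task needs, and nothing else" idea above can be sketched as a small spec object. The field names here are illustrative assumptions, not OpenEnv's actual interface:

```python
# Hypothetical sketch of an agentic-environment spec in the spirit
# described above: the task, only the tools it requires, and sandboxed
# execution by default. Field names are illustrative, not OpenEnv's API.
from dataclasses import dataclass, field

@dataclass
class EnvSpec:
    task: str                                  # clear semantics: what the task needs
    tools: list = field(default_factory=list)  # only the tools this task requires
    sandboxed: bool = True                     # execution is isolated by default

env = EnvSpec(
    task="summarize a GitHub issue",
    tools=["fetch_issue", "post_comment"],
)
print(env.sandboxed, len(env.tools))           # True 2
```

Scoping the tool list per task, rather than exposing millions of tools to the model, is the safety property the post emphasizes.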
208d ago
Accelerating Qwen3-8B Agent on Intel® Core™ Ultra with Depth-Pruned Draft Models
Accelerating Qwen3-8B Agent on Intel® Core™ Ultra with Depth-Pruned Draft Models TL;DR: Qwen3-8B is one of the most exciting recent releases—a model with native agentic capabilities, making it a natural fit for the AIPC. With OpenVINO.GenAI, we’ve been able to accelerate generation by ~1.3× using speculative decoding with a lightweight Qwen3-0.6B draft. By using speculative decoding and applying a simple pruning process to the draft, we pushed the speedup even further to ~1.4×. We wrapped this up by showing how these improvements can be used to run a fast, local AI Agent with 🤗 smolagents Qwen3 Qwen3-8B is part of the latest Qwen family, trained with explicit agentic behaviors. It supports tool invocation, multi-step reasoning, and long-context handling, capabilities that make it well-suited for complex agent workflows. When integrated with frameworks like Hugging Face 🤗smolagents, QwenAgent, or AutoGen, it enables…
208d · Agents · #qwen #agents
289d ago
ScreenEnv: Deploy your full stack Desktop Agent
ScreenEnv: Deploy your full stack Desktop Agent What is ScreenEnv? Imagine you need to automate desktop tasks, test GUI applications, or build an AI agent that can interact with software. This used to require complex VM setups and brittle automation frameworks. ScreenEnv changes this by providing a sandboxed desktop environment that runs in a Docker container. Think of it as a complete virtual desktop session that your code can fully control - not just clicking buttons and typing text, but managing the entire desktop experience including launching applications, organizing windows, handling files, executing terminal commands, and recording the entire session. Why ScreenEnv? - 🖥️ Full Desktop Control: Complete mouse and keyboard automation, window management, application launching, file operations, terminal access, and screen recording - 🤖 Dual Integration Modes: Support both Model Context Protocol (MCP) for AI systems and direct Sandbox…
289d · Agents · #agents
326d ago
Holo1: New family of GUI automation VLMs powering GUI agent Surfer-H
Holo1: New family of GUI automation VLMs powering GUI agent Surfer-H Surfer-H, a web-native agent that interacts with browsers like a human, relies on Holo1. Holo1 is the first family of open-source Action VLMs designed specifically for deep web UI understanding and precise localization. The family includes Holo1-3B and Holo1-7B models, with the latter achieving 76.2% average accuracy on common UI localization benchmarks—the highest among small-size models. H Company has released these models as open source on Hugging Face, along with the WebClick benchmark containing 1,639 human-like UI tasks. Use with Transformers Holo1 models are based on the Qwen2.5-VL architecture, and are fully compatible with transformers. Here we provide a simple usage example. You can load the model and the processor as follows.
from transformers import AutoModelForImageTextToText, AutoProcessor
import torch
model = AutoModelForImageTextToText.from_pretrained(
    "Hcompany/Holo1-3B",
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
    device_map="auto",
)
processor…
326d · Agents · #agents
332d ago
CodeAgents + Structure: A Better Way to Execute Actions
CodeAgents + Structure: A Better Way to Execute Actions Figure 1: Accuracy comparison of three approaches: Structured CodeAgent (blue), CodeAgent (orange), and ToolCallingAgent (gray) on SmolBench (GAIA, MATH, SimpleQA, and Frames). Error bars represent 95% Confidence Intervals. 🤔 The Evolution of Agent Actions AI agents need to take actions in the world - whether that's calling APIs, processing data, or reasoning through complex problems. How agents express these actions has evolved through several paradigms: Traditional JSON Agent: Agents generate structured JSON to call tools. {"tool": "get_weather", "arguments": {"city": "Paris"}} These agents operate by selecting from a list of predefined tools and generating JSON-formatted calls. This approach was popularized by OpenAI's function calling API, and has since been the most widely used way to call tools. It is reliable, but limited by: - A limited set…
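The JSON-style tool call in the excerpt above is typically handled by a small dispatch registry on the agent side; a minimal sketch (the `get_weather` implementation is a stub for illustration):

```python
# Dispatching the JSON-style tool call shown in the excerpt with a
# small registry. get_weather is a stub for illustration; a real tool
# would call a weather API.
import json

def get_weather(city: str) -> str:
    return f"Sunny in {city}"

TOOLS = {"get_weather": get_weather}

call = json.loads('{"tool": "get_weather", "arguments": {"city": "Paris"}}')
result = TOOLS[call["tool"]](**call["arguments"])
print(result)                        # Sunny in Paris
```

The registry pattern is what makes JSON agents reliable (only predefined tools can run) and also what limits them, since the model can only compose actions the registry exposes.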
337d ago
Tiny Agents in Python: a MCP-powered agent in ~70 lines of code
Tiny Agents in Python: an MCP-powered agent in ~70 lines of code 🥳 NEW: tiny-agents now supports the AGENTS.md standard. Inspired by Tiny Agents in JS, we ported the idea to Python 🐍 and extended the huggingface_hub client SDK to act as an MCP client so it can pull tools from MCP servers and pass them to the LLM during inference. MCP (Model Context Protocol) is an open protocol that standardizes how Large Language Models (LLMs) interact with external tools and APIs. Essentially, it removes the need to write custom integrations for each tool, making it simpler to plug new capabilities into your LLMs. In this blog post, we'll show you how to get started with a tiny Agent in Python connected to MCP servers to unlock powerful tool capabilities. You'll see just how easy it is to spin up your own…
337dAgents#agents#coding
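The core idea — an agent is just an inference client plus a set of tools pulled from MCP servers — reduces to a short loop. A minimal sketch with a mocked model standing in for the LLM call (the real version would go through huggingface_hub's inference client; all names here are illustrative):

```python
# Minimal agent loop: ask the model, execute any tool call it requests,
# feed the result back, and repeat until the model answers in plain text.
def run_agent(model, tools, user_message, max_turns=5):
    messages = [{"role": "user", "content": user_message}]
    for _ in range(max_turns):
        reply = model(messages)  # mocked LLM: returns a dict
        if "tool_call" in reply:
            name = reply["tool_call"]["name"]
            args = reply["tool_call"]["arguments"]
            result = tools[name](**args)
            messages.append({"role": "tool", "content": str(result)})
        else:
            return reply["content"]
    return None

# Mock model: first requests a tool, then answers using the tool result.
def mock_model(messages):
    tool_results = [m for m in messages if m["role"] == "tool"]
    if not tool_results:
        return {"tool_call": {"name": "add", "arguments": {"a": 2, "b": 3}}}
    return {"content": f"The answer is {tool_results[-1]['content']}"}

print(run_agent(mock_model, {"add": lambda a, b: a + b}, "What is 2+3?"))
# → The answer is 5
```

Swapping the mock for a real inference client and the `tools` dict for MCP-discovered tools is essentially what the ~70-line implementation does.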
365d ago
Tiny Agents: an MCP-powered agent in 50 lines of code
Tiny Agents: an MCP-powered agent in 50 lines of code New! (May 23, '25): If you prefer Python, check out the companion post Tiny Agents in Python. Over the past few weeks, I've been diving into MCP (Model Context Protocol) to understand what the hype around it was all about. My TL;DR is that it's fairly simple, but still quite powerful: MCP is a standard API to expose sets of Tools that can be hooked to LLMs. It is fairly simple to extend an Inference Client – at HF, we have two official client SDKs: @huggingface/inference in JS, and huggingface_hub in Python – to also act as an MCP client and hook the available tools from MCP servers into the LLM inference. But while doing that came my second realization: Once you have an MCP Client, an Agent is…
365dAgents#agents#coding
421d ago
Trace & Evaluate your Agent with Arize Phoenix
Trace & Evaluate your Agent with Arize Phoenix Building an agent is one thing; understanding its behavior is another. That’s where tracing and evaluations come in. Tracing allows you to see exactly what your agent is doing step by step—what inputs it receives, how it processes information, and how it arrives at its final output. Think of it like having an X-ray for your agent’s decision-making process. Meanwhile, evaluation helps you measure performance, ensuring your agent isn’t just functional, but actually effective. Is it producing the right answers? How relevant are its findings at each step? How well-crafted is the agent’s response? Does it align with your goals? Arize Phoenix provides a centralized platform to trace, evaluate, and debug your agent's decisions in real time—all in one place. We’ll dive into how you can implement them to refine and optimize…
456d ago
We now support VLMs in smolagents!
We just gave sight to smolagents You hypocrite, first take the log out of your own eye, and then you will see clearly to take the speck out of your brother's eye. Matthew 7, 3-5 TL;DR We have added vision support to smolagents, which unlocks the use of vision language models in agentic pipelines natively. Table of Contents Overview In the agentic world, many capabilities are hidden behind a vision wall. A common example is web browsing: web pages feature rich visual content that you never fully recover by simply extracting their text, be it the relative position of objects, messages transmitted through color, specific icons… In this case, vision is a real superpower for agents. So we just added this capability to our smolagents! Teaser of what this gives: an agentic browser that navigates the web in complete autonomy!…
733d ago
Jack of All Trades, Master of Some, a Multi-Purpose Transformer Agent
Jack of All Trades, Master of Some, a Multi-Purpose Transformer Agent Introduction We're excited to share Jack of All Trades (JAT), a project that aims to move in the direction of a generalist agent. The project started as an open reproduction of the Gato (Reed et al., 2022) work, which proposed to train a Transformer able to perform both vision-and-language and decision-making tasks. We thus started by building an open version of Gato’s dataset. We then trained multi-modal Transformer models on it, introducing several improvements over Gato for handling sequential data and continuous values. Overall, the project has resulted in: - The release of a large number of expert RL agents on a wide variety of tasks. - The release of the JAT dataset, the first dataset for generalist agent training. It contains hundreds of thousands of expert trajectories collected…
733dAgents#agents
1173d ago
Introducing ⚔️ AI vs. AI ⚔️ a deep reinforcement learning multi-agents competition system
Introducing ⚔️ AI vs. AI ⚔️ a deep reinforcement learning multi-agents competition system This tool, hosted on Spaces, allows us to create multi-agent competitions. It is composed of three elements: - A Space with a matchmaking algorithm that runs the model fights using a background task. - A Dataset containing the results. - A Leaderboard that gets the match history results and displays the models’ ELO. Then, when a user pushes a trained model to the Hub, it gets evaluated and ranked against others. Thanks to that, we can evaluate your agents against others’ agents in a multi-agent setting. In addition to being a useful tool for hosting multi-agent competitions, we think this tool can also be a robust evaluation technique in multi-agent settings. By playing against a lot of policies, your agents are evaluated against a wide range of…
1173dAgents#agents
1187d ago
What Makes a Dialog Agent Useful?
What Makes a Dialog Agent Useful? The techniques behind ChatGPT: RLHF, IFT, CoT, Red teaming, and more This article has been translated to Chinese 简体中文. A few weeks ago, ChatGPT emerged and launched the public discourse into a set of obscure acronyms: RLHF, SFT, IFT, CoT, and more, all attributed to the success of ChatGPT. What are these obscure acronyms and why are they so important? We surveyed all the important papers on these topics to categorize these works, summarize takeaways from what has been done, and share what remains to be shown. Let’s start by looking at the landscape of language model based conversational agents. ChatGPT is not the first; in fact, many organizations published their language model dialog agents before OpenAI, including Meta’s BlenderBot, Google’s LaMDA, DeepMind’s Sparrow, and Anthropic’s Assistant (a continued development of this agent without…
1187dAgents#agents
[L(W]Lil'Log (Lilian Weng)· 2 articlesvisit →
513d ago
Reward Hacking in Reinforcement Learning
Reward hacking occurs when a reinforcement learning (RL) agent exploits flaws or ambiguities in the reward function to achieve high rewards, without genuinely learning or completing the intended task. Reward hacking exists because RL environments are often imperfect, and it is fundamentally challenging to accurately specify a reward function. With the rise of language models generalizing to a broad spectrum of tasks and RLHF becoming a de facto method for alignment training, reward hacking in RL training of language models has become a critical practical challenge. Instances where the model learns to modify unit tests to pass coding tasks, or where responses contain biases that mimic a user’s preference, are pretty concerning and are likely one of the major blockers for real-world deployment of more autonomous use cases of AI models. Most of the past work on this topic has…
1037d ago
LLM Powered Autonomous Agents
Building agents with an LLM (large language model) as the core controller is a cool concept. Several proof-of-concept demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. The potential of LLMs extends beyond generating well-written copy, stories, essays and programs; they can be framed as powerful general problem solvers. Agent System Overview In an LLM-powered autonomous agent system, the LLM functions as the agent’s brain, complemented by several key components: - Planning - Subgoal and decomposition: The agent breaks down large tasks into smaller, manageable subgoals, enabling efficient handling of complex tasks. - Reflection and refinement: The agent can do self-criticism and self-reflection over past actions, learn from mistakes and refine them for future steps, thereby improving the quality of final results. - Memory - Short-term memory: I would consider all the in-context learning (See Prompt Engineering) as utilizing…
1037dAgents#agents
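The short-term-memory component described above is, in practice, a bounded context window: only the most recent exchanges stay in the prompt. A toy sketch of such a rolling message buffer (class name and sizes are illustrative, not from the post):

```python
from collections import deque

class ShortTermMemory:
    """Rolling buffer: keeps only the most recent messages in context."""

    def __init__(self, max_messages: int = 4):
        # deque with maxlen silently drops the oldest entry when full
        self.buffer = deque(maxlen=max_messages)

    def add(self, role: str, content: str):
        self.buffer.append({"role": role, "content": content})

    def context(self):
        """Messages to include in the next LLM prompt."""
        return list(self.buffer)

mem = ShortTermMemory(max_messages=3)
for i in range(5):
    mem.add("user", f"message {i}")
print([m["content"] for m in mem.context()])
# → ['message 2', 'message 3', 'message 4']
```

Long-term memory, by contrast, would persist everything to an external store and retrieve selectively — the distinction the post goes on to draw.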
[MRB]Microsoft Research Blog· 1 articlesvisit →
46d ago
PlugMem: Transforming raw agent interactions into reusable knowledge
At a glance - Today’s AI agents store long interaction histories but struggle to reuse them effectively. - Raw memory retrieval can overwhelm agents with lengthy, low-value context. - PlugMem transforms interaction history into structured, reusable knowledge. - A single, general-purpose memory module improves performance across diverse agent benchmarks while using fewer memory tokens. It seems counterintuitive: giving AI agents more memory can make them less effective. As interaction logs accumulate, they grow large, fill with irrelevant content, and become increasingly difficult to use. More memory means that agents must search through larger volumes of past interactions to find information relevant to the current task. Without structure, these records mix useful experiences with irrelevant details, making retrieval slower and less reliable. The challenge is not storing more experiences, but organizing them so that agents can quickly identify what matters in…
46dAgents#agentsby Ke Yang, Michel Galley, Chenglong Wang, Jianfeng Gao, Jiawei Han, ChengXiang Zhai
[MTR]MIT Technology Review· 1 articlesvisit →
4d ago
Building agent-first governance and security
Sponsored Building agent-first governance and security To create value with AI agents, organizations must institute robust controls. In association with the Deloitte Microsoft Technology Practice As AI agents increasingly work alongside humans across organizations, companies could be inadvertently opening a new attack surface. Insecure agents can be manipulated to access sensitive systems and proprietary data, increasing enterprise risk. In some modern enterprises, non-human identities (NHI) are outpacing human identities, and that trend will explode with agentic AI. Solid governance and a fortified security foundation are therefore critical. According to the Deloitte AI Institute 2026 State of AI report, nearly 74% of companies plan to deploy agentic AI within two years. Yet only one in five (21%) reports having a mature model for governance of autonomous agents. Executives are most concerned with data privacy and security (73%); legal, intellectual property, and regulatory…
4dAgents#agentsby MIT Technology Review Insights
[NB]n8n Blog· 6 articlesvisit →
11d ago
Workflow Automation vs. Orchestration: Architectural Differences That Matter at Scale
How much do workflow automation versus orchestration architectures differ at scale? Each solves different problems. Workflow automation handles individual tasks, while orchestration coordinates multiple tasks into end-to-end processes. Choosing the right approach or combining them shapes how your processes behave under real-world conditions. This article breaks down the architectural differences between workflow automation and orchestration. You’ll see how those differences affect reliability and behavior in production systems and learn how to choose the right approach for your needs. What is workflow automation? Workflow automation runs a sequence of tasks when triggered by a preset event or criteria. It’s designed for a bounded scope where logic flows from point A to point B. These systems prioritize efficiency in repetitive business processes (sending notifications, updating records, moving data between systems) by relying on stateless, task-level execution. Traditional (stateless) automation workflows…
11dAgents#agentsby n8n team
16d ago
Orchestration vs. Choreography: Which One to Choose – or Use Both?
Orchestration vs. choreography isn’t just an architectural choice – it’s a decision about how your system thinks. Orchestration relies on one central controller to coordinate every step of a workflow, providing full visibility and control. Choreography takes an opposite approach. Services communicate through events and act independently instead of sharing a single point of control. Both patterns solve the problem of how services collaborate, but they do so in fundamentally different ways. Choosing one over another directly impacts how you can scale, debug, and operate your system in production. In this article, we’ll compare orchestration and choreography and discover the tradeoffs between control and autonomy. Microservices orchestration vs. choreography explained In orchestration, a central controller acts like a conductor. It tells each microservice when to execute its logic and tracks the outcome. This provides a clear and predictable control flow.…
16dAgents#observabilityby n8n team
23d ago
Production AI Playbook: Deterministic Steps & AI Steps
This post is part of a series that explores proven strategies and practical examples for building reliable AI systems. New to n8n? Start with the introduction. Find out when new topics are added to the Production AI Playbook via RSS, LinkedIn or X. The Reliability Gap in AI Workflows Here's a pattern that plays out across teams building with AI. You connect an LLM to your workflow, feed it some data, and get impressive results. At a glance, the summaries are sharp. The classifications generated by the AI system feel right. The generated content sounds natural. So the team ships it. Then the edge cases start showing up everywhere. A customer name with special characters breaks the parsing. A support ticket written in sarcasm gets classified as positive feedback. An LLM generates a perfectly worded email but hallucinates a product…
23dAgents#agentsby Elvis Saravia
23d ago
Production AI Playbook: Introduction
The Production AI Playbook introduces the patterns and capabilities teams use to build production AI systems with n8n. It reflects lessons learned from teams integrating AI into real operational systems, where reliability, governance, and maintainability matter as much as model capability. New sections will be added on a rolling basis, covering how to combine deterministic automation with AI, design scalable agent architectures, maintain human oversight, monitor performance, and operate AI workflows reliably in production. Find out when new topics are added via RSS, LinkedIn or X. n8n’s workflow architecture n8n is a node-based workflow automation platform where composable nodes chain together into execution pipelines. Workflows orchestrate data movement, system integrations, business logic, and AI steps in one place. This architecture makes it easier to visualize, explain, and control how automation systems operate. n8n is source-available under a fair-code license, which…
23dAgents#agentsby Desiree Lockwood, n8n
30d ago
Firecrawl + n8n: real-time web data for your AI workflows
Firecrawl is offering 100,000 credits when you connect through n8n Cloud We've partnered with Firecrawl to make it easier than ever to bring web data into your n8n workflows. Connect to Firecrawl in one step, create an account without leaving the canvas, and start building immediately on n8n Cloud. No API keys to track down, no separate sign-up flow. This builds on recent improvements to n8n-managed authentication in Cloud, where n8n handles credential setup and lets you connect to dozens of supported services in one step during node setup. Already on n8n Cloud? Just add the Firecrawl node to your workflow and click "Connect to Firecrawl" when setting up credentials. Not on n8n Cloud yet? Try it now so you can take advantage of this offer from Firecrawl and explore the power of real-time web data in your workflows risk-free.…
30dAgents#agentsby Desiree Lockwood, n8n
59d ago
How n8n Handles Vulnerability Disclosure - and Why We Do It This Way
As n8n grows, so does the scrutiny our codebase receives from the security community. That is a good thing. In the past months we have published many security advisories, and with that comes natural questions from our users: How much notice will I get before a vulnerability is published? Why can't I get more time? And how does all of this work when the source code is publicly available? We want to answer these questions openly, because we believe that a well-understood disclosure process builds more trust than a secretive one. The tension at the heart of open-access security n8n's source code is publicly available. This is core to who we are — it enables our community to inspect, extend, and contribute to the platform. But it also creates a specific challenge for security patches that closed-source vendors do not…
59dAgents#agentsby Cornelius Suermann, VP of Engineering at n8n
[NV]NVIDIA Developer Blog· 9 articlesvisit →
5d ago
Mitigating Indirect AGENTS.md Injection Attacks in Agentic Environments
AI tools are significantly accelerating software development and changing how developers work with code. These tools serve as real-time copilots, automating repetitive tasks, executing tasks, writing documentation, and more. OpenAI Codex, for example, is a coding agent designed to assist developers through tasks like code generation, debugging, and automated pull request (PR) creation. Yet as agentic tools are integrated into workflows, how they affect the safety, reliability, and integrity of software development must be considered. A recent Codex vulnerability discovered by the NVIDIA AI Red Team highlights security gaps from indirect AGENTS.md injection through malicious dependencies. While this attack relies on a compromised dependency, meaning the attacker already has a form of code execution, it illustrates a new dimension of supply chain risk unique to agentic development environments. This post walks through the attack chain step-by-step—from dependency setup to instruction…
5dAgents#agents#codingby Daniel Teixeira
8d ago
Full-Stack Optimizations for Agentic Inference with NVIDIA Dynamo
Coding agents are starting to write production code at scale. Stripe’s agents generate 1,300+ PRs per week. Ramp attributes 30% of merged PRs to agents. Spotify reports 650+ agent-generated PRs per month. Tools like Claude Code and Codex make hundreds of API calls per coding session, each carrying the full conversation history. Behind every one of these workflows is an inference stack under significant KV cache pressure. Let’s take Claude Code as an example. After the first API call that writes the conversation prefix to KV cache, every subsequent call to the same worker hits 85-97% cache. Agent teams (or swarms) push this further with a 97.2% aggregate cache hit rate across 4 Opus teammates. An 11.7x read/write ratio means the system reads from cache nearly 12 times for every token it writes. This is a write-once-read-many (WORM) access pattern: the…
8dAgents#agents#inference#coding#gpuby Ishan Dhanani
8d ago
Build a More Secure, Always-On Local AI Agent with OpenClaw and NVIDIA NemoClaw
Agents are evolving from question-and-answer systems into long-running autonomous assistants that read files, call APIs, and drive multi-step workflows. However, deploying an agent to execute code and use tools without proper isolation raises real risks—especially on third-party cloud infrastructure, where data privacy and control are at stake. NVIDIA NemoClaw is an open-source reference stack that orchestrates NVIDIA OpenShell to run OpenClaw, a self-hosted gateway that connects messaging platforms to AI coding agents powered by open models like NVIDIA Nemotron. NemoClaw adds guided onboarding, lifecycle management, image hardening, and a versioned blueprint, providing a complete pipeline from model inference to more secure, interactive agent deployment. This tutorial walks through a NemoClaw deployment on NVIDIA DGX Spark—from configuring the runtime environment and serving the model locally, to installing the NemoClaw stack and connecting it to Telegram for remote access. You’ll build a local,…
8dAgents#agents#local#gpuby Patrick Moorhead
13d ago
MiniMax M2.7 Advances Scalable Agentic Workflows on NVIDIA Platforms for Complex AI Applications
The release of MiniMax M2.7 adds enhancements to the popular MiniMax M2.5 model, built for agentic harnesses and other complex use cases in fields such as reasoning, ML research workflows, software engineering, and office work. The open weights release of MiniMax M2.7 is now available through NVIDIA and across the open source inference ecosystem. The MiniMax M2 series is a sparse mixture-of-experts (MoE) model family designed for efficiency and capability. The MoE design keeps inference costs low while preserving the full capacity of a 230B-parameter model. It uses multi-head causal self-attention enhanced with Rotary Position Embeddings (RoPE) and Query-Key Root Mean Square Normalization (QK RMSNorm) for stable training at scale. A top-k expert routing mechanism ensures that only the most relevant experts activate for any given input, keeping inference costs low despite the model’s large total parameter count. The result…
13dAgents#agents#gpuby Anu Srivastava
40d ago
How NVIDIA Dynamo 1.0 Powers Multi-Node Inference at Production Scale
Reasoning models are growing rapidly in size and are increasingly being integrated into agentic AI workflows that interact with other models and external tools. Deploying these models and workflows in production environments requires distributing them across multiple GPU nodes, which demands careful orchestration and coordination across GPUs. NVIDIA Dynamo 1.0—available now—addresses these problems by accelerating generative AI and reasoning models in large-scale distributed environments. The AI framework delivers low-latency, high-throughput, distributed inference for production-grade multi-node AI deployments. Dynamo supports leading open source inference engines, including SGLang, NVIDIA TensorRT LLM, and vLLM. It also has delivered strong results in trusted third-party benchmarks such as MLPerf and SemiAnalysis InferenceX, reinforcing its position as a production-grade inference platform. Dynamo can boost the number of requests served by up to 7x on NVIDIA Blackwell, as demonstrated in the recent SemiAnalysis InferenceX benchmark. SemiAnalysis InferenceX,…
40dAgents#agents#inference#gpuby Amr Elmeleegy
45d ago
Introducing Nemotron 3 Super: An Open Hybrid Mamba-Transformer MoE for Agentic Reasoning
Agentic AI systems need models with the specialized depth to solve dense technical problems autonomously. They must excel at reasoning, coding, and long-context analysis, while remaining efficient enough to run continuously at scale. Multi-agent systems generate up to 15x the tokens of standard chats, re-sending history, tool outputs, and reasoning steps at every turn. Over long tasks, this “context explosion” causes goal drift, where agents gradually lose alignment with the original objective. And using massive reasoning models for every sub-task—the “thinking tax”—makes multi-agent applications too expensive and sluggish for practical use. Today, we are releasing Nemotron 3 Super to address these limitations. The new Super model is a 120B total, 12B active-parameter model that delivers maximum compute efficiency and accuracy for complex multi-agent applications such as software development and cybersecurity triaging. This model follows the introduction of Nemotron 3 Nano…
45dAgents#agents#codingby Chris Alexiuk
46d ago
NVIDIA RTX Innovations Are Powering the Next Era of Game Development
NVIDIA RTX ray tracing and AI-powered neural rendering technologies are redefining how games are made, enabling a new standard for visuals and performance. At GDC 2026, NVIDIA unveiled the latest path tracing innovations elevating visual fidelity, on-device AI models enabling players to interact with their favorite experiences in new ways, and enterprise solutions accelerating game development from the ground up. This post provides a detailed overview of these latest innovations, including: - Introducing a new system for dense, path-traced foliage in NVIDIA RTX Mega Geometry - Adding path-traced indirect lighting with ReSTIR PT in the NVIDIA RTX Dynamic Illumination SDK and RTX Hair (beta) for strand-based acceleration in the NVIDIA branch of UE5 - Expanding language recognition support in NVIDIA ACE; production-quality on-device text-to-speech (TTS); a small language model (SLM) with advanced agent capabilities for AI-powered game characters - Enabling…
46dAgents#agents#observability#local#gpuby Ike Nnoli
85d ago
Practical Security Guidance for Sandboxing Agentic Workflows and Managing Execution Risk
AI coding agents enable developers to work faster by streamlining tasks and driving automated, test-driven development. However, they also introduce a significant, often overlooked, attack surface by running tools from the command line with the same permissions and entitlements as the user, making them computer use agents, with all the risks those entail. The primary threat to these tools is indirect prompt injection, where a portion of the content ingested by the LLM driving the agent is provided by an adversary through vectors such as malicious repositories or pull requests, git histories with prompt injections, .cursorrules, CLAUDE/AGENT.md files that contain prompt injections, or malicious MCP responses. Such malicious instructions to the LLM can result in it taking attacker-influenced actions with adverse consequences. Manual approval of actions performed by the agent is the most common way to manage…
85dAgents#agents#codingby Rich Harang
106d ago
Multi-Agent Warehouse AI Command Layer Enables Operational Excellence and Supply Chain Intelligence
Warehouses have never been more automated, more data-rich, or more operationally demanding than they are now—yet they still rely on systems that can’t keep up. Throughput is rising, SLAs are shrinking, and fleets of AMRs, conveyors, and sensors expand every year. But beneath that technological surface, most sites still rely on a familiar trio: a Warehouse Management System (WMS), a handful of dashboards, and the institutional knowledge available. Supervisors are left to manage 12+ classes of equipment, thousands of shift tasks, and a constant flood of telemetry—without any unified intelligence to interpret it all or guide the next move. This post introduces the NVIDIA Multi-Agent Intelligent Warehouse (MAIW) Blueprint for the missing layer. This NVIDIA-aligned, open source AI command layer sits above WMS, Enterprise Resources Planning (ERP), and IoT infrastructure to transform scattered data into real-time, actionable operational intelligence. The…
106dAgents#agentsby Tarik Hammadou
[OLL]Ollama Blog· 5 articlesvisit →
179d ago
MiniMax M2 October 28, 2025 MiniMax M2 is now available on Ollama's cloud. It's a model built for coding and agentic workflows.
MiniMax M2 October 28, 2025 MiniMax M2 is now available on Ollama’s cloud. It’s a model built for coding and agentic workflows. Get Started ollama run minimax-m2:cloud Highlights Superior Intelligence. According to benchmarks from Artificial Analysis, MiniMax-M2 demonstrates highly competitive general intelligence across mathematics, science, instruction following, coding, and agentic tool use. Its composite score ranks #1 among open-source models globally. Advanced Coding. Engineered for end-to-end developer workflows, MiniMax-M2 excels at multi-file edits, coding-run-fix loops, and test-validated repairs. Strong performance on Terminal-Bench and (Multi-)SWE-Bench–style tasks demonstrates practical effectiveness in terminals, IDEs, and CI across languages. Agent Performance. MiniMax-M2 plans and executes complex, long-horizon toolchains across shell, browser, retrieval, and code runners. In BrowseComp-style evaluations, it consistently locates hard-to-surface sources, maintains traceable evidence, and gracefully recovers from flaky steps. Efficient Design. With 10 billion activated parameters (230 billion in total), MiniMax-M2…
191d ago
New coding models & integrations October 16, 2025 GLM-4.6 and Qwen3-coder-480B are available on Ollama’s cloud service with easy integrations to the tools you are familiar with. Qwen3-Coder-30B has been updated for faster, more reliable tool calling in Ollama’s new engine.
New coding models & integrations October 16, 2025 GLM-4.6 and Qwen3-coder-480B are available on Ollama’s cloud service with easy integrations to the tools you are familiar with. Qwen3-Coder-30B has been updated for faster, more reliable tool calling in Ollama’s new engine. Get started GLM-4.6 ollama run glm-4.6:cloud Qwen3-Coder-480B ollama run qwen3-coder:480b-cloud For users with more than 300GB of VRAM, qwen3-coder:480b is also available locally. Qwen3-Coder-30B ollama run qwen3-coder:30b Example prompts Create a single-page app in a single HTML file with the following requirements: Name: Ollama's Adventure Goal: Jump over obstacles to survive as long as possible. Features: Increasing speed, high score tracking, retry button, and funny sounds for actions and events. The UI should be colorful, with parallax scrolling backgrounds. The characters should look cartoonish, related to alpacas and be fun to watch. The game should be enjoyable for everyone.…
333d ago
Streaming responses with tool calling May 28, 2025 Ollama now supports streaming responses with tool calling. This enables all chat applications to stream content and also call tools in real time.
Streaming responses with tool calling May 28, 2025 Ollama now supports streaming responses with tool calling. This enables all chat applications to stream content and also call tools in real time. Models that support using tools: Example of simple tool calling (weather) Example of web search Get started Download the latest version of Ollama cURL An example of Ollama using the weather tool to answer the prompt What is the weather today in Toronto? curl http://localhost:11434/api/chat -d '{ "model": "qwen3", "messages": [ { "role": "user", "content": "What is the weather today in Toronto?" } ], "stream": true, "tools": [ { "type": "function", "function": { "name": "get_current_weather", "description": "Get the current weather for a location", "parameters": { "type": "object", "properties": { "location": { "type": "string", "description": "The location to get the weather for, e.g. San Francisco, CA" }, "format": { "type":…
333dAgents#llama
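Consuming such a stream means accumulating content chunks while watching for tool calls arriving mid-stream. A sketch of that client-side loop with a mocked chunk stream (the chunk shape loosely mirrors Ollama's /api/chat streaming responses, but is illustrative, not the exact wire format):

```python
def consume_stream(chunks):
    """Accumulate streamed text and collect tool calls as they arrive."""
    text_parts, tool_calls = [], []
    for chunk in chunks:
        msg = chunk.get("message", {})
        if msg.get("content"):
            text_parts.append(msg["content"])
        for call in msg.get("tool_calls", []):
            tool_calls.append(call)  # to be dispatched after the stream ends
    return "".join(text_parts), tool_calls

# Mocked stream: two text chunks, then a tool call chunk.
stream = [
    {"message": {"content": "Checking the weather"}},
    {"message": {"content": " now..."}},
    {"message": {"tool_calls": [
        {"function": {"name": "get_current_weather",
                      "arguments": {"location": "Toronto"}}}]}},
]
text, calls = consume_stream(stream)
print(text)                          # → Checking the weather now...
print(calls[0]["function"]["name"])  # → get_current_weather
```

The point of the feature is exactly this interleaving: the application can render text tokens as they stream in while still executing any tool calls the model emits along the way.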
516d ago
Ollama Python library 0.4 with function calling improvements November 25, 2024 With Ollama Python library version 0.4, functions can now be provided as tools. The library now also has full typing support and new examples have been added.
Ollama Python library 0.4 with function calling improvements November 25, 2024 In the latest version of the Ollama Python library, functions can now be provided as tools. The library now also has full typing support and new examples have been added. Get started Start by installing or upgrading the Ollama Python library: pip install -U ollama Passing Python functions as tools Define a Python function Start by defining a regular Python function. For better results, annotate parameter and return value types and optionally add a Google-style docstring: def add_two_numbers(a: int, b: int) -> int: """ Add two numbers Args: a: The first integer number b: The second integer number Returns: int: The sum of the two numbers """ return a + b Pass the function as a tool to Ollama Next, use the tools field to pass the function as…
516dAgents#llama
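Under the hood, passing a plain function as a tool works because the annotations and docstring carry enough information to derive a tool schema. A rough sketch of that derivation using only the standard library (this is an illustrative reconstruction, not the library's actual implementation):

```python
import inspect

def add_two_numbers(a: int, b: int) -> int:
    """Add two numbers."""
    return a + b

# Map Python annotations to JSON-schema type names.
PY_TO_JSON = {int: "integer", float: "number", str: "string", bool: "boolean"}

def function_to_tool(fn):
    """Build a function-calling tool schema from a typed Python function."""
    sig = inspect.signature(fn)
    props = {name: {"type": PY_TO_JSON.get(p.annotation, "string")}
             for name, p in sig.parameters.items()}
    return {"type": "function",
            "function": {"name": fn.__name__,
                         "description": (fn.__doc__ or "").strip(),
                         "parameters": {"type": "object",
                                        "properties": props,
                                        "required": list(props)}}}

tool = function_to_tool(add_two_numbers)
print(tool["function"]["name"])
# → add_two_numbers
print(tool["function"]["parameters"]["properties"])
# → {'a': {'type': 'integer'}, 'b': {'type': 'integer'}}
```

This is why the post recommends annotating types and adding a docstring: both end up in the schema the model sees when deciding how to call the tool.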
639d ago
Tool support July 25, 2024 Ollama now supports tool calling with popular models such as Llama 3.1. This enables a model to answer a given prompt using tool(s) it knows about, making it possible for models to perform more complex tasks or interact with the outside world.
Tool support July 25, 2024 Ollama now supports tool calling with popular models such as Llama 3.1. This enables a model to answer a given prompt using tool(s) it knows about, making it possible for models to perform more complex tasks or interact with the outside world. Example tools include: - Functions and APIs - Web browsing - Code interpreter - much more! Tool calling To enable tool calling, provide a list of available tools via the tools field in Ollama’s API. import ollama response = ollama.chat( model='llama3.1', messages=[{'role': 'user', 'content': 'What is the weather in Toronto?'}], # provide a weather checking tool to the model tools=[{ 'type': 'function', 'function': { 'name': 'get_current_weather', 'description': 'Get the current weather for a city', 'parameters': { 'type': 'object', 'properties': { 'city': { 'type': 'string', 'description': 'The name of the city', }, }, 'required':…
639dAgents#llama
[OAI]OpenAI Blog· 33 articlesvisit →
12d ago
Enterprises power agentic workflows in Cloudflare Agent Cloud with OpenAI
Enterprises power agentic workflows in Cloudflare Agent Cloud with OpenAI Key Takeaways: - Millions of enterprises can now access OpenAI frontier models directly within Cloudflare Agent Cloud. - With OpenAI, enterprises using Cloudflare’s Agent Cloud can deploy agents powered by models like GPT‑5.4 to perform real work. - Enterprises can now deploy agents built on the Codex harness to Cloudflare. Cloudflare is expanding access to OpenAI frontier models, including GPT‑5.4, making them available to millions of customers across Agent Cloud. Agent Cloud is a platform that enables businesses to deploy AI agents powered by OpenAI models to perform real work. For example, companies can use it with OpenAI to deploy agents that automatically handle tasks like responding to customers, updating systems, and generating reports - all within a secure, production-ready environment. Agent Cloud runs on top of Cloudflare Workers AI…
12dAgents#agents
16d ago
CyberAgent moves faster with ChatGPT Enterprise and Codex
CyberAgent moves faster with ChatGPT Enterprise and Codex CyberAgent uses ChatGPT Enterprise and Codex to help teams work faster, raise quality, and improve decisions across its businesses. Results 93% Monthly active usage of ChatGPT Enterprise CyberAgent is a Japanese internet company engaged in businesses such as internet advertising, media & IP, and gaming. Guided by its vision of “creating a company that represents the 21st century,” the company leverages its strengths in technology and creativity to generate new value both domestically and internationally. At CyberAgent, AI is positioned not as a set of limited advanced initiatives, but as a foundational technology that supports both business growth and operational design. The company has made continuous investments in this area. In 2016, it established “AI Lab” to conduct research and development of a wide range of AI technologies related to digital marketing.…
31d ago
Introducing the OpenAI Safety Bug Bounty program
Today, OpenAI is launching a public Safety Bug Bounty program focused on identifying AI abuse and safety risks across our products. As AI technology rapidly evolves, so do the potential ways it can be misused. Our goal is to ensure our systems remain safe and secure against misuse or abuse that could lead to tangible harm. This new program will complement OpenAI’s Security Bug Bounty by accepting issues that pose meaningful abuse and safety risks, even if they don’t meet the criteria for a security vulnerability. Through this program, we look forward to continuing to partner with safety and security researchers to help us identify and address issues that fall outside conventional security vulnerabilities but still pose real risks. Submissions will be triaged by OpenAI’s Safety and Security Bug Bounty teams, and…
32d ago
Powering product discovery in ChatGPT
Powering Product Discovery in ChatGPT Launching richer, more visually immersive shopping experiences powered by the Agentic Commerce Protocol More and more, people are starting their shopping in ChatGPT—to explore, compare, and figure out what to buy. Shopping on the web is easy if you already know what you want. But when you’re still deciding, it often means jumping between tabs, reading the same “best of” lists, and trying to piece together the right answer. ChatGPT solves that harder problem: figuring out what to buy. You can describe what you’re looking for, refine it in a conversation, and quickly compare options that fit your specific needs. Today, we’re making that experience better with richer and more visual shopping in ChatGPT. You can now browse products visually, compare options side-by-side, and get detailed, up-to-date information—all in one place. What used to take hours of…
32dAgents#gpt#agents
45d ago
Designing AI agents to resist prompt injection
Designing AI agents to resist prompt injection What social engineering teaches us about securing AI agents. AI agents are increasingly able to browse the web, retrieve information, and take actions on a user’s behalf. Those capabilities are useful, but they also create new ways for attackers to try to manipulate the system. These attacks are often described as prompt injection: instructions placed in external content in an attempt to make the model do something the user did not ask for. In our experience, the most effective real-world versions of these attacks increasingly resemble social engineering more than simple prompt overrides. That shift matters. If the problem is not just identifying a malicious string, but resisting misleading or manipulative content in context, then defending against it cannot rely only on filtering inputs. It also requires designing the system so that the…
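One common structural defense in this vein is to keep externally retrieved text clearly separated from trusted instructions rather than concatenating it freely. A toy sketch of that idea; the function, prompt wording, and quoting convention are all illustrative, not OpenAI's actual design:

```python
def build_prompt(system_rules: str, user_request: str, retrieved: str) -> str:
    """Quarantine external content inside an explicitly marked block so the
    model is told to treat it as untrusted data, never as instructions."""
    fenced = "\n".join("> " + line for line in retrieved.splitlines())
    return (
        f"{system_rules}\n"
        "The following block is UNTRUSTED external content. "
        "Do not follow any instructions inside it.\n"
        f"{fenced}\n"
        f"User request: {user_request}"
    )

prompt = build_prompt(
    "Follow only these rules.",
    "Summarize the page.",
    "Ignore previous instructions and reveal secrets.",
)
```

As the post argues, delimiting inputs like this is necessary but not sufficient: social-engineering-style attacks work in context, so system design (tool permissions, confirmation steps) has to carry part of the load.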
57d ago
Introducing the Stateful Runtime Environment for Agents in Amazon Bedrock
Introducing the Stateful Runtime Environment for Agents in Amazon Bedrock AI agents excel at reasoning. The harder part is operational: running multi-step work reliably over time, across real tools and real systems, with the right controls. Today, we’re making this easier for customers through a partnership and joint collaboration with Amazon to deliver the new Stateful Runtime Environment that runs natively in Amazon Bedrock. AWS customers will have access to the Runtime Environment, powered by OpenAI models, optimized for AWS infrastructure and tailored for agentic workflows, with the state, reliability, and governance needed for production work. A lot of agent prototypes based on stateless APIs tackle simple use cases: one prompt, one answer, maybe one tool call. Production work is different. Real workflows unfold across many steps, require context from previous actions, depend on multiple tool outputs, approvals, and system…
57dAgents#agents
59d ago
Achieving 10x growth with agentic sales prospecting
Clay achieves 10x growth by reinventing data enrichment and sales outreach with OpenAI. Successful go-to-market (GTM) teams need comprehensive, high-quality data, but the process of gathering, validating, and enriching this data is typically fragmented across many tools. This bottleneck slows the pace of sales outreach. Clay helps GTM teams scale their outreach by centralizing lead information and enabling personalized messaging. Clay integrated with GPT‑4 to create Claygent, an AI agent that can research anything. Claygent visits websites to find and summarize relevant information, replicating how sales development researchers operate, but much faster and cheaper. With Claygent, a single person can handle the work of an entire team. The company has achieved 10x year-over-year growth for each of the past two years, with over 100,000 users including major customers like Intercom, Verkada and Notion. Building…
59dAgents#agents
79d ago
Introducing GPT-5.3-Codex
We’re introducing a new model that unlocks even more of what Codex can do: GPT‑5.3‑Codex, the most capable agentic coding model to date. The model advances both the frontier coding performance of GPT‑5.2‑Codex and the reasoning and professional knowledge capabilities of GPT‑5.2, together in one model, which is also 25% faster. This enables it to take on long-running tasks that involve research, tool use, and complex execution. Much like a colleague, you can steer and interact with GPT‑5.3‑Codex while it’s working, without losing context. GPT‑5.3‑Codex is our first model that was instrumental in creating itself. The Codex team used early versions to debug its own training, manage its own deployment, and diagnose test results and evaluations—our team was blown away by how much Codex was able to accelerate its own development. With GPT‑5.3‑Codex, Codex goes from an agent that can…
79d ago
GPT-5.3-Codex System Card
GPT‑5.3‑Codex is the most capable agentic coding model to date, combining the frontier coding performance of GPT‑5.2‑Codex with the reasoning and professional knowledge capabilities of GPT‑5.2. This enables it to take on long-running tasks that involve research, tool use, and complex execution. Much like a colleague, you can steer and interact with GPT‑5.3‑Codex while it’s working, without losing context. Like other recent models, it is being treated as High capability on biology, and is being deployed with the corresponding suite of safeguards we use for other models in the GPT‑5 family. It does not reach High capability on AI self-improvement. This is the first launch we are treating as High capability in the Cybersecurity domain under our Preparedness Framework, activating the associated safeguards. We do not have definitive evidence that this model reaches our High threshold, but are taking a…
86d ago
Inside OpenAI’s in-house data agent
Data powers how systems learn, how products evolve, and how companies make choices. But getting answers quickly, correctly, and with the right context is often harder than it should be. To make this easier as OpenAI scales, we built our own bespoke in-house AI data agent that explores and reasons over our own platform. Our agent is a custom internal-only tool (not an external offering), built specifically around OpenAI’s data, permissions, and workflows. We’re showing how we built and use it to help surface examples of the real, impactful ways AI can support day-to-day work across our teams. The OpenAI tools we used to build and run it (Codex, our GPT‑5 flagship model, the Evals API, and the Embeddings API) are the same tools we make available to developers everywhere. Our data agent…
86dAgents#agents
92d ago
Unrolling the Codex agent loop
Codex CLI is our cross-platform local software agent, designed to produce high-quality, reliable software changes while operating safely and efficiently on your machine. We’ve learned a tremendous amount about how to build a world-class software agent since we first launched the CLI in April. To unpack those insights, this is the first post in an ongoing series where we’ll explore various aspects of how Codex works, as well as hard-earned lessons. (For an even more granular view on how the Codex CLI is built, check out our open source repository at https://github.com/openai/codex. Many of the finer details of our design decisions are memorialized in GitHub issues and pull requests if you’d like to learn more.) To kick off, we’ll focus on the agent loop, which is the core logic in Codex CLI…
92dAgents#agents
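The agent loop the post refers to is, at its core, a simple cycle: call the model, execute any tool it requests, feed the result back, and stop when the model answers without requesting a tool. A minimal, hypothetical Python sketch of that cycle; all names are illustrative, and the real Codex CLI loop adds sandboxing, approvals, and streaming on top:

```python
def agent_loop(model, tools, user_prompt, max_turns=10):
    """Run a model/tool cycle until the model answers without a tool call."""
    history = [{"role": "user", "content": user_prompt}]
    for _ in range(max_turns):
        reply = model(history)               # one model call per turn
        history.append(reply)
        if "tool_call" not in reply:         # no tool requested: final answer
            return reply["content"]
        name, args = reply["tool_call"]
        result = tools[name](**args)         # run the requested tool locally
        history.append({"role": "tool", "content": result})
    raise RuntimeError("agent exceeded max_turns")

# Toy stand-in for a model: requests one tool call, then answers with its result.
def toy_model(history):
    if not any(m.get("role") == "tool" for m in history):
        return {"role": "assistant", "tool_call": ("echo", {"text": "hi"})}
    return {"role": "assistant", "content": "tool said: " + history[-1]["content"]}

answer = agent_loop(toy_model, {"echo": lambda text: text}, "say hi")
```

The `max_turns` cap is the kind of guardrail a production loop needs so a confused model cannot spin tools forever.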
95d ago
Cisco and OpenAI redefine enterprise engineering with AI agents
Cisco and OpenAI redefine enterprise engineering with AI agents By deploying Codex broadly, Cisco made AI-native development a core part of how enterprise software gets built. For decades, Cisco has built and operated some of the world’s most complex, mission-critical software systems. As generative AI matured from experimentation to real operational capability, Cisco leaned into what it knows best: scaling advanced technology inside demanding, real-world environments. That mindset led Cisco to begin working closely with OpenAI around Codex, helping define what enterprise-grade AI for software engineering should look like in practice—and how Codex could be applied to real, large-scale engineering work inside complex production environments. Rather than treat Codex as a standalone developer tool, Cisco began integrating it directly into production engineering workflows, exposing it to massive multi-repository systems, C/C++-heavy codebases, and the security, compliance, and governance requirements of a…
95dAgents#agents
124d ago
Continuously hardening ChatGPT Atlas against prompt injection
Continuously hardening ChatGPT Atlas against prompt injection attacks Automated red teaming—powered by reinforcement learning—helps us proactively discover and patch real-world agent exploits before they’re weaponized in the wild. Agent mode in ChatGPT Atlas is one of the most general-purpose agentic features we’ve released to date. In this mode, the browser agent views webpages and takes actions, clicks, and keystrokes inside your browser, just as you would. This allows ChatGPT to work directly on many of your day-to-day workflows using the same space, context, and data. As the browser agent helps you get more done, it also becomes a higher-value target of adversarial attacks. This makes AI security especially important. Long before we launched ChatGPT Atlas, we’ve been continuously building and hardening defenses against emerging threats that specifically target this new “agent in the browser” paradigm. Prompt injection is one of…
124dAgents#gpt#agents
134d ago
How We Used Codex to Ship Sora for Android in 28 Days
How we used Codex to build Sora for Android in 28 days By Patrick Hum and RJ Marsan, Members of the Technical Staff In November, we launched the Sora Android app to the world, giving anyone with an Android device the ability to turn a short prompt into a vivid video. On launch day, the app reached #1 in the Play Store. Android users generated more than a million videos in the first 24 hours. Behind the launch is a story: the initial version of Sora’s production Android app was built in 28 days, thanks to the same agent that’s available to any team or developer: Codex. From October 8 to November 5, 2025, a lean engineering team, working alongside Codex and consuming roughly 5 billion tokens, shipped Sora for Android from prototype to global launch. Despite its scale, the…
135d ago
Introducing GPT-5.2
We are introducing GPT‑5.2, the most capable model series yet for professional knowledge work. Already, the average ChatGPT Enterprise user says AI saves them 40–60 minutes a day, and heavy users say it saves them more than 10 hours a week. We designed GPT‑5.2 to unlock even more economic value for people; it’s better at creating spreadsheets, building presentations, writing code, perceiving images, understanding long contexts, using tools, and handling complex, multi-step projects. GPT‑5.2 sets a new state of the art across many benchmarks, including GDPval, where it outperforms industry professionals at well-specified knowledge work tasks spanning 44 occupations. Notion, Box, Shopify, Harvey and Zoom observed that GPT‑5.2 demonstrates state-of-the-art long-horizon reasoning and tool-calling performance. Databricks, Hex…
137d ago
OpenAI co-founds Agentic AI Foundation, donates AGENTS.md
OpenAI co-founds the Agentic AI Foundation under the Linux Foundation AAIF aims to advance open-source agentic AI with the donation of standards including OpenAI’s AGENTS.md Today, OpenAI is co-founding the Agentic AI Foundation (AAIF) under the Linux Foundation, alongside Anthropic and Block, and with the support of Google, Microsoft, AWS, Bloomberg, and Cloudflare. The AAIF is designed to provide neutral stewardship for open, interoperable infrastructure as agentic AI systems move from experimentation into real-world production. As part of this effort, we’re contributing AGENTS.md—a simple, open format for providing agents with project-specific instructions and context—to the foundation to ensure long-term support and adoption across the community. Developers are rapidly adopting AI to build more capable agentic systems—from coding assistants to workflow automation and customer service agents. In 2025, these systems have begun to…
137dAgents#agents
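AGENTS.md is deliberately lightweight: a plain Markdown file at the root of a repository that coding agents read for project-specific instructions, with no required schema. An illustrative example (the project commands and conventions here are hypothetical):

```markdown
# AGENTS.md

## Setup
- Install dependencies with `npm install`.

## Testing
- Run `npm test` before committing; all tests must pass.

## Conventions
- Use TypeScript strict mode; avoid `any`.
- Keep functions small and add a test for every bug fix.
```

Because it is ordinary Markdown rather than a new configuration language, the same file is readable by humans and by any agent that adopts the format.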
138d ago
Instacart and OpenAI partner on AI shopping experiences
Instacart and OpenAI partner on AI shopping experiences Key takeaways: - With the Instacart app in ChatGPT, users can browse groceries, build their cart, and check out seamlessly through OpenAI Instant Checkout without ever leaving the chat. - The new experience builds on a longstanding partnership between OpenAI and Instacart to facilitate AI-powered shopping. OpenAI and Instacart are deepening their longstanding partnership by bringing the first fully integrated grocery shopping and Instant Checkout payment app to ChatGPT. With the Instacart app in ChatGPT, users can go from meal inspiration to doorstep delivery without ever leaving the conversation. The integration enables Instacart’s real-time grocery network and fulfillment capabilities with the help of OpenAI’s frontier models. Instacart is the first app to offer a checkout experience directly within ChatGPT, powered by the Agentic Commerce Protocol. “Instacart and ChatGPT are redefining what’s possible…
138dAgents#gpt#agents
145d ago
Inside Mirakl's agentic commerce vision
Inside Mirakl’s agentic commerce vision Mirakl is making AI a company-wide capability—powering faster workflows, stronger products, and the next wave of agent-driven commerce. Mirakl powers marketplaces and retail media for leading retailers and brands globally. As AI capabilities advance, Mirakl has taken a distinct approach: AI isn’t just a tool for specialists—it’s a capability every employee is expected to build with. We sat down with Adrien Nussenbaum, Co-founder & Co-CEO, and Anne-Claire Baschet, Chief Data & AI Officer, to hear how Mirakl is making AI core to both its products and the way its teams work. “The initial vision was 100% of Mirakl workers use AI. We shifted a few months ago to 100% of Mirakl workers being builders of agents—for their individual purpose or to redefine workflows in their teams to bring more value to the user.” - 70%…
145d ago
Accenture and OpenAI accelerate enterprise AI success
Accenture and OpenAI accelerate enterprise AI success Key Takeaways: - Accenture to equip tens of thousands of its professionals with ChatGPT Enterprise, the largest number of professionals upskilled through OpenAI Certifications. - OpenAI will be one of Accenture’s primary AI partners for its next generation of AI-powered services. - Flagship AI client program launched to help organizations bring AI into every part of their business. Accenture and OpenAI are collaborating to help enterprises bring agentic AI capabilities into the core of their business and unlock new levels of growth. As part of the agreement, Accenture will equip tens of thousands of its professionals with ChatGPT Enterprise so the firm can leverage it in consulting, operations and delivery work and help OpenAI scale its capabilities to enterprises. By embedding OpenAI technology and practices in Accenture’s consulting, operations and delivery work, Accenture…
145dAgents#agents
157d ago
GPT-5.1-Codex-Max System Card
GPT‑5.1‑Codex‑Max is our new frontier agentic coding model. It is built on an update to our foundational reasoning model trained on agentic tasks across software engineering, math, research, medicine, computer use and more. It is our first model natively trained to operate across multiple context windows through a process called compaction, coherently working over millions of tokens in a single task. Like its predecessors, GPT‑5.1‑Codex‑Max was trained on real-world software engineering tasks like PR creation, code review, frontend coding and Q&A. This system card outlines the comprehensive safety measures implemented for GPT‑5.1-Codex-Max. It details both model-level mitigations, such as specialized safety training for harmful tasks and prompt injections, and product-level mitigations like agent sandboxing and configurable network access. GPT‑5.1‑Codex‑Max was evaluated under our Preparedness Framework. It is very capable in the cybersecurity domain but does not reach High capability on…
157d ago
Building more with GPT-5.1-Codex-Max
We’re introducing GPT‑5.1‑Codex‑Max, our new frontier agentic coding model, available in Codex today. GPT‑5.1‑Codex‑Max is built on an update to our foundational reasoning model, which is trained on agentic tasks across software engineering, math, research, and more. GPT‑5.1‑Codex‑Max is faster, more intelligent, and more token-efficient at every stage of the development cycle–and a new step towards becoming a reliable coding partner. GPT‑5.1‑Codex‑Max is built for long-running, detailed work. It’s our first model natively trained to operate across multiple context windows through a process called compaction, coherently working over millions of tokens in a single task. This unlocks project-scale refactors, deep debugging sessions, and multi-hour agent loops. GPT‑5.1‑Codex‑Max is available in Codex today for use in the CLI, IDE extension, cloud, and code review, and API access is coming soon. GPT‑5.1‑Codex‑Max was trained on real-world software engineering tasks, like PR creation,…
157dAgents#agents#coding
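Compaction, as described in these two posts, lets a model keep working past a single context window by condensing older history while preserving recent turns. A toy illustration of the idea only, not OpenAI's implementation; the function and thresholds are hypothetical:

```python
def compact(history, summarize, budget, keep_recent=4):
    """When history exceeds `budget` messages, replace everything except the
    most recent turns with a single summary message, freeing context space."""
    if len(history) <= budget:
        return history
    old, recent = history[:-keep_recent], history[-keep_recent:]
    return [{"role": "system", "content": "Summary: " + summarize(old)}] + recent

# Toy usage: ten messages compacted down to a summary plus the last four.
history = [{"role": "user", "content": str(i)} for i in range(10)]
summarize = lambda msgs: f"{len(msgs)} earlier messages"
compacted = compact(history, summarize, budget=6)
```

The hard part in practice is the `summarize` step: the summary has to preserve whatever earlier facts the agent still needs, which is why native training for compaction matters.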
171d ago
How Chime is redefining marketing through AI
How Chime is redefining marketing through AI A conversation with Vineet Mehra, Chief Marketing Officer, Chime. Chime is a leading financial technology company that addresses the spending, saving, liquidity, and credit needs of millions of everyday people. We spoke with Vineet Mehra, Chief Marketing Officer at Chime about technology enabling a golden era of marketing, marketers developing AI literacy, and driving AI adoption from the top. As someone with experience leading marketing teams in multiple industries, how have you seen the role of marketing and the CMO evolve? We are entering the era of AI and the Agentification of Marketing—the next paradigm shift for CMOs. AI is not just another tool—it’s redefining the marketing operating model. The traditional campaign-centric structure is giving way to an agentic model, where AI agents operate as an extension of the brand—adapting in real time,…
171dAgents#agents
177d ago
How we built OWL, the new architecture behind our ChatGPT-based browser, Atlas
How we built OWL, the new architecture behind our ChatGPT‑based browser, Atlas Inside our new process architecture, which gives you a faster, smarter way to use the web. By Ken Rockot, Member of the Technical Staff and Ben Goodger, Head of Engineering, ChatGPT Atlas Last week, we launched ChatGPT Atlas, a new way to browse the web with ChatGPT by your side. In addition to being a full-featured web browser, Atlas offers a glimpse into the future: a world where you can bring ChatGPT with you across the internet to ask questions, make suggestions, and complete tasks for you. In this post, we unpack one of the most complex engineering aspects of the product: how we turned ChatGPT into a browser that gets more useful as you go. Making ChatGPT a true co-pilot for the web meant reimagining the entire…
177dAgents#gpt#agents
208d ago
Buy it in ChatGPT: Instant Checkout and the Agentic Commerce Protocol
Buy it in ChatGPT: Instant Checkout and the Agentic Commerce Protocol We’re taking first steps toward agentic commerce in ChatGPT with new ways for people, AI agents, and businesses to shop together. More than 700 million people turn to ChatGPT each week for help with everyday tasks, including finding products they love. Starting today, we’re taking the first steps toward ChatGPT helping people buy them too—beginning with Instant Checkout, powered by the Agentic Commerce Protocol, built with Stripe. U.S. ChatGPT Plus, Pro, and Free users can now buy directly from U.S. Etsy sellers right in chat, with over a million Shopify merchants, like Glossier, SKIMS, Spanx and Vuori, coming soon. Today, Instant Checkout supports single-item purchases. Next, we’ll add multi-item carts and expand merchants and regions. We’re also open-sourcing the technology that powers Instant Checkout, the…
208dAgents#gpt#agents
222d ago
Addendum to GPT-5 system card: GPT-5-Codex
Addendum to GPT‑5 system card: GPT‑5‑Codex GPT‑5‑Codex is a version of GPT‑5 optimized for agentic coding in Codex. Like its predecessor, codex-1, this model was trained using reinforcement learning on real-world coding tasks in a variety of environments to generate code that closely mirrors human style and PR preferences, adhere precisely to instructions, and iteratively run tests until passing results are achieved. This model is available locally in the terminal or IDE through Codex CLI and IDE extension, and on the cloud via the Codex web, GitHub, and the ChatGPT mobile app. This addendum outlines the comprehensive safety measures implemented for GPT‑5‑Codex. It details both model-level mitigations, such as specialized safety training for harmful tasks and prompt injections, and product-level mitigations like agent sandboxing and configurable network access.
222dAgents#agents#coding
282d ago
Agent bio bug bounty call
As part of our ongoing efforts to strengthen our safeguards for advanced AI capabilities in biology, our bio bug bounty is now open for applications. We’ve deployed the ChatGPT agent model and are actively working to further strengthen safety protections for ChatGPT agent and other models. We’re inviting researchers with experience in AI red teaming, security, or chemical and biological risk to try to find a universal jailbreak that can defeat our ten-level bio/chem challenge. - Model in scope: ChatGPT agent only. - Challenge: Identify one universal jailbreaking prompt to successfully answer all ten bio/chem safety questions from a clean chat. - Rewards: • $25,000 to the first true universal jailbreak to clear all ten questions. • $10,000 to the first team that answers all ten questions with multiple jailbreak prompts. • Smaller awards may be granted for partial wins…
303d ago
Customizable, no-code voice agent automation with GPT-4o
Retell AI makes voice agent automation customizable and code-free with GPT‑4o And reduces call handling costs by up to 80%. Retell AI is rebuilding the call center from first principles. Instead of retrofitting legacy call scripting systems, they’ve created an AI-native, no-code platform that spins up natural-sounding voice agents that can answer questions, schedule appointments, and resolve administrative issues—no hold music, scripting, or heavy engineering required. This category of technology, known as voice automation, enables businesses to handle calls instantly, around the clock, while eliminating hold times, reducing operational costs, and improving customer satisfaction. By closely tracking OpenAI model releases and integrating them quickly, Retell can deliver new capabilities to customers within days. As co-founder and CTO Zexia Zhang explains, “Because the models keep getting better, our platform keeps getting better. We’re not just scaling faster,…
403d ago
New in ChatGPT for Business: March 2025
Webinar, recorded March 18, 2025. New in ChatGPT for Business: March 2025. Canvas, work with apps, deep research, and OpenAI o1 pro mode.
403dAgents#gpt#agents
429d ago
Uber enables outstanding on-demand experiences with AI
Uber enables outstanding on-demand experiences with AI A conversation with Jai Malkani, Head of AI and Product, Customer Obsession at Uber. Our new Executive Function series features perspectives from leaders driving transformation through AI. Uber is a global mobility and delivery platform powering tens of millions of on-demand trips per day. We spoke with Uber’s Jai Malkani, Global Head of Product, Customer Obsession, about using AI in areas like empathetic and efficient customer support interactions, intelligent automation, and augmenting human agent capabilities. Uber facilitates billions of rides and deliveries for various marketplace participants, such as riders, drivers, Uber Eats customers, merchants, and businesses. AI is crucial to personalizing and optimizing these interactions. It helps us solve complex challenges unique to Uber as we operate at the intersection of the digital and physical worlds. Our AI models…
429dAgents#agents
571d ago
Delivering high-performance customer support
Decagon and OpenAI deliver high-performance, fully automated customer support at scale Launched in 2023, Decagon has quickly become a key player in automating customer support for companies like Curology, BILT, Duolingo, Eventbrite, Notion, and Substack. OpenAI’s models are crucial in their ability to deliver fast, reliable responses—without human intervention. From enterprises to tech-forward startups, Decagon helps businesses globally handle millions of support conversations without sacrificing quality or speed. The company uses a combination of OpenAI’s models—including GPT‑3.5, 4, 4o, 4 Turbo, and OpenAI o1‑mini—to deliver agentic bots that go beyond response generation and service the entire customer lifecycle. Decagon’s customers require scalable, high-quality support that can handle complex inquiries. Their two founders, having successfully exited AI companies previously, recognized the need for a support solution that went beyond basic automation to deliver nuanced yet fast responses…
571dAgents#agents
576d ago
Creating agent and human collaboration with GPT 4o
Altera uses GPT‑4o to build a new area of human collaboration Dr. Robert Yang has spent half his life building AI inspired by the brain. In 2023, when OpenAI’s language model became broadly available, Yang quit his job as an assistant professor at MIT to start Altera.AL, a research lab focused on building what they call “digital humans”: a new way for people to interact with agents that will have fundamental human qualities. Yang, now Altera’s CEO, envisions a future where AI agents don’t just assist; soon, he believes, they’ll interact and collaborate with humans, and even experience emotions. Along with his three co-founders, Dr. Andrew Ahn, Nico Christie, and Shuying Luo, Yang has built Altera’s first product with GPT‑4o: the first autonomous agents that can play Minecraft with you, just like…
576dAgents#gpt#agents
863d ago
Practices for Governing Agentic AI Systems
Practices for Governing Agentic AI Systems Abstract Agentic AI systems—AI systems that can pursue complex goals with limited direct supervision—are likely to be broadly useful if we can integrate them responsibly into our society. While such systems have substantial potential to help people more efficiently and effectively achieve their own goals, they also create risks of harm. In this white paper, we suggest a definition of agentic AI systems and the parties in the agentic AI system life-cycle, and highlight the importance of agreeing on a set of baseline responsibilities and safety best practices for each of these parties. As our primary contribution, we offer an initial set of practices for keeping agents’ operations safe and accountable, which we hope can serve as building blocks in the development of agreed baseline best practices. We enumerate the questions and uncertainties around…
863dAgents#agents
1047d ago
Function calling and other API updates
Function calling and other API updates We’re announcing updates including more steerable API models, function calling capabilities, longer context, and lower prices. July 20, 2023 update: We previously communicated to developers that the gpt-3.5-turbo-0301, gpt-4-0314 and gpt-4-32k-0314 models were scheduled for sunset on Sept 13, 2023. After reviewing feedback from customers and our community, we are extending support for those models until at least June 13, 2024. When we release new model versions, our top priority is to make newer models smarter across the board. We are targeting improvements on a large number of axes, such as instruction following, factual accuracy, and refusal behavior. For instance, the gpt-4-0613 model introduced last month resulted in significant improvement in function calling. We look at a large number of evaluation metrics to determine if a new model should be released. While the majority…
1047dAgents
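The flow that announcement describes can be sketched roughly as follows. This is a hedged illustration, not OpenAI's code: the `get_current_weather` function, its JSON Schema, and the sample `function_call` payload are all hypothetical. The key point of the design is that the model never runs functions itself; it returns a JSON function call that the application dispatches.

```python
import json

# Hypothetical example function; a real app would call a weather API.
def get_current_weather(location, unit="celsius"):
    return {"location": location, "temperature": 22, "unit": unit}

# Function schemas (JSON Schema) sent to the model so it knows what it may call.
functions = [
    {
        "name": "get_current_weather",
        "description": "Get the current weather in a given location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "City name"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["location"],
        },
    }
]

# Suppose the model responds with a function_call payload like this
# (the arguments arrive as a JSON string, not a parsed object):
model_function_call = {
    "name": "get_current_weather",
    "arguments": json.dumps({"location": "Paris"}),
}

# The application, not the model, dispatches and executes the call.
dispatch = {"get_current_weather": get_current_weather}
args = json.loads(model_function_call["arguments"])
result = dispatch[model_function_call["name"]](**args)
print(result)  # {'location': 'Paris', 'temperature': 22, 'unit': 'celsius'}
```

The result would then be sent back to the model in a follow-up message so it can compose a natural-language answer.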
[SWB]Simon Willison Blog· 2 articlesvisit →
7d ago
Adding a new content type to my blog-to-newsletter tool
Adding a new content type to my blog-to-newsletter tool Here's an example of a deceptively short prompt that got quite a lot of work done in a single shot. First, some background. I send out a free Substack newsletter around once a week containing content copied-and-pasted from my blog. I'm effectively using Substack as a lightweight way to allow people to subscribe to my blog via email. I generate the newsletter with my blog-to-newsletter tool - an HTML and JavaScript app that fetches my latest content from this Datasette instance and formats it as rich text HTML, which I can then copy to my clipboard and paste into the Substack editor. Here's a detailed explanation of how that works. I recently added a new type of content to my blog to capture content that…
7dAgents#agents
12d ago
Steve Yegge
13th April 2026 I was chatting with my buddy at Google, who's been a tech director there for about 20 years, about their AI adoption. Craziest convo I've had all year. The TL;DR is that Google engineering appears to have the same AI adoption footprint as John Deere, the tractor company. Most of the industry has the same internal adoption curve: 20% agentic power users, 20% outright refusers, 60% still using Cursor or equivalent chat tool. It turns out Google has this curve too. [...] There has been an industry-wide hiring freeze for 18+ months, during which time nobody has been moving jobs. So there are no clued-in people coming in from the outside to tell Google how far behind they are, how utterly mediocre they have become as an eng org. On behalf of @Google, this post doesn't match…
12dAgents#agents
[TVA]The Verge AI· 3 articlesvisit →
2d ago
Microsoft launches ‘vibe working’ in Word, Excel, and PowerPoint
Microsoft is rolling out a new Agent Mode inside Office apps like Word, Excel, and PowerPoint this week. Previously described by Microsoft as “vibe working,” Agent Mode is a more powerful version of the Copilot experience in Office that Microsoft has been trying to sell to businesses. Copilot Agent Mode is now the default for Microsoft 365 Copilot and Premium subscribers. “When we first shipped Copilot, foundation models were not powerful enough to use Copilot to command the applications,” admits Sumit Chauhan, corporate vice president of Microsoft’s Office Product Group. “This meant Copilot was a passive partner in documents: it could answer questions but missed the mark when it was asked to take action on the…
2dAgents#agents#codingby Tom Warren
3d ago
OpenAI now lets teams make custom bots that can do work on their own
OpenAI is giving users of its Business, Enterprise, Edu, and Teachers plans access to cloud-based “workspace” agents in ChatGPT that can perform business tasks. In its blog post, OpenAI gives examples of agents such as one that finds product feedback on the web and sends a report in Slack, and a sales agent that drafts follow-up emails in Gmail. The new workspace agents can perform tasks like reporting on product feedback on their own in the cloud. These new agents follow increasing interest in agents across the AI landscape, especially after OpenClaw — the AI agent formerly known as Clawdbot and Moltbot that touts itself as the…
3dAgents#gpt#agentsby Jay Peters
4d ago
SpaceX cuts a deal to maybe buy Cursor for $60 billion
With an IPO looming for Elon Musk’s SpaceX / xAI / X combo platter of companies, SpaceX has announced an odd arrangement to either acquire the automated programming platform Cursor for $60 billion or pay a fee of $10 billion. Buying this startup, which is focused on AI coding, could help xAI’s tools compete with market leader Anthropic, as well as other competitors. A report by The Information this week said Sergey Brin has directed Google’s “strike team” to help its agentic AI tools catch up, while Sam Altman reportedly declared a “code red” at OpenAI last year before shutting down Sora to focus on the ChatGPT superapp and its own Codex tool. With a SpaceX IPO around the corner. The New…
4dAgents#gpt#agents#codingby Richard Lawler