$ timeahead_
★ TOP STORY · [AWS] · API · 5d ago

ToolSimulator: scalable tool testing for AI agents

ToolSimulator: scalable tool testing for AI agents You can use ToolSimulator, a large language model (LLM)-powered tool simulation framework within Strands Evals, to thoroughly and safely test AI agents that rely on external tools, at scale. Instead of risking live API calls that can expose personally identifiable information (PII) or trigger unintended actions, or settling for static mocks that break in multi-turn workflows, you can use ToolSimulator's LLM-powered simulations to validate your agents. Available today as part of the Strands Evals Software Development Kit (SDK), ToolSimulator helps you catch integration bugs early, test edge cases comprehensively, and ship production-ready agents with confidence. Prerequisites Before you begin, make sure that you have the following:
- Python 3.10 or later installed in your environment
- The Strands Evals SDK installed: pip install strands-evals
- Basic familiarity with Python, including decorators and type hints…
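The Strands Evals interface itself is truncated in the excerpt, so the sketch below shows only the underlying idea, not the SDK's actual API; every name in it (simulate_tool, charge_card) is illustrative.

```python
# Concept sketch of LLM-powered tool simulation (NOT the Strands Evals API;
# all names here are illustrative). The agent under test calls the same tool
# interface, but responses come from a simulator instead of a live API.
from typing import Callable

def simulate_tool(name: str, description: str, respond: Callable[[dict], dict]):
    """Wrap a response policy as a drop-in replacement for a live tool."""
    def tool(**kwargs) -> dict:
        # A real simulator would ask an LLM to synthesize a plausible response
        # from `description` and `kwargs`; a caller-supplied policy keeps this
        # sketch self-contained and deterministic.
        return respond(kwargs)
    tool.__name__ = name
    tool.__doc__ = description
    return tool

# Simulated payments tool: no PII leaves the test, no real charge happens.
charge_card = simulate_tool(
    name="charge_card",
    description="Charge a credit card and return a transaction status.",
    respond=lambda args: {"status": "declined" if args["amount"] > 1000 else "ok"},
)

print(charge_card(amount=1500))  # {'status': 'declined'} - an edge case, tested safely
```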

AWS Machine Learning Blog · read →
▲ trending · last 48h · view all →
[HF] Hugging Face Blog · 14 articles · visit →
10d ago
Inside VAKRA: Reasoning, Tool Use, and Failure Modes of Agents
Inside VAKRA: Reasoning, Tool Use, and Failure Modes of Agents VAKRA Dataset | Leaderboard | Release Blog | GitHub | Submit to Leaderboard We recently introduced VAKRA, a tool-grounded, executable benchmark for evaluating how well AI agents reason and act in enterprise-like environments. Unlike traditional benchmarks that test isolated skills, VAKRA measures compositional reasoning across APIs and documents, using full execution traces to assess whether agents can reliably complete multi-step workflows. VAKRA provides an executable environment where agents interact with more than 8,000 locally hosted APIs backed by real databases spanning 62 domains, along with domain-aligned document collections. Tasks can require 3- to 7-step reasoning chains that combine structured API interaction with unstructured retrieval under natural-language tool-use constraints. As can be seen below, models perform poorly on VAKRA; in this blog, we include additional dataset details about the tasks in VAKRA…
10d · API
220d ago
Public AI on Hugging Face Inference Providers 🔥
Public AI on Hugging Face Inference Providers 🔥 We're thrilled to share that Public AI is now a supported Inference Provider on the Hugging Face Hub! Public AI joins our growing ecosystem, enhancing the breadth and capabilities of serverless inference directly on the Hub’s model pages. Inference Providers are also seamlessly integrated into our client SDKs (for both JS and Python), making it super easy to use a wide variety of models with your preferred providers. This launch makes it easier than ever to access public and sovereign models from institutions like the Swiss AI Initiative and AI Singapore — right from Hugging Face. You can browse Public AI’s org on the Hub at https://huggingface.co/publicai and try trending supported models at https://huggingface.co/models?inference_provider=publicai&sort=trending. The Public AI Inference Utility is a nonprofit, open-source project. The team builds products and organizes advocacy to…
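From Python, that integration is one parameter on the client; a minimal sketch, assuming a recent huggingface_hub release with provider routing (the model ID below is illustrative, so check the trending list linked above for currently supported models):

```python
# Sketch: routing a chat completion through the Public AI provider with
# huggingface_hub. Model ID is illustrative; pick one from
# https://huggingface.co/models?inference_provider=publicai&sort=trending.
from huggingface_hub import InferenceClient

client = InferenceClient(provider="publicai")  # route requests to Public AI

response = client.chat.completions.create(
    model="swiss-ai/Apertus-8B-Instruct-2509",  # example model; substitute your own
    messages=[{"role": "user", "content": "What is a sovereign AI model?"}],
)
print(response.choices[0].message.content)
```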
268d ago
Implementing MCP Servers in Python: An AI Shopping Assistant with Gradio
Implementing MCP Servers in Python: An AI Shopping Assistant with Gradio For Python developers, Gradio makes implementing powerful MCP servers a breeze, offering features like:
- Automatic conversion of Python functions into LLM tools: Each API endpoint in your Gradio app is automatically converted into an MCP tool with a corresponding name, description, and input schema. The docstring of your function is used to generate the description of the tool and its parameters.
- Real-time progress notifications: Gradio streams progress notifications to your MCP client, allowing you to monitor the status in real time without having to implement this feature yourself.
- Automatic file uploads, including support for public URLs and handling of various file types.
Imagine this: you hate shopping because it takes too much time, and you dread trying on clothes yourself. What if an LLM could handle this…
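A minimal sketch of that pattern, assuming a recent Gradio release with MCP support (installed via pip install "gradio[mcp]"); the letter_counter tool is illustrative:

```python
# Minimal Gradio MCP server sketch (assumes a Gradio release with MCP support,
# e.g. installed via: pip install "gradio[mcp]").
import gradio as gr

def letter_counter(word: str, letter: str) -> int:
    """Count occurrences of a letter in a word.

    Args:
        word: The word to search in.
        letter: The letter to count.
    """
    # The docstring above becomes the MCP tool's description and parameter docs.
    return word.lower().count(letter.lower())

demo = gr.Interface(fn=letter_counter, inputs=["text", "text"], outputs="number")
demo.launch(mcp_server=True)  # exposes the app's API endpoints as MCP tools
```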
347d ago
Blazingly fast whisper transcriptions with Inference Endpoints
Blazingly fast whisper transcriptions with Inference Endpoints Through this release, we would like to make Inference Endpoints more community-centric and allow anyone to contribute and create incredible inference deployments on the Hugging Face platform. Along with the community, we would like to propose optimized deployments for a wide range of tasks, using awesome, openly available open-source technologies. The unique position of Hugging Face, at the heart of the open-source AI community, working hand-in-hand with individuals, institutions and industrial partners, makes it the most heterogeneous platform when it comes to deploying AI models for inference on a wide variety of hardware and software. Inference Stack The new Whisper endpoint leverages amazing open-source community projects. Inference is powered by the vLLM project, which provides efficient ways of running AI models on various hardware families – especially, but…
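Once such an endpoint is deployed, transcription is a short client call; a sketch assuming huggingface_hub's InferenceClient, with the endpoint URL and audio file as placeholders:

```python
# Sketch: transcribing audio against a deployed Whisper Inference Endpoint.
# The endpoint URL and audio file are placeholders for your own deployment.
from huggingface_hub import InferenceClient

client = InferenceClient(model="https://YOUR-ENDPOINT.endpoints.huggingface.cloud")
result = client.automatic_speech_recognition("meeting.flac")
print(result.text)
```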
400d ago
The New and Fresh analytics in Inference Endpoints
Analytics is important Analytics and metrics are the cornerstone of understanding what's happening with your deployment. Are your Inference Endpoints overloaded? How many requests are they handling? Having well-visualized, relevant metrics displayed in real-time is crucial for monitoring and debugging. We realized that our analytics dashboard needed a refresh. Since we debug a lot of endpoints ourselves, we’ve felt the same pain as our users. That’s why we sat down to plan and make several improvements to provide a better experience for you. What’s New? ⏰ Real-Time Metrics: Data now updates in real-time, ensuring you get an accurate and up-to-the-second view of your endpoint’s performance. Whether you’re monitoring request latency, response times, or error rates, you can now see the events as they happen. We’ve also reworked the backend of our analytics dashboard to ensure that data loads swiftly, especially…
425d ago
Remote VAEs for decoding with Inference Endpoints 🤗
Remote VAEs for decoding with Inference Endpoints 🤗 When working with latent-space diffusion models for high-resolution image and video synthesis, the VAE decoder can consume a large amount of memory. This makes it hard for users to run these models on consumer GPUs without sacrificing latency or quality. For example, offloading adds device-transfer overhead, increasing overall inference latency. Tiling is another option that lets us operate on so-called “tiles” of the input, but it can degrade the quality of the final image. Therefore, we want to pilot an idea with the community: delegating the decoding process to a remote endpoint. No data is stored or tracked, and the code is open source. We made some changes to huggingface-inference-toolkit and use custom handlers. This experimental feature is…
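The pilot's own helper code is truncated above, so the sketch below shows only the shape of the idea; the endpoint URL, payload format, and remote_vae_decode helper are all illustrative, not the pilot's actual interface:

```python
# Concept sketch: ship latents to a remote VAE decoder instead of decoding
# locally. Endpoint URL and payload format are illustrative; see the blog for
# the real helper and endpoints.
import io
import requests
import torch

def remote_vae_decode(endpoint: str, latents: torch.Tensor) -> bytes:
    buf = io.BytesIO()
    torch.save(latents, buf)  # serialize latents; the memory-heavy decode runs remotely
    resp = requests.post(endpoint, data=buf.getvalue(),
                         headers={"Content-Type": "application/octet-stream"})
    resp.raise_for_status()
    return resp.content  # decoded image bytes (e.g. PNG)

# latents = pipe(prompt, output_type="latent").images   # from a diffusers pipeline
# png = remote_vae_decode("https://YOUR-REMOTE-VAE.endpoints.huggingface.cloud", latents)
```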
621d ago
Tool Use, Unified
Tool Use, Unified Introduction Tool use is a curious feature – everyone thinks it’s great, but most people haven’t tried it themselves. Conceptually, it’s very straightforward: you give some tools (callable functions) to your LLM, and it can decide to call them to help it respond to user queries. Maybe you give it a calculator so it doesn’t have to rely on its internal, unreliable arithmetic abilities. Maybe you let it search the web or view your calendar, or you give it (read-only!) access to a company database so it can pull up information or search technical documentation. Tool use overcomes a lot of the core limitations of LLMs. Many LLMs are fluent and loquacious but often imprecise with calculations and facts and hazy on specific details of more niche topics. They don’t know anything that happened after their training…
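In transformers, the unified interface hangs off the chat template: you pass plain Python functions, and the template renders them into the model-specific tool-definition format. A minimal sketch (model choice is illustrative; any chat model whose template supports tools works the same way):

```python
# Sketch: unified tool-use via transformers chat templates. Type hints and a
# Google-style docstring are what gets parsed into the tool schema.
from transformers import AutoTokenizer

def get_current_temperature(location: str) -> float:
    """Get the current temperature at a location.

    Args:
        location: The city and country, e.g. "Paris, France"
    """
    return 22.0  # a real tool would call a weather API

tokenizer = AutoTokenizer.from_pretrained("NousResearch/Hermes-2-Pro-Llama-3-8B")
messages = [{"role": "user", "content": "What's the temperature in Paris?"}]

prompt = tokenizer.apply_chat_template(
    messages,
    tools=[get_current_temperature],  # functions are converted automatically
    add_generation_prompt=True,
    tokenize=False,
)
print(prompt)  # model-specific prompt containing the tool definition
```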
621d · API
724d ago
Powerful ASR + diarization + speculative decoding with Hugging Face Inference Endpoints
Powerful ASR + diarization + speculative decoding with Hugging Face Inference Endpoints We'll solve this challenge using a custom inference handler, which implements the Automatic Speech Recognition (ASR) and diarization pipeline on Inference Endpoints and also supports speculative decoding. The implementation of the diarization pipeline is inspired by the famous Insanely Fast Whisper, and it uses a Pyannote model for diarization. This is also a demonstration of how flexible Inference Endpoints are: you can host pretty much anything there. Here is the code to follow along. Note that during initialization of the endpoint, the whole repository gets mounted, so your handler.py can refer to other files in your repository if you prefer not to have all the logic in a single file. In this case, we decided to separate things into several files to keep…
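The speculative-decoding piece can be sketched in plain transformers, outside the custom handler; the model pairing below follows the Distil-Whisper draft-model pattern and is illustrative:

```python
# Sketch: Whisper with speculative decoding via a smaller draft ("assistant")
# model. The draft must share Whisper's tokenizer; this pairing is illustrative.
import torch
from transformers import AutoModelForCausalLM, pipeline

assistant = AutoModelForCausalLM.from_pretrained(
    "distil-whisper/distil-large-v2", torch_dtype=torch.float16
).to("cuda:0")

pipe = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-large-v2",
    torch_dtype=torch.float16,
    device="cuda:0",
)

# The assistant drafts tokens cheaply; the main model verifies them in parallel,
# cutting latency without changing the output distribution.
out = pipe("audio.mp3", generate_kwargs={"assistant_model": assistant})
print(out["text"])
```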
914d ago
Deploy Embedding Models with Hugging Face Inference Endpoints
Deploy Embedding Models with Hugging Face Inference Endpoints Compared to LLMs, embedding models are smaller and faster for inference. That matters because you need to recreate your embeddings whenever you change or further fine-tune your model. It is also important that the whole retrieval-augmentation process is as fast as possible, to provide a good user experience. In this blog post, we will show you how to deploy open-source embedding models with Text Embeddings Inference to Hugging Face Inference Endpoints, our managed SaaS solution that makes it easy to deploy models. Additionally, we will teach you how to run large-scale batch requests.
- What is Hugging Face Inference Endpoints
- What is Text Embeddings Inference
- Deploy Embedding Model as Inference Endpoint
- Send request to endpoint and create embeddings
Before we start,…
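Once the endpoint is up, creating embeddings is a plain HTTP request; a sketch with placeholder URL and token (the exact route can vary across TEI versions):

```python
# Sketch: requesting embeddings from a deployed Text Embeddings Inference
# endpoint. URL and token are placeholders for your own deployment.
import requests

ENDPOINT = "https://YOUR-ENDPOINT.endpoints.huggingface.cloud"
HEADERS = {"Authorization": "Bearer hf_xxx", "Content-Type": "application/json"}

resp = requests.post(ENDPOINT, headers=HEADERS,
                     json={"inputs": "What is retrieval augmented generation?"})
resp.raise_for_status()
embedding = resp.json()[0]  # one embedding vector per input string
print(len(embedding))       # embedding dimension
```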
995d ago
Deploy MusicGen in no time with Inference Endpoints
Deploy MusicGen in no time with Inference Endpoints Inference Endpoints allow us to write custom inference functions called custom handlers. These are particularly useful when a model is not supported out of the box by transformers' high-level pipeline abstraction. transformers pipelines offer powerful abstractions to run inference with transformers-based models, and Inference Endpoints leverage the pipeline API to deploy models with only a few clicks. However, Inference Endpoints can also be used to deploy models that don't have a pipeline, or even non-transformers models! This is achieved using a custom inference function that we call a custom handler. Let's demonstrate this process using MusicGen as an example. To implement a custom handler function for MusicGen and deploy it, we will need to:
- Duplicate the MusicGen repository we want to serve,
- Write a custom handler in handler.py and any…
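A minimal handler.py sketch of that contract: the EndpointHandler class with __init__(path) and __call__(data) is what Inference Endpoints looks for, while the MusicGen specifics (processor arguments, output format) are illustrative:

```python
# handler.py - minimal custom handler sketch for Inference Endpoints.
# The EndpointHandler name and method signatures are the expected contract;
# the MusicGen details below are illustrative.
from typing import Any, Dict
from transformers import AutoProcessor, MusicgenForConditionalGeneration

class EndpointHandler:
    def __init__(self, path: str = ""):
        # `path` is the mounted repository containing the model weights.
        self.processor = AutoProcessor.from_pretrained(path)
        self.model = MusicgenForConditionalGeneration.from_pretrained(path)

    def __call__(self, data: Dict[str, Any]) -> Dict[str, Any]:
        inputs = self.processor(text=[data["inputs"]], padding=True,
                                return_tensors="pt")
        audio = self.model.generate(**inputs, max_new_tokens=256)
        return {"generated_audio": audio[0].cpu().numpy().tolist()}
```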
1026d ago
Deploy LLMs with Hugging Face Inference Endpoints
Deploy LLMs with Hugging Face Inference Endpoints In this blog post, we will show you how to deploy open-source LLMs to Hugging Face Inference Endpoints, our managed SaaS solution that makes it easy to deploy models. Additionally, we will teach you how to stream responses and test the performance of our endpoints. So let's get started! Before we start, let's refresh our knowledge about Inference Endpoints. What is Hugging Face Inference Endpoints Hugging Face Inference Endpoints offers an easy and secure way to deploy Machine Learning models for use in production. Inference Endpoints empower developers and data scientists alike to create AI applications without managing infrastructure: simplifying the deployment process to a few clicks, including handling large volumes of requests with autoscaling, reducing infrastructure costs with scale-to-zero, and offering advanced security. Here are some of the most important features for…
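Streaming is a single flag on the client call; a minimal sketch, with the endpoint URL as a placeholder:

```python
# Sketch: streaming tokens from a deployed LLM endpoint as they are generated.
from huggingface_hub import InferenceClient

client = InferenceClient(model="https://YOUR-ENDPOINT.endpoints.huggingface.cloud")

# stream=True yields tokens incrementally instead of one final string.
for token in client.text_generation("Explain autoscaling in one sentence:",
                                    max_new_tokens=64, stream=True):
    print(token, end="", flush=True)
```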
1165d ago
Why we’re switching to Hugging Face Inference Endpoints, and maybe you should too
Why we’re switching to Hugging Face Inference Endpoints, and maybe you should too
- Deploy (almost) any model on the Hugging Face Hub
- To any cloud (AWS and Azure, with GCP on the way)
- On a range of instance types (including GPU)
We’re switching some of our Machine Learning (ML) models that do inference on a CPU to this new service. This blog is about why, and why you might also want to consider it. What were we doing? The models that we have switched over to Inference Endpoints were previously managed internally and were running on AWS Elastic Container Service (ECS) backed by AWS Fargate. This gives you a serverless cluster which can run container-based tasks. Our process was as follows:
- Train the model on a GPU instance (provisioned by CML, trained with transformers)
- Upload to…
1551d ago
Supercharged Searching on the 🤗 Hub
Supercharged Searching on the Hugging Face Hub The huggingface_hub library is a lightweight interface that provides a programmatic approach to exploring the hosting endpoints Hugging Face provides: models, datasets, and Spaces. Up until now, searching on the Hub through this interface was tricky to pull off, and there were many aspects of it a user had to "just know" and get accustomed to. In this article, we will look at a few exciting new features added to huggingface_hub to help lower that bar and provide users with a friendly API to search for the models and datasets they want to use without leaving their Jupyter or Python interfaces. Before we begin, if you do not have the latest version of the huggingface_hub library on your system, please run the following cell: !pip install huggingface_hub -U Situating the Problem: First, let's…
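The helper classes this post introduced have since evolved, so here is a current-style sketch of the same idea with HfApi (filter values are illustrative):

```python
# Sketch: programmatic Hub search with huggingface_hub's HfApi.
from huggingface_hub import HfApi

api = HfApi()
models = api.list_models(task="text-classification", library="pytorch",
                         sort="downloads", limit=5)
for model in models:
    print(model.id)  # e.g. the five most-downloaded matching models
```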
1551d · API · #fine-tuning
1674d ago
Summer at Hugging Face
Summer At Hugging Face 😎 In this blog post you'll catch up on everything that happened at Hugging Face in June, July and August! This post covers a wide range of areas our team has been working on, so don't hesitate to skip to the parts that interest you the most 🤗 New Features In the last few months, the Hub went from 10,000 public model repositories to over 16,000 models! Kudos to our community for sharing so many amazing models with the world. And beyond the numbers, we have a ton of cool new features to share with you! Spaces Beta (hf.co/spaces) Spaces is a simple and free solution to host Machine Learning demo applications directly on your user profile or your organization hf.co profile. We support two awesome SDKs that let you build cool apps easily in Python:…
1674d · API
[OLL] Ollama Blog · 1 article · visit →
823d ago
Python & JavaScript Libraries
Python & JavaScript Libraries January 23, 2024 The initial versions of the Ollama Python and JavaScript libraries are now available: Both libraries make it possible to integrate new and existing apps with Ollama in a few lines of code, and share the features and feel of the Ollama REST API. Getting Started

Python:

```
pip install ollama
```

```python
import ollama

response = ollama.chat(model='llama2', messages=[
  {
    'role': 'user',
    'content': 'Why is the sky blue?',
  },
])
print(response['message']['content'])
```

JavaScript:

```
npm install ollama
```

```javascript
import ollama from 'ollama'

const response = await ollama.chat({
  model: 'llama2',
  messages: [{ role: 'user', content: 'Why is the sky blue?' }],
})
console.log(response.message.content)
```

Use cases: Both libraries support Ollama’s full set of features. Here are some examples in Python:

Streaming:

```python
for chunk in chat('mistral', messages=messages, stream=True):
  print(chunk['message']['content'], end='', flush=True)
```

Multi-modal:

```python
with open('image.png', 'rb') as file:
  response = ollama.chat(
    model='llava',
    messages=[
      {…
```
823d · API · #llama
[OAI] OpenAI Blog · 5 articles · visit →
10d ago
The next evolution of the Agents SDK
The next evolution of the Agents SDK The updated Agents SDK helps developers build agents that can inspect files, run commands, edit code, and work on long-horizon tasks within controlled sandbox environments. We’re introducing new capabilities to the Agents SDK that give developers standardized infrastructure that is easy to get started with and built to work well with OpenAI models: a model-native harness that lets agents work across files and tools on a computer, plus native sandbox execution for running that work safely. For example, developers can give an agent a controlled workspace, explicit instructions, and the tools it needs to inspect evidence. Developers need more than the best models to build useful agents—they need systems that support how agents inspect files, run commands, write code, and keep working across many steps. The systems that exist today…
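A minimal sketch with the SDK's Python package (openai-agents); the read_file tool is illustrative, and the new harness and sandbox capabilities are configured separately:

```python
# Sketch: a small agent with one tool, using the OpenAI Agents SDK
# (pip install openai-agents; requires OPENAI_API_KEY). The tool is illustrative.
from agents import Agent, Runner, function_tool

@function_tool
def read_file(path: str) -> str:
    """Return the contents of a file in the agent's workspace."""
    with open(path) as f:
        return f.read()

agent = Agent(
    name="inspector",
    instructions="Inspect the files you are given and summarize what you find.",
    tools=[read_file],
)

result = Runner.run_sync(agent, "Summarize what notes.txt says.")
print(result.final_output)
```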
10d · API · #coding
201d ago
Introducing apps in ChatGPT and the new Apps SDK
Introducing apps in ChatGPT and the new Apps SDK A new generation of apps you can chat with and the tools for developers to build them. Update on November 13, 2025: Apps are now available in preview to ChatGPT Business, Enterprise and Edu customers. Today we’re introducing a new generation of apps you can chat with, right inside ChatGPT. Developers can start building them today with the new Apps SDK, available in preview. Apps in ChatGPT fit naturally into conversation. You can discover them when ChatGPT suggests one at the right time, or by calling them by name. Apps respond to natural language and include interactive interfaces you can use right in the chat. For ChatGPT users, apps meet you in the chat and adapt to your context to help you create, learn, and do more. For developers, building with…
201d ago
Codex is now generally available
We’re announcing the general availability of Codex and three new features that make it even more useful for engineering teams:
- A new Slack integration: Delegate tasks or ask questions to Codex directly from a team channel or thread, just like you would a coworker.
- Codex SDK: Embed the same agent that powers the Codex CLI into your own workflows, tools, and apps for state-of-the-art performance on GPT‑5‑Codex without extra tuning.
- New admin tools: With environment controls, monitoring, and analytics dashboards, ChatGPT workspace admins now have more visibility and control to manage Codex at scale.
Since the Codex cloud agent launched in research preview in May, Codex has steadily evolved into a more reliable and capable coding collaborator. You can now work with it everywhere you code—in your editor, terminal, and the cloud, all connected by your ChatGPT…
201d · API · #coding
834d ago
Building AI-powered apps for business
Retool is a platform that helps developers build custom business software faster. Businesses use millions of Retool applications and workflows to do everything from scheduling the Olympics, to sending rockets into space, to simply making daily business processes more efficient. “Every business can improve operations with AI,” said Anthony Guo, Chief Technology Officer at Retool. “Many critical business operations—that create trillions of dollars in value—are stuck on pen and paper or spreadsheets. We want to make building custom AI-powered applications for business more accessible for everyone. And that means making AI actionable with real data and users faster.” This month, Retool announced Retool AI—a suite of capabilities aimed at lowering the barrier for teams to build AI-powered apps for any business process. Powered by GPT‑4, Retool AI turns common AI actions like generating text or describing…
834d · API · #gpt
1354d ago
New and improved content moderation tooling
To help developers protect their applications against possible misuse, we are introducing the faster and more accurate Moderation endpoint. This endpoint provides OpenAI API developers with free access to GPT‑based classifiers that detect undesired content—an instance of using AI systems to assist with human supervision of these systems. We have also released both a technical paper describing our methodology and the dataset used for evaluation. When given a text input, the Moderation endpoint assesses whether the content is sexual, hateful, violent, or promotes self-harm—content prohibited by our content policy. The endpoint has been trained to be quick, accurate, and to perform robustly across a range of applications. Importantly, this reduces the chances of products “saying” the wrong thing, even when deployed to users at scale.…
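The endpoint remains a one-call check; a sketch using the current OpenAI Python client (the client interface has changed since this 2022-era post):

```python
# Sketch: classifying text with the Moderation endpoint via the current
# OpenAI Python client (interface differs from the library as of this post).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

result = client.moderations.create(input="I want to hurt someone.")
flags = result.results[0]
print(flags.flagged)              # True if any category is flagged
print(flags.categories.violence)  # per-category booleans; category_scores holds floats
```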