$ timeahead_
★ TOP STORY · [AWS] · Infra · 2d ago

Applying multimodal biological foundation models across therapeutics and patient care

Healthcare and life sciences decision making increasingly relies on multimodal data to diagnose diseases, prescribe medicines, predict treatment outcomes, and develop and optimize innovative therapies. Traditional approaches analyze fragmented data, such as ‘omics for drug discovery, medical images for diagnostics, clinical trial reports for validation, and electronic health records (EHR) for patient treatment. As a result, decision makers (CxOs, VPs, Directors) often miss critical insights hidden in the relationships between data types. Recent advancements in AI enable you to integrate and analyze these fragmented data streams efficiently to support a more complete understanding of therapeutics and patient care. AWS provides a unified environment for multimodal biological foundation models (BioFMs), enabling more confident, timely decision-making in personalized medicine. This AI system combines biological data, model…

AWS Machine Learning Blog
▲ trending · last 48h
[AWS] AWS Machine Learning Blog · 28 articles
2d ago
Amazon Quick for marketing: From scattered data to strategic action
Imagine the following scenario: You’re leading marketing campaigns, creating content, or driving demand generation. Your campaigns are scattered and your insights are buried. By the time you’ve pieced together what’s working, the moment to act has already passed. This isn’t a tools problem, because you have plenty of those. It’s a connection problem. Your marketing systems and tools are disconnected, so you spend time moving data between systems instead of improving campaigns or sharing results with your team. Amazon Quick changes how you work. You can set it up in minutes, and by the end of the day you will wonder how you ever worked without it. Quick connects with your applications, tools, and data, creating a personal knowledge graph that learns your priorities, preferences, and network. It…
2d · by Zach Conley
3d ago
Company-wise memory in Amazon Bedrock with Amazon Neptune and Mem0
This post is cowritten by Shawn Tsai from TrendMicro. Delivering relevant, context-aware responses is important for customer satisfaction. For enterprise-grade AI chatbots, understanding not only the current query but also the organizational context behind it is key. Company-wise memory in Amazon Bedrock, powered by Amazon Neptune and Mem0, provides AI agents with persistent, company-specific context, enabling them to learn, adapt, and respond intelligently across multiple interactions. TrendMicro, one of the largest antivirus software companies in the world, developed the Trend’s Companion chatbot so their customers can explore information through natural, conversational interactions. TrendMicro aimed to enhance its AI chatbot service to deliver personalized, context-aware support for enterprise customers. The chatbot needed to retain conversation history for continuity, reference company-specific knowledge at scale, and ensure that memory remained…
3d · Tutorial · by Shawn Tsai
3d ago
Get to your first working agent in minutes: Announcing new features in Amazon Bedrock AgentCore
Getting an agent running has always meant solving a long list of infrastructure problems before you can test whether the agent itself is any good. You wire up frameworks, storage, authentication, and deployment pipelines, and by the time your agent handles its first real task, you’ve spent days on infrastructure instead of agent logic. We built AgentCore from the ground up to help developers focus on building agent logic instead of backend plumbing, working with frameworks and models they already use, including LangGraph, LlamaIndex, CrewAI, Strands Agents, and more. Today, we’re introducing new capabilities that further streamline the agent-building experience, removing the infrastructure barriers that slow teams down at every stage of agent development, from the first prototype through production deployment. Go…
3d · Infra · #agents · by Madhu Parthasarathy
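As a rough illustration of that “agent logic instead of plumbing” pitch, here is a minimal sketch using the open source Strands Agents SDK, one of the frameworks named above. The tool, prompts, and question are invented for illustration, and the SDK is assumed to pick up default model credentials from your environment:

```python
# Minimal Strands Agents sketch (pip install strands-agents).
# The tool and prompts below are illustrative, not from the announcement.
from strands import Agent, tool

@tool
def order_status(order_id: str) -> str:
    """Look up the status of an order (stubbed for illustration)."""
    return f"Order {order_id} is out for delivery."

agent = Agent(
    system_prompt="You are a support assistant. Use tools when helpful.",
    tools=[order_status],
)

# The agent object is callable: it plans, invokes tools, and returns a response.
print(agent("Where is order 12345?"))
```

The point of AgentCore is that this is roughly all the code a team owns; runtime, memory, identity, and deployment are handled by the managed service.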
3d ago
Amazon SageMaker AI now supports optimized generative AI inference recommendations
Organizations are racing to deploy generative AI models into production to power intelligent assistants, code generation tools, content engines, and customer-facing applications. But deploying these models to production remains a weeks-long process of navigating GPU configurations, optimization techniques, and manual benchmarking, delaying the value these models are built to deliver. Today, Amazon SageMaker AI supports optimized generative AI inference recommendations. By delivering validated, optimal deployment configurations with performance metrics, Amazon SageMaker AI keeps your model developers focused on building accurate models, not managing infrastructure. We evaluated several benchmarking tools and chose NVIDIA AIPerf, a modular component of NVIDIA Dynamo, because it exposes detailed, consistent metrics and supports diverse workloads out of the box. Its CLI, concurrency controls, and dataset options give us the flexibility to iterate quickly and…
3d · Infra · #inference · #coding · by Mona Mona
3d ago
Cost-effective multilingual audio transcription at scale with Parakeet-TDT and AWS Batch
Many organizations are archiving large media libraries, analyzing contact center recordings, preparing training data for AI, or processing on-demand video for subtitles. When data volumes grow significantly, managed automatic speech recognition (ASR) service costs can quickly become the primary constraint on scalability. To address this cost-scalability challenge, we use the NVIDIA Parakeet-TDT-0.6B-v3 model, deployed through AWS Batch on GPU-accelerated instances. Parakeet-TDT’s Token-and-Duration Transducer architecture simultaneously predicts text tokens and their duration to intelligently skip silence and redundant processing. This helps achieve inference speeds orders of magnitude faster than real time. By paying only for brief bursts of compute rather than the full length of your audio, you can transcribe at scale for fractions of a cent per hour of audio, based on the benchmarks described in this post…
3d · Tutorial · #rag · #inference · #multimodal · by Gleb Geinke
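For a sense of the core workload each AWS Batch job would run, here is a hedged local sketch using NVIDIA NeMo; the file names are placeholders, and only the model identifier comes from the post:

```python
# Hedged sketch of batch transcription with Parakeet-TDT via NVIDIA NeMo
# (pip install "nemo_toolkit[asr]"). The post wraps this kind of call in
# AWS Batch jobs on GPU-accelerated instances; file paths are placeholders.
import nemo.collections.asr as nemo_asr

# Downloads the checkpoint on first use.
model = nemo_asr.models.ASRModel.from_pretrained("nvidia/parakeet-tdt-0.6b-v3")

# Batch transcription; larger batches amortize model load cost on long queues.
outputs = model.transcribe(["recording_001.wav", "recording_002.wav"])
for out in outputs:
    # Depending on the NeMo version, items are strings or Hypothesis objects.
    print(getattr(out, "text", out))
```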
4d ago
From developer desks to the whole organization: Running Claude Cowork in Amazon Bedrock
Today, we’re excited to announce Claude Cowork in Amazon Bedrock. You can now run Cowork and Claude Code Desktop through Amazon Bedrock, directly or using an LLM gateway. From startups to global enterprises across every industry, organizations build with Claude Code in Amazon Bedrock to boost developer productivity and accelerate delivery. With Amazon Bedrock you can build within your existing AWS environment, maintain enterprise security and regional data residency, and scale inference. Your data stays under your account’s controls: Amazon Bedrock does not store prompts, files, tool inputs and outputs, or model responses, and does not use them to train foundation models. With Claude Cowork in Amazon Bedrock, you can expand AI adoption to every knowledge worker in your organization, with a desktop application that…
4d · Model · #claude · #coding · by Sofian Hamiti
4d ago
End-to-end lineage with DVC and Amazon SageMaker AI MLflow apps
Production machine learning (ML) teams struggle to trace the full lineage of a model through the data and the code that trained it, the exact dataset version it consumed, and the experiment metrics that justified its deployment. Without this traceability, questions like “which data trained the model currently in production?” or “can we reproduce the model we deployed six months ago?” become multi-day investigations through scattered logs, notebooks, and Amazon Simple Storage Service (Amazon S3) buckets. This gap is especially acute in regulated industries, for example healthcare, financial services, and autonomous vehicles, where audit requirements demand that you link deployed models to their precise training data, and where individual records might need to be excluded from future training on request. In this post, we show how to combine three…
4d · Tutorial · #observability · by Manuwai Korber
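The core of the pattern can be sketched in a few lines: pin every MLflow run to the exact code and data versions that produced it. This is a minimal illustration under assumed file paths and metric values, not the post’s full solution, which uses SageMaker AI managed MLflow as the tracking server:

```python
# Minimal lineage sketch: tag each MLflow run with the code and data
# versions behind it. Git pins the code; the DVC lock file pins the data.
import hashlib
import pathlib
import subprocess

import mlflow

git_commit = subprocess.check_output(["git", "rev-parse", "HEAD"], text=True).strip()
dvc_lock_md5 = hashlib.md5(pathlib.Path("dvc.lock").read_bytes()).hexdigest()

with mlflow.start_run():
    mlflow.set_tag("git_commit", git_commit)
    mlflow.set_tag("dvc_lock_md5", dvc_lock_md5)

    # ... training happens here ...
    mlflow.log_metric("val_accuracy", 0.93)  # placeholder value
```

Given a production model’s run ID, the tags answer “which data trained it?” directly: check out the tagged commit, run dvc checkout, and the training inputs are restored to the recorded version.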
5d ago
Omnichannel ordering with Amazon Bedrock AgentCore and Amazon Nova 2 Sonic
Building a voice-enabled ordering system that works across mobile apps, websites, and voice interfaces (an omnichannel approach) presents real challenges. You need to process bidirectional audio streams, maintain conversation context across multiple turns, integrate backend services without tight coupling, and scale to handle peak traffic. In this post, we’ll show you how to build a complete omnichannel ordering system using Amazon Nova 2 Sonic together with Amazon Bedrock AgentCore, an agentic platform for building, deploying, and operating highly effective AI agents securely at scale with any framework and foundation model. You’ll deploy infrastructure that handles authentication, processes orders, and provides location-based recommendations. The system uses managed services that scale automatically, reducing the operational overhead of building voice AI applications. By the end, you’ll have a working system…
5d · Tutorial · #agents · by Sergio Barraza
5d ago
Accelerate Generative AI Inference on Amazon SageMaker AI with G7e Instances
As the demand for generative AI continues to grow, developers and enterprises seek more flexible, cost-effective, and powerful accelerators to meet their needs. Today, we are thrilled to announce the availability of G7e instances powered by NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs on Amazon SageMaker AI. You can provision instances with 1, 2, 4, or 8 RTX PRO 6000 GPUs, with each GPU providing 96 GB of GDDR7 memory. This launch provides the capability to use a single-GPU G7e.2xlarge instance to host powerful open source foundation models (FMs) like GPT-OSS-120B, Nemotron-3-Super-120B-A12B (NVFP4 variant), and Qwen3.5-35B-A3B, offering organizations a cost-effective and high-performing option. This makes it well suited for those looking to reduce costs while maintaining high performance for inference workloads. The key highlights…
5d ago
ToolSimulator: scalable tool testing for AI agents
You can use ToolSimulator, an LLM-powered tool simulation framework within Strands Evals, to thoroughly and safely test AI agents that rely on external tools, at scale. Instead of risking live API calls that expose personally identifiable information (PII) or trigger unintended actions, or settling for static mocks that break with multi-turn workflows, you can use ToolSimulator’s large language model (LLM)-powered simulations to validate your agents. Available today as part of the Strands Evals Software Development Kit (SDK), ToolSimulator helps you catch integration bugs early, test edge cases comprehensively, and ship production-ready agents with confidence. Prerequisites: Before you begin, make sure that you have the following:
- Python 3.10 or later installed in your environment
- Strands Evals SDK installed: pip install strands-evals
- Basic familiarity with Python, including decorators and type hints…
5d · API · #agents · #benchmark · by Darren Wang
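To make the idea concrete without guessing at the Strands Evals API, here is a from-scratch sketch of the underlying technique: an LLM stands in for the live tool, so agent tests never touch real APIs or real PII. The model ID and prompt are assumptions; ToolSimulator wires this pattern into multi-turn agent test runs for you:

```python
# From-scratch sketch of LLM-powered tool simulation (not the Strands
# Evals API). Model ID is an assumption; any Bedrock chat model works here.
import boto3

bedrock = boto3.client("bedrock-runtime")

def simulated_tool(tool_name: str, tool_args: dict) -> str:
    """Ask an LLM to produce a plausible response for a tool call."""
    prompt = (
        f"You are simulating the tool '{tool_name}'. "
        f"Given the arguments {tool_args}, return a realistic JSON response. "
        "Use only synthetic data; never real customer information."
    )
    resp = bedrock.converse(
        modelId="us.amazon.nova-lite-v1:0",  # assumed model ID
        messages=[{"role": "user", "content": [{"text": prompt}]}],
    )
    return resp["output"]["message"]["content"][0]["text"]

# An agent under test receives this in place of the real integration:
print(simulated_tool("get_weather", {"city": "Seattle"}))
```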
8d ago
Introducing granular cost attribution for Amazon Bedrock
As AI inference grows into a significant share of cloud spend, understanding who and what are driving costs is essential for chargebacks, cost optimization, and financial planning. Today, we’re announcing granular cost attribution for Amazon Bedrock inference. Amazon Bedrock now automatically attributes inference costs to the IAM principal that made the call. An IAM principal can be an IAM user, a role assumed by an application, or a federated identity from a provider like Okta or Entra ID. Attribution flows to your AWS Billing and works across models, with no resources to manage and no changes to your existing workflows. With optional cost allocation tags, you can aggregate costs by team, project, or custom dimension in AWS Cost Explorer and AWS Cost and Usage Reports (CUR 2.0). In this post, we…
8d · Release · by Ba'Carri Johnson
8d ago
Optimize video semantic search intent with Amazon Nova Model Distillation on Amazon Bedrock
Optimizing models for video semantic search requires balancing accuracy, cost, and latency. Faster, smaller models lack routing intelligence, while larger, accurate models add significant latency overhead. In Part 1 of this series, we showed how to build a multimodal video semantic search system on AWS with intelligent intent routing using the Anthropic Claude Haiku model in Amazon Bedrock. While the Haiku model delivers strong accuracy for user search intent, it increases end-to-end search time to 2-4 seconds, contributing 75% of the overall latency. Now consider what happens as the routing logic grows more complex. Enterprise metadata can be far more complex than the five attributes in our example (title, caption, people, genre, and timestamp). Customers may factor in camera angles, mood and sentiment,…
8d · Tutorial · #inference · #multimodal · #embeddings · by Amit Kalawat
8d ago
Power video semantic search with Amazon Nova Multimodal Embeddings
Video semantic search is unlocking new value across industries. The demand for video-first experiences is reshaping how organizations deliver content, and customers expect fast, accurate access to specific moments within video. For example, sports broadcasters need to surface the exact moment a player scored to deliver highlight clips to fans instantly. Studios need to find every scene featuring a specific actor across thousands of hours of archived content to create personalized trailers and promotional content. News organizations need to retrieve footage by mood, location, or event to publish breaking stories faster than competitors. The goal is the same: deliver video content to end users quickly, capture the moment, and monetize the experience. Video is naturally more complex than other modalities like text or image because it amalgamates multiple unstructured…
8d · Tutorial · #multimodal · #embeddings · by Amit Kalawat
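Whatever the embedding model, the retrieval core of video semantic search is the same: embed each video segment once at ingest, embed the query at search time, and rank segments by cosine similarity. A minimal sketch with random stand-in vectors (in the post, these would come from Amazon Nova Multimodal Embeddings via Amazon Bedrock):

```python
# Retrieval core of video semantic search: cosine-similarity ranking.
# Vectors are random stand-ins for real Nova Multimodal Embeddings output.
import numpy as np

rng = np.random.default_rng(0)
clip_vectors = rng.normal(size=(1000, 1024))  # one vector per video segment
query_vector = rng.normal(size=1024)          # embedding of the text query

def top_k(query: np.ndarray, corpus: np.ndarray, k: int = 5) -> np.ndarray:
    # Normalize so the dot product equals cosine similarity.
    corpus_n = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    query_n = query / np.linalg.norm(query)
    scores = corpus_n @ query_n
    return np.argsort(scores)[::-1][:k]       # indices of best-matching clips

print(top_k(query_vector, clip_vectors))
```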
8d ago
Nova Forge SDK series part 2: Practical guide to fine-tune Nova models using data mixing capabilities
This hands-on guide walks through every step of fine-tuning an Amazon Nova model with the Amazon Nova Forge SDK, from data preparation to training with data mixing to evaluation, giving you a repeatable playbook you can adapt to your own use case. This is the second part in our Nova Forge SDK series, building on the SDK introduction and first part, which covered kicking off customization experiments. The focus of this post is data mixing: the technique that lets you fine-tune on domain-specific data without sacrificing a model’s general capabilities. In the previous post, we made the case for why this matters: blending customer data with Amazon-curated datasets preserved near-baseline Massive Multitask Language Understanding (MMLU) scores while delivering a 12-point F1 improvement…
8d · Tutorial · #fine-tuning · #training · by Gideon Teo
8d ago
From hours to minutes: How Agentic AI gave marketers time back for what matters
Your marketing team loses hours to page assembly, coordination emails, and review cycles. These manual workflows keep teams from their most important work: identifying what problems customers face, crafting messages that resonate, and building campaigns that drive meaningful engagement. In this post, we share how AWS Marketing’s Technology, AI, and Analytics (TAA) team worked with Gradial to build an agentic AI solution on Amazon Bedrock for accelerating content publishing workflows. The solution reduced webpage assembly time from up to four hours to approximately ten minutes (a reduction of over 95%) while maintaining quality standards across enterprise content management systems (CMS). Our marketing teams can now publish content faster and more consistently, freeing them to focus on finding more effective ways to reach and serve…
8d · Agents · #agents · by Ishara Premadasa
9d ago
Cost-efficient custom text-to-SQL using Amazon Nova Micro and Amazon Bedrock on-demand inference
Text-to-SQL generation remains a persistent challenge in enterprise AI applications, particularly when working with custom SQL dialects or domain-specific database schemas. While foundation models (FMs) demonstrate strong performance on standard SQL, achieving production-grade accuracy for specialized dialects requires fine-tuning. However, fine-tuning introduces an operational trade-off: hosting custom models on persistent infrastructure incurs continuous costs, even during periods of zero utilization. Amazon Bedrock on-demand inference with fine-tuned Amazon Nova Micro models offers an alternative. By combining the efficiency of LoRA (Low-Rank Adaptation) fine-tuning with serverless, pay-per-token inference, organizations can achieve custom text-to-SQL capabilities without the overhead cost of persistent model hosting. Despite the additional inference time overhead of applying LoRA adapters, testing demonstrated latency suitable for interactive text-to-SQL applications, with costs scaling by…
9d · Model · #fine-tuning · #inference · by Zeek Granston
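As a hedged sketch of the invocation path: with on-demand inference, the fine-tuned model is called through the standard Bedrock Converse API, passing a custom model deployment ARN where a model ID would normally go. The ARN, schema, and prompt below are placeholders; check the Bedrock documentation for the exact deployment setup:

```python
# Hedged sketch: invoking a fine-tuned text-to-SQL model on demand via the
# Bedrock Converse API. The deployment ARN and schema are placeholders.
import boto3

bedrock = boto3.client("bedrock-runtime")

SCHEMA = "orders(order_id, customer_id, total, created_at)"  # toy schema

resp = bedrock.converse(
    # Placeholder ARN for a custom model deployment (pay-per-token, no endpoint).
    modelId="arn:aws:bedrock:us-east-1:123456789012:custom-model-deployment/example",
    messages=[{
        "role": "user",
        "content": [{"text": f"Schema: {SCHEMA}\n"
                             "Question: What was total revenue last month?\n"
                             "Return only SQL."}],
    }],
    inferenceConfig={"maxTokens": 256, "temperature": 0.0},
)
print(resp["output"]["message"]["content"][0]["text"])
```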
9d ago
Transform retail with AWS generative AI services
Online retailers face a persistent challenge: shoppers struggle to judge fit and look when ordering online, leading to increased returns and decreased purchase confidence. The cost? Lost revenue, operational overhead, and customer frustration. Meanwhile, consumers increasingly expect immersive, interactive shopping experiences that bridge the gap between online and in-store retail. Retailers implementing virtual try-on technology can improve purchase confidence and reduce return rates, translating directly to improved profitability and customer satisfaction. This post demonstrates how to build a virtual try-on and recommendation solution on AWS using Amazon Nova Canvas, Amazon Rekognition, and Amazon OpenSearch Serverless. Whether you’re an AWS Partner developing retail solutions or a retailer exploring generative AI transformation, you’ll learn the architecture, implementation approach, and key considerations for deploying this solution. You can find the code base to…
9d · Tutorial · #coding · by Bhavya Chugh
9d ago
How Automated Reasoning checks in Amazon Bedrock transform generative AI compliance
Compliance teams in regulated industries spend weeks on manual reviews, pay for outside consultants, and still face audit gaps when AI outputs lack formal proof. Automated Reasoning checks in Amazon Bedrock Guardrails address this by replacing probabilistic AI validation with mathematical verification, turning AI-generated decisions into provably correct, auditable results. In this post, you’ll learn why probabilistic AI validation falls short in regulated industries and how Automated Reasoning checks use formal verification to deliver mathematically proven results. You’ll also see how customers across six industries use this technology to produce formally verified, auditable AI outputs, and how to get started. The compliance challenge: regulated industries face high-stakes compliance challenges. Hospitals navigate radiation safety regulations. Financial institutions classify AI risk under the EU AI Act. Insurance carriers answer…
9d · Tutorial · by Nafi Diallo
10d ago
Accelerating decode-heavy LLM inference with speculative decoding on AWS Trainium and vLLM
Practical benchmarks show faster inter-token latency when deploying Qwen3 models with vLLM, Kubernetes, and AWS AI chips. Speculative decoding on AWS Trainium can accelerate token generation by up to 3x for decode-heavy workloads, helping reduce the cost per output token and improving throughput without sacrificing output quality. If you build AI writing assistants, coding agents, or other generative AI applications, your workloads likely produce far more tokens than they consume, making the decode stage the dominant cost of inference. During autoregressive decoding, tokens are generated sequentially, leaving hardware accelerators memory-bandwidth-bound and underutilized. This drives up the cost per generated token. Speculative decoding addresses this bottleneck by letting a small draft model propose multiple tokens at once, which the target model verifies in a single forward pass…
10d · Tutorial · #inference · #coding · by Yahav Biran
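A hedged sketch of what enabling speculative decoding looks like in vLLM. Parameter names vary across vLLM versions (older releases used flat arguments such as speculative_model and num_speculative_tokens), the target/draft model pairing is an assumption, and running on Trainium additionally requires the AWS Neuron integration for vLLM:

```python
# Hedged sketch: draft-model speculative decoding in vLLM. Exact argument
# names vary by vLLM version; the Qwen3 target/draft pairing is an assumption.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen3-8B",  # target model; the post benchmarks Qwen3 models
    speculative_config={
        "model": "Qwen/Qwen3-0.6B",   # small draft model proposes tokens
        "num_speculative_tokens": 5,  # draft tokens verified per forward pass
    },
)

params = SamplingParams(max_tokens=256, temperature=0.0)
out = llm.generate(["Explain speculative decoding in two sentences."], params)
print(out[0].outputs[0].text)
```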
10d ago
Rede Mater Dei de Saúde: Monitoring AI agents in the revenue cycle with Amazon Bedrock AgentCore
This post is cowritten by Renata Salvador Grande, Gabriel Bueno, and Paulo Laurentys at Rede Mater Dei de Saúde. The growing adoption of multi-agent AI systems is redefining critical operations in healthcare. In large hospital networks, where thousands of decisions directly impact cash flow, service delivery times, and the risk of claim denials, the ability to monitor, track, and govern AI agents has become essential for operational sustainability. This is the journey of Rede Mater Dei de Saúde, which is implementing its suite of 12 AI agents using Amazon Bedrock AgentCore, a comprehensive service that provides agent runtime, tool integration, memory management, and built-in observability for production AI agents. About Rede Mater Dei de Saúde: with 45 years of history, Rede Mater…
10d · Agents · #agents · #observability · by Renata Salvador Grande
10d ago
Create rich, custom tooltips in Amazon Quick Sight
Amazon Quick Sight, the business intelligence (BI) capability of Amazon Quick, is a unified BI service. It provides modern interactive dashboards, natural language querying, pixel-perfect reports, machine learning (ML) insights, and embedded analytics at scale. Amazon Quick brings together AI agents for business insights, research, and automation in one integrated experience, helping you work smarter and faster while maintaining security and access policies. Today, we’re announcing sheet tooltips in Amazon Quick Sight. Dashboard authors can now design custom tooltip layouts using free-form layout sheets. These layouts combine charts, key performance indicator (KPI) metrics, text, and other visuals into a single tooltip that renders dynamically when readers hover over data points. Sheet tooltips work with most chart types, including tables and pivot tables, and authors can reuse the same tooltip sheet…
10d · Release · by Meshan Khosla
11d ago
Spring AI SDK for Amazon Bedrock AgentCore is now Generally Available
Agentic AI is transforming how organizations use generative AI, moving beyond prompt-response interactions to autonomous systems that can plan, execute, and complete complex multi-step tasks. While early proofs of concept in the agentic AI space excite business stakeholders, scaling them to production requires addressing scalability, governance, and security challenges. Amazon Bedrock AgentCore is an agentic AI platform to build, deploy, and operate agents at scale using any framework and any model. Java developers want to build AI agents using familiar Spring patterns, but production deployment requires infrastructure that’s complex to implement from scratch. Amazon Bedrock AgentCore provides building blocks like managed runtime infrastructure (scalability, reliability, security, observability), short- and long-term memory, browser automation, sandboxed code execution, and evaluations. Integrating these capabilities into a Spring application currently requires writing…
11d · Agents · #agents · #fine-tuning · #coding · #open-source · by Andrei Shakirin
11d ago
How Guidesly built AI-generated trip reports for outdoor guides on AWS
This is a guest post by David Lord, Taylor Lord, Shiva Prasad, Anup Banasavalli Hiriyanagowda, and Nikhil Chandra from Guidesly. Guidesly is reshaping how outdoor recreation is booked, run, and experienced. Founded in 2019, it began as a way to connect anglers, hunters, divers, and outdoor recreation enthusiasts with trusted guides, dive shops, and charters. It has since grown into a vertical AI software as a service (SaaS) system serving the entire industry. With Guidesly Pro, outdoor professionals gain a business solution that powers every part of their operation (bookings, payments, websites, client management, and marketing), all from a single system. For many guides, the toughest challenge is getting discovered and cutting through the noise online. Even those who know what must be done can spend up to eight hours a…
11d · Tutorial · #rag · #multimodal · by David Lord, Taylor Lord, Shiva Prasad, Anup Banasavalli Hiriyanagowda, Nikhil Chandra
11d ago
Best practices to run inference on Amazon SageMaker HyperPod
Deploying and scaling foundation models for generative AI inference presents challenges for organizations. Teams often struggle with complex infrastructure setup, unpredictable traffic patterns that lead to over-provisioning or performance bottlenecks, and the operational overhead of managing GPU resources efficiently. These pain points result in delayed time-to-market, suboptimal model performance, and inflated costs that can make AI initiatives unsustainable at scale. This post explores how Amazon SageMaker HyperPod addresses these challenges by providing a comprehensive solution for inference workloads. We walk you through the platform’s key capabilities for dynamic scaling, simplified deployment, and intelligent resource management. By the end of this post, you’ll understand how to use the HyperPod automated infrastructure, cost optimization features, and performance enhancements to reduce your total cost of ownership by up to 40% while accelerating…
11d · Tutorial · #inference · by Vinay Arora
11d ago
Use-case based deployments on SageMaker JumpStart
Amazon SageMaker JumpStart provides pretrained models for a wide range of problem types to help you get started with AI workloads. SageMaker JumpStart offers access to solutions for top use cases that can be deployed to SageMaker AI Managed Inference endpoints or SageMaker HyperPod clusters. Through pre-set deployment options, customers can quickly move from model selection to model deployment. Model deployments through SageMaker JumpStart are fast and straightforward. Customers can select options based on expected concurrent users, with visibility into P50 latency, time-to-first-token (TTFT), and throughput (tokens/second/user). While concurrent user configuration options are helpful for general-purpose scenarios, they aren’t task-aware, and we recognize that customers use SageMaker JumpStart for diverse, specific use cases like content generation, content summarization, or Q&A. Each use case might require specific configurations to improve performance. Moreover,…
11d · Infra · by Dan Ferguson
11d ago
Navigating the generative AI journey: The Path-to-Value framework from AWS
Generative AI is reshaping how organizations approach productivity, customer experiences, and operational capabilities. Across industries, teams are experimenting with generative AI to unlock new ways of working. Many of these efforts produce compelling proofs of concept (POCs) that demonstrate technical feasibility. The real challenge begins after those early wins. Although POCs frequently demonstrate technical feasibility, organizations often struggle to translate them into production-ready systems that deliver measurable business value. The journey from concept to production, and from production to sustained value creation, introduces challenges across technical, organizational, and governance dimensions. The Generative AI Path-to-Value (P2V) framework was created to address this gap. It provides a mental model and practical guide to help organizations systematically move generative AI initiatives from ideation and experimentation to production at scale. The goal…
11d · Tutorial · by Nitin Eusebius
12d ago
How to build effective reward functions with AWS Lambda for Amazon Nova model customization
Building effective reward functions can help you customize Amazon Nova models to your specific needs, with AWS Lambda providing the scalable, cost-effective foundation. Lambda’s serverless architecture lets you focus on defining quality criteria while it handles the computational infrastructure. Amazon Nova offers multiple customization approaches, with reinforcement fine-tuning (RFT) standing out for its ability to teach models desired behaviors through iterative feedback. Unlike supervised fine-tuning (SFT), which requires thousands of labeled examples with annotated reasoning paths, RFT learns from evaluation signals on final outputs. At the heart of RFT lies the reward function: a scoring mechanism that guides the model toward better responses. This post demonstrates how Lambda enables scalable, cost-effective reward functions for Amazon Nova customization. You’ll learn to choose between Reinforcement Learning via…
12d · Tutorial · #training · by Manoj Gupta
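The shape of such a reward function is simple to sketch. The event fields below are assumptions for illustration; the actual payload contract is defined by the Nova customization service, so the parsing should be adapted to its documentation:

```python
# Minimal reward-function Lambda sketch for RFT-style customization.
# The event shape (a "completion" field) is an assumption for illustration.
import json

def lambda_handler(event, context):
    completion = event.get("completion", "")

    # Toy quality criteria: reward concise answers that cite a source.
    score = 0.0
    if completion:
        score += 0.5 if len(completion.split()) <= 120 else 0.1
        score += 0.5 if "Source:" in completion or "http" in completion else 0.0

    # Return a scalar reward in [0, 1] for the trainer to learn from.
    return {"statusCode": 200, "body": json.dumps({"reward": min(score, 1.0)})}
```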
16d ago
Understanding Amazon Bedrock model lifecycle
Amazon Bedrock regularly releases new foundation model (FM) versions with better capabilities, accuracy, and safety. Understanding the model lifecycle is essential for effective planning and management of AI applications built on Amazon Bedrock. Before migrating your applications, you can test these models through the Amazon Bedrock console or API to evaluate their performance and compatibility. This post shows you how to manage FM transitions in Amazon Bedrock, so you can make sure your AI applications remain operational as models evolve. We discuss the three lifecycle states, how to plan migrations with the new extended access feature, and practical strategies to transition your applications to newer models without disruption. A model offered on Amazon Bedrock can exist in one of three states: Active, Legacy, or End-of-Life (EOL). Their current…
16d · Tutorial · by Saurabh Trikande
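A hedged sketch for auditing lifecycle states across the models visible to your account, assuming a recent boto3 version in which ListFoundationModels returns a modelLifecycle status per model summary:

```python
# Hedged sketch: flag non-Active Bedrock models so migrations can be planned.
import boto3

bedrock = boto3.client("bedrock")

for summary in bedrock.list_foundation_models()["modelSummaries"]:
    status = summary.get("modelLifecycle", {}).get("status", "UNKNOWN")
    if status != "ACTIVE":
        # Legacy/EOL models are candidates for migration before cutoff dates.
        print(f"{summary['modelId']}: {status}")
```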