$ timeahead_
★ TOP STORY · [RB] · Tutorial · 10d ago

How to make remarkable videos with Seedance 2.0

How to make remarkable videos with Seedance 2.0 Run Seedance 2.0 AI video used to be utterly bad. (We’ve all seen Will Smith eat spaghetti more times than we can count, so I’ll spare you.) Last year, however, we really began to see AI video take off with front-runners like Google’s Veo 3 series and Kling from Kuaishou. With each new model release, we inched toward improvements with prompt adherence, audio integration, and solving the “AI look.” Seedance 2.0 is the largest step change we’ve seen in months. You can make movies with this thing. A catastrophic collision between two massive space stations in low Earth orbit. Metal shears apart in slow motion as the stations grind into each other, sending a hailstorm of debris spiraling outward. Entire modules crumple like tin cans. Pressurized compartments blow out in violent bursts…
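If you want to try the model from the post yourself, here is a minimal sketch using Replicate's Python client. The model slug and input names are assumptions (check the Seedance 2.0 model page for the exact schema), and the prompt is abridged from the excerpt above.

import replicate

# Hypothetical slug and input names; confirm them on the Seedance 2.0 model page.
output = replicate.run(
    "bytedance/seedance-2.0",  # placeholder model identifier
    input={
        "prompt": (
            "A catastrophic collision between two massive space stations in low Earth orbit. "
            "Metal shears apart in slow motion as debris spirals outward."
        ),
    },
)
print(output)  # URL (or file) of the generated video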

Replicate Blog
▲ trending · last 48h
[RB] Replicate Blog · 99 articles
60d ago
How to prompt Seedream 5.0
How to prompt Seedream 5.0 Run Seedream 5.0 ByteDance’s Seedream line has been on a tear. We spent a bunch of time throwing prompts at it. Here’s what we found. Aesthetics Before we get into the meat, let’s talk about how the images actually look. Seedream 5.0 produces genuinely beautiful output — the kind of images where you zoom in and the details hold up. A color film-inspired portrait of a young man looking to the side with a shallow depth of field that blurs the surrounding elements, drawing attention to his eye. The fine grain and cast suggest a high ISO film stock, while the wide aperture lens creates a motion blur effect, enhancing the candid and natural documentary style. The model understands photographic language at a deep level. You can reference specific film stocks, lens characteristics, and lighting…
60d · Tutorial · #multimodal
66d ago
Recraft V4: image generation with design taste
Recraft V4: image generation with design taste Recraft V4 is Recraft’s latest image generation model, rebuilt from the ground up. The big idea behind it is what the Recraft team calls “design taste” — the model makes visual decisions about composition, lighting, and color that feel intentional rather than generic. Images come out looking art-directed, even from simple prompts. V4 comes in four versions — two raster, two vector: All four share the same design taste and prompt accuracy. The differences are output format, resolution, and speed. Some examples These prompts are designed to push V4 into territory where most image models fall flat — complex typography layouts, precise material rendering, extreme detail at macro scale, structured vector assets, and stylized illustration with character. Typography and editorial design V4 treats text as a first-class element of composition. This prompt asks…
66d · Infra · #multimodal
150d ago
Run Isaac 0.1 on Replicate
Run Isaac 0.1 on Replicate Run Isaac 0.1 Perceptron AI has released Isaac 0.1, a 2B-parameter, open-weight vision-language model built for grounded perception. Isaac answers questions about images, reasons about spatial relationships, reads text in cluttered environments, and points to where its answers come from. Despite its small size, Isaac rivals models many times larger at OCR, object recognition, and visual reasoning. What makes Isaac 0.1 special Grounded visual reasoning Isaac not only describes a scene, but can explain why its answers are correct, returning bounding boxes or regions tied to each claim. This helps you build applications that need transparency, traceability, or step-by-step evidence. Strong OCR in real-world conditions The model reads small or partially obstructed text on signs, labels, packaging, and documents. It combines OCR with contextual understanding, so you can ask questions like: “What’s the return address?”…
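A hedged sketch of how a call like the return-address question above might look with the Replicate Python client; the model slug and input names are assumptions, so check the Isaac 0.1 model page for the real schema.

import replicate

# Placeholder slug and input names; verify against the Isaac 0.1 model page.
output = replicate.run(
    "perceptron/isaac-0.1",
    input={
        "image": open("package_label.jpg", "rb"),
        "prompt": "What's the return address?",
    },
)
print(output)  # answer text, plus any grounded regions the model returns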
150d · Infra · #multimodal
151d ago
Run FLUX.2 on Replicate
Run FLUX.2 on Replicate Run FLUX.2 Black Forest Labs has released FLUX.2, their most advanced image generation model yet. This release brings significant improvements in image quality, editing capabilities, and enterprise-grade efficiency. FLUX.2 is available to run on Replicate today. FLUX.2 comes in three variants: - FLUX.2 [pro]: replicate.com/black-forest-labs/flux-2-pro - FLUX.2 [flex]: replicate.com/black-forest-labs/flux-2-flex - FLUX.2 [dev]: replicate.com/black-forest-labs/flux-2-dev FLUX.2 [pro] generates images in 6 seconds, or 9 seconds with an input image. It accepts up to 8 input images, and costs $0.015 + $0.015 per input and output megapixel. FLUX.2 [flex] generates higher quality images, especially typography and fine-grained details. It generates images in 22 seconds, or 40 seconds with an input image. FLUX.2 [flex] accepts up to 10 input images, and costs $0.06 per input and output megapixel. FLUX.2 [dev] is an open-source, distilled version of FLUX.2 [pro] that we…
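The [pro] slug is given above, so a call with the standard Python client looks roughly like this; the commented-out image-input parameter name is an assumption.

import replicate

output = replicate.run(
    "black-forest-labs/flux-2-pro",
    input={
        "prompt": "A lighthouse on a storm-battered cliff at golden hour, photorealistic",
        # FLUX.2 [pro] accepts up to 8 input images; the exact parameter name may differ:
        # "input_images": [open("reference.jpg", "rb")],
    },
)
print(output)  # URL (or file) of the generated image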
151d · Infra · #multimodal
156d ago
How to prompt Nano Banana Pro
How to prompt Nano Banana Pro Run Nano Banana Pro …so Nano Banana Pro was released yesterday, as we’re sure you are aware. The AI community has already created an insane amount of generations with this model. Yes, it can handle the basics of any image model: style transfer, object removal, text rendering, realistic images. But these are the least shocking of its capabilities. In this post, we really wanted to highlight some of the crazy images the AI community has been able to extract from Nano Banana Pro. Strap in. Logic One of the most impressive facets of Nano Banana Pro is its baked-in logic. Typically, image models are good at constructing new photos from the spatial information found in the input image. However, no image model has been able to deduce, interpret, and answer textual information found in…
156d · Tutorial · #multimodal
157d ago
Retro Diffusion's pixel art models are now on Replicate
Retro Diffusion's pixel art models are now on Replicate Retro Diffusion have crafted a beautiful set of models and styles for creating game assets, character sprites, tiles, and other wonderful retro graphics. These models are trained specifically for grid-aligned, limited-palette pixel art. And now you can run them on Replicate. Retro Diffusion have four models on Replicate: - rd-fast: replicate.com/retro-diffusion/rd-fast - rd-plus: replicate.com/retro-diffusion/rd-plus - rd-tile: replicate.com/retro-diffusion/rd-tile - rd-animation: replicate.com/retro-diffusion/rd-animation These models have tons of different style presets to play with, and most of them support arbitrary width/height, input images, palette images, background removal, and seamless tiling. rd-fast: Fast pixel art image generation retro-diffusion/rd-fast is optimized for speed and supports 15 styles, from portraits to Minecraft-style items. import replicate output = replicate.run( "retro-diffusion/rd-fast", input={ "prompt": "knight character, retro game asset, side view", "style": "game_asset", "width": 128, "height": 128, "num_images": 1, "remove_bg":…
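For reference, here is the rd-fast call from the excerpt completed into a runnable snippet, using only the inputs shown above (optional inputs such as background removal are omitted; consult the model page for the full schema).

import replicate

output = replicate.run(
    "retro-diffusion/rd-fast",
    input={
        "prompt": "knight character, retro game asset, side view",
        "style": "game_asset",
        "width": 128,
        "height": 128,
        "num_images": 1,
    },
)
print(output)  # generated pixel-art image(s)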
157d · Infra
159d ago
Replicate is joining Cloudflare
Replicate is joining Cloudflare Big news: We’re joining Cloudflare. Replicate’s going to carry on as a distinct brand, and all that’ll happen is that it’s going to get way better. It’ll be faster, we’ll have more resources, and it’ll integrate with the rest of Cloudflare’s Developer Platform. The API isn’t changing. The models you’re using today will keep working. If you’ve built something on Replicate, it’ll keep running just like it does now. So, why are we doing this? At Replicate, we’re building the primitives for AI: the tools and abstractions that let software developers use AI without having to understand all the complex stuff underneath. We started with Cog, an open-source tool which defines a standard format for what a model is. Then, we created Replicate, a platform where people can share models and run them with an API.…
159d · Infra
186d ago
Extract text from documents and images with Datalab Marker and OCR
Extract text from documents and images with Datalab Marker and OCR Datalab’s state-of-the-art document parsing and text extraction models are now on Replicate. Marker turns PDF, DOCX, PPTX, images (and more!) into markdown or JSON. It formats tables, math, and code, extracts images, and can pull specific fields when you pass a JSON Schema. OCR detects text in ninety languages from images and documents, and returns reading order and table grids. The Marker model is based on the popular open source Marker project (29k Github stars) and OCR is based on Surya (19k Github stars). Run Marker and OCR on Replicate: Run Marker import replicate output = replicate.run( "datalab-to/marker", input={ "file": open("report.pdf", "rb"), "mode": "balanced", # fast / balanced / accurate "include_metadata": True, # return page-level JSON metadata }, ) print(output["markdown"][:400]) Run OCR import replicate output = replicate.run( "datalab-to/ocr", input={…
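The Marker call is shown in full above; the OCR call is cut off, so here is a hedged completion using only the documented model name (any inputs beyond the file are assumptions, so check datalab-to/ocr's input schema).

import replicate

output = replicate.run(
    "datalab-to/ocr",
    input={
        "file": open("receipt.png", "rb"),  # images and documents are supported
    },
)
print(output)  # detected text, reading order, and table grids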
186d · Model
191d ago
How to prompt Veo 3.1
How to prompt Veo 3.1 Run Veo 3.1 Google just came out with Veo 3.1 which offers a few new shiny tools with video generation, including character reference images and first/last frame input. We made a quick prompting guide to show you the capabilities of this model. As always with the Google video models, there is a general guide you should follow to ensure your outputs are as strong as they can be. - Shot composition: Specify the framing and number of subjects in the shot (e.g., “single shot,” “two shot,” “over-the-shoulder shot”). - Focus and lens effects: Use terms like “shallow focus,” “deep focus,” “soft focus,” “macro lens,” and “wide-angle lens” to achieve specific visual effects. - Overall style and subject: Guide creative direction by specifying styles like “sci-fi,” “romantic comedy,” “action movie,” or “animation.” - Camera positioning and…
191d · Tutorial · #multimodal
205d ago
IBM's Granite 4.0 is now on Replicate
IBM's Granite 4.0 is now on Replicate IBM has released Granite 4.0, their latest family of open-source small language models built for speed and low cost. The Granite 4.0 models use a hybrid architecture that uses less memory than traditional models, so you can run them on regular consumer GPUs instead of expensive server hardware. They work well for document summarization, RAG systems, and AI agents. ibm-granite/granite-4.0-h-small is a 30 billion parameter long-context instruct model and it’s now available on Replicate. Running Granite 4.0 with an API You can start using Granite models right away on Replicate. Here’s how to run them with an API: cURL curl -s -X POST \ -H "Authorization: Bearer $REPLICATE_API_TOKEN" \ -H "Content-Type: application/json" \ -d $'{ "version": "ibm-granite/granite-4.0-h-small", "input": { "messages": [ { "role": "user", "content": "Explain the key benefits of using open-source models…
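Here is a rough Python equivalent of the cURL call above, assuming the model streams tokens the way other language models on Replicate do.

import replicate

output = replicate.run(
    "ibm-granite/granite-4.0-h-small",
    input={
        "messages": [
            {"role": "user", "content": "Explain the key benefits of using open-source models."}
        ]
    },
)
print("".join(output))  # tokens stream back; join them into the full response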
205d · Infra
214d ago
Which image editing model should I use?
Which image editing model should I use? Replicate Playground In the past few weeks, nearly every major AI lab has released an image editing model. The first was FLUX.1 Kontext from Black Forest Labs in May, which stood out for style transformations and simple image edits. Since then, we’ve seen a wave of models, each strong in its own way. With so many options, it can be hard to figure out which one works best for your needs. In this post, we’re putting them head to head and evaluating each across a range of image editing tasks. By the end, you should have a clear sense of which one fits your workflow. To start, here’s an overview of the cost and average inference time for each model we’re evaluating. The cheapest is GPT-image-1 from OpenAI, which starts at $0.01 per image,…
221d ago
Introducing our new search API
Introducing our new search API We’ve added a new search API to help you find the best models. This API is currently in beta, but it’s already available to all users in our TypeScript and Python SDKs, and our MCP servers. Here’s an example of how to use it with cURL: curl -s \ -H "Authorization: Bearer $REPLICATE_API_TOKEN" \ "https://api.replicate.com/v1/search?query=lip+sync" Here’s a video of the search API in action using our MCP server with Claude Desktop: More metadata The new search API returns results for models, collections, and documentation pages that match your query. { query: "lip sync", models: [ {model: { url, run_count, etc }, metadata: { tags, score, etc }}, {model: { url, run_count, etc }, metadata: { tags, score, etc }}, {model: { url, run_count, etc }, metadata: { tags, score, etc }}, ], collections: [ {name,…
221d · Release
229d ago
Torch compile caching for inference speed
Torch compile caching for inference speed We now cache torch.compile artifacts to reduce boot times for models that use PyTorch. Models like black-forest-labs/flux-kontext-dev, prunaai/flux-schnell, and prunaai/flux.1-dev-lora now start 2-3x faster. We’ve published a guide to improving model performance with torch.compile that covers more of the details. What is torch.compile? Many models, particularly those in the FLUX family, apply various torch.compile techniques and tricks to improve inference speed. The first call to a compiled function traces and compiles the code, which adds overhead. Subsequent calls run the optimized code and are significantly faster. In our tests of inference speed with black-forest-labs/flux-kontext-dev, the compiled version runs over 30% faster than the uncompiled one. Performance improvements By caching the compiled artifacts across model container lifecycles, we’ve seen dramatic improvements in cold boot times: - black-forest-labs/flux-kontext-dev: ~120s → ~60s (50% faster) - prunaai/flux-schnell: ~150s → ~70s…
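To make the trade-off concrete, here is a tiny, generic torch.compile example (not Replicate's implementation): the first call pays the tracing and compilation cost, and later calls reuse the compiled artifact, which is exactly what caching across container lifecycles preserves.

import torch

class TinyBlock(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(64, 64)

    def forward(self, x):
        return torch.nn.functional.gelu(self.linear(x))

block = torch.compile(TinyBlock())
x = torch.randn(8, 64)
warmup = block(x)  # slow: triggers tracing and compilation
fast = block(x)    # fast: runs the cached compiled code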
258d ago
Announcing Replicate's remote MCP server
Announcing Replicate's remote MCP server Last month we quietly published a local MCP server for Replicate’s HTTP API. Today we’re announcing a hosted remote MCP server that you can use with apps like Claude Desktop, Claude Code, Cursor, and VS Code, giving you the power to explore and run all of Replicate’s HTTP APIs from a familiar chat-based natural language interface. To get started, head over to 👉 mcp.replicate.com 👈 What is MCP? MCP stands for Model Context Protocol. It’s a standard developed at Anthropic for giving language models access to external tools. This is commonly called “tool use” or “function calling”. This makes language models way more powerful, as they can now access external tools and data sources, instead of just their own internal knowledge. Once you’ve installed the server, you can ask questions in Claude or Cursor like:…
258d · Infra · #claude
267d ago
How to prompt Veo 3 with images
How to prompt Veo 3 with images Try Veo 3 Image input in Veo 3 has been highly anticipated, and it’s now on Replicate. Here are some of the coolest and most useful tricks we discovered. Style preservation The biggest appeal of image input with Veo 3 is being able to animate your images while preserving their unique visual style. Whether it’s a cartoon, painting, or photograph, Veo 3 maintains the artistic integrity of your original image throughout the video. The fire in the room begins to burn. Maintain the style of the image. In this example, we just told Veo 3 to keep the style the same with no constraints on the actual action or direction of the video. The model still does a great job at creating interesting motion and camera movement while preserving your image’s style: Keep…
267d · Tutorial · #multimodal
268d ago
Open source video is back
Open source video is back Try WAN 2.2 Wan 2.2 has set the open source video community ablaze. It’s a huge leap forward from 2.1 with sharp physics, faster generation, and more control. And because it’s fully open source, it opens the door to high-quality video tools for anyone at an astounding fraction of the cost. We’ve teamed up with Pruna AI to release our own optimized version of Wan 2.2. You can test it in these ways: - FAST image-to-video, 480p — $0.05 per video - FAST text-to-video, 480p — $0.05 per video - FAST text-to-video, 720p — $0.10 per video You heard right. 5 cents a video. That’s even less than leading image generators on the AI market. With inference times at ~30s, Wan 2.2 can become your leading way to rapidly test video prompts. And don’t let…
278d ago
Generate consistent characters
Generate consistent characters Until recently, the best way to generate images of a consistent character was from a trained lora. You would need to create a dataset of images and then train a FLUX lora on them. If you want to go back further, you might remember having to use a ComfyUI workflow. A workflow that would combine SDXL, controlnets, IPAdapters and some non-commercial face landmark models. Things have got remarkably simpler. Today we have a choice of state of the art image models that can do this accurately from a single reference. In this blog post we’ll highlight which models can do this, and which is best depending on your needs. she is wearing a pink t-shirt with the text “Replicate” on it The best models for consistent characters As of July 2025, there are four models on Replicate…
282d ago
Bria is now on Replicate
Bria is now on Replicate We’ve partnered with Bria to bring a suite of commercial-safe visual AI models to Replicate - perfect for enterprises and developers building responsibly with generative AI. Unlike most models trained on scraped web data, Bria’s models are trained exclusively on licensed datasets from Getty Images, Envato, Freepik, and others. That means the models are ready for real-world use, with no copyright baggage. 🛠️ Try the models You can now run Bria’s models directly on Replicate: - Image Generation (Bria 3.2) — Text-to-image model trained on fully licensed data - Remove Background — Precision background removal with clean edges - Eraser (Inpainting) — Remove objects without leaving artifacts - Genfill — Context-aware inpainting for object addition or extension - Increase Resolution — Upscale images with sharp results - Expand Image — Extend an image’s canvas seamlessly…
284d ago
How we optimized FLUX.1 Kontext [dev]
How we optimized FLUX.1 Kontext [dev] FLUX.1 Kontext [dev] In addition to making our FLUX.1 Kontext [dev] implementation open-source, we wanted to provide more guidance on how we chose to optimize it without compromising on quality. In this post, you will mainly learn about TaylorSeer optimization, a method to approximate intermediate image predictions by using cached image changes (derivatives) and formulae derived from Taylor Series approximations. Fellow optimization nerds, read on. (We pulled most of our implementation info from the following paper.) If you head to the predict function in predict.py from our FLUX.1 Kontext [dev] repo, you will find the main logic. (Highly suggest working through the repo and using this post as a guide for understanding its structure.) Let’s break it down. On TaylorSeer When generating a new image with FLUX.1 Kontext, you apply a diffusion transformation across…
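To illustrate the core idea only (this is not the repo's code), here is a toy first-order version: on some steps you compute the expensive feature and cache a finite-difference derivative, and on the steps in between you extrapolate with f(t + dt) ≈ f(t) + dt · f'(t) instead of recomputing.

import numpy as np

def expensive_feature(t):
    # stand-in for an expensive per-step quantity, e.g. a transformer block's output
    return np.sin(2 * np.pi * t)

ts = np.linspace(0.0, 1.0, 9)
f_prev = expensive_feature(ts[0])
df_prev = 0.0

for i in range(1, len(ts)):
    dt = ts[i] - ts[i - 1]
    if i % 2 == 1:
        # "full" step: compute the feature and cache its finite-difference derivative
        f_new = expensive_feature(ts[i])
        df_prev = (f_new - f_prev) / dt
        f_prev = f_new
    else:
        # "skipped" step: first-order Taylor extrapolation from the cache
        f_prev = f_prev + dt * df_prev
    print(i, round(f_prev, 3), round(expensive_feature(ts[i]), 3))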
284d · Tutorial · #open-source
292d ago
Compare AI video models
Compare AI video models Posted July 7, 2025. Last updated August 11, 2025. It’s hard keeping up with every new video model. In this post we’ll help you pick the best one for your needs. We’ll break this down into two parts: - key model specs like price, resolution, duration, fps, speed, and date of release - features like text-to-video, image-to-video, subject references, and native audio Every video model is available for commercial use on Replicate. Specs Where a price range is given, it’s from the lowest-priced to the highest-priced video (based on duration and resolution). Generation speed is also a range from the fastest to the slowest. Times and prices are correct as of July 7, 2025. Video generation speed can improve over time, as the model is optimized or switched to better hardware.
292d · Hardware · #multimodal
298d ago
The FLUX.1 Kontext hackathon
The FLUX.1 Kontext hackathon Try FLUX.1 Kontext AI fashion assistants. UI variants of your favorite websites. Voice-assisted photo editors. 3D dinner tables. Text-to-movie apps. Our FLUX.1 Kontext hackathon might’ve been our best one yet. We teamed up with Black Forest Labs to give over 120 participants the chance to build with FLUX.1 Kontext, the image editing model that has taken the AI community by storm. Resemble AI also joined us to get hackers to build with Chatterbox, their latest text-to-speech model. We had some insanely creative projects and we were thoroughly impressed by what people built in a couple hours. Choosing from 30 submissions wasn’t easy, but a few stood out to us. Here were the winners. Replicate’s picks: - Tartan Photographers: AI-automated Shopify listings Rafael Cabrera and Christian Cherry created a site which automates uploading listings on Shopify and…
298d · Tutorial
319d ago
How to prompt Veo 3 for the best results
How to prompt Veo 3 for the best results Try Veo 3 Google’s Veo 3 generates videos with audio from text prompts. The audio can be dialogue, voice-overs, sound effects and music. Let our resident AI podcaster introduce us: Write what happens First the basics. A well-crafted prompt is the key to generating good videos. The more you can specify in your prompt, in plain language, the easier it is for Veo 3 to understand and generate the video you want. Try to include these visual elements in your prompt: - Subject: Who or what is in the scene — a person, animal, object, or landscape. - Context: Where is the subject? Indoors? A city street? A forest? - Action: Is your subject walking, jumping, turning their head? - Style: The visual aesthetic you’re aiming for (cinematic, animated, stop-motion, etc).…
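One way to apply the checklist is to build the prompt from those elements and pass it to the model; the slug and input name below are assumptions, so confirm them on the Veo 3 model page.

import replicate

prompt = (
    "Subject: a weathered lighthouse keeper. "
    "Context: a rocky coastline at dusk. "
    "Action: he climbs the spiral stairs and lights the lamp. "
    "Style: cinematic, shallow focus. "
    'Audio: waves crashing; he mutters, "Another long night."'
)
output = replicate.run("google/veo-3", input={"prompt": prompt})  # placeholder slug
print(output)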
319d · Tutorial · #multimodal
324d ago
Get the most from Google Veo 3
Get the most from Google Veo 3 Try Veo 3 Veo 3 from Google has taken the AI community by storm. And with good reason. With Veo 3, you can generate not just visuals, but native audio too. That includes sound effects, ambient noise, and dialogue. This model also understands your prompts better. It’s more accurate, more consistent, and more grounded in the real world. Researchers at Google DeepMind have worked to craft a model with strong prompt adherence, accurate physics, and hyperrealism. We’ve been testing the limits of the model, and it’s been surprising us in the best ways. NO WAY. It did it. And, was that, actually funny? — fofr (@fofrAI) May 20, 2025 Prompt: > a man doing stand up comedy in a small venue tells a joke (include the joke in the dialogue) https://t.co/GFvPAssEHx pic.twitter.com/LrCiVAp1Bl Why…
324d · Research
327d ago
FLUX.1 Kontext from the community
FLUX.1 Kontext from the community Try FLUX.1 Kontext In case you missed it: FLUX.1 Kontext launched last week. Judging by the reaction, most of you didn’t - but here’s the TL;DR: FLUX.1 Kontext is a new image editing model from Black Forest Labs. It’s top of its class for text-based image editing - better, cheaper, and no yellow tint compared to OpenAI’s 4o. There are three models: - FLUX.1 Kontext [pro]: High-quality image edits with strong prompt following. - FLUX.1 Kontext [max]: Top-tier performance with sharper typography. - FLUX.1 Kontext [dev] (coming soon): Open-weight version. Kontext can handle small tweaks and big changes—like color swaps, background edits, text replacements, and style transfers—while keeping characters consistent. Some prompting tips for you: - Be specific. - Start simple. - Preserve key elements. - Break big edits into steps. - Quote text. Since…
327d · Open Source
331d ago
Use FLUX.1 Kontext to edit images with words
Use FLUX.1 Kontext to edit images with words Try FLUX.1 Kontext FLUX.1 Kontext is a new image editing model from Black Forest Labs. It is the best in class model for editing images using text prompts, and the latest addition to the FLUX.1 family. In our tests we’ve found Kontext to give accurate and brilliant results. It’s better and cheaper than OpenAI’s 4o/gpt-image-1 model (and there’s no yellow tint). There are three models, two are available now, and a third open-weight version is coming soon: - FLUX.1 Kontext [pro]: State-of-the-art performance for image editing. High-quality outputs, great prompt following, and consistent results. - FLUX.1 Kontext [max]: A premium model that brings maximum performance, improved prompt adherence, and high-quality typography generation without compromise on speed. - Coming soon: FLUX.1 Kontext [dev]: An open-weight, guidance-distilled version of Kontext. We’re so excited with…
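A minimal sketch of a Kontext edit with the Python client; the [pro] slug and the image input name are assumptions based on the model family, so verify them on the model pages.

import replicate

output = replicate.run(
    "black-forest-labs/flux-kontext-pro",  # [max] follows the same pattern
    input={
        "prompt": "Change the car to bright red, keep everything else the same",
        "input_image": open("car.jpg", "rb"),  # parameter name assumed
    },
)
print(output)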
331d · Tutorial
338d ago
Generate incredible images with Google's Imagen 4
Generate incredible images with Google's Imagen 4 Google’s latest flagship image model, Imagen 4, is now available for you to run on Replicate. This is a preview release, so it might change, but it’s already showing off some impressive capabilities for creating high-quality images. What is Imagen 4? Imagen 4 is Google DeepMind’s most advanced image generation model yet, designed to help you bring your creative visions to life. It excels at producing photorealistic images with sharp clarity and has made significant strides in rendering text accurately. Key features include: - Fine detail rendering: Imagen 4 captures intricate details beautifully, whether it’s the texture of fabric, tiny water droplets, or the softness of animal fur. - Style versatility: The model is adept at generating images in a wide range of styles, from hyperrealistic photographs to abstract art and illustrations. -…
338d · Infra · #multimodal
338d ago
Run OpenAI’s latest models on Replicate
Run OpenAI’s latest models on Replicate Posted May 22, 2025 by You can now run OpenAI’s latest chat, vision, and reasoning models on Replicate, including GPT-4.1, GPT-4o, and the o-series. Here are the new models: - GPT-4.1 series: Handles long context (up to 1 million tokens). Good for large documents, full codebases, and agent workflows. - GPT-4o series: Fast, multimodal models that understand text, images, and audio. - o-series: Models built for structured reasoning in math, science, and complex problem solving. - GPT-4o-transcribe: Converts audio to text with GPT-4o. Fast, accurate, and ready for real-time use. - GPT-image-1, DALL-E: OpenAI’s image models. You can swap between full, mini, and nano variants to match your cost and speed needs. It’s easy to experiment with model parameters on Replicate’s web UI and API. For example, this is how you run GPT 4.1…
338d · Infra · #gpt
344d ago
NVIDIA H100 GPUs are here
NVIDIA H100 GPUs are here You can now run NVIDIA H100 GPUs on Replicate. You can also now use 2x, 4x, and 8x configurations of A100s and L40S GPUs. These were previously only available in deployments, but now you can use them for regular models and training runs. If you’ve been waiting to speed up your model or try something more powerful, now’s a good time. H100 pricing 1x H100s are now available to everyone. 2x, 4x, and 8x H100s are currently reserved for committed spend contracts. Email us at team@replicate.com if you want access. A100 pricing (2x, 4x, 8x) These multi-GPU setups for A100s are now available for models (they were already available for deployments): See the full hardware pricing list for more details. L40S pricing (2x, 4x, 8x) These multi-GPU setups for L40S GPUs are now available for…
344d · Hardware · #gpu
345d ago
Run 30,000+ LoRAs on Hugging Face with Replicate
Run 30,000+ LoRAs on Hugging Face with Replicate LoRAs have become the leading way to train image models to express specific concepts or styles. Think Studio Ghibli stills or capturing the vibe of 80s cyberpunk. Hugging Face is a go-to spot for sharing and trying LoRAs. Artists, researchers, and tinkerers upload their custom styles there, making them easy for anyone to use. It’s grown into one of the largest open collections of LoRAs online. Now, LoRAs can run directly on the Hugging Face Hub using Replicate for inference. This is possible thanks to a small update to Hugging Face’s inference client that hooks into Replicate behind the scenes. The result: fast, low-cost inference with your favorite LoRAs, right from the Hugging Face interface. How it works When you choose a supported LoRA model on Hugging Face and select Replicate as…
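On the Hub side, the flow described above is roughly this: a sketch, assuming the huggingface_hub inference client's provider routing; the LoRA repo name is a placeholder.

from huggingface_hub import InferenceClient

client = InferenceClient(provider="replicate")  # route inference through Replicate
image = client.text_to_image(
    "a rainy city street in the style of 80s cyberpunk",
    model="some-user/some-flux-lora",  # placeholder: a supported LoRA repo on the Hub
)
image.save("styled.png")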
353d ago
Ideogram 3.0 on Replicate
Ideogram 3.0 on Replicate TL;DR: Ideogram 3.0 is a major update to the text-to-image model, with big improvements in realism, style control, and layout generation. It comes in three varieties, all of which are live on Replicate. About 3.0 Ideogram just launched version 3.0 of its text-to-image model, and it’s packed with features. The model comes in three varieties: “Turbo,” “Balanced,” and “Quality.” The “Turbo” model is perfect for rapid iterations, while the “Quality” model delivers the highest fidelity results when precision matters most. The “Balanced” option sits comfortably in between, offering a good compromise for most use cases. With all of the models, you get rich image results with more control, better text rendering, and a serious boost in realism. Here’s how you can run each of these models with Replicate’s JS client: import { writeFile } from "fs/promises";…
353dInfra
354d ago
Run MiniMax Speech-02 models with an API
Run MiniMax Speech-02 models with an API The Speech-02 series from MiniMax are text-to-speech models that let you create natural-sounding voices with emotional expression. The models have support for more than 30 languages. According to the Artificial Analysis Speech Arena, Speech-02-HD is the best text-to-speech model available today, while Speech-02-Turbo comes in third. With Replicate, you can run these models with one line of code. Listen to MiniMax Speech-02 Here’s a sample of the Speech-02-HD model reading an adapted version of this blog post, and the prediction that generated it. Try MiniMax Speech-02 You can choose between two models: Speech-02-HD for high-quality voiceovers and audiobooks, and Speech-02-Turbo, a cheaper model that’s faster and best suited for real-time applications. Both models can be used with a cloned voice. Voice cloning needs at least 10 seconds of audio and takes about 30…
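The "one line of code" call looks roughly like this with the Python client; the slug and input names are assumptions, so check the Speech-02 model pages.

import replicate

output = replicate.run(
    "minimax/speech-02-hd",  # use the Turbo variant for real-time use cases
    input={"text": "Welcome back to the show. Today: text-to-speech that actually sounds human."},
)
print(output)  # URL (or file) of the generated audio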
354d · Research · #multimodal
374d ago
Easel AI is now on Replicate
Easel AI is now on Replicate Two new models from Easel AI are now available on Replicate: advanced face swap and AI avatars. Both models are fast, flexible, and designed for production use across messaging, social, and creative apps. Advanced face swap Easel AI’s advanced face swap replaces not just the face, but the full body, preserving the user’s likeness — including skin tone, racial features, and gender. It maintains the original image’s outfits, lighting, and style for natural-looking results. Here are some highlights. - Swaps one or two people in a single image - Maintains racial identity, skin tone, and body features - Automatically detects gender when using multi-face swap - Preserves clothing, aesthetics, and image details - Includes built-in upscaling for higher quality outputs This unlocks use cases across: - Marketing campaigns: Let users see themselves in…
374d · Infra
389d ago
Stylized video with Wan2.1
Stylized video with Wan2.1 Wan 2.1 is a fast, open-source model for generating videos from text or images. A particularly fun way to use Wan is for video style transfer. Whether you want to make dreamy Studio Ghibli-style loops, gritty cyberpunk trailers, or something entirely your own, Wan makes it easy to create stylized videos. You can start right away with a premade style, or train a custom one in minutes using your own images. Replicate also supports fast inference with LoRAs on Wan 2.1, so you can apply styles to your videos with shorter wait times and no extra setup. Here’s how you can get started. Use a premade style You can run premade styles on top of these two text-to-video Wan 2.1 models: To use a style, type your prompt, set fast_mode to “Fast”, and add a link…
389d · Tutorial · #multimodal
393d ago
Creative roundup: avatars, lightsabers, and LoRA tricks
Creative roundup: avatars, lightsabers, and LoRA tricks There has never been a more exciting time to play around with AI. Every week, new models drop, unexpected use cases emerge, and people push boundaries in ways that are equal parts strange and delightful. Here are some highlights of the coolest things happening — new models you can try, creative experiments from the community, and novel creations. ShieldGemma 2 by Google DeepMind ShieldGemma 2 is a powerful new model that detects NSFW (“not safe for work”) content, violent material, and unsafe instructions with high accuracy. It’s the first DeepMind model of its kind on Replicate, and a useful tool for building safer AI experiences — especially for social or user-facing apps. Hunyuan3D 2Mini by Tencent Hunyuan3D 2Mini is a faster, smaller version of Hunyuan’s earlier 3D generation model. It’s perfect for game…
393d · Research · #fine-tuning
416d ago
Wan2.1: generate videos with an API
Wan2.1: generate videos with an API If you’ve been following the AI video space lately, you’ve probably noticed that it’s exploding. New models are coming out every week with better outputs, higher resolution, and faster generation speeds. Wan2.1 is the newest and most capable open-source video model. It was released last week, and it’s topping the leaderboards. There’s a lot to like about Wan2.1: - It’s fast on Replicate. A 5s video takes 39s at 480p, or 150s at 720p. - It’s open source, both the model weights and the code. The community is already building tools to enhance it. - It produces stunning videos with real-world accuracy. - It’s small enough to run on consumer GPUs. In this post we’ll cover the new models and how to run them with an API. Model flavors The model is available on…
416d ago
Wan2.1 parameter sweep
Wan2.1 parameter sweep We’ve been playing with Alibaba’s WAN2.1 text-to-video model lately. Like most image and video generation models, Wan has a lot of input parameters, and each of them can have a profound impact on the quality of the generated output. What happens when you tweak those mysterious inputs? Let’s find out. The experiment We wanted to see how the guidance scale and shift input parameters affect the output. For our experiment, we used the WAN2.1 14b text-to-video model with 720p resolution. To do this, we did what’s called a “parameter sweep”, systematically testing different combinations of input values to understand how they affect the output. We generated videos for each combination of guidance scale and shift values, keeping all other parameters constant. We kept the following inputs consistent across all the videos: prompt :"A smiling woman walking in…
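A parameter sweep is just a nested loop over the values you care about while everything else stays fixed. A generic sketch follows; the model slug and input names here are placeholders, not necessarily the exact ones used in the experiment.

import itertools
import replicate

guidance_scales = [3, 5, 7, 9]
shifts = [1, 3, 5, 7]

for guidance, shift in itertools.product(guidance_scales, shifts):
    output = replicate.run(
        "wan-video/wan-2.1-t2v-720p",  # placeholder slug
        input={
            "prompt": "a smiling woman walking through a park",  # held constant across runs
            "guidance_scale": guidance,  # input names assumed
            "shift": shift,
        },
    )
    print(guidance, shift, output)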
416d · Research · #multimodal
456d ago
You can now fine-tune open-source video models
You can now fine-tune open-source video models AI video generation has gotten really good. Some of the best video models like tencent/hunyuan-video are open-source, and the community has been hard at work building on top of them. We’ve adapted the Musubi Tuner by @kohya_tech to run on Replicate, so you can fine-tune HunyuanVideo on your own visual content. HunyuanVideo is good at capturing the style of the training data, not only in the visual appearance of the imagery and the color grading, but also in the motion of the camera and the way the characters move. This in-motion style transfer is unique to this implementation: other video models that are trained only on images cannot capture it. Here are some examples of videos created using different fine-tunes, all with the same settings, size, prompt and seed: You can make your…
463d ago
Generate short videos with the Replicate playground
Generate short videos with the Replicate playground AI video generation is here, but it’s not always easy to get the results you want. In this guide, I’ll share a convenient workflow for creating AI video with the Replicate playground. The playground is an experimental web interface that gives you a scrapbook-style UI for testing different models, comparing their outputs, and keeping a record of your experiments. We built it for quick iteration with image models, but we’ve found it works great for video models too. Step 1: Start with an image Text-to-video generation is not yet as fast as text-to-image. You should start with an image for more predictable video output, instead of starting with a text prompt, waiting a few minutes for each output, and hoping to luck into a good result. You might find an existing image in…
495d ago
AI video is having its Stable Diffusion moment
AI video is having its Stable Diffusion moment AI video used to not be very good: Then, 10 months later, OpenAI announced Sora: Sora reset expectations about what a video model could be. The output was high resolution, smooth, and coherent. The examples looked like real video. It felt like we’d jumped into the future. The problem was, nobody could use it! It was just a preview. This was like when OpenAI announced the DALL-E image generation model back in 2021. It was one of the most extraordinary pieces of software that had been seen for years, but nobody could use it. This created all of this pent-up demand that led to Stable Diffusion, which we wrote about last year. Now the same thing is happening with video. Sora made everyone realize what is possible. There are lots of models…
495d · Hardware · #multimodal
515d ago
FLUX fine-tunes are now fast
FLUX fine-tunes are now fast You can fine-tune FLUX on Replicate with your own data. We’ve made running fine-tunes on Replicate much faster, and the optimizations are open-source. This builds upon our work from last month, where we made the FLUX base models much faster. Running a fine-tune is now the same speed as the base model: - FLUX.1 [schnell] at 512x512 and 4 steps: 0.6 seconds (P50) - FLUX.1 [dev] at 1024x1024 and 28 steps: 2.8 seconds (P50) In addition, the first time you run a fine-tune, it’ll take a bit of time to load the model. That’s usually about 2.5 seconds. Once it’s been loaded, we will attempt to route your requests to an instance that already has it loaded, and it will run as fast as the base model. To enable all optimizations, pass go_fast=true to your…
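Passing the flag looks like this; the fine-tune name is a placeholder for your own model, and TOK stands in for whatever trigger word your fine-tune uses.

import replicate

output = replicate.run(
    "your-username/your-flux-fine-tune",  # placeholder: your fine-tuned model
    input={
        "prompt": "a portrait photo of TOK in a rain-soaked neon alley",
        "go_fast": True,  # enable the open-source optimizations described above
    },
)
print(output)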
520d ago
FLUX.1 Tools – Control and steerability for FLUX
FLUX.1 Tools – Control and steerability for FLUX The team at Black Forest Labs is back with FLUX.1 Tools, a new set of models that add control and steerability to their FLUX text-to-image model. The FLUX.1 Tools lineup includes four new features: - Fill: Inpainting and outpainting, like a magic AI paintbrush for precise edits. - Canny: Use edge detection to generate images with precise structure. - Depth: Use depth maps to generate images with realistic perspective. - Redux: An adapter for the FLUX.1 base models that you can use to create variations of images. Each of these new features is available for both the FLUX.1 [dev] and FLUX.1 [pro] models, with Redux also available for FLUX.1 [schnell]. All of these models are now on Replicate. FLUX Fill is great at text inpainting One of the most exciting new features…
520d · Infra
526d ago
NVIDIA L40S GPUs are here
NVIDIA L40S GPUs are here Today we added NVIDIA L40S GPUs to our supported hardware types. These new GPUs are around 40% faster than A40 GPUs. We’re also going to be removing support for A40 GPUs. We will begin migrating all existing models and deployments from A40 GPUs to L40S GPUs over the coming weeks. You’ll continue to pay the same price for your private models and deployments, but you might pay more if you’re using public models or training models on A40 GPUs. You can now run L40S GPUs for any new models, existing models, or deployments. To learn how to change the hardware type for your models and deployments, check out the docs. Starting today, you have the option to switch any of your existing models and deployments to L40S GPUs, but you are not required to do…
526d · Hardware · #gpu
550d ago
Ideogram v2 is an outstanding new inpainting model
Ideogram v2 is an outstanding new inpainting model Update, May 2025: Ideogram has released their v3 series of models, including Turbo, Balanced, and Quality variants. Read more about Ideogram v3 here. Today Ideogram are launching their new inpainting feature for Ideogram v2. We’re thrilled to be partnering with Ideogram, to bring Ideogram v2 to Replicate’s API. We’ve been blown away by the quality of this model. It’s really good. Ideogram v2 comes in two flavors: - ideogram-ai/ideogram-v2 - Produces the best image quality. - ideogram-ai/ideogram-v2-turbo - Still high quality, but faster. For example, here is a herd of dinosaurs grazing on the Bucolic Green Hills: Ideogram v2 is not just for inpainting: you can use it to generate any type of image. In our tests, we found it to be particularly good at generating text. Run Ideogram v2 with an…
550d · Infra
550d ago
Stable Diffusion 3.5 is here
Stable Diffusion 3.5 is here We’re excited to announce that Stable Diffusion 3.5, the latest and most powerful text-to-image model from Stability AI, is now available on Replicate. It brings significant improvements in image quality, better prompt understanding, and supports a wide range of artistic styles. Stable Diffusion 3.5 comes in three variants: - Stable Diffusion 3.5 Large generates the highest quality images - Stable Diffusion 3.5 Large Turbo is almost as high quality, but much faster - Stable Diffusion 3.5 Medium is a smaller model that can be run on the cloud as well as on consumer GPUs You can generate images using Stable Diffusion 3.5 right away. Try this in Python: import replicate output = replicate.run( "stability-ai/stable-diffusion-3.5-large", input={"prompt": "A watercolor painting of a futuristic city skyline at dawn"} ) print(output.url) Or use JavaScript: import Replicate from "replicate"; const…
550d · Infra
562d ago
FLUX is fast and it's open source
FLUX is fast and it's open source FLUX is now much faster on Replicate, and we’ve made our optimizations open-source so you can see exactly how they work and build upon them. Here are the end-to-end speeds: - FLUX.1 [schnell] at 512x512 and 4 steps: 0.29 seconds (P90: 0.49 seconds) - FLUX.1 [schnell] at 1024x1024 and 4 steps: 0.72 seconds (P90: 0.95 seconds) - FLUX.1 [dev] at 1024x1024 and 28 steps: 3.03 seconds (P90: 3.90 seconds) This is from the west coast of the US using the Python client. Here’s a demo of FLUX.1 [schnell]. (It’s live, just start typing!) Here’s the full app, and source code, if you’d like to check it out. How did we do it? Most of the models on Replicate are contributed by our community, but we maintain the FLUX models in collaboration with Black…
562d · Infra · #open-source
569d ago
FLUX1.1 [pro] is here
FLUX1.1 [pro] is here If you’re paying attention to text-to-image AI leaderboards, you may recently have noticed a mysterious model named “blueberry” topping the charts. Well, today the cat’s out of the bag: blueberry is the codename for a new series of Flux models from our friends at Black Forest Labs. These new models are more powerful than any other open-source image generation models out there, and they are available to run on Replicate today: 🫐 replicate.com/black-forest-labs/flux-1.1-pro 🫐 replicate.com/black-forest-labs/flux-pro FLUX1.1 [pro] is a new model FLUX1.1 [pro] is a new, faster, more powerful version of FLUX.1 [pro]. It generates images six times faster than its predecessor FLUX.1 [pro] with higher image quality, better prompt adherence, and more output diversity. Independent benchmarks show it generates the highest quality images compared to other open source models, as of Oct 1, 2024. Pricing…
569d · Release · #multimodal
582d ago
Using synthetic training data to improve Flux finetunes
Using synthetic training data to improve Flux finetunes Update (May 2025): We’ve released a faster version of the Flux trainer — try it here. I know, I know. We keep blogging about Flux. But there’s a reason: It’s really good! People are making so much cool stuff with it, and its capabilities continue to expand as the open-source community experiments with it. In this post I’ll cover some techniques you can use to generate synthetic training data to help improve the accuracy, diversity, and stylistic range of your fine-tuned Flux models. Getting started To use the techniques covered in this post, you should have an existing fine-tuned Flux model that needs a little improvement. If you haven’t created your own fine-tuned Flux model yet, check out our guides to fine-tuning Flux on the web or fine-tuning Flux with an API,…
593d ago
Fine-tune FLUX.1 with an API
Fine-tune FLUX.1 with an API You can now fine-tune models with the fast FLUX trainer on Replicate. It’s fast (under 2 minutes), cheap (under $2), and gives you a warm, runnable model plus LoRA weights to download. FLUX.1 is all the rage these days, and for good reason. It’s a fast, powerful image generation model that’s easy to use and fine-tune, and it generates stunning images. Last week we brought you a guide to fine-tuning Flux with faces. That guide used an entirely web-based flow to create a fine-tuned Flux model, without writing a single line of code. We heard from some users that they would like to fine-tune Flux with an API, so we’re back this week with another tutorial that shows you how to do just that. In this guide, you’ll create and run your own fine-tuned Flux…
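A hedged sketch of kicking off a training run through the API: the trainer version hash is a placeholder, and the input names (input_images, trigger_word, steps) should be checked against the FLUX trainer's input schema.

import replicate

training = replicate.trainings.create(
    version="ostris/flux-dev-lora-trainer:<version-id>",  # placeholder version
    input={
        "input_images": "https://example.com/training-images.zip",  # zip of your photos
        "trigger_word": "TOK",
        "steps": 1000,
    },
    destination="your-username/flux-my-style",  # where the fine-tuned model will live
)
print(training.status)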
593d · Infra · #fine-tuning
603d ago
Fine-tune FLUX.1 to create images of yourself
Fine-tune FLUX.1 to create images of yourself Update (May 2025): We’ve released a faster version of the Flux trainer — try it here. The FLUX.1 family of image generation models was released earlier this month and took the world by storm, producing images surpassing the quality of existing open-source models. The community quickly started to build new capabilities on top of Flux, and not long after the release we announced Flux fine-tuning support on Replicate. Fine-tuning Flux on Replicate is easy: you just need a handful of images to get started. No deep technical knowledge is required. You can even create a fine-tune entirely on the web, without writing a single line of code. The community has already published hundreds of public Flux fine-tunes on Replicate, plus thousands of private fine-tunes too. One of the most exciting things about Flux…
603d · Model · #fine-tuning
610d ago
Replicate Intelligence #12
Replicate Intelligence #12 Welcome to Replicate’s weekly bulletin! Each week, we’ll bring you updates on the latest open-source AI models, tools, and research. People are making cool stuff and we want to share it with you. Without further ado, here’s our hacker-in-residence deepfates with an unfiltered take on the week in AI. Editor’s note Earlier this week I turned into Hot Mark Zuckerberg. I just did it to test this deepfake library (which turned out to not even be open source after all but oh well) but now I feel weird about the future of Reality. You know how Zuck has been rehabilitating his PR image? He grew out his hair and started dressing cool. And there was an image that went viral where they used a filter to give him a beard. Well, I have curly hair and a…
610d · Infra · #fine-tuning
617d ago
Replicate Intelligence #11
Replicate Intelligence #11 Welcome to Replicate’s weekly bulletin! Each week, we’ll bring you updates on the latest open-source AI models, tools, and research. People are making cool stuff and we want to share it with you. Without further ado, here’s our hacker-in-residence deepfates with an unfiltered take on the week in AI. Editor’s note This week I’m thinking about how multimedia AI models will lead to real-time interactive world generation, and how it’s the bull case for VR and the metaverse. I talked with fellow Replicant Mattt about it, and watched his talk (see Research Radar below), and I can’t get it out of my mind. (Editor’s note: Neither Mattt nor Replicate are responsible for the following conjectures) Just this week: You can now fine-tune FLUX.1, Tavus launched their Conversational Video Interface, a “digital twin” API that can look like…
617d · Infra · #multimodal
618d ago
Fine-tune FLUX.1 with your own images
Fine-tune FLUX.1 with your own images You can now fine-tune models with the fast FLUX trainer on Replicate. It’s fast (under 2 minutes), cheap (under $2), and gives you a warm, runnable model plus LoRA weights to download. FLUX.1 is a family of text-to-image models released by Black Forest Labs this summer. The FLUX.1 models set a new standard for open-source image models: they can generate realistic hands, legible text, and even the strangely hard task of funny memes. You can now fine-tune FLUX.1 [dev] with Ostris’s AI Toolkit on Replicate. Teach the model to recognize and generate new concepts by showing it a small set of example images, allowing you to customize the model’s output for specific styles, characters, or objects. Ostris’s toolkit uses the LoRA technique for fast, lightweight trainings. People have already made some amazing fine-tunes: How…
624d ago
Replicate Intelligence #10
Replicate Intelligence #10 Welcome to Replicate’s weekly bulletin! Each week, we’ll bring you updates on the latest open-source AI models, tools, and research. People are making cool stuff and we want to share it with you. Without further ado, here’s our hacker-in-residence deepfates with an unfiltered take on the week in AI. Editor’s note The news in open source this week was all FLUX.1. People have been amazed with the open image models, running nearly 5 million predictions on FLUX.1 [schnell] in the first week! Fine-tuning scripts are starting to come out. Expect to see some interesting new downstream models next week. For now we have image to image generation, and a ton of cool images people are creating. Check out our X feed and blog posts for some great examples. Trending models Image to image generation with FLUX.1 FLUX.1…
624d · Infra
631d ago
FLUX.1: First Impressions
FLUX.1: First Impressions FLUX.1 is a new AI model (available on Replicate) that makes images from text. Unlike most text-to-image models, which rely on diffusion, FLUX.1 uses an upgraded technique called “flow matching.” While diffusion models create images by gradually removing noise from a random starting point, flow matching takes a more direct approach, learning the precise transformations needed to map noise onto a realistic image. This difference in methodology leads to a distinct aesthetic and unique advantages in terms of speed and control. We were curious to see how this approach impacts the generated images, so we fed it a variety of prompts, many created by other AI models. Here are some observations: Text: It gets it (mostly) One of the challenges in text-to-image generation is accurately translating words into visual representations. FLUX.1 handles this surprisingly well, even in…
631d · Infra · #multimodal
631d ago
Replicate Intelligence #9
Replicate Intelligence #9 Welcome to Replicate’s weekly bulletin! Each week, we’ll bring you updates on the latest open-source AI models, tools, and research. People are making cool stuff and we want to share it with you. Without further ado, here’s our hacker-in-residence deepfates with an unfiltered take on the week in AI. Editor’s note Open source does not slow down. New models are released as fast as I can try them out. There’s so much to do, so many new creative tools in our hands. Every individual can now be a startup, a brand, a publisher, a movie studio, all from their laptop or their phone. These new AI tools make it easier to generate new things. It’s even become the buzzword in some circles, “GenAI”. It’s certainly more impressive, and more obvious when a neural net generates something than…
632d ago
Run FLUX with an API
Run FLUX with an API FLUX.1 is a new open-source image generation model developed by Black Forest Labs, the creators of Stable Diffusion. It’s available on Replicate today, and you can run it in the cloud with one line of code. Here’s an example of how to run FLUX.1 on Replicate using JavaScript: import Replicate from "replicate"; const replicate = new Replicate(); const model = "black-forest-labs/flux-dev"; const prompt = "Purple striped narwhal devouring a fluffy high-resolution everything bagel"; const output = await replicate.run(model, {input: { prompt }}); console.log(output); You can try out FLUX.1 right in your browser, or run it programmatically in your language of choice. What makes FLUX.1 special? FLUX.1 models have state-of-the-art performance in prompt following, visual quality, image detail, and output diversity. Here are some particular areas where we’ve been impressed: Text! Unlike older models that often…
632d · Open Source · #open-source
638d ago
Replicate Intelligence #8
Replicate Intelligence #8 Welcome to Replicate’s weekly bulletin! Each week, we’ll bring you updates on the latest open-source AI models, tools, and research. People are making cool stuff and we want to share it with you. Without further ado, here’s our hacker-in-residence deepfates with an unfiltered take on the week in AI. Editor’s note The big event this week was the release of Llama 3.1, Meta’s new generation of language models, including the 405 billion parameter model. This model is a peer to GPT-4, Claude 3, and Gemini 1.5, the big proprietary models from other labs. But unlike those labs, Meta doesn’t claim to be building superintelligence, or even AGI. They think of AI as a system, and language models as one component. Mark Zuckerberg, in his letter accompanying the release, repeatedly uses the phrase “AI systems”. More than most…
638d · Infra · #safety
641d ago
Run Meta Llama 3.1 405B with an API
Run Meta Llama 3.1 405B with an API Llama 3.1 is the latest language model from Meta. It features a massive 405 billion parameter model that rivals GPT-4 in quality, with a context window of 8000 tokens. With Replicate, you can run Llama 3.1 in the cloud with one line of code. Try Llama 3.1 in our API playground Before you dive in, try Llama 3.1 in our API playground. Try tweaking the prompt and see how Llama 3.1 responds. Most models on Replicate have an interactive API playground like this, available on the model page: https://replicate.com/meta/meta-llama-3.1-405b-instruct The API playground is a great way to get a feel for what a model can do, and provides copyable code snippets in a variety of languages to help you get started. Running Llama 3.1 with JavaScript You can run Llama 3.1 with…
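The excerpt stops before the code, so here is the equivalent call in Python against the model page linked above; joining the streamed tokens is the usual pattern for language models on Replicate.

import replicate

output = replicate.run(
    "meta/meta-llama-3.1-405b-instruct",
    input={"prompt": "Write a haiku about context windows."},
)
print("".join(output))  # tokens stream back one at a time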
641d · Tutorial · #llama · #open-source
652d ago
Replicate Intelligence #7
Replicate Intelligence #7 Welcome to Replicate’s weekly bulletin! Each week, we’ll bring you updates on the latest open-source AI models, tools, and research. People are making cool stuff and we want to share it with you. Without further ado, here’s our hacker-in-residence deepfates with an unfiltered take on the week in AI. Editor’s note Data. Everybody’s talking about it. Where do you get it? Will there be enough? And most importantly, how soon will the lawyers show up? The cure: synthetic data. Use that questionable internet scrape to create a rock-solid set of (image,caption) or (question,answer) pairs, expand your total data by a factor of 10, and delete the evidence (allegedly! what do i know). But this doesn’t just apply to raw material. We need more data than has ever been created. We need preference data: is this image syrupy…
652d · Infra
666d ago
Replicate Intelligence #6
Replicate Intelligence #6 Welcome to Replicate’s weekly bulletin! Each week, we’ll bring you updates on the latest open-source AI models, tools, and research. People are making cool stuff and we want to share it with you. Without further ado, here’s our hacker-in-residence deepfates with an unfiltered take on the week in AI. Editor’s note It’s been a long week for me and I have many more busy days before I can actually catch up on everything. Forgive me for sending you such a short letter. I couldn’t bear to send nothing at all. --- deepfates Trending models New language models from Google The new Gemma2 models were released in 9b and 27b sizes. They’re overtrained on tokens, as seems to be the trend since Llama3 at least. They’re also distilled from larger Gemini models? And everyone’s talking about the alternating…
666dResearch#benchmark
673d ago
Replicate Intelligence #5
Replicate Intelligence #5 Welcome to Replicate’s weekly bulletin! Each week, we’ll bring you updates on the latest open-source AI models, tools, and research. People are making cool stuff and we want to share it with you. Without further ado, here’s our hacker-in-residence deepfates with an unfiltered take on the week in AI. Editor’s note There are some weeks where it seems like open-source AI will never catch up. The powerful models everyone’s talking about are locked behind a $20 subscription, if they’re released to the public at all. Open-weight models close the capability gap after a year or so. Actual open-source models, with weights, dataset, training code and inference code? Forget about it. Don’t be discouraged, though. Sure, the megacorps are training multimodal world models in datacenters the size of cities. Maybe they really will build a god in there. I’m…
673dInfra#coding
676d ago
How to get the best results from Stable Diffusion 3
How to get the best results from Stable Diffusion 3 Stability AI recently released the weights for Stable Diffusion 3 Medium, a 2 billion parameter text-to-image model that excels at photorealism, typography, and prompt following. You can run the official Stable Diffusion 3 model on Replicate, and it is available for commercial use. We have also open-sourced our Diffusers and ComfyUI implementations (read our guide to ComfyUI). In this blog post we’ll show you how to use Stable Diffusion 3 (SD3) to get the best images, including how to prompt SD3, which is a bit different from previous Stable Diffusion models. To help you experiment, we’ve created an SD3 explorer model that exposes all of the settings we discuss here. Picking an SD3 version Stability AI have packaged up SD3 Medium in different ways to make sure it can run…
676dTutorial
676d ago
Run Stable Diffusion 3 on your Apple Silicon Mac
Run Stable Diffusion 3 on your Apple Silicon Mac Stable Diffusion 3 (SD3) is the newest version of the open-source AI model that turns text into images. You can run it locally on your Apple Silicon Mac and start making stunning pictures in minutes. Watch this video to see it in action: Prerequisites - A Mac with an M-series Apple Silicon chip - Git Clone the repository and set up the environment Run this to clone the SD3 code repository: git clone https://github.com/zsxkib/sd3-on-apple-silicon.git cd sd3-on-apple-silicon Then, create a new virtual environment with the packages SD3 needs: python3 -m venv sd3-env source sd3-env/bin/activate pip install -r requirements.txt Run it! Now, you can generate your first SD3 image: python sd3-on-mps.py The first run will download the SD3 model and weights, which are around 15.54 GB. Subsequent runs will use the downloaded files.…
676dTutorial
680d ago
Push a custom version of Stable Diffusion 3
Push a custom version of Stable Diffusion 3 When the first version of Stable Diffusion appeared back in August 2022, there was an explosion of innovation around it. In a matter of days after its release, the open-source community created lots of compelling derivatives, like the material diffusion models which generated tiling images, the inpainting models that gave you an AI paintbrush, and animation models that could interpolate between prompts. Fast forward nearly two years, and Stable Diffusion 3 (SD3) just came out this week. People are hyped about the capabilities of this new model, and we expect to see a lot of the same open-source innovation we saw in Stable Diffusion’s early days. In this post, we’ll show you how to get your own custom version of Stable Diffusion 3 running on Replicate. Prerequisites To follow this guide, you’ll…
680dInfra#multimodal
680d ago
Replicate Intelligence #4
Replicate Intelligence #4 Welcome to Replicate’s weekly bulletin! Each week, we’ll bring you updates on the latest open-source AI models, tools, and research. People are making cool stuff and we want to share it with you. Without further ado, here’s our hacker-in-residence deepfates with an unfiltered take on the week in AI. Editor’s note The big open source AI news this week is the release of Stable Diffusion 3 Medium. People are already doing cool things with it, but public reaction has been mixed. On a personal note, I got banned from X Dot Com. Apparently it is against the rules to change your profile picture to the old Twitter logo and announce “WE ARE SO BACK”. Anyway, here are some things that caught my eye this week. Find me on Bluesky, I guess. --- deepfates Trending models Stable Diffusion 3…
680dHardware#gpu
680d ago
Run Stable Diffusion 3 on your own machine with ComfyUI
Run Stable Diffusion 3 on your own machine with ComfyUI Stable Diffusion 3 (SD3) just dropped and you can run it in the cloud on Replicate, but it’s also possible to run it locally using ComfyUI right from your own GPU-equipped machine. What is Stable Diffusion 3? Stable Diffusion 3 (SD3) is the latest version of Stability AI’s tool that creates images from text. It’s faster and makes better images than older versions. SD3 is really good at making complex scenes with lots of details and clear text from the prompt. What is ComfyUI? ComfyUI is a graphical user interface (GUI) for Stable Diffusion models like SD3. It lets you connect different AI models (called nodes) together to create custom images, just like connecting Lego blocks. You don’t need to know any code to use it. ComfyUI makes it fun…
680dHardware
682d ago
H100s are coming to Replicate
H100s are coming to Replicate Posted June 12, 2024. We make it easy to run machine learning models on many different types of hardware, including NVIDIA T4, A40, and A100 GPUs, as well as CPUs. Soon we’ll be adding support for NVIDIA’s H100 GPUs, which are even more powerful. If you’re interested in getting early access to H100s, email us: support@replicate.com
682dHardware#training#gpu
682d ago
Run Stable Diffusion 3 with an API
Run Stable Diffusion 3 with an API Stable Diffusion 3 is the latest text-to-image model from Stability AI. It has greatly improved performance in image quality, typography, complex prompt understanding, and resource-efficiency. With Replicate, you can run Stable Diffusion 3 with one line of code. Try Stable Diffusion 3 in our API playground Before you dive in, try Stable Diffusion 3 in our API playground. Try tweaking the prompt and see how Stable Diffusion 3 responds. Most models on Replicate have an interactive API playground like this, available on the model page: https://replicate.com/stability-ai/stable-diffusion-3 The API playground is a great way to get a feel for what a model can do, and provides copyable code snippets in a variety of languages to help you get started. Running Stable Diffusion 3 with JavaScript You can run Stable Diffusion 3 with our official…
682dTutorial
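For reference, a minimal Python sketch of calling the same model; the aspect_ratio input and URL-style output handling are assumptions, so check the model page for the exact schema:

```python
# Minimal sketch: generate an image with Stable Diffusion 3 on Replicate.
# Assumes REPLICATE_API_TOKEN is set; `aspect_ratio` is an assumed input name,
# so check the model page for the exact schema.
import urllib.request

import replicate

output = replicate.run(
    "stability-ai/stable-diffusion-3",
    input={
        "prompt": "a studio photo of a ceramic teapot on a linen tablecloth, soft window light",
        "aspect_ratio": "1:1",  # assumption for illustration
    },
)

# Image models usually return a list of output files/URLs; save the first one.
urllib.request.urlretrieve(str(output[0]), "sd3-output.png")
print("saved sd3-output.png")
```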
687d ago
Replicate Intelligence #3
Replicate Intelligence #3 Welcome to Replicate’s weekly bulletin! Each week, we’ll bring you updates on the latest open-source AI models, tools, and research. People are making cool stuff and we want to share it with you. Without further ado, here’s our hacker-in-residence deepfates with an unfiltered take on the week in AI. Editor’s note Hey everyone. Thanks for your lovely emails last week, and thanks especially to the reader who responded “谢谢你的来信，我已收到。” (“Thank you for your letter; I’ve received it.”) What a thoughtful sentiment, and such a prompt response! I know I promised you a signup form this week. Well, it turns out that is harder to do in an email than you would think. For now: if you’re subscribed to this, forward it to someone you know. If someone forwarded this to you, go sign up for Replicate and click the “hear about stuff from Replicate” checkbox next…
687dTutorial#llama#multimodal
694d ago
Replicate Intelligence #2
Replicate Intelligence #2 Welcome to Replicate’s weekly bulletin! Each week, we’ll bring you updates on the latest open-source AI models, tools, and research. People are making cool stuff and we want to share it with you. Without further ado, here’s our hacker-in-residence deepfates with an unfiltered take on the week in AI. Editor’s note Hey everyone! It was so nice to read all of your very positive feedback last week. I definitely didn’t mess up the reply-to address, thus dooming your well-meaning advice to unreadable limbo. That would be ridiculous. I’ve made all the changes you requested. The letter is now shorter, more informative, easier to read, less dumbed down, more focused, general interest, new, improved and completely reinvented. I have spared no expense to make it exactly what I’m sure you asked for. That said, if you have any…
694dInfra#multimodal
701d ago
Replicate Intelligence #1
Replicate Intelligence #1 Welcome to Replicate’s weekly bulletin! Each week, we’ll bring you updates on the latest open-source AI models, tools, and research. People are making cool stuff and we want to share it with you. Without further ado, here’s our hacker-in-residence deepfates with an unfiltered take on the week in AI. Editor’s note Hey, it’s me, deepfates from the internet. They gave me the keys to the mailing list and said I can experiment with it. Here’s my theory: You’re trying to do cool things with AI. You don’t need any more news bites about the closed AI platforms. Everyone’s already talking about them. You want to know about new and cool things in AI, the DIY hacker stuff that you might have missed while building. Well, at Replicate it’s our job to know. So I’m tapping the deep…
702d ago
Shared network vulnerability disclosure
Shared network vulnerability disclosure This post shares details of a security vulnerability disclosed to us in January 2024 by our friends at Wiz, a cloud security company. Their findings revealed that our infrastructure could have allowed a malicious model to access sensitive data. We took their report seriously, and deployed a full mitigation within 24 hours of speaking with Wiz (just over two weeks after their initial disclosure). We have since deployed additional mitigations for the issue and are now encrypting all internal traffic and restricting privileged network access for all model containers. During our investigation and mitigation, we found no evidence that this vulnerability was exploited. Read on to learn more about the details of the vulnerability and the steps we are taking to keep Replicate secure. Running models safely in production At Replicate, our job is to make…
702dTutorial
732d ago
Run Snowflake Arctic with an API
Run Snowflake Arctic with an API Snowflake Arctic is a new open-source language model from Snowflake. Arctic is on-par or better than both Llama 3 8B and Llama 2 70B on all metrics while using less than half of the training compute budget. It’s massive. At 480B, Arctic is the biggest open-source model to date. As expected from a model from Snowflake, it’s good at SQL and other coding tasks, and it has a liberal Apache 2.0 license. With Replicate, you can run Arctic in the cloud with one line of code. Try Arctic in our API playground Before you dive in, try Arctic in our API playground. Try tweaking the prompt and see how Arctic responds. Most models on Replicate have an interactive API playground like this, available on the model page: https://replicate.com/snowflake/snowflake-arctic-instruct The API playground is a great…
732dTutorial#open-source
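A short Python sketch that leans on Arctic's SQL strength, using the slug from the excerpt; it assumes the replicate client is installed and an API token is set:

```python
# Short sketch: ask Snowflake Arctic to write SQL via the Replicate API.
# Assumes REPLICATE_API_TOKEN is set; the model slug is taken from the excerpt above.
import replicate

prompt = (
    "Write a SQL query that returns the ten customers with the highest total "
    "order value, given tables customers(id, name) and orders(customer_id, amount)."
)

output = replicate.run("snowflake/snowflake-arctic-instruct", input={"prompt": prompt})
print("".join(output))
```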
737d ago
Run Meta Llama 3 with an API
Run Meta Llama 3 with an API Llama 3 is the latest language model from Meta. It has state-of-the-art performance and a context window of 8,192 tokens, double Llama 2’s context window. With Replicate, you can run Llama 3 in the cloud with one line of code. Try Llama 3 in our API playground Before you dive in, try Llama 3 in our API playground. Try tweaking the prompt and see how Llama 3 responds. Most models on Replicate have an interactive API playground like this, available on the model page: https://replicate.com/meta/meta-llama-3-70b-instruct The API playground is a great way to get a feel for what a model can do, and provides copyable code snippets in a variety of languages to help you get started. Running Llama 3 with JavaScript You can run Llama 3 with our official…
737dTutorial#llama
816d ago
Run Code Llama 70B with an API
Run Code Llama 70B with an API Code Llama is a code generation model built on top of Llama 2. It can generate code and natural language about code in many programming languages, including Python, JavaScript, TypeScript, C++, Java, PHP, C#, Bash and more. Today, Meta announced a more powerful new version of Code Llama with 70 billion parameters. It’s one of the highest performing open models. Meta reports a 67.8 on HumanEval, which beats zero-shot GPT-4. With Replicate, you can run Code Llama 70B in the cloud with one line of code. Contents - Contents - Code Llama 70B variants - Run Code Llama 70B with JavaScript - Run Code Llama 70B with Python - Run Code Llama 70B with cURL - Keep up to speed Code Llama 70B variants There are three variants of Code Llama 70B. The…
871d ago
How to create an AI narrator for your life
How to create an AI narrator for your life A couple of weeks ago, Sir David Attenborough watched me drink a cup of water. David Attenborough is now narrating my life — Charlie Holtz (@charlieholtz) November 15, 2023 Here's a GPT-4-vision + @elevenlabsio python script so you can star in your own Planet Earth: pic.twitter.com/desTwTM7RS Or at least, a clone of him did. I recorded the video in a library on a whim, congested and with a bunch of background noise, and it went viral. It hit the top of Hacker News; Business Insider and Ars Technica wrote about it; and nearly 4 million people watched an AI David Attenborough describe my blue shirt as part of my “mating display.” You might be surprised (I am constantly) by all the things you can build now. I’ve experimented with building…
871dTutorial
871d ago
Clone your voice using open-source models
Clone your voice using open-source models Realistic Voice Cloning (RVC) is a voice-to-voice model that can transform any input voice into a target voice. Here’s an example of Morgan Freeman as Hannibal Lecter: You can try it out with some pre-trained voices here. In this blog post we’ll show you how to create your own RVC voice model on whatever voice you want. We’ll create a dataset, tune the model, then make some examples, all using Replicate. At a high level, the process is: - Create a training dataset: Use the zsxkib/create-rvc-dataset model to generate a dataset of speech audio files from a YouTube video URL. - Train your voice model: Use the replicate/train-rvc-model model to create a fine-tuned RVC model based on your dataset. - Run inference: Finally, use the zsxkib/realistic-voice-cloning model to create new speech audio (or even…
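A rough Python sketch of the three-step pipeline above, using the model slugs the excerpt names; the input field names are illustrative guesses rather than the models' documented parameters:

```python
# Rough sketch of the three-step RVC pipeline, using the model slugs from the excerpt.
# The input field names (youtube_url, dataset_zip, song_input, rvc_model) are
# illustrative guesses, not the models' documented parameters.
import replicate

# 1. Build a training dataset of speech clips from a YouTube video.
dataset = replicate.run(
    "zsxkib/create-rvc-dataset",
    input={"youtube_url": "https://www.youtube.com/watch?v=..."},  # placeholder URL
)

# 2. Train an RVC voice model on that dataset.
weights = replicate.run(
    "replicate/train-rvc-model",
    input={"dataset_zip": dataset},  # illustrative field name
)

# 3. Convert a new vocal track into the cloned voice.
converted = replicate.run(
    "zsxkib/realistic-voice-cloning",
    input={"song_input": "https://example.com/vocals.wav", "rvc_model": weights},
)
print(converted)
```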
872d ago
Businesses are building on open-source AI
Businesses are building on open-source AI To cut to the chase: we’ve raised a $40 million Series B led by a16z. Let me explain why. Last year, Stable Diffusion was released. It was an open-source image generation model that caught the imagination of tinkerers. An explosion of forks were created: inpainting, animation, texture generation, fine-tunes. At the start, it felt like a toy. It was just people tinkering around and seeing what was possible. But soon, side projects started to turn into real products. Indie hackers like Pieter Levels and Danny Postma made apps that generate profile pictures, redecorate your house, and create professional headshots. They’re now real businesses making over $1 million annual revenue as solo developers. The growth in tinkering and building since then has been astonishing. In the last year and a half, 2 million people have…
872dOpen Source#open-source
884d ago
How to run Yi chat models with an API
How to run Yi chat models with an API Posted November 23, 2023. The Yi series models are large language models trained from scratch by developers at 01.AI. Today, they’ve released two new models: Yi-6B-Chat and Yi-34B-Chat. These models extend the base models, Yi-6B and Yi-34B, and are fine-tuned for chat completion. Yi-34B currently holds the state of the art on most benchmarks, beating larger models like Llama-70B. How to run Yi-34B-Chat with an API Yi-34B-Chat is on Replicate and you can run it in the cloud with a few lines of code. You can run it with our JavaScript client: import Replicate from "replicate"; const replicate = new Replicate({ auth: process.env.REPLICATE_API_TOKEN, }); const output = await replicate.run( "01-ai/yi-34b-chat:914692bbe8a8e2b91a4e44203e70d170c9c5ccc1359b283c84b0ec8d47819a46", { input: { prompt: "Write a poem about Parmigiano Reggiano.", }, } ); Or, our Python client: import replicate output = replicate.run( "01-ai/yi-34b-chat:914692bbe8a8e2b91a4e44203e70d170c9c5ccc1359b283c84b0ec8d47819a46",…
884dTutorial#coding
885d ago
Scaffold Replicate apps with one command
Scaffold Replicate apps with one command Posted November 22, 2023 by jakedahn and mattrothenberg.
885dInfra
897d ago
Using open-source models for faster and cheaper text embeddings
Using open-source models for faster and cheaper text embeddings Embeddings are a powerful tool for working with text. By “embedding” text into vectors, you encode its meaning into a representation that can more easily be used for tasks like semantic search, clustering, and classification. If you’re new to embeddings, check out this awesome introduction by Simon Willison to get up to speed. These days, embeddings are being used for even more interesting applications like Retrieval Augmented Generation, which uses semantic search over embeddings to improve the quality of responses from language models. In this guide, we’ll see how to use the BAAI/bge-large-en-v1.5 model on Replicate to generate text embeddings. The “BAAI General Embedding” (BGE) suite of models, released by the Beijing Academy of Artificial Intelligence (BAAI), is open source and available on the Hugging Face Hub. As of October 2023,…
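As a local counterpart to the hosted approach, here is a minimal sketch that embeds text with the open-source BAAI/bge-large-en-v1.5 weights via sentence-transformers and ranks documents by cosine similarity:

```python
# Local counterpart to the hosted approach: embed text with the open-source
# BAAI/bge-large-en-v1.5 weights and rank documents by cosine similarity.
# Assumes `pip install sentence-transformers`.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("BAAI/bge-large-en-v1.5")

docs = [
    "Embeddings encode the meaning of text as vectors.",
    "Llamas are domesticated South American camelids.",
]
query = "How do vector representations of text work?"

doc_vecs = model.encode(docs, normalize_embeddings=True)
query_vec = model.encode(query, normalize_embeddings=True)

# Cosine similarity ranks documents by semantic relevance to the query.
scores = util.cos_sim(query_vec, doc_vecs)[0]
for score, doc in sorted(zip(scores.tolist(), docs), reverse=True):
    print(f"{score:.3f}  {doc}")
```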
899d ago
Generate music from chord progressions and text prompts with MusicGen-Chord
Generate music from chord progressions and text prompts with MusicGen-Chord MusicGen-Chord is a model that generates music in any style based on a text prompt, a chord progression, and a tempo. It is based on Meta’s MusicGen model, where we have changed the melody input to accept chords as text or audio. For example, here are three different generations, each with the repeating chord progression “F:maj7 G E:min A:min”: “90s euro trance, uplifting, ibiza” in 140 bpm: “british jangle pop, the smiths, 1980s” in 113 bpm: “sacred chamber choir, choral”, based on a fine-tuned model: How does it work? MusicGen-Chord is built on top of Meta’s MusicGen-Melody model. MusicGen-Melody conditions the generation on both a text prompt and an audio file. In MusicGen-Melody, the audio file is source separated to remove drums and bass, and the melody is extracted by picking…
913d ago
Generate images in one second on your Mac using a latent consistency model
Generate images in one second on your Mac using a latent consistency model Latent consistency models (LCMs) are based on Stable Diffusion, but they can generate images much faster, needing only 4 to 8 steps for a good image (compared to 25 to 50 steps). By running an LCM on your M1 or M2 Mac you can generate 512x512 images at a rate of one per second. Simian Luo et al released the first Stable Diffusion distilled model. It’s distilled from the Dreamshaper fine-tune by incorporating classifier-free guidance into the model’s input. Only one model has been distilled so far, but more will be released. Stable Diffusion 2.1 and SDXL are being worked on by the paper authors. You can run the first latent consistency model in the cloud on Replicate, but it’s also possible to run it locally. As…
913dTutorial
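A minimal local sketch of the idea, assuming a recent diffusers release (argument names can differ between versions):

```python
# Minimal local sketch for Apple Silicon, assuming a recent diffusers release
# (argument names can differ between versions).
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained("SimianLuo/LCM_Dreamshaper_v7")
pipe.to("mps")  # use the Apple Silicon GPU backend

image = pipe(
    prompt="a watercolor painting of a lighthouse at dawn",
    width=512,
    height=512,
    num_inference_steps=4,  # LCMs only need a handful of steps
    guidance_scale=8.0,
).images[0]
image.save("lcm-output.png")
```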
925d ago
Fine-tune MusicGen to generate music in any style
Fine-tune MusicGen to generate music in any style Fine-tune MusicGen if you want to generate music in a certain style, whether that’s 16-bit video game chiptunes or the calmness of something choral. A full model training takes 15 minutes using 8x A40 (Large) hardware. You can run your fine-tuned model from the web or using the cloud API, or you can download the fine-tuned model weights for use in other contexts. The fine-tune process was developed by Jongmin Jung (a.k.a. Sake). It’s based on Meta’s AudioCraft and their built-in trainer Dora. To make training simple, Sake has included automatic audio chunking, auto-labeling, and vocal removal features. Your trained model can also generate music longer than 30 seconds. Here is an example of a choral fine-tune combined with a 16-bit video game (as a continuation): Prepare your music dataset Just a…
929d ago
How to use retrieval augmented generation with ChromaDB and Mistral
How to use retrieval augmented generation with ChromaDB and Mistral Over the last few months, Retrieval Augmented Generation (RAG) has emerged as a popular technique for getting the most out of Large Language Models (LLMs) like Llama-2-70b-chat. In this post, we’ll explore the creation of an example RAG “app” which helps you generate click-worthy titles for Hacker News submissions. All you need to do is provide a working title, idea, or phrase, and even the most boring of words will be transformed into a title destined for the front page of Hacker News. Admittedly, this is a basic toy idea. It’s not revolutionary, and it may not land your post on the front page of Hacker News. That’s okay, because that’s not the point: the point is to provide you with a practical hands-on feel for how RAG works, and…
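A compact sketch of that RAG loop, indexing with ChromaDB and generating with a hosted Mistral model; the Mistral slug is an assumption, so substitute whichever instruct model you run:

```python
# Compact sketch of the RAG loop: index with ChromaDB, retrieve, then generate.
# The Mistral model slug is an assumption; swap in whichever instruct model you use.
import chromadb
import replicate

# 1. Index a few documents (Chroma embeds them with its default embedding model).
client = chromadb.Client()
collection = client.create_collection(name="hn_title_tips")
collection.add(
    ids=["1", "2"],
    documents=[
        "Show HN posts with concrete numbers in the title tend to get more upvotes.",
        "Questions phrased as 'Why X?' tend to spark long comment threads.",
    ],
)

# 2. Retrieve the most relevant snippet for the user's draft title.
draft = "my side project that counts llamas in webcam footage"
context = collection.query(query_texts=[draft], n_results=1)["documents"][0][0]

# 3. Generate with the retrieved context prepended to the prompt.
prompt = f"Context: {context}\n\nRewrite this as a click-worthy Hacker News title: {draft}"
output = replicate.run(
    "mistralai/mistral-7b-instruct-v0.2",  # assumed slug for illustration
    input={"prompt": prompt},
)
print("".join(output))
```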
932d ago
How to run Mistral 7B with an API
How to run Mistral 7B with an API Mistral 7B is a new open-source language model from Mistral AI that outperforms not just all other 7 billion parameter language models, but also Llama 2 13B and sometimes even the original Llama 34B. It approaches CodeLlama 7B performance on coding tasks. There’s also Mistral 7B Instruct, a model fine-tuned for chat completions. It’s comparable to Llama 2 13B fine-tuned for chat. @a16z-infra pushed Mistral 7B and Mistral 7B Instruct to Replicate. Let’s take a look at what makes Mistral 7B stand out, then we’ll show you how to run it with an API. It has more recent training data Mistral 7B’s training data cutoff was sometime in 2023, so it knows about things that happened this year. Keep in mind that Mistral is just a language model so it’s prone to…
934d ago
Make smooth AI generated videos with AnimateDiff and an interpolator
Make smooth AI generated videos with AnimateDiff and an interpolator In this blog post we’ll show you how to combine AnimateDiff and the ST-MFNet frame interpolator to create smooth and realistic videos from a text prompt. You can also specify camera movements using new controls. You’ll go from a text prompt to a video, to a high-framerate video. Create animations with AnimateDiff AnimateDiff is a model that enhances existing text-to-image models by adding a motion modeling module. The motion module is trained on video clips to capture realistic motion dynamics. It allows Stable Diffusion text-to-image models to create animated outputs, ranging from anime to realistic photographs. You can try AnimateDiff on Replicate. Control camera movement LoRAs provide an efficient way to speed up the fine-tuning process of big models without using much memory. They are most well known for Stable…
948d ago
Jet-setting with Llama 2 + Grammars
Jet-setting with Llama 2 + Grammars Llamas may be docile by nature, but they have a stubborn streak. Push them too far, and they’re liable to spit out something foul and unpleasant. True to their real-life counterpart, it can be challenging to get Meta’s Llama 2 to do exactly what you want. Which is fine for some generation tasks, but problematic for anything requiring syntactic perfection. Prompt engineering, few-shot examples, and fine-tuning can all help massage output into a desired shape. But grammars are the only sure-fire way to get exactly what you want, every time. In this post, we’ll explore a family of Llama 2 models with built-in support for grammars, and show how you can use it for information extraction tasks. Last month, Replicate hosted its first hackathon in San Francisco. It was lovely. I had a great…
948dTutorial#llama
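A hedged sketch of what grammar-constrained generation can look like; the model slug and the grammar input name are assumptions based on the post's description, and the grammar itself is ordinary llama.cpp GBNF:

```python
# Hedged sketch of grammar-constrained generation. The model slug and the
# `grammar` input name are assumptions; the grammar is ordinary llama.cpp GBNF
# that only admits a JSON object with a single "city" field.
import replicate

grammar = r'''
root  ::= "{" ws "\"city\"" ws ":" ws value ws "}"
value ::= "\"" [a-zA-Z ]+ "\""
ws    ::= [ \t\n]*
'''

output = replicate.run(
    "andreasjansson/llama-2-13b-chat-gguf",  # assumed grammar-enabled variant
    input={
        "prompt": "Extract the destination city: 'Booked a red-eye to Reykjavik for the hackathon.'",
        "grammar": grammar,  # assumed input name
    },
)
print("".join(output))
```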
962d ago
Fine-tuned models now boot in less than one second
Fine-tuned models now boot in less than one second Posted September 6, 2023. You can fine-tune language models like Llama 2 or image models like SDXL with your own data on Replicate. If you don’t make any requests to your fine-tuned model for a while, it can take some time to start again. This is called a cold boot, and can be as slow as a few minutes for large models. We’ve made some dramatic improvements to cold boots for fine-tuned models. They now boot in less than one second. It works on these models: - meta/llama-2-7b-chat - meta/llama-2-13b-chat - meta/llama-2-70b-chat - meta/llama-2-7b - meta/llama-2-13b - meta/llama-2-70b - stability-ai/sdxl For now, it’s available only for new fine-tuned models created starting today. We’re also working on more cold boot improvements for all models. Stay tuned. To get started, check…
962dModel#fine-tuning
983d ago
We're cutting our prices in half
We're cutting our prices in half Here’s what’s changing: - We’re cutting the per-second price of public models in half. This is going to be applied to all your usage this month, on all public models, from SDXL to Llama 2, and requires no action on your part. Wahey! 🎉 - Soon, we’ll be cutting the per-second price of private models in half, but we’ll also start charging for setup and idle time. This change will just be for new users. For existing users, this change is opt-in, so you’re not going to pay more. Here are the prices: What’s happening to private models? When you run a model, it is running on a GPU instance. It takes a bit of time to start up the model, then your prediction runs, then we keep the instance idle for a bit…
983dHardware#llama
985d ago
A guide to prompting Llama 2
A guide to prompting Llama 2 Prompting large language models like Llama 2 is an art and a science. In this post we’re going to cover everything I’ve learned while exploring Llama 2, including how to format chat prompts, when to use which Llama variant, when to use ChatGPT over Llama, how system prompts work, and some tips and tricks. There’s still much to be learned, but you should leave this post with a better understanding of how to be a Llama whisperer. 💡 Want to try an interactive version of this post? Check out our colab version. Contents - System Prompts - How to Format Chat Prompts - 7B v 13B v 70B - Prompting Tips - What is Llama 2 better at than ChatGPT? - In Conclusion - What’s next? System Prompts 💡 A system_prompt is text that…
985dTutorial#llama
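As a quick reference for the chat format the guide covers, here is the standard Llama 2 template as a small Python helper (exact whitespace and BOS-token handling can vary by host):

```python
# The Llama 2 chat template, as a tiny helper.
# Exact whitespace and BOS-token handling can differ between hosts and clients.
def format_llama2_prompt(system_prompt: str, user_message: str) -> str:
    return (
        "[INST] <<SYS>>\n"
        f"{system_prompt}\n"
        "<</SYS>>\n\n"
        f"{user_message} [/INST]"
    )


print(format_llama2_prompt(
    "You are a helpful assistant who answers like a pirate.",
    "How do I keep my llama calm on long journeys?",
))
```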
985d ago
Streaming output for language models
Streaming output for language models You know when you’re using ChatGPT or Vercel’s AI playground and it returns an animated response, rendered word by word? That’s not just a dramatic visual effect to make it look like there’s a robot typing on the other side of the conversation. That’s actually the language model generating tokens one at a time, and streaming them back to you while it’s running. Replicate already provides ways for you to receive incremental updates as your predictions are running, through polling and webhooks. But those aren’t always the most efficient methods to get updates from a running model. When you’re building something like a chat app, what you really need is a live-updating event stream. Replicate’s API now supports server-sent event streams for language models. This lets you update your app live, as the model is…
985dTutorial
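A minimal Python sketch of consuming such an event stream; recent versions of the Replicate Python client expose replicate.stream for this, and an API token is assumed:

```python
# Minimal streaming sketch. Recent versions of the Replicate Python client expose
# replicate.stream, which yields server-sent events as the model generates tokens.
# Assumes REPLICATE_API_TOKEN is set.
import replicate

for event in replicate.stream(
    "meta/llama-2-70b-chat",
    input={"prompt": "Tell me a two-sentence story about a robot barista."},
):
    print(str(event), end="", flush=True)
print()
```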
991d ago
Fine-tune SDXL with your own images
Fine-tune SDXL with your own images Stability AI recently open-sourced SDXL, the newest and most powerful version of Stable Diffusion yet. Replicate was ready from day one with a hosted version of SDXL that you can run from the web or using our cloud API. Today, we’re following up to announce fine-tuning support for SDXL 1.0. Fine-tuning allows you to train SDXL on a particular object or style, and create a new model that generates images of those objects or styles. For example, we fine-tuned SDXL on images from the Barbie movie and our colleague Zeke. There are multiple ways to fine-tune SDXL, such as Dreambooth, LoRA diffusion (Originally for LLMs), and Textual Inversion. We’ve got all of these covered for SDXL 1.0. In this post, we’ll show you how to fine-tune SDXL on your own images with one line…
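A hedged sketch of starting a fine-tune from code; the version id is a placeholder to look up on the model page, and the input names are illustrative:

```python
# Hedged sketch of starting an SDXL fine-tune from code. The version id below is
# a placeholder (copy the real one from the model page), the zip URL is illustrative,
# and the exact input names may differ from the trainer's schema.
import replicate

training = replicate.trainings.create(
    version="stability-ai/sdxl:<version-id>",  # placeholder, not a real id
    input={
        "input_images": "https://example.com/training-photos.zip",  # zip of your images
        "token_string": "TOK",  # assumed trigger-word input
    },
    destination="your-username/sdxl-finetuned",  # model to publish weights to
)
print(training.status)
```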
1003d ago
Run Llama 2 with an API
Run Llama 2 with an API Llama 2 is a language model from Meta AI. It’s the first open source language model of the same caliber as OpenAI’s models. With Replicate, you can run Llama 2 in the cloud with one line of code. Contents - Contents - Running Llama 2 with JavaScript - Running Llama 2 with Python - Running Llama 2 with cURL - Choosing which model to use - Example chat app - Fine-tune Llama 2 - Run Llama 2 locally - Keep up to speed Running Llama 2 with JavaScript You can run Llama 2 with our official JavaScript client: import Replicate from "replicate"; const replicate = new Replicate({ auth: process.env.REPLICATE_API_TOKEN, }); const input = { prompt: "Write a poem about open source machine learning in the style of Mary Oliver.", }; for await (const event…
1003dTutorial#llama#open-source
1004d ago
Run SDXL with an API
Run SDXL with an API SDXL 1.0 is a new text-to-image model by Stability AI. Stable Diffusion XL lets you create better, bigger pictures, with faces that look more real. You can add clear, readable words to your images and make great-looking art with just short prompts. Like Stable Diffusion 1.5 and 2.1, SDXL is open source. You can modify it, build things with it and use it commercially. Replicate lets you run generative AI models, like SDXL, from your own code, without having to set up any infrastructure. What you can do You can use the SDXL model on Replicate to: - make images from your prompts - make an image from another image (img2img) - inpaint using a mask - use a refiner to add fine-detail to your images Use a client library for Replicate We maintain official…
1004dTutorial
1008d ago
A comprehensive guide to running Llama 2 locally
A comprehensive guide to running Llama 2 locally We’ve been talking a lot about how to run and fine-tune Llama 2 on Replicate. But you can also run Llama locally on your M1/M2 Mac, on Windows, on Linux, or even your phone. The cool thing about running Llama 2 locally is that you don’t even need an internet connection. Here’s an example using a locally-running Llama 2 to whip up a website about why llamas are cool: It’s only been a couple days since Llama 2 was released, but there are already a handful of techniques for running it locally. In this blog post we’ll cover three open-source tools you can use to run Llama 2 on your own devices: - Llama.cpp (Mac/Windows/Linux) - Ollama (Mac) - MLC LLM (iOS/Android) Llama.cpp (Mac/Windows/Linux) Llama.cpp is a port of Llama in C/C++,…
1008dTutorial#llama
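For the Llama.cpp route, a minimal sketch using llama-cpp-python (the Python bindings for Llama.cpp); the GGUF filename is a placeholder for whichever quantized build you download:

```python
# Minimal local-inference sketch using llama-cpp-python, the Python bindings for
# the Llama.cpp project mentioned above. The GGUF filename is a placeholder for
# whichever quantized Llama 2 build you have downloaded.
from llama_cpp import Llama

llm = Llama(model_path="./llama-2-7b-chat.Q4_K_M.gguf", n_ctx=2048)

result = llm(
    "Q: Why are llamas cool? A:",
    max_tokens=128,
    stop=["Q:"],
)
print(result["choices"][0]["text"])
```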
1010d ago
Fine-tune Llama 2 on Replicate
Fine-tune Llama 2 on Replicate Llama 2 is the first open-source language model of the same caliber as OpenAI’s models, and because it’s open source you can hack it to do new things that aren’t possible with GPT-4. Like become a better poet. Talk like Homer Simpson. Write Midjourney prompts. Or replace your best friends. One of the main reasons to fine-tune models is so you can use a small model to do a task that would normally require a large model. This means you can do the same task, but cheaper and faster. For example, the 7 billion parameter Llama 2 model is not good at summarizing text, but we can teach it how. In this guide, we’ll show you how to create a text summarizer. We’ll be using Llama 2 7B, an open-source large language model from Meta and…
1011d ago
What happened with Llama 2 in the last 24 hours? 🦙
What happened with Llama 2 in the last 24 hours? 🦙 Meta AI just released version 2 of their open-source Llama language model. This new version was trained on more data (2 trillion tokens), supports longer context length (4096 tokens), and has a more permissive license than v1 which allows for commercial use. Here’s a list of developments from the last day since Llama 2 was released: - Llama2 chatbot – An open-source demo application built by the infra team at A16Z and powered by Streamlit, Replicate, and Fly.io. - Llama 2 7B - a 7-billion parameter version, fine tuned for chat completions, running on Replicate. Smaller and faster than the 13B and 70B versions. - Llama 2 13B - a 13-billion parameter version, fine tuned for chat completions, running on Replicate. - Llama 2 70B - a 70-billion parameter…
1011dOpen Source#llama#open-source
1065d ago
Make any large language model a better poet
Make any large language model a better poet In this post, we discuss a version of Vicuna-13B that we just released called Poet Vicuna-13B. This model is part of an early-stage project focused on enhancing open source large language models. Poet Vicuna-13B is an implementation of Vicuna-13B that is modified to generate poems and lyrics with specific syllabic patterns. You can use it to rewrite Twinkle, Twinkle Little Star or generate modernist poems with lots of beautiful white space sprinkled with lines of lengths that you choose. For example, we asked Poet Vicuna-13B to write an eight line poem with these syllable counts [3, 3, 0, 2, 0, 5, 0, 3, 2, 1, 0, 4] in response to this prompt: Write a poem that explores the concept of time and its impact on our lives—nostalgia, the passage of time, the…
1065dOpen Source#open-source