$ timeahead_
All sources: Ahead of AI (Sebastian Raschka), Anthropic News, Apple Machine Learning Research, Ars Technica AI, AWS Machine Learning Blog, Cerebras Blog, Cohere Blog, CrewAI Blog, DeepSeek Blog, Distill.pub, fast.ai Blog, Fireworks AI Blog, Google AI Blog, Google Cloud AI Blog, Google DeepMind Blog, Groq Blog, Haystack (deepset) Blog, Hugging Face Blog, Import AI (Jack Clark), LangChain Blog, LangFuse Blog, Lil'Log (Lilian Weng), LlamaIndex Blog, Meta AI Blog, Microsoft AutoGen Blog, Microsoft Research Blog, Mistral AI News, MIT Technology Review, Modal Blog, n8n Blog, Nathan Lambert (RLHF), NVIDIA Developer Blog, Ollama Blog, OpenAI Blog, Perplexity AI Blog, PyTorch Blog, Replicate Blog, Simon Willison Blog, TensorFlow Blog, The Batch (DeepLearning.AI), The Gradient, The Verge AI, Together AI Blog, VentureBeat AI, vLLM Blog, Weights & Biases Blog, Wired AI, xAI (Grok) Blog
Tags: all, api, agents, frameworks, hardware, infra, model, open source, release, research, tutorial
★ TOP STORY · [OLL] · Hardware · 26d ago


Ollama is now powered by MLX on Apple silicon in preview March 30, 2026 Today, we're previewing the fastest way to run Ollama on Apple silicon, powered by MLX, Apple's machine learning framework. This unlocks new performance to accelerate your most demanding work on macOS:
- Personal assistants like OpenClaw
- Coding agents like Claude Code, OpenCode, or Codex
(Demo captions: "Accelerate coding agents like Pi or Claude Code"; "OpenClaw now responds much faster"; "Fastest performance on Apple silicon, powered by MLX".)
Ollama on Apple silicon is now built on top of Apple's machine learning framework, MLX, to take advantage of its unified memory architecture. This results in a large speedup on all Apple silicon devices. On Apple's M5, M5 Pro, and M5 Max chips, Ollama leverages the new GPU Neural Accelerators to accelerate both time to first token (TTFT)…

Source: Ollama Blog
61d ago
The simplest and fastest way to set up OpenClaw February 23, 2026 Set up OpenClaw in under two minutes with a single Ollama command.
The simplest and fastest way to set up OpenClaw February 23, 2026 OpenClaw is a personal AI assistant that can clear your inbox, send emails, manage your calendar, and complete other tasks via messaging apps like WhatsApp, Telegram, iMessage, or any chat app you already use. It all runs on your own hardware, and with Ollama 0.17, it's now a single command to get started.
What you'll need:
- Ollama 0.17 or later
- Node.js (npm is used to install OpenClaw)
- Mac or Linux system (Windows users can install OpenClaw via WSL, the Windows Subsystem for Linux)
Step 1: Run the command. Open a terminal, and type in:
ollama launch openclaw --model kimi-k2.5:cloud
Note: other models can also be configured; see ollama launch openclaw for recommended models. Ollama handles everything from here.
Step 2: Install OpenClaw. If OpenClaw isn't already…
61d · Model · #llama
68d ago
Subagents and web search in Claude Code February 16, 2026 Ollama now supports subagents and web search in Claude Code.
Subagents and web search in Claude Code February 16, 2026 Ollama now supports subagents and web search in Claude Code. No MCP servers or API keys required.
Get started:
ollama launch claude --model minimax-m2.5:cloud
It works with any model on Ollama's cloud.
Subagents can run tasks in parallel, such as file search, code exploration, and research, each in their own context. Longer coding sessions stay productive, and side tasks don't fill the context with noise. Some models will naturally trigger subagents when needed (minimax-m2.5, glm-5, kimi-k2.5), but you can also force subagents by telling the model to "use/spawn/create subagents". Example prompts:
> spawn subagents to explore the auth flow, payment integration, and notification system
> audit security issues, find performance bottlenecks, and check accessibility in parallel with subagents
> create subagents to map the database queries, trace the API routes,…
83d ago
OpenClaw February 1, 2026 OpenClaw is a personal AI assistant that connects your messaging apps to local AI coding agents, all running on your own device.
OpenClaw February 1, 2026 OpenClaw is a personal AI assistant that bridges your favorite messaging platforms to AI coding agents through a centralized gateway. It runs locally on your own devices, keeping your conversations and code private. OpenClaw integrates with WhatsApp, Telegram, Slack, Discord, iMessage, and other messaging services, allowing you to interact with AI coding agents from anywhere.
Get started by installing OpenClaw.
macOS/Linux: curl -fsSL https://openclaw.ai/install.sh | bash
Windows: iwr -useb https://openclaw.ai/install.ps1 | iex
Running with Ollama: once installed, you can launch OpenClaw directly with Ollama to connect local or cloud models:
ollama launch openclaw
If you want to configure OpenClaw without immediately starting the service:
ollama launch openclaw --config
The gateway will auto-reload if it's already running.
Recommended models: OpenClaw requires a larger context length to complete tasks. It is recommended to use a context length of at…
92d ago
ollama launch January 23, 2026 ollama launch is a new command which sets up and runs coding tools like Claude Code, OpenCode, and Codex with local or cloud models. No environment variables or config files needed.
ollama launch January 23, 2026 ollama launch is a new command which sets up and runs your favorite coding tools like Claude Code, OpenCode, and Codex with local or cloud models. No environment variables or config files needed.
Get started: download Ollama v0.15+, then open a terminal and run:
# ~23 GB VRAM required with 64000 tokens context length
ollama pull glm-4.7-flash
# or use a cloud model (with full context length)
ollama pull glm-4.7:cloud
One-command setup:
Claude Code: ollama launch claude
OpenCode: ollama launch opencode
This will guide you through selecting models and launching your chosen integration.
Supported integrations: Claude Code, OpenCode, Codex, Droid.
Recommended models for coding: note that coding tools work best with a full context length. Update the context length in Ollama's settings to at…
95d ago
Image generation (experimental) January 20, 2026 Generate images locally with Ollama on macOS. Windows and Linux support coming soon.
Image generation (experimental) January 20, 2026 Ollama now supports image generation on macOS, with Windows and Linux coming soon.
ollama run x/z-image-turbo "your prompt"
Images save to your current directory. Terminals that support image rendering (Ghostty, iTerm2, etc.) can preview images directly inline.
Models: Z-Image Turbo (ollama run x/z-image-turbo) is a 6 billion parameter text-to-image model from Alibaba's Tongyi Lab. It generates photorealistic images and handles bilingual text rendering in both English and Chinese.
- Photorealistic output: strong at generating realistic photographs, portraits, and scenes
- Bilingual text rendering: accurately renders both English and Chinese text in images
- Apache 2.0: open weights available for commercial use
Example prompt (photorealistic portrait): Young woman in a cozy coffee shop, natural window lighting, wearing a cream knit sweater, holding a ceramic mug, soft bokeh background with warm ambient lights, candid moment,…
99d ago
Claude Code with Anthropic API compatibility January 16, 2026 Ollama is now compatible with the Anthropic Messages API, making it possible to use tools like Claude Code with open models.
Claude Code with Anthropic API compatibility January 16, 2026 Ollama v0.14.0 and later are now compatible with the Anthropic Messages API, making it possible to use tools like Claude Code with open-source models. Run Claude Code with local models on your machine, or connect to cloud models through ollama.com. Claude Code is Anthropic's agentic coding tool that lives in your terminal. With Anthropic API support, you can now use Claude Code with any Ollama model.
Get started by installing Claude Code.
macOS, Linux, WSL: curl -fsSL https://claude.ai/install.sh | bash
Windows PowerShell: irm https://claude.ai/install.ps1 | iex
Windows CMD: curl -fsSL https://claude.ai/install.cmd -o install.cmd && install.cmd && del install.cmd
Connect Ollama by configuring environment variables:
export ANTHROPIC_AUTH_TOKEN=ollama
export ANTHROPIC_BASE_URL=http://localhost:11434
Run Claude Code with an Ollama model:
claude --model gpt-oss:20b
Models in Ollama's Cloud also work with…
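The environment-variable setup above effectively points an Anthropic-style client at Ollama. A minimal sketch of the request shape involved, assuming the compatibility layer mirrors Anthropic's /v1/messages path and field names (the payload content is an illustrative example, not taken from the post):

```python
import json

# Ollama's default local address, as set in ANTHROPIC_BASE_URL above.
BASE_URL = "http://localhost:11434"

# Field names follow the public Anthropic Messages API; the model name is
# one mentioned in the post.
payload = {
    "model": "gpt-oss:20b",
    "max_tokens": 1024,
    "messages": [
        {"role": "user", "content": "Explain this stack trace."},
    ],
}

# The compatibility endpoint mirrors Anthropic's path.
endpoint = f"{BASE_URL}/v1/messages"
body = json.dumps(payload)
```

Any HTTP client can POST `body` to `endpoint` with the `Authorization` token shown above; Claude Code does the same under the hood once the env vars are set.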
100d ago
OpenAI Codex with Ollama January 15, 2026 Open models can be used with OpenAI's Codex CLI through Ollama. Codex can read, modify, and execute code in your working directory using models such as gpt-oss:20b, gpt-oss:120b, or other open-weight alternatives.
OpenAI Codex with Ollama January 15, 2026 Open models can be used with OpenAI's Codex CLI through Ollama. Codex can read, modify, and execute code in your working directory using models such as gpt-oss:20b, gpt-oss:120b, or other open-weight alternatives.
Get started by installing the Codex CLI:
npm install -g @openai/codex
Start Codex with the --oss flag:
codex --oss
By default, Codex will use the local gpt-oss:20b model. Note: Codex requires a large context window; we recommend at least 32K tokens. See the documentation for how to adjust context length in Ollama.
You can switch to a different model using the -m flag:
codex --oss -m gpt-oss:120b
All models on Ollama Cloud also work with Codex:
codex --oss -m gpt-oss:120b-cloud
For more detailed setup instructions and configuration options, see the Codex integration guide.
100d · Open Source · #llama #coding
178d ago
OpenAI gpt-oss-safeguard October 29, 2025 Ollama is partnering with OpenAI and ROOST (Robust Open Online Safety Tools) to bring the latest gpt-oss-safeguard reasoning models to users for safety classification tasks. gpt-oss-safeguard models are available in two sizes: 20B and 120B, and are permissively licensed under the Apache 2.0 license.
OpenAI gpt-oss-safeguard October 29, 2025 Ollama is partnering with OpenAI and ROOST (Robust Open Online Safety Tools) to bring the latest gpt-oss-safeguard reasoning models to users for safety classification tasks. gpt-oss-safeguard models are available in two sizes: 20B and 120B, and are permissively licensed under the Apache 2.0 license.
Get started: download Ollama, open a terminal, and run the model:
20B: ollama run gpt-oss-safeguard:20b
120B: ollama run gpt-oss-safeguard:120b
Highlights:
- Trained to reason about safety: trained and tuned for safety reasoning to accommodate use cases like LLM input-output filtering, online content labeling, and offline labeling for Trust and Safety use cases.
- Bring your own policy: interprets your written policy, so it generalizes across products and use cases with minimal engineering.
- Reasoned decisions, not just scores: gain complete access to the model's reasoning process, facilitating easier debugging…
178d · Open Source · #llama #safety
179d ago
MiniMax M2 October 28, 2025 MiniMax M2 is now available on Ollama's cloud. It's a model built for coding and agentic workflows.
MiniMax M2 October 28, 2025 MiniMax M2 is now available on Ollama's cloud. It's a model built for coding and agentic workflows.
Get started:
ollama run minimax-m2:cloud
Highlights:
- Superior intelligence: according to benchmarks from Artificial Analysis, MiniMax-M2 demonstrates highly competitive general intelligence across mathematics, science, instruction following, coding, and agentic tool use. Its composite score ranks #1 among open-source models globally.
- Advanced coding: engineered for end-to-end developer workflows, MiniMax-M2 excels at multi-file edits, code-run-fix loops, and test-validated repairs. Strong performance on Terminal-Bench and (Multi-)SWE-Bench–style tasks demonstrates practical effectiveness in terminals, IDEs, and CI across languages.
- Agent performance: MiniMax-M2 plans and executes complex, long-horizon toolchains across shell, browser, retrieval, and code runners. In BrowseComp-style evaluations, it consistently locates hard-to-surface sources, maintains traceable evidence, and gracefully recovers from flaky steps.
- Efficient design: with 10 billion activated parameters (230 billion in total), MiniMax-M2…
184d ago
NVIDIA DGX Spark performance October 23, 2025 We ran performance tests on release day firmware and an updated Ollama version to see how Ollama performs.
NVIDIA DGX Spark performance October 23, 2025 We ran performance tests on release day firmware and an updated Ollama version to see how Ollama performs. The tests were run using the latest NVIDIA DGX Spark firmware (580.95.05) and Ollama v0.12.6. Each test is performed:
- 10 times
- with temperature set to 0
- constrained to 500 tokens of output
- with the prompt: "write an in-depth summary of this story: $(head -n200 pg98.txt)" (please see the test script for the book, "A Tale of Two Cities")
- with caching disabled, so repeated tests will not be faster
The test script and its readme are made available and can be customized for your own testing.
*OpenAI's gpt-oss models are tested using models officially provided by OpenAI, distributed via Ollama. Some GGUFs distributed online labeled as MXFP4 are further quantized to q8_0 in the…
184d · Model · #llama #gpu
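The test conditions above can be sketched as an /api/generate request body. This is not the published benchmark script, only an illustration using Ollama's standard temperature and num_predict options; the model name and prompt are placeholders:

```python
# Each benchmark condition from the post, expressed as a request body for
# Ollama's /api/generate endpoint. Placeholder model and prompt; the real
# test script also disables caching, which is omitted here.
RUNS = 10  # each test is performed 10 times

request = {
    "model": "gpt-oss:20b",  # example model, for illustration only
    "prompt": "write an in-depth summary of this story: ...",
    "stream": False,
    "options": {
        "temperature": 0,    # deterministic sampling
        "num_predict": 500,  # constrain output to 500 tokens
    },
}
```

Timing `RUNS` repetitions of this request and averaging tokens/second reproduces the shape of the methodology, if not the exact harness.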
191d ago
New coding models & integrations October 16, 2025 GLM-4.6 and Qwen3-coder-480B are available on Ollama’s cloud service with easy integrations to the tools you are familiar with. Qwen3-Coder-30B has been updated for faster, more reliable tool calling in Ollama’s new engine.
New coding models & integrations October 16, 2025 GLM-4.6 and Qwen3-Coder-480B are available on Ollama's cloud service with easy integrations to the tools you are familiar with. Qwen3-Coder-30B has been updated for faster, more reliable tool calling in Ollama's new engine.
Get started:
GLM-4.6: ollama run glm-4.6:cloud
Qwen3-Coder-480B: ollama run qwen3-coder:480b-cloud (for users with more than 300GB of VRAM, qwen3-coder:480b is also available locally)
Qwen3-Coder-30B: ollama run qwen3-coder:30b
Example prompt: Create a single-page app in a single HTML file with the following requirements. Name: Ollama's Adventure. Goal: jump over obstacles to survive as long as possible. Features: increasing speed, high score tracking, retry button, and funny sounds for actions and events. The UI should be colorful, with parallax scrolling backgrounds. The characters should look cartoonish, related to alpacas, and be fun to watch. The game should be enjoyable for everyone.…
193d ago
Qwen3-VL October 14, 2025 Ollama now supports Alibaba's Qwen3-VL.
Qwen3-VL October 14, 2025 Qwen3-VL, the most powerful vision language model in the Qwen series, is now available on Ollama's cloud. The models will be made available locally soon.
Model capabilities:
- Visual agent: operates PC/mobile GUIs—recognizes elements, understands functions, invokes tools, completes tasks
- Visual coding boost: generates Draw.io/HTML/CSS/JS from images and videos
- Advanced spatial perception: judges object positions, viewpoints, and occlusions; provides stronger 2D grounding and enables 3D grounding for spatial reasoning and embodied AI
- Long context & video understanding: native 256K context, expandable to 1M; handles books and hours-long video with full recall and second-level indexing
- Enhanced multimodal reasoning: excels in STEM/math—causal analysis and logical, evidence-based answers
- Upgraded visual recognition: broader, higher-quality pre-training lets it recognize many more types of objects—celebrities, anime, products, landmarks, flora/fauna, etc.
- Expanded OCR: supports 32 languages (up from…
193d · Model · #llama #qwen
194d ago
NVIDIA DGX Spark October 13, 2025 The latest NVIDIA DGX Spark is here! Ollama has partnered with NVIDIA to ensure it runs fast and efficiently out-of-the-box.
NVIDIA DGX Spark October 13, 2025 The latest NVIDIA DGX Spark is here! Ollama has partnered with NVIDIA to ensure it runs fast and efficiently out of the box. Powered by the NVIDIA GB10 Grace Blackwell Superchip, the NVIDIA DGX Spark delivers 1 petaFLOP of performance for prototyping and running local language models on Ollama. With 128GB of memory, you can run the latest models from Alibaba (Qwen), DeepSeek, Meta (Llama), Mistral, Google (Gemma), OpenAI (gpt-oss), and many more from Ollama's library. You can also upload and bring your own custom or fine-tuned models. We can't wait to see what you'll build with the latest NVIDIA DGX Spark! In the meantime, we're working with NVIDIA to optimize Ollama's performance and testing it across the use cases we see most often—chat, document processing (retrieval, OCR, modification), code tasks, and multimodal workflows. Learn more about the…
194d · Model · #llama #gpu
213d ago
Web search September 24, 2025 A new web search API is now available in Ollama. Ollama provides a generous free tier of web searches for individuals to use, and higher rate limits are available via Ollama’s cloud.
Web search September 24, 2025 A new web search API is now available in Ollama. Ollama provides a generous free tier of web searches for individuals to use, and higher rate limits are available via Ollama's cloud. This web search capability can augment models with the latest information from the web to reduce hallucinations and improve accuracy. Web search is provided as a REST API, with deeper tool integrations in Ollama's Python and JavaScript libraries. This also enables models such as OpenAI's gpt-oss models to conduct long-running research tasks.
Get started: create an API key from your Ollama account, then:
export OLLAMA_API_KEY="your_api_key"
cURL:
curl https://ollama.com/api/web_search \
  --header "Authorization: Bearer $OLLAMA_API_KEY" \
  -d '{ "query": "what is ollama?" }'
Example output:
{ "results": [ { "title": "Ollama", "url": "https://ollama.com/", "content": "Cloud models are now available..." }, { "title": "What is Ollama? Introduction…
213d · Model · #llama
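Once results come back in the shape shown in the example output, they can be folded into a model's context. A small sketch, where the helper name and formatting are our own and the sample result is taken from the truncated example above:

```python
# One result in the shape of the API's example output (truncated in the post).
sample_results = [
    {"title": "Ollama", "url": "https://ollama.com/",
     "content": "Cloud models are now available..."},
]

def format_search_context(results):
    """Render search results as a numbered plain-text block a model can cite."""
    lines = []
    for i, r in enumerate(results, start=1):
        lines.append(f"[{i}] {r['title']} ({r['url']})\n{r['content']}")
    return "\n\n".join(lines)

context = format_search_context(sample_results)
```

The resulting string can be prepended to a chat prompt so the model answers from fresh web content rather than stale training data.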
214d ago
New model scheduling September 23, 2025 Ollama now includes a significantly improved model scheduling system, reducing crashes due to out of memory issues, maximizing GPU utilization and performance, especially on multi-GPU systems.
New model scheduling September 23, 2025 Ollama now includes a significantly improved model scheduling system. Ahead of running a model, Ollama's new engine will now measure the exact amount of memory required, compared to an estimation in previous versions of Ollama. This has several benefits:
- Significantly reduced crashes due to out-of-memory issues: because memory management is exact, over-allocations no longer occur, meaning fewer out-of-memory issues.
- Maximized GPU utilization: Ollama's new memory management allocates more memory to the GPU, increasing token generation and processing speeds.
- Multi-GPU performance: Ollama will now schedule models more efficiently over multiple GPUs, significantly improving multi-GPU and mismatched-GPU performance.
- Accurate reporting: measurements in tools like nvidia-smi will now match ollama ps, making it easy to track memory utilization on your system.
All models implemented in Ollama's new engine now…
214d · Hardware · #llama
218d ago
Cloud models September 19, 2025 Cloud models are now in preview, letting you run larger models with fast, datacenter-grade hardware. You can keep using your local tools while running larger models that wouldn’t fit on a personal computer.
Cloud models September 19, 2025 Cloud models are now in preview, letting you run larger models with fast, datacenter-grade hardware. You can keep using your local tools while running larger models that wouldn't fit on a personal computer. Ollama's cloud does not retain your data, to ensure privacy and security. The same Ollama experience is now seamless across both local and cloud, integrating with the existing tools you already use. Ollama's cloud models also work via Ollama's OpenAI-compatible API.
Get started: download Ollama v0.12, then open a terminal and run a cloud model:
ollama run qwen3-coder:480b-cloud
Available models: qwen3-coder:480b-cloud, gpt-oss:120b-cloud, gpt-oss:20b-cloud, deepseek-v3.1:671b-cloud
Usage: cloud models behave like regular models. For example, you can ls, run, pull, and cp them as needed:
% ollama ls
NAME                 ID            SIZE  MODIFIED
gpt-oss:120b-cloud   569662207105  -     5 seconds ago
gpt-oss:20b-cloud…
218d · Hardware · #local
263d ago
OpenAI gpt-oss August 5, 2025 Ollama partners with OpenAI to bring gpt-oss to Ollama and its community.
OpenAI gpt-oss August 5, 2025 Welcome OpenAI's gpt-oss! Ollama partners with OpenAI to bring its latest state-of-the-art open-weight models to Ollama. The two models, 20B and 120B, bring a whole new local chat experience, and are designed for powerful reasoning, agentic tasks, and versatile developer use cases.
Feature highlights:
- Agentic capabilities: use the models' native capabilities for function calling, web browsing (Ollama provides a built-in web search that can be optionally enabled to augment the model with the latest information), Python tool calls, and structured outputs.
- Full chain-of-thought: gain complete access to the model's reasoning process, facilitating easier debugging and increased trust in outputs.
- Configurable reasoning effort: easily adjust the reasoning effort (low, medium, high) based on your specific use case and latency needs.
- Fine-tunable: fully customize models to your specific use case through…
263d · Open Source · #llama
269d ago
Ollama's new app July 30, 2025 Ollama's new app is now available for macOS and Windows.
Ollama's new app July 30, 2025 Ollama's new app is now available for macOS and Windows.
An easier way to chat with models: Ollama's macOS and Windows apps now include a way to download and chat with models.
Chat with files: Ollama's new app supports file drag and drop, making it easier to reason over text files or PDFs. For processing large documents, Ollama's context length can be increased in the settings. Note: this will require more memory.
Multimodal support: building on Ollama's new multimodal engine, images can be sent to models that support them, such as Google DeepMind's Gemma 3 model. Code files can also be processed by models for understanding.
Get started: download Ollama's new app today on macOS and Windows. For pure CLI versions of Ollama, standalone downloads are available on Ollama's GitHub releases page.
269d · Model · #llama
326d ago
Secure Minions: private collaboration between Ollama and frontier models June 3, 2025 Secure Minions is a secure protocol built by Stanford's Hazy Research lab to allow encrypted local-remote communication.
Secure Minions: private collaboration between Ollama and frontier models June 3, 2025 Three months ago, Stanford’s Hazy Research lab introduced Minions (ICML 2025), an open-source research project that connects local Ollama models (such as Google’s gemma3:4b) to frontier models in the cloud (such as GPT-4o). In Minions, the raw context stays local and can only be accessed by the local LLM. The frontier model orchestrates local LLMs and aggregates their outputs. By sending fewer tokens to the cloud, the protocol reduces cloud costs by 5x-30x while achieving 98% of frontier model accuracy. Cost savings aside, local-first setups (like Ollama) have a big privacy upside: sensitive context never leaves the device. But in the original Minions protocol, some information still goes to the cloud, and that information can be sensitive. Avanika Narayan and Dan Biderman from Stanford’s Hazy Research lab then…
326d · Research · #llama
331d ago
Thinking May 30, 2025 Ollama now has the ability to enable or disable thinking. This gives users the flexibility to choose the model’s thinking behavior for different applications and use cases.
Thinking May 30, 2025 Ollama now has the ability to enable or disable thinking. This gives users the flexibility to choose the model's thinking behavior for different applications and use cases. When thinking is enabled, the output separates the model's thinking from the model's output. When thinking is disabled, the model will not think and will directly output the content. Models that support thinking include DeepSeek R1 and Qwen 3; more will be added under thinking models.
Thinking in action: in the CLI, thinking is enabled by default. This can be useful in getting the model to think through different viewpoints to arrive at a more accurate answer. The model shown in the demo is the 8-billion-parameter DeepSeek-R1-0528 Qwen 3 distilled model (the video is not sped up). Thinking can likewise be disabled in the CLI,…
331d · Model · #llama
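With thinking enabled, the reasoning and the answer arrive separated. A sketch of consuming such a response, assuming the reasoning travels in a dedicated "thinking" field on the message; the sample response below is hypothetical:

```python
# Hypothetical chat response with thinking enabled; the "thinking" field name
# is our assumption based on the post's description of separated output.
sample_response = {
    "message": {
        "role": "assistant",
        "thinking": "The user wants a one-word answer about France's capital...",
        "content": "Paris.",
    }
}

def split_thinking(response):
    """Return (reasoning, answer); reasoning is empty when thinking is off."""
    msg = response["message"]
    return msg.get("thinking", ""), msg["content"]

thoughts, answer = split_thinking(sample_response)
```

Applications can show, log, or discard `thoughts` independently of `answer`, which is the point of keeping the two streams separate.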
333d ago
Streaming responses with tool calling May 28, 2025 Ollama now supports streaming responses with tool calling. This enables all chat applications to stream content and also call tools in real time.
Streaming responses with tool calling May 28, 2025 Ollama now supports streaming responses with tool calling. This enables all chat applications to stream content and also call tools in real time. (The post lists the models that support using tools, with examples of simple tool calling (weather) and of web search.)
Get started: download the latest version of Ollama. An example of Ollama using the weather tool to answer the prompt "What is the weather today in Toronto?":
curl http://localhost:11434/api/chat -d '{
  "model": "qwen3",
  "messages": [
    { "role": "user", "content": "What is the weather today in Toronto?" }
  ],
  "stream": true,
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_current_weather",
        "description": "Get the current weather for a location",
        "parameters": {
          "type": "object",
          "properties": {
            "location": {
              "type": "string",
              "description": "The location to get the weather for, e.g. San Francisco, CA"
            },
            "format": { "type":…
333d · Agents · #llama
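A sketch of consuming such a stream, where content tokens and a tool call arrive as separate chunks. The chunk lines below are hypothetical stand-ins for the newline-delimited JSON a streaming /api/chat response produces:

```python
import json

# Hypothetical streamed chunks: two content tokens, then a tool call.
stream_lines = [
    '{"message": {"role": "assistant", "content": "Checking"}}',
    '{"message": {"role": "assistant", "content": " the weather..."}}',
    '{"message": {"role": "assistant", "content": "", "tool_calls": '
    '[{"function": {"name": "get_current_weather", '
    '"arguments": {"location": "Toronto, ON", "format": "celsius"}}}]}}',
]

# Accumulate streamed text while collecting any tool calls that appear.
content, tool_calls = "", []
for line in stream_lines:
    chunk = json.loads(line)["message"]
    content += chunk.get("content", "")
    tool_calls.extend(chunk.get("tool_calls", []))
```

A real client would read these lines from the HTTP response, render `content` as it grows, and dispatch each entry in `tool_calls` to the matching function.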
345d ago
Ollama's new engine for multimodal models May 15, 2025 Ollama now supports new multimodal models with its new engine.
Ollama's new engine for multimodal models May 15, 2025 Ollama now supports multimodal models via Ollama's new engine, starting with new vision multimodal models for general multimodal understanding and reasoning.
Llama 4 Scout: ollama run llama4:scout (note: this is a 109 billion parameter, mixture-of-experts model.)
Example of asking location-based questions about a video frame, with follow-up questions:
ollama@ollamas-computer ~ % ollama run llama4:scout
>>> what do you see in this image? /Users/ollama/Downloads/multimodal-example1.png
Added image '/Users/ollama/Downloads/multimodal-example1.png'
The image depicts a scenic waterfront area with a prominent clock tower at its center. The clock tower, likely the Ferry Building Clock Tower in San Francisco, features a tall structure with a clock face on each side and an American flag flying atop. In the foreground, there are several buildings, including what appears to be a marketplace or shopping center, with a…
424d ago
Minions: where local and cloud LLMs meet February 25, 2025 Avanika Narayan, Dan Biderman, and Sabri Eyuboglu from Christopher Ré's Stanford Hazy Research lab, along with Avner May, Scott Linderman, and James Zou, have developed a way to shift a substantial portion of LLM workloads to consumer devices by having small on-device models (such as Llama 3.2 with Ollama) collaborate with larger models in the cloud (such as GPT-4o).
Minions: where local and cloud LLMs meet February 25, 2025 Avanika Narayan, Dan Biderman, and Sabri Eyuboglu from Christopher Ré's Stanford Hazy Research lab, along with Avner May, Scott Linderman, and James Zou, have developed a way to shift a substantial portion of LLM workloads to consumer devices by having small on-device models (such as Llama 3.2 with Ollama) collaborate with larger models in the cloud (such as GPT-4o). This new paper with accompanying open-source code aims to reduce cloud costs with minimal or no quality degradation in two protocol configurations:
- Minion: the cloud model freely chats with a single local model with access to data, until the two reach a solution. Achieves a 30.4x reduction in remote costs while maintaining 87% of cloud model performance.
- MinionS: the cloud model decomposes the task into bite-sized subtasks to be performed on chunks…
424d · Research · #llama #gpt #local
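The headline numbers work out as follows; the dollar figure is hypothetical, chosen only to make the arithmetic concrete:

```python
# Worked example of the Minion protocol's reported numbers. The baseline cost
# is a made-up figure; only the 30.4x and 87% values come from the post.
baseline_cost = 3.04        # hypothetical cloud-only cost for a task, in dollars
reduction_factor = 30.4     # reported reduction in remote costs
minion_cost = baseline_cost / reduction_factor   # cost with Minion
retained_quality = 0.87     # fraction of cloud model performance maintained
```

So a task that would have cost $3.04 in cloud tokens runs for about $0.10, at 87% of the cloud model's quality, because the raw context stays on-device and far fewer tokens cross the wire.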
505d ago
Structured outputs December 6, 2024 Ollama now supports structured outputs making it possible to constrain a model's output to a specific format defined by a JSON schema. The Ollama Python and JavaScript libraries have been updated to support structured outputs.
Structured outputs December 6, 2024 Ollama now supports structured outputs, making it possible to constrain a model's output to a specific format defined by a JSON schema. The Ollama Python and JavaScript libraries have been updated to support structured outputs. Use cases for structured outputs include:
- parsing data from documents
- extracting data from images
- structuring all language model responses
- more reliability and consistency than JSON mode
Get started: download the latest version of Ollama, then upgrade to the latest version of the Ollama Python or JavaScript library:
Python: pip install -U ollama
JavaScript: npm i ollama
To pass structured outputs to the model, use the format parameter in the cURL request, or the format parameter in the Python or JavaScript libraries.
cURL:
curl -X POST http://localhost:11434/api/chat -H "Content-Type: application/json" -d '{ "model": "llama3.1", "messages": [{"role":…
505d · Hardware · #llama
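The format parameter accepts a JSON schema directly. A sketch of an /api/chat request body constrained this way, with an illustrative schema of our own:

```python
# An example JSON schema of our own devising; the model's reply must conform.
schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
    },
    "required": ["name", "age"],
}

# Request body for /api/chat with the "format" parameter from the post.
payload = {
    "model": "llama3.1",
    "messages": [{"role": "user", "content": "Tell me about Alice, age 30."}],
    "format": schema,   # constrains output to this schema
    "stream": False,
}
```

The response content can then be parsed with `json.loads` and validated against the same schema, which is what makes this more reliable than plain JSON mode.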
516d ago
Ollama Python library 0.4 with function calling improvements November 25, 2024 With Ollama Python library version 0.4, functions can now be provided as tools. The library now also has full typing support and new examples have been added.
Ollama Python library 0.4 with function calling improvements November 25, 2024 In the latest version of the Ollama Python library, functions can now be provided as tools. The library now also has full typing support, and new examples have been added.
Get started by installing or upgrading the Ollama Python library:
pip install -U ollama
Passing Python functions as tools: start by defining a regular Python function. For better results, annotate parameter and return value types, and optionally add a Google-style docstring:
def add_two_numbers(a: int, b: int) -> int:
    """
    Add two numbers

    Args:
        a: The first integer number
        b: The second integer number

    Returns:
        int: The sum of the two numbers
    """
    return a + b
Next, use the tools field to pass the function as…
516d · Agents · #llama
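Under the hood, the library turns a typed function into a JSON tool schema before sending it to the model. A simplified illustration of that conversion, written by us; the real library also parses docstring argument descriptions and handles many more types:

```python
import inspect

def add_two_numbers(a: int, b: int) -> int:
    """Add two numbers."""
    return a + b

# Minimal mapping from Python annotations to JSON schema types.
_PY_TO_JSON = {int: "integer", float: "number", str: "string", bool: "boolean"}

def to_tool_schema(fn):
    """Derive a tool schema from a typed function (simplified illustration)."""
    sig = inspect.signature(fn)
    props = {
        name: {"type": _PY_TO_JSON.get(p.annotation, "string")}
        for name, p in sig.parameters.items()
    }
    return {
        "type": "function",
        "function": {
            "name": fn.__name__,
            "description": (fn.__doc__ or "").strip(),
            "parameters": {
                "type": "object",
                "properties": props,
                "required": list(props),
            },
        },
    }

tool = to_tool_schema(add_two_numbers)
```

This is why annotating parameter types matters: the annotations are what end up in the schema the model sees.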
535d ago
Llama 3.2 Vision November 6, 2024 Llama 3.2 Vision 11B and 90B models are now available in Ollama.
Llama 3.2 Vision November 6, 2024 Llama 3.2 Vision is now available to run in Ollama, in both 11B and 90B sizes.
Get started: download Ollama 0.4, then run:
ollama run llama3.2-vision
To run the larger 90B model:
ollama run llama3.2-vision:90b
To add an image to the prompt, drag and drop it into the terminal, or add a path to the image to the prompt on Linux. Note: Llama 3.2 Vision 11B requires at least 8GB of VRAM, and the 90B model requires at least 64GB of VRAM.
Examples include handwriting, optical character recognition (OCR), charts & tables, and image Q&A.
Usage: first, pull the model:
ollama pull llama3.2-vision
To use Llama 3.2 Vision with the Ollama Python library:
import ollama
response = ollama.chat(
    model='llama3.2-vision',
    messages=[{
        'role': 'user',
        'content': 'What is in this image?',
        'images': ['image.jpg']
    }]
)
print(response)
JavaScript…
551d ago
IBM Granite 3.0 models October 21, 2024 Ollama partners with IBM to bring Granite 3.0 models to Ollama.
IBM Granite 3.0 models October 21, 2024 A selection of IBM Granite 3.0 models are now available to run using Ollama. All models are offered under a standard Apache 2.0 license.
Performance on par with state-of-the-art open models:
2B: ollama run granite3-dense
8B: ollama run granite3-dense:8b
Granite 2B and Granite 8B are text-only dense LLMs trained on over 12 trillion tokens of data, demonstrating significant improvements over their predecessors in performance and speed in IBM's initial testing. Granite 8B Instruct now rivals Llama 3.1 8B Instruct across both OpenLLM Leaderboard v1 and OpenLLM Leaderboard v2 benchmarks. They are designed to support tool-based use cases and retrieval augmented generation (RAG), streamlining code generation, translation, and bug fixing.
Mixture of Experts (MoE) models for low latency:
1B: ollama run granite3-moe
3B: ollama run granite3-moe:3b
The 1B and 3B models are…
551dModel#llama
577d ago
Llama 3.2 goes small and multimodal September 25, 2024 Ollama partners with Meta to bring Llama 3.2 to Ollama.
Llama 3.2 goes small and multimodal September 25, 2024 Meta’s Llama 3.2 is now available to run using Ollama. To get started, download Ollama and run Llama 3.2: ollama run llama3.2 Small & Multimodal: 1B, 3B, 11B and 90B 1B and 3B Text-only models 1B: ollama run llama3.2:1b 3B: ollama run llama3.2 The 1B and 3B models are text-only and optimized to run locally on a mobile or edge device. They can be used to build highly personalized, on-device agents. For example, a person could ask it to summarize the last ten messages they received on WhatsApp, or to summarize their schedule for the next month. The prompts and responses should feel instantaneous, and with Ollama, processing is done locally, maintaining privacy by not sending data such as messages to third parties or cloud services. (Coming…
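The on-device summarization use case described above can be sketched with the Ollama Python library. This is a minimal sketch: the helper name and sample messages are ours, and the actual chat call (commented out) assumes a local Ollama server with llama3.2:1b pulled.

```python
def build_summary_prompt(messages):
    """Pack recent messages into a single summarization prompt.

    Everything stays on the local machine: the prompt is only ever
    sent to the Ollama server on localhost, not to a cloud service.
    """
    joined = "\n".join(f"- {m}" for m in messages)
    return "Summarize these messages in one sentence:\n" + joined

prompt = build_summary_prompt([
    "Dinner at 7pm on Friday?",
    "Flight OK321 is delayed by 40 minutes.",
])

# Requires a running Ollama server (ollama pull llama3.2:1b):
# import ollama
# reply = ollama.chat(model='llama3.2:1b',
#                     messages=[{'role': 'user', 'content': prompt}])
# print(reply['message']['content'])
```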
584d ago
Reduce hallucinations with Bespoke-Minicheck September 18, 2024 Bespoke-Minicheck is a new grounded factuality checking model developed by Bespoke Labs that is now available in Ollama. It can fact-check responses generated by other models to detect and reduce hallucinations.
Reduce hallucinations with Bespoke-Minicheck September 18, 2024 Bespoke-Minicheck is a new grounded factuality checking model developed by Bespoke Labs that is now available in Ollama. It can fact-check responses generated by other models to detect and reduce hallucinations. How it works Bespoke-Minicheck works by taking chunks of factual information (i.e. the Document) and generated output (i.e. the Claim) and verifying the claim against the document. If the document supports the claim, the model will output Yes. Otherwise, it will output No: RAG use case Bespoke-Minicheck is especially powerful when building Retrieval Augmented Generation (RAG) applications, as it can be used to make sure responses are grounded in the retrieved context provided to the LLM. This can be done as a post-processing step to detect hallucinations: For an example of how to use Bespoke-Minicheck in a RAG application using Ollama, see…
584dModel#llama
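The document/claim check above can be wired in as a post-processing step. The sketch below is an assumption about the interface: the exact prompt layout the model expects and the strict Yes/No parsing are ours, not taken from the post.

```python
def build_check_prompt(document, claim):
    # Assumed prompt shape: the model sees the grounding document and
    # the generated claim, and answers only "Yes" or "No".
    return f"Document: {document}\nClaim: {claim}"

def is_grounded(answer):
    # Treat anything other than a leading "Yes" as unsupported.
    return answer.strip().lower().startswith("yes")

prompt = build_check_prompt(
    "A llama is a domesticated South American camelid.",
    "Llamas are camelids.",
)

# With a running Ollama server (ollama pull bespoke-minicheck):
# import ollama
# answer = ollama.generate(model='bespoke-minicheck', prompt=prompt)['response']
# print(is_grounded(answer))
```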
639d ago
Tool support July 25, 2024 Ollama now supports tool calling with popular models such as Llama 3.1. This enables a model to answer a given prompt using tool(s) it knows about, making it possible for models to perform more complex tasks or interact with the outside world.
Tool support July 25, 2024 Ollama now supports tool calling with popular models such as Llama 3.1. This enables a model to answer a given prompt using tool(s) it knows about, making it possible for models to perform more complex tasks or interact with the outside world. Example tools include: - Functions and APIs - Web browsing - Code interpreter - much more! Tool calling To enable tool calling, provide a list of available tools via the tools field in Ollama’s API. import ollama response = ollama.chat( model='llama3.1', messages=[{'role': 'user', 'content': 'What is the weather in Toronto?'}], # provide a weather checking tool to the model tools=[{ 'type': 'function', 'function': { 'name': 'get_current_weather', 'description': 'Get the current weather for a city', 'parameters': { 'type': 'object', 'properties': { 'city': { 'type': 'string', 'description': 'The name of the city', }, }, 'required':…
639dAgents#llama
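A complete round trip looks roughly like the sketch below: the model names a tool, your code runs it, and the result goes back to the model. The dispatch table and the get_current_weather stub are hypothetical; only the overall shape follows the snippet above.

```python
def get_current_weather(city):
    # Hypothetical stub; a real tool would query a weather API.
    return f"20°C and sunny in {city}"

AVAILABLE_TOOLS = {'get_current_weather': get_current_weather}

def dispatch(tool_call):
    """Run the function the model asked for and return its result."""
    name = tool_call['function']['name']
    args = tool_call['function']['arguments']
    return AVAILABLE_TOOLS[name](**args)

# Shape of one entry in response['message']['tool_calls']:
result = dispatch({'function': {'name': 'get_current_weather',
                                'arguments': {'city': 'Toronto'}}})

# The result is then appended as a 'tool' role message and the chat
# is continued so the model can phrase its final answer:
# messages.append({'role': 'tool', 'content': result})
# followup = ollama.chat(model='llama3.1', messages=messages)
```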
667d ago
Google Gemma 2 June 27, 2024 Gemma 2 is now available on Ollama in 3 sizes - 2B, 9B and 27B.
Google Gemma 2 June 27, 2024 Google Gemma 2 is now available in three sizes, 2B, 9B and 27B, featuring a brand new architecture designed for class leading performance and efficiency. To run Gemma 2: ollama run gemma2 Class leading performance At 27 billion parameters, Gemma 2 delivers performance surpassing models more than twice its size in benchmarks. This breakthrough efficiency sets a new standard in the open model landscape. Three sizes: 2B, 9B and 27B parameters The initial release of Gemma 2 includes three sizes: - 2B Parameters ollama run gemma2:2b - 9B Parameters ollama run gemma2 - 27B Parameters ollama run gemma2:27b Using Gemma 2 with popular tooling LangChain from langchain_community.llms import Ollama llm = Ollama(model="gemma2") llm.invoke("Why is the sky blue?") LlamaIndex from llama_index.llms.ollama import Ollama llm = Ollama(model="gemma2") llm.complete("Why is the sky blue?")
667dModel#llama
694d ago
An entirely open-source AI code assistant inside your editor May 31, 2024 Continue enables you to easily create your own coding assistant directly inside Visual Studio Code and JetBrains with open-source LLMs.
An entirely open-source AI code assistant inside your editor May 31, 2024 This is a guest post from Ty Dunn, Co-founder of Continue, that covers how to set up, explore, and figure out the best way to use Continue and Ollama together. Continue enables you to easily create your own coding assistant directly inside Visual Studio Code and JetBrains with open-source LLMs. All this can run entirely on your own laptop, or you can have Ollama deployed on a server to remotely power code completion and chat experiences based on your needs. To get set up, you’ll want to install Continue and Ollama. Once you have them downloaded, here’s what we recommend exploring: Try out Mistral AI’s Codestral 22B model for autocomplete and chat As of now, Codestral is our current favorite model capable of both autocomplete and chat. This model demonstrates how LLMs…
694dOpen Source#coding#open-source
705d ago
Google announces Firebase Genkit with Ollama support May 20, 2024 At Google I/O 2024, Google announced Ollama support in Firebase Genkit, a new open-source framework for developers to build, deploy and monitor production-ready AI-powered apps.
Google announces Firebase Genkit with Ollama support May 20, 2024 At Google I/O 2024, Google unveiled Firebase Genkit, featuring Ollama support for running Google’s open-source Gemma model on your local machine. Firebase Genkit is a new open-source framework for developers to build, deploy and monitor production-ready AI-powered apps. Getting started Firebase Genkit works with Ollama on macOS, Windows, Linux, and via Docker containers. Install Genkit npm i -g genkit Download Google’s Gemma model ollama pull gemma If you don’t have Ollama installed, it can be downloaded here. Create and initialize a new node.js project mkdir genkit-ollama cd genkit-ollama npm init genkit init Genkit will now be running on localhost:4000
736d ago
Llama 3 is not very censored April 19, 2024 Compared to Llama 2, Llama 3 feels much less censored. Meta has substantially lowered false refusal rates: Llama 3 refuses fewer than a third of the prompts that Llama 2 previously refused.
Llama 3 is not very censored April 19, 2024 Llama 3 feels significantly less censored than its predecessor. The Llama 3 models have substantially lower false refusal rates, with less than 1⁄3 the number of false refusals compared to Llama 2, making it possible to discuss a wider range of interesting topics! Examples Below are examples comparing results for a series of prompts between Llama 3 8B and Llama 2 7B, both optimized using 4-bit integer quantization: Killing time at the airport Llama 3 >>> Can you help me kill time at the airport? I'd be happy to help! Airports can be overwhelming, but there are plenty of ways to make the most of your wait. Here are some ideas: 1. **Explore the airport's amenities**: Check out the shops, restaurants, and bars. You might find something unique or delicious.…
736dModel#llama
737d ago
Llama 3 April 18, 2024 Llama 3 is now available to run on Ollama. This model is the next generation of Meta's state-of-the-art large language model, and is the most capable openly available LLM to date.
Llama 3 April 18, 2024 Llama 3 is now available to run using Ollama. To get started, Download Ollama and run Llama 3: ollama run llama3 The most capable model Llama 3 represents a large improvement over Llama 2 and other openly available models: - Trained on a dataset seven times larger than Llama 2 - Double the context length: 8K, up from 4K in Llama 2 - Encodes language much more efficiently using a larger token vocabulary with 128K tokens - Less than 1⁄3 of the false “refusals” when compared to Llama 2 Two sizes: 8B and 70B parameters The initial release of Llama 3 includes two sizes: - 8B Parameters ollama run llama3:8b - 70B Parameters ollama run llama3:70b Using Llama 3 with popular tooling LangChain from langchain_community.llms import Ollama llm = Ollama(model="llama3") llm.invoke("Why is the sky blue?") LlamaIndex from…
737dModel#llama
747d ago
Embedding models April 8, 2024 Embedding models are available in Ollama, making it easy to generate vector embeddings for use in search and retrieval augmented generation (RAG) applications.
Embedding models April 8, 2024 Ollama supports embedding models, making it possible to build retrieval augmented generation (RAG) applications that combine text prompts with existing documents or other data. What are embedding models? Embedding models are models that are trained specifically to generate vector embeddings: long arrays of numbers that represent semantic meaning for a given sequence of text: The resulting vector embedding arrays can then be stored in a database, which will compare them as a way to search for data that is similar in meaning. Example embedding models Usage To generate vector embeddings, first pull a model: ollama pull mxbai-embed-large Next, use the REST API, Python or JavaScript libraries to generate vector embeddings from the model: REST API curl http://localhost:11434/api/embed -d '{ "model": "mxbai-embed-large", "input": "Llamas are members of the camelid family" }' Python library ollama.embed( model='mxbai-embed-large', input='Llamas…
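The similarity comparison a vector database performs can be sketched directly. Cosine similarity is one common metric; the toy 3-dimensional vectors below are stand-ins of ours, not real embeddings (mxbai-embed-large returns 1024-dimensional vectors).

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors:
    1.0 means identical direction, near 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for embeddings returned by ollama.embed():
doc = [0.2, 0.8, 0.1]
query = [0.25, 0.75, 0.05]
score = cosine_similarity(doc, query)  # close to 1.0: similar meaning
```

A RAG application would embed every document chunk once, store the vectors, then embed each query and return the chunks with the highest scores.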
772d ago
Ollama now supports AMD graphics cards March 14, 2024 Ollama now supports AMD graphics cards in preview on Windows and Linux. All the features of Ollama can now be accelerated by AMD graphics cards on Ollama for Linux and Windows.
Ollama now supports AMD graphics cards March 14, 2024 Ollama now supports AMD graphics cards in preview on Windows and Linux. All the features of Ollama can now be accelerated by AMD graphics cards on Ollama for Linux and Windows. Supported graphics cards Support for more AMD graphics cards is coming soon. Get started To get started with Ollama with support for AMD graphics cards, download Ollama for Linux or Windows.
772dModel#llama
800d ago
Windows preview February 15, 2024 Ollama is now available on Windows in preview, making it possible to pull, run and create large language models in a new native Windows experience. Ollama on Windows includes built-in GPU acceleration, access to the full model library, and serves the Ollama API including OpenAI compatibility.
Windows preview February 15, 2024 Ollama is now available on Windows in preview, making it possible to pull, run and create large language models in a new native Windows experience. Ollama on Windows includes built-in GPU acceleration, access to the full model library, and the Ollama API including OpenAI compatibility. Hardware acceleration Ollama accelerates running models using NVIDIA GPUs as well as modern CPU instruction sets such as AVX and AVX2 if available. No configuration or virtualization required! Full access to the model library The full Ollama model library is available to run on Windows, including vision models. When running vision models such as LLaVA 1.6, images can be dragged and dropped into ollama run to add them to a message. Always-on Ollama API Ollama’s API automatically runs in the background, serving on http://localhost:11434 . Tools and applications can connect…
800dHardware#llama
807d ago
OpenAI compatibility February 8, 2024 Ollama now has initial compatibility with the OpenAI Chat Completions API, making it possible to use existing tooling built for OpenAI with local models via Ollama.
OpenAI compatibility February 8, 2024 Ollama now has built-in compatibility with the OpenAI Chat Completions API, making it possible to use more tooling and applications with Ollama locally. Setup Start by downloading Ollama and pulling a model such as Llama 2 or Mistral: ollama pull llama2 Usage cURL To invoke Ollama’s OpenAI compatible API endpoint, use the same OpenAI format and change the hostname to http://localhost:11434 : curl http://localhost:11434/v1/chat/completions \ -H "Content-Type: application/json" \ -d '{ "model": "llama2", "messages": [ { "role": "system", "content": "You are a helpful assistant." }, { "role": "user", "content": "Hello!" } ] }' OpenAI Python library from openai import OpenAI client = OpenAI( base_url = 'http://localhost:11434/v1', api_key='ollama', # required, but unused ) response = client.chat.completions.create( model="llama2", messages=[ {"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "Who won the world series in 2020?"},…
807dModel#llama#local
813d ago
Vision models February 2, 2024 New vision models are now available: LLaVA 1.6, in 7B, 13B and 34B parameter sizes. These models support higher resolution images, improved text recognition and logical reasoning.
Vision models February 2, 2024 New LLaVA models The LLaVA (Large Language-and-Vision Assistant) model collection has been updated to version 1.6 supporting: - Higher image resolution: support for up to 4x more pixels, allowing the model to grasp more details. - Improved text recognition and reasoning capabilities: trained on additional document, chart and diagram data sets. - More permissive licenses: distributed via the Apache 2.0 license or the LLaMA 2 Community License. These models are available in three parameter sizes. 7B, 13B and a new 34B model: ollama run llava:7b ollama run llava:13b ollama run llava:34b Usage CLI To use a vision model with ollama run , reference .jpg or .png files using file paths: % ollama run llava "describe this image: ./art.jpg" The image shows a colorful poster featuring an illustration of a cartoon character with spiky hair. Above…
823d ago
Python & JavaScript Libraries January 23, 2024 The initial versions of the Ollama Python and JavaScript libraries are now available, making it easy to integrate your Python, JavaScript, or TypeScript app with Ollama in a few lines of code. Both libraries include all the features of the Ollama REST API, are familiar in design, and compatible with new and previous versions of Ollama.
Python & JavaScript Libraries January 23, 2024 The initial versions of the Ollama Python and JavaScript libraries are now available: Both libraries make it possible to integrate new and existing apps with Ollama in a few lines of code, and share the features and feel of the Ollama REST API. Getting Started Python pip install ollama import ollama response = ollama.chat(model='llama2', messages=[ { 'role': 'user', 'content': 'Why is the sky blue?', }, ]) print(response['message']['content']) JavaScript npm install ollama import ollama from 'ollama' const response = await ollama.chat({ model: 'llama2', messages: [{ role: 'user', content: 'Why is the sky blue?' }], }) console.log(response.message.content) Use cases Both libraries support Ollama’s full set of features. Here are some examples in Python: Streaming for chunk in chat('mistral', messages=messages, stream=True): print(chunk['message']['content'], end='', flush=True) Multi-modal with open('image.png', 'rb') as file: response = ollama.chat( model='llava', messages=[ {…
823dAPI#llama
925d ago
Building LLM-Powered Web Apps with Client-Side Technology October 13, 2023 Recreate one of the most popular LangChain use-cases with open source, locally running software - a chain that performs Retrieval-Augmented Generation, or RAG for short, and allows you to “chat with your documents”
Building LLM-Powered Web Apps with Client-Side Technology October 13, 2023 This is a guest blog post by Jacob Lee, JS/TS maintainer at @LangChainAI, formerly co-founder & CTO at @Autocode, engineer on Google Photos. The initial version of this blog post was a talk for Google’s internal WebML Summit 2023, which you can check out here: It’s no secret that for a long time machine learning has been mostly a Python game, but the recent surge in popularity of ChatGPT has brought many new developers into the field. With JavaScript being the most widely-used programming language, it’s no surprise that this has included many web developers, who have naturally tried to build web apps. There’s been a ton of ink spilled on building with LLMs via API calls to the likes of OpenAI, Anthropic, Google, and others, so I thought I’d…
933d ago
Ollama is now available as an official Docker image October 5, 2023 Ollama can now run with Docker Desktop on the Mac, and run inside Docker containers with GPU acceleration on Linux.
Ollama is now available as an official Docker image October 5, 2023 We are excited to share that Ollama is now available as an official Docker sponsored open-source image, making it simpler to get up and running with large language models using Docker containers. With Ollama, all your interactions with large language models happen locally without sending private data to third-party services. On the Mac Ollama handles running the model with GPU acceleration. It provides both a simple CLI as well as a REST API for interacting with your applications. To get started, simply download and install Ollama. On the Mac, please run Ollama as a standalone application outside of Docker containers as Docker Desktop does not support GPUs. On Linux Ollama can run with GPU acceleration inside Docker containers for…
933dHardware#llama
947d ago
Leveraging LLMs in your Obsidian Notes September 21, 2023 This post walks through how you could incorporate a local LLM using Ollama in Obsidian, or potentially any note taking tool.
Leveraging LLMs in your Obsidian Notes September 21, 2023 Today I saw a post on Hacker News about another plugin for Obsidian that integrates with ChatGPT. There are a bunch of these tools out there, and I love seeing the different ways to use them with Obsidian: making connections, letting you go further with your notes. Some commenters suggested these tools do the work you need to do yourself, but I think they empower you in new and incredible ways. Talk to your notes The first and perhaps most obvious thing you probably want to do is converse with your notes. Ask it questions to gain further insights. It would be convenient if you could just point the model at your notes and be done with it. But most models can’t accept all that content all at…
959d ago
How to prompt Code Llama September 9, 2023 This guide walks through the different ways to structure prompts for Code Llama and its different variations and features including instructions, code completion and fill-in-the-middle (FIM).
How to prompt Code Llama September 9, 2023 Two weeks ago the Code Llama model was released by Meta with three variations: - Instruct - Code completion - Python This guide walks through the different ways to structure prompts for Code Llama and its different variations and features. Examples below use the 7 billion parameter model with 4-bit quantization, but 13 billion and 34 billion parameter models were made available as well. This guide uses the open-source Ollama project to download and prompt Code Llama, but these prompts will work with other model providers and runtimes too. Instruct The instruct model was trained to output human-like answers to questions. This model variation is the easiest to use and will behave closest to ChatGPT, answering questions with both natural language and code: Prompt ollama run codellama:7b-instruct 'You are an expert…
959dTutorial#llama#coding
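The fill-in-the-middle (FIM) feature mentioned in the guide uses `<PRE>`, `<SUF>` and `<MID>` sentinel tokens with the code-completion variants such as codellama:7b-code. A minimal sketch of assembling such a prompt; the helper name is ours, and the generate call assumes a local Ollama server:

```python
def build_fim_prompt(prefix, suffix):
    """Assemble a fill-in-the-middle prompt for Code Llama's
    code-completion variants: the model generates the code that
    belongs between prefix and suffix."""
    return f"<PRE> {prefix} <SUF>{suffix} <MID>"

prompt = build_fim_prompt(
    "def compute_gcd(x, y):",
    "    return result",
)

# With a running Ollama server (ollama pull codellama:7b-code):
# import ollama
# completion = ollama.generate(model='codellama:7b-code', prompt=prompt)
# print(completion['response'])
```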
975d ago
Run Code Llama locally August 24, 2023 Meta's Code Llama is now available on Ollama to try.
Run Code Llama locally August 24, 2023 Today, Meta Platforms, Inc. released Code Llama to the public. Based on Llama 2, it provides state-of-the-art performance among open models, infilling capabilities, support for large input contexts, and zero-shot instruction following for programming tasks. Code Llama is now available on Ollama to try! If you haven’t already installed Ollama, please download it here. Code Llama is available with 7 billion, 13 billion (16GB+ of memory required) and 34 billion (32GB+ of memory required) parameters: ollama run codellama:7b ollama run codellama:13b ollama run codellama:34b Example prompt: In Bash, how do I list all text files in the current directory (excluding subdirectories) that have been modified in the last month? Foundation models and Python specializations are available for code generation/completions tasks Foundation models: ollama pull codellama:7b-code ollama pull codellama:13b-code…
975dModel#llama#coding
998d ago
Run Llama 2 uncensored locally August 1, 2023 This post will give some example comparisons running Llama 2 uncensored model versus its censored model.
Run Llama 2 uncensored locally August 1, 2023 In May 2023, Eric Hartford, a machine learning engineer, authored a popular blog post, “Uncensored Models”, giving his viewpoint on the merits of uncensored models and how they are created. It’s a great read! This post will give some example comparisons running the Llama 2 uncensored model versus its censored counterpart. Some of the uncensored models that are available: - Llama 2 7B model fine-tuned using the Wizard-Vicuna conversation dataset - Try it: ollama run llama2-uncensored Nous Research’s Nous Hermes Llama 2 13B - Llama 2 13B model fine-tuned on over 300,000 instructions. This model stands out for its long responses, lower hallucination rate, and absence of OpenAI censorship mechanisms - Try it: ollama run nous-hermes-llama2 Eric Hartford’s Wizard Vicuna 13B uncensored - Llama 1 13B model fine-tuned to remove alignment - Try…
998dModel#llama