Providers

OneLLM supports 18 providers, giving you access to 300+ language models through a unified interface.

Provider List

🚀 Major Providers

OpenAI

  • Models: GPT-4o, GPT-4, GPT-3.5-Turbo
  • Features: Function calling, JSON mode, vision, DALL-E, embeddings
  • Pricing: Pay per token
  • Best for: General purpose, production applications
  • Setup: OpenAI Setup Guide

Anthropic

  • Models: Claude 3.5 Sonnet, Claude 3 Opus/Sonnet/Haiku
  • Features: 200K+ context, vision support
  • Pricing: Pay per token
  • Best for: Long context, careful reasoning
  • Setup: Anthropic Setup Guide

Google AI Studio

  • Models: Gemini 1.5 Pro/Flash, Gemini Pro
  • Features: Multimodal, 1M+ context, JSON mode
  • Pricing: Free tier available
  • Best for: Multimodal tasks, long context
  • Setup: Google Setup Guide

Mistral

  • Models: Mistral Large/Medium/Small, Mixtral
  • Features: European hosting, function calling
  • Pricing: Pay per token
  • Best for: EU compliance, multilingual
  • Setup: Mistral Setup Guide

⚡ Fast Inference Providers

Groq

  • Models: Llama 3, Mixtral, Gemma
  • Features: Ultra-fast inference on Groq's LPU hardware, often several times faster than typical GPU serving
  • Pricing: Pay per token
  • Best for: Real-time applications, low latency
  • Setup: Groq Setup Guide

Together AI

  • Models: Llama, Mistral, CodeLlama, 50+ models
  • Features: Open source models, custom fine-tunes
  • Pricing: Simple per-token pricing
  • Best for: Open source models, research
  • Setup: Together Setup Guide

Fireworks

  • Models: Llama, Mixtral, Starcoder
  • Features: Optimized inference, function calling
  • Pricing: Competitive per-token
  • Best for: Fast open model serving
  • Setup: Fireworks Setup Guide

Anyscale

  • Models: Llama, Mistral, CodeLlama
  • Features: Ray integration, schema-based JSON
  • Pricing: $1/million tokens flat rate
  • Best for: Scale-out workloads
  • Setup: Anyscale Setup Guide

🌐 Specialized Providers

X.AI (Grok)

  • Models: Grok-2, Grok-1
  • Features: 128K context window
  • Pricing: Premium
  • Best for: Large context, reasoning
  • Setup: X.AI Setup Guide

Perplexity

  • Models: Sonar models with web search
  • Features: Real-time web access, citations
  • Pricing: Pay per request
  • Best for: Current information, research
  • Setup: Perplexity Setup Guide

DeepSeek

  • Models: DeepSeek Chat, DeepSeek Coder
  • Features: Chinese/English bilingual
  • Pricing: Competitive
  • Best for: Chinese language, coding
  • Setup: DeepSeek Setup Guide

Cohere

  • Models: Command R/R+, Embed
  • Features: RAG optimization, embeddings
  • Pricing: Enterprise/startup plans
  • Best for: Enterprise NLP, search
  • Setup: Cohere Setup Guide

🌍 Multi-Provider Gateways

OpenRouter

  • Models: 100+ models from all providers
  • Features: Unified billing, free models
  • Pricing: Small markup on provider prices
  • Best for: Model exploration, fallbacks
  • Setup: OpenRouter Setup Guide

☁️ Enterprise Cloud

Azure OpenAI

  • Models: GPT-4, GPT-3.5, DALL-E, Embeddings
  • Features: Enterprise SLA, VNet integration
  • Pricing: Same as OpenAI
  • Best for: Enterprise, compliance
  • Setup: Azure Setup Guide

AWS Bedrock

  • Models: Claude, Llama, Titan, Stable Diffusion
  • Features: AWS integration, multiple providers
  • Pricing: Pay per use
  • Best for: AWS ecosystem
  • Setup: Bedrock Setup Guide

Google Vertex AI

  • Models: Gemini, PaLM, Codey
  • Features: MLOps platform, enterprise
  • Pricing: Enterprise pricing
  • Best for: GCP ecosystem
  • Setup: Vertex AI Setup Guide

💻 Local Providers

Ollama

  • Models: Any GGUF model
  • Features: Local hosting, model management
  • Pricing: Free (self-hosted)
  • Best for: Privacy, offline use
  • Setup: Ollama Setup Guide

llama.cpp

  • Models: Any GGUF model
  • Features: Direct inference, GPU support
  • Pricing: Free (self-hosted)
  • Best for: Maximum control, embedded
  • Setup: llama.cpp Setup Guide

Provider Comparison

By Speed

  1. Groq - Ultra-fast LPU (100+ tokens/sec)
  2. Fireworks - Optimized inference
  3. Together - Fast parallel inference
  4. OpenAI - Reliable performance
  5. Local - Depends on hardware

By Context Length

  1. Google Gemini 1.5 - 1M+ tokens
  2. Anthropic Claude - 200K tokens
  3. X.AI Grok - 128K tokens
  4. Perplexity - 128K tokens
  5. OpenAI GPT-4 - 128K tokens

By Price (Lowest to Highest)

  1. Local (Ollama/llama.cpp) - Free
  2. Anyscale - $1/M tokens flat
  3. Together/Fireworks - Competitive
  4. OpenRouter - Various options
  5. OpenAI/Anthropic - Premium

By Features

  • Function Calling: OpenAI, Mistral, Groq, Anyscale
  • Vision: OpenAI, Anthropic, Google, Vertex AI
  • Web Search: Perplexity
  • JSON Mode: OpenAI, Google, Mistral, Groq
  • Embeddings: OpenAI, Cohere, Google, Bedrock
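The feature matrix above can be captured as a small lookup table. This is an illustrative sketch only (the dictionary, helper name, and provider keys are not part of OneLLM's API, and provider capabilities change; check each provider's docs):

```python
# Feature support by provider, as summarized in the matrix above.
# Keys are the provider prefixes used in model strings.
FEATURES = {
    "function_calling": {"openai", "mistral", "groq", "anyscale"},
    "vision": {"openai", "anthropic", "google", "vertex"},
    "web_search": {"perplexity"},
    "json_mode": {"openai", "google", "mistral", "groq"},
    "embeddings": {"openai", "cohere", "google", "bedrock"},
}

def supports(provider: str, feature: str) -> bool:
    """Return True if the matrix above lists `feature` for `provider`."""
    return provider in FEATURES.get(feature, set())

print(supports("groq", "json_mode"))        # True
print(supports("anthropic", "web_search"))  # False
```

A table like this is handy for routing: pick the cheapest provider that satisfies the features a request needs.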

Model Naming Convention

Models are specified using a provider prefix to clearly identify the source:

Provider      Format                      Example
OpenAI        openai/{model}              openai/gpt-4
Google        google/{model}              google/gemini-pro
Anthropic     anthropic/{model}           anthropic/claude-3-opus
Groq          groq/{model}                groq/llama3-70b
Mistral       mistral/{model}             mistral/mistral-large
Ollama        ollama/{model}@host:port    ollama/llama3:8b@localhost:11434
llama.cpp     llama_cpp/{model.gguf}      llama_cpp/llama-3-8b-q4_K_M.gguf
X.AI (Grok)   xai/{model}                 xai/grok-beta
Cohere        cohere/{model}              cohere/command-r-plus
AWS Bedrock   bedrock/{model}             bedrock/claude-3-5-sonnet

Additional Examples

# Standard models
"openai/gpt-4o-mini"
"anthropic/claude-3-5-sonnet-20241022"
"google/gemini-1.5-flash"
"groq/llama3-70b-8192"

# Models with organization prefixes
"together/meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo"
"fireworks/accounts/fireworks/models/llama-v3p1-70b-instruct"

# Local models
"ollama/llama3:latest"
"llama_cpp/models/llama-3-8b-instruct.Q4_K_M.gguf"
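The convention above is simple enough to parse with a few lines of string handling. A minimal sketch (the helper name and return shape are illustrative, not OneLLM's internal API):

```python
def parse_model_string(model: str):
    """Split 'provider/model[@host:port]' into its parts.

    Returns (provider, model_name, host), where host is None unless the
    Ollama-style '@host:port' suffix is present. Splitting on the first '/'
    keeps organization prefixes (e.g. together/meta-llama/...) in the name.
    """
    provider, _, rest = model.partition("/")
    if not rest:
        raise ValueError(f"expected 'provider/model', got {model!r}")
    # Only an '@host:port' suffix is split off; ':' tags (e.g. llama3:8b) stay.
    name, _, host = rest.partition("@")
    return provider, name, host or None

print(parse_model_string("openai/gpt-4o-mini"))
# ('openai', 'gpt-4o-mini', None)
print(parse_model_string("ollama/llama3:8b@localhost:11434"))
# ('ollama', 'llama3:8b', 'localhost:11434')
```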

Quick Start

from onellm import OpenAI

# Client works with all providers
client = OpenAI()

# Use any provider by changing model name
response = client.chat.completions.create(
    model="anthropic/claude-3-5-sonnet-20241022",  # Just change this
    messages=[{"role": "user", "content": "Hello!"}]
)
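Because only the model string changes between providers, a fallback chain is easy to build on top of the client. A sketch of the pattern (the helper and its broad exception handling are illustrative; in practice, narrow the `except` to the errors your providers actually raise):

```python
def complete_with_fallback(create_fn, models, messages):
    """Try each model string in order; return the first successful response.

    `create_fn` is any callable with the chat.completions.create signature,
    e.g. client.chat.completions.create.
    """
    last_error = None
    for model in models:
        try:
            return create_fn(model=model, messages=messages)
        except Exception as exc:  # illustrative; catch specific provider errors
            last_error = exc
    raise RuntimeError(f"all models failed: {models}") from last_error
```

Usage would look like `complete_with_fallback(client.chat.completions.create, ["openai/gpt-4o-mini", "groq/llama3-70b-8192"], messages)`, falling back to Groq if the OpenAI call fails.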

Choosing a Provider

For Production

  • OpenAI: Most reliable, best ecosystem
  • Anthropic: Best for complex reasoning
  • Azure OpenAI: Enterprise requirements

For Speed

  • Groq: Ultra-fast responses
  • Fireworks: Fast and affordable
  • Local: No network latency

For Cost

  • Local: Free (your hardware)
  • Anyscale: Predictable pricing
  • OpenRouter: Access to free models

For Privacy

  • Ollama: Fully local
  • llama.cpp: Complete control
  • Azure/Vertex: Enterprise privacy


Copyright © 2025 Ran Aroussi.
Licensed under Apache 2.0 license.