Providers
OneLLM supports 18 providers, giving you access to 300+ language models through a unified interface.
Provider List
🚀 Major Providers
OpenAI
- Models: GPT-4o, GPT-4, GPT-3.5-Turbo
- Features: Function calling, JSON mode, vision, DALL-E, embeddings
- Pricing: Pay per token
- Best for: General purpose, production applications
- Setup: OpenAI Setup Guide
Anthropic
- Models: Claude 3.5 Sonnet, Claude 3 Opus/Sonnet/Haiku
- Features: 200K+ context, vision support
- Pricing: Pay per token
- Best for: Long context, careful reasoning
- Setup: Anthropic Setup Guide
Google AI Studio
- Models: Gemini 1.5 Pro/Flash, Gemini Pro
- Features: Multimodal, 1M+ context, JSON mode
- Pricing: Free tier available
- Best for: Multimodal tasks, long context
- Setup: Google Setup Guide
Mistral
- Models: Mistral Large/Medium/Small, Mixtral
- Features: European hosting, function calling
- Pricing: Pay per token
- Best for: EU compliance, multilingual
- Setup: Mistral Setup Guide
⚡ Fast Inference Providers
Groq
- Models: Llama 3, Mixtral, Gemma
- Features: Ultra-fast LPU inference, often 10x faster than typical GPU serving
- Pricing: Pay per token
- Best for: Real-time applications, low latency
- Setup: Groq Setup Guide
Together AI
- Models: Llama, Mistral, CodeLlama, and 50+ more
- Features: Open source models, custom fine-tunes
- Pricing: Simple per-token pricing
- Best for: Open source models, research
- Setup: Together Setup Guide
Fireworks
- Models: Llama, Mixtral, Starcoder
- Features: Optimized inference, function calling
- Pricing: Competitive per-token rates
- Best for: Fast open model serving
- Setup: Fireworks Setup Guide
Anyscale
- Models: Llama, Mistral, CodeLlama
- Features: Ray integration, schema-based JSON
- Pricing: $1/million tokens flat rate
- Best for: Scale-out workloads
- Setup: Anyscale Setup Guide
🌐 Specialized Providers
X.AI (Grok)
- Models: Grok-2, Grok-1
- Features: 128K context window
- Pricing: Premium
- Best for: Large context, reasoning
- Setup: X.AI Setup Guide
Perplexity
- Models: Sonar models with web search
- Features: Real-time web access, citations
- Pricing: Pay per request
- Best for: Current information, research
- Setup: Perplexity Setup Guide
DeepSeek
- Models: DeepSeek Chat, DeepSeek Coder
- Features: Chinese/English bilingual
- Pricing: Competitive
- Best for: Chinese language, coding
- Setup: DeepSeek Setup Guide
Cohere
- Models: Command R/R+, Embed
- Features: RAG optimization, embeddings
- Pricing: Enterprise/startup plans
- Best for: Enterprise NLP, search
- Setup: Cohere Setup Guide
🌍 Multi-Provider Gateways
OpenRouter
- Models: 100+ models from many providers
- Features: Unified billing, free models
- Pricing: Small markup on provider prices
- Best for: Model exploration, fallbacks
- Setup: OpenRouter Setup Guide
☁️ Enterprise Cloud
Azure OpenAI
- Models: GPT-4, GPT-3.5, DALL-E, Embeddings
- Features: Enterprise SLA, VNet integration
- Pricing: Same as OpenAI
- Best for: Enterprise, compliance
- Setup: Azure Setup Guide
AWS Bedrock
- Models: Claude, Llama, Titan, Stable Diffusion
- Features: AWS integration, multiple providers
- Pricing: Pay per use
- Best for: AWS ecosystem
- Setup: Bedrock Setup Guide
Google Vertex AI
- Models: Gemini, PaLM, Codey
- Features: MLOps platform, enterprise-grade tooling
- Pricing: Enterprise pricing
- Best for: GCP ecosystem
- Setup: Vertex AI Setup Guide
💻 Local Providers
Ollama
- Models: Any GGUF model
- Features: Local hosting, model management
- Pricing: Free (self-hosted)
- Best for: Privacy, offline use
- Setup: Ollama Setup Guide
llama.cpp
- Models: Any GGUF model
- Features: Direct inference, GPU support
- Pricing: Free (self-hosted)
- Best for: Maximum control, embedded
- Setup: llama.cpp Setup Guide
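
Both local providers use the same chat interface as the hosted ones (see Quick Start below). A minimal sketch, using the model-string formats from the naming table; the `@host:port` suffix and the GGUF path here are illustrative, not prescriptive:

```python
from onellm import OpenAI

client = OpenAI()

# Ollama: model name, optionally suffixed with @host:port
response = client.chat.completions.create(
    model="ollama/llama3:latest@localhost:11434",  # illustrative host/port
    messages=[{"role": "user", "content": "Summarize this text locally."}],
)
print(response.choices[0].message.content)

# llama.cpp: point at a GGUF file instead (path is illustrative)
response = client.chat.completions.create(
    model="llama_cpp/models/llama-3-8b-instruct.Q4_K_M.gguf",
    messages=[{"role": "user", "content": "Same request, different backend."}],
)
```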
Provider Comparison
By Speed
- Groq - Ultra-fast LPU (100+ tokens/sec)
- Fireworks - Optimized inference
- Together - Fast parallel inference
- OpenAI - Reliable performance
- Local - Depends on hardware
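
Throughput varies with model, region, and load, so treat the ranking above as indicative. A rough way to measure it yourself, assuming the response carries an OpenAI-style `usage` object with a `completion_tokens` field:

```python
import time

from onellm import OpenAI

client = OpenAI()

def tokens_per_second(model, prompt="Explain TCP backoff in 200 words."):
    """Time one completion and divide generated tokens by wall-clock seconds."""
    start = time.perf_counter()
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    elapsed = time.perf_counter() - start
    # Assumes an OpenAI-style usage object; adjust if a provider reports differently.
    return response.usage.completion_tokens / elapsed

for model in ("groq/llama3-70b-8192", "openai/gpt-4o-mini"):
    print(model, round(tokens_per_second(model), 1), "tokens/sec")
```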
By Context Length
- Google Gemini 1.5 - 1M+ tokens
- Anthropic Claude - 200K tokens
- X.AI Grok - 128K tokens
- Perplexity - 128K tokens
- OpenAI GPT-4 - 128K tokens
By Price (Lowest to Highest)
- Local (Ollama/llama.cpp) - Free
- Anyscale - $1/M tokens flat
- Together/Fireworks - Competitive
- OpenRouter - Various options
- OpenAI/Anthropic - Premium
By Features
- Function Calling: OpenAI, Mistral, Groq, Anyscale
- Vision: OpenAI, Anthropic, Google, Vertex AI
- Web Search: Perplexity
- JSON Mode: OpenAI, Google, Mistral, Groq
- Embeddings: OpenAI, Cohere, Google, Bedrock
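
As a sketch of how two of these features look through the unified client, assuming OneLLM passes OpenAI-style parameters (`response_format`, `embeddings.create`) through to providers that support them; the embedding model name is illustrative:

```python
from onellm import OpenAI

client = OpenAI()

# JSON mode (OpenAI, Google, Mistral, Groq)
response = client.chat.completions.create(
    model="openai/gpt-4o-mini",
    messages=[{"role": "user", "content": "List three colors as a JSON object."}],
    response_format={"type": "json_object"},
)
print(response.choices[0].message.content)

# Embeddings (OpenAI, Cohere, Google, Bedrock)
embedding = client.embeddings.create(
    model="openai/text-embedding-3-small",  # illustrative model name
    input="A sentence to embed.",
)
print(len(embedding.data[0].embedding))
```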
Model Naming Convention
Models are specified using a provider prefix to clearly identify the source:
| Provider | Format | Example |
|---|---|---|
| OpenAI | `openai/{model}` | `openai/gpt-4` |
| Google | `google/{model}` | `google/gemini-pro` |
| Anthropic | `anthropic/{model}` | `anthropic/claude-3-opus` |
| Groq | `groq/{model}` | `groq/llama3-70b` |
| Mistral | `mistral/{model}` | `mistral/mistral-large` |
| Ollama | `ollama/{model}@host:port` | `ollama/llama3:8b@localhost:11434` |
| llama.cpp | `llama_cpp/{model.gguf}` | `llama_cpp/llama-3-8b-q4_K_M.gguf` |
| X.AI (Grok) | `xai/{model}` | `xai/grok-beta` |
| Cohere | `cohere/{model}` | `cohere/command-r-plus` |
| AWS Bedrock | `bedrock/{model}` | `bedrock/claude-3-5-sonnet` |
Additional Examples
```python
# Standard models
"openai/gpt-4o-mini"
"anthropic/claude-3-5-sonnet-20241022"
"google/gemini-1.5-flash"
"groq/llama3-70b-8192"

# Models with organization prefixes
"together/meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo"
"fireworks/accounts/fireworks/models/llama-v3p1-70b-instruct"

# Local models
"ollama/llama3:latest"
"llama_cpp/models/llama-3-8b-instruct.Q4_K_M.gguf"
```
Quick Start
```python
from onellm import OpenAI

# The same client works with every provider
client = OpenAI()

# Switch providers by changing the model name
response = client.chat.completions.create(
    model="anthropic/claude-3-5-sonnet-20241022",  # just change this string
    messages=[{"role": "user", "content": "Hello!"}]
)
```
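
Because every provider shares the same call signature, cross-provider fallback reduces to a loop over model names. The sketch below assumes provider failures surface as ordinary Python exceptions; the model list is illustrative:

```python
from onellm import OpenAI

client = OpenAI()

def complete_with_fallback(prompt, models):
    """Try each model in order, returning the first successful completion."""
    last_error = None
    for model in models:
        try:
            response = client.chat.completions.create(
                model=model,
                messages=[{"role": "user", "content": prompt}],
            )
            return response.choices[0].message.content
        except Exception as error:  # rate limit, outage, auth failure, etc.
            last_error = error
    raise RuntimeError(f"All providers failed; last error: {last_error}")

print(complete_with_fallback(
    "Hello!",
    ["groq/llama3-70b-8192", "openai/gpt-4o-mini"],
))
```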
Choosing a Provider
For Production
- OpenAI: Most reliable, best ecosystem
- Anthropic: Best for complex reasoning
- Azure OpenAI: Enterprise requirements
For Speed
- Groq: Ultra-fast responses
- Fireworks: Fast and affordable
- Local: No network latency
For Cost
- Local: Free (your hardware)
- Anyscale: Predictable pricing
- OpenRouter: Access to free models
For Privacy
- Ollama: Fully local
- llama.cpp: Complete control
- Azure/Vertex: Enterprise privacy
Next Steps
- Provider Setup - Detailed setup instructions
- Provider Capabilities - Feature comparison matrix
- Examples - Provider-specific examples
- Best Practices - Choosing providers