# Providers
OneLLM supports 21 providers, giving you access to 300+ language models through a unified interface.
## Provider List
### 🚀 Major Providers
#### OpenAI
- Models:
  - GPT-5 family: `gpt-5`, `gpt-5-pro`, `gpt-5-mini`, `gpt-5-nano`
  - GPT-4 family: `gpt-4o`, `gpt-4o-mini`, `gpt-4-turbo`, `gpt-4`, `gpt-4-turbo-preview`
  - GPT-3.5: `gpt-3.5-turbo`, `gpt-3.5-turbo-16k`
  - O-series (reasoning): `o1`, `o1-preview`, `o1-mini`, `o3`, `o3-mini`
  - Embeddings: `text-embedding-3-small`, `text-embedding-3-large`, `text-embedding-ada-002`
- Features: Function calling, JSON mode, vision, DALL-E, embeddings (see the example below)
- Pricing: Pay per token
- Best for: General purpose, production applications
- Setup: OpenAI Setup Guide
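As a quick illustration, chat and embedding calls both go through the same client. The embeddings call below assumes OneLLM mirrors the OpenAI `embeddings.create` signature and response shape; only `chat.completions.create` is shown elsewhere in these docs:

```python
from onellm import OpenAI

client = OpenAI()

# Chat completion against an OpenAI model
response = client.chat.completions.create(
    model="openai/gpt-4o-mini",
    messages=[{"role": "user", "content": "Say hello in five words."}],
)
print(response.choices[0].message.content)

# Embeddings (assumes an OpenAI-style embeddings.create endpoint)
emb = client.embeddings.create(
    model="openai/text-embedding-3-small",
    input="OneLLM routes this to OpenAI.",
)
print(len(emb.data[0].embedding))  # vector dimensionality
```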
#### Anthropic
- Models:
  - Claude 4 family: `claude-sonnet-4.5`, `claude-opus-4.1`, `claude-sonnet-4`, `claude-opus-4`
  - Claude 3.5: `claude-3-5-sonnet-20241022`, `claude-3-5-sonnet-20240620`
  - Claude 3: `claude-3-opus-20240229`, `claude-3-sonnet-20240229`, `claude-3-haiku-20240307`
  - Legacy: `claude-2.1`, `claude-2.0`, `claude-instant-1.2`
- Features: 200K+ context, vision support
- Pricing: Pay per token
- Best for: Long context, careful reasoning
- Setup: Anthropic Setup Guide
#### Google AI Studio
- Models:
  - Gemini 2.5: `gemini-2.5-pro`, `gemini-2.5-flash`, `gemini-2.5-flash-lite`, `gemini-2.5-flash-image`
  - Gemini 1.5: `gemini-1.5-pro`, `gemini-1.5-pro-latest`, `gemini-1.5-flash`, `gemini-1.5-flash-latest`
  - Gemini 1.0: `gemini-pro`, `gemini-pro-vision`
  - Embeddings: `text-embedding-004`, `embedding-001`
- Features: Multimodal, 1M+ context, JSON mode
- Pricing: Free tier available
- Best for: Multimodal tasks, long context
- Setup: Google Setup Guide
#### Mistral
- Models:
  - Latest: `mistral-large-latest`, `mistral-medium-latest`, `mistral-small-latest`
  - Specialized: `codestral` (code), `pixtral` (vision), `devstral` (development), `voxtral` (voice), `ministral` (lightweight)
  - Mixtral: `mixtral-8x7b`, `mixtral-8x22b`
  - Legacy: `mistral-tiny`, `open-mistral-7b`
- Features: European hosting, function calling
- Pricing: Pay per token
- Best for: EU compliance, multilingual
- Setup: Mistral Setup Guide
### ⚡ Fast Inference Providers
#### Groq
- Models:
  - Llama 3: `llama3-70b-8192`, `llama3-8b-8192`, `llama-3.1-70b-versatile`, `llama-3.1-8b-instant`
  - Mixtral: `mixtral-8x7b-32768`
  - Gemma: `gemma-7b-it`, `gemma2-9b-it`
  - Llama Guard: `llama-guard-3-8b` (content moderation)
- Features: Ultra-fast LPU inference, up to 10x faster than typical GPU serving (see the streaming sketch below)
- Pricing: Pay per token
- Best for: Real-time applications, low latency
- Setup: Groq Setup Guide
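Groq's low latency pairs naturally with streaming. A minimal sketch, assuming OneLLM supports the OpenAI-style `stream=True` flag and chunk format:

```python
from onellm import OpenAI

client = OpenAI()

# Stream tokens as they arrive (assumes OpenAI-style streaming chunks)
stream = client.chat.completions.create(
    model="groq/llama3-70b-8192",
    messages=[{"role": "user", "content": "Explain LPUs in two sentences."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```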
#### Together AI
- Models:
  - Llama: `meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo`, `meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo`
  - Mixtral: `mistralai/Mixtral-8x7B-Instruct-v0.1`, `mistralai/Mixtral-8x22B-Instruct-v0.1`
  - Qwen: `Qwen/Qwen2.5-72B-Instruct-Turbo`, `Qwen/Qwen2.5-7B-Instruct-Turbo`
  - DeepSeek: `deepseek-ai/deepseek-llm-67b-chat`
  - CodeLlama: `codellama/CodeLlama-34b-Instruct-hf`
  - 50+ other open-source models
- Features: Open source models, custom fine-tunes
- Pricing: Simple per-token pricing
- Best for: Open source models, research
- Setup: Together Setup Guide
#### Fireworks
- Models:
  - Llama: `accounts/fireworks/models/llama-v3p1-70b-instruct`, `accounts/fireworks/models/llama-v3p1-8b-instruct`
  - Mixtral: `accounts/fireworks/models/mixtral-8x7b-instruct`, `accounts/fireworks/models/mixtral-8x22b-instruct`
  - Qwen: `accounts/fireworks/models/qwen2p5-72b-instruct`
  - DeepSeek: `accounts/fireworks/models/deepseek-v3`
  - StarCoder: `accounts/fireworks/models/starcoder-16b`
- Features: Optimized inference, function calling
- Pricing: Competitive per-token
- Best for: Fast open model serving
- Setup: Fireworks Setup Guide
#### Anyscale
- Models:
  - Llama: `meta-llama/Meta-Llama-3.1-70B-Instruct`, `meta-llama/Meta-Llama-3.1-8B-Instruct`
  - Mixtral: `mistralai/Mixtral-8x7B-Instruct-v0.1`
  - Qwen: `Qwen/Qwen2.5-72B-Instruct`
  - Gemma: `google/gemma-2-9b-it`
- Features: Ray integration, schema-based JSON output (see the sketch below)
- Pricing: $1/million tokens flat rate
- Best for: Scale-out workloads
- Setup: Anyscale Setup Guide
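A sketch of schema-based JSON output, assuming OneLLM passes `response_format` through to Anyscale and that the `anyscale/` prefix follows the same convention as the other providers (it is not listed in the naming table below):

```python
import json

from onellm import OpenAI

client = OpenAI()

# Schema-constrained JSON (the "schema" field is Anyscale's extension
# to the OpenAI-style response_format parameter)
response = client.chat.completions.create(
    model="anyscale/meta-llama/Meta-Llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "Name a city and its population."}],
    response_format={
        "type": "json_object",
        "schema": {
            "type": "object",
            "properties": {
                "city": {"type": "string"},
                "population": {"type": "integer"},
            },
            "required": ["city", "population"],
        },
    },
)
print(json.loads(response.choices[0].message.content))
```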
### 🌐 Specialized Providers
#### X.AI (Grok)
- Models:
  - Latest: `grok-2-latest`, `grok-2-1212`, `grok-2-vision-1212`
  - Grok 2: `grok-2-public`, `grok-2-mini`
  - Legacy: `grok-1`, `grok-beta`
- Features: 128K context window
- Pricing: Premium
- Best for: Large context, reasoning
- Setup: X.AI Setup Guide
#### Perplexity
- Models:
  - Sonar (online): `llama-3.1-sonar-small-128k-online`, `llama-3.1-sonar-large-128k-online`, `llama-3.1-sonar-huge-128k-online`
  - Sonar (chat): `llama-3.1-sonar-small-128k-chat`, `llama-3.1-sonar-large-128k-chat`
  - Sonar Pro: `sonar-pro` (advanced search)
- Features: Real-time web access, citations (see the example below)
- Pricing: Pay per request
- Best for: Current information, research
- Setup: Perplexity Setup Guide
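Because the online Sonar models search the web at request time, a plain chat call can answer current-events questions. A minimal sketch, assuming the provider prefix is `perplexity/` (it is not listed in the naming table below):

```python
from onellm import OpenAI

client = OpenAI()

# Online Sonar models fetch live web results before answering
response = client.chat.completions.create(
    model="perplexity/llama-3.1-sonar-small-128k-online",
    messages=[{"role": "user", "content": "What changed in AI news this week?"}],
)
print(response.choices[0].message.content)
```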
#### DeepSeek
- Models:
  - Latest: `deepseek-chat`, `deepseek-reasoner`
  - Specialized: `deepseek-coder` (coding tasks)
  - Legacy: `deepseek-llm-67b-chat`
- Features: Chinese/English bilingual
- Pricing: Competitive
- Best for: Chinese language, coding
- Setup: DeepSeek Setup Guide
#### Moonshot
- Models:
  - Kimi: `moonshot-v1-8k`, `moonshot-v1-32k`, `moonshot-v1-128k`
  - Latest: `kimi-k2-0711-preview` (preview)
  - Vision: `kimi-vl` (multimodal)
  - Audio: `kimi-audio` (voice input)
- Features: Long-context (200K+ tokens), Chinese/English bilingual, vision support
- Pricing: Cost-effective (~5x cheaper than Claude/Gemini)
- Best for: Long-context processing, Chinese language, document analysis
- Setup: Moonshot Setup Guide
#### GLM (Zhipu AI)
- Models:
  - GLM-4: `glm-4`, `glm-4-plus`, `glm-4-air`, `glm-4-flash`
  - GLM-4V: `glm-4v` (vision support)
  - Legacy: `glm-3-turbo`
- Features: Chinese/English bilingual, streaming, function calling, vision
- Pricing: Competitive
- Best for: Chinese language tasks, cost-effective inference
- Setup: GLM Setup Guide
#### Cohere
- Models:
  - Command: `command-r-plus`, `command-r`, `command`, `command-light`
  - Embeddings: `embed-english-v3.0`, `embed-multilingual-v3.0`, `embed-english-light-v3.0`
- Features: RAG optimization, embeddings
- Pricing: Enterprise/startup plans
- Best for: Enterprise NLP, search
- Setup: Cohere Setup Guide
### 🌍 Multi-Provider Gateways
#### OpenRouter
- Models:
  - Access 100+ models using the `openrouter/{provider}/{model}` format (see the example below)
  - Free models: `meta-llama/llama-3.2-3b-instruct:free`, `google/gemma-2-9b-it:free`
  - Premium: `anthropic/claude-3.5-sonnet`, `openai/gpt-4o`, `google/gemini-2.5-pro-exp`
- Features: Unified billing, free models
- Pricing: Small markup on provider prices
- Best for: Model exploration, fallbacks
- Setup: OpenRouter Setup Guide
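In practice the `openrouter/{provider}/{model}` format looks like this; the `:free` variant makes it a cheap way to smoke-test a pipeline:

```python
from onellm import OpenAI

client = OpenAI()

# OpenRouter model IDs embed the upstream provider in the path
response = client.chat.completions.create(
    model="openrouter/meta-llama/llama-3.2-3b-instruct:free",
    messages=[{"role": "user", "content": "Hello from OpenRouter!"}],
)
print(response.choices[0].message.content)
```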
#### Vercel AI Gateway
- Models:
  - Access 100+ models using the `vercel/{provider}/{model}` format
  - OpenAI: `vercel/openai/gpt-4o-mini`, `vercel/openai/gpt-4o`
  - Anthropic: `vercel/anthropic/claude-sonnet-4`, `vercel/anthropic/claude-opus-4`
  - Google: `vercel/google/gemini-2.5-pro`, `vercel/google/gemini-2.5-flash`
  - Meta: `vercel/meta/llama-3.1-70b-instruct`
  - Many more providers and models
- Features: Unified billing, streaming, function calling, vision
- Pricing: Provider passthrough with optional markup
- Best for: Production deployments, unified billing
- Setup: Vercel Setup Guide
### ☁️ Enterprise Cloud
#### Azure OpenAI
- Models:
  - GPT-4: `gpt-4`, `gpt-4-turbo`, `gpt-4o`, `gpt-4o-mini`
  - GPT-3.5: `gpt-35-turbo`, `gpt-35-turbo-16k`
  - Embeddings: `text-embedding-3-small`, `text-embedding-3-large`, `text-embedding-ada-002`
  - DALL-E: `dall-e-3`, `dall-e-2`
- Features: Enterprise SLA, VNet integration
- Pricing: Same as OpenAI
- Best for: Enterprise, compliance
- Setup: Azure Setup Guide
#### AWS Bedrock
- Models:
  - Anthropic: `anthropic.claude-3-5-sonnet-20241022-v2:0`, `anthropic.claude-3-opus-20240229-v1:0`
  - Meta: `meta.llama3-1-70b-instruct-v1:0`, `meta.llama3-1-8b-instruct-v1:0`
  - Amazon: `amazon.titan-text-premier-v1:0`, `amazon.titan-embed-text-v2:0`
  - Cohere: `cohere.command-r-plus-v1:0`, `cohere.embed-english-v3`
  - Mistral: `mistral.mistral-large-2407-v1:0`
- Features: AWS integration, multiple providers
- Pricing: Pay per use
- Best for: AWS ecosystem
- Setup: Bedrock Setup Guide
#### Google Vertex AI
- Models:
  - Gemini: `gemini-2.5-pro`, `gemini-2.5-flash`, `gemini-1.5-pro`, `gemini-1.5-flash`
  - Legacy: `gemini-pro`, `gemini-pro-vision`
  - Embeddings: `text-embedding-004`, `textembedding-gecko@003`
- Features: MLOps platform, enterprise
- Pricing: Enterprise pricing
- Best for: GCP ecosystem
- Setup: Vertex AI Setup Guide
### 💻 Local Providers
#### Ollama
- Models:
  - Popular: `llama3`, `llama3.1`, `mistral`, `mixtral`, `gemma2`, `qwen2.5`
  - Code: `codellama`, `deepseek-coder-v2`, `starcoder2`
  - Vision: `llava`, `llava-phi3`, `bakllava`
  - Specialized: `dolphin-mixtral`, `wizardlm2`, `phi3`
  - Any model from ollama.com/library
- Features: Local hosting, model management
- Pricing: Free (self-hosted)
- Best for: Privacy, offline use
- Setup: Ollama Setup Guide
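Per the naming convention below, an `@host:port` suffix targets a specific Ollama server:

```python
from onellm import OpenAI

client = OpenAI()

# The @host:port suffix (see the naming table below) selects the Ollama server
response = client.chat.completions.create(
    model="ollama/llama3:8b@localhost:11434",
    messages=[{"role": "user", "content": "Hello from a local model!"}],
)
print(response.choices[0].message.content)
```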
#### llama.cpp
- Models:
  - Any GGUF model from HuggingFace
  - Llama: `llama-3-8b-instruct.Q4_K_M.gguf`, `llama-3.1-70b-instruct.Q4_K_M.gguf`
  - Mistral: `mistral-7b-instruct.Q4_K_M.gguf`
  - Quantization levels: Q4_K_M (recommended), Q5_K_M, Q8_0, etc.
  - Use `onellm download <model>` to fetch models (see the sketch below)
- Features: Direct inference, GPU support
- Pricing: Free (self-hosted)
- Best for: Maximum control, embedded
- Setup: llama.cpp Setup Guide
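A typical flow: fetch a quantized GGUF with the bundled CLI, then reference it by path. The specific model name below is illustrative:

```python
# First, fetch a model from the shell (the argument shown is illustrative):
#   onellm download llama-3-8b-instruct.Q4_K_M.gguf
from onellm import OpenAI

client = OpenAI()

# Reference the downloaded GGUF file by path
response = client.chat.completions.create(
    model="llama_cpp/models/llama-3-8b-instruct.Q4_K_M.gguf",
    messages=[{"role": "user", "content": "Hello, offline world!"}],
)
print(response.choices[0].message.content)
```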
## Provider Comparison
### By Speed
- Groq - Ultra-fast LPU (100+ tokens/sec)
- Fireworks - Optimized inference
- Together - Fast parallel inference
- OpenAI - Reliable performance
- Local - Depends on hardware
### By Context Length
- Google Gemini 1.5 - 1M+ tokens
- Moonshot Kimi - 200K tokens
- Anthropic Claude - 200K tokens
- X.AI Grok - 128K tokens
- Perplexity - 128K tokens
- OpenAI GPT-4 - 128K tokens
### By Price (Lowest to Highest)
- Local (Ollama/llama.cpp) - Free
- Anyscale - $1/M tokens flat
- Together/Fireworks - Competitive
- OpenRouter - Various options
- OpenAI/Anthropic - Premium
### By Features
- Function Calling: OpenAI, Mistral, Groq, Anyscale, Moonshot
- Vision: OpenAI, Anthropic, Google, Vertex AI, Moonshot
- Web Search: Perplexity
- JSON Mode: OpenAI, Google, Mistral, Groq, Moonshot
- Embeddings: OpenAI, Cohere, Google, Bedrock
## Model Naming Convention
Models are specified using a provider prefix to clearly identify the source:
| Provider | Format | Example |
|---|---|---|
| OpenAI | `openai/{model}` | `openai/gpt-4` |
| Google | `google/{model}` | `google/gemini-pro` |
| Anthropic | `anthropic/{model}` | `anthropic/claude-3-opus` |
| Groq | `groq/{model}` | `groq/llama3-70b` |
| Mistral | `mistral/{model}` | `mistral/mistral-large` |
| Ollama | `ollama/{model}@host:port` | `ollama/llama3:8b@localhost:11434` |
| llama.cpp | `llama_cpp/{model.gguf}` | `llama_cpp/llama-3-8b-q4_K_M.gguf` |
| X.AI (Grok) | `xai/{model}` | `xai/grok-beta` |
| Cohere | `cohere/{model}` | `cohere/command-r-plus` |
| AWS Bedrock | `bedrock/{model}` | `bedrock/claude-3-5-sonnet` |
| Moonshot | `moonshot/{model}` | `moonshot/moonshot-v1-8k` |
### Additional Examples
```python
# Standard models
"openai/gpt-4o-mini"
"anthropic/claude-3-5-sonnet-20241022"
"google/gemini-1.5-flash"
"groq/llama3-70b-8192"
"moonshot/moonshot-v1-8k"

# Models with organization prefixes
"together/meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo"
"fireworks/accounts/fireworks/models/llama-v3p1-70b-instruct"

# Local models
"ollama/llama3:latest"
"llama_cpp/models/llama-3-8b-instruct.Q4_K_M.gguf"
```
## Quick Start
```python
from onellm import OpenAI

# The same client works with all providers
client = OpenAI()

# Use any provider by changing the model name
response = client.chat.completions.create(
    model="anthropic/claude-3-5-sonnet-20241022",  # Just change this
    messages=[{"role": "user", "content": "Hello!"}],
)
```
## Choosing a Provider
### For Production
- OpenAI: Most reliable, best ecosystem
- Anthropic: Best for complex reasoning
- Azure OpenAI: Enterprise requirements
### For Speed
- Groq: Ultra-fast responses
- Fireworks: Fast and affordable
- Local: No network latency
### For Cost
- Local: Free (your hardware)
- Anyscale: Predictable pricing
- OpenRouter: Access to free models
### For Privacy
- Ollama: Fully local
- llama.cpp: Complete control
- Azure/Vertex: Enterprise privacy
## Next Steps
- Provider Setup - Detailed setup instructions
- Provider Capabilities - Feature comparison matrix