Provider Capabilities
Comprehensive comparison of features and capabilities across all OneLLM providers.
Feature Matrix
Provider | Chat | Stream | Functions | JSON | Vision | Audio | Embed | Search | Local |
---|---|---|---|---|---|---|---|---|---|
OpenAI | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
Anthropic | ✅ | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ |
Google | ✅ | ✅ | ❌ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
Mistral | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ✅ | ❌ | ❌ |
Groq | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
Together | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
Fireworks | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
Anyscale | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
X.AI | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
Perplexity | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ |
DeepSeek | ✅ | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
Cohere | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ |
OpenRouter | ✅ | ✅ | Varies | Varies | Varies | ❌ | Varies | ❌ | ❌ |
Azure | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
Bedrock | ✅ | ✅ | ❌ | ✅ | ✅ | ❌ | ✅ | ❌ | ❌ |
Vertex AI | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
Ollama | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ |
llama.cpp | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ |
Legend
- ✅ Supported
- ❌ Not Supported
- Varies: Depends on underlying model
Notable Models by Category
Through these providers, you gain access to hundreds of models across different categories:
Model Family | Notable Models |
---|---|
OpenAI Family | GPT-4o, GPT-4 Turbo, o3 |
Claude Family | Claude 3 Opus, Claude 3 Sonnet, Claude 3 Haiku |
Llama Family | Llama 3 70B, Llama 3 8B, Code Llama |
Mistral Family | Mistral Large, Mistral 7B, Mixtral |
Gemini Family | Gemini Pro, Gemini Ultra, Gemini Flash |
Embeddings | Ada-002, text-embedding-3-small/large, Cohere embeddings |
Multimodal | GPT-4 Vision, Claude 3 Vision, Gemini Pro Vision |
Detailed Capabilities
Chat Completions
All providers support basic chat completions with:
- System messages
- Multi-turn conversations
- Temperature control
- Max tokens limit
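The sketch below shows what a basic multi-turn call might look like, assuming OneLLM exposes an OpenAI-style client; the `Client` import and the `provider/model` identifier format are illustrative assumptions, not confirmed API.

```python
from onellm import Client  # hypothetical entry point

client = Client()
response = client.chat.completions.create(
    model="openai/gpt-4o-mini",  # assumed "provider/model" naming
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Explain streaming in one sentence."},
    ],
    temperature=0.7,  # sampling temperature
    max_tokens=100,   # response length cap
)
print(response.choices[0].message.content)
```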
Streaming
All providers support streaming responses for better UX.
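A minimal streaming sketch, under the same OpenAI-style assumptions as the chat example above: setting `stream=True` yields incremental chunks that can be printed as they arrive.

```python
from onellm import Client  # hypothetical entry point, as above

client = Client()
stream = client.chat.completions.create(
    model="openai/gpt-4o-mini",
    messages=[{"role": "user", "content": "Write a haiku about latency."}],
    stream=True,  # request incremental chunks instead of one response
)
for chunk in stream:
    # Each chunk carries a partial delta; None signals an empty chunk
    print(chunk.choices[0].delta.content or "", end="", flush=True)
```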
Function Calling
Providers with function calling support:
- OpenAI: Full support with parallel calls
- Mistral: Full support
- Groq: Basic support
- Together: Basic support
- Fireworks: Basic support
- Anyscale: Single function calls only
- Azure: Full support (same as OpenAI)
- Vertex AI: Full support
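A sketch of a single tool call, again assuming an OpenAI-compatible `tools` parameter; `get_weather` is a hypothetical function, and the exact response shape may vary by provider.

```python
from onellm import Client  # hypothetical entry point, as above

client = Client()
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool for illustration
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]
response = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)
# The model returns a structured call rather than prose
call = response.choices[0].message.tool_calls[0]
print(call.function.name, call.function.arguments)
```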
JSON Mode
Force structured JSON output:
- OpenAI: `response_format={"type": "json_object"}`
- Anthropic: Via prompting
- Google: Native support
- Mistral: Native support
- Groq: Native support
- Anyscale: With schema specification
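Using the OpenAI-style parameter shown above, a request might look like the following; whether other providers accept the same keyword through OneLLM is an assumption here.

```python
import json

from onellm import Client  # hypothetical entry point, as above

client = Client()
response = client.chat.completions.create(
    model="openai/gpt-4o-mini",
    messages=[{"role": "user", "content": "Return three primary colors as a JSON array under the key 'colors'."}],
    response_format={"type": "json_object"},  # force valid JSON output
)
data = json.loads(response.choices[0].message.content)
print(data["colors"])
```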
Vision/Multimodal
Process images alongside text:
- OpenAI: GPT-4V models
- Anthropic: Claude 3 models
- Google: All Gemini models
- Azure: GPT-4V deployments
- Bedrock: Claude 3, select others
- Vertex AI: Gemini models
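For vision-capable models, image parts are typically mixed into the message content; the sketch below assumes the OpenAI-style `image_url` content format, which may differ for other providers.

```python
from onellm import Client  # hypothetical entry point, as above

client = Client()
response = client.chat.completions.create(
    model="openai/gpt-4o",  # a vision-capable model
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image in one sentence."},
            {"type": "image_url", "image_url": {"url": "https://example.com/photo.png"}},  # placeholder URL
        ],
    }],
)
print(response.choices[0].message.content)
```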
Audio Processing
- OpenAI: Whisper (transcription), TTS
- Google: Speech services
- Azure: Full audio support
- Vertex AI: Speech services
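If OneLLM mirrors the OpenAI audio endpoints, transcription would look roughly like this; the `audio.transcriptions` path and the model identifier are assumptions.

```python
from onellm import Client  # hypothetical entry point, as above

client = Client()
with open("meeting.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="openai/whisper-1",  # assumed model identifier
        file=audio_file,
    )
print(transcript.text)
```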
Embeddings
Generate text embeddings:
- OpenAI: text-embedding-3-small/large
- Google: text-embedding models
- Mistral: mistral-embed
- Cohere: embed-v3 models
- Azure: OpenAI embeddings
- Bedrock: Titan, Cohere embeddings
- Vertex AI: text-embedding models
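A minimal embeddings sketch under the same assumptions; the response shape mirrors the OpenAI embeddings API.

```python
from onellm import Client  # hypothetical entry point, as above

client = Client()
response = client.embeddings.create(
    model="openai/text-embedding-3-small",
    input=["OneLLM unifies provider APIs."],
)
vector = response.data[0].embedding
print(len(vector))  # dimensionality depends on the model
```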
Web Search
Real-time internet access:
- Perplexity: All Sonar models with “online” suffix
Local Execution
Run models on your hardware:
- Ollama: Model management included
- llama.cpp: Direct GGUF execution
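Local providers should be reachable through the same interface; the sketch below assumes an `ollama/` model prefix and a locally running Ollama server, both of which are illustrative.

```python
from onellm import Client  # hypothetical entry point, as above

client = Client()
response = client.chat.completions.create(
    model="ollama/llama3",  # assumed prefix; requires the model pulled locally
    messages=[{"role": "user", "content": "Hello from my own hardware!"}],
)
print(response.choices[0].message.content)
```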
Context Windows
Provider | Model | Max Context |
---|---|---|
Google | Gemini 1.5 Pro | 2,000,000 tokens |
Google | Gemini 1.5 Flash | 1,000,000 tokens |
Anthropic | Claude 3 | 200,000 tokens |
X.AI | Grok-2 | 128,000 tokens |
Perplexity | Sonar models | 128,000 tokens |
OpenAI | GPT-4 Turbo | 128,000 tokens |
Mistral | Large | 32,000 tokens |
Groq | Mixtral | 32,768 tokens |
Most others | - | 4,096-16,384 tokens |
Performance Characteristics
Response Speed (First Token)
- Groq: <100ms (LPU acceleration)
- Fireworks: ~200ms
- Together: ~300ms
- OpenAI: ~500ms
- Anthropic: ~800ms
- Local: Varies by hardware
Throughput (Tokens/Second)
- Groq: 300+ tokens/sec
- Fireworks: 100+ tokens/sec
- Together: 80+ tokens/sec
- OpenAI: 50-80 tokens/sec
- Anthropic: 40-60 tokens/sec
- Local (with GPU): 50-200 tokens/sec
Pricing Comparison
Input Tokens (per 1M)
- Local (Ollama/llama.cpp): $0 (your hardware)
- Anyscale: $1 flat rate
- Groq: $0.10-0.80
- Together: $0.20-4.00
- Fireworks: $0.20-0.90
- OpenRouter: Varies by model
- Mistral: $2-8
- OpenAI: $0.50-30.00
- Anthropic: $3-15
Output Tokens (per 1M)
- Local: $0
- Anyscale: $1 flat rate (same as input)
- Groq: $0.10-0.80
- Together: $0.20-4.00
- OpenAI: $1.50-60.00
- Anthropic: $15-75
Special Features
OpenAI
- DALL-E image generation
- GPT-4 with vision
- Whisper transcription
- Text-to-speech
- Fine-tuning API
Anthropic
- Constitutional AI
- 200K context window
- Careful reasoning
- XML tag support
Google
- 1M+ context window
- Native multimodal
- Multiple response candidates
- Safety settings
Perplexity
- Real-time web search
- Source citations
- Current information
- Fact checking
Groq
- Ultra-fast LPU inference
- Consistent low latency
- High throughput
- Deterministic performance
Anyscale
- Ray integration
- Schema-based JSON
- Simple flat pricing
- 30 concurrent request limit
Local Providers
- Complete privacy
- No internet required
- Custom models
- Hardware acceleration
Model Recommendations
For General Use
- OpenAI GPT-4o-mini: Best balance of cost and capability
- Anthropic Claude 3.5 Sonnet: Complex reasoning
- Google Gemini 1.5 Flash: Fast and capable
For Speed
- Groq Llama 3: Ultra-fast
- Fireworks: Optimized inference
- Local: No network latency
For Long Context
- Google Gemini 1.5: Up to 2M tokens
- Anthropic Claude: 200K tokens
- X.AI Grok: 128K tokens
For Cost
- Local models: Free
- Anyscale: Predictable pricing
- Groq: Competitive rates
For Privacy
- Ollama: Fully local
- llama.cpp: Complete control
- Azure/Vertex: Enterprise privacy
Provider Limitations
OpenAI
- Rate limits on popular models
- Higher pricing
- US-based data processing
Anthropic
- No function calling
- Limited availability
- Higher pricing
Google
- Limited function calling
- Beta features
- Region restrictions
Groq
- Limited model selection
- Context window limits
- No vision support
Local
- Requires capable hardware
- Setup complexity
- No built-in scaling
Choosing the Right Provider
Consider these factors:
- Features Needed
- Function calling → OpenAI, Mistral
- Vision → OpenAI, Anthropic, Google
- Search → Perplexity
- Local → Ollama, llama.cpp
- Performance Requirements
- Ultra-low latency → Groq
- High throughput → Groq, Local GPU
- Consistent performance → Major providers
- Budget
- Unlimited budget → OpenAI, Anthropic
- Cost-conscious → Anyscale, Groq
- Zero cost → Local providers
- Compliance
- GDPR → Mistral (EU), Azure
- HIPAA → Azure, Vertex AI
- Data residency → Local, Azure regions
Next Steps
- Provider Setup - Set up providers
- Examples - Provider-specific code
- Best Practices - Optimization tips