# Providers
OneLLM supports 21 providers, giving you access to 300+ language models through a unified interface.
## Provider List
### 🚀 Major Providers
#### OpenAI
- Models:
  - GPT-5 family: `gpt-5`, `gpt-5-pro`, `gpt-5-mini`, `gpt-5-nano`
  - GPT-4 family: `gpt-4o`, `gpt-4o-mini`, `gpt-4-turbo`, `gpt-4`, `gpt-4-turbo-preview`
  - GPT-3.5: `gpt-3.5-turbo`, `gpt-3.5-turbo-16k`
  - O-series (reasoning): `o1`, `o1-preview`, `o1-mini`, `o3`, `o3-mini`
  - Embeddings: `text-embedding-3-small`, `text-embedding-3-large`, `text-embedding-ada-002`
- Features: Function calling, JSON mode, vision, DALL-E, embeddings (see the example below)
- Pricing: Pay per token
- Best for: General purpose, production applications
- Setup: OpenAI Setup Guide
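As a quick illustration, chat and embedding calls both go through the same client. The embeddings call below assumes OneLLM mirrors the OpenAI `embeddings.create` signature and response shape; only `chat.completions.create` is shown elsewhere in these docs:

```python
from onellm import OpenAI

client = OpenAI()

# Chat completion against an OpenAI model
response = client.chat.completions.create(
    model="openai/gpt-4o-mini",
    messages=[{"role": "user", "content": "Say hello in five words."}],
)
print(response.choices[0].message.content)

# Embeddings (assumes an OpenAI-style embeddings.create endpoint)
emb = client.embeddings.create(
    model="openai/text-embedding-3-small",
    input="OneLLM routes this to OpenAI.",
)
print(len(emb.data[0].embedding))  # vector dimensionality
```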
#### Anthropic
- Models:
  - Claude 4 family: `claude-sonnet-4.5`, `claude-opus-4.1`, `claude-sonnet-4`, `claude-opus-4`
  - Claude 3.5: `claude-3-5-sonnet-20241022`, `claude-3-5-sonnet-20240620`
  - Claude 3: `claude-3-opus-20240229`, `claude-3-sonnet-20240229`, `claude-3-haiku-20240307`
  - Legacy: `claude-2.1`, `claude-2.0`, `claude-instant-1.2`
- Features: 200K+ context, vision support
- Pricing: Pay per token
- Best for: Long context, careful reasoning
- Setup: Anthropic Setup Guide
#### Google AI Studio
- Models:
  - Gemini 2.5: `gemini-2.5-pro`, `gemini-2.5-flash`, `gemini-2.5-flash-lite`, `gemini-2.5-flash-image`
  - Gemini 1.5: `gemini-1.5-pro`, `gemini-1.5-pro-latest`, `gemini-1.5-flash`, `gemini-1.5-flash-latest`
  - Gemini 1.0: `gemini-pro`, `gemini-pro-vision`
  - Embeddings: `text-embedding-004`, `embedding-001`
- Features: Multimodal, 1M+ context, JSON mode
- Pricing: Free tier available
- Best for: Multimodal tasks, long context
- Setup: Google Setup Guide
#### Mistral
- Models:
  - Latest: `mistral-large-latest`, `mistral-medium-latest`, `mistral-small-latest`
  - Specialized: `codestral` (code), `pixtral` (vision), `devstral` (development), `voxtral` (voice), `ministral` (lightweight)
  - Mixtral: `mixtral-8x7b`, `mixtral-8x22b`
  - Legacy: `mistral-tiny`, `open-mistral-7b`
- Features: European hosting, function calling
- Pricing: Pay per token
- Best for: EU compliance, multilingual
- Setup: Mistral Setup Guide
### ⚡ Fast Inference Providers
#### Groq
- Models:
  - Llama 3: `llama3-70b-8192`, `llama3-8b-8192`, `llama-3.1-70b-versatile`, `llama-3.1-8b-instant`
  - Mixtral: `mixtral-8x7b-32768`
  - Gemma: `gemma-7b-it`, `gemma2-9b-it`
  - Llama Guard: `llama-guard-3-8b` (content moderation)
- Features: Ultra-fast LPU inference, up to 10x faster than typical GPU serving (see the streaming sketch below)
- Pricing: Pay per token
- Best for: Real-time applications, low latency
- Setup: Groq Setup Guide
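Groq's low latency pairs naturally with streaming. A minimal sketch, assuming OneLLM supports the OpenAI-style `stream=True` flag and chunk format:

```python
from onellm import OpenAI

client = OpenAI()

# Stream tokens as they arrive (assumes OpenAI-style streaming chunks)
stream = client.chat.completions.create(
    model="groq/llama3-70b-8192",
    messages=[{"role": "user", "content": "Explain LPUs in two sentences."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```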
#### Together AI
- Models:
  - Llama: `meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo`, `meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo`
  - Mixtral: `mistralai/Mixtral-8x7B-Instruct-v0.1`, `mistralai/Mixtral-8x22B-Instruct-v0.1`
  - Qwen: `Qwen/Qwen2.5-72B-Instruct-Turbo`, `Qwen/Qwen2.5-7B-Instruct-Turbo`
  - DeepSeek: `deepseek-ai/deepseek-llm-67b-chat`
  - CodeLlama: `codellama/CodeLlama-34b-Instruct-hf`
  - 50+ other open-source models
- Features: Open source models, custom fine-tunes
- Pricing: Simple per-token pricing
- Best for: Open source models, research
- Setup: Together Setup Guide
#### Fireworks
- Models:
  - Llama: `accounts/fireworks/models/llama-v3p1-70b-instruct`, `accounts/fireworks/models/llama-v3p1-8b-instruct`
  - Mixtral: `accounts/fireworks/models/mixtral-8x7b-instruct`, `accounts/fireworks/models/mixtral-8x22b-instruct`
  - Qwen: `accounts/fireworks/models/qwen2p5-72b-instruct`
  - DeepSeek: `accounts/fireworks/models/deepseek-v3`
  - StarCoder: `accounts/fireworks/models/starcoder-16b`
- Features: Optimized inference, function calling
- Pricing: Competitive per-token
- Best for: Fast open model serving
- Setup: Fireworks Setup Guide
#### Anyscale
- Models:
  - Llama: `meta-llama/Meta-Llama-3.1-70B-Instruct`, `meta-llama/Meta-Llama-3.1-8B-Instruct`
  - Mixtral: `mistralai/Mixtral-8x7B-Instruct-v0.1`
  - Qwen: `Qwen/Qwen2.5-72B-Instruct`
  - Gemma: `google/gemma-2-9b-it`
- Features: Ray integration, schema-based JSON output (see the sketch below)
- Pricing: $1/million tokens flat rate
- Best for: Scale-out workloads
- Setup: Anyscale Setup Guide
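A sketch of schema-based JSON output, assuming OneLLM passes `response_format` through to Anyscale and that the `anyscale/` prefix follows the same convention as the other providers (it is not listed in the naming table below):

```python
import json

from onellm import OpenAI

client = OpenAI()

# Schema-constrained JSON (the "schema" field is Anyscale's extension
# to the OpenAI-style response_format parameter)
response = client.chat.completions.create(
    model="anyscale/meta-llama/Meta-Llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "Name a city and its population."}],
    response_format={
        "type": "json_object",
        "schema": {
            "type": "object",
            "properties": {
                "city": {"type": "string"},
                "population": {"type": "integer"},
            },
            "required": ["city", "population"],
        },
    },
)
print(json.loads(response.choices[0].message.content))
```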
### 🌐 Specialized Providers
#### X.AI (Grok)
- Models:
  - Latest: `grok-2-latest`, `grok-2-1212`, `grok-2-vision-1212`
  - Grok 2: `grok-2-public`, `grok-2-mini`
  - Legacy: `grok-1`, `grok-beta`
- Features: 128K context window
- Pricing: Premium
- Best for: Large context, reasoning
- Setup: X.AI Setup Guide
#### Perplexity
- Models:
  - Sonar (online): `llama-3.1-sonar-small-128k-online`, `llama-3.1-sonar-large-128k-online`, `llama-3.1-sonar-huge-128k-online`
  - Sonar (chat): `llama-3.1-sonar-small-128k-chat`, `llama-3.1-sonar-large-128k-chat`
  - Sonar Pro: `sonar-pro` (advanced search)
- Features: Real-time web access, citations (see the example below)
- Pricing: Pay per request
- Best for: Current information, research
- Setup: Perplexity Setup Guide
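Because the online Sonar models search the web at request time, a plain chat call can answer current-events questions. A minimal sketch, assuming the provider prefix is `perplexity/` (it is not listed in the naming table below):

```python
from onellm import OpenAI

client = OpenAI()

# Online Sonar models fetch live web results before answering
response = client.chat.completions.create(
    model="perplexity/llama-3.1-sonar-small-128k-online",
    messages=[{"role": "user", "content": "What changed in AI news this week?"}],
)
print(response.choices[0].message.content)
```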
#### DeepSeek
- Models:
  - Latest: `deepseek-chat`, `deepseek-reasoner`
  - Specialized: `deepseek-coder` (coding tasks)
  - Legacy: `deepseek-llm-67b-chat`
- Features: Chinese/English bilingual
- Pricing: Competitive
- Best for: Chinese language, coding
- Setup: DeepSeek Setup Guide
#### Moonshot
- Models:
  - Kimi: `moonshot-v1-8k`, `moonshot-v1-32k`, `moonshot-v1-128k`
  - Latest: `kimi-k2-0711-preview` (preview)
  - Vision: `kimi-vl` (multimodal)
  - Audio: `kimi-audio` (voice input)
- Features: Long-context (200K+ tokens), Chinese/English bilingual, vision support
- Pricing: Cost-effective (~5x cheaper than Claude/Gemini)
- Best for: Long-context processing, Chinese language, document analysis
- Setup: Moonshot Setup Guide
#### GLM (Zhipu AI)
- Models:
  - GLM-4: `glm-4`, `glm-4-plus`, `glm-4-air`, `glm-4-flash`
  - GLM-4V: `glm-4v` (vision support)
  - Legacy: `glm-3-turbo`
- Features: Chinese/English bilingual, streaming, function calling, vision
- Pricing: Competitive
- Best for: Chinese language tasks, cost-effective inference
- Setup: GLM Setup Guide
#### Cohere
- Models:
  - Command: `command-r-plus`, `command-r`, `command`, `command-light`
  - Embeddings: `embed-english-v3.0`, `embed-multilingual-v3.0`, `embed-english-light-v3.0`
- Features: RAG optimization, embeddings
- Pricing: Enterprise/startup plans
- Best for: Enterprise NLP, search
- Setup: Cohere Setup Guide
### 🌍 Multi-Provider Gateways
#### OpenRouter
- Models:
  - Access 100+ models using the `openrouter/{provider}/{model}` format (see the example below)
  - Free models: `meta-llama/llama-3.2-3b-instruct:free`, `google/gemma-2-9b-it:free`
  - Premium: `anthropic/claude-3.5-sonnet`, `openai/gpt-4o`, `google/gemini-2.5-pro-exp`
- Features: Unified billing, free models
- Pricing: Small markup on provider prices
- Best for: Model exploration, fallbacks
- Setup: OpenRouter Setup Guide
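In practice the `openrouter/{provider}/{model}` format looks like this; the `:free` variant makes it a cheap way to smoke-test a pipeline:

```python
from onellm import OpenAI

client = OpenAI()

# OpenRouter model IDs embed the upstream provider in the path
response = client.chat.completions.create(
    model="openrouter/meta-llama/llama-3.2-3b-instruct:free",
    messages=[{"role": "user", "content": "Hello from OpenRouter!"}],
)
print(response.choices[0].message.content)
```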
#### Vercel AI Gateway
- Models:
  - Access 100+ models using the `vercel/{provider}/{model}` format
  - OpenAI: `vercel/openai/gpt-4o-mini`, `vercel/openai/gpt-4o`
  - Anthropic: `vercel/anthropic/claude-sonnet-4`, `vercel/anthropic/claude-opus-4`
  - Google: `vercel/google/gemini-2.5-pro`, `vercel/google/gemini-2.5-flash`
  - Meta: `vercel/meta/llama-3.1-70b-instruct`
  - Many more providers and models
- Features: Unified billing, streaming, function calling, vision
- Pricing: Provider passthrough with optional markup
- Best for: Production deployments, unified billing
- Setup: Vercel Setup Guide
### ☁️ Enterprise Cloud
#### Azure OpenAI
- Models:
  - GPT-4: `gpt-4`, `gpt-4-turbo`, `gpt-4o`, `gpt-4o-mini`
  - GPT-3.5: `gpt-35-turbo`, `gpt-35-turbo-16k`
  - Embeddings: `text-embedding-3-small`, `text-embedding-3-large`, `text-embedding-ada-002`
  - DALL-E: `dall-e-3`, `dall-e-2`
- Features: Enterprise SLA, VNet integration
- Pricing: Same as OpenAI
- Best for: Enterprise, compliance
- Setup: Azure Setup Guide
#### AWS Bedrock
- Models:
  - Anthropic: `anthropic.claude-3-5-sonnet-20241022-v2:0`, `anthropic.claude-3-opus-20240229-v1:0`
  - Meta: `meta.llama3-1-70b-instruct-v1:0`, `meta.llama3-1-8b-instruct-v1:0`
  - Amazon: `amazon.titan-text-premier-v1:0`, `amazon.titan-embed-text-v2:0`
  - Cohere: `cohere.command-r-plus-v1:0`, `cohere.embed-english-v3`
  - Mistral: `mistral.mistral-large-2407-v1:0`
- Features: AWS integration, multiple providers
- Pricing: Pay per use
- Best for: AWS ecosystem
- Setup: Bedrock Setup Guide
#### Google Vertex AI
- Models:
  - Gemini: `gemini-2.5-pro`, `gemini-2.5-flash`, `gemini-1.5-pro`, `gemini-1.5-flash`
  - Legacy: `gemini-pro`, `gemini-pro-vision`
  - Embeddings: `text-embedding-004`, `textembedding-gecko@003`
- Features: MLOps platform, enterprise
- Pricing: Enterprise pricing
- Best for: GCP ecosystem
- Setup: Vertex AI Setup Guide
### 💻 Local Providers
#### Ollama
- Models:
  - Popular: `llama3`, `llama3.1`, `mistral`, `mixtral`, `gemma2`, `qwen2.5`
  - Code: `codellama`, `deepseek-coder-v2`, `starcoder2`
  - Vision: `llava`, `llava-phi3`, `bakllava`
  - Specialized: `dolphin-mixtral`, `wizardlm2`, `phi3`
  - Any model from ollama.com/library
- Features: Local hosting, model management
- Pricing: Free (self-hosted)
- Best for: Privacy, offline use
- Setup: Ollama Setup Guide
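Per the naming convention below, an `@host:port` suffix targets a specific Ollama server:

```python
from onellm import OpenAI

client = OpenAI()

# The @host:port suffix (see the naming table below) selects the Ollama server
response = client.chat.completions.create(
    model="ollama/llama3:8b@localhost:11434",
    messages=[{"role": "user", "content": "Hello from a local model!"}],
)
print(response.choices[0].message.content)
```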
#### llama.cpp
- Models:
  - Any GGUF model from HuggingFace
  - Llama: `llama-3-8b-instruct.Q4_K_M.gguf`, `llama-3.1-70b-instruct.Q4_K_M.gguf`
  - Mistral: `mistral-7b-instruct.Q4_K_M.gguf`
  - Quantization levels: Q4_K_M (recommended), Q5_K_M, Q8_0, etc.
  - Use `onellm download <model>` to fetch models (see the sketch below)
- Features: Direct inference, GPU support
- Pricing: Free (self-hosted)
- Best for: Maximum control, embedded
- Setup: llama.cpp Setup Guide
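A typical flow: fetch a quantized GGUF with the bundled CLI, then reference it by path. The specific model name below is illustrative:

```python
# First, fetch a model from the shell (the argument shown is illustrative):
#   onellm download llama-3-8b-instruct.Q4_K_M.gguf
from onellm import OpenAI

client = OpenAI()

# Reference the downloaded GGUF file by path
response = client.chat.completions.create(
    model="llama_cpp/models/llama-3-8b-instruct.Q4_K_M.gguf",
    messages=[{"role": "user", "content": "Hello, offline world!"}],
)
print(response.choices[0].message.content)
```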
## Provider Comparison
### By Speed
- Groq - Ultra-fast LPU (100+ tokens/sec)
- Fireworks - Optimized inference
- Together - Fast parallel inference
- OpenAI - Reliable performance
- Local - Depends on hardware
### By Context Length
- Google Gemini 1.5 - 1M+ tokens
- Moonshot Kimi - 200K tokens
- Anthropic Claude - 200K tokens
- X.AI Grok - 128K tokens
- Perplexity - 128K tokens
- OpenAI GPT-4 - 128K tokens
### By Price (Lowest to Highest)
- Local (Ollama/llama.cpp) - Free
- Anyscale - $1/M tokens flat
- Together/Fireworks - Competitive
- OpenRouter - Various options
- OpenAI/Anthropic - Premium
### By Features
- Function Calling: OpenAI, Mistral, Groq, Anyscale, Moonshot
- Vision: OpenAI, Anthropic, Google, Vertex AI, Moonshot
- Web Search: Perplexity
- JSON Mode: OpenAI, Google, Mistral, Groq, Moonshot
- Embeddings: OpenAI, Cohere, Google, Bedrock
## Model Naming Convention
Models are specified using a provider prefix to clearly identify the source:
| Provider | Format | Example |
|---|---|---|
| OpenAI | `openai/{model}` | `openai/gpt-4` |
| Google | `google/{model}` | `google/gemini-pro` |
| Anthropic | `anthropic/{model}` | `anthropic/claude-3-opus` |
| Groq | `groq/{model}` | `groq/llama3-70b` |
| Mistral | `mistral/{model}` | `mistral/mistral-large` |
| Ollama | `ollama/{model}@host:port` | `ollama/llama3:8b@localhost:11434` |
| llama.cpp | `llama_cpp/{model.gguf}` | `llama_cpp/llama-3-8b-q4_K_M.gguf` |
| X.AI (Grok) | `xai/{model}` | `xai/grok-beta` |
| Cohere | `cohere/{model}` | `cohere/command-r-plus` |
| AWS Bedrock | `bedrock/{model}` | `bedrock/claude-3-5-sonnet` |
| Moonshot | `moonshot/{model}` | `moonshot/moonshot-v1-8k` |
### Additional Examples
```python
# Standard models
"openai/gpt-4o-mini"
"anthropic/claude-3-5-sonnet-20241022"
"google/gemini-1.5-flash"
"groq/llama3-70b-8192"
"moonshot/moonshot-v1-8k"

# Models with organization prefixes
"together/meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo"
"fireworks/accounts/fireworks/models/llama-v3p1-70b-instruct"

# Local models
"ollama/llama3:latest"
"llama_cpp/models/llama-3-8b-instruct.Q4_K_M.gguf"
```
## Quick Start
```python
from onellm import OpenAI

# The same client works with all providers
client = OpenAI()

# Use any provider by changing the model name
response = client.chat.completions.create(
    model="anthropic/claude-3-5-sonnet-20241022",  # Just change this
    messages=[{"role": "user", "content": "Hello!"}],
)
```
## Choosing a Provider
### For Production
- OpenAI: Most reliable, best ecosystem
- Anthropic: Best for complex reasoning
- Azure OpenAI: Enterprise requirements
### For Speed
- Groq: Ultra-fast responses
- Fireworks: Fast and affordable
- Local: No network latency
### For Cost
- Local: Free (your hardware)
- Anyscale: Predictable pricing
- OpenRouter: Access to free models
### For Privacy
- Ollama: Fully local
- llama.cpp: Complete control
- Azure/Vertex: Enterprise privacy
## Next Steps
- Provider Setup - Detailed setup instructions
- Provider Capabilities - Feature comparison matrix