AWS Bedrock Provider
The AWS Bedrock provider enables access to foundation models from multiple providers (Anthropic, Meta, Mistral, Amazon, AI21 Labs, Cohere) through AWS’s fully managed service.
Installation
# Install OneLLM with Bedrock support
pip install "onellm[bedrock]"
# Or install boto3 separately
pip install boto3
Configuration
AWS Credentials
The Bedrock provider supports multiple authentication methods:
- AWS CLI Configuration (Recommended)
aws configure # Enter your AWS Access Key ID, Secret Access Key, and region
- Environment Variables
export AWS_ACCESS_KEY_ID="your-access-key"
export AWS_SECRET_ACCESS_KEY="your-secret-key"
export AWS_DEFAULT_REGION="us-east-1"
- AWS Profile
from onellm import Client

client = Client()  # Uses profile from bedrock.json or default profile
- IAM Role (for EC2/Lambda/ECS): automatically uses the instance/task role
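Whichever method you use, it can help to confirm which identity and region the credential chain actually resolves to before making any Bedrock calls. A quick sanity check with plain boto3 (independent of OneLLM):

import boto3

# Confirm which AWS identity and region the credential chain resolves to.
# boto3 follows the same lookup order: env vars, profile, instance/task role.
session = boto3.Session()
print("Region:", session.region_name)
print("Identity:", boto3.client("sts").get_caller_identity()["Arn"])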
bedrock.json Configuration
Create a bedrock.json file in your project root:
{
"profile": "bedrock",
"region": "us-east-1"
}
Required IAM Permissions
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"bedrock:InvokeModel",
"bedrock:InvokeModelWithResponseStream",
"bedrock:Converse",
"bedrock:ConverseStream"
],
"Resource": "*"
}
]
}
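If you manage roles in code, the same policy can be attached as an inline role policy with boto3. A minimal sketch; "MyBedrockRole" and "BedrockInvokeAccess" are placeholder names for your own role and policy:

import json

import boto3

# Sketch: attach the policy above inline to an existing IAM role.
# "MyBedrockRole" and "BedrockInvokeAccess" are placeholder names.
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": [
            "bedrock:InvokeModel",
            "bedrock:InvokeModelWithResponseStream",
            "bedrock:Converse",
            "bedrock:ConverseStream",
        ],
        "Resource": "*",
    }],
}
boto3.client("iam").put_role_policy(
    RoleName="MyBedrockRole",
    PolicyName="BedrockInvokeAccess",
    PolicyDocument=json.dumps(policy),
)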
Model Access
Important: AWS Bedrock requires explicit model access. You must request access to models in the AWS Console:
- Navigate to Amazon Bedrock in AWS Console
- Go to “Model access”
- Request access to desired models
- Wait for approval (usually instant for most models)
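You can list the models your region exposes with Bedrock's control-plane API; note that a model appearing in the list does not mean invocation access has been granted yet. A short boto3 sketch (us-east-1 is just an example region):

import boto3

# List the foundation model IDs exposed in a region.
# A model being listed does not mean invocation access has been granted.
bedrock = boto3.client("bedrock", region_name="us-east-1")
for model in bedrock.list_foundation_models()["modelSummaries"]:
    print(model["modelId"])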
Usage Examples
Basic Chat Completion
import asyncio
from onellm import Client

client = Client()

async def main():
    response = await client.chat.completions.create(
        model="bedrock/claude-3-5-sonnet",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Explain quantum computing in simple terms."}
        ],
        max_tokens=200,
        temperature=0.7
    )
    print(response.choices[0].message.content)

asyncio.run(main())
Streaming Response
async def stream_example():
    stream = await client.chat.completions.create(
        model="bedrock/claude-3-haiku",  # Faster model for streaming
        messages=[{"role": "user", "content": "Write a story about a robot."}],
        max_tokens=500,
        stream=True
    )
    async for chunk in stream:
        if chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="", flush=True)
Multi-Modal (Vision)
import base64

async def vision_example():
    # Read and encode image
    with open("image.jpg", "rb") as f:
        image_data = base64.b64encode(f.read()).decode()

    response = await client.chat.completions.create(
        model="bedrock/claude-3-5-sonnet",  # or nova-pro
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{image_data}"
                    }
                }
            ]
        }],
        max_tokens=300
    )
    print(response.choices[0].message.content)
Embeddings
async def embedding_example():
    response = await client.embeddings.create(
        model="bedrock/titan-embed-text-v2",
        input="The quick brown fox jumps over the lazy dog."
    )
    print(f"Embedding dimension: {len(response.data[0].embedding)}")
    print(f"First 5 values: {response.data[0].embedding[:5]}")
Supported Models
Chat Models
Anthropic Claude
- bedrock/claude-3-5-sonnet - Latest, most capable
- bedrock/claude-3-5-haiku - Fast, efficient
- bedrock/claude-3-opus - Most powerful (legacy)
- bedrock/claude-3-sonnet - Balanced
- bedrock/claude-3-haiku - Fastest
Meta Llama
- bedrock/llama3-2-90b - Large multimodal
- bedrock/llama3-2-11b - Medium multimodal
- bedrock/llama3-2-3b - Small, efficient
- bedrock/llama3-2-1b - Tiny edge model
- bedrock/llama3-1-405b - Largest model
- bedrock/llama3-1-70b - Large model
- bedrock/llama3-1-8b - Medium model
Amazon Nova
- bedrock/nova-pro - Multimodal reasoning
- bedrock/nova-lite - Fast, cost-effective
- bedrock/nova-micro - Ultra-fast responses
Mistral
- bedrock/mistral-7b - Efficient model
- bedrock/mixtral-8x7b - Mixture of experts
- bedrock/mistral-large - Most capable
Others
- bedrock/command-r - Cohere Command R
- bedrock/command-r-plus - Cohere Command R+
- bedrock/jamba-1-5-large - AI21 Jamba Large
- bedrock/jamba-1-5-mini - AI21 Jamba Mini
Embedding Models
- bedrock/titan-embed-text - Amazon Titan Embeddings v1
- bedrock/titan-embed-text-v2 - Amazon Titan Embeddings v2
- bedrock/embed-english - Cohere English embeddings
- bedrock/embed-multilingual - Cohere multilingual embeddings
Advanced Features
Cross-Region Inference
Bedrock can automatically route requests to the best available region:
response = await client.chat.completions.create(
    model="bedrock/claude-3-5-sonnet",
    messages=[{"role": "user", "content": "Hello"}],
    # Cross-region inference is enabled by default
)
Using Different AWS Regions
# Method 1: Via bedrock.json
{
    "profile": "default",
    "region": "eu-west-1"
}

# Method 2: Environment variable
export AWS_DEFAULT_REGION="eu-west-1"

# Method 3: When initializing the provider (requires custom client setup)
Using Full Model IDs
You can also use full Bedrock model IDs:
response = await client.chat.completions.create(
    model="bedrock/anthropic.claude-3-5-sonnet-20241022-v2:0",
    messages=[{"role": "user", "content": "Hello"}]
)
Cost Optimization
- Choose the right model size - Use smaller models when possible
- Use streaming - Get responses faster and stop generation early if needed
- Batch requests - Bedrock supports batch processing for a 50% cost reduction (see the sketch after this list)
- Monitor usage - Use CloudWatch to track token usage
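Batch processing is not exposed through the OneLLM client; it goes through Bedrock's CreateModelInvocationJob API directly. A rough boto3 sketch, where the job name, model ID, role ARN, and S3 URIs are placeholders you must replace with your own:

import boto3

# Rough sketch of Bedrock batch inference via boto3 (not part of OneLLM).
# jobName, modelId, roleArn, and the S3 URIs below are placeholders.
bedrock = boto3.client("bedrock", region_name="us-east-1")
job = bedrock.create_model_invocation_job(
    jobName="example-batch-job",
    modelId="anthropic.claude-3-5-haiku-20241022-v1:0",
    roleArn="arn:aws:iam::123456789012:role/BedrockBatchRole",
    inputDataConfig={"s3InputDataConfig": {"s3Uri": "s3://my-bucket/input/"}},
    outputDataConfig={"s3OutputDataConfig": {"s3Uri": "s3://my-bucket/output/"}},
)
print(job["jobArn"])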
Common Issues
Model Access Denied
Error: Model access denied: The requested model anthropic.claude-3-opus-20240229-v1:0 is not supported for inference in your account.
Solution: Request model access in the AWS Bedrock console
Rate Limits
Error: Too many requests, please try again later.
Solution: Implement retry logic or request a quota increase
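One way to implement retry logic is exponential backoff with jitter. A minimal sketch; the with_retries helper is our own, and in practice you would narrow the except clause to the specific throttling error your client raises:

import asyncio
import random

async def with_retries(make_call, max_attempts=5):
    # Retry an awaitable-producing callable with exponential backoff and jitter.
    for attempt in range(max_attempts):
        try:
            return await make_call()
        except Exception:  # narrow to the specific throttling error in practice
            if attempt == max_attempts - 1:
                raise
            await asyncio.sleep(2 ** attempt + random.random())

# Usage (with the client from the examples above):
# response = await with_retries(lambda: client.chat.completions.create(...))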
Region Availability
Not all models are available in all regions. Check AWS documentation for model availability by region.
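You can also check availability programmatically by looking a model ID up in a region's catalog. A small sketch using boto3; model_available is our own helper:

import boto3

def model_available(model_id: str, region: str) -> bool:
    # True if the model ID appears in the given region's catalog.
    client = boto3.client("bedrock", region_name=region)
    summaries = client.list_foundation_models()["modelSummaries"]
    return model_id in {m["modelId"] for m in summaries}

print(model_available("anthropic.claude-3-5-sonnet-20241022-v2:0", "eu-west-1"))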
Limitations
- No file upload/download support (images must be base64-encoded in messages)
- No explicit JSON mode (use prompt engineering or tool calling; see the sketch after this list)
- Audio/video input not currently supported
- Model-specific features may vary
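In the absence of a JSON mode, a common workaround is to instruct the model to return only JSON and parse the reply. A hedged sketch; the model choice and prompt are just examples, and the client is the one created in the earlier examples:

import json

async def structured_output_example():
    # Reuses the client from the examples above; model and prompt are examples.
    response = await client.chat.completions.create(
        model="bedrock/claude-3-5-haiku",
        messages=[{
            "role": "user",
            "content": "Reply with ONLY a JSON object with keys 'city' and 'country' describing Paris."
        }],
        max_tokens=100
    )
    # json.loads raises ValueError if the model wraps the JSON in prose.
    data = json.loads(response.choices[0].message.content)
    print(data["country"])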
Best Practices
- Use the Converse API - The provider uses Bedrock’s unified Converse API for consistency
- Handle errors gracefully - Implement retry logic for transient errors
- Monitor costs - Set up CloudWatch alarms for usage
- Choose appropriate models - Balance capability vs. cost
- Request model access early - Some models require manual approval