# IBM watsonx.ai
LiteLLM supports all IBM watsonx.ai foundational models and embeddings.
## Environment Variables

```python
import os

os.environ["WATSONX_URL"] = ""  # (required) Base URL of your WatsonX instance
# (required) either one of the following:
os.environ["WATSONX_APIKEY"] = ""  # IBM Cloud API key
os.environ["WATSONX_TOKEN"] = ""  # IAM auth token
# optional - can also be passed as params to completion() or embedding()
os.environ["WATSONX_PROJECT_ID"] = ""  # Project ID of your WatsonX instance
os.environ["WATSONX_DEPLOYMENT_SPACE_ID"] = ""  # ID of your deployment space, for calling deployed models
os.environ["WATSONX_ZENAPIKEY"] = ""  # Zen API key (use for long-term API tokens)
```

See here for more information on how to get an access token to authenticate to watsonx.ai.
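If you authenticate with an IBM Cloud API key, the IAM token mentioned above can be obtained from IBM Cloud's standard IAM token service. A sketch (requires a real API key in `$IBM_CLOUD_APIKEY`; check the IBM Cloud IAM docs for your account type):

```shell
# Exchange an IBM Cloud API key for a short-lived IAM bearer token.
# $IBM_CLOUD_APIKEY is a placeholder for your own key.
curl -X POST "https://iam.cloud.ibm.com/identity/token" \
  --header "Content-Type: application/x-www-form-urlencoded" \
  --data-urlencode "grant_type=urn:ibm:params:oauth:grant-type:apikey" \
  --data-urlencode "apikey=$IBM_CLOUD_APIKEY"
```

The `access_token` field of the JSON response can then be used as `WATSONX_TOKEN`.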
## Usage

**Chat Completion**

```python
import os
from litellm import completion

os.environ["WATSONX_URL"] = ""
os.environ["WATSONX_APIKEY"] = ""

response = completion(
    model="watsonx/meta-llama/llama-3-1-8b-instruct",
    messages=[{"role": "user", "content": "what is your favorite colour?"}],
    project_id="<my-project-id>",
)
```
## Usage - Streaming

```python
import os
from litellm import completion

os.environ["WATSONX_URL"] = ""
os.environ["WATSONX_APIKEY"] = ""
os.environ["WATSONX_PROJECT_ID"] = ""

response = completion(
    model="watsonx/meta-llama/llama-3-1-8b-instruct",
    messages=[{"role": "user", "content": "what is your favorite colour?"}],
    stream=True,
)

for chunk in response:
    print(chunk)
```
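Streamed chunks follow the OpenAI-style shape LiteLLM returns, with the incremental text in `chunk.choices[0].delta.content`. A self-contained sketch of accumulating those deltas into a full reply, using fake chunks so it runs without a live watsonx.ai call (the chunk shape is an assumption to verify against your LiteLLM version):

```python
from types import SimpleNamespace

# Illustration only: fake chunks mimicking the OpenAI-style streaming shape
# (chunk.choices[0].delta.content), so the loop can run without an API call.
def fake_chunk(text):
    return SimpleNamespace(
        choices=[SimpleNamespace(delta=SimpleNamespace(content=text))]
    )

stream = [
    fake_chunk("My favorite "),
    fake_chunk("colour is "),
    fake_chunk("blue."),
    fake_chunk(None),  # final chunks may carry no content
]

reply = ""
for chunk in stream:  # with a real call, iterate the `response` object instead
    delta = chunk.choices[0].delta.content
    if delta:
        reply += delta

print(reply)  # My favorite colour is blue.
```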
## Usage - Models in deployment spaces

Models deployed to a deployment space (e.g. tuned models) can be called using the `deployment/<deployment_id>` format.

**Deployment Space**

```python
import litellm

response = litellm.completion(
    model="watsonx/deployment/<deployment_id>",
    messages=[{"role": "user", "content": "Hello, how are you?"}],
    space_id="<deployment_space_id>",
)
```
## Usage - Embeddings

```python
from litellm import embedding

response = embedding(
    model="watsonx/ibm/slate-30m-english-rtrvr",
    input=["What is the capital of France?"],
    project_id="<my-project-id>",
)
```
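Each entry in the response's `data` list carries an `embedding` vector (e.g. `response.data[0]["embedding"]` — an assumption about LiteLLM's OpenAI-style response shape). A common next step is cosine similarity between a query vector and document vectors; a stdlib-only sketch with made-up 3-dimensional vectors (real slate embeddings are far longer):

```python
import math

# Cosine similarity: dot(a, b) / (|a| * |b|). Vectors below are toy values;
# with real responses, use response.data[i]["embedding"] instead.
def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

query_vec = [0.1, 0.2, 0.7]
doc_vec = [0.1, 0.25, 0.65]
print(cosine_similarity(query_vec, doc_vec))  # high similarity, close to 1.0
```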
## LiteLLM Proxy Usage

### 1. Save keys in your environment

```shell
export WATSONX_URL=""
export WATSONX_APIKEY=""
export WATSONX_PROJECT_ID=""
```
### 2. Start the proxy

**CLI**

```shell
$ litellm --model watsonx/meta-llama/llama-3-8b-instruct
```

**config.yaml**

```yaml
model_list:
  - model_name: llama-3-8b
    litellm_params:
      model: watsonx/meta-llama/llama-3-8b-instruct
      api_key: "os.environ/WATSONX_APIKEY"
```
### 3. Test it

**Curl Request**

```shell
curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Content-Type: application/json' \
--data '{
    "model": "llama-3-8b",
    "messages": [
        {
            "role": "user",
            "content": "what is your favorite colour?"
        }
    ]
}'
```

**OpenAI SDK**

```python
import openai

client = openai.OpenAI(
    api_key="anything",
    base_url="http://0.0.0.0:4000"
)

response = client.chat.completions.create(
    model="llama-3-8b",
    messages=[{"role": "user", "content": "what is your favorite colour?"}],
)
print(response)
```
## Supported Models

| Model Name | Command |
|---|---|
| Llama 3.1 8B Instruct | `completion(model="watsonx/meta-llama/llama-3-1-8b-instruct", messages=messages)` |
| Llama 2 70B Chat | `completion(model="watsonx/meta-llama/llama-2-70b-chat", messages=messages)` |
| Granite 13B Chat V2 | `completion(model="watsonx/ibm/granite-13b-chat-v2", messages=messages)` |
| Mixtral 8X7B Instruct | `completion(model="watsonx/ibm-mistralai/mixtral-8x7b-instruct-v01-q", messages=messages)` |

For all available models, see the watsonx.ai documentation.
## Supported Embedding Models

| Model Name | Function Call |
|---|---|
| Slate 30m | `embedding(model="watsonx/ibm/slate-30m-english-rtrvr", input=input)` |
| Slate 125m | `embedding(model="watsonx/ibm/slate-125m-english-rtrvr", input=input)` |

For all available embedding models, see the watsonx.ai embedding documentation.