anaconda-ai#

Download, launch, and integrate AI models curated by Anaconda. This package provides a CLI and an SDK to list the curated models, download them, and start servers.

Below you will find documentation for installation, configuration, model naming, the CLI, the SDK, and integrations with LLM, LangChain, LlamaIndex, LiteLLM, DSPy, PydanticAI, Instructor, and Panel.

Install#

conda install -c anaconda-cloud anaconda-ai

Backends#

The anaconda-ai package is the CLI/SDK for a number of backends that provide API endpoints to list and download models and to manage running servers. All activities performed by the CLI, SDK, and the integrations documented here are visible within the backend application or site.

The available backends are:

| Backend name | Configuration value | Supports | Default |
|---|---|---|---|
| Anaconda AI Navigator | "ai-navigator" | Models, Servers, Server Parameters, VectorDB | DEFAULT |
| Anaconda Desktop (beta) | "anaconda-desktop" | Models, Servers, Server Parameters, VectorDB | |
| Anaconda AI Catalyst (beta) | "ai-catalyst" | Models, Servers | |

Configuration#

Anaconda AI supports configuration management in the ~/.anaconda/config.toml file. The following parameters are supported under the [plugin.ai] table or by setting ANACONDA_AI_<parameter>=<value> environment variables.

| Parameter | Environment variable | Description | Default value |
|---|---|---|---|
| backend | ANACONDA_AI_BACKEND | The backend API | "ai-navigator" |
| stop_server_on_exit | ANACONDA_AI_STOP_SERVER_ON_EXIT | Stop any server started during a Python interpreter session when the interpreter exits. Does not affect servers that were already running | true |
| server_operations_timeout | ANACONDA_AI_SERVER_OPERATIONS_TIMEOUT | Timeout waiting for a server to start or stop | 30 |
| show_blocked_models | ANACONDA_AI_SHOW_BLOCKED_MODELS | Toggle display of blocked models if the backend supports it | false |
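
For example, a minimal ~/.anaconda/config.toml that spells out the defaults shown above:

[plugin.ai]
backend = "ai-navigator"
stop_server_on_exit = true
server_operations_timeout = 30
show_blocked_models = false

Any of these values can equivalently be set with the corresponding environment variable, for example ANACONDA_AI_BACKEND="anaconda-desktop".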

Configuration CLI#

Use the anaconda ai config command to apply changes to ~/.anaconda/config.toml. See anaconda ai config --help for details.

Declaring model quantization files#

In the CLI, SDK, and integrations below, individual model quantizations are referenced according to the following scheme.

[<author>/]<model_name></ or _><quantization>[.<format>]

Fields surrounded by [] are optional. The essential elements are the model name and the quantization method, separated by either / or _. The supported quantization methods are:

  • Q4_K_M

  • Q5_K_M

  • Q6_K

  • Q8_0
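
For example, the following references are all valid under this scheme (each appears in the examples later on this page):

OpenHermes-2.5-Mistral-7B/Q4_K_M
meta-llama/llama-2-7b-chat-hf_Q4_K_M.gguf
OpenHermes-2.5-Mistral-7B_q4_k_m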

CLI#

The CLI subcommands within anaconda ai provide full access to list and download model files and to start and stop servers through the backend.

| Command | Description |
|---|---|
| models | Show all models, or detailed information about a single model with downloaded model files indicated in bold |
| download | Download a model file by model name and quantization |
| launch | Launch a server for a model file |
| servers | Show all running servers, or detailed information about a single server |
| stop | Stop a running server by id |
| launch-vectordb | Start a Postgres vector database (not supported by all backends) |
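
For example, a typical session might look like the following sketch; the exact argument forms for each subcommand are described in its --help.

anaconda ai models
anaconda ai download OpenHermes-2.5-Mistral-7B/Q4_K_M
anaconda ai launch OpenHermes-2.5-Mistral-7B/Q4_K_M
anaconda ai servers
anaconda ai stop <server-id>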

See the --help for each command for more details.

SDK#

The SDK actions are initiated by creating a client connection to the backend.

from anaconda_ai import AnacondaAIClient

client = AnacondaAIClient()

The client provides two top-level accessors .models and .servers.

Models#

The .models attribute provides actions to list available models and download specific quantization files.

| Method | Return | Description |
|---|---|---|
| .list() | List[ModelSummary] | List all available and downloaded models |
| .get('<model-name>') | Model | Retrieve metadata about a model |
| .download('<model>/<quantization>') | None | Download a model quantization file |
| .delete('<model>/<quantization>') | None | Delete a downloaded model quantization file |
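
For example, a minimal sketch that lists the models and inspects one. The .name attribute on ModelSummary is an assumption here, mirroring the Model attribute documented below.

from anaconda_ai import AnacondaAIClient

client = AnacondaAIClient()

# list every available or downloaded model
for summary in client.models.list():
    print(summary.name)  # assumption: ModelSummary exposes .name like Model

# retrieve full metadata for a single model
model = client.models.get('OpenHermes-2.5-Mistral-7B')
print(model.trained_for, model.context_window_size)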

The Model class holds metadata for each available model:

| Attribute/Method | Return | Description |
|---|---|---|
| .name | str | The name of the model |
| .description | str | Description of the model provided by the original author |
| .num_parameters | int | Number of parameters for the model |
| .trained_for | str | Either 'sentence-similarity' or 'text-generation' |
| .context_window_size | int | Length of the context window for the model |
| .quantized_files | List[ModelQuantization] | List of available quantization files |
| .get_quantization('<method>') | ModelQuantization | Retrieve metadata for a single quantization file |
| .download('<method>') | None | Direct call to download a quantization file |
| .delete('<method>') | None | Delete a downloaded quantization file |

Each ModelQuantization object provides:

| Attribute/Method | Return | Description |
|---|---|---|
| .identifier | str | The file name as it will appear on disk |
| .sha256 | str | The sha256 checksum of the model file |
| .quant_method | str | The quantization method |
| .size_bytes | int | Size of the model file in bytes |
| .max_ram_usage | int | The total amount of RAM, in bytes, needed to load the model |
| .is_downloaded | bool | True if the model file has been downloaded |
| .local_path | str | Path to the file on disk; non-null only if the model file has been downloaded |
| .download() | None | Direct call to download the quantization file |
| .delete() | None | Delete the downloaded quantization file |
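
A short sketch using these attributes, continuing with the client from above:

# inspect every quantization file published for a model
model = client.models.get('OpenHermes-2.5-Mistral-7B')
for quant in model.quantized_files:
    print(quant.quant_method, quant.size_bytes, quant.is_downloaded)

# download one specific quantization only if it is not already on disk
q4 = model.get_quantization('Q4_K_M')
if not q4.is_downloaded:
    q4.download()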

Downloading models#

There are three ways to download a quantization file:

  1. Calling .download() from a ModelQuantization object

    • For example: client.models.get('<model>').get_quantization('<method>').download()

  2. Calling .download('<method>') from a Model object

    • For example: client.models.get('<model>').download('<method>')

  3. Calling client.models.download('<quantized-file-name>')

    • The .models.download() method accepts two types of input: the string name of the model with quantization, or a ModelQuantization object

If the model file has already been downloaded this function returns immediately. Otherwise a progress bar displays the download progress.

Servers#

The .servers accessor provides methods to list running servers, start new servers, and stop servers.

| Method | Return | Description |
|---|---|---|
| .list() | List[Server] | List all running servers |
| .get('<server-id>') | Server | Look up a server object by identifier |
| .match(...) | Server | Find a running server that matches the supplied configuration |
| .create(...) | Server | Create a new server configuration with the supplied model file and API parameters |
| .start('<server-id>') | None | Start the API server |
| .status('<server-id>') | str | Return the status for a server id |
| .stop('<server-id>') | None | Stop a running server |
| .delete('<server-id>') | None | Completely remove the record of a server configuration |
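
For example, a short sketch that prints the status of each running server and stops it. As shown in the sections below, the accessor methods also accept the Server object itself in place of an id string.

for server in client.servers.list():
    print(server.status)
    client.servers.stop(server)  # an id string also works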

Creating servers#

The .create method will create a new server configuration. If there is already a running server with the same model file and API parameters, the matched server configuration is returned rather than creating and starting a new one.

The .create method accepts the following arguments:

| Argument | Type | Description |
|---|---|---|
| model | str or ModelQuantization | The string name for the quantized model or a ModelQuantization object |
| extra_options | dict | Server configuration options supported by the backend |

By default, creating a server configuration will:

  • download the model file if required by the backend

  • run the server API

For example, to create a server for the OpenHermes model with default values:

from anaconda_ai import get_default_client

client = get_default_client()
server = client.servers.create(
  'OpenHermes-2.5-Mistral-7B/Q4_K_M',
)

Starting servers#

When a server is created it is not automatically started. A server can be started and stopped in a number of ways.

From the server object:

server.start()
server.stop()

From the .servers accessor:

client.servers.start(server)
client.servers.stop(server)

Alternatively, you can use .create as a context manager, which will automatically stop the server when the block exits.

with client.servers.create('OpenHermes-2.5-Mistral-7B/Q4_K_M') as server:
    openai_client = server.openai_client()
    # make requests to the server

Server attributes#

  • .status: Text status of the server

  • .is_running: Boolean status; True if the server is in the 'running' state

  • .start(): Start the server; can optionally be used as a context manager to automatically stop the server

  • .stop(): Stop the server

  • .url: The full URL of the running server

  • .openai_url: The OpenAI-compatible URL

  • .openai_client(): Creates a pre-configured OpenAI client for this URL

  • .async_openai_client(): Creates a pre-configured AsyncOpenAI client for this URL

Both .openai_client() and .async_openai_client() accept extra keyword arguments, which are passed to the client initialization.
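
For example, a minimal sketch of making a request through the pre-configured client. The value passed as model= is an assumption here, since llama-server typically does not validate it.

from anaconda_ai import AnacondaAIClient

client = AnacondaAIClient()
server = client.servers.create('OpenHermes-2.5-Mistral-7B/Q4_K_M')
server.start()

openai_client = server.openai_client()
response = openai_client.chat.completions.create(
    model='OpenHermes-2.5-Mistral-7B/Q4_K_M',  # assumption: the server does not validate this value
    messages=[{'role': 'user', 'content': 'what is pi?'}],
)
print(response.choices[0].message.content)

server.stop()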

Server Configuration Options#

Not all backends support extra_options= on server create.

The AI Navigator backend supports llama-server options, passed as snake-case dictionary keys to client.servers.create() with the extra_options kwarg. To enable flags, set the value to True.

Here are some notes on specific server parameter behavior:

| Dict key | Notes |
|---|---|
| port | Start the server on a specific port; 0 or missing means a random port |
| jinja | Set to True to enable tool calling for models trained to do so |
For example:

from anaconda_ai import AnacondaAIClient

client = AnacondaAIClient()
server = client.servers.create(
  'OpenHermes-2.5-Mistral-7B/Q4_K_M',
  extra_options={
    "ctx_size": 512,
    "jinja": True
  }
)

VectorDB#

Creates a Postgres vector database and returns the connection information. VectorDB is not supported by all backends.

anaconda ai launch-vectordb

LLM#

To use the LLM integration you will also need to install the llm package:

conda install -c conda-forge llm

then you can list downloaded model quantizations

llm models

or to show only the Anaconda AI models

llm models list -q anaconda

When utilizing a model, the integration will first ensure that the model has been downloaded and start the server through the backend. Standard OpenAI parameters are supported.

llm -m 'anaconda:meta-llama/llama-2-7b-chat-hf_Q4_K_M.gguf' -o temperature 0.1 'what is pi?'

Additionally, server configuration parameters like ctx_size can be passed

llm -m 'anaconda:meta-llama/llama-2-7b-chat-hf_Q4_K_M.gguf' -o temperature 0.1 -o ctx_size 512 'what is pi?'

Langchain#

The LangChain integration provides Chat and Embedding classes that automatically manage downloading and starting servers. You will need the langchain-openai package.

from langchain.prompts import ChatPromptTemplate
from anaconda_ai.integrations.langchain import AnacondaQuantizedModelChat, AnacondaQuantizedModelEmbeddings

prompt = ChatPromptTemplate.from_template("tell me a joke about {topic}")
model = AnacondaQuantizedModelChat(model_name='meta-llama/llama-2-7b-chat-hf_Q4_K_M.gguf')

chain = prompt | model

message = chain.invoke({'topic': 'python'})

The following keyword arguments are supported:

  • extra_options: dict; see Creating servers above
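
The imported AnacondaQuantizedModelEmbeddings class covers the embedding side. A minimal sketch, assuming it accepts model_name like the chat class (the model name is illustrative); embed_query is the standard LangChain Embeddings method:

from anaconda_ai.integrations.langchain import AnacondaQuantizedModelEmbeddings

embeddings = AnacondaQuantizedModelEmbeddings(model_name='bge-small-en-v1.5_Q4_K_M.gguf')
vector = embeddings.embed_query('a sentence to embed')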

LlamaIndex#

You will need at least the llama-index-llms-openai package installed to use the integration.

from anaconda_ai.integrations.llama_index import AnacondaModel

llm = AnacondaModel(
    model='OpenHermes-2.5-Mistral-7B_q4_k_m'
)

The AnacondaModel class supports the following arguments:

  • model: Name of the model using the pattern defined above

  • system_prompt: Optional system prompt to apply to completions and chats

  • temperature: Optional temperature to apply to all completions and chats (default is 0.1)

  • max_tokens: Optional max tokens to predict (default is to let the model decide when to finish)

  • extra_options: Optional dict; see server creation above
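
A hedged usage sketch with the standard LlamaIndex LLM interface (complete is the usual method on a LlamaIndex LLM, and the response exposes .text):

response = llm.complete('what is pi?')
print(response.text)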

LiteLLM#

This integration provides a CustomLLM provider for use with litellm. Since litellm does not currently support entrypoints for registering providers, you must import the integration module first.

import litellm
import anaconda_ai.integrations.litellm

response = litellm.completion(
    'anaconda/openhermes-2.5-mistral-7b/q4_k_m',
    messages=[{'role': 'user', 'content': 'what is pi?'}]
)

Supported usage:

  • completion (with and without stream=True)

  • acompletion (with and without stream=True)

  • Most OpenAI inference parameters

    • n: the number of completions is not supported

  • Server parameters can be passed as a dictionary in the "server" key of the optional_params keyword argument, as sketched below

    • optional_params={"server": {"ctx_size": 512}}
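
Putting those together, a sketch of a completion that also passes server parameters:

import litellm
import anaconda_ai.integrations.litellm  # registers the provider on import

response = litellm.completion(
    'anaconda/openhermes-2.5-mistral-7b/q4_k_m',
    messages=[{'role': 'user', 'content': 'what is pi?'}],
    optional_params={'server': {'ctx_size': 512}},
)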

DSPy#

Since DSPy uses LiteLLM, Anaconda models can be used with dspy. Streaming and async are supported for raw LLM calls and for modules like Predict or ChainOfThought.

import dspy
import anaconda_ai.integrations.litellm

lm = dspy.LM('anaconda/openhermes-2.5-mistral-7b/q4_k_m')
dspy.configure(lm=lm)

chain = dspy.ChainOfThought("question -> answer")
chain(question="Who are you?")

dspy.LM supports the optional_params= keyword argument as explained in the previous section.

PydanticAI#

The Pydantic AI integration provides ChatModel and EmbeddingModel support. Here’s an example using a chat model in an agent.

from anaconda_ai.integrations.pydantic_ai import (
    AnacondaChatModel,
    AnacondaChatModelSettings,
)
settings = AnacondaChatModelSettings(temperature=0.1, extra_options={"ctx_size": 1024})

model = AnacondaChatModel(
    "OpenHermes-2.5-Mistral-7B/q4_k_m",
    settings=settings,
)
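
The model can then be handed to an agent. The wiring below is a hedged sketch of standard Pydantic AI usage; .output is the result attribute on recent pydantic-ai releases:

from pydantic_ai import Agent

agent = Agent(model)
result = agent.run_sync('what is pi?')
print(result.output)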

And an embedding model:

from anaconda_ai.integrations.pydantic_ai import AnacondaEmbeddingModel

embed = AnacondaEmbeddingModel(
    "bge-small-en-v1.5/q4_k_m"
)

# embed() is a coroutine, so this must run inside an async context
result = await embed.embed("cat", input_type="document")

Instructor#

This integration monkeypatches the instructor.from_provider() method on import. This is needed until the provider can be added to the upstream Instructor package.

import instructor
from pydantic import BaseModel
import anaconda_ai.integrations.instructor  # noqa: F401

client = instructor.from_provider(
    "anaconda/OpenHermes-2.5-Mistral-7B/Q4_K_M", extra_options={"ctx_size": 512}
)

class UserInfo(BaseModel):
    name: str
    age: int


# create() is awaited here, so this must run inside an async context
user_info = await client.create(
    response_model=UserInfo,
    messages=[{"role": "user", "content": "John Doe is 30 years old."}],
)

Panel#

A callback is available to work with Panel's ChatInterface.

To use it you will need panel, httpx, and numpy installed.

Here's an example application that can be written as a Python script or in a Jupyter notebook:

import panel as pn
from anaconda_ai.integrations.panel import AnacondaModelHandler

pn.extension('echarts', 'tabulator', 'terminal')

llm = AnacondaModelHandler('TinyLlama/TinyLlama-1.1B-Chat-v1.0_Q4_K_M.gguf', display_throughput=True)

chat = pn.chat.ChatInterface(
    callback=llm.callback,
    show_button_name=False)

chat.send(
    "I am your assistant. How can I help you?",
    user=llm.model_id, avatar=llm.avatar, respond=False
)
chat.servable()

The AnacondaModelHandler class supports the following keyword arguments:

  • display_throughput: Show a speed dial next to the response. Default is False

  • system_message: Default system message applied to all responses

  • client_options: Optional dict passed as kwargs to chat.completions.create

  • api_params: Optional dict or APIParams object

  • load_params: Optional dict or LoadParams object

  • infer_params: Optional dict or InferParams object

Setup for development#

Ensure you have conda installed. Then run:

make setup

Run the unit tests#

make test

Run the unit tests across isolated environments with tox#

make tox