How to deploy Chroma database (vector database) in production

I am working on a project where I want to save embeddings in a vector database. I need some help or resources for deploying Chroma DB for production use.

Reject answered 19/6, 2023 at 9:59

Update 1

On GCP (or any other platform), start a new VM instance and install Docker and Docker Compose. Then run the following Docker Compose file with docker compose up -d. Once the containers are running, Chroma is reachable at external-ip:8000 (make sure your firewall rules allow inbound traffic on port 8000).

# docker-compose.yml
version: '3.3'

services:
    server:
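        # The ClickHouse settings below only work with Chroma releases before 0.4.0;
        # pin a matching pre-0.4 tag, since "latest" no longer supports them.
        image: ghcr.io/chroma-core/chroma:latest
        volumes:
            - index_data:/index_data
        environment:
            - CHROMA_DB_IMPL=clickhouse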
            - CLICKHOUSE_HOST=clickhouse
            - CLICKHOUSE_PORT=8123
        ports:
            - 8000:8000
        depends_on:
            - clickhouse
    
    clickhouse:
        image: clickhouse/clickhouse-server:22.9-alpine
        environment:
            - ALLOW_EMPTY_PASSWORD=yes
            - CLICKHOUSE_TCP_PORT=9000
            - CLICKHOUSE_HTTP_PORT=8123
        ports:
            - '8123:8123'
            - '9000:9000'
        volumes:
            - clickhouse_data:/var/lib/clickhouse  # data directory of the official ClickHouse image
            - backups:/backups
            - ./config/backup_disk.xml:/etc/clickhouse-server/config.d/backup_disk.xml
            - ./config/chroma_users.xml:/etc/clickhouse-server/users.d/chroma.xml

volumes:
    clickhouse_data:
        driver: local
    index_data:
        driver: local
    backups:
        driver: local

Also create the config/chroma_users.xml file. The compose file also mounts config/backup_disk.xml, so that file must exist as well.

<clickhouse>
    <profiles>
        <default>
            <allow_experimental_lightweight_delete>1</allow_experimental_lightweight_delete>
            <mutations_sync>1</mutations_sync>
        </default>
    </profiles>
</clickhouse>
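To check that this stack is actually reachable at external-ip:8000, a small standard-library script can poll the server's heartbeat endpoint until it answers. This is a minimal sketch: the /api/v1/heartbeat path is an assumption based on the Chroma releases of that period (newer releases serve /api/v2/heartbeat), and the function names are hypothetical. The check parameter only exists so the polling loop can be exercised without a running server.

```python
import http.client
import sys
import time
import urllib.request
from typing import Callable


def heartbeat_url(host: str, port: int = 8000,
                  path: str = "/api/v1/heartbeat") -> str:
    """Build the heartbeat URL (path is an assumption for this Chroma version)."""
    return f"http://{host}:{port}{path}"


def http_ok(url: str, timeout: float = 2.0) -> bool:
    """Return True if a GET request to url answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError, refused connections and timeouts all end up here
        return False


def wait_for_chroma(host: str, port: int = 8000, retries: int = 30,
                    delay: float = 2.0,
                    check: Callable[[str], bool] = http_ok) -> bool:
    """Poll the heartbeat endpoint until it answers or the retries run out."""
    url = heartbeat_url(host, port)
    for attempt in range(retries):
        if check(url):
            return True
        if attempt < retries - 1:
            time.sleep(delay)  # ClickHouse and Chroma may still be starting
    return False


if __name__ == "__main__":
    # Usage: python wait_for_chroma.py <external-ip>
    if len(sys.argv) > 1:
        up = wait_for_chroma(sys.argv[1])
        print("Chroma is up" if up else "Chroma did not respond")
```

Polling with a delay matters here because the Chroma container depends on ClickHouse and may take a few seconds before it starts answering.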

Original answer

At the time of writing, the Chroma team has only published instructions for deploying the database on AWS (https://docs.trychroma.com/deployment). That setup is still in alpha and runs the database on an AWS EC2 instance.

I have also looked into deploying the database on Kubernetes. One option is to create a Kubernetes Deployment from the published Docker image (https://github.com/chroma-core/chroma/pkgs/container/chroma), but I haven't tested it yet. I am working on it and will update this answer.
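For the Kubernetes route, a Deployment plus a Service wrapped around the published image is the usual starting point. The sketch below is untested against a real cluster, just like the approach described above: it builds both manifests as plain Python dicts and prints them as a JSON List, which kubectl apply -f - accepts. The resource names, the app label, the single replica and the LoadBalancer service type are assumptions. Persistent storage is deliberately left out; for production you would add a PersistentVolumeClaim mounted at the image's data directory, which differs between Chroma versions.

```python
import json


def chroma_manifests(name: str = "chroma",
                     image: str = "ghcr.io/chroma-core/chroma:latest",
                     port: int = 8000) -> dict:
    """Build a Deployment and a Service for one Chroma server (hypothetical names)."""
    labels = {"app": name}
    deployment = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            # One replica: without shared storage each pod would hold its own data
            "replicas": 1,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": name,
                        "image": image,
                        "ports": [{"containerPort": port}],
                        # No volume mounted: data lives only as long as the pod
                    }]
                },
            },
        },
    }
    service = {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "type": "LoadBalancer",  # assumption: expose via a cloud load balancer
            "selector": labels,
            "ports": [{"port": port, "targetPort": port}],
        },
    }
    return {"apiVersion": "v1", "kind": "List", "items": [deployment, service]}


if __name__ == "__main__":
    # Usage: python chroma_k8s.py | kubectl apply -f -
    print(json.dumps(chroma_manifests(), indent=2))
```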

Occidental answered 5/7, 2023 at 4:40

Chroma deployment commands:

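# Pull the official image and start the server on port 8000
docker pull chromadb/chroma
# Note: without a volume (-v), the data is lost when the container is removed
docker run -d -p 8000:8000 chromadb/chroma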

Then connect to the server with the following Python snippet:

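import os

import chromadb
from chromadb.config import Settings

# Connect to the Chroma server started with docker run
chroma_client = chromadb.HttpClient(
    host=os.getenv("DB_HOST", "localhost"),  # server address, e.g. the VM's IP
    port=8000,
    # allow_reset=True lets clients wipe the whole database; disable it in production
    settings=Settings(allow_reset=True, anonymized_telemetry=False),
)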
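Since the question is about saving embeddings, here is a minimal sketch of writing precomputed vectors to a collection and querying them back through an HttpClient like the one above. It uses the chromadb client methods get_or_create_collection, add and query; the helper names are hypothetical, and the client is passed in as a parameter so the helpers stay independent of how the connection is configured.

```python
from typing import Any, List, Sequence


def save_embeddings(client: Any, collection_name: str, ids: Sequence[str],
                    embeddings: Sequence[Sequence[float]],
                    documents: Sequence[str]) -> Any:
    """Store precomputed embeddings and their source text in a collection."""
    collection = client.get_or_create_collection(name=collection_name)
    # Embeddings are passed explicitly, so no embedding function is needed for add()
    collection.add(
        ids=list(ids),
        embeddings=[list(e) for e in embeddings],
        documents=list(documents),
    )
    return collection


def nearest_ids(collection: Any, query_embedding: Sequence[float],
                n_results: int = 2) -> List[str]:
    """Return the ids of the stored vectors closest to query_embedding."""
    result = collection.query(
        query_embeddings=[list(query_embedding)],
        n_results=n_results,
    )
    # query() returns one list of ids per query embedding; we sent exactly one
    return result["ids"][0]
```

For example, save_embeddings(chroma_client, "docs", ["a", "b"], [[0.1, 0.2, 0.3], [0.9, 0.8, 0.7]], ["first", "second"]) stores two 3-dimensional vectors, and nearest_ids on the returned collection gives back the closest ids. The collection name and the tiny vectors are made-up examples; real embeddings come from your embedding model, and all embeddings in one collection must have the same dimension.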
Pangolin answered 4/5 at 10:26

The following repo has instructions for deploying ChromaDB on GCP with Cloud Run, including persistent storage on GCS: https://github.com/HerveMignot/chromadb-on-gcp.

  1. ChromaDB is deployed using Cloud Run (serverless; it can scale down to 0 instances when not in use).

  2. The deployment uses the ChromaDB Docker image available on Dockerhub.

  3. A GCS bucket is created/used and mounted as a volume in the container to store ChromaDB’s database files, ensuring data persists across container restarts and redeployments.

To avoid several instances writing to the same database files concurrently, the maximum number of instances is capped at 1.
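The Cloud Run setup described above boils down to a single gcloud run deploy call. The sketch below only assembles that command (it does not run it) so the moving parts are visible: a Docker Hub image, a Cloud Storage bucket mounted as a volume, scale-to-zero, and a maximum of one instance. The service name, region, bucket, volume name and especially the container mount path are placeholders chosen for illustration, not values taken from the linked repo; the data directory Chroma uses depends on the image version, and Cloud Storage volume mounts may require the second-generation execution environment. Check the flags against gcloud run deploy --help before relying on them.

```python
import shlex
from typing import List


def cloud_run_deploy_cmd(service: str, bucket: str, mount_path: str,
                         region: str,
                         image: str = "docker.io/chromadb/chroma") -> List[str]:
    """Assemble (but do not run) a gcloud command deploying Chroma on Cloud Run."""
    return [
        "gcloud", "run", "deploy", service,
        f"--image={image}",
        f"--region={region}",
        "--port=8000",                   # Chroma's HTTP port
        "--min-instances=0",             # scale to zero when idle
        "--max-instances=1",             # single writer on the mounted bucket
        "--execution-environment=gen2",  # second generation, for GCS volume mounts
        # "chroma-data" is a hypothetical volume name
        f"--add-volume=name=chroma-data,type=cloud-storage,bucket={bucket}",
        f"--add-volume-mount=volume=chroma-data,mount-path={mount_path}",
    ]


if __name__ == "__main__":
    # Hypothetical values: replace with your own service, bucket, data dir, region
    cmd = cloud_run_deploy_cmd("chroma", "my-chroma-bucket", "/chroma/chroma",
                               "us-central1")
    print(shlex.join(cmd))
```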

Capitular answered 30/8 at 11:59
