I am working on a project where I want to save embeddings in a vector database. I need some help or resources for deploying Chroma DB for production use.
Update 1
On GCP or any other platform, start a new instance and install Docker and Docker Compose. Then run the following Docker Compose file. Chroma DB will be up and running, and you can access it at external-ip:8000.
# docker-compose.yml
version: '3.3'
services:
  server:
    image: ghcr.io/chroma-core/chroma:latest
    volumes:
      - index_data:/index_data
    environment:
      - CHROMA_DB_IMPL=clickhouse
      - CLICKHOUSE_HOST=clickhouse
      - CLICKHOUSE_PORT=8123
    ports:
      - 8000:8000
    depends_on:
      - clickhouse
  clickhouse:
    image: clickhouse/clickhouse-server:22.9-alpine
    environment:
      - ALLOW_EMPTY_PASSWORD=yes
      - CLICKHOUSE_TCP_PORT=9000
      - CLICKHOUSE_HTTP_PORT=8123
    ports:
      - '8123:8123'
      - '9000:9000'
    volumes:
      - clickhouse_data:/bitnami/clickhouse
      - backups:/backups
      - ./config/backup_disk.xml:/etc/clickhouse-server/config.d/backup_disk.xml
      - ./config/chroma_users.xml:/etc/clickhouse-server/users.d/chroma.xml
volumes:
  clickhouse_data:
    driver: local
  index_data:
    driver: local
  backups:
    driver: local
Also create the config/chroma_users.xml file:
<clickhouse>
  <profiles>
    <default>
      <allow_experimental_lightweight_delete>1</allow_experimental_lightweight_delete>
      <mutations_sync>1</mutations_sync>
    </default>
  </profiles>
</clickhouse>
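Once the containers are up, it can help to confirm that the server actually answers at external-ip:8000 before pointing an application at it. The following is a minimal sketch using only the Python standard library. The helper names `heartbeat_url` and `is_alive` are made up for illustration, and the `/api/v1/heartbeat` path is an assumption: it matches Chroma servers of this era as far as I know, while newer releases may expose `/api/v2/heartbeat` instead, so check the version you deployed.

```python
import urllib.error
import urllib.request


def heartbeat_url(host, port=8000, api_version="v1"):
    """Build the heartbeat URL (the path is an assumption, see the note above)."""
    return f"http://{host}:{port}/api/{api_version}/heartbeat"


def is_alive(host, port=8000, api_version="v1", timeout=5.0):
    """Return True if the server replies with HTTP 200, False on any network error."""
    url = heartbeat_url(host, port, api_version)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, or a non-2xx HTTP status.
        return False
```

Calling `is_alive("external-ip")` with the instance's real address should return True when the compose stack is healthy. If it returns False, the firewall rule for port 8000 is worth checking first.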
Original answer
As of right now, the Chroma team has only published details on deploying the DB on AWS: https://docs.trychroma.com/deployment. That guide is in alpha and uses AWS EC2 to deploy the DB.
I did some research on deploying the DB with Kubernetes. You can use the Docker image to create the deployment: https://github.com/chroma-core/chroma/pkgs/container/chroma. This is one option, but I haven't tested it yet. I am working on it and will update.
Chroma Deployment commands
docker pull chromadb/chroma
docker run -d -p 8000:8000 chromadb/chroma
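Access it using the snippet below:
import os

import chromadb
from chromadb.config import Settings

# Connect to the Chroma server over HTTP; DB_HOST holds the server's hostname or IP.
chroma_client = chromadb.HttpClient(
    host=os.getenv("DB_HOST"),
    port=8000,
    settings=Settings(allow_reset=True, anonymized_telemetry=False),
)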
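Since the original question is about saving embeddings, here is a rough sketch of what storing and querying vectors through such a client could look like. The function names, the collection name `demo_docs`, the `DB_PORT` variable, and the three-dimensional toy vectors are all assumptions for illustration. Only `DB_HOST` comes from the snippet above. The `chromadb` import is kept inside `connect()` so the helpers can be read and tried without the package installed.

```python
import os


def read_db_address(env=os.environ):
    """Return (host, port) from DB_HOST / DB_PORT (DB_PORT is a hypothetical variable)."""
    host = env.get("DB_HOST", "localhost")
    port = int(env.get("DB_PORT", "8000"))
    return host, port


def connect():
    """Create an HTTP client for a running Chroma server (requires the chromadb package)."""
    import chromadb

    host, port = read_db_address()
    return chromadb.HttpClient(host=host, port=port)


def store_and_query(client, name="demo_docs"):
    """Upsert two toy embeddings and return the nearest match to a query vector."""
    collection = client.get_or_create_collection(name=name)
    # upsert rather than add, so re-running the script does not collide on existing ids
    collection.upsert(
        ids=["a", "b"],
        embeddings=[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
        documents=["first document", "second document"],
    )
    return collection.query(query_embeddings=[[0.9, 0.1, 0.0]], n_results=1)
```

With a server running, `store_and_query(connect())` should return a result dictionary whose closest id is "a", since the query vector is nearest to the first embedding. In a real project the vectors would come from your embedding model rather than being hard-coded.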
The following repo has instructions for deploying ChromaDB on GCP with Cloud Run, including persistent storage on GCS: https://github.com/HerveMignot/chromadb-on-gcp.
ChromaDB is deployed using Cloud Run (serverless, can scale down to 0 instances if not used).
The deployment uses the ChromaDB Docker image available on Docker Hub.
A GCS bucket is created/used and mounted as a volume in the container to store ChromaDB’s database files, ensuring data persists across container restarts and redeployments.
To avoid issues with concurrent access to the storage, the number of instances is capped at 1.
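A Cloud Run service is normally reached through an HTTPS URL rather than a bare host and port, so the client arguments differ slightly from the plain Docker case. The sketch below turns such a URL into keyword arguments for `chromadb.HttpClient`. The helper name `http_client_kwargs` and the example URL are hypothetical, and whether the service also needs an authentication header depends on how it was deployed, which the text above does not say.

```python
from urllib.parse import urlparse


def http_client_kwargs(service_url):
    """Translate a service URL into host/port/ssl arguments for chromadb.HttpClient."""
    parsed = urlparse(service_url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        raise ValueError(f"expected an http(s) URL, got {service_url!r}")
    use_ssl = parsed.scheme == "https"
    # Fall back to the scheme's default port when the URL does not name one.
    port = parsed.port or (443 if use_ssl else 80)
    return {"host": parsed.hostname, "port": port, "ssl": use_ssl}
```

The result can be passed straight through, for example `chromadb.HttpClient(**http_client_kwargs("https://chroma-example-uc.a.run.app"))`. If the service rejects unauthenticated requests, `HttpClient` also accepts a `headers` dictionary where a bearer token could go.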