Working with a cluster

Definition

A cluster in BagelDB is a container for a dataset and its embeddings: high-dimensional vectors that represent data such as text, images, or audio. Clusters support efficient similarity search, which underpins a wide range of applications, from recommendation systems and search engines to data analytics tools.

Cluster Management Options in Bagel

BagelDB provides a comprehensive set of options for managing your clusters effectively:

  • Public vs. Private Clusters: Choose to make your cluster publicly accessible for broader collaboration or keep it private for confidential data, secured with API keys and a unique user_id.

  • Embedding Models: Select the most suitable embedding model based on your data type to ensure optimal data representation and retrieval efficiency:

    • bagel-text: Tailored for textual data, this model generates embeddings with a dimensionality of 768, capturing the intricate semantic relationships within text.

    • bagel-multimodal: Perfect for datasets containing both text and images, this model creates comprehensive embeddings with a dimensionality of 1408, reflecting the multifaceted nature of multimedia content.

    • custom: This option is ideal for scenarios requiring the use of precomputed embeddings. You can provide your embeddings along with their dimensions. This model is invaluable for specialized data types or proprietary embedding algorithms. Consistency in the dimensions of your custom embeddings is crucial for maintaining query performance and accuracy. When creating a cluster with this model, you must specify the cluster_dimension to accommodate your precomputed embeddings.
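
The dimension consistency that the custom option depends on can be checked locally before a cluster is created. The following is a minimal sketch in plain Python, not part of the BagelDB client: the helper name embedding_dimension and the sample vectors are hypothetical and exist only for illustration.

```python
from typing import Sequence


def embedding_dimension(embeddings: Sequence[Sequence[float]]) -> int:
    """Return the dimension shared by all vectors, or raise if they differ."""
    if not embeddings:
        raise ValueError("at least one embedding is required")
    dimension = len(embeddings[0])
    for index, vector in enumerate(embeddings):
        if len(vector) != dimension:
            raise ValueError(
                f"embedding {index} has dimension {len(vector)}, expected {dimension}"
            )
    return dimension


# Hypothetical precomputed vectors; real embeddings are usually much longer.
precomputed = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8]]
precomputed_embedding_dimension = embedding_dimension(precomputed)
```

The resulting value is the kind of number you would then pass when creating a custom cluster.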

Create or Retrieve a Cluster

# Creating a bagel-text cluster
cluster = client.get_or_create_cluster(name = "my-text-cluster", embedding_model = "bagel-text")

# Creating a multimodal cluster
cluster = client.get_or_create_cluster(name = "my-multi-modal-cluster", embedding_model = "bagel-multimodal")

# Creating a custom-embedding cluster
cluster = client.get_or_create_cluster(name = "my-custom-cluster", embedding_model = "custom", dimensions=precomputed_embedding_dimension)

Add embeddings

# Adding documents to the cluster
cluster.add(
  documents=["doc1", "doc2"],
  metadatas=[{"source": "notion"}, {"source": "google-doc"}],
  ids=["id1", "id2"]
)

The add() method accepts:

  • documents: Texts of the documents.

  • metadatas: Metadata objects for each document.

  • embeddings: Precomputed embeddings, if available.

  • ids: Unique identifiers for each document.

  • Note: If only documents are provided, embeddings will be automatically generated using the cluster's embedding function.
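
Because each metadata object, embedding, and ID corresponds to one document, it can help to check the lists locally before calling add(). This is a minimal sketch under that assumption; check_add_arguments is a hypothetical helper and not part of the BagelDB client.

```python
from typing import Optional, Sequence


def check_add_arguments(
    documents: Sequence[str],
    ids: Sequence[str],
    metadatas: Optional[Sequence[dict]] = None,
    embeddings: Optional[Sequence[Sequence[float]]] = None,
) -> int:
    """Return the number of records, or raise ValueError on a mismatch."""
    if len(set(ids)) != len(ids):
        raise ValueError("ids must be unique")
    named_lists = (
        ("documents", documents),
        ("metadatas", metadatas),
        ("embeddings", embeddings),
    )
    for name, values in named_lists:
        # Optional lists are skipped; provided lists must line up with ids.
        if values is not None and len(values) != len(ids):
            raise ValueError(f"{name} has {len(values)} entries, expected {len(ids)}")
    return len(ids)


record_count = check_add_arguments(
    documents=["doc1", "doc2"],
    ids=["id1", "id2"],
    metadatas=[{"source": "notion"}, {"source": "google-doc"}],
)
```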

Query the cluster

# Query the cluster for similar results
results = cluster.find(
    query_texts=["query text"],
    n_results=3
)

print(results)

The find() method:

  • Utilizes either query_embeddings or query_texts to identify and return the top n_results closest matches.
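
To make "closest matches" concrete, here is a small, self-contained illustration of a top-n similarity search over stored vectors. It is only a conceptual sketch: it assumes cosine similarity as the measure, which this page does not state, and it does not reflect how BagelDB implements find() internally. The names cosine_similarity and top_n are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_n(query: Sequence[float], stored: Dict[str, Sequence[float]], n_results: int) -> List[str]:
    """Return the ids of the n_results vectors most similar to the query."""
    ranked = sorted(
        stored.items(),
        key=lambda item: cosine_similarity(query, item[1]),
        reverse=True,
    )
    return [record_id for record_id, _ in ranked[:n_results]]


# Toy two-dimensional embeddings keyed by id.
stored = {"id1": [1.0, 0.0], "id2": [0.0, 1.0], "id3": [0.9, 0.1]}
nearest = top_n([1.0, 0.0], stored, n_results=2)
```

A production vector database typically replaces this linear scan with an approximate index, but the ranking idea is the same.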

Get embeddings by ID or filter

results = cluster.get(ids=["id1", "id2"])

# Filter by metadata
results = cluster.get(where={"color": "red"})
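
The where filter above can be read as "keep the records whose metadata contains these key/value pairs". The sketch below illustrates that reading in plain Python. Exact-equality matching is an assumption made for illustration, the records are made up, and the helper matches is hypothetical rather than part of the BagelDB client.

```python
from typing import Any, Dict


def matches(metadata: Dict[str, Any], where: Dict[str, Any]) -> bool:
    """True when every key in `where` is present in `metadata` with an equal value."""
    return all(key in metadata and metadata[key] == value for key, value in where.items())


# Hypothetical stored metadata keyed by record id.
records = {
    "id1": {"color": "red", "source": "notion"},
    "id2": {"color": "blue", "source": "google-doc"},
}
red_ids = [record_id for record_id, metadata in records.items() if matches(metadata, {"color": "red"})]
```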

Adding Images

You can enhance your cluster by adding images and generating embeddings directly from the image pixels, enabling a robust visual search functionality within your application powered by BagelDB.

Use the add_image method with the desired image path:

cluster.add_image(image_path)

This process will:

  • Upload the Base64 encoding of the image to the server.

  • Generate an embedding vector based on the image pixels.

  • Index the image within the cluster using the generated embedding.

Important Considerations:

  • Supported Formats: BagelDB accepts JPEG, PNG, BMP, and GIF image formats.

  • Embedding Generation Time: Generating embeddings from images may take longer than generating text embeddings due to the complexity of visual data.
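
For readers curious about the encoding step, the sketch below shows how an image file can be checked against the supported formats and turned into a Base64 string using only the Python standard library. It is an illustration, not the client's actual code: encode_image is a hypothetical helper, and checking the file extension is a simple heuristic assumed here in place of inspecting the file contents.

```python
import base64
from pathlib import Path

# File extensions assumed to correspond to the supported formats listed above.
SUPPORTED_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp", ".gif"}


def encode_image(image_path: str) -> str:
    """Return the Base64 text encoding of an image file with a supported extension."""
    path = Path(image_path)
    if path.suffix.lower() not in SUPPORTED_SUFFIXES:
        raise ValueError(f"unsupported image format: {path.suffix or '(none)'}")
    # b64encode returns bytes; decode to ASCII text for transport in a request body.
    return base64.b64encode(path.read_bytes()).decode("ascii")
```

In normal use, add_image handles this for you; the sketch is only meant to show what "Base64 encoding of the image" refers to.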

Delete the cluster

# Delete a cluster by name, for example the text cluster created earlier
client.delete_cluster(name="my-text-cluster")

Conclusion

Whether you're working solo or in a team, our platform ensures that your cluster data remains securely stored and accessible, even after your session ends. Explore our user-friendly Python Client or JavaScript Client, or dive into our comprehensive API references to unlock the full potential of BagelDB.
