Working with a cluster
Definition
Clusters in BagelDB are containers for large datasets. Each cluster stores embeddings: high-dimensional vectors that represent data such as text, images, or audio. Clusters support efficient similarity search, which underpins applications such as recommendation systems, search engines, and data analytics tools.
Cluster Management Options in Bagel
BagelDB provides a comprehensive set of options for managing your clusters effectively:
Public vs. Private Clusters: Choose to make your cluster publicly accessible for broader collaboration, or keep it private for confidential data, secured with API keys and a unique user_id.
Embedding Models: Select the embedding model that best suits your data type:
bagel-text: Tailored for textual data, this model generates embeddings with a dimensionality of 768, capturing semantic relationships within text.
bagel-multimodal: Suited to datasets containing both text and images, this model generates embeddings with a dimensionality of 1408.
custom: Use this option when you have precomputed embeddings, for example from specialized data types or proprietary embedding algorithms. When creating a cluster with this model, you must specify the cluster_dimension to accommodate your precomputed embeddings. Every embedding you add must have that same dimension, because inconsistent dimensions undermine query performance and accuracy.
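The dimension rules above can be sketched in plain Python. This is an illustration only: resolve_dimension and check_embeddings are hypothetical helpers, not part of the BagelDB client. The only values taken from this page are the model names, the two built-in dimensions, and the cluster_dimension parameter.

```python
# Dimensions stated above for the built-in models.
BUILTIN_DIMENSIONS = {"bagel-text": 768, "bagel-multimodal": 1408}


def resolve_dimension(embedding_model, cluster_dimension=None):
    """Return the embedding dimensionality a cluster would use.

    Hypothetical helper for illustration; not a BagelDB function.
    """
    if embedding_model in BUILTIN_DIMENSIONS:
        return BUILTIN_DIMENSIONS[embedding_model]
    if embedding_model == "custom":
        if not isinstance(cluster_dimension, int) or cluster_dimension <= 0:
            raise ValueError("custom clusters need a positive cluster_dimension")
        return cluster_dimension
    raise ValueError(f"unknown embedding model: {embedding_model!r}")


def check_embeddings(embeddings, dimension):
    """Raise ValueError if any precomputed embedding has the wrong length."""
    for i, vector in enumerate(embeddings):
        if len(vector) != dimension:
            raise ValueError(
                f"embedding {i} has length {len(vector)}, expected {dimension}"
            )


# A custom cluster with 3-dimensional precomputed embeddings.
dim = resolve_dimension("custom", cluster_dimension=3)
check_embeddings([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]], dim)  # passes silently
```

Running a check like this before uploading catches mismatched vectors early, before they reach the cluster.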
Create or Retrieve a Cluster
Add embeddings
The add() method accepts:
documents: Texts of the documents.
metadatas: Metadata objects for each document.
embeddings: Precomputed embeddings, if available.
ids: Unique identifiers for each document.
Note: If only documents are provided, embeddings will be automatically generated using the cluster's embedding function.
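As a rough sketch of how these arguments fit together, the helper below checks that each list has one entry per ID before anything is sent. build_add_payload and the shape of the dictionary it returns are assumptions made for illustration. Only the four argument names come from the add() description above.

```python
def build_add_payload(ids, documents=None, metadatas=None, embeddings=None):
    """Validate and assemble the arguments described for add().

    Hypothetical helper; the returned dict layout is an assumption.
    """
    if documents is None and embeddings is None:
        raise ValueError("provide documents, embeddings, or both")
    if len(set(ids)) != len(ids):
        raise ValueError("ids must be unique")
    for name, values in (
        ("documents", documents),
        ("metadatas", metadatas),
        ("embeddings", embeddings),
    ):
        if values is not None and len(values) != len(ids):
            raise ValueError(f"{name} must have one entry per id")
    return {
        "ids": list(ids),
        "documents": documents,
        "metadatas": metadatas,
        "embeddings": embeddings,
        # With no precomputed embeddings, the cluster's embedding
        # function is expected to generate them from the documents.
        "needs_embedding": embeddings is None,
    }


payload = build_add_payload(
    ids=["doc-1", "doc-2"],
    documents=["A plain bagel.", "A sesame bagel."],
    metadatas=[{"source": "menu"}, {"source": "menu"}],
)
```

Here only documents are supplied, so the payload is flagged as needing embeddings, which matches the note above.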
Conduct a Similarity Search
The find() method uses either query_embeddings or query_texts to identify and return the top n_results closest matches.
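To make "closest matches" concrete, here is a minimal in-memory sketch of a top-n search over stored vectors. It ranks by cosine similarity, which is an assumption: this page does not state which distance metric BagelDB uses. The top_n function is a hypothetical stand-in, not the find() implementation.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_n(query_embedding, stored, n_results=2):
    """Return the ids of the n_results most similar stored vectors."""
    ranked = sorted(
        stored.items(),
        key=lambda item: cosine_similarity(query_embedding, item[1]),
        reverse=True,
    )
    return [doc_id for doc_id, _ in ranked[:n_results]]


stored = {
    "a": [1.0, 0.0],
    "b": [0.9, 0.1],
    "c": [0.0, 1.0],
}
matches = top_n([1.0, 0.0], stored, n_results=2)
print(matches)  # ['a', 'b']
```

A production vector store would use an approximate index rather than scoring every stored vector, but the ranking idea is the same.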
Get embeddings by ID or filter
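Retrieval by ID or by filter can be pictured with a small in-memory sketch. Everything here is illustrative: get_records, the record layout, and the where argument (matching on metadata key/value equality) are assumptions, not the BagelDB client API.

```python
records = [
    {"id": "doc-1", "metadata": {"source": "menu"}, "embedding": [0.1, 0.2]},
    {"id": "doc-2", "metadata": {"source": "blog"}, "embedding": [0.3, 0.4]},
    {"id": "doc-3", "metadata": {"source": "menu"}, "embedding": [0.5, 0.6]},
]


def get_records(records, ids=None, where=None):
    """Return records matching the given ids and/or metadata filter.

    Hypothetical helper: `where` maps metadata keys to required values.
    """
    wanted = set(ids) if ids is not None else None
    result = []
    for record in records:
        if wanted is not None and record["id"] not in wanted:
            continue
        if where and any(
            record["metadata"].get(key) != value for key, value in where.items()
        ):
            continue
        result.append(record)
    return result


by_id = get_records(records, ids=["doc-2"])
by_filter = get_records(records, where={"source": "menu"})
```

In this sketch, by_id holds the single record for doc-2 and by_filter holds the two records whose source is menu.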
Adding Images
You can enhance your cluster by adding images and generating embeddings directly from the image pixels, enabling a robust visual search functionality within your application powered by BagelDB.
Use the add_image method with the desired image path:
This process will:
Upload the Base64 encoding of the image to the server.
Generate an embedding vector based on the image pixels.
Index the image within the cluster using the generated embedding.
Important Considerations:
Supported Formats: BagelDB accepts JPEG, PNG, BMP, and GIF image formats.
Embedding Generation Time: Generating embeddings from images may take longer than generating text embeddings due to the complexity of visual data.
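The upload step and the format list above can be sketched with the Python standard library. encode_image is a hypothetical helper, and the returned dictionary keys are assumptions. The check relies on the file extension only, so it does not verify the actual image contents.

```python
import base64
import tempfile
from pathlib import Path

# Extensions for the formats listed above (JPEG, PNG, BMP, GIF).
SUPPORTED_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp", ".gif"}


def encode_image(path):
    """Read an image file and return its name and Base64-encoded bytes.

    Hypothetical helper; the payload layout is an assumption.
    """
    path = Path(path)
    if path.suffix.lower() not in SUPPORTED_SUFFIXES:
        raise ValueError(f"unsupported image format: {path.suffix}")
    encoded = base64.b64encode(path.read_bytes()).decode("ascii")
    return {"filename": path.name, "image_data": encoded}


# Demo with a throwaway file holding only the PNG signature bytes,
# not a complete image.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.png"
    sample.write_bytes(b"\x89PNG\r\n\x1a\n")
    payload = encode_image(sample)
```

Base64 output is roughly a third larger than the raw bytes, which is worth keeping in mind when uploading large images.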
Delete the cluster
Conclusion
Whether you're working solo or in a team, our platform ensures that your cluster data remains securely stored and accessible, even after your session ends. Explore our Python Client or JavaScript Client, or dive into our API references to unlock the full potential of BagelDB.