Azure Cosmos DB now supports vector search, which allows you to perform efficient similarity searches on high-dimensional data, such as embeddings from AI models. This tutorial will guide you through enabling and using vector search in Azure Cosmos DB for NoSQL with Python, TypeScript, .NET, and Java, using a movie dataset as an example.
Prerequisites
Before diving into the implementation, ensure you have the following:
- An Azure subscription with access to Azure Cosmos DB.
- An Azure Cosmos DB for NoSQL account.
- The Azure CLI installed.
- The Azure Cosmos DB SDK for each language you plan to use (Python, TypeScript, .NET, or Java).
- A movie dataset containing embeddings.
Step 1: Enable Vector Search in Azure Cosmos DB
To enable vector search in Cosmos DB:
- Navigate to your Azure Cosmos DB account and enable the vector search feature for the NoSQL API (the EnableNoSQLVectorSearch capability).
- Create a new container or use an existing one.
- Enable vector search by defining an indexing policy that supports vector types.
- Define a vector index on the property that holds your embeddings.
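Example container vector policy and indexing policy (JSON format), shown as the two container properties. The vector policy declares the embedding path, data type, distance function, and dimensions; the indexing policy adds a vector index on the same path:
{
  "vectorEmbeddingPolicy": {
    "vectorEmbeddings": [
      {
        "path": "/embedding",
        "dataType": "float32",
        "distanceFunction": "cosine",
        "dimensions": 1536
      }
    ]
  },
  "indexingPolicy": {
    "indexingMode": "consistent",
    "includedPaths": [{ "path": "/*" }],
    "excludedPaths": [{ "path": "/embedding/*" }],
    "vectorIndexes": [{ "path": "/embedding", "type": "quantizedFlat" }]
  }
}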
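The policies can also be supplied from code when the container is created. The sketch below shows one possible way to do that in Python. It assumes a recent azure-cosmos package that accepts the vector_embedding_policy keyword, an account with vector search enabled, and a container partitioned on /id. The helper names build_vector_policies and create_movies_container are hypothetical and exist only for this example.

```python
def build_vector_policies(path="/embedding", dimensions=1536,
                          distance_function="cosine", index_type="quantizedFlat"):
    """Return (vector_embedding_policy, indexing_policy) dicts for one vector path."""
    vector_embedding_policy = {
        "vectorEmbeddings": [
            {
                "path": path,
                "dataType": "float32",
                "distanceFunction": distance_function,
                "dimensions": dimensions,
            }
        ]
    }
    indexing_policy = {
        "indexingMode": "consistent",
        "includedPaths": [{"path": "/*"}],
        # Keep the raw vector out of the regular index; vectorIndexes covers it.
        "excludedPaths": [{"path": path + "/*"}],
        "vectorIndexes": [{"path": path, "type": index_type}],
    }
    return vector_embedding_policy, indexing_policy


def create_movies_container(database, container_name="Movies"):
    """Create the container with both policies.

    `database` is assumed to be a DatabaseProxy obtained from CosmosClient.
    """
    # Imported lazily so the policy builder can be used without the SDK installed.
    from azure.cosmos import PartitionKey

    vector_embedding_policy, indexing_policy = build_vector_policies()
    return database.create_container_if_not_exists(
        id=container_name,
        partition_key=PartitionKey(path="/id"),  # assumption: partition on id
        indexing_policy=indexing_policy,
        vector_embedding_policy=vector_embedding_policy,
    )
```

Keeping the policy construction separate from the SDK call makes it easy to inspect or log the exact policy before any container is created.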
Step 2: Upload Movie Dataset with Embeddings
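The dataset should contain movie descriptions with vector embeddings. You can generate embeddings using OpenAI’s text-embedding-ada-002 model or a similar model. text-embedding-ada-002 produces 1536-dimensional vectors, which matches the dimensions value configured in Step 1.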
Example movie document structure:
{
  "id": "1",
  "title": "Inception",
  "description": "A mind-bending thriller by Christopher Nolan.",
  "embedding": [0.12, -0.05, 0.34, ..., 0.22]
}
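Because every stored embedding has to match the configured dimensions, it can help to check documents before uploading them. The following is a minimal sketch of such a check, not part of any SDK. The function name validate_movie and the required field list are assumptions based on the document structure above.

```python
import math

EXPECTED_DIMENSIONS = 1536  # the dimensions value used in this tutorial


def validate_movie(doc, dimensions=EXPECTED_DIMENSIONS):
    """Return a list of problems found in a movie document (empty if none)."""
    problems = []
    for field in ("id", "title", "description"):
        if not isinstance(doc.get(field), str) or not doc[field]:
            problems.append(f"missing or empty field: {field}")

    embedding = doc.get("embedding")
    if not isinstance(embedding, list):
        problems.append("embedding must be a list of numbers")
        return problems

    if len(embedding) != dimensions:
        problems.append(
            f"embedding has {len(embedding)} values, expected {dimensions}"
        )
    for value in embedding:
        # bool is a subclass of int, so it is excluded explicitly.
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if not is_number or not math.isfinite(value):
            problems.append("embedding contains a non-numeric or non-finite value")
            break
    return problems
```

A document whose embedding still contains a literal `...` placeholder fails this check, which is a useful guard when copying truncated examples.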
Python: Upload Data
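from azure.cosmos import CosmosClient

# Replace the placeholders with your account's URI and key.
ENDPOINT = "your_cosmosdb_endpoint"
KEY = "your_cosmosdb_key"
DATABASE_NAME = "MoviesDB"
CONTAINER_NAME = "Movies"

client = CosmosClient(ENDPOINT, credential=KEY)
database = client.get_database_client(DATABASE_NAME)
container = database.get_container_client(CONTAINER_NAME)

# The embedding is truncated here for display; a real document must carry
# the full vector (1536 values in this tutorial).
movie = {
    "id": "1",
    "title": "Inception",
    "description": "A mind-bending thriller by Christopher Nolan.",
    "embedding": [0.12, -0.05, 0.34, ..., 0.22],
}

# create_item fails with a conflict error if a document with this id already exists.
container.create_item(movie)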
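A real dataset holds many movies, so the single-document upload is usually wrapped in a loop. The sketch below is one way to do that. It assumes the container object exposes upsert_item, as the azure-cosmos ContainerProxy does. The helper name upload_movies is hypothetical.

```python
def upload_movies(container, movies):
    """Upsert each movie and report what happened.

    Returns (uploaded_count, failures), where failures is a list of
    (movie_id, error_message) tuples for documents that could not be written.
    """
    uploaded = 0
    failures = []
    for movie in movies:
        try:
            # upsert_item replaces an existing document with the same id,
            # so rerunning the upload does not raise conflict errors.
            container.upsert_item(movie)
            uploaded += 1
        except Exception as exc:  # broad on purpose: this is only a sketch
            failures.append((movie.get("id"), str(exc)))
    return uploaded, failures
```

Collecting failures instead of stopping at the first error lets a long upload finish and leaves a list of documents to retry.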
Step 3: Perform Vector Search
Python: Querying Using Vector Search
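# Truncated for display; pass a full embedding with the stored dimensions.
query_vector = [0.1, -0.2, 0.3, ..., 0.2]
# VectorDistance is the built-in function; TOP bounds the result count, and
# ORDER BY returns the most similar movies first.
query_text = "SELECT TOP 10 * FROM c ORDER BY VectorDistance(c.embedding, @vector)"
results = container.query_items(
    query=query_text,
    parameters=[{"name": "@vector", "value": query_vector}],
    enable_cross_partition_query=True,
)
for result in results:
    print(result)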
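To build intuition for what the query ranks by, the ordering can be reproduced locally on a handful of vectors. The sketch below assumes the container uses the cosine distance function and computes the score by brute force. The service relies on its vector index rather than scanning every document, so this illustrates the ranking only, not the service's implementation. The function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (1.0 means same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def top_k_similar(movies, query_vector, k=10):
    """Return the k (score, title) pairs most similar to query_vector."""
    scored = [
        (cosine_similarity(movie["embedding"], query_vector), movie["title"])
        for movie in movies
    ]
    # Higher cosine similarity means more similar, so sort descending.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]
```

Running top_k_similar over a small sample and comparing the order with the query results is a quick sanity check that the stored embeddings and the query embedding come from the same model.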
TypeScript: Querying Using Vector Search
import { CosmosClient } from "@azure/cosmos";

const endpoint = "your_cosmosdb_endpoint";
const key = "your_cosmosdb_key";
const client = new CosmosClient({ endpoint, key });
const database = client.database("MoviesDB");
const container = database.container("Movies");

// TOP bounds the result count; ORDER BY VectorDistance returns the most similar items first.
const query = {
  query: "SELECT TOP 10 * FROM c ORDER BY VectorDistance(c.embedding, @vector)",
  // Truncated for display; pass the full query embedding.
  parameters: [{ name: "@vector", value: [0.1, -0.2, 0.3, ..., 0.2] }],
};

async function searchMovies() {
  const { resources } = await container.items.query(query).fetchAll();
  console.log(resources);
}

searchMovies().catch(console.error);
.NET: Querying Using Vector Search
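using Microsoft.Azure.Cosmos;

string endpoint = "your_cosmosdb_endpoint";
string key = "your_cosmosdb_key";
CosmosClient client = new CosmosClient(endpoint, key);
Database database = client.GetDatabase("MoviesDB");
Container container = database.GetContainer("Movies");

// TOP bounds the result count; ORDER BY VectorDistance returns the most similar items first.
string query = "SELECT TOP 10 * FROM c ORDER BY VectorDistance(c.embedding, @vector)";
QueryDefinition queryDefinition = new QueryDefinition(query)
    // Truncated for display; pass the full query embedding.
    .WithParameter("@vector", new float[] {0.1f, -0.2f, 0.3f, ..., 0.2f});

FeedIterator<Movie> resultSet = container.GetItemQueryIterator<Movie>(queryDefinition);
while (resultSet.HasMoreResults)
{
    foreach (Movie movie in await resultSet.ReadNextAsync())
    {
        Console.WriteLine(movie.Title);
    }
}

// Minimal result type assumed by this example; properties map to the document fields.
public class Movie
{
    public string Id { get; set; }
    public string Title { get; set; }
    public string Description { get; set; }
}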
Java: Querying Using Vector Search
import com.azure.cosmos.*;
import com.azure.cosmos.models.*;
import com.azure.cosmos.util.CosmosPagedIterable;
import com.fasterxml.jackson.databind.JsonNode;
import java.util.List;

public class VectorSearchExample {
    public static void main(String[] args) {
        String endpoint = "your_cosmosdb_endpoint";
        String key = "your_cosmosdb_key";
        CosmosClient client = new CosmosClientBuilder().endpoint(endpoint).key(key).buildClient();
        CosmosDatabase database = client.getDatabase("MoviesDB");
        CosmosContainer container = database.getContainer("Movies");

        // TOP bounds the result count; ORDER BY VectorDistance returns the most similar items first.
        String query = "SELECT TOP 10 * FROM c ORDER BY VectorDistance(c.embedding, @vector)";
        // Truncated for display; pass the full query embedding.
        SqlParameter param = new SqlParameter("@vector", List.of(0.1, -0.2, 0.3, ..., 0.2));
        SqlQuerySpec querySpec = new SqlQuerySpec(query, List.of(param));

        CosmosPagedIterable<JsonNode> results = container.queryItems(querySpec, new CosmosQueryRequestOptions(), JsonNode.class);
        results.forEach(System.out::println);
        client.close();
    }
}
Conclusion
Vector search in Azure Cosmos DB for NoSQL enables efficient similarity searches on high-dimensional data, making it a powerful tool for AI-driven applications. In this guide, we:
- Enabled vector search by modifying the indexing policy.
- Uploaded a movie dataset containing vector embeddings.
- Demonstrated how to perform vector searches using Python, TypeScript, .NET, and Java.
By implementing vector search, you can enhance recommendation engines, semantic search, and other AI-powered functionalities in your applications. Azure Cosmos DB’s scalability ensures that even large-scale datasets can be efficiently queried using vector search.