Azure Cosmos DB now supports vector search, which allows you to perform efficient similarity searches on high-dimensional data, such as embeddings from AI models. This tutorial will guide you through enabling and using vector search in Azure Cosmos DB for NoSQL with Python, TypeScript, .NET, and Java, using a movie dataset as an example.

Prerequisites

Before diving into the implementation, ensure you have the following:

  • An Azure subscription with access to Azure Cosmos DB.
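  • An Azure Cosmos DB for NoSQL account.
  • Azure CLI installed.
  • The Azure Cosmos DB SDK for each language you plan to use: azure-cosmos (Python, from PyPI), @azure/cosmos (TypeScript, from npm), Microsoft.Azure.Cosmos (.NET, from NuGet), or azure-cosmos (Java, Maven group com.azure). Use a recent version, because vector search support was added in newer releases.
  • A movie dataset whose documents include embeddings (1536-dimensional in this tutorial).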

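Step 1: Enable Vector Search in Azure Cosmos DB

To enable vector search in Azure Cosmos DB for NoSQL:

  1. Turn on the vector search feature for your account, either on the account's Features page in the Azure portal ("Vector Search for NoSQL API") or with the Azure CLI: az cosmosdb update --resource-group <resource-group> --name <account-name> --capabilities EnableNoSQLVectorSearch
  2. Create a new container for the movies and supply its vector policy when you create it; existing containers may not allow a vector policy to be added or changed afterwards.
  3. In the container vector policy, declare the embedding property's path, data type, number of dimensions, and the distance function used to compare vectors.
  4. In the indexing policy, add a vector index on the embedding path and exclude that path from regular indexing.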

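Example container vector policy (JSON format). It tells Azure Cosmos DB which property holds the embeddings and how to compare them; 1536 dimensions matches text-embedding-ada-002:

{
  "vectorEmbeddings": [
    {
      "path": "/embedding",
      "dataType": "float32",
      "distanceFunction": "cosine",
      "dimensions": 1536
    }
  ]
}

Example indexing policy (JSON format). The vectorIndexes entry builds the vector index, and /embedding/* is excluded, as recommended, so the individual numbers in the array are not range-indexed. The flat index type supports at most 505 dimensions, so a 1536-dimensional embedding needs quantizedFlat or diskANN:

{
  "indexingMode": "consistent",
  "automatic": true,
  "includedPaths": [
    { "path": "/*" }
  ],
  "excludedPaths": [
    { "path": "/_etag/?" },
    { "path": "/embedding/*" }
  ],
  "vectorIndexes": [
    { "path": "/embedding", "type": "quantizedFlat" }
  ]
}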
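
The vector embedding policy and the indexing policy can also be assembled in code, which keeps the embedding path and dimension count from drifting apart between the two. The sketch below is a hypothetical helper, build_vector_policies (not part of any SDK), that returns both dictionaries in the documented vectorEmbeddings / vectorIndexes shape for a single embedding path, and rejects the flat index type above its 505-dimension limit.

```python
from typing import Any

# Distance functions and vector index types accepted by Azure Cosmos DB for NoSQL.
DISTANCE_FUNCTIONS = {"cosine", "dotproduct", "euclidean"}
INDEX_TYPES = {"flat", "quantizedFlat", "diskANN"}
FLAT_MAX_DIMENSIONS = 505  # flat indexes cannot hold larger vectors


def build_vector_policies(
    path: str = "/embedding",
    dimensions: int = 1536,
    distance_function: str = "cosine",
    index_type: str = "quantizedFlat",
    data_type: str = "float32",
) -> tuple[dict[str, Any], dict[str, Any]]:
    """Return (vector_embedding_policy, indexing_policy) for one embedding path.

    Hypothetical helper for illustration; both dicts share the same path and settings.
    """
    if distance_function not in DISTANCE_FUNCTIONS:
        raise ValueError(f"unknown distance function: {distance_function}")
    if index_type not in INDEX_TYPES:
        raise ValueError(f"unknown index type: {index_type}")
    if index_type == "flat" and dimensions > FLAT_MAX_DIMENSIONS:
        raise ValueError(f"flat indexes support at most {FLAT_MAX_DIMENSIONS} dimensions")

    vector_embedding_policy = {
        "vectorEmbeddings": [
            {
                "path": path,
                "dataType": data_type,
                "distanceFunction": distance_function,
                "dimensions": dimensions,
            }
        ]
    }
    indexing_policy = {
        "indexingMode": "consistent",
        "automatic": True,
        "includedPaths": [{"path": "/*"}],
        # Keep the raw array out of the regular range index; the vector index covers it.
        "excludedPaths": [{"path": "/_etag/?"}, {"path": f"{path}/*"}],
        "vectorIndexes": [{"path": path, "type": index_type}],
    }
    return vector_embedding_policy, indexing_policy


vector_policy, index_policy = build_vector_policies()
print(index_policy["vectorIndexes"])  # [{'path': '/embedding', 'type': 'quantizedFlat'}]
```

With recent releases of the Python SDK, these dictionaries are passed when the container is created, for example database.create_container(id="Movies", partition_key=PartitionKey(path="/id"), indexing_policy=index_policy, vector_embedding_policy=vector_policy). The /id partition key is an assumption; choose whatever key suits your data.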

Step 2: Upload Movie Dataset with Embeddings

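The dataset should contain movie descriptions with vector embeddings. You can generate the embeddings with OpenAI’s text-embedding-ada-002 model, which produces 1536-dimensional vectors, or with a similar model. Whichever model you choose, its output length must equal the dimensions value in the container vector policy, and search queries must be embedded with the same model.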
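
Because every stored vector must have exactly the declared length, it helps to check it at the moment embeddings are attached. A minimal sketch, assuming a caller-supplied embed function that maps a description string to a list of floats; embed and attach_embeddings are hypothetical names standing in for your own embedding client, not a real library API. It builds documents shaped like the example below and fails fast on a dimension mismatch.

```python
from typing import Callable, Iterable

EXPECTED_DIMENSIONS = 1536  # must equal "dimensions" in the container vector policy


def attach_embeddings(
    movies: Iterable[dict],
    embed: Callable[[str], list[float]],
    dimensions: int = EXPECTED_DIMENSIONS,
) -> list[dict]:
    """Return copies of the movie records with an "embedding" field added.

    `embed` is a hypothetical stand-in for your embedding client (for example a
    wrapper around text-embedding-ada-002); it receives the description text.
    """
    documents = []
    for movie in movies:
        vector = [float(x) for x in embed(movie["description"])]
        if len(vector) != dimensions:
            raise ValueError(
                f"movie {movie['id']}: got {len(vector)} dimensions, expected {dimensions}"
            )
        # Copy the record so the caller's input is left unchanged.
        documents.append({**movie, "embedding": vector})
    return documents


def toy_embed(text: str) -> list[float]:
    # Toy embedder for illustration only: 4 values derived from the text length.
    return [len(text) / 100.0, 0.0, 1.0, -1.0]


docs = attach_embeddings(
    [{"id": "1", "title": "Inception", "description": "A mind-bending thriller by Christopher Nolan."}],
    toy_embed,
    dimensions=4,
)
print(docs[0]["embedding"])
```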

Example movie document structure:

{
  "id": "1",
  "title": "Inception",
  "description": "A mind-bending thriller by Christopher Nolan.",
  "embedding": [0.12, -0.05, 0.34, ..., 0.22]  
}
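
Documents loaded from an existing dataset file are worth checking before upload, since a vector with the wrong length or with non-numeric entries cannot be compared against 1536-dimensional query vectors. A minimal sketch of a hypothetical validate_movie helper that returns a list of problems for one document, assuming the field names from the example document above:

```python
import math
from typing import Any


def validate_movie(doc: dict[str, Any], dimensions: int = 1536) -> list[str]:
    """Return the problems found in a movie document; an empty list means it looks valid.

    Hypothetical helper; the field names follow the example document above.
    """
    problems = []
    # Every Cosmos DB item needs a string "id".
    if not isinstance(doc.get("id"), str) or not doc["id"]:
        problems.append("id must be a non-empty string")
    for field in ("title", "description"):
        if not isinstance(doc.get(field), str):
            problems.append(f"{field} must be a string")
    embedding = doc.get("embedding")
    if not isinstance(embedding, list):
        problems.append("embedding must be a list of numbers")
        return problems
    if len(embedding) != dimensions:
        problems.append(f"embedding has {len(embedding)} values, expected {dimensions}")
    # bool is a subclass of int in Python, so exclude it explicitly; NaN/inf are rejected too.
    if not all(
        isinstance(x, (int, float)) and not isinstance(x, bool) and math.isfinite(x)
        for x in embedding
    ):
        problems.append("embedding values must be finite numbers")
    return problems
```

Running it over the dataset and uploading only documents that return an empty list keeps malformed vectors out of the container.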

Python: Upload Data

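from azure.cosmos import CosmosClient
import json

ENDPOINT = "your_cosmosdb_endpoint"
KEY = "your_cosmosdb_key"
DATABASE_NAME = "MoviesDB"
CONTAINER_NAME = "Movies"

client = CosmosClient(ENDPOINT, credential=KEY)
database = client.get_database_client(DATABASE_NAME)
container = database.get_container_client(CONTAINER_NAME)

# Load the dataset. "movies.json" is a placeholder name for a JSON array of
# documents shaped like the example above, each with a 1536-value "embedding".
with open("movies.json", encoding="utf-8") as f:
    movies = json.load(f)

# upsert_item inserts new documents and overwrites existing ones with the same id,
# so the script can be re-run safely (create_item would raise on duplicates).
for movie in movies:
    container.upsert_item(movie)

print(f"Uploaded {len(movies)} movies")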

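Step 3: Perform Vector Search

Vector queries use the VectorDistance system function, which scores each document's embedding against a query vector with the distance function declared in the container vector policy. With cosine, the score is a similarity between -1 and 1 where higher means more similar. In an ORDER BY clause, VectorDistance always sorts from most similar to least similar, so no ASC or DESC is needed. Add TOP N to cap the number of results; without it, the query can return many more results than needed and consume more request units (RUs).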
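
To make the scoring concrete, here is a plain-Python sketch of the three distance functions a vector policy can name (cosine, dotproduct, euclidean) and which direction counts as "more similar" for each. It illustrates the math only, not how the service computes scores internally, and the helper names are hypothetical.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity in [-1, 1]; higher means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / norm


def dotproduct(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product, unbounded; higher means more similar."""
    return sum(x * y for x, y in zip(a, b))


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance, 0 or greater; lower means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# For each function: does a higher score mean "more similar"?
HIGHER_IS_CLOSER = {"cosine": True, "dotproduct": True, "euclidean": False}

query = [1.0, 0.0]
print(cosine(query, [1.0, 0.0]), cosine(query, [0.0, 1.0]))  # 1.0 0.0
print(euclidean(query, [1.0, 0.0]), euclidean(query, [0.0, 1.0]))
```

In an ORDER BY, VectorDistance handles this direction for you, returning the most similar documents first whichever function the policy uses.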

Python: Querying Using Vector Search

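# Placeholder: fill in the elided values ("...") so the list holds all 1536 floats.
query_vector = [0.1, -0.2, 0.3, ..., 0.2]

# TOP limits the result count; VectorDistance in ORDER BY already sorts most similar first.
query = (
    "SELECT TOP 10 c.title, VectorDistance(c.embedding, @vector) AS SimilarityScore "
    "FROM c ORDER BY VectorDistance(c.embedding, @vector)"
)
results = container.query_items(
    query=query,
    parameters=[{"name": "@vector", "value": query_vector}],
    enable_cross_partition_query=True,
)
for result in results:
    print(result)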
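
quantizedFlat and diskANN indexes can return approximate results, so when tuning an index it can help to compare the service's answer against an exact one computed locally on a small sample. A minimal sketch, assuming a cosine policy: exact_top_k mirrors a TOP k query ordered by VectorDistance with a brute-force scan, and recall_at_k measures how many of the exact ids the service also returned. Both helpers are hypothetical, the service query must return each document's id for the comparison, and a brute-force scan over the full 1536-dimensional dataset is only practical on a sample.

```python
import heapq
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity in [-1, 1]; higher means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / norm


def exact_top_k(docs: list[dict], query_vector: Sequence[float], k: int = 10) -> list[tuple[str, float]]:
    """Exact (brute-force) top-k by cosine similarity: (id, score) pairs, best first."""
    scored = ((doc["id"], cosine_similarity(doc["embedding"], query_vector)) for doc in docs)
    # Higher cosine similarity means closer, so keep the k largest scores.
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


def recall_at_k(service_ids: list[str], exact: list[tuple[str, float]]) -> float:
    """Fraction of the exact top-k ids that also appear in the service's results."""
    expected = {doc_id for doc_id, _ in exact}
    if not expected:
        return 1.0
    return len(expected & set(service_ids)) / len(expected)


sample = [
    {"id": "1", "embedding": [1.0, 0.0]},
    {"id": "2", "embedding": [0.0, 1.0]},
    {"id": "3", "embedding": [0.7, 0.7]},
]
exact = exact_top_k(sample, [1.0, 0.0], k=2)
print([doc_id for doc_id, _ in exact])  # ['1', '3']
print(recall_at_k(["1", "2"], exact))  # 0.5
```

A recall well below 1.0 on representative queries suggests trying a different index type or checking that enough vectors have been inserted for the index to be effective.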

TypeScript: Querying Using Vector Search

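import { CosmosClient, SqlQuerySpec } from "@azure/cosmos";

const endpoint = "your_cosmosdb_endpoint";
const key = "your_cosmosdb_key";
const client = new CosmosClient({ endpoint, key });
const database = client.database("MoviesDB");
const container = database.container("Movies");

// Placeholder: fill in the elided values so the array holds all 1536 floats.
const queryVector: number[] = [0.1, -0.2, 0.3, /* ... */ 0.2];

// TOP limits the result count; VectorDistance in ORDER BY already sorts most similar first.
const querySpec: SqlQuerySpec = {
  query:
    "SELECT TOP 10 c.title, VectorDistance(c.embedding, @vector) AS SimilarityScore " +
    "FROM c ORDER BY VectorDistance(c.embedding, @vector)",
  parameters: [{ name: "@vector", value: queryVector }],
};

async function searchMovies(): Promise<void> {
  const { resources } = await container.items.query(querySpec).fetchAll();
  console.log(resources);
}

searchMovies().catch(console.error);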

.NET: Querying Using Vector Search

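using System;
using Microsoft.Azure.Cosmos;

string endpoint = "your_cosmosdb_endpoint";
string key = "your_cosmosdb_key";
using CosmosClient client = new CosmosClient(endpoint, key);
Database database = client.GetDatabase("MoviesDB");
Container container = database.GetContainer("Movies");

// Placeholder: fill in the elided values so the array holds all 1536 floats.
float[] queryVector = { 0.1f, -0.2f, 0.3f, /* ... */ 0.2f };

// TOP limits the result count; VectorDistance in ORDER BY already sorts most similar first.
string query = "SELECT TOP 10 c.title, VectorDistance(c.embedding, @vector) AS SimilarityScore "
    + "FROM c ORDER BY VectorDistance(c.embedding, @vector)";
QueryDefinition queryDefinition = new QueryDefinition(query)
    .WithParameter("@vector", queryVector);

using FeedIterator<Movie> resultSet = container.GetItemQueryIterator<Movie>(queryDefinition);
while (resultSet.HasMoreResults)
{
    foreach (Movie movie in await resultSet.ReadNextAsync())
    {
        Console.WriteLine($"{movie.Title}: {movie.SimilarityScore}");
    }
}

// Result shape for the projection above. The SDK's default Json.NET serializer
// matches JSON property names such as "title" to these properties case-insensitively.
public class Movie
{
    public string Title { get; set; } = "";
    public double SimilarityScore { get; set; }
}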

Java: Querying Using Vector Search

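import com.azure.cosmos.CosmosClient;
import com.azure.cosmos.CosmosClientBuilder;
import com.azure.cosmos.CosmosContainer;
import com.azure.cosmos.models.CosmosQueryRequestOptions;
import com.azure.cosmos.models.SqlParameter;
import com.azure.cosmos.models.SqlQuerySpec;
import com.azure.cosmos.util.CosmosPagedIterable;
import com.fasterxml.jackson.databind.JsonNode;
import java.util.List;

public class MovieVectorSearch {
    public static void main(String[] args) {
        String endpoint = "your_cosmosdb_endpoint";
        String key = "your_cosmosdb_key";
        try (CosmosClient client = new CosmosClientBuilder().endpoint(endpoint).key(key).buildClient()) {
            CosmosContainer container = client.getDatabase("MoviesDB").getContainer("Movies");
            // Placeholder: fill in the elided values so the list holds all 1536 floats.
            List<Float> queryVector = List.of(0.1f, -0.2f, 0.3f, /* ... */ 0.2f);
            // TOP limits the result count; VectorDistance in ORDER BY already sorts most similar first.
            String query = "SELECT TOP 10 c.title, VectorDistance(c.embedding, @vector) AS SimilarityScore "
                    + "FROM c ORDER BY VectorDistance(c.embedding, @vector)";
            SqlQuerySpec querySpec = new SqlQuerySpec(query, List.of(new SqlParameter("@vector", queryVector)));
            CosmosPagedIterable<JsonNode> results =
                    container.queryItems(querySpec, new CosmosQueryRequestOptions(), JsonNode.class);
            results.forEach(System.out::println);
        }
    }
}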

Conclusion

Vector search in Azure Cosmos DB for NoSQL enables efficient similarity searches on high-dimensional data, making it a powerful tool for AI-driven applications. In this guide, we:

  • Enabled vector search on the account and defined a container vector policy and a vector index.
  • Uploaded a movie dataset containing vector embeddings.
  • Demonstrated how to perform vector searches using Python, TypeScript, .NET, and Java.

By implementing vector search, you can enhance recommendation engines, semantic search, and other AI-powered features in your applications. As the dataset grows, consider the diskANN vector index type, which uses approximate nearest-neighbor search to keep query latency and RU cost manageable at large scale.