Quickstart

Beginner
15 min

Learn how to create a Serverless (Vector) database, connect to your database, load a set of vector embeddings, and perform a similarity search to find vectors that are close to the one in your query.

Create a Serverless (Vector) database

  1. Create an Astra account or sign in to an existing Astra account.

  2. In the Astra Portal, select Databases in the main navigation.

  3. Select Create Database.

  4. In the Create Database dialog, select the Serverless (Vector) deployment type.

  5. In Configuration, enter a meaningful Database name.

    You can’t change database names. Make sure the name is human-readable and meaningful. Database names must start and end with an alphanumeric character, and can contain the following special characters: & + - _ ( ) < > . , @.

  6. Select your preferred Provider and Region.

    You can select from a limited number of regions if you’re on the Free plan. Regions with a lock icon require that you upgrade to a Pay As You Go plan.

  7. Click Create Database.

    You are redirected to your new database’s Overview screen. Your database starts in Pending status before transitioning to Initializing. You’ll receive a notification once your database is initialized.

  8. Ensure the database is in Active status, and then select Generate Token. In the Application Token dialog, click Copy to copy the token (e.g. AstraCS:WSnyFUhRxsrg…). Store the token in a secure location before closing the dialog.

    Your token is automatically assigned the Database Administrator role.

  9. Copy your database’s API endpoint, located under Database Details > API Endpoint (e.g. https://ASTRA_DB_ID-ASTRA_DB_REGION.apps.astra.datastax.com).

  10. Assign your token and API endpoint to environment variables. Replace API_ENDPOINT and TOKEN with the values you copied in the previous steps.

    Linux or macOS:

    export ASTRA_DB_API_ENDPOINT=API_ENDPOINT # Your database API endpoint
    export ASTRA_DB_APPLICATION_TOKEN=TOKEN # Your database application token

    Windows (Command Prompt):

    set ASTRA_DB_API_ENDPOINT=API_ENDPOINT
    set ASTRA_DB_APPLICATION_TOKEN=TOKEN

    Don't add a trailing comment to a set command; Command Prompt includes it in the variable's value.

    Google Colab:

    import os
    os.environ["ASTRA_DB_API_ENDPOINT"] = "API_ENDPOINT" # Your database API endpoint
    os.environ["ASTRA_DB_APPLICATION_TOKEN"] = "TOKEN" # Your database application token

    Variables set with export or set apply only to the current terminal session, so run the quickstart script from the same session.
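Before running the quickstart, you can sanity-check the values from the steps above. The following Python sketch is not part of the Astra tooling. It encodes the database naming rule from step 5 and checks the two environment variables. The endpoint pattern (a database UUID followed by a region) and the AstraCS: token prefix are assumptions inferred from the examples in steps 8 and 9, and the helper names is_valid_db_name, parse_endpoint, and check_settings are hypothetical.

```python
from __future__ import annotations

import os
import re
from typing import Mapping, Optional, Tuple

# Naming rule from step 5: start and end with an alphanumeric character; in between,
# alphanumerics plus & + - _ ( ) < > . , @ are allowed.
# Assumption: any other character (including a space) is rejected.
_DB_NAME = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9&+\-_()<>.,@]*[A-Za-z0-9])?")

# Assumption: endpoints look like https://<database UUID>-<region>.apps.astra.datastax.com
_ENDPOINT = re.compile(
    r"https://([0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12})"
    r"-([a-z0-9-]+)\.apps\.astra\.datastax\.com/?"
)


def is_valid_db_name(name: str) -> bool:
    """Return True if the name satisfies the naming rule encoded above."""
    return _DB_NAME.fullmatch(name) is not None


def parse_endpoint(endpoint: str) -> Optional[Tuple[str, str]]:
    """Split an API endpoint into (database_id, region), or return None if it doesn't match."""
    match = _ENDPOINT.fullmatch(endpoint)
    return (match.group(1), match.group(2)) if match else None


def check_settings(env: Mapping[str, str]) -> list[str]:
    """Return a list of problems found in the two quickstart environment variables."""
    problems = []
    token = env.get("ASTRA_DB_APPLICATION_TOKEN", "")
    endpoint = env.get("ASTRA_DB_API_ENDPOINT", "")
    if not token:
        problems.append("ASTRA_DB_APPLICATION_TOKEN is not set")
    elif not token.startswith("AstraCS:"):  # assumed prefix, based on the example token
        problems.append("ASTRA_DB_APPLICATION_TOKEN does not start with 'AstraCS:'")
    if not endpoint:
        problems.append("ASTRA_DB_API_ENDPOINT is not set")
    elif parse_endpoint(endpoint) is None:
        problems.append("ASTRA_DB_API_ENDPOINT does not look like an Astra DB API endpoint")
    return problems


if __name__ == "__main__":
    issues = check_settings(os.environ)
    print("\n".join(issues) if issues else "Settings look OK")
```

Run with the placeholder values TOKEN and API_ENDPOINT still in place, the check reports both variables, which catches a forgotten substitution. In Google Colab, you can call check_settings(os.environ) in a cell after setting the variables.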

Install a client

You can interact with Astra DB Serverless in the Astra Portal or programmatically. This tutorial uses the DataStax Python, TypeScript, and Java clients. For more information, see Connection methods comparison and Get started with the Data API.

Install the client library for your preferred language and package manager:

To install the Python client with pip:

  1. Verify that pip is version 23.0 or higher.

    pip --version
  2. Upgrade pip if needed.

    python -m pip install --upgrade pip
  3. Install the astrapy package. You must have Python 3.8 or higher.

    pip install astrapy
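To confirm the prerequisites from the steps above in one go, you can run a small Python check like the following. It is a sketch, not an official tool: it reads installed package versions with the standard library's importlib.metadata (available in Python 3.8 and later), and the helper names version_tuple, meets_minimum, and report are hypothetical. It compares only the leading numeric parts of a version string, which is enough for the 23.0 threshold here but is not a full PEP 440 comparison.

```python
from __future__ import annotations

import re
import sys
from importlib.metadata import PackageNotFoundError, version


def version_tuple(text: str) -> tuple[int, ...]:
    """Convert '23.3.1' to (23, 3, 1), stopping at the first non-numeric suffix such as 'b1'."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
        if match.end() != len(piece):  # e.g. '0b1' keeps 0, then stops
            break
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare two versions numerically, padding the shorter one with zeros."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    return a + (0,) * (width - len(a)) >= b + (0,) * (width - len(b))


def report() -> None:
    """Print the interpreter version, the pip version status, and whether astrapy is installed."""
    print(f"Python {sys.version_info.major}.{sys.version_info.minor}")
    try:
        pip_version = version("pip")
        status = "OK" if meets_minimum(pip_version, "23.0") else "upgrade to 23.0 or higher"
        print(f"pip {pip_version}: {status}")
    except PackageNotFoundError:
        print("pip: not installed for this interpreter")
    try:
        print(f"astrapy {version('astrapy')}: installed")
    except PackageNotFoundError:
        print("astrapy: not installed (run: pip install astrapy)")


if __name__ == "__main__":
    report()
```

Run it with the same python interpreter that pip installs into; otherwise it reports on a different environment than the one your quickstart script will use.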

To install the TypeScript client:

  1. Verify that Node is version 14 or higher:

    node --version
  2. Use npm or Yarn to install the TypeScript client:

    To install the TypeScript client with npm:

    npm install @datastax/astra-db-ts

    To install the TypeScript client with Yarn:

    1. Verify that Yarn is version 2.0 or higher.

      yarn --version
    2. Install the astra-db-ts package.

      yarn add @datastax/astra-db-ts

Use Maven or Gradle to install the Java client:

To install the Java client with Maven:

  1. Install Java 11+ and Maven 3.9+.

  2. Create a pom.xml file in the root of your project.

    pom.xml
    <project xmlns="http://maven.apache.org/POM/4.0.0"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
                                 http://maven.apache.org/xsd/maven-4.0.0.xsd">
      <modelVersion>4.0.0</modelVersion>
    
      <groupId>com.example</groupId>
      <artifactId>test-java-client</artifactId>
      <version>1.0-SNAPSHOT</version>
    
      <!-- The Java client -->
      <dependencies>
        <dependency>
          <groupId>com.datastax.astra</groupId>
          <artifactId>astra-db-java</artifactId>
          <version>1.0.0</version>
        </dependency>
      </dependencies>
    
      <build>
        <plugins>
          <plugin>
            <groupId>org.codehaus.mojo</groupId>
            <artifactId>exec-maven-plugin</artifactId>
            <version>3.0.0</version>
            <configuration>
              <executable>java</executable>
              <mainClass>com.example.Quickstart</mainClass>
            </configuration>
            <executions>
              <execution>
                <goals>
                  <goal>java</goal>
                </goals>
              </execution>
            </executions>
          </plugin>
          <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-compiler-plugin</artifactId>
            <configuration>
              <source>11</source>
              <target>11</target>
            </configuration>
          </plugin>
        </plugins>
      </build>
    </project>

To install the Java client with Gradle:

  1. Install Java 11+ and Gradle.

  2. Create a build.gradle file in the root of your project.

    build.gradle
    plugins {
        id 'java'
        id 'application'
    }
    
    repositories {
        mavenCentral()
    }
    
    dependencies {
        implementation 'com.datastax.astra:astra-db-java:1.0.0'
    }
    
    application {
        mainClass = 'com.example.Quickstart'
    }

Create a script

Copy the quickstart script for your language into a file on your computer. Use the file name shown above each script.

To avoid a namespace collision, don’t name your Python script astrapy.py; a local file with that name would be imported instead of the installed astrapy package.

quickstart.py
import os
from astrapy import DataAPIClient
from astrapy.constants import VectorMetric
from astrapy.ids import UUID
from astrapy.exceptions import InsertManyException

# Initialize the client and get a "Database" object
client = DataAPIClient(os.environ["ASTRA_DB_APPLICATION_TOKEN"])
database = client.get_database(os.environ["ASTRA_DB_API_ENDPOINT"])
print(f"* Database: {database.info().name}\n")

# Create a collection. The default similarity metric is cosine.
# Choose dimensions that match your vector data.
# If you're not sure, use the vector dimension that your embeddings model produces.
collection = database.create_collection(
    "vector_test",
    dimension=5,
    metric=VectorMetric.COSINE,  # Or just 'cosine'.
    check_exists=False, # Optional
)
print(f"* Collection: {collection.full_name}\n")

# Insert documents with embeddings into the collection.
# UUIDs are version 7.
documents = [
    {
        "_id": UUID("018e65c9-df45-7913-89f8-175f28bd7f74"),
        "text": "Chat bot integrated sneakers that talk to you",
        "$vector": [0.1, 0.15, 0.3, 0.12, 0.05],
    },
    {
        "_id": UUID("018e65c9-e1b7-7048-a593-db452be1e4c2"),
        "text": "An AI quilt to help you sleep forever",
        "$vector": [0.45, 0.09, 0.01, 0.2, 0.11],
    },
    {
        "_id": UUID("018e65c9-e33d-749b-9386-e848739582f0"),
        "text": "A deep learning display that controls your mood",
        "$vector": [0.1, 0.05, 0.08, 0.3, 0.6],
    },
]
try:
    insertion_result = collection.insert_many(documents)
    print(f"* Inserted {len(insertion_result.inserted_ids)} items.\n")
except InsertManyException:
    print("* Documents found on DB already. Let's move on.\n")

# Perform a similarity search
query_vector = [0.15, 0.1, 0.1, 0.35, 0.55]
results = collection.find(
    sort={"$vector": query_vector},
    limit=10,
)
print("Vector search results:")
for document in results:
    print("    ", document)
quickstart.ts
import { DataAPIClient, VectorDoc, UUID } from '@datastax/astra-db-ts';

const { ASTRA_DB_APPLICATION_TOKEN, ASTRA_DB_API_ENDPOINT } = process.env;

// Initialize the client and get a "Db" object
const client = new DataAPIClient(ASTRA_DB_APPLICATION_TOKEN);
const db = client.db(ASTRA_DB_API_ENDPOINT);

console.log(`* Connected to DB ${db.id}`);

// Schema for the collection (VectorDoc adds the optional $vector field)
interface Idea extends VectorDoc {
  idea: string,
}

(async function () {
  // Create a collection. The default similarity metric is cosine.
  // Choose dimensions that match your vector data.
  // If you're not sure, use the vector dimension that your embeddings model produces.
  const collection = await db.createCollection<Idea>('vector_test', {
    vector: {
      dimension: 5,
      metric: 'cosine',
    },
    checkExists: false, // Optional
  });
  console.log(`* Created collection ${collection.namespace}.${collection.collectionName}`);

  // Insert documents with embeddings into the collection.
  // UUIDs are version 7.
  const documents = [
    {
      _id: new UUID('018e65c9-df45-7913-89f8-175f28bd7f74'),
      idea: 'Chat bot integrated sneakers that talk to you',
      $vector: [0.1, 0.15, 0.3, 0.12, 0.05],
    },
    {
      _id: new UUID('018e65c9-e1b7-7048-a593-db452be1e4c2'),
      idea: 'An AI quilt to help you sleep forever',
      $vector: [0.45, 0.09, 0.01, 0.2, 0.11],
    },
    {
      _id: new UUID('018e65c9-e33d-749b-9386-e848739582f0'),
      idea: 'A deep learning display that controls your mood',
      $vector: [0.1, 0.05, 0.08, 0.3, 0.6],
    },
  ];

  try {
    const inserted = await collection.insertMany(documents);
    console.log(`* Inserted ${inserted.insertedCount} items.`);
  } catch (e) {
    console.log('* Documents found on DB already. Let\'s move on!');
  }

  // Perform a similarity search
  const cursor = await collection.find({}, {
    sort: { $vector: [0.15, 0.1, 0.1, 0.35, 0.55] },
    limit: 10,
    includeSimilarity: true,
  });

  console.log('* Search results:');
  for await (const doc of cursor) {
    console.log('  ', doc.idea, doc.$similarity);
  }

})();
src/main/java/com/example/Quickstart.java
import com.datastax.astra.client.Collection;
import com.datastax.astra.client.DataAPIClient;
import com.datastax.astra.client.Database;
import com.datastax.astra.client.model.Document;
import com.datastax.astra.client.model.FindIterable;
import com.datastax.astra.client.model.SimilarityMetric;

public class Quickstart {

  public static void main(String[] args) {
    // Loading Arguments
    String astraToken = System.getenv("ASTRA_DB_APPLICATION_TOKEN");
    String astraApiEndpoint = System.getenv("ASTRA_DB_API_ENDPOINT");

    // Initialize the client
    DataAPIClient client = new DataAPIClient(astraToken);
    System.out.println("Connected to AstraDB");

    Database db = client.getDatabase(astraApiEndpoint);
    System.out.println("Connected to Database.");

    // Create a collection. The default similarity metric is cosine.
    // Choose dimensions that match your vector data.
    // If you're not sure, use the vector dimension that your embeddings model produces.
    Collection<Document> collection = db
            .createCollection("vector_test", 5, SimilarityMetric.COSINE);
    System.out.println("Created a collection");

    // Insert documents with embeddings into the collection
    collection.insertMany(
            new Document("1")
                    .append("text", "Chat bot integrated sneakers that talk to you")
                    .vector(new float[]{0.1f, 0.15f, 0.3f, 0.12f, 0.05f}),
            new Document("2")
                    .append("text", "An AI quilt to help you sleep forever")
                    .vector(new float[]{0.45f, 0.09f, 0.01f, 0.2f, 0.11f}),
            new Document("3")
                    .append("text", "A deep learning display that controls your mood")
                    .vector(new float[]{0.1f, 0.05f, 0.08f, 0.3f, 0.6f}));
    System.out.println("Inserted documents into the collection");

    // Perform a similarity search
    FindIterable<Document> resultsSet = collection.find(
            new float[]{0.15f, 0.1f, 0.1f, 0.35f, 0.55f},
            10
    );
    resultsSet.forEach(System.out::println);

  }
}

The quickstart script does the following:

  • Initializes the client and connects to your database.

  • Creates a collection named vector_test that uses the cosine similarity metric (the default) and a vector dimension of 5.

  • Inserts three sample documents, each with a short text description and a 5-dimensional vector embedding, into the collection.

  • Performs a similarity search for documents whose embeddings are close to a query vector. The search returns up to 10 documents, sorted by similarity to the query vector with the most similar documents first. Similarity is calculated with the metric you specified when you created the collection.
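To see how a cosine-based ranking orders the three sample documents, the following Python sketch recomputes it locally with the standard math module, using the vectors and query vector from the quickstart script. It is an illustration, not how Astra DB executes the search: the database uses its own index, and the similarity score it reports may be on a different scale, so compare only the ordering. The helper names cosine_similarity and rank are hypothetical. Before scoring, the sketch also checks that every vector has the collection's dimension of 5.

```python
import math
from typing import List, Sequence, Tuple

DIMENSION = 5  # matches dimension=5 used when creating vector_test

# The same sample data and query vector as the quickstart script.
DOCUMENTS = [
    ("Chat bot integrated sneakers that talk to you", [0.1, 0.15, 0.3, 0.12, 0.05]),
    ("An AI quilt to help you sleep forever", [0.45, 0.09, 0.01, 0.2, 0.11]),
    ("A deep learning display that controls your mood", [0.1, 0.05, 0.08, 0.3, 0.6]),
]
QUERY_VECTOR = [0.15, 0.1, 0.1, 0.35, 0.55]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product divided by the product of the vector lengths (range -1 to 1)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def rank(
    query: Sequence[float],
    documents: Sequence[Tuple[str, Sequence[float]]],
    limit: int = 10,
) -> List[Tuple[str, float]]:
    """Score every document against the query and return the top `limit`, most similar first."""
    for text, vector in [("query", query)] + list(documents):
        if len(vector) != DIMENSION:
            raise ValueError(f"{text!r} has {len(vector)} dimensions, expected {DIMENSION}")
    scored = [(text, cosine_similarity(query, vector)) for text, vector in documents]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:limit]


for text, score in rank(QUERY_VECTOR, DOCUMENTS):
    print(f"{score:.4f}  {text}")
```

With the quickstart data, the mood display ranks first (cosine similarity about 0.989), followed by the quilt (about 0.593) and the sneakers (about 0.507). A vector with the wrong number of dimensions raises ValueError instead of being scored, which mirrors why the collection's dimension must match your embeddings.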

Run the script

Run the quickstart script for your language:

Python:

python quickstart.py

TypeScript with npm:

npx tsx quickstart.ts

TypeScript with Yarn:

yarn dlx tsx quickstart.ts

Java with Maven:

mvn clean compile
mvn exec:java -Dexec.mainClass="com.example.Quickstart"

Java with Gradle:

gradle build
gradle run

In the Astra Portal, you can find your new collection and the loaded data. You can use the Data Explorer to inspect and search your data.

© 2024 DataStax