LangStream Documentation
Langstream.aiLangStream GitHub RepoChangelog
  • LangStream Documentation
  • ❤️Langstream.ai
  • ⭐LangStream GitHub Repo
  • 📜Changelog
  • about
    • What is LangStream?
    • License
  • Get Started
  • installation
    • LangStream CLI
    • Docker
    • Minikube (mini-langstream)
    • Kubernetes
    • Build and install from source
  • Building Applications
    • Vector Databases
    • Application structure
      • Pipelines
      • Instances
      • Configuration
      • Topics
      • Assets
      • Secrets
      • YAML templating
      • Error Handling
      • Stateful agents
      • .langstreamignore
    • Sample App
    • Develop, test and deploy
    • Application Lifecycle
    • Expression Language
    • API Gateways
      • Websocket
      • HTTP
      • Message filtering
      • Gateway authentication
    • API Reference
      • Agents
      • Resources
      • Assets
  • LangStream CLI
    • CLI Commands
    • CLI Configuration
    • Web interface
  • Integrations
    • Large Language Models (LLMs)
      • OpenAI
      • Hugging Face
      • Google Vertex AI
      • Amazon Bedrock
      • Ollama
    • Data storage
      • Astra Vector DB
      • Astra
      • Cassandra
      • Pinecone
      • Milvus
      • Solr
      • JDBC
      • OpenSearch
    • Integrations
      • Apache Kafka Connect
      • Apache Camel
    • LangServe
  • Pipeline Agents
    • Agent Messaging
    • Builtin agents
      • Input & Output
        • webcrawler-source
        • s3-source
        • azure-blob-storage-source
        • sink
        • vector-db-sink
        • camel-source
      • AI Agents
        • ai-chat-completions
        • ai-text-completions
        • compute-ai-embeddings
        • flare-controller
      • Text Processors
        • document-to-json
        • language-detector
        • query
        • query-vector-db
        • re-rank
        • text-normaliser
        • text-extractor
        • text-splitter
        • http-request
      • Data Transform
        • cast
        • compute
        • drop
        • drop-fields
        • merge-key-value
        • unwrap-key-value
      • Flow control
        • dispatch
        • timer-source
        • trigger-event
    • Custom Agents
      • Python sink
      • Python source
      • Python processor
      • Python service
    • Agent Developer Guide
      • Agent Types
      • Agent Creation
      • Configuration and Testing
      • Environment variables
  • Messaging
    • Messaging
      • Apache Pulsar
      • Apache Kafka
      • Pravega.io
  • Patterns
    • RAG pattern
    • FLARE pattern
  • Examples
    • LangServe chatbot
    • LlamaIndex Cassandra sink
Powered by GitBook
On this page
Edit on GitHub
  1. Integrations
  2. Data storage

Astra Vector DB

PreviousData storageNextAstra

Last updated 1 year ago

Connecting to DataStax Astra Vector DB

To use DataStax Astra DB as a vector database using the , you have to create a "vector-database" resource in your configuration.yaml file.

If you want to use the CQL based API then you have to create a "astra" resource. See the documentation .

resources:
  - type: "vector-database"
    name: "AstraDatasource"
    configuration:
      service: "astra-vector-db"      
      token: "${ secrets.astra-vector-db.token }"
      enviroment: "${ secrets.astra-vector-db.endpoint }"

Required parameters:

  • token: this is the token provided by the Astra DB service

  • endpoint: this is the URL of your database

Handling Collections as Assets

You can manage a "collection" in your Astra Database using the "astra-collection" asset type. A collection is a logical container for documents.

assets:
  - name: "langstream-keyspace"
    asset-type: "astra-collection"
    creation-mode: create-if-not-exists    
    deletion-mode: delete
    config:
      collection-name: "langstream"
      datasource: "AstraDatasource"  

With the "astra-collection" asset, you can create a collection in your Astra DB database.

As usual, you can use the creation-mode and deletion-mode parameters to control the creation and deletion of the collection.

Querying an Astra DB collection

When you use the "query-vector-db" agent to query Astra, you can use the following parameters:

pipeline:
  - name: "lookup-related-documents"
    type: "query-vector-db"
    configuration:
      datasource: "AstraDatasource"
      query: |
        {
          "collection-name": "docs",
          "vector": ?,
          "limit": 10,
          "include-similarity:" true,
          "select": ["text"],
          "filter": {
            "language": "en"            
          }
        }
      fields:
        - "value.question_embeddings"
      output-field: "value.related_documents"

As usual you can use the '?' symbol as a placeholder for the fields that you specify in the "fields" section. The "id" field is always returned by default, so you don't need to specify it in the "select" list. The "similarity" field is returned depedending on the "include-similarity" parameter, the default is "true". The "vector" field is returned as long as it is it returned by the query, so you don't need to specify it in the "select" list.

The "vector" filter is special, the query uses the "vector" field to perform a similarity search and not an exact match.

Writing to Astra INSERT/UPDATE/DELETE

You can use the query-vector-db agent to write to Astra and update your collections.

This is an example for the "insertOne" action:

pipeline:
  - name: "lookup-related-documents"
    type: "query-vector-db"
    configuration:
       datasource: "AstraDatasource"
       mode: "execute"
       query: |
        {
            "action": "insertOne",
            "collection-name": "documents",
            "document": {
                "id": "some-id",
                "name": ?,
                "vector": ?,
                "text": "Some text"
            }
        }
       fields:
         - "value.name"
         - "value.question_embeddings" 

Similarly you can use the "findOneAndUpdate" action to update an existing document:

pipeline:
  - name: "lookup-related-documents"
    type: "query-vector-db"
    configuration:
       datasource: "AstraDatasource"
       mode: "execute"
       query: |
        {
            "action": "findOneAndUpdate",
            "collection-name": "documents",
            "filter": {
                "_id": ?
            },
            "update": {
                "$set": {
                    "text": ?
                }
            }
        }
       fields:
         - "value.document_id"
         - "value.new_text" 

In delete one or more documents you can use the "deleteOne" and "deleteMany" actions.

pipeline:
  - name: "lookup-related-documents"
    type: "query-vector-db"
    configuration:
       datasource: "AstraDatasource"
       mode: "execute"
       query: |
        {
            "action": "deleteOne",
            "collection-name": "documents",
            "filter": {
                "_id": ?
            }
        }
       fields:
         - "value.document_id"

This is an example for the "deleteMany" action:

pipeline:
  - name: "lookup-related-documents"
    type: "query-vector-db"
    configuration:
       datasource: "AstraDatasource"
       mode: "execute"
       query: |
        {
            "action": "deleteMany",
            "collection-name": "documents",
            "filter": {
                "userId": ?
            }
        }
       fields:
         - "value.user_id"

Writing to Astra Vector DB using the "vector-db-sink" agent

When you use the "vector-db-sink" agent to write to Astra, you can use the following parameters:

  - name: "Write to Astra"
    type: "vector-db-sink"
    configuration:
      datasource: "AstraDatasource"
      collection-name: "documents"
      fields:
        - name: "id"
          expression: "fn:concat(value.filename, '-', value.chunk_id)"
        - name: "vector"
          expression: "value.embeddings_vector"
        - name: "text"
          expression: "value.text"
        - name: "filename"
          expression: "value.filename"
        - name: "chunk_id"
          expression: "value.chunk_id"

Configuration

Set the collection-name to the name of the collection you want to write to. Then you define the fields in the "fields" list. This works similarly to the , and you can use the same syntax to define the fields.

Check out the full configuration properties in the .

JSON API
here
'compute' agent
API Reference page