vector-db-sink

This agent writes vector data to vector databases. LangStream currently supports Astra DB and Pinecone.

Astra DB and Pinecone sinks both use the agent type "vector-db-sink" in a LangStream pipeline, but each database requires different configuration values to map the incoming record data into the database.

Astra DB example

The Astra DB vector database connection is defined in configuration.yaml:

configuration:
  resources:
    - type: "vector-database"
      name: "AstraDatasource"
      configuration:
        service: "astra"
        username: "${ secrets.astra.username }"
        password: "${ secrets.astra.password }"
        secureBundle: "${ secrets.astra.secureBundle }"
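The "${ secrets.astra.* }" references resolve against the application's secrets file. The following is a minimal sketch of a matching entry, assuming the usual LangStream secrets.yaml layout (a list of secrets, each with an id and a data map). The key names under data come from the references in the datasource definition; every value is a placeholder, not a real credential.

```yaml
# secrets.yaml (sketch; all values are placeholders)
secrets:
  - name: astra
    id: astra            # matches the "astra" in ${ secrets.astra.username }
    data:
      username: "<astra-username-or-client-id>"
      password: "<astra-password-or-client-secret>"
      secureBundle: "<astra-secure-connect-bundle>"
```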

The "Write to Astra DB" pipeline takes embeddings as input from "input-topic" and, in its "Write to Cassandra" step, writes them to the configured datasource "AstraDatasource":

name: "Write to Astra DB"
topics:
  - name: "input-topic"
    creation-mode: create-if-not-exists
pipeline:
  - name: "Write to Cassandra"
    type: "vector-db-sink"
    input: "input-topic"
    configuration:
      datasource: "AstraDatasource"
      table: "vsearch.products"
      mapping: "id=value.id,description=value.description,name=value.name"
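To make the mapping string concrete, here is a hypothetical record value as it might arrive on "input-topic", written as YAML for readability. It assumes each comma-separated pair in the mapping reads as column=record-field, so value.id would land in the id column of vsearch.products, and so on. The field values are invented for illustration.

```yaml
# Hypothetical record value on "input-topic" (illustration only)
id: "prod-001"                                   # value.id          -> column "id"
name: "Basic widget"                             # value.name        -> column "name"
description: "A small general-purpose widget"    # value.description -> column "description"
```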

Astra DB Topics

Input

  • Structured and unstructured text ?

  • Implicit topic ?

  • Templating ?

Output

  • None, it’s a sink.

Astra DB Configuration

Pinecone Example

The "Write to Pinecone" pipeline step takes embeddings as input from "vectors-topic" and writes them to a Pinecone datasource.

The Pinecone vector database connection is defined in configuration.yaml:

configuration:
  resources:
    - type: "vector-database"
      name: "PineconeDatasource"
      configuration:
        service: "pinecone"
        api-key: "${secrets.pinecone.api-key}"
        environment: "${secrets.pinecone.environment}"
        index-name: "${secrets.pinecone.index-name}"
        project-name: "${secrets.pinecone.project-name}"
        server-side-timeout-sec: 10
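The "${secrets.pinecone.*}" references resolve against the application's secrets file. The following is a minimal sketch of a matching entry, assuming the usual LangStream secrets.yaml layout (a list of secrets, each with an id and a data map). The key names under data come from the references in the datasource definition; every value is a placeholder.

```yaml
# secrets.yaml (sketch; all values are placeholders)
secrets:
  - name: pinecone
    id: pinecone         # matches the "pinecone" in ${secrets.pinecone.api-key}
    data:
      api-key: "<pinecone-api-key>"
      environment: "<pinecone-environment>"
      index-name: "<pinecone-index-name>"
      project-name: "<pinecone-project-name>"
```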

The "Write to Pinecone" pipeline step takes embeddings as input from "vectors-topic" and writes them to the configured datasource "PineconeDatasource":

name: "Write to Pinecone DB"
topics:
  - name: "vectors-topic"
    creation-mode: create-if-not-exists
pipeline:
  - name: "Write to Pinecone"
    type: "vector-db-sink"
    input: "vectors-topic"
    configuration:
      datasource: "PineconeDatasource"
      vector.id: "value.id"
      vector.vector: "value.embeddings"
      vector.namespace: "value.namespace"
      vector.metadata.genre: "value.genre"
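To make the vector.* settings concrete, here is a hypothetical record value as it might arrive on "vectors-topic", written as YAML for readability. It assumes each vector.* setting names the record field that supplies that part of the Pinecone vector. The field values and the three-element embedding are invented for illustration; a real embedding must match the dimension of the Pinecone index.

```yaml
# Hypothetical record value on "vectors-topic" (illustration only)
id: "doc-42"                  # value.id         -> vector.id
embeddings: [0.12, 0.56, 0.9] # value.embeddings -> vector.vector
namespace: "books"            # value.namespace  -> vector.namespace
genre: "fiction"              # value.genre      -> vector.metadata.genre
```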

Pinecone Topics

Input

  • Structured and unstructured text ?

  • Implicit topic ?

  • Templating ?

Output

  • None, it’s a sink.

Pinecone Configuration
