query-vector-db

This agent enables submitting queries to a vector datasource (like Pinecone or Astra DB) and outputting the results.

Example

Follow the Pinecone Quickstart docs to create your Pinecone API key, environment, vector index name, and project name, and add them to your secrets.yaml file. These values are required.
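
The ${secrets.pinecone.*} references used below resolve against a secret with id "pinecone" in secrets.yaml. A minimal sketch of that entry might look like this (the key names are chosen to match the references below; adjust them to your setup and see the Secrets page for the full format):

secrets:
  - id: pinecone
    data:
      api-key: "<your Pinecone API key>"
      environment: "<your Pinecone environment>"
      index-name: "<your vector index name>"
      project-name: "<your Pinecone project name>"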

Specify a datasource in configuration.yaml:

configuration:
  resources:
    - type: "vector-database"
      name: "PineconeDatasource"
      configuration:
        service: "pinecone"
        api-key: "${secrets.pinecone.api-key}"
        environment: "${secrets.pinecone.environment}"
        index-name: "${secrets.pinecone.index-name}"
        project-name: "${secrets.pinecone.project-name}"
        server-side-timeout-sec: 10

Reference the "vector-database" datasource and submit a query using input message values in pipeline.yaml:

- name: "Execute Query"
  type: "query-vector-db"
  configuration:
    datasource: "PineconeDatasource"
    query: |
      {
        "vector": ?,
        "topK": 5,
        "filter": {"$or": [{"genre": "comedy"}, {"year": 2019}]}
      }
    fields:
      - "value.embeddings"
    output-field: "value.query-result"
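
For illustration (the field names follow the configuration above), suppose the incoming message value carries the embeddings to search with:

{
  "embeddings": [0.1, 0.2, 0.3]
}

The agent substitutes value.embeddings for the "?" placeholder, runs the query, and adds the matches under value.query-result. The exact shape of each match depends on the vector database; a Pinecone-style result might look roughly like this (the id and score fields are just an illustration):

{
  "embeddings": [0.1, 0.2, 0.3],
  "query-result": [
    {"id": "doc-1", "score": 0.92},
    {"id": "doc-2", "score": 0.87}
  ]
}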

Please refer to the documentation of the vector database you are using for more information on how to write queries:

  • Astra Vector DB
  • Astra DB
  • Apache Cassandra
  • Pinecone.io
  • Milvus.io
  • JDBC
  • Apache Solr

Automatically repeating the query over a list of inputs

In the example below, we use the loop-over capability to query the database once for each document in a list of documents to retrieve.

  - name: "lookup-related-documents"
    type: "query-vector-db"
    configuration:
      datasource: "JdbcDatasource"
      # execute the agent for each document in documents_to_retrieve
      # you can refer to each document with "record.xxx"
      loop-over: "value.documents_to_retrieve"
      query: |
              SELECT text,embeddings_vector
              FROM documents
              ORDER BY cosine_similarity(embeddings_vector, CAST(? as FLOAT ARRAY)) DESC LIMIT 5
      fields:
        - "record.embeddings"
      # as we are looping over a list of documents, the result of the query
      # is the union of all the results
      output-field: "value.retrieved_documents"

When you use "loop-over", the agent executes for each element in a list instead of operating on the whole message. Use "record.xxx" to refer to the current element in the list.

The snippet above runs the query once for each element in the list "documents_to_retrieve", using each element's embeddings as the query vector. The list is expected to be a struct like this:

{
  "documents_to_retrieve": [
      {
        "text": "the text of the first document",
        "embeddings": [1,2,3,4,5]
       },
       {
        "text": "the text of the second document",
        "embeddings": [6,7,8,9,10]
       }
    ]
}

The agent then adds all the results to a new field named "retrieved_documents" in the message.

After the agent runs, the message value looks like this:

{
    "documents_to_retrieve": [
      {
        "text": "the text of the first document",
        "embeddings": [1,2,3,4,5]
       },
       {
        "text": "the text of the second document",
        "embeddings": [6,7,8,9,10]
       }
    ],
  "retrieved_documents": [
      {
        "text": "the text of some document relevant",
        "embeddings_vector": [0.2,7,8,9,5]
       },
       {
        "text": "the text of another document",
        "embeddings_vector": [0.2,3,8,9,2]
       },
       {
        "text": "the text of another document",
        "embeddings_vector": [1.2,9,3,3,3]
       }
    ]
}

Topics

Input

  • Structured and unstructured text
  • Implicit topic

Output

  • Structured text
  • Implicit topic

Configuration

datasource (string)
A reference to the datasource name declared in resources.datasources of the configuration.yaml manifest.

query (string)
The query statement to run. Each "?" placeholder is replaced, in order, with the corresponding value from fields.
Example:

{
  "vector": ?,
  "topK": 5,
  "filter": {"$or": [{"genre": "comedy"}, {"year": 2019}]}
}

fields (string[])
A collection of field values. Each value is used, in order, to replace the "?" placeholders in the query (do not include mustache brackets; this is not a templated value).
Example collection:

  • "value.embeddings"

output-field (string)
The name of an additional field added to the message data containing the query result (do not include mustache brackets; this is not a templated value). Provide it in the form "value.<field-name>".
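
As an illustration of the fields-to-placeholder mapping (a sketch, not taken from the original page; it assumes a JDBC datasource and a hypothetical genre column): with two "?" placeholders, the first is replaced by the first entry in fields and the second by the second entry.

    query: |
      SELECT text, embeddings_vector
      FROM documents
      WHERE genre = ?
      ORDER BY cosine_similarity(embeddings_vector, CAST(? as FLOAT ARRAY)) DESC LIMIT 5
    fields:
      - "value.genre"        # replaces the first ?
      - "value.embeddings"   # replaces the second ?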

What's next?

The query language depends on the underlying vector database. For example, Pinecone uses a JSON query language. See the Pinecone Query Language docs for more information.

It is possible to perform the same computation over a list of inputs - for example, a list of questions. You can take the FLARE pattern as an example.

For more on vector databases, see the Vector Databases page.

For more on the datasource resource, see the Datasource documentation.
