LangStream Documentation
Solr


Last updated 1 year ago

LangStream allows you to use Apache Solr as a vector database. This is useful if you want to use a vector database that is hosted on-premises or in your own cloud environment.

You can find more about how to perform Vector Search in Apache Solr in the official Apache Solr documentation.

Connecting to Solr

To use Apache Solr as a vector database, create a "vector-database" resource in your configuration.yaml file.

resources:
    - type: "vector-database"
      name: "SolrDataSource"
      configuration:
        service: "solr"
        username: "${ secrets.solr.username }"
        password: "${ secrets.solr.password }"
        host: "${ secrets.solr.host }"
        port: "${ secrets.solr.port }"
        collection-name: "${ secrets.solr.collection-name }"

Explanation for the parameters:

  • username: the username used to authenticate with Solr

  • password: the password used to authenticate with Solr

  • host: the hostname of the Solr server

  • port: the port of the Solr server, usually 8983

  • collection-name: the name of the collection to connect to

Currently LangStream supports connecting to one collection at a time, so you need to create a separate resource for each collection.

LangStream uses the official Apache Solr Java client to connect to Solr.
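The `${ secrets.solr.* }` references above resolve against the secrets file of your application. A minimal sketch of a matching secrets.yaml follows; the values shown are placeholders, not real credentials:

```yaml
secrets:
  - id: solr
    data:
      username: "admin"              # placeholder
      password: "changeme"           # placeholder
      host: "localhost"
      port: "8983"
      collection-name: "documents"
```

Keep this file out of version control, since it carries the Solr credentials for your environment.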

Special assets for Apache Solr Cloud

You can use both Solr and Solr Cloud. If you use Solr Cloud then you can also manage the collections using the Solr Cloud API.

To do that, you need to create special assets in your pipeline: "solr-collection".

Here is an example of creating a collection named "documents" with a dense vector, as required to perform Vector Similarity Searches:

assets:
  - name: "documents-table"
    asset-type: "solr-collection"
    creation-mode: create-if-not-exists
    deletion-mode: delete
    config:
      collection-name: "documents"
      datasource: "SolrDataSource"
      create-statements:
        - api: "/api/collections"
          method: "POST"
          body: |
            {
              "name": "documents",
              "numShards": 1,
              "replicationFactor": 1
            }
        - api: "/schema"
          body: |
            {
              "add-field-type": {
                "name": "knn_vector",
                "class": "solr.DenseVectorField",
                "vectorDimension": "1536",
                "similarityFunction": "cosine"
              }
            }
        - api: "/schema"
          body: |
            {
              "add-field": {
                "name": "embeddings",
                "type": "knn_vector",
                "stored": true,
                "indexed": true
              }
            }
        - api: "/schema"
          body: |
            {
              "add-field": {
                "name": "text",
                "type": "string",
                "stored": true,
                "indexed": false,
                "multiValued": false
              }
            }

As you can see in the "create-statements" section above, you can configure a list of commands, each of which translates to a Solr API call. You can invoke any of the Solr APIs; for each command you declare:

  • the "api" endpoint you want to call: "/schema" or "/api/collections"

  • the body of the HTTP request, as a JSON string

  • optionally, the HTTP method, if you want to use something other than "POST" (the default)

Querying Solr

Use the "query-vector-db" agent to query Solr with the following parameters:

  - name: "lookup-related-documents"
    type: "query-vector-db"
    configuration:
      datasource: "SolrDataSource"
      query: |
        {
          "q": "{!knn f=embeddings topK=10}?"
        }
      fields:
        - "fn:toListOfFloat(value.question_embeddings)"
      output-field: "value.related_documents"

As usual, the '?' symbol acts as a placeholder: it is replaced by the values computed from the "fields" list.

For Apache Solr, the "query" field takes a JSON document describing the parameters to pass in the Solr query. You usually provide the main query string in "q", and then add other parameters as needed.
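For instance, alongside "q" you can pass other standard Solr query parameters. A sketch that adds "fl" to limit the fields returned by the search (the field list shown is illustrative):

```yaml
  - name: "lookup-related-documents"
    type: "query-vector-db"
    configuration:
      datasource: "SolrDataSource"
      query: |
        {
          "q": "{!knn f=embeddings topK=10}?",
          "fl": "id,text,score"
        }
      fields:
        - "fn:toListOfFloat(value.question_embeddings)"
      output-field: "value.related_documents"
```

Restricting the returned fields keeps the records written to "output-field" small, which matters when the results are later passed to an LLM prompt.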

Writing to Solr

Use the "vector-db-sink" agent to write to Solr with the following parameters:

  - name: "Write to Solr"
    type: "vector-db-sink"
    input: chunks-topic
    configuration:
      datasource: "SolrDataSource"
      collection-name: "documents"
      fields:
        - name: "id"
          expression: "fn:concat(value.filename, value.chunk_id)"
        - name: "embeddings"
          expression: "fn:toListOfFloat(value.embeddings_vector)"
        - name: "text"
          expression: "value.text"
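In a typical pipeline, the embeddings written by this sink are produced by an upstream agent. A hedged sketch, assuming a "compute-ai-embeddings" agent writes the vector into value.embeddings_vector (the topic names and model are illustrative and require a configured LLM resource):

```yaml
pipeline:
  - name: "compute-embeddings"
    type: "compute-ai-embeddings"
    input: raw-chunks-topic            # illustrative topic name
    output: chunks-topic
    configuration:
      model: "text-embedding-ada-002"  # assumes an OpenAI resource is configured
      embeddings-field: "value.embeddings_vector"
      text: "{{ value.text }}"
  - name: "Write to Solr"
    type: "vector-db-sink"
    input: chunks-topic
    configuration:
      datasource: "SolrDataSource"
      collection-name: "documents"
      fields:
        - name: "id"
          expression: "fn:concat(value.filename, value.chunk_id)"
        - name: "embeddings"
          expression: "fn:toListOfFloat(value.embeddings_vector)"
        - name: "text"
          expression: "value.text"
```

Note that the dimension of the embeddings produced by the model must match the "vectorDimension" declared on the "knn_vector" field type.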

Configuration

If you set "api" to "/schema", you are using the Solr Schema API. If you set "api" to "/api/collections", you are using the Solr Collections API.

In the example above, we are using the "knn" query parser to perform a Vector Similarity Search.

Set the collection-name to the name of the collection you want to write to, and define the fields in the "fields" list. This works similarly to the 'compute' agent, where you define the name of each field and the expression that computes its value.

Check out the full configuration properties in the API Reference page.
