RAG pattern

The Retrieval Augmented Generation (RAG) pattern is a powerful way to extend the knowledge base of a language model. It is used to:

  • Provide more accurate, up-to-date, and context-aware responses

  • Extend the knowledge base of the LLM

  • Improve the quality of the generated text

Vector search example

Once you have your vector database populated with text and vector embeddings, you can use the query-vector-db agent to perform a similarity search.
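
The pipeline shown below points its query-vector-db step at a datasource named "JdbcDatasource". That datasource has to be declared among the application's resources (typically in configuration.yaml). A minimal sketch, assuming a JDBC-backed vector store; the driver class and secret names here are placeholders, not part of this example:

configuration:
  resources:
    - type: "datasource"
      name: "JdbcDatasource"
      configuration:
        service: "jdbc"
        # any JDBC driver bundled with the application; the LangStream samples use HerdDB
        driverClass: "herddb.jdbc.Driver"
        url: "${secrets.jdbc.url}"
        user: "${secrets.jdbc.user}"
        password: "${secrets.jdbc.password}"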

The flow of a vector search application with the RAG pattern is as follows:

  • Start from a query text

  • Compute the embeddings of the query text

  • Query the vector database to get the most similar documents

  • Re-rank the results to get the most relevant documents

  • Build a prompt with the query text and the most relevant documents

  • Query the LLM with the prompt to get the final response

  • Finally, return the response to the user

pipeline:
  - name: "convert-to-structure"
    type: "document-to-json"
    input: "questions-topic"
    configuration:
      text-field: "question"
  - name: "compute-embeddings"
    type: "compute-ai-embeddings"
    configuration:
      model: "${secrets.open-ai.embeddings-model}" # This needs to match the name of the model deployment, not the base model
      embeddings-field: "value.question_embeddings"
      text: "{{ value.question }}"
      flush-interval: 0
  - name: "lookup-related-documents"
    type: "query-vector-db"
    configuration:
      datasource: "JdbcDatasource"
      query: "SELECT text,embeddings_vector FROM documents ORDER BY cosine_similarity(embeddings_vector, CAST(? as FLOAT ARRAY)) DESC LIMIT 20"
      fields:
        - "value.question_embeddings"
      output-field: "value.related_documents"
  - name: "re-rank documents with MMR"
    type: "re-rank"
    configuration:
      max: 5 # keep only the top 5 documents, because we have a hard limit on the prompt size
      field: "value.related_documents"
      query-text: "value.question"
      query-embeddings: "value.question_embeddings"
      output-field: "value.related_documents"
      text-field: "record.text"
      embeddings-field: "record.embeddings_vector"
      algorithm: "MMR"
      lambda: 0.5
      k1: 1.52
      b: 0.75
  - name: "ai-chat-completions"
    type: "ai-chat-completions"

    configuration:
      model: "${secrets.open-ai.chat-completions-model}" # This needs to be set to the model deployment name, not the base name
      # on the log-topic we add a field with the answer
      completion-field: "value.answer"
      # we are also logging the prompt we sent to the LLM
      log-field: "value.prompt"
      # here we configure the streaming behavior
      # as soon as the LLM answers with a chunk we send it to the answers-topic
      stream-to-topic: "answers-topic"
      # on the streaming answer we send the answer as whole message
      # the 'value' syntax is used to refer to the whole value of the message
      stream-response-completion-field: "value"
      # we want to stream the answer as soon as we have 20 chunks
      # in order to reduce latency for the first message the agent sends the first message
      # with 1 chunk, then with 2 chunks....up to the min-chunks-per-message value
      # eventually we want to send bigger messages to reduce the overhead of each message on the topic
      min-chunks-per-message: 20
      messages:
        - role: system
          content: |
              A user is going to ask a question. The documents below may help you in answering it.
              Please try to leverage them in your answer as much as possible.
              Take into consideration that the user is always asking questions about the LangStream project.
              If you provide code or YAML snippets, please explicitly state that they are examples.
              Do not provide information that is not related to the LangStream project.

              Documents:
              {{# value.related_documents}}
              {{ text }}
              {{/ value.related_documents}}
        - role: user
          content: "{{ value.question }}"

What’s next?

Do you have a website lying around just waiting to be turned into a useful chatbot? This complete pipeline is available in the LangStream repo, and running it on your own is no sweat.
