LangStream Documentation

JDBC

Connecting to a JDBC Compliant Database

Connect to any JDBC-compliant database using the "datasource" resource type.

configuration:
  resources:
    - type: "datasource"
      name: "PGDataSource"
      configuration:
        service: "jdbc"
        driverClass: "org.postgresql.Driver"
        url: "jdbc:postgresql://postgresql.default.svc.cluster.local:5432/"
        user: "${secrets.postgres.user}"
        password: "${secrets.postgres.password}"
    - type: "datasource"
      name: "JdbcDatasource"
      configuration:
        service: "jdbc"
        driverClass: "herddb.jdbc.Driver"
        url: "jdbc:herddb:server:localhost:7000"
  dependencies:
    - name: "PostgreSQL JDBC Driver"
      url: "https://jdbc.postgresql.org/download/postgresql-42.6.0.jar"
      sha512sum: "ec3b57d8377715ef6286d457b610a2e056aa99dbf036f750c5e07370fc8b01414b2aef5e0232d561c50f22abf0da961ee432e53734cc193a3e9bdaf6231d4fa1"
      type: "java-library"
    - name: "HerdDB JDBC Driver"
      url: "https://repo1.maven.org/maven2/org/herddb/herddb-jdbc/0.28.0/herddb-jdbc-0.28.0-thin.jar"
      sha512sum: "d8ea8fbb12eada8f860ed660cbc63d66659ab3506bc165c85c420889aa8a1dac53dab7906ef61c4415a038c5a034f0d75900543dd0013bdae50feafd46f51c8e"
      type: "java-library"

The driverClass parameter is required: it is the fully qualified class name used as the entry point for bootstrapping the JDBC driver.

Loading the JDBC driver

To load the JDBC driver, provide a dependency to the driver jar file.

  dependencies:
    - name: "PostgreSQL JDBC Driver"
      url: "https://jdbc.postgresql.org/download/postgresql-42.6.0.jar"
      sha512sum: "ec3b57d8377715ef6286d457b610a2e056aa99dbf036f750c5e07370fc8b01414b2aef5e0232d561c50f22abf0da961ee432e53734cc193a3e9bdaf6231d4fa1"
      type: "java-library"

With this syntax, the LangStream CLI downloads the file and stores it in the java/lib folder.

Add a .gitignore rule so the downloaded jar file is not committed to your git repository. For example, place this .gitignore file at the root of your application directory:

java/lib/*

Querying a JDBC datasource

You can query a JDBC datasource using the "query" or the "query-vector-db" agent in your pipeline.

pipeline:
  - name: "lookup-related-documents"
    type: "query-vector-db"
    configuration:
      datasource: "JdbcDatasource"
      query: "SELECT text FROM documents ORDER BY cosine_similarity(embeddings_vector, CAST(? as FLOAT ARRAY)) DESC LIMIT 5"
      fields:
        - "value.question_embeddings"

As usual, you can use the '?' symbol as a placeholder for the fields that you specify in the "fields" section. The syntax for performing an Approximate Nearest Neighbor search is specific to the database you are using; the example above uses the HerdDB syntax.
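The same placeholder mechanism works for ordinary lookups with the "query" agent. A minimal sketch (the table, column names, and output-field below are illustrative, not part of the example above):

```yaml
pipeline:
  - name: "lookup-document-metadata"
    type: "query"
    configuration:
      datasource: "JdbcDatasource"
      # Each '?' is bound, in order, to an entry of the "fields" list
      query: "SELECT lang, num_tokens FROM documents WHERE filename = ?"
      fields:
        - "value.filename"
      output-field: "value.metadata"
```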

Special assets for JDBC datasources

You can automatically manage the tables on your JDBC datasource using the "jdbc-table" asset type.

assets:
  - name: "documents-table"
    asset-type: "jdbc-table"
    creation-mode: create-if-not-exists
    deletion-mode: delete
    config:
      table-name: "documents"    
      datasource: "JdbcDatasource"
      create-statements:
        - |
          CREATE TABLE documents (
          filename TEXT,
          chunk_id int,
          num_tokens int,
          lang TEXT,
          text TEXT,
          embeddings_vector FLOATA,
          PRIMARY KEY (filename, chunk_id));
      delete-statements:
        - "DROP TABLE IF EXISTS documents"

You can specify any number of statements in the "create-statements" and "delete-statements" sections, for instance to create indexes or other database objects.
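For instance, an additional create-statement could add an index alongside the table; a sketch assuming the "documents" table defined above (the index name is hypothetical):

```yaml
assets:
  - name: "documents-table-with-index"
    asset-type: "jdbc-table"
    creation-mode: create-if-not-exists
    config:
      table-name: "documents"
      datasource: "JdbcDatasource"
      create-statements:
        - "CREATE TABLE documents (filename TEXT, chunk_id int, text TEXT, PRIMARY KEY (filename, chunk_id))"
        # Statements run in order, so the index can reference the table above
        - "CREATE INDEX documents_by_filename ON documents (filename)"
      delete-statements:
        - "DROP TABLE IF EXISTS documents"
```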

Writing to a JDBC datasource

Use the "vector-db-sink" agent with the following parameters to write to a JDBC database:

  - name: "Write"
    type: "vector-db-sink"
    input: chunks-topic
    configuration:
      datasource: "JdbcDatasource"
      table-name: "documents"
      fields:
        - name: "filename"
          expression: "value.filename"
          primary-key: true
        - name: "chunk_id"
          expression: "value.chunk_id"
          primary-key: true
        - name: "embeddings_vector"
          expression: "fn:toListOfFloat(value.embeddings_vector)"
        - name: "lang"
          expression: "value.language"
        - name: "text"
          expression: "value.text"
        - name: "num_tokens"
          expression: "value.chunk_num_tokens"

Set the table-name to the name of the table you want to write to, and define the fields in the "fields" list. This works similarly to the 'compute' agent, and you can use the same syntax to define the fields. It is important to tag the fields that are part of the table's primary key with "primary-key: true"; this is needed to correctly manage upserts and deletions.

Configuration

Check out the full configuration properties in the API Reference page.