Overview

SANDI Solr is a modern search and indexing API that combines the power of Apache Solr with cutting-edge Large Language Models (LLMs) to deliver semantic search, natural language processing (NLP), and Retrieval-Augmented Generation (RAG) capabilities. Built on Spring Boot 3 and Solr 9, the SANDI Solr API provides a complete, containerized solution for enterprises seeking advanced search functionality with AI integration.


One-Command Deployment with Docker Compose

One of SANDI Solr's most compelling features is its simple deployment process. The entire platform, including a high-availability Solr cluster, a ZooKeeper ensemble, embedding services, large language models, and an NLP engine, can be launched with a single command:

/opt/sandi-solr$ docker-compose up -d

This command starts all of the interconnected services that together provide enterprise-grade search capabilities. No complex configuration and no dependency hell: just a complete AI-powered search platform, ready to index and search your content.
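When `docker-compose up -d` returns, the containers may still be starting. The following is a minimal sketch of a readiness check. It assumes the host ports listed in the Service Ports Overview, and it treats a service as "up" once its port accepts a TCP connection. A real health check would call each service's own health endpoint, which is not documented here. `SERVICE_PORTS`, `is_port_open`, and `wait_for_services` are hypothetical names.

```python
import socket
import time

# Subset of host ports published by the Docker Compose stack
# (taken from the Service Ports Overview table).
SERVICE_PORTS = {
    "search_api": 8081,
    "indexing_api": 8082,
    "embedding": 8083,
    "llm": 8084,
    "nlp": 8085,
    "reranker": 8086,
    "solr1": 8981,
    "solr2": 8982,
}


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_for_services(ports: dict, host: str = "localhost",
                      deadline_s: float = 120.0, poll_s: float = 2.0) -> list:
    """Poll every port until all accept connections or the deadline passes.

    Returns the sorted names of services that are still unreachable
    (an empty list means every port answered).
    """
    pending = dict(ports)
    end = time.monotonic() + deadline_s
    while pending:
        # Keep only the services whose port is still closed.
        pending = {name: p for name, p in pending.items() if not is_port_open(host, p)}
        if not pending or time.monotonic() >= end:
            break
        time.sleep(poll_s)
    return sorted(pending)
```

For example, `wait_for_services(SERVICE_PORTS)` returns an empty list once every listed port accepts connections. Otherwise it returns the names of the services still starting when the deadline expires.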


Comprehensive AI Stack

SANDI Solr integrates multiple AI services to enable sophisticated semantic search and RAG capabilities:

Embedding Services

  • Qwen3 Embeddings: State-of-the-art multilingual embeddings
  • The embedding service runs on GPU for high-performance vector generation
  • Endpoint: sandi_emb3:80
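Solr 9 can search dense vectors with its `knn` query parser. The following is a minimal sketch that turns an embedding returned by the embedding service into such a query. It assumes the client collection has a `DenseVectorField` named `vector`. The field name, the `top_k` default, and the function name are hypothetical, and this section does not document how SANDI actually builds its vector queries. The vector values are placeholders, not real Qwen3 output.

```python
def build_knn_query(embedding, field="vector", top_k=10):
    """Format a Solr 9 knn query-parser string for the given embedding.

    `field` must be a DenseVectorField in the target collection whose
    vectorDimension equals len(embedding); the default name is hypothetical.
    """
    if not embedding:
        raise ValueError("embedding must be a non-empty sequence of floats")
    values = ",".join(repr(float(x)) for x in embedding)
    # Doubled braces in the f-string produce literal { and } for Solr local params.
    return f"{{!knn f={field} topK={top_k}}}[{values}]"


# Placeholder 4-dimensional vector; real embeddings are much longer.
params = {"q": build_knn_query([0.12, -0.03, 0.88, 0.41], top_k=5), "fl": "id,score"}
```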

Large Language Models

  • Qwen3 1.7B (sandi_llm1): Lightweight LLM for quick responses
  • Qwen3 4B (sandi_llm2): More powerful model for complex reasoning
  • Both models run locally on GPU to power RAG applications
  • Endpoint: sandi_llm2:80
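In a RAG flow, passages retrieved from Solr are inserted into the prompt sent to one of the Qwen3 models. The following is a minimal sketch of that assembly step. It assumes plain-text passages with document IDs, and it uses a character budget as a rough stand-in for the model's token limit. The prompt wording, the budget, and `build_rag_prompt` are hypothetical, because SANDI's actual prompt format is not documented here.

```python
def build_rag_prompt(question, passages, max_chars=2000):
    """Assemble a grounded prompt from (doc_id, text) passages, best first.

    Passages are added in ranked order; assembly stops at the first passage
    that would exceed the character budget, so top-ranked context is kept.
    """
    context_lines = []
    used = 0
    for doc_id, text in passages:
        line = f"[{doc_id}] {text.strip()}"
        if used + len(line) > max_chars:
            break
        context_lines.append(line)
        used += len(line)
    context = "\n".join(context_lines)
    return (
        "Answer the question using only the context below. "
        "Cite document IDs in square brackets.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```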

Natural Language Processing

  • SpaCy NLP (sandi_nlp1): Entity recognition, linguistic analysis
  • Enables advanced text processing and query understanding
  • Endpoint: sandi_nlp1:80
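Recognized entities can feed query understanding by being turned into Solr filter queries. The following is a minimal sketch of that idea. It assumes the NLP service returns entities as (text, label) pairs that use spaCy's label names, such as `ORG` and `GPE`. It also assumes collection fields named `organization`, `location`, and `person`. The mapping, the field names, and the functions are all hypothetical.

```python
# Hypothetical mapping from spaCy entity labels to Solr fields in a client collection.
LABEL_TO_FIELD = {"ORG": "organization", "GPE": "location", "PERSON": "person"}


def escape_phrase(value):
    """Escape backslashes and double quotes so the value is safe inside a Solr phrase."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def entities_to_filters(entities, mapping=LABEL_TO_FIELD):
    """Convert (text, label) entity pairs into Solr fq clauses; unmapped labels are skipped."""
    filters = []
    for text, label in entities:
        field = mapping.get(label)
        if field:
            filters.append(f'{field}:"{escape_phrase(text)}"')
    return filters
```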

Re-Ranking Engine

  • Qwen3 Re-Ranker (sandi_rer1): GPU-accelerated result re-ranking
  • Improves search relevance by semantically reordering results
  • Endpoint: sandi_rer1:80
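A re-ranker scores each (query, document) pair, and the candidates are then re-sorted by that score. The following is a minimal sketch of the re-ordering step. It assumes the re-ranking service has already returned one relevance score per candidate. The function name and the choice to re-rank only the top `k` Solr hits, leaving the remaining hits in their original order, are assumptions.

```python
def rerank(candidates, scores, k=None):
    """Reorder the first k candidates by descending re-ranker score.

    candidates: doc ids in Solr's original order.
    scores: re-ranker relevance scores aligned with candidates[:k].
    Candidates beyond k keep their original order after the re-ranked head.
    Python's sort is stable (also with reverse=True), so ties keep Solr's order.
    """
    k = len(candidates) if k is None else min(k, len(candidates))
    if len(scores) != k:
        raise ValueError("need exactly one score per re-ranked candidate")
    head = sorted(zip(candidates[:k], scores), key=lambda pair: pair[1], reverse=True)
    return [doc for doc, _ in head] + candidates[k:]
```

Re-ranking only a top-k slice keeps the number of model calls bounded. The re-ranker is typically far more expensive per document than Solr's first-pass scoring.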

Architecture Highlights

Multi-Tenant Search Platform

SANDI supports multiple clients with isolated search configurations:

  • Client-specific Solr collections
  • Customizable field mappings (high-priority, low-priority, content fields)
  • Per-client synonym management
  • Flexible indexing and search workflows
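Per-client field mappings fit naturally onto Solr's edismax query parser, whose `qf` parameter assigns a boost to each field. The following is a minimal sketch of that mapping. It assumes a hypothetical `ClientConfig` that holds the collection name and the three field groups. It also assumes boosts of 3, 1.5, and 1 for high-priority, low-priority, and content fields. SANDI's actual configuration format and weights are not documented here.

```python
from dataclasses import dataclass, field


@dataclass
class ClientConfig:
    """Hypothetical per-client search configuration."""
    collection: str
    high_priority: list = field(default_factory=list)
    low_priority: list = field(default_factory=list)
    content: list = field(default_factory=list)


# Assumed boost weight for each field group.
BOOSTS = {"high_priority": 3.0, "low_priority": 1.5, "content": 1.0}


def edismax_params(cfg, user_query, rows=10):
    """Build edismax request parameters that weight fields by priority group."""
    qf = []
    for group, boost in BOOSTS.items():
        for name in getattr(cfg, group):
            qf.append(f"{name}^{boost:g}")  # :g renders 3.0 as "3", 1.5 as "1.5"
    return {"defType": "edismax", "q": user_query, "qf": " ".join(qf), "rows": rows}


acme = ClientConfig("acme_docs", high_priority=["title"], low_priority=["tags"], content=["body"])
# These params would be sent to /solr/acme_docs/select, keeping the query inside the client's collection.
params = edismax_params(acme, "vector search")
```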

High Availability Solr Cluster

The Docker deployment includes:

  • 2 Solr nodes (sandi_solr1, sandi_solr2)
  • 3-node ZooKeeper ensemble for distributed coordination
  • Automatic failover and load balancing
  • Configurable memory allocation per Solr node
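The ensemble size follows ZooKeeper's majority-quorum rule. An ensemble of n servers needs floor(n/2) + 1 of them to be available, so it tolerates floor((n - 1)/2) failures. The short sketch below works through that arithmetic. It describes general ZooKeeper behavior, not anything specific to SANDI.

```python
def quorum_size(n):
    """Smallest majority of an n-server ZooKeeper ensemble."""
    return n // 2 + 1


def tolerated_failures(n):
    """Servers that can fail while a majority is still available."""
    return n - quorum_size(n)


for n in (1, 2, 3, 4, 5):
    print(f"{n} servers: quorum {quorum_size(n)}, tolerates {tolerated_failures(n)} failure(s)")
```

A 3-node ensemble keeps working with one node down. A 4-node ensemble still tolerates only one failure, which is why ensembles are usually sized with an odd number of nodes.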

Dual API Design

Search API

Port 8081 (sandi_search1)

Handles search queries by fusing results from vector search and legacy search
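The section does not say how vector and legacy results are fused. One common technique is Reciprocal Rank Fusion (RRF), and the following is a minimal sketch under the assumption that RRF is used. Each document scores the sum of 1/(k + rank) over every result list it appears in, with the commonly used default k = 60. The function name and document IDs are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(result_lists, k=60):
    """Fuse ranked lists of doc ids with Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) over the lists containing it,
    with rank starting at 1. Returns doc ids sorted by fused score, best first;
    ties keep the order in which documents were first seen.
    """
    scores = defaultdict(float)
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


vector_hits = ["d3", "d1", "d7"]   # e.g. from a knn query
keyword_hits = ["d1", "d2", "d3"]  # e.g. from an edismax query
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
```

RRF uses only ranks, so it avoids normalizing Solr's keyword scores against vector similarity scores, which live on different scales.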

Indexing API

Port 8082 (sandi_index1)

Manages document ingestion, parsing, and embedding generation
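Long documents are commonly split into overlapping chunks before embedding, because embedding models accept a limited input length. The following is a minimal sketch of word-based chunking. It assumes chunk sizes are counted in words rather than model tokens. Whether and how SANDI's indexing API chunks documents is not stated here, and `chunk_words` and its parameters are hypothetical.

```python
def chunk_words(text, size=200, overlap=40):
    """Split text into chunks of up to `size` words, each overlapping the previous by `overlap` words."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks
```

The overlap preserves context that would otherwise be cut at a chunk boundary, at the cost of embedding some words twice.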

Service Ports Overview

Port        Service                    Description
8081        Search API                 REST API for search queries
8082        Indexing API               REST API for document indexing
8083        Embedding Service          Text embedding generation
8084        Language Model Service     LLM for RAG and query expansion
8085        NLP Service                Entity extraction and text analysis
8086        Re-Ranking Service         Semantic result re-ordering
8087        Client Search Processor    Web interface for search
8088        Client Index Processor     Web interface for indexing
8981-8982   Solr Nodes                 Apache Solr search engines
2181-2183   ZooKeeper Ensemble         Distributed coordination