Overview

SANDI Solr is a search and indexing API that combines Apache Solr with Large Language Models (LLMs) to deliver semantic search, natural language processing (NLP), and Retrieval-Augmented Generation (RAG). Built on Spring Boot 3 and Solr 9, the SANDI Solr API is a complete, containerized solution for enterprises that need advanced search with AI integration.

One-Command Deployment with Docker Compose

SANDI Solr is simple to deploy. The entire platform (a high-availability Solr cluster, a ZooKeeper ensemble, embedding services, large language models, and an NLP engine) can be launched with a single command:

/opt/sandi-solr$ docker-compose up -d

This command starts the interconnected services that together provide enterprise-grade search. No manual configuration or dependency installation is required: the result is a complete AI-powered search platform, ready to index and search your content.
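After `docker-compose up -d` returns, the containers may still be starting. The following is a minimal sketch of a readiness check that polls the two API ports listed later in this document (8081 and 8082). It assumes the stack runs on `localhost`, and the function names `port_open` and `wait_for_services` are illustrative, not part of SANDI Solr.

```python
import socket
import time

# Host ports from this document's "Dual API Design" section.
# The host name "localhost" is an assumption about where the stack runs.
API_PORTS = {"search-api": 8081, "indexing-api": 8082}


def port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_for_services(ports, host="localhost", retries=30, delay=2.0):
    """Poll until every port accepts connections.

    Returns the sorted names of services that are still unreachable
    after the last attempt (an empty list means everything is up).
    """
    pending = dict(ports)
    for _ in range(retries):
        pending = {n: p for n, p in pending.items() if not port_open(host, p)}
        if not pending:
            break
        time.sleep(delay)
    return sorted(pending)


if __name__ == "__main__":
    down = wait_for_services(API_PORTS)
    print("all services reachable" if not down else f"not reachable: {down}")
```

An open TCP port only shows that a container is listening; it does not prove that the service behind it has finished loading models or joined the Solr cluster.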

Comprehensive AI Stack

SANDI-Solr integrates multiple AI services to enable sophisticated semantic search and RAG capabilities:

Embedding Services

Qwen3 Embeddings: State-of-the-art multilingual embeddings
The embedding service runs on GPU for high-performance vector generation

sandi_emb3:80

Large Language Models

Qwen3 1.7B (sandi_llm1): Lightweight LLM for quick responses
Qwen3 4B (sandi_llm2): More powerful model for complex reasoning
Both models are integrated into the stack and run on GPU for local RAG applications

sandi_llm2:80

Natural Language Processing

spaCy NLP (sandi_nlp1): Entity recognition and linguistic analysis
Enables advanced text processing and query understanding

sandi_nlp1:80

Re-Ranking Engine

Qwen3 Re-Ranker (sandi_rer1): GPU-accelerated result re-ranking
Improves search relevance by semantically reordering results

sandi_rer1:80

Architecture Highlights

Multi-Tenant Search Platform

SANDI supports multiple clients with isolated search configurations:

Client-specific Solr collections
Customizable field mappings (high-priority, low-priority, content fields)
Per-client synonym management
Flexible indexing and search workflows
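To make the per-client isolation concrete, here is a hypothetical sketch of what one client's configuration could look like as a data structure. The class `ClientConfig`, its field names, and the boost values are all assumptions for illustration; the source does not describe SANDI's actual configuration schema. The `field^boost` strings follow Solr's standard query-field boost syntax.

```python
from dataclasses import dataclass, field


@dataclass
class ClientConfig:
    """Hypothetical per-client settings; every name here is illustrative."""
    client_id: str
    collection: str                      # client-specific Solr collection
    high_priority_fields: list = field(default_factory=list)
    low_priority_fields: list = field(default_factory=list)
    content_fields: list = field(default_factory=list)
    synonyms: dict = field(default_factory=dict)  # term -> list of synonyms

    def expand(self, term):
        """Return the term followed by this client's synonyms for it."""
        return [term] + self.synonyms.get(term.lower(), [])

    def boosted_fields(self, high=3.0, low=1.0):
        """Build Solr-style "field^boost" entries; boost values are arbitrary."""
        return ([f"{f}^{high}" for f in self.high_priority_fields] +
                [f"{f}^{low}" for f in self.low_priority_fields])


acme = ClientConfig(
    client_id="acme",
    collection="acme_docs",
    high_priority_fields=["title"],
    low_priority_fields=["tags"],
    content_fields=["body"],
    synonyms={"laptop": ["notebook"]},
)
print(acme.expand("Laptop"))
print(acme.boosted_fields())
```

Keeping synonyms and field priorities on the client object, rather than globally, is what lets two clients search the same platform with different relevance behavior.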

High Availability Solr Cluster

The Docker deployment includes:

2 Solr nodes (sandi_solr1, sandi_solr2)
3-node ZooKeeper ensemble for distributed coordination
Automatic failover and load balancing
Configurable memory allocation per Solr node

Dual API Design

Search API

Port 8081 (sandi_search1)

Handles search queries with vector and legacy search fusion
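The source does not say which method the Search API uses to fuse vector and legacy results. As an illustration only, the sketch below shows reciprocal rank fusion (RRF), one common way to merge two ranked lists. It assumes that "legacy" search means keyword-based search, and the function name, document IDs, and constant `k=60` are not taken from SANDI Solr.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of document IDs.

    Each document scores the sum of 1 / (k + rank) over every list in
    which it appears, where rank starts at 1.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest score first; ties are broken by document ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


vector_hits = ["doc3", "doc1", "doc7"]    # hypothetical vector search results
keyword_hits = ["doc1", "doc4", "doc3"]   # hypothetical keyword search results
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
print(fused)  # ['doc1', 'doc3', 'doc4', 'doc7']
```

Documents that rank well in both lists (here `doc1` and `doc3`) rise to the top, while RRF needs only ranks, so the differently scaled scores of the two engines never have to be normalized.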

Indexing API

Port 8082 (sandi_index1)

Manages document ingestion, parsing, and embedding generation

Service Ports Overview

Port	Service	Description
`8081`	Search API	REST API for search queries
`8082`	Indexing API	REST API for document indexing
`8083`	Embedding Service	Text embedding generation
`8084`	Language Model Service	LLM for RAG and query expansion
`8085`	NLP Service	Entity extraction and text analysis
`8086`	Re-Ranking Service	Semantic result re-ordering
`8087`	Client Search Processor	Web interface for search
`8088`	Client Index Processor	Web interface for indexing
`8981-8982`	Solr Nodes	Apache Solr search engines
`2181-2183`	ZooKeeper Ensemble	Distributed coordination
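The table above can be turned into a small lookup for client scripts. The sketch below copies the port column as-is and expands ranges such as `8981-8982`. The helper names are illustrative, and the `localhost` host name and plain `http` scheme are assumptions; a base URL is only meaningful for the HTTP services, not for ZooKeeper.

```python
# Rows copied from the service table: (port spec, service name).
PORT_TABLE = [
    ("8081", "Search API"),
    ("8082", "Indexing API"),
    ("8083", "Embedding Service"),
    ("8084", "Language Model Service"),
    ("8085", "NLP Service"),
    ("8086", "Re-Ranking Service"),
    ("8087", "Client Search Processor"),
    ("8088", "Client Index Processor"),
    ("8981-8982", "Solr Nodes"),
    ("2181-2183", "ZooKeeper Ensemble"),
]


def expand_ports(spec):
    """Turn "8081" into [8081] and "8981-8982" into [8981, 8982]."""
    first, _, last = spec.partition("-")
    return list(range(int(first), int(last or first) + 1))


def ports_by_service(table=PORT_TABLE):
    """Map each service name to its list of host ports."""
    return {name: expand_ports(spec) for spec, name in table}


def base_url(service, host="localhost", index=0):
    """Base URL for an HTTP service; host and scheme are assumptions."""
    return f"http://{host}:{ports_by_service()[service][index]}"


print(base_url("Search API"))
print(ports_by_service()["ZooKeeper Ensemble"])
```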

SANDI Solr API

Table of Contents