RAG Implementation Guide

Hybrid search and sentence window retrieval for optimal knowledge integration.

15 min read
Advanced level

Sentence Window RAG Architecture

Traditional RAG systems lose context when chunking documents. Our sentence window approach solves this.

Traditional RAG Problems

  • Chunks lose surrounding context
  • References become unclear ("X is a Y" without Y defined)
  • Poor performance on detailed questions
  • Context boundaries break logical flow

Sentence Window Solution

  • Embed individual sentences for precision
  • Retrieve a surrounding context window
  • Maintain logical document flow
  • Better handling of references and definitions
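The core idea can be sketched in a few lines: embed and match at the sentence level, but hand the model a window of neighboring sentences around the match. This is a minimal illustration, not the SDK's internals; `sentences` and `matchIndex` are assumed inputs (the document split into sentences, and the index of the best-matching sentence).

```typescript
// Sketch of sentence window retrieval: the match is a single sentence,
// but the returned context is a window of `windowSize` sentences around it.
function sentenceWindow(
  sentences: string[],
  matchIndex: number,
  windowSize: number
): string[] {
  const half = Math.floor(windowSize / 2);
  const start = Math.max(0, matchIndex - half);          // clamp at document start
  const end = Math.min(sentences.length, matchIndex + half + 1); // clamp at end
  return sentences.slice(start, end);
}
```

Because matching happens on a single precise sentence while generation sees the full window, a definition two sentences earlier ("Y is a billing endpoint") still reaches the model even though only "X is a Y" matched the query.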

Implementation Details

Chunking Strategy

Sentence Window Configuration
// Configure sentence window processing
await client.addKnowledge(agentId, {
  type: 'text',
  source: documentContent,
  metadata: {
    title: 'API Documentation',
    category: 'technical'
  },
  processingOptions: {
    strategy: 'sentence_window',
    settings: {
      sentencesPerChunk: 3,      // Embed 3 sentences together
      windowSize: 7,             // Retrieve a 7-sentence window (chunk ±2 sentences)
      overlapSentences: 1,       // 1 sentence overlap between chunks
      minChunkLength: 50,        // Minimum chunk character length
      maxChunkLength: 500        // Maximum chunk character length
    }
  }
});
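Under these settings, consecutive chunks share one sentence so no boundary falls between a statement and its continuation. A hypothetical sketch of how `sentencesPerChunk` and `overlapSentences` might interact (the actual processing happens server-side):

```typescript
// Sketch of overlap chunking: group sentences into chunks of `perChunk`,
// advancing by (perChunk - overlap) so adjacent chunks share `overlap` sentences.
function groupSentences(
  sentences: string[],
  perChunk: number,
  overlap: number
): string[][] {
  const chunks: string[][] = [];
  const step = Math.max(1, perChunk - overlap); // guard against non-advancing steps
  for (let i = 0; i < sentences.length; i += step) {
    chunks.push(sentences.slice(i, i + perChunk));
    if (i + perChunk >= sentences.length) break; // last chunk reached the end
  }
  return chunks;
}
```

With `perChunk: 3` and `overlap: 1`, a five-sentence document yields two chunks that share the middle sentence, so a reference spanning the boundary survives in at least one chunk.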

Hybrid Search Implementation

Vector + Keyword Search
// Configure hybrid search for optimal retrieval
const queryOptions = {
  retrievalStrategy: 'hybrid',
  searchWeights: {
    vectorSimilarity: 0.7,     // 70% semantic similarity
    keywordMatch: 0.3          // 30% exact keyword matching
  },
  filters: {
    category: ['technical', 'api'],
    minSimilarity: 0.6,
    maxResults: 5
  },
  reranking: {
    enabled: true,
    model: 'cross-encoder',    // Specialized reranking model
    threshold: 0.8
  }
};

const response = await client.query(agentId, {
  message: userQuery,
  retrievalOptions: queryOptions
});
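Conceptually, the `searchWeights` above drive a weighted score fusion: each candidate chunk carries a semantic similarity score and a keyword score (e.g. BM25), and the final ranking blends the two. A minimal sketch of that fusion, assuming both scores are already normalized to the same range:

```typescript
// Sketch of hybrid score fusion: rank candidates by a weighted sum of
// their vector-similarity and keyword-match scores.
interface Candidate {
  id: string;
  vectorScore: number;   // semantic similarity, normalized to [0, 1]
  keywordScore: number;  // keyword/BM25 score, normalized to [0, 1]
}

function hybridRank(
  candidates: Candidate[],
  vectorWeight: number,
  keywordWeight: number
): Candidate[] {
  const score = (c: Candidate) =>
    c.vectorScore * vectorWeight + c.keywordScore * keywordWeight;
  return [...candidates].sort((a, b) => score(b) - score(a));
}
```

Shifting weight toward `keywordMatch` favors chunks with exact term hits (good for IDs and error codes), while weighting `vectorSimilarity` favors paraphrases and conceptual matches.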

Custom Embeddings

Domain-Specific Embeddings
// Use domain-specific embedding models
await client.configureEmbeddings(agentId, {
  model: 'text-embedding-ada-002',  // or custom fine-tuned model
  dimensions: 1536,
  customSettings: {
    domainSpecific: true,
    industry: 'healthcare',          // Optimize for domain
    language: 'en',
    preprocessing: {
      removeStopWords: false,        // Keep context words
      normalizeCase: true,
      removeSpecialChars: false      // Keep technical symbols
    }
  }
});

// Batch process multiple knowledge sources
const sources = [
  { url: 'https://docs.medical-api.com', category: 'api' },
  { url: 'https://medical-guidelines.org', category: 'clinical' },
  { text: companyPolicies, category: 'internal' }
];

await client.batchAddKnowledge(agentId, sources, {
  processingOptions: {
    strategy: 'sentence_window',
    parallel: true,
    maxConcurrency: 3
  }
});

Search Optimization

Query Preprocessing

Query Enhancement Pipeline
class QueryProcessor {
  async enhanceQuery(
    originalQuery: string,
    context: { previousQueries: string[]; category: string }
  ) {
    // Step 1: Extract intent and entities
    const intent = await this.extractIntent(originalQuery);
    const entities = await this.extractEntities(originalQuery);
    
    // Step 2: Expand query with synonyms and context
    const expandedQuery = await this.expandQuery(originalQuery, {
      synonyms: true,
      context: context.previousQueries,
      domain: context.category
    });
    
    // Step 3: Generate multiple query variations
    const queryVariations = [
      originalQuery,
      expandedQuery,
      await this.paraphraseQuery(originalQuery),
      await this.addTechnicalTerms(originalQuery, entities)
    ];
    
    return {
      primaryQuery: originalQuery,
      variations: queryVariations,
      intent,
      entities,
      searchStrategy: this.determineSearchStrategy(intent)
    };
  }

  private determineSearchStrategy(intent: string): 'vector' | 'keyword' | 'hybrid' {
    if (intent === 'factual_lookup') return 'keyword';
    if (intent === 'conceptual') return 'vector';
    return 'hybrid';
  }
}

Performance Tuning

Embedding Optimization

  • Use batch embedding for multiple sources
  • Cache embeddings to avoid recomputation
  • Use appropriate embedding dimensions
  • Optimize chunk sizes for your domain
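The caching point above is worth making concrete: identical chunks (boilerplate headers, repeated disclaimers) should only hit the embedding model once. A minimal sketch of a content-keyed cache, where `embed` stands in for whatever embedding call you use:

```typescript
// Sketch of an embedding cache keyed by chunk text: repeated chunks reuse
// the stored vector instead of re-calling the embedding model.
class EmbeddingCache {
  private cache = new Map<string, number[]>();
  hits = 0; // cache efficiency counter, cf. 'embedding_cache_hits'

  constructor(private embed: (text: string) => number[]) {}

  get(text: string): number[] {
    const cached = this.cache.get(text);
    if (cached) {
      this.hits++;
      return cached;
    }
    const vector = this.embed(text); // only computed on a miss
    this.cache.set(text, vector);
    return vector;
  }
}
```

In production you would typically hash the text for the key and back the map with a persistent store, but the miss-then-store logic is the same.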

Search Performance

  • Implement result caching for common queries
  • Use similarity thresholds to filter irrelevant results
  • Limit retrieval count based on context window
  • Monitor and tune search parameters
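The threshold and count limits above correspond to the `minSimilarity` and `maxResults` filters shown earlier. A small sketch of that post-retrieval step: drop anything below the similarity floor, then cap the list at the context-window budget.

```typescript
// Sketch of post-retrieval filtering: keep only results above the similarity
// threshold, sorted best-first, capped at the context-window budget.
interface ScoredResult {
  text: string;
  similarity: number; // cosine similarity in [0, 1]
}

function filterResults(
  results: ScoredResult[],
  minSimilarity: number,
  maxResults: number
): ScoredResult[] {
  return results
    .filter((r) => r.similarity >= minSimilarity) // cut irrelevant chunks
    .sort((a, b) => b.similarity - a.similarity)  // best matches first
    .slice(0, maxResults);                        // respect context budget
}
```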

RAG Performance Monitoring

Knowledge Retrieval Analytics
// Monitor RAG performance
const ragMetrics = await client.getKnowledgeAnalytics({
  timeframe: '7d',
  agentId: agentId,
  metrics: [
    'retrieval_accuracy',    // How often retrieved chunks are relevant
    'context_completeness',  // Whether full context is captured
    'search_latency',        // Time to retrieve relevant chunks
    'embedding_cache_hits'   // Cache efficiency metrics
  ]
});

// A/B test different RAG configurations
const ragTest = await client.createABTest({
  name: 'RAG Configuration Test',
  variants: {
    variant_a: {
      chunkSize: 3,
      windowSize: 5,
      searchWeight: { vector: 0.8, keyword: 0.2 }
    },
    variant_b: {
      chunkSize: 5,
      windowSize: 7,
      searchWeight: { vector: 0.6, keyword: 0.4 }
    }
  },
  successMetrics: ['retrieval_accuracy', 'response_relevance']
});