RAG Implementation Guide

Hybrid search and sentence window retrieval for optimal knowledge integration.

15 min read
Advanced level

Sentence Window RAG Architecture

Traditional RAG systems lose context when chunking documents. Our sentence window approach solves this.

Traditional RAG Problems

  • Chunks lose surrounding context
  • References become unclear ("X is a Y" without Y defined)
  • Poor performance on detailed questions
  • Context boundaries break logical flow

Sentence Window Solution

  • Embed individual sentences for precision
  • Retrieve a surrounding context window
  • Maintain logical document flow
  • Better handling of references and definitions
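The core idea can be sketched in a few lines: embed and match at the sentence level, but hand the model a window of neighboring sentences around the match. This is a minimal illustration, not the SDK's internals; `sentences` and `matchIndex` are assumed inputs (the document split into sentences, and the index of the best-matching sentence).

```typescript
// Sketch of sentence window retrieval: the match is a single sentence,
// but the returned context is a window of `windowSize` sentences around it.
function sentenceWindow(
  sentences: string[],
  matchIndex: number,
  windowSize: number
): string[] {
  const half = Math.floor(windowSize / 2);
  const start = Math.max(0, matchIndex - half);          // clamp at document start
  const end = Math.min(sentences.length, matchIndex + half + 1); // clamp at end
  return sentences.slice(start, end);
}
```

Because matching happens on a single precise sentence while generation sees the full window, a definition two sentences earlier ("Y is a billing endpoint") still reaches the model even though only "X is a Y" matched the query.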

Implementation Details

Chunking Strategy

Sentence Window Configuration
// Configure sentence window processing
await client.addKnowledge(agentId, {
  type: 'text',
  source: documentContent,
  metadata: {
    title: 'API Documentation',
    category: 'technical'
  },
  processingOptions: {
    strategy: 'sentence_window',
    settings: {
      sentencesPerChunk: 3,      // Embed 3 sentences together
      windowSize: 7,             // Retrieve a 7-sentence window (chunk ±2 sentences)
      overlapSentences: 1,       // 1 sentence overlap between chunks
      minChunkLength: 50,        // Minimum chunk character length
      maxChunkLength: 500        // Maximum chunk character length
    }
  }
});
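Under these settings, consecutive chunks share one sentence so no boundary falls between a statement and its continuation. A hypothetical sketch of how `sentencesPerChunk` and `overlapSentences` might interact (the actual processing happens server-side):

```typescript
// Sketch of overlap chunking: group sentences into chunks of `perChunk`,
// advancing by (perChunk - overlap) so adjacent chunks share `overlap` sentences.
function groupSentences(
  sentences: string[],
  perChunk: number,
  overlap: number
): string[][] {
  const chunks: string[][] = [];
  const step = Math.max(1, perChunk - overlap); // guard against non-advancing steps
  for (let i = 0; i < sentences.length; i += step) {
    chunks.push(sentences.slice(i, i + perChunk));
    if (i + perChunk >= sentences.length) break; // last chunk reached the end
  }
  return chunks;
}
```

With `perChunk: 3` and `overlap: 1`, a five-sentence document yields two chunks that share the middle sentence, so a reference spanning the boundary survives in at least one chunk.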

Hybrid Search Implementation

Vector + Keyword Search
// Configure hybrid search for optimal retrieval
const queryOptions = {
  retrievalStrategy: 'hybrid',
  searchWeights: {
    vectorSimilarity: 0.7,     // 70% semantic similarity
    keywordMatch: 0.3          // 30% exact keyword matching
  },
  filters: {
    category: ['technical', 'api'],
    minSimilarity: 0.6,
    maxResults: 5
  },
  reranking: {
    enabled: true,
    model: 'cross-encoder',    // Specialized reranking model
    threshold: 0.8
  }
};

const response = await client.query(agentId, {
  message: userQuery,
  retrievalOptions: queryOptions
});
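Conceptually, the `searchWeights` above drive a weighted score fusion: each candidate chunk carries a semantic similarity score and a keyword score (e.g. BM25), and the final ranking blends the two. A minimal sketch of that fusion, assuming both scores are already normalized to the same range:

```typescript
// Sketch of hybrid score fusion: rank candidates by a weighted sum of
// their vector-similarity and keyword-match scores.
interface Candidate {
  id: string;
  vectorScore: number;   // semantic similarity, normalized to [0, 1]
  keywordScore: number;  // keyword/BM25 score, normalized to [0, 1]
}

function hybridRank(
  candidates: Candidate[],
  vectorWeight: number,
  keywordWeight: number
): Candidate[] {
  const score = (c: Candidate) =>
    c.vectorScore * vectorWeight + c.keywordScore * keywordWeight;
  return [...candidates].sort((a, b) => score(b) - score(a));
}
```

Shifting weight toward `keywordMatch` favors chunks with exact term hits (good for IDs and error codes), while weighting `vectorSimilarity` favors paraphrases and conceptual matches.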

Custom Embeddings

Domain-Specific Embeddings
// Use domain-specific embedding models
await client.configureEmbeddings(agentId, {
  model: 'text-embedding-ada-002',  // or custom fine-tuned model
  dimensions: 1536,
  customSettings: {
    domainSpecific: true,
    industry: 'healthcare',          // Optimize for domain
    language: 'en',
    preprocessing: {
      removeStopWords: false,        // Keep context words
      normalizeCase: true,
      removeSpecialChars: false      // Keep technical symbols
    }
  }
});

// Batch process multiple knowledge sources
const sources = [
  { url: 'https://docs.medical-api.com', category: 'api' },
  { url: 'https://medical-guidelines.org', category: 'clinical' },
  { text: companyPolicies, category: 'internal' }
];

await client.batchAddKnowledge(agentId, sources, {
  processingOptions: {
    strategy: 'sentence_window',
    parallel: true,
    maxConcurrency: 3
  }
});

Search Optimization

Query Preprocessing

Query Enhancement Pipeline
class QueryProcessor {
  async enhanceQuery(
    originalQuery: string,
    context: { previousQueries: string[]; category: string }
  ) {
    // Step 1: Extract intent and entities
    const intent = await this.extractIntent(originalQuery);
    const entities = await this.extractEntities(originalQuery);
    
    // Step 2: Expand query with synonyms and context
    const expandedQuery = await this.expandQuery(originalQuery, {
      synonyms: true,
      context: context.previousQueries,
      domain: context.category
    });
    
    // Step 3: Generate multiple query variations
    const queryVariations = [
      originalQuery,
      expandedQuery,
      await this.paraphraseQuery(originalQuery),
      await this.addTechnicalTerms(originalQuery, entities)
    ];
    
    return {
      primaryQuery: originalQuery,
      variations: queryVariations,
      intent,
      entities,
      searchStrategy: this.determineSearchStrategy(intent)
    };
  }

  private determineSearchStrategy(intent: string): 'vector' | 'keyword' | 'hybrid' {
    if (intent === 'factual_lookup') return 'keyword';
    if (intent === 'conceptual') return 'vector';
    return 'hybrid';
  }
}

Performance Tuning

Embedding Optimization

  • Use batch embedding for multiple sources
  • Cache embeddings to avoid recomputation
  • Use appropriate embedding dimensions
  • Optimize chunk sizes for your domain
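The caching point above is worth making concrete: identical chunks (boilerplate headers, repeated disclaimers) should only hit the embedding model once. A minimal sketch of a content-keyed cache, where `embed` stands in for whatever embedding call you use:

```typescript
// Sketch of an embedding cache keyed by chunk text: repeated chunks reuse
// the stored vector instead of re-calling the embedding model.
class EmbeddingCache {
  private cache = new Map<string, number[]>();
  hits = 0; // cache efficiency counter, cf. 'embedding_cache_hits'

  constructor(private embed: (text: string) => number[]) {}

  get(text: string): number[] {
    const cached = this.cache.get(text);
    if (cached) {
      this.hits++;
      return cached;
    }
    const vector = this.embed(text); // only computed on a miss
    this.cache.set(text, vector);
    return vector;
  }
}
```

In production you would typically hash the text for the key and back the map with a persistent store, but the miss-then-store logic is the same.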

Search Performance

  • Implement result caching for common queries
  • Use similarity thresholds to filter irrelevant results
  • Limit retrieval count based on context window
  • Monitor and tune search parameters
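The threshold and count limits above correspond to the `minSimilarity` and `maxResults` filters shown earlier. A small sketch of that post-retrieval step: drop anything below the similarity floor, then cap the list at the context-window budget.

```typescript
// Sketch of post-retrieval filtering: keep only results above the similarity
// threshold, sorted best-first, capped at the context-window budget.
interface ScoredResult {
  text: string;
  similarity: number; // cosine similarity in [0, 1]
}

function filterResults(
  results: ScoredResult[],
  minSimilarity: number,
  maxResults: number
): ScoredResult[] {
  return results
    .filter((r) => r.similarity >= minSimilarity) // cut irrelevant chunks
    .sort((a, b) => b.similarity - a.similarity)  // best matches first
    .slice(0, maxResults);                        // respect context budget
}
```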

RAG Performance Monitoring

Knowledge Retrieval Analytics
// Monitor RAG performance
const ragMetrics = await client.getKnowledgeAnalytics({
  timeframe: '7d',
  agentId: agentId,
  metrics: [
    'retrieval_accuracy',    // How often retrieved chunks are relevant
    'context_completeness',  // Whether full context is captured
    'search_latency',        // Time to retrieve relevant chunks
    'embedding_cache_hits'   // Cache efficiency metrics
  ]
});

// A/B test different RAG configurations
const ragTest = await client.createABTest({
  name: 'RAG Configuration Test',
  variants: {
    variant_a: {
      chunkSize: 3,
      windowSize: 5,
      searchWeight: { vector: 0.8, keyword: 0.2 }
    },
    variant_b: {
      chunkSize: 5,
      windowSize: 7,
      searchWeight: { vector: 0.6, keyword: 0.4 }
    }
  },
  successMetrics: ['retrieval_accuracy', 'response_relevance']
});