RAG Implementation Guide
Hybrid search and sentence window retrieval for optimal knowledge integration.
Sentence Window RAG Architecture
Traditional RAG systems lose context when chunking documents. The sections below outline those problems and how our sentence window approach solves them.
Traditional RAG Problems
- Chunks lose surrounding context
- References become unclear (e.g., "X is a Y" when Y is defined in another chunk)
- Poor performance on detailed questions
- Context boundaries break logical flow
Sentence Window Solution
- Embed individual sentences for precision
- Retrieve a surrounding context window (sketched below)
- Maintain logical document flow
- Better handling of references and definitions
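To make the retrieval mechanics concrete, here is a minimal, self-contained sketch, not the platform's internal implementation: sentences are embedded individually, the best match is found, and its neighbors are returned with it. The regex sentence splitter, the bag-of-letters embed helper, and cosineSimilarity are stand-ins for a real sentence tokenizer and embedding model.

// Minimal sketch of sentence-window retrieval (illustrative only).
type SentenceRecord = { index: number; text: string; embedding: number[] };

// Placeholder embedding: a bag-of-letters vector. Swap in a real embedding model.
function embed(text: string): number[] {
  const vector = new Array(26).fill(0);
  for (const ch of text.toLowerCase()) {
    const code = ch.charCodeAt(0) - 97;
    if (code >= 0 && code < 26) vector[code] += 1;
  }
  return vector;
}

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB) || 1);
}

// Index every sentence individually so matching stays precise.
function indexSentences(document: string): SentenceRecord[] {
  return document
    .split(/(?<=[.!?])\s+/)
    .map((s) => s.trim())
    .filter((s) => s.length > 0)
    .map((text, index) => ({ index, text, embedding: embed(text) }));
}

// Return the best-matching sentence plus `windowSize` neighbors on each side,
// so the generator sees surrounding context instead of an isolated sentence.
function retrieveWindow(query: string, records: SentenceRecord[], windowSize = 3): string {
  const queryEmbedding = embed(query);
  let bestIndex = 0;
  let bestScore = -Infinity;
  for (const record of records) {
    const score = cosineSimilarity(queryEmbedding, record.embedding);
    if (score > bestScore) {
      bestScore = score;
      bestIndex = record.index;
    }
  }
  const start = Math.max(0, bestIndex - windowSize);
  const end = Math.min(records.length, bestIndex + windowSize + 1);
  return records.slice(start, end).map((r) => r.text).join(' ');
}

The hosted API in the next section expresses the same idea declaratively through the sentencesPerChunk and windowSize settings.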
Implementation Details
Chunking Strategy
Sentence Window Configuration
// Configure sentence window processing
await client.addKnowledge(agentId, {
  type: 'text',
  source: documentContent,
  metadata: {
    title: 'API Documentation',
    category: 'technical'
  },
  processingOptions: {
    strategy: 'sentence_window',
    settings: {
      sentencesPerChunk: 3,   // Embed 3 sentences together
      windowSize: 7,          // Retrieve ±3 sentences of surrounding context
      overlapSentences: 1,    // 1-sentence overlap between chunks
      minChunkLength: 50,     // Minimum chunk length in characters
      maxChunkLength: 500     // Maximum chunk length in characters
    }
  }
});

Hybrid Search Implementation
Vector + Keyword Search
// Configure hybrid search for optimal retrieval
const queryOptions = {
  retrievalStrategy: 'hybrid',
  searchWeights: {
    vectorSimilarity: 0.7,   // 70% semantic similarity
    keywordMatch: 0.3        // 30% exact keyword matching
  },
  filters: {
    category: ['technical', 'api'],
    minSimilarity: 0.6,
    maxResults: 5
  },
  reranking: {
    enabled: true,
    model: 'cross-encoder',  // Specialized reranking model
    threshold: 0.8
  }
};

const response = await client.query(agentId, {
  message: userQuery,
  retrievalOptions: queryOptions
});

Custom Embeddings
Domain-Specific Embeddings
// Use domain-specific embedding models
await client.configureEmbeddings(agentId, {
  model: 'text-embedding-ada-002',  // or custom fine-tuned model
  dimensions: 1536,
  customSettings: {
    domainSpecific: true,
    industry: 'healthcare',         // Optimize for domain
    language: 'en',
    preprocessing: {
      removeStopWords: false,       // Keep context words
      normalizeCase: true,
      removeSpecialChars: false     // Keep technical symbols
    }
  }
});

// Batch process multiple knowledge sources
const sources = [
  { url: 'https://docs.medical-api.com', category: 'api' },
  { url: 'https://medical-guidelines.org', category: 'clinical' },
  { text: companyPolicies, category: 'internal' }
];

await client.batchAddKnowledge(agentId, sources, {
  processingOptions: {
    strategy: 'sentence_window',
    parallel: true,
    maxConcurrency: 3
  }
});

Search Optimization
Query Preprocessing
Query Enhancement Pipeline
// Query enhancement pipeline. Helper methods such as extractIntent, extractEntities,
// expandQuery, paraphraseQuery, and addTechnicalTerms are application-specific and
// omitted here.
class QueryProcessor {
  async enhanceQuery(originalQuery: string, context: any) {
    // Step 1: Extract intent and entities
    const intent = await this.extractIntent(originalQuery);
    const entities = await this.extractEntities(originalQuery);

    // Step 2: Expand the query with synonyms and context
    const expandedQuery = await this.expandQuery(originalQuery, {
      synonyms: true,
      context: context.previousQueries,
      domain: context.category
    });

    // Step 3: Generate multiple query variations
    const queryVariations = [
      originalQuery,
      expandedQuery,
      await this.paraphraseQuery(originalQuery),
      await this.addTechnicalTerms(originalQuery, entities)
    ];

    return {
      primaryQuery: originalQuery,
      variations: queryVariations,
      intent,
      entities,
      searchStrategy: this.determineSearchStrategy(intent)
    };
  }

  private determineSearchStrategy(intent: string): 'vector' | 'keyword' | 'hybrid' {
    if (intent === 'factual_lookup') return 'keyword';
    if (intent === 'conceptual') return 'vector';
    return 'hybrid';
  }
}

Performance Tuning
Embedding Optimization
- Use batch embedding for multiple sources (see the sketch after this list)
- Cache embeddings to avoid recomputation
- Use appropriate embedding dimensions
- Optimize chunk sizes for your domain
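As a rough illustration of the first two points, the sketch below batches embedding requests and caches vectors by content hash so unchanged text is never re-embedded. The embedBatch function is a hypothetical stand-in for whatever embedding endpoint you actually call; it is not an SDK method from this guide.

import { createHash } from 'crypto';

// Placeholder for a real embedding call (e.g. an HTTP request to your embedding
// provider). Takes a batch of texts and returns one vector per text.
async function embedBatch(texts: string[]): Promise<number[][]> {
  return texts.map(() => new Array(1536).fill(0)); // dummy vectors
}

// Cache embeddings by content hash so recomputation is avoided.
const embeddingCache = new Map<string, number[]>();

function contentKey(text: string): string {
  return createHash('sha256').update(text).digest('hex');
}

async function embedWithCache(texts: string[], batchSize = 64): Promise<number[][]> {
  const keys = texts.map(contentKey);
  const missing = texts.filter((_, i) => !embeddingCache.has(keys[i]));

  // Embed only the cache misses, in batches to limit request size.
  for (let i = 0; i < missing.length; i += batchSize) {
    const batch = missing.slice(i, i + batchSize);
    const vectors = await embedBatch(batch);
    batch.forEach((text, j) => embeddingCache.set(contentKey(text), vectors[j]));
  }

  return keys.map((k) => embeddingCache.get(k)!);
}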
Search Performance
- Implement result caching for common queries (see the sketch after this list)
- Use similarity thresholds to filter irrelevant results
- Limit retrieval count based on context window
- Monitor and tune search parameters
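A minimal sketch combining the first three points is shown below: a short-lived result cache in front of retrieval, a similarity threshold, and a cap on the number of returned chunks. The searchIndex function is a hypothetical stand-in for the actual search backend (vector store, hybrid search, etc.).

type SearchHit = { text: string; similarity: number };

// Placeholder for the real search backend.
async function searchIndex(_query: string): Promise<SearchHit[]> {
  return []; // dummy implementation
}

// Small in-memory cache so frequent queries skip the search backend entirely.
const resultCache = new Map<string, { hits: SearchHit[]; expiresAt: number }>();
const CACHE_TTL_MS = 5 * 60 * 1000;

async function retrieve(
  query: string,
  minSimilarity = 0.6, // drop weak matches
  maxResults = 5       // stay within the model's context window budget
): Promise<SearchHit[]> {
  const key = query.trim().toLowerCase();
  const cached = resultCache.get(key);
  if (cached && cached.expiresAt > Date.now()) return cached.hits;

  const hits = (await searchIndex(query))
    .filter((h) => h.similarity >= minSimilarity)   // similarity threshold filtering
    .sort((a, b) => b.similarity - a.similarity)
    .slice(0, maxResults);                          // limit retrieval count

  resultCache.set(key, { hits, expiresAt: Date.now() + CACHE_TTL_MS });
  return hits;
}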
RAG Performance Monitoring
Knowledge Retrieval Analytics
// Monitor RAG performance
const ragMetrics = await client.getKnowledgeAnalytics({
  timeframe: '7d',
  agentId: agentId,
  metrics: [
    'retrieval_accuracy',     // How often retrieved chunks are relevant
    'context_completeness',   // Whether full context is captured
    'search_latency',         // Time to retrieve relevant chunks
    'embedding_cache_hits'    // Cache efficiency metrics
  ]
});

// A/B test different RAG configurations
const ragTest = await client.createABTest({
  name: 'RAG Configuration Test',
  variants: {
    variant_a: {
      chunkSize: 3,
      windowSize: 5,
      searchWeight: { vector: 0.8, keyword: 0.2 }
    },
    variant_b: {
      chunkSize: 5,
      windowSize: 7,
      searchWeight: { vector: 0.6, keyword: 0.4 }
    }
  },
  success_metrics: ['retrieval_accuracy', 'response_relevance']
});
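Beyond the built-in analytics, retrieval accuracy can also be estimated offline against a small labeled query set. The sketch below computes hit rate at k under that assumption; retrieveIds is a hypothetical stand-in for your retrieval pipeline, not part of the SDK shown above.

type LabeledQuery = { query: string; relevantChunkIds: string[] };

// Placeholder: run your retrieval pipeline and return the IDs of the top-k chunks.
async function retrieveIds(_query: string, _k: number): Promise<string[]> {
  return []; // dummy implementation
}

// Hit rate @ k: fraction of queries for which at least one relevant chunk is retrieved.
async function hitRateAtK(testSet: LabeledQuery[], k = 5): Promise<number> {
  let hits = 0;
  for (const { query, relevantChunkIds } of testSet) {
    const retrieved = await retrieveIds(query, k);
    if (retrieved.some((id) => relevantChunkIds.includes(id))) hits++;
  }
  return testSet.length > 0 ? hits / testSet.length : 0;
}

Tracking a metric like this alongside the hosted analytics makes it easier to tell whether a configuration change (chunk size, window size, search weights) actually improved retrieval.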