Cache Settings
Configure Semcache's caching behavior to optimize performance for your workload.
Current Configuration
Semcache currently reads its runtime settings from a configuration file.
Default Settings
# Recommended for production (current default values)
log_level: info # Possible values: debug, info, warning, error, critical
similarity_threshold: 0.90 # Float value between 0 and 1
port: 8080
eviction_policy:
  policy_type: memory_limit_mb # or entry_limit
  value: 4096
These values are stored in config.yaml, but can be overridden by mounting a custom file over /app/config.yaml:
docker run -p 8080:8080 -v /path/to/your/config.yaml:/app/config.yaml semcache/semcache:latest
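Before mounting a custom file, it can help to sanity-check it. Semcache's own validation rules aren't documented here, so the sketch below is a hypothetical pre-flight check that encodes only the ranges stated in the config comments above, plus the standard TCP port range. `validate_config` and the dataclasses are illustrative names, not part of Semcache. Parsing the YAML itself (for example with PyYAML's `yaml.safe_load`) is left out so the example has no dependencies.

```python
from dataclasses import dataclass

# Allowed values as listed in the config comments above.
LOG_LEVELS = {"debug", "info", "warning", "error", "critical"}
POLICY_TYPES = {"memory_limit_mb", "entry_limit"}


@dataclass
class EvictionPolicy:
    policy_type: str
    value: int


@dataclass
class SemcacheConfig:
    log_level: str
    similarity_threshold: float
    port: int
    eviction_policy: EvictionPolicy


def validate_config(raw: dict) -> SemcacheConfig:
    """Check a parsed config dict against the documented ranges (hypothetical helper)."""
    log_level = raw["log_level"]
    if log_level not in LOG_LEVELS:
        raise ValueError(f"log_level must be one of {sorted(LOG_LEVELS)}, got {log_level!r}")

    threshold = float(raw["similarity_threshold"])
    if not 0.0 <= threshold <= 1.0:
        raise ValueError(f"similarity_threshold must be between 0 and 1, got {threshold}")

    port = int(raw["port"])
    if not 1 <= port <= 65535:
        raise ValueError(f"port must be in 1..65535, got {port}")

    policy = raw["eviction_policy"]
    if policy["policy_type"] not in POLICY_TYPES:
        raise ValueError(f"eviction_policy.policy_type must be one of {sorted(POLICY_TYPES)}")
    value = int(policy["value"])
    if value <= 0:
        raise ValueError("eviction_policy.value must be a positive integer")

    return SemcacheConfig(log_level, threshold, port, EvictionPolicy(policy["policy_type"], value))


# The production defaults shown above, as the dict a YAML parser would produce.
defaults = {
    "log_level": "info",
    "similarity_threshold": 0.90,
    "port": 8080,
    "eviction_policy": {"policy_type": "memory_limit_mb", "value": 4096},
}
config = validate_config(defaults)
print(config.eviction_policy)  # EvictionPolicy(policy_type='memory_limit_mb', value=4096)
```

A file with, say, `similarity_threshold: 1.5` or `log_level: verbose` raises a `ValueError` here instead of surfacing later at container start-up.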
Similarity Threshold
Current Behavior
- Default: 0.90 (a cosine similarity of 0.90 is required for a cache hit)
- Range: 0.0 to 1.0
- Algorithm: Cosine similarity
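To make the threshold concrete, here is a minimal sketch of how a cosine-similarity cutoff separates a hit from a miss. The function names are illustrative, and whether Semcache counts a score exactly equal to the threshold as a hit is an assumption (the sketch uses `>=`).

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # define similarity with a zero vector as 0
    return dot / (norm_a * norm_b)


def is_cache_hit(query_vec: list[float], cached_vec: list[float], threshold: float = 0.90) -> bool:
    # Assumption: a score equal to the threshold counts as a hit.
    return cosine_similarity(query_vec, cached_vec) >= threshold


# Toy 3-dimensional vectors; real embeddings from the model below have 384 dimensions.
cached = [1.0, 0.0, 0.0]
close = [0.95, 0.2, 0.0]  # similarity ~0.98 with `cached`
far = [0.5, 0.8, 0.0]     # similarity ~0.53 with `cached`

print(is_cache_hit(close, cached))  # True
print(is_cache_hit(far, cached))    # False
```

Raising `threshold` shrinks the set of queries that count as "close enough"; lowering it widens that set.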
Entry Limits
Current Behavior
- Default: 4096 MB maximum cache size (policy_type: memory_limit_mb, value: 4096)
- Eviction: Least Recently Used (LRU)
- Memory: Automatic cleanup when memory pressure is detected
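The LRU rule can be sketched with Python's `OrderedDict`. This is a simplified model of the `entry_limit` policy only: it uses exact string keys, whereas Semcache matches entries by embedding similarity, and the `LRUCache` class is hypothetical rather than Semcache's implementation.

```python
from collections import OrderedDict


class LRUCache:
    """Entry-limited LRU cache: evicts the least recently used key when full."""

    def __init__(self, entry_limit: int):
        if entry_limit <= 0:
            raise ValueError("entry_limit must be positive")
        self.entry_limit = entry_limit
        self._entries: OrderedDict = OrderedDict()

    def get(self, key: str):
        if key not in self._entries:
            return None
        # Mark as most recently used by moving it to the end.
        self._entries.move_to_end(key)
        return self._entries[key]

    def put(self, key: str, value: str) -> None:
        if key in self._entries:
            self._entries.move_to_end(key)
        self._entries[key] = value
        if len(self._entries) > self.entry_limit:
            # The first item is the least recently used one.
            self._entries.popitem(last=False)

    def __len__(self) -> int:
        return len(self._entries)


cache = LRUCache(entry_limit=2)
cache.put("q1", "a1")
cache.put("q2", "a2")
cache.get("q1")         # touch q1, so q2 becomes the least recently used entry
cache.put("q3", "a3")   # exceeds the limit and evicts q2
print(cache.get("q2"))  # None
```

Reads count as "use" in LRU, which is why the entry that was just read survives while the untouched one is evicted.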
Embedding Model
Current Model
- Model: AllMiniLML6V2
- Dimensions: 384
- Performance: ~50ms embedding generation
- Language: Optimized for English
Model Characteristics
Model: sentence-transformers/all-MiniLM-L6-v2
- Size: 23MB
- Speed: Fast
- Quality: Good for semantic similarity
- Languages: Primarily English
Storage Configuration
Current Storage
- Type: In-memory only
- Persistence: None (lost on restart)
- Backup: Not supported
Performance Tuning
Cache Hit Rate Optimization
Monitor hit rates:
- Visit the admin dashboard at http://localhost:8080/admin
- Track cache hits vs misses
- Adjust similarity threshold based on results
Tuning strategies:
- Lower the threshold (0.85 to 0.90) for higher hit rates, at the cost of looser matches
- Raise the threshold (0.95+) for more precise matching, at the cost of fewer hits
- Increase the entry limit for applications with diverse queries
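One way to act on hit-rate results is to log a sample of similarity scores together with your own judgement of whether the cached answer was actually correct, then see how the hit rate and the wrong-answer rate move as the threshold changes. The helper and the sample data below are illustrative; Semcache does not necessarily expose per-request scores in this form.

```python
def sweep_thresholds(samples, thresholds):
    """For each threshold, return (threshold, hit_rate, false_hit_rate).

    samples: (similarity, same_intent) pairs, where same_intent is your own
    judgement of whether the cached answer was correct for the new query.
    """
    results = []
    for t in thresholds:
        hits = [same for score, same in samples if score >= t]
        hit_rate = len(hits) / len(samples)
        wrong = sum(1 for same in hits if not same)
        false_hit_rate = wrong / len(hits) if hits else 0.0
        results.append((t, hit_rate, false_hit_rate))
    return results


# Made-up sample for illustration only; collect your own from real traffic.
samples = [
    (0.97, True), (0.93, True), (0.91, False), (0.89, True),
    (0.86, False), (0.80, False), (0.96, True), (0.88, True),
]
for t, hit_rate, false_hit_rate in sweep_thresholds(samples, [0.85, 0.90, 0.95]):
    print(f"threshold={t:.2f} hit_rate={hit_rate:.2f} false_hits={false_hit_rate:.2f}")
```

On this toy sample, moving from 0.85 to 0.95 drops the hit rate sharply while eliminating wrong answers, which is the trade-off the strategies above describe.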
Memory Optimization
Current limitations:
- No memory limits enforced
- Automatic cleanup based on system pressure
- LRU eviction when entry limit reached
Best practices:
- Monitor memory usage with system tools
- Set appropriate entry limits for available RAM
- Consider horizontal scaling for high-memory workloads
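To choose an entry limit for the RAM you have, a back-of-the-envelope estimate helps. The sketch assumes each entry holds one 384-dimensional float32 embedding (1,536 bytes), the cached response, and a guessed fixed overhead. Semcache's actual per-entry footprint isn't documented here, so `estimate_entry_limit` and its `overhead_bytes` default are hypothetical; verify the result against observed memory usage.

```python
EMBEDDING_DIM = 384   # all-MiniLM-L6-v2 output size, from the model table above
BYTES_PER_FLOAT = 4   # assumption: embeddings stored as float32


def estimate_entry_limit(ram_mb: int, avg_response_bytes: int, overhead_bytes: int = 256) -> int:
    """Rough number of cache entries that fit in ram_mb megabytes.

    overhead_bytes is a guessed allowance for keys, index structures and
    bookkeeping; measure your own deployment before relying on it.
    """
    per_entry = EMBEDDING_DIM * BYTES_PER_FLOAT + avg_response_bytes + overhead_bytes
    return (ram_mb * 1024 * 1024) // per_entry


# 4096 MB with ~2 KB responses: roughly 1.1 million entries under these assumptions.
print(estimate_entry_limit(ram_mb=4096, avg_response_bytes=2048))
```

Response size usually dominates: with multi-kilobyte LLM answers, the 1.5 KB embedding is a minority of each entry.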
Development Configuration
# Recommended for development
log_level: debug # Possible values: debug, info, warning, error, critical
similarity_threshold: 0.90 # Float value between 0 and 1
port: 8080
eviction_policy:
  policy_type: entry_limit # or memory_limit_mb
  value: 100