Implements plugin-based content filtering system with multi-level caching.

Core Components:
- FilterEngine: Main orchestrator for content filtering
- FilterCache: 3-level caching (memory, AI results, filterset results)
- FilterConfig: Configuration loader for filter_config.json & filtersets.json
- FilterResult & AIAnalysisResult: Data models for filter results

Architecture:
- BaseStage: Abstract class for pipeline stages
- BaseFilterPlugin: Abstract class for filter plugins
- Multi-threaded parallel processing support
- Content-hash based AI result caching for cost savings (see the cache sketch below)
- Filterset result caching for fast filterset switching

Configuration:
- filter_config.json: AI models, caching, parallel workers
- Uses only Llama 70B for cost efficiency
- Compatible with existing filtersets.json

Integration:
- apply_filterset() API compatible with user preferences
- process_batch() for batch post processing (usage example after the config below)
- Lazy-loaded stages to avoid import errors when AI is disabled (sketched before the config below)

Related to issue #8 (filtering engine implementation)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
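The content-hash cache is the piece that directly saves money: hashing the post content together with the model name gives a stable key, so identical content is never sent to the API twice. Below is a minimal sketch of that idea, assuming a FilterCache with get/put methods and the on-disk layout shown — the method names and file layout are illustrative, not the actual implementation:

```python
# Sketch only: FilterCache's real interface is not shown in this change.
from __future__ import annotations

import hashlib
import json
from pathlib import Path


class FilterCache:
    """Two of the three described levels: in-process memory, AI results on disk."""

    def __init__(self, ai_cache_dir: str = "data/filter_cache"):
        self.memory: dict[str, dict] = {}   # level 1: per-process memory
        self.ai_dir = Path(ai_cache_dir)    # level 2: AI results keyed by content hash
        self.ai_dir.mkdir(parents=True, exist_ok=True)

    @staticmethod
    def content_key(content: str, model: str) -> str:
        # Same content + same model => same key, so reruns never re-pay the API.
        return hashlib.sha256(f"{model}:{content}".encode("utf-8")).hexdigest()

    def get_ai_result(self, content: str, model: str) -> dict | None:
        key = self.content_key(content, model)
        if key in self.memory:
            return self.memory[key]
        path = self.ai_dir / f"{key}.json"
        if path.exists():
            result = json.loads(path.read_text(encoding="utf-8"))
            self.memory[key] = result       # promote to memory for this run
            return result
        return None

    def put_ai_result(self, content: str, model: str, result: dict) -> None:
        key = self.content_key(content, model)
        self.memory[key] = result
        (self.ai_dir / f"{key}.json").write_text(json.dumps(result), encoding="utf-8")
```

The in-memory dict covers repeat lookups within a run; the JSON files survive across runs, which is where the API cost savings come from.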
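Before the config file itself, here is a minimal sketch of the lazy-stage loading mentioned in the commit message. It assumes a hypothetical `stages` package and that `categorizer` and `moderator` are the AI-backed stages — neither assumption is confirmed by this change:

```python
# Sketch only: module layout and which stages are AI-backed are assumptions.
import importlib
from abc import ABC, abstractmethod

AI_STAGES = {"categorizer", "moderator"}  # assumed to call the AI models


class BaseStage(ABC):
    """Abstract pipeline stage, per the Architecture notes above."""

    @abstractmethod
    def run(self, posts: list[dict]) -> list[dict]:
        """Transform or filter a batch of posts and return the survivors."""


def load_stages(config: dict) -> list[BaseStage]:
    """Import stage modules only when they will actually run.

    AI stages are skipped outright when ai.enabled is false, so their
    dependencies (HTTP client, API key file) are never even imported.
    """
    stages: list[BaseStage] = []
    for name in config["pipeline"]["default_stages"]:
        if name in AI_STAGES and not config["ai"]["enabled"]:
            continue
        module = importlib.import_module(f"stages.{name}")  # lazy import
        stages.append(module.Stage())
    return stages
```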
filter_config.json:

```json
{
  "ai": {
    "enabled": false,
    "openrouter_key_file": "openrouter_key.txt",
    "models": {
      "cheap": "meta-llama/llama-3.3-70b-instruct",
      "smart": "meta-llama/llama-3.3-70b-instruct"
    },
    "parallel_workers": 10,
    "timeout_seconds": 60,
    "note": "Using only Llama 70B for cost efficiency"
  },
  "cache": {
    "enabled": true,
    "ai_cache_dir": "data/filter_cache",
    "filterset_cache_ttl_hours": 24
  },
  "pipeline": {
    "default_stages": ["categorizer", "moderator", "filter", "ranker"],
    "batch_size": 50,
    "enable_parallel": true
  },
  "output": {
    "filtered_dir": "data/filtered",
    "save_rejected": false
  }
}
```
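A minimal sketch of how this file might be consumed, assuming a FilterConfig shaped roughly like the loader named in the commit message; the property names and the FilterEngine call sites at the end are illustrative only:

```python
# Sketch only: mirrors the keys shown above, not the real FilterConfig API.
import json


class FilterConfig:
    def __init__(self, path: str = "filter_config.json"):
        with open(path, encoding="utf-8") as f:
            self._data = json.load(f)

    @property
    def ai_enabled(self) -> bool:
        return self._data["ai"]["enabled"]

    def model(self, tier: str) -> str:
        # "cheap" and "smart" both resolve to Llama 70B in this config.
        return self._data["ai"]["models"][tier]

    @property
    def parallel_workers(self) -> int:
        return self._data["ai"]["parallel_workers"]

    @property
    def batch_size(self) -> int:
        return self._data["pipeline"]["batch_size"]


# Hypothetical call sites matching the Integration bullets above:
#   engine = FilterEngine(FilterConfig())
#   kept = engine.apply_filterset(posts, filterset="some_filterset")
#   results = engine.process_batch(posts)
```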