Add filter pipeline core infrastructure (Phase 1)

Implements a plugin-based content filtering system with multi-level caching:

Core Components:
- FilterEngine: Main orchestrator for content filtering
- FilterCache: 3-level caching (memory, AI results, filterset results)
- FilterConfig: Configuration loader for filter_config.json & filtersets.json
- FilterResult & AIAnalysisResult: Data models for filter results (sketched below)
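
The data models are plain value objects. A minimal sketch of their shape (all field names here are illustrative assumptions, not the actual implementation):

```python
from dataclasses import dataclass, field

@dataclass
class AIAnalysisResult:
    # Hypothetical fields; the real model may differ.
    category: str
    moderation_flags: list[str] = field(default_factory=list)
    relevance_score: float = 0.0
    model: str = ""

@dataclass
class FilterResult:
    post_id: str
    passed: bool
    rejected_by: str | None = None  # name of the stage that rejected the post
    ai: AIAnalysisResult | None = None
```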

Architecture:
- BaseStage: Abstract class for pipeline stages
- BaseFilterPlugin: Abstract class for filter plugins
- Multi-threaded parallel processing support
- Content-hash-based AI result caching, so identical content is analyzed only once (cost savings); see the sketch after this list
- Filterset result caching (fast filterset switching)
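
A sketch of the two abstract base classes and the content-hash cache key. Method names and signatures are assumptions; only the class names come from this commit:

```python
import hashlib
from abc import ABC, abstractmethod

class BaseStage(ABC):
    """One pipeline step (e.g. categorizer, moderator, filter, ranker)."""

    @abstractmethod
    def process(self, posts: list[dict]) -> list[dict]:
        """Transform or annotate a batch of posts."""

class BaseFilterPlugin(ABC):
    """Decides whether a single post passes a filterset rule."""

    @abstractmethod
    def matches(self, post: dict) -> bool: ...

def ai_cache_key(content: str, model: str) -> str:
    # Identical content analyzed by the same model maps to the same key,
    # so each post is sent to the API at most once.
    return hashlib.sha256(f"{model}:{content}".encode()).hexdigest()
```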

Configuration:
- filter_config.json: AI models, caching, parallel workers (a loader sketch follows this list)
- Both the "cheap" and "smart" model slots point to Llama 3.3 70B for cost efficiency
- Compatible with existing filtersets.json
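
A sketch of how FilterConfig might read the file (the keys match filter_config.json as shown in the diff below; the accessor API itself is an assumption):

```python
import json
from pathlib import Path

class FilterConfig:
    """Illustrative loader; the real class may expose different accessors."""

    def __init__(self, path: str = "filter_config.json"):
        self._cfg = json.loads(Path(path).read_text())

    @property
    def ai_enabled(self) -> bool:
        return self._cfg.get("ai", {}).get("enabled", False)

    @property
    def parallel_workers(self) -> int:
        return self._cfg.get("ai", {}).get("parallel_workers", 10)

    @property
    def default_stages(self) -> list[str]:
        return self._cfg.get("pipeline", {}).get("default_stages", [])
```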

Integration:
- apply_filterset() API compatible with user preferences
- process_batch() for batch post processing (usage sketched below)
- Lazy-loaded stages to avoid import errors when AI is disabled
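
A hedged usage sketch of the two entry points; the module path, constructor argument, sample data, and filterset name are all assumptions:

```python
# Hypothetical import path and signatures, shown for illustration only.
from filter_engine import FilterEngine

engine = FilterEngine(config_path="filter_config.json")
posts = [{"id": "1", "content": "example post"}]  # placeholder data

# Run a batch of posts through the configured pipeline stages.
results = engine.process_batch(posts)

# Re-filter against a named filterset from filtersets.json; cached
# filterset results make switching between filtersets fast.
filtered = engine.apply_filterset("example_filterset", posts)
```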

Related to issue #8 (filtering engine implementation)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-11 22:46:10 -05:00
parent 07df6d8f0a
commit 94e12041ec
10 changed files with 1143 additions and 0 deletions

filter_config.json (new file, 27 lines added)

@@ -0,0 +1,27 @@
{
"ai": {
"enabled": false,
"openrouter_key_file": "openrouter_key.txt",
"models": {
"cheap": "meta-llama/llama-3.3-70b-instruct",
"smart": "meta-llama/llama-3.3-70b-instruct"
},
"parallel_workers": 10,
"timeout_seconds": 60,
"note": "Using only Llama 70B for cost efficiency"
},
"cache": {
"enabled": true,
"ai_cache_dir": "data/filter_cache",
"filterset_cache_ttl_hours": 24
},
"pipeline": {
"default_stages": ["categorizer", "moderator", "filter", "ranker"],
"batch_size": 50,
"enable_parallel": true
},
"output": {
"filtered_dir": "data/filtered",
"save_rejected": false
}
}