Add filter pipeline core infrastructure (Phase 1)

Implements plugin-based content filtering system with multi-level caching:

Core Components:
- FilterEngine: Main orchestrator for content filtering
- FilterCache: 3-level caching (memory, AI results, filterset results)
- FilterConfig: Configuration loader for filter_config.json & filtersets.json
- FilterResult & AIAnalysisResult: Data models for filter results

Architecture:
- BaseStage: Abstract class for pipeline stages
- BaseFilterPlugin: Abstract class for filter plugins
- Multi-threaded parallel processing support
- Content-hash based AI result caching (cost savings)
- Filterset result caching (fast filterset switching)

Configuration:
- filter_config.json: AI models, caching, parallel workers
- Using only Llama 70B for cost efficiency
- Compatible with existing filtersets.json

Integration:
- apply_filterset() API compatible with user preferences
- process_batch() for batch post processing
- Lazy-loaded stages to avoid import errors when AI disabled

Related to issue #8 (filtering engine implementation)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-11 22:46:10 -05:00
parent 07df6d8f0a
commit 94e12041ec
10 changed files with 1143 additions and 0 deletions
--- a/filter_pipeline/plugins/__init__.py
+++ b/filter_pipeline/plugins/__init__.py
@@ -0,0 +1,8 @@
+"""
+Filter Plugins
+Pluggable filters for content filtering.
+"""
+
+from .base import BaseFilterPlugin
+
+__all__ = ['BaseFilterPlugin']
--- a/filter_pipeline/plugins/base.py
+++ b/filter_pipeline/plugins/base.py
@@ -0,0 +1,66 @@
+"""
+Base Filter Plugin
+Abstract base class for all filter plugins.
+"""
+
+from abc import ABC, abstractmethod
+from typing import Dict, Any, Optional
+
+
+class BaseFilterPlugin(ABC):
+    """
+    Abstract base class for filter plugins.
+
+    Plugins can be used within stages to implement specific filtering logic.
+    Examples: keyword filtering, AI-based filtering, quality scoring, etc.
+    """
+
+    def __init__(self, config: Dict[str, Any]):
+        """
+        Initialize plugin.
+
+        Args:
+            config: Plugin configuration dictionary
+        """
+        self.config = config
+        self.enabled = config.get('enabled', True)
+
+    @abstractmethod
+    def should_filter(self, post: Dict[str, Any], context: Optional[Dict] = None) -> bool:
+        """
+        Determine if post should be filtered OUT.
+
+        Args:
+            post: Post data dictionary
+            context: Optional context from previous stages
+
+        Returns:
+            True if post should be filtered OUT (rejected), False to keep it
+        """
+        pass
+
+    @abstractmethod
+    def score(self, post: Dict[str, Any], context: Optional[Dict] = None) -> float:
+        """
+        Calculate relevance/quality score for post.
+
+        Args:
+            post: Post data dictionary
+            context: Optional context from previous stages
+
+        Returns:
+            Score from 0.0 (lowest) to 1.0 (highest)
+        """
+        pass
+
+    @abstractmethod
+    def get_name(self) -> str:
+        """Get plugin name for logging"""
+        pass
+
+    def is_enabled(self) -> bool:
+        """Check if plugin is enabled"""
+        return self.enabled
+
+    def __repr__(self) -> str:
+        return f"<{self.get_name()} enabled={self.enabled}>"
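Usage sketch (not part of this commit): the abstract interface in base.py can be exercised with a small concrete plugin. Everything specific to the example is an assumption made for illustration: the KeywordFilterPlugin class, the 'blocked_keywords' config key, and the 'title' and 'content' post fields do not appear in the commit. The base class is repeated in condensed form only so the snippet runs on its own; in the repository it would presumably be imported from filter_pipeline.plugins instead.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Optional


class BaseFilterPlugin(ABC):
    """Condensed copy of the base class added in this commit."""

    def __init__(self, config: Dict[str, Any]):
        self.config = config
        self.enabled = config.get('enabled', True)

    @abstractmethod
    def should_filter(self, post: Dict[str, Any], context: Optional[Dict] = None) -> bool:
        """Return True if the post should be filtered OUT."""

    @abstractmethod
    def score(self, post: Dict[str, Any], context: Optional[Dict] = None) -> float:
        """Return a score from 0.0 (lowest) to 1.0 (highest)."""

    @abstractmethod
    def get_name(self) -> str:
        """Return the plugin name for logging."""

    def is_enabled(self) -> bool:
        return self.enabled

    def __repr__(self) -> str:
        return f"<{self.get_name()} enabled={self.enabled}>"


class KeywordFilterPlugin(BaseFilterPlugin):
    """Hypothetical plugin: rejects posts containing any blocked keyword."""

    def __init__(self, config: Dict[str, Any]):
        super().__init__(config)
        # 'blocked_keywords' is an assumed config key, not defined by the commit.
        self.blocked: List[str] = [k.lower() for k in config.get('blocked_keywords', [])]

    def _hits(self, post: Dict[str, Any]) -> int:
        # 'title' and 'content' are assumed post fields.
        text = f"{post.get('title', '')} {post.get('content', '')}".lower()
        return sum(1 for keyword in self.blocked if keyword in text)

    def should_filter(self, post: Dict[str, Any], context: Optional[Dict] = None) -> bool:
        return self._hits(post) > 0

    def score(self, post: Dict[str, Any], context: Optional[Dict] = None) -> float:
        # Score drops linearly with the fraction of blocked keywords matched.
        if not self.blocked:
            return 1.0
        return 1.0 - self._hits(post) / len(self.blocked)

    def get_name(self) -> str:
        return "KeywordFilterPlugin"


plugin = KeywordFilterPlugin({'blocked_keywords': ['spam', 'giveaway']})
posts = [
    {'title': 'Free giveaway', 'content': 'click here'},
    {'title': 'Release notes', 'content': 'bug fixes'},
]

# Keep a post unless the plugin is enabled and asks to filter it out.
kept = [p for p in posts if not (plugin.is_enabled() and plugin.should_filter(p))]
```

Note the polarity: should_filter() returns True to reject a post, so callers keep a post when it returns False. Checking is_enabled() at the call site is one possible convention; the commit does not show how stages consult that flag.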