Add filter pipeline core infrastructure (Phase 1)
Implements plugin-based content filtering system with multi-level caching:

Core Components:
- FilterEngine: Main orchestrator for content filtering
- FilterCache: 3-level caching (memory, AI results, filterset results)
- FilterConfig: Configuration loader for filter_config.json & filtersets.json
- FilterResult & AIAnalysisResult: Data models for filter results

Architecture:
- BaseStage: Abstract class for pipeline stages
- BaseFilterPlugin: Abstract class for filter plugins
- Multi-threaded parallel processing support
- Content-hash based AI result caching (cost savings)
- Filterset result caching (fast filterset switching)

Configuration:
- filter_config.json: AI models, caching, parallel workers
- Using only Llama 70B for cost efficiency
- Compatible with existing filtersets.json

Integration:
- apply_filterset() API compatible with user preferences
- process_batch() for batch post processing
- Lazy-loaded stages to avoid import errors when AI disabled

Related to issue #8 (filtering engine implementation)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
8
filter_pipeline/plugins/__init__.py
Normal file
@@ -0,0 +1,8 @@
"""
Filter Plugins
Pluggable filters for content filtering.
"""

from .base import BaseFilterPlugin

__all__ = ['BaseFilterPlugin']
66
filter_pipeline/plugins/base.py
Normal file
@@ -0,0 +1,66 @@
"""
Base Filter Plugin
Abstract base class for all filter plugins.
"""

from abc import ABC, abstractmethod
from typing import Dict, Any, Optional


class BaseFilterPlugin(ABC):
    """
    Abstract base class for filter plugins.

    Plugins can be used within stages to implement specific filtering logic.
    Examples: keyword filtering, AI-based filtering, quality scoring, etc.
    """

    def __init__(self, config: Dict[str, Any]):
        """
        Initialize plugin.

        Args:
            config: Plugin configuration dictionary
        """
        self.config = config
        self.enabled = config.get('enabled', True)

    @abstractmethod
    def should_filter(self, post: Dict[str, Any], context: Optional[Dict] = None) -> bool:
        """
        Determine if post should be filtered OUT.

        Args:
            post: Post data dictionary
            context: Optional context from previous stages

        Returns:
            True if post should be filtered OUT (rejected), False to keep it
        """
        pass

    @abstractmethod
    def score(self, post: Dict[str, Any], context: Optional[Dict] = None) -> float:
        """
        Calculate relevance/quality score for post.

        Args:
            post: Post data dictionary
            context: Optional context from previous stages

        Returns:
            Score from 0.0 (lowest) to 1.0 (highest)
        """
        pass

    @abstractmethod
    def get_name(self) -> str:
        """Get plugin name for logging"""
        pass

    def is_enabled(self) -> bool:
        """Check if plugin is enabled"""
        return self.enabled

    def __repr__(self) -> str:
        return f"<{self.get_name()} enabled={self.enabled}>"
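The base class above only defines the contract, so a concrete plugin may help show how it is meant to be used. The following is a minimal sketch, not code from this commit: `KeywordFilterPlugin`, the `blocked_keywords` config key, and the `title`/`content` post fields are all hypothetical, and the base class is repeated in condensed form only so the snippet runs on its own.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Optional


class BaseFilterPlugin(ABC):
    """Condensed copy of the base class from this commit, for a standalone run."""

    def __init__(self, config: Dict[str, Any]):
        self.config = config
        self.enabled = config.get('enabled', True)

    @abstractmethod
    def should_filter(self, post: Dict[str, Any], context: Optional[Dict] = None) -> bool: ...

    @abstractmethod
    def score(self, post: Dict[str, Any], context: Optional[Dict] = None) -> float: ...

    @abstractmethod
    def get_name(self) -> str: ...

    def is_enabled(self) -> bool:
        return self.enabled

    def __repr__(self) -> str:
        return f"<{self.get_name()} enabled={self.enabled}>"


class KeywordFilterPlugin(BaseFilterPlugin):
    """Hypothetical plugin: rejects posts that contain any blocked keyword."""

    def __init__(self, config: Dict[str, Any]):
        super().__init__(config)
        # 'blocked_keywords' is an assumed config key, not defined by the commit.
        self.blocked = [k.lower() for k in config.get('blocked_keywords', [])]

    def _hits(self, post: Dict[str, Any]) -> int:
        # 'title' and 'content' are assumed post fields.
        text = f"{post.get('title', '')} {post.get('content', '')}".lower()
        return sum(1 for k in self.blocked if k in text)

    def should_filter(self, post: Dict[str, Any], context: Optional[Dict] = None) -> bool:
        # True means the post is filtered OUT, matching the base class contract.
        return self._hits(post) > 0

    def score(self, post: Dict[str, Any], context: Optional[Dict] = None) -> float:
        # Fraction of blocked keywords that do NOT appear, kept within 0.0..1.0.
        if not self.blocked:
            return 1.0
        return 1.0 - self._hits(post) / len(self.blocked)

    def get_name(self) -> str:
        return "KeywordFilterPlugin"


plugin = KeywordFilterPlugin({'blocked_keywords': ['spam', 'giveaway']})
posts = [
    {'title': 'Free giveaway', 'content': 'Click now'},
    {'title': 'Release notes', 'content': 'Bug fixes'},
]
# A disabled plugin is skipped, so it never rejects anything.
kept = [p for p in posts if not (plugin.is_enabled() and plugin.should_filter(p))]
```

Because `should_filter`, `score`, and `get_name` are abstract, instantiating `BaseFilterPlugin` directly raises `TypeError`; only subclasses that implement all three can be created. How a stage actually combines several plugins is not shown in this part of the diff, so the list comprehension above is only one plausible way to apply a single plugin.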