Add filter pipeline core infrastructure (Phase 1)

Implements a plugin-based content filtering system with multi-level caching:

Core Components:
- FilterEngine: Main orchestrator for content filtering
- FilterCache: 3-level caching (memory, AI results, filterset results)
- FilterConfig: Configuration loader for filter_config.json & filtersets.json
- FilterResult & AIAnalysisResult: Data models for filter results

Architecture:
- BaseStage: Abstract class for pipeline stages
- BaseFilterPlugin: Abstract class for filter plugins
- Multi-threaded parallel processing support
- Content-hash based AI result caching (cost savings)
- Filterset result caching (fast filterset switching)
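
As a rough illustration of the content-hash caching idea listed above, the sketch below keys cached AI results on a SHA-256 digest of the post text, so two posts with identical content trigger only one (potentially paid) analysis call. The class name `ContentHashCache`, the `title`/`content` fields, and the `analyze` callable are assumptions for illustration only; this is not the FilterCache implementation from this commit.

```python
import hashlib
from typing import Any, Callable, Dict, List


class ContentHashCache:
    """Hypothetical in-memory cache keyed by a hash of post content."""

    def __init__(self) -> None:
        self._results: Dict[str, Any] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def content_hash(post: Dict[str, Any]) -> str:
        # Assumption: only 'title' and 'content' identify a post's content
        text = f"{post.get('title', '')}\n{post.get('content', '')}"
        return hashlib.sha256(text.encode("utf-8")).hexdigest()

    def get_or_compute(
        self, post: Dict[str, Any], analyze: Callable[[Dict[str, Any]], Any]
    ) -> Any:
        key = self.content_hash(post)
        if key in self._results:
            self.hits += 1
            return self._results[key]
        # Cache miss: run the expensive analysis once and remember the result
        self.misses += 1
        result = analyze(post)
        self._results[key] = result
        return result


calls: List[str] = []


def fake_analyze(post: Dict[str, Any]) -> Dict[str, float]:
    # Stand-in for an AI model call; records which post triggered it
    calls.append(post["id"])
    return {"score": 0.5}


cache = ContentHashCache()
first = cache.get_or_compute(
    {"id": "a", "title": "Hello", "content": "World"}, fake_analyze
)
# Different id, same text: served from the cache without a second call
second = cache.get_or_compute(
    {"id": "b", "title": "Hello", "content": "World"}, fake_analyze
)
```

Hashing the content rather than the post ID means reposts and cross-posts share one cached result, which is presumably where the cost savings come from.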

Configuration:
- filter_config.json: AI models, caching, parallel workers
- Uses only Llama 70B for cost efficiency
- Compatible with existing filtersets.json

Integration:
- apply_filterset() API compatible with user preferences
- process_batch() for processing posts in batches
- Lazy-loaded stages to avoid import errors when AI disabled
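
One way lazy-loaded stages could work is sketched below: the registry stores only module paths, and the import happens on first access, so a stage with missing dependencies fails only if it is actually requested. `LazyStageLoader` and both registry entries are hypothetical; a stdlib class stands in for a real stage so the sketch runs on its own.

```python
import importlib
from typing import Any, Dict, Tuple


class LazyStageLoader:
    """Hypothetical loader that imports a stage class on first use."""

    def __init__(self, registry: Dict[str, Tuple[str, str]]) -> None:
        # Maps stage name -> (module path, attribute name); nothing is imported yet
        self._registry = registry
        self._loaded: Dict[str, Any] = {}

    def get(self, name: str) -> Any:
        if name not in self._loaded:
            module_path, attr = self._registry[name]
            # The import is deferred to this point
            module = importlib.import_module(module_path)
            self._loaded[name] = getattr(module, attr)
        return self._loaded[name]


# Stand-in entries: 'ordered' points at a stdlib class, 'ai' at a module
# that does not exist. Building the loader succeeds either way.
loader = LazyStageLoader({
    "ordered": ("collections", "OrderedDict"),
    "ai": ("nonexistent_ai_stage_module", "AIStage"),
})
ordered_cls = loader.get("ordered")
```

Calling `loader.get("ai")` would raise `ModuleNotFoundError`, but only at that call; code paths that never request the AI stage are unaffected.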

Related to issue #8 (filtering engine implementation)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-11 22:46:10 -05:00
parent 07df6d8f0a
commit 94e12041ec
10 changed files with 1143 additions and 0 deletions

@@ -0,0 +1,8 @@
"""
Filter Plugins
Pluggable filters for content filtering.
"""

from .base import BaseFilterPlugin

__all__ = ['BaseFilterPlugin']

@@ -0,0 +1,66 @@
"""
Base Filter Plugin
Abstract base class for all filter plugins.
"""

from abc import ABC, abstractmethod
from typing import Dict, Any, Optional


class BaseFilterPlugin(ABC):
    """
    Abstract base class for filter plugins.

    Plugins can be used within stages to implement specific filtering logic.
    Examples: keyword filtering, AI-based filtering, quality scoring, etc.
    """

    def __init__(self, config: Dict[str, Any]):
        """
        Initialize plugin.

        Args:
            config: Plugin configuration dictionary
        """
        self.config = config
        self.enabled = config.get('enabled', True)

    @abstractmethod
    def should_filter(self, post: Dict[str, Any], context: Optional[Dict] = None) -> bool:
        """
        Determine if post should be filtered OUT.

        Args:
            post: Post data dictionary
            context: Optional context from previous stages

        Returns:
            True if post should be filtered OUT (rejected), False to keep it
        """
        pass

    @abstractmethod
    def score(self, post: Dict[str, Any], context: Optional[Dict] = None) -> float:
        """
        Calculate relevance/quality score for post.

        Args:
            post: Post data dictionary
            context: Optional context from previous stages

        Returns:
            Score from 0.0 (lowest) to 1.0 (highest)
        """
        pass

    @abstractmethod
    def get_name(self) -> str:
        """Get plugin name for logging"""
        pass

    def is_enabled(self) -> bool:
        """Check if plugin is enabled"""
        return self.enabled

    def __repr__(self) -> str:
        return f"<{self.get_name()} enabled={self.enabled}>"
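
To show how the abstract interface above might be implemented, here is a minimal sketch of a keyword-based plugin. `KeywordFilterPlugin`, the `blocked_keywords` config key, and the `title`/`content` post fields are assumptions, not part of this commit; the base class is repeated in trimmed form only so the sketch runs on its own.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Optional


class BaseFilterPlugin(ABC):
    """Trimmed stand-in for the base class above, so the sketch runs alone."""

    def __init__(self, config: Dict[str, Any]):
        self.config = config
        self.enabled = config.get('enabled', True)

    @abstractmethod
    def should_filter(self, post: Dict[str, Any], context: Optional[Dict] = None) -> bool:
        pass

    @abstractmethod
    def score(self, post: Dict[str, Any], context: Optional[Dict] = None) -> float:
        pass

    @abstractmethod
    def get_name(self) -> str:
        pass

    def is_enabled(self) -> bool:
        return self.enabled


class KeywordFilterPlugin(BaseFilterPlugin):
    """Hypothetical plugin: rejects posts containing any blocked keyword."""

    def __init__(self, config: Dict[str, Any]):
        super().__init__(config)
        # 'blocked_keywords' is an assumed config key
        self.blocked = [k.lower() for k in config.get('blocked_keywords', [])]

    def _text(self, post: Dict[str, Any]) -> str:
        # Assumed post fields: 'title' and 'content'
        return f"{post.get('title', '')} {post.get('content', '')}".lower()

    def should_filter(self, post: Dict[str, Any], context: Optional[Dict] = None) -> bool:
        # Case-insensitive substring match against the blocked keywords
        text = self._text(post)
        return any(keyword in text for keyword in self.blocked)

    def score(self, post: Dict[str, Any], context: Optional[Dict] = None) -> float:
        # Binary score: 0.0 for blocked posts, 1.0 otherwise
        return 0.0 if self.should_filter(post, context) else 1.0

    def get_name(self) -> str:
        return "KeywordFilterPlugin"


plugin = KeywordFilterPlugin({'blocked_keywords': ['spam']})
spam_post = {'title': 'Buy now', 'content': 'Total SPAM inside'}
clean_post = {'title': 'Release notes', 'content': 'Bug fixes'}
rejected = plugin.should_filter(spam_post)
kept_score = plugin.score(clean_post)
```

Note that the base class stores `enabled` but does not enforce it; in this sketch a caller would check `is_enabled()` before invoking the plugin.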