Commit Graph

2 Commits

a3ea1e9bdb Add filter pipeline stages and plugins (Phase 2 & 3)
Implements a complete content filtering pipeline with AI-powered analysis:

Phase 2 - Pipeline Stages:
- CategorizerStage: AI topic detection with content-hash caching (stage pattern sketched after this list)
- ModeratorStage: Safety/quality analysis (violence, hate speech, quality scores)
- FilterStage: Fast rule-based filtering from filtersets.json
- RankerStage: Multi-factor scoring (quality, recency, source tier, engagement)
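
A minimal sketch of the stage pattern, using CategorizerStage as the example. The class name and content-hash caching are from this commit; the dict-based post, the process() signature, and the ai_client interface are illustrative assumptions:

```python
import hashlib

class CategorizerStage:
    """Assigns AI-detected topics to a post, caching results by content hash."""

    def __init__(self, ai_client, cache=None):
        self.ai_client = ai_client  # assumed to expose categorize(text)
        self.cache = cache if cache is not None else {}

    def process(self, post: dict) -> dict:
        # Key the AI result on a hash of the content so identical posts
        # never trigger a second (billable) model call.
        key = hashlib.sha256(post["content"].encode("utf-8")).hexdigest()
        if key not in self.cache:
            self.cache[key] = self.ai_client.categorize(post["content"])
        post["categories"] = self.cache[key]
        return post
```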

Phase 3 - Filter Plugins:
- KeywordFilterPlugin: Blocklist/allowlist keyword filtering (sketched after this list)
- QualityFilterPlugin: Quality metrics (length, caps, clickbait detection)
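
A sketch of the keyword plugin named above; the blocklist/allowlist behavior is from this commit, while the method name and the allowlist-overrides-blocklist precedence are assumptions:

```python
class KeywordFilterPlugin:
    """Drops posts containing blocklisted keywords unless allowlisted."""

    def __init__(self, blocklist=(), allowlist=()):
        self.blocklist = [w.lower() for w in blocklist]
        self.allowlist = [w.lower() for w in allowlist]

    def passes(self, post: dict) -> bool:
        text = post["content"].lower()
        # Assumed precedence: an allowlist hit overrides any blocklist hit.
        if any(w in text for w in self.allowlist):
            return True
        return not any(w in text for w in self.blocklist)
```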

AI Client:
- OpenRouterClient: Llama 70B integration with retry logic (sketch below)
- Methods: categorize(), moderate(), score_quality(), analyze_sentiment()
- Content-hash based caching for cost efficiency
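
The retry-and-cache pattern might look like the sketch below. OpenRouter does expose an OpenAI-style chat completions endpoint, but the request payload, the exact Llama 70B model slug, and the backoff policy here are assumptions:

```python
import hashlib
import json
import time
import urllib.request

class OpenRouterClient:
    API_URL = "https://openrouter.ai/api/v1/chat/completions"

    def __init__(self, api_key, model="meta-llama/llama-3.1-70b-instruct",
                 max_retries=3):
        self.api_key = api_key
        self.model = model            # 70B model slug is an assumption
        self.max_retries = max_retries
        self._cache = {}              # content hash -> parsed response

    def categorize(self, text: str) -> dict:
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key in self._cache:        # cache hit: zero API cost
            return self._cache[key]
        payload = json.dumps({
            "model": self.model,
            "messages": [{"role": "user", "content": f"Categorize: {text}"}],
        }).encode("utf-8")
        for attempt in range(self.max_retries):
            try:
                req = urllib.request.Request(
                    self.API_URL, data=payload,
                    headers={"Authorization": f"Bearer {self.api_key}",
                             "Content-Type": "application/json"})
                with urllib.request.urlopen(req, timeout=30) as resp:
                    result = json.load(resp)
                break
            except OSError:
                if attempt == self.max_retries - 1:
                    raise
                time.sleep(2 ** attempt)  # exponential backoff between retries
        self._cache[key] = result
        return result
```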

Pipeline Flow:
Raw Post → Categorizer → Moderator → Filter → Ranker → Scored Post
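
In code, the flow above might compose as below; the stage classes are the ones from this commit, while the chaining helper and the return-None-to-drop convention are assumptions:

```python
def run_pipeline(post, stages):
    """Pass a raw post through each stage in order; a stage returning
    None drops the post (e.g. FilterStage rejecting it)."""
    for stage in stages:
        post = stage.process(post)
        if post is None:
            return None
    return post

# stages = [CategorizerStage(...), ModeratorStage(...),
#           FilterStage(...), RankerStage(...)]
# scored_post = run_pipeline(raw_post, stages)
```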

Key Features:
- All AI results cached permanently by content hash
- Parallel processing support (10 workers; see the sketch after this list)
- Fallback modes when AI is disabled
- Comprehensive scoring breakdown
- Plugin architecture for extensibility
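
The 10-worker parallelism could be as simple as a thread pool, a reasonable fit for I/O-bound AI calls; everything here beyond the worker count is an assumption:

```python
from concurrent.futures import ThreadPoolExecutor

def process_posts_parallel(posts, pipeline, workers=10):
    # workers=10 matches the default named above; pipeline is any callable
    # taking a post and returning a scored post, or None to drop it.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(pipeline, posts))
    return [post for post in results if post is not None]
```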

Related to issue #8 (filtering engine implementation)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-11 22:54:38 -05:00
94e12041ec Add filter pipeline core infrastructure (Phase 1)
Implements a plugin-based content filtering system with multi-level caching:

Core Components:
- FilterEngine: Main orchestrator for content filtering
- FilterCache: 3-level caching (memory, AI results, filterset results)
- FilterConfig: Configuration loader for filter_config.json & filtersets.json
- FilterResult & AIAnalysisResult: Data models for filter results
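
The two data models are only named in this commit; a plausible shape, with every field name an assumption:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class AIAnalysisResult:
    categories: list[str] = field(default_factory=list)
    safety_flags: dict[str, float] = field(default_factory=dict)
    quality_score: float = 0.0

@dataclass
class FilterResult:
    post_id: str
    passed: bool
    score: float = 0.0
    ai_analysis: Optional[AIAnalysisResult] = None
```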

Architecture:
- BaseStage: Abstract class for pipeline stages
- BaseFilterPlugin: Abstract class for filter plugins (both contracts sketched after this list)
- Multi-threaded parallel processing support
- Content-hash based AI result caching (cost savings)
- Filterset result caching (fast filterset switching)
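
A sketch of the two abstract contracts named above (method names and signatures are assumptions):

```python
from abc import ABC, abstractmethod
from typing import Optional

class BaseStage(ABC):
    """Contract every pipeline stage implements."""

    @abstractmethod
    def process(self, post: dict) -> Optional[dict]:
        """Return the (possibly annotated) post, or None to drop it."""

class BaseFilterPlugin(ABC):
    """Contract for plugins the filter stage consults."""

    @abstractmethod
    def passes(self, post: dict) -> bool:
        """True if the post survives this plugin."""
```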

Configuration:
- filter_config.json: AI models, caching, parallel workers (hypothetical shape below)
- Uses only Llama 70B for cost efficiency
- Compatible with existing filtersets.json
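
The commit names filter_config.json and its concerns (AI models, caching, parallel workers) but not its keys; a hypothetical shape and a matching loader sketch:

```python
import json

# Hypothetical filter_config.json contents -- every key is an assumption:
# {
#   "ai": {"model": "meta-llama/llama-3.1-70b-instruct", "enabled": true},
#   "cache": {"ai_results": true, "filterset_results": true},
#   "parallel_workers": 10
# }

class FilterConfig:
    def __init__(self, path="filter_config.json"):
        with open(path, encoding="utf-8") as f:
            self._data = json.load(f)

    def get(self, key, default=None):
        return self._data.get(key, default)
```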

Integration:
- apply_filterset() API compatible with user preferences
- process_batch() for batch post processing (call pattern sketched below)
- Lazy-loaded stages to avoid import errors when AI is disabled
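
A hypothetical call pattern for the two entry points; apply_filterset() and process_batch() are named above, but their signatures and FilterEngine's constructor are assumptions:

```python
raw_posts = [{"id": "1", "content": "example post"}]

engine = FilterEngine(config=FilterConfig())  # classes named in this commit

# Batch-process raw posts through the full pipeline (signature assumed):
scored = engine.process_batch(raw_posts)

# Apply a named filterset from filtersets.json (signature assumed,
# compatible with stored user preferences):
visible = engine.apply_filterset(scored, filterset="default")
```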

Related to issue #8 (filtering engine implementation)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-11 22:46:10 -05:00