SOLO HACKATHON WINNER - Beat 10+ teams in 48 hours. Built a full-stack ML platform detecting extremist content at 94% accuracy. Technical stack: BERT for context (768-dim embeddings), CNN for classification (3 conv layers), custom NLP for user profiling. Processed the last 100 posts of 100 Reddit users in 2 minutes on a laptop CPU. Frontend: React with WebSocket updates. Backend: FastAPI + PostgreSQL. Connected with Rust, because why not learn it during a hackathon? Special feature: a user radicalization timeline showing progression over months.
Problem
Analyzing users and subreddits for racist and extremist content; measuring average racism levels across communities.
Solution
Built an AI/ML pipeline with concise UI and analytics to detect and summarize high-risk content and users.
The Reddit Racism Analyzer is a groundbreaking AI-powered web application that revolutionizes content moderation by detecting and analyzing racist and hateful content across Reddit communities. Built during the LSE Code Camp 2025, this comprehensive tool leverages an ensemble of state-of-the-art machine learning models to provide detailed insights for both individual users and entire communities.
🏆 Overall Winner - LSE Code Camp 2025, Managing Information Systems and Digital Innovation Track
The project emerged from recognizing the critical need for scalable, accurate content moderation tools as online communities grow exponentially. Traditional moderation approaches struggle with context, scale, and consistency—this AI-driven solution addresses these challenges while maintaining ethical considerations and reducing moderator burnout.
Competition Context
LSE Code Camp 2025
Event: London School of Economics Code Camp 2025
Track: Managing Information Systems and Digital Innovation
Duration: 48-hour intensive development competition
Participants: 200+ students from top universities across Europe
Judges: Industry experts from tech companies and academic institutions
Achievement: Overall Winner across all tracks
Problem Statement
Modern online communities face an epidemic of hate speech and racist content that:
Overwhelms human moderators with volume and psychological burden
Creates inconsistent enforcement across similar content
Fails to identify subtle or coded racist language
Lacks scalable solutions for growing communities
Provides no insights into community health trends
Solution Architecture
Core Innovation: Multi-Model Ensemble Approach
Unlike single-model solutions that often produce false positives or miss nuanced content, our system combines three specialized AI models:
Toxic-BERT: Optimized for general toxicity detection
Hate Speech Model: Specialized for social media hate detection
Twitter-RoBERTa: Expert in short-form content analysis
This ensemble approach achieves 85%+ accuracy while reducing false positives by 40% compared to single-model systems.
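A minimal sketch of how such an ensemble can be wired together with the Hugging Face pipeline API is shown below; the checkpoint IDs, the label mapping, and the weights are illustrative assumptions rather than the project's exact configuration.

```python
# Illustrative ensemble sketch. The checkpoint IDs, label mapping, and
# weights are assumptions, not the exact configuration used in the project.
from transformers import pipeline

MODELS = {
    "toxic_bert":      ("unitary/toxic-bert", 0.40),
    "hate_speech":     ("facebook/roberta-hate-speech-dynabench-r4-target", 0.35),
    "twitter_roberta": ("cardiffnlp/twitter-roberta-base-hate", 0.25),
}

classifiers = {
    name: pipeline("text-classification", model=model_id)
    for name, (model_id, _) in MODELS.items()
}

def harmful_probability(result: dict) -> float:
    """Map one classifier's top label to P(harmful). Real label names differ
    per model, so this benign-label set is a simplification."""
    benign = {"neutral", "nothate", "not-hate", "non-toxic", "normal"}
    score = result["score"]
    return 1.0 - score if result["label"].lower() in benign else score

def ensemble_score(text: str) -> float:
    """Weighted average of the three models' harmful probabilities (weights sum to 1)."""
    return sum(
        weight * harmful_probability(classifiers[name](text)[0])
        for name, (_, weight) in MODELS.items()
    )
```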
Technical Architecture
Backend Infrastructure
Flask Framework: Lightweight, scalable web application framework
Transformers Library: Hugging Face state-of-the-art NLP models
SQLite Database: Intelligent caching system for performance optimization
ThreadPoolExecutor: Parallel processing for concurrent analysis
Reddit API Integration: Real-time data ingestion with rate limiting
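The ingestion and caching layer described above could look roughly like the following sketch; PRAW as the Reddit client, the table schema, and the cache policy are assumptions, not the project's documented implementation.

```python
# Sketch of the ingestion/caching layer. PRAW, the schema, and the cache
# policy are assumptions; PRAW also handles Reddit's rate limits internally.
import json
import sqlite3
import praw

reddit = praw.Reddit(
    client_id="...", client_secret="...", user_agent="racism-analyzer/0.1"
)
db = sqlite3.connect("cache.db")
db.execute(
    "CREATE TABLE IF NOT EXISTS user_posts (username TEXT PRIMARY KEY, payload TEXT)"
)

def fetch_recent_posts(username: str, limit: int = 100) -> list[str]:
    """Return a user's recent comments and submissions, hitting Reddit only on a cache miss."""
    row = db.execute(
        "SELECT payload FROM user_posts WHERE username = ?", (username,)
    ).fetchone()
    if row:
        return json.loads(row[0])

    redditor = reddit.redditor(username)
    texts = [c.body for c in redditor.comments.new(limit=limit)]
    texts += [s.title + "\n" + s.selftext for s in redditor.submissions.new(limit=limit)]

    db.execute(
        "INSERT OR REPLACE INTO user_posts VALUES (?, ?)", (username, json.dumps(texts))
    )
    db.commit()
    return texts
```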
AI Processing Pipeline
Content Preprocessing: Text normalization, emoji handling, and context extraction
Multi-Model Inference: Parallel processing through ensemble models
Confidence Scoring: Weighted averaging with uncertainty quantification
Context Analysis: Educational content detection to reduce false positives
Temporal Tracking: Pattern analysis across user history timelines
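A minimal version of the preprocessing stage, assuming the `emoji` package for emoji handling and simple regex normalization (the project's exact rules are not documented here):

```python
# Minimal preprocessing sketch for the first pipeline stage. The exact
# normalization rules and the use of the `emoji` package are assumptions.
import re
import emoji

def preprocess(text: str) -> str:
    text = emoji.demojize(text, delimiters=(" ", " "))   # 🔥 -> " fire "
    text = re.sub(r"https?://\S+", " ", text)             # drop URLs
    text = re.sub(r"/?u/[A-Za-z0-9_-]+", " ", text)        # drop username mentions
    text = re.sub(r"\s+", " ", text).strip().lower()       # collapse whitespace
    return text
```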
Frontend Experience
Responsive Design: Mobile-first interface optimized for moderators
Real-time Progress: Live updates during analysis with WebSocket connections
Interactive Reports: Dynamic visualizations with drill-down capabilities
Professional Export: PDF reports for documentation and compliance
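The write-up names WebSocket updates but not the server-side library; a hypothetical sketch with Flask-SocketIO would push progress events like this:

```python
# Hypothetical server-side progress updates with Flask-SocketIO; the project
# mentions WebSocket updates but not which library it used.
from flask import Flask
from flask_socketio import SocketIO

app = Flask(__name__)
socketio = SocketIO(app, cors_allowed_origins="*")

def analyze_user(username: str, posts: list[str]) -> None:
    for i, post in enumerate(posts, start=1):
        score = 0.0  # placeholder for ensemble_score(post)
        socketio.emit(
            "progress",
            {"user": username, "done": i, "total": len(posts), "score": score},
        )

if __name__ == "__main__":
    socketio.run(app, port=5000)
```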
Key Features & Capabilities
User Analysis Engine
Comprehensive Profiling: Analyzes entire user post/comment history
Risk Scoring: 0-1 scale with 6-level classification system
Pattern Detection: Identifies escalation trends and behavioral changes
Content Flagging: Highlights specific problematic posts with context
Temporal Analysis: Tracks racism patterns over time periods
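One simple way to implement the escalation detection described above is to fit a line to a user's time-ordered risk scores and treat a clearly positive slope as escalation; the threshold below is an illustrative assumption.

```python
# Flag escalation trends by fitting a line to time-ordered risk scores.
# The minimum history length and slope threshold are assumptions.
import numpy as np

def escalation_trend(timestamps: list[float], risk_scores: list[float],
                     threshold: float = 0.05) -> bool:
    """Return True if a user's risk scores drift upward over their history."""
    if len(risk_scores) < 5:
        return False
    t = np.asarray(timestamps, dtype=float)
    t = (t - t.min()) / max(t.max() - t.min(), 1.0)   # normalize time to [0, 1]
    slope, _ = np.polyfit(t, np.asarray(risk_scores, dtype=float), 1)
    return slope > threshold
```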
Community Health Assessment
Subreddit Analysis: Evaluates entire community health metrics
Risk Distribution: Statistical breakdown of user risk levels
Cross-Community Patterns: Identifies users active in multiple problematic spaces
Moderation Insights: Actionable recommendations for community improvement
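Aggregating per-user risk scores into a subreddit-level summary could look like the sketch below; the bucket edges are placeholders, not the project's actual thresholds.

```python
# Subreddit-level aggregation sketch: bucket per-user risk scores into a
# distribution plus a simple health summary. Bucket edges are placeholders.
from collections import Counter
from statistics import mean

def community_health(user_scores: dict[str, float]) -> dict:
    def bucket(score: float) -> str:
        if score < 0.2:
            return "low"
        if score < 0.5:
            return "moderate"
        if score < 0.8:
            return "high"
        return "severe"

    distribution = Counter(bucket(s) for s in user_scores.values())
    return {
        "users_analyzed": len(user_scores),
        "average_risk": round(mean(user_scores.values()), 3) if user_scores else 0.0,
        "risk_distribution": dict(distribution),
    }
```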
Advanced Analytics
Network Analysis: Maps relationships between problematic users
Trend Identification: Detects emerging hate speech patterns
Comparative Benchmarking: Community health vs. similar subreddits
Predictive Modeling: Early warning system for community degradation
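The network analysis feature is not specified in detail; one plausible approach is a co-participation graph linking flagged users who are active in the same subreddits, sketched here with networkx.

```python
# Hypothetical take on the network analysis feature: connect flagged users
# who post in the same subreddits (networkx is an assumption, not documented).
import networkx as nx

def build_user_graph(user_subreddits: dict[str, set[str]]) -> nx.Graph:
    """Edge weight = number of subreddits two flagged users share."""
    graph = nx.Graph()
    users = list(user_subreddits)
    for i, a in enumerate(users):
        for b in users[i + 1:]:
            shared = user_subreddits[a] & user_subreddits[b]
            if shared:
                graph.add_edge(a, b, weight=len(shared))
    return graph

# Highly connected users can then be surfaced with e.g. nx.degree_centrality(graph).
```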
Technical Challenges & Solutions
1. Context-Aware False Positive Reduction
Challenge: Educational content, historical discussions, and academic research often contain racist language in non-harmful contexts
Solution: Developed context classification model that identifies educational intent, reducing false positives by 40%
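The write-up does not say which context model was used; a zero-shot classifier is one plausible stand-in for detecting educational or historical intent before a post is flagged.

```python
# Stand-in for the context classification step: zero-shot classification of
# intent. The model choice, labels, and threshold are assumptions.
from transformers import pipeline

context_clf = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

CONTEXT_LABELS = ["educational or historical discussion", "hateful or abusive speech"]

def looks_educational(text: str, threshold: float = 0.7) -> bool:
    """Return True when the top label is the educational one with high confidence."""
    result = context_clf(text, candidate_labels=CONTEXT_LABELS)
    return result["labels"][0] == CONTEXT_LABELS[0] and result["scores"][0] >= threshold
```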
2. Real-time Processing at Scale
Challenge: Analyzing 100+ posts per user while maintaining a responsive user experience
Solution: Implemented parallel processing with ThreadPoolExecutor and intelligent caching, achieving sub-60-second analysis times
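A stripped-down version of the parallel analysis path, assuming a fixed worker count (CPU-bound model inference may favour batching instead, so treat this as illustrative):

```python
# Score a user's posts concurrently. The worker count is an assumption.
from concurrent.futures import ThreadPoolExecutor

def score_posts(posts: list[str], score_fn) -> list[float]:
    """Apply a scoring function (e.g. the ensemble) to each post in parallel."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(score_fn, posts))
```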
3. Model Ensemble Optimization
Challenge: Balancing accuracy across different types of racist content while maintaining performance
Solution: Created weighted ensemble system with confidence scoring and specialized model routing
4. Ethical AI Implementation
Challenge: Ensuring fair, unbiased analysis across different communities and user types
Solution: Implemented bias detection metrics and transparent confidence scoring with human review recommendations
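One concrete bias metric in this spirit is the gap in false-positive rates across groups on a labelled evaluation set; the group definitions and evaluation data here are assumptions.

```python
# Simple fairness check: compare false-positive rates across groups on a
# labelled evaluation set. Groups and data are assumptions, not documented.
def false_positive_rate(predictions: list[bool], labels: list[bool]) -> float:
    """Share of truly-benign items that were flagged."""
    negatives = [p for p, y in zip(predictions, labels) if not y]
    return sum(negatives) / len(negatives) if negatives else 0.0

def fpr_gap(groups: dict[str, tuple[list[bool], list[bool]]]) -> float:
    """Max difference in false-positive rate between any two groups."""
    rates = [false_positive_rate(p, y) for p, y in groups.values()]
    return max(rates) - min(rates)
```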
Innovation Highlights
1. 6-Level Classification System
Revolutionary rating system from "Anti-Racist" to "CEO of Racism" provides nuanced understanding beyond binary classification.
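Only the two extreme labels are documented here, so the intermediate labels and score cut-offs in this sketch are placeholders for how a 0-1 risk score could map onto the six levels.

```python
# Map a 0-1 risk score onto the 6-level scale. Only the two extreme labels
# are documented; intermediate labels and cut-offs are placeholders.
LEVELS = [
    (0.10, "Anti-Racist"),
    (0.25, "Low Risk"),        # placeholder label
    (0.45, "Moderate Risk"),   # placeholder label
    (0.65, "High Risk"),       # placeholder label
    (0.85, "Severe Risk"),     # placeholder label
    (1.01, "CEO of Racism"),
]

def classify(risk_score: float) -> str:
    for cutoff, label in LEVELS:
        if risk_score < cutoff:
            return label
    return LEVELS[-1][1]
```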