Content Filtering & Safety
Set up automated content filtering to keep your hub safe and welcoming. Learn pattern matching, filter actions, and best practices.
Automated content filtering is your first line of defense against inappropriate content. This guide covers setting up effective filters, understanding pattern matching, and implementing a layered safety approach.
What You'll Learn
- How to configure automated content filtering
- Pattern matching techniques for effective filtering
- Choosing the right filter actions
- Building a comprehensive safety strategy
- Balancing automation with human moderation
Understanding Content Filtering
How Filtering Works
InterChat's content filtering system operates in real-time:
- Message Analysis - Every message is checked against your filter patterns
- Pattern Matching - Text is compared using various matching techniques
- Action Execution - Violations trigger automated responses
- Logging - All filter actions are recorded for review
Filter Actions Explained
Block: Prevents the message from appearing
When to use:
- Obvious inappropriate content
- Spam patterns
- Prohibited links or content
- Clear rule violations
What happens:
- Message is stopped before reaching other servers
- Original sender sees their message normally
- Other servers never see the blocked content
- Action is logged for moderator review
Best for: High-confidence violations that should never appear
Blacklist: Automatically removes the user from the hub (10 minutes)
When to use:
- Severe violations (hate speech, harassment)
- Spam bots or malicious accounts
- Content that requires immediate removal
- Patterns indicating bad faith participation
What happens:
- User is immediately blacklisted for 10 minutes
- All their recent messages are flagged for review
- They cannot participate until blacklist expires
- Moderators are notified of the action
Best for: Serious violations requiring immediate action
Send Alert: Notifies moderators for human review
When to use:
- Borderline content needing context
- New or experimental filter patterns
- Content that might have false positives
- Situations requiring human judgment
What happens:
- Message appears normally in all servers
- Alert sent to moderators with violation details
- Moderators can take action if needed
- Helps refine filter patterns over time
Best for: Content that needs human evaluation
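The three actions differ only in what happens after a match, so they can be modeled as a small dispatch. In this sketch, `apply_action`, the in-memory `blacklist` map, and `mod_alerts` are hypothetical stand-ins for InterChat's real bookkeeping; only the 10-minute blacklist duration comes from the guide above.

```python
import time

BLACKLIST_MINUTES = 10            # temporary blacklist length, per the guide

blacklist: dict[str, float] = {}  # user_id -> expiry timestamp (illustrative)
mod_alerts: list[str] = []        # pending moderator notifications (illustrative)

def apply_action(action: str, user_id: str, message: str) -> bool:
    """Return True if the message may still be relayed to other servers."""
    if action == "block":
        return False              # stopped before reaching other servers
    if action == "blacklist":
        blacklist[user_id] = time.time() + BLACKLIST_MINUTES * 60
        mod_alerts.append(f"{user_id} blacklisted: {message!r}")
        return False              # removed and flagged for review
    if action == "alert":
        mod_alerts.append(f"review needed: {message!r}")
        return True               # message appears normally; humans decide
    return True
```

Note the key asymmetry: only Send Alert lets the message through, which is why it is the safe starting point for untested patterns.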
Setting Up Content Filtering
Access Filter Configuration
/hub config anti-swear hub:YourHubName
This opens the content filtering management interface.
Create Your First Filter
- Click "Add New Filter"
- Enter your pattern (see pattern guide below)
- Choose the appropriate action
- Add a description for your team
- Save the filter
Test Your Filters
- Start with "Send Alert" action for new patterns
- Test with trusted community members
- Monitor alerts for false positives
- Adjust patterns based on results
Refine and Optimize
- Review filter logs regularly
- Update patterns based on new violations
- Remove or modify filters causing false positives
- Share effective patterns with your moderation team
Pattern Matching Guide
Pattern Types
Exact Match
Pattern: word
Matches: Only the exact word "word"
Examples:
- ✅ "word" (matches)
- ❌ "words" (doesn't match)
- ❌ "keyword" (doesn't match)
- ❌ "sword" (doesn't match)
Best for:
- Specific prohibited terms
- Exact phrases or commands
- Brand names or specific references
- When precision is critical
Use case: Blocking specific slurs or exact spam phrases
Prefix Match
Pattern: word*
Matches: Any word starting with "word"
Examples:
- ✅ "word" (matches)
- ✅ "words" (matches)
- ✅ "wordsmith" (matches)
- ❌ "keyword" (doesn't match)
- ❌ "sword" (doesn't match)
Best for:
- Word variations and plurals
- Terms with common suffixes
- Catching creative spelling attempts
- Broad category filtering
Use case: Blocking "spam*" to catch "spam", "spammer", "spamming"
Suffix Match
Pattern: *word
Matches: Any word ending with "word"
Examples:
- ✅ "word" (matches)
- ✅ "keyword" (matches)
- ✅ "password" (matches)
- ❌ "words" (doesn't match)
- ❌ "wordsmith" (doesn't match)
Best for:
- Common word endings
- Suffix-based violations
- Catching variations of prohibited terms
- Technical term filtering
Use case: Blocking "*bot" to catch "spambot", "autobot", etc.
Contains Match
Pattern: *word*
Matches: Any text containing "word"
Examples:
- ✅ "word" (matches)
- ✅ "words" (matches)
- ✅ "keyword" (matches)
- ✅ "sword" (matches)
- ✅ "wordsmith" (matches)
Best for:
- Broad content filtering
- Catching creative spelling
- General topic restrictions
- When context doesn't matter
Use case: Blocking "*discord.gg*" to catch all Discord invite links
⚠️ Warning: Can cause many false positives
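The four pattern styles map cleanly onto regular expressions. This Python sketch (`compile_pattern` is a hypothetical helper, not InterChat's implementation) reproduces the match/no-match examples from the tables above: `\b` word boundaries give exact, prefix, and suffix semantics, and a bare substring search gives "contains".

```python
import re

def compile_pattern(pattern: str) -> re.Pattern:
    """Translate the four documented pattern styles into a regex.

    word   -> whole word only         word*  -> words starting with "word"
    *word  -> words ending in "word"  *word* -> any text containing "word"
    """
    leading = pattern.startswith("*")
    trailing = pattern.endswith("*")
    core = re.escape(pattern.strip("*"))
    if leading and trailing:
        return re.compile(core, re.IGNORECASE)               # contains
    if trailing:
        return re.compile(rf"\b{core}\w*\b", re.IGNORECASE)  # prefix
    if leading:
        return re.compile(rf"\b\w*{core}\b", re.IGNORECASE)  # suffix
    return re.compile(rf"\b{core}\b", re.IGNORECASE)         # exact

# The documented examples hold:
assert compile_pattern("word").search("sword") is None   # exact: no match
assert compile_pattern("word*").search("wordsmith")      # prefix: matches
assert compile_pattern("*word").search("password")       # suffix: matches
assert compile_pattern("*word*").search("sword")         # contains: matches
```

The false-positive warning above is visible here: the contains form drops both word boundaries, so it also fires inside unrelated words like "sword".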
Advanced Pattern Strategies
Layered Filtering Approach:
- Exact matches for known violations
- Prefix/suffix matches for variations
- Contains matches for broad categories
- Alert actions for borderline content
Example Filter Set for Spam Prevention:
Pattern: "*discord.gg*" | Action: Block | Description: Discord invite links
Pattern: "bit.ly*" | Action: Send Alert | Description: Shortened URLs for review
Pattern: "*free nitro*" | Action: Blacklist | Description: Common scam phrase
Pattern: "join my server" | Action: Send Alert | Description: Potential server advertising
Building Effective Filter Lists
Common Filter Categories
Profanity and Hate Speech:
Approach:
- Start with exact matches for clear violations
- Use prefix/suffix for variations
- Consider context and community standards
- Regular review and updates needed
Example Patterns:
[slur] | Block | Exact hate speech term
[profanity]* | Block | Profanity and variations
*[offensive-term]* | Send Alert | Borderline content
Best Practices:
- Research community-specific terms
- Consider cultural and regional differences
- Balance strictness with false positives
- Have clear appeal processes
Spam and Scam Prevention:
Common Spam Patterns:
*free nitro* | Blacklist | Nitro scams
*click here* | Send Alert | Suspicious links
*discord.gg* | Block | Invite links
*dm me* | Send Alert | Potential spam
*check my bio* | Send Alert | Profile spam
Scam Indicators:
- Promises of free premium services
- Urgent action required language
- External link requests
- Too-good-to-be-true offers
Strategy:
- Block obvious scams immediately
- Alert on suspicious patterns
- Monitor for new scam techniques
NSFW Content Filtering:
Technical Approach:
- Enable built-in NSFW detection
- Add text-based NSFW filters
- Consider community standards
- Clear policies on borderline content
Example Filters:
*nsfw* | Block | NSFW references
*18+* | Send Alert | Age-restricted content
[explicit-terms] | Block | Sexual content
Considerations:
- Community age demographics
- Professional vs. casual environments
- Cultural sensitivity
- Clear content policies
Server Advertising Prevention:
Advertising Patterns:
*discord.gg* | Block | Discord invites
*join my server* | Send Alert | Direct advertising
*new server* | Send Alert | Server promotion
*looking for members* | Send Alert | Recruitment
Balanced Approach:
- Block direct invite links
- Alert on promotional language
- Allow legitimate server mentions
- Consider hub purpose (networking vs. focused discussion)
Exceptions:
- Collaborative project invitations
- Educational server sharing
- Hub-approved partnerships
Filter Maintenance
Regular Review Schedule:
Weekly:
- Review alert logs for new patterns
- Check for false positives
- Update patterns based on violations
Monthly:
- Comprehensive filter effectiveness review
- Community feedback incorporation
- Pattern optimization and cleanup
Quarterly:
- Complete filter strategy assessment
- Community standards review
- Technology and trend updates
Advanced Safety Features
Hub-Level Safety Settings
/hub config settings hub:YourHubName
Essential Safety Settings:
Block NSFW 🔞
- Automatically detects and blocks inappropriate images
- Uses Discord's built-in NSFW detection
- Recommended for most communities
Spam Filter 🛡️
- Detects and prevents spam messages
- Includes rate limiting and pattern detection
- Should always be enabled
Hide Links 🔗
- Prevents all links from being sent
- Extreme measure for high-risk situations
- Consider impact on legitimate sharing
Block Invites 📨
- Prevents Discord invite links
- Reduces server advertising and raids
- Consider hub purpose (networking vs. focused)
Use Nicknames 👤
- Shows server-specific nicknames
- Can help or hurt depending on community
- Consider consistency vs. personalization
Reactions 😀
- Allows cross-server emoji reactions
- Generally positive for community building
- Can be disabled if misused
Message Formatting
- Rich embeds and formatting
- Usually beneficial for communication
- Disable if causing display issues
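Taken together, the toggles above amount to a per-hub settings object. The field names and defaults in this sketch are illustrative only; they are not InterChat's actual configuration keys.

```python
from dataclasses import dataclass

@dataclass
class HubSafetySettings:
    """Hypothetical mirror of the safety toggles described above."""
    block_nsfw: bool = True          # recommended for most communities
    spam_filter: bool = True         # should always be enabled
    hide_links: bool = False         # extreme measure; off unless high-risk
    block_invites: bool = True       # reduces advertising and raids
    use_nicknames: bool = False      # consistency vs. personalization
    reactions: bool = True           # generally positive; disable if misused
    message_formatting: bool = True  # disable only if causing display issues

# A locked-down configuration for a high-risk period:
lockdown = HubSafetySettings(hide_links=True, reactions=False)
```

Keeping the toggles in one place makes it easy to define presets (everyday vs. lockdown) and to audit which protections a hub actually has enabled.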
Layered Security Approach
Level 1: Automated Filtering
- Content filters for obvious violations
- NSFW detection for inappropriate images
- Spam filters for repetitive content
- Invite blocking for unwanted advertising
Level 2: Community Reporting
- Easy reporting system for users
- Quick moderator notification
- Context-aware human review
- Community-driven safety
Level 3: Active Moderation
- Human oversight and intervention
- Complex situation handling
- Appeal and review processes
- Community relationship building
Level 4: Emergency Response
- Rapid response for serious violations
- Temporary hub restrictions
- Coordination with Discord Trust & Safety
- Crisis communication plans
Best Practices
Filter Design Principles
Start Conservative:
- Begin with "Send Alert" for new patterns
- Gradually increase strictness based on results
- Monitor for false positives carefully
- Adjust based on community feedback
Be Specific:
- Prefer exact matches when possible
- Use broader patterns only when necessary
- Document the reasoning behind each filter
- Regular review and refinement
Consider Context:
- What's appropriate varies by community
- Professional vs. casual environments
- Age demographics and cultural factors
- Hub purpose and goals
Community Communication
Transparency:
- Explain filtering policies clearly
- Provide examples of what's not allowed
- Offer appeal processes for mistakes
- Regular policy updates and communication
Education:
- Help members understand the rules
- Provide guidance on appropriate content
- Share the reasoning behind policies
- Encourage positive community culture
Avoiding Common Pitfalls
Over-Filtering:
- Too many false positives frustrate users
- Overly broad patterns catch innocent content
- Excessive automation reduces human judgment
- Community feels unwelcome or restricted
Under-Filtering:
- Inappropriate content damages community
- Members leave due to poor environment
- Reputation and growth suffer
- Moderation team becomes overwhelmed
Inconsistent Enforcement:
- Different standards for different users
- Unclear or changing policies
- Lack of documentation and training
- Community loses trust in moderation
Monitoring and Analytics
Key Metrics to Track
Filter Effectiveness:
- Number of violations caught
- False positive rate
- Community feedback on filtering
- Moderator workload reduction
Community Health:
- Report frequency and types
- Member satisfaction with safety
- Retention rates and growth
- Quality of discussions
Regular Assessment
Monthly Filter Review:
- Analyze filter logs and statistics
- Identify patterns in violations
- Assess false positive rates
- Gather community feedback
- Update filters based on findings
Quarterly Safety Audit:
- Comprehensive policy review
- Community standards assessment
- Technology and trend updates
- Moderation team training needs
- Strategic safety planning
Next Steps
Your community is safer! Continue building comprehensive safety with these advanced topics:
- Activity Logging - Monitor and track all hub activity
- Moderation Team Building - Scale your human moderation
- Crisis Management - Handle serious incidents effectively
- Community Guidelines - Develop comprehensive policies
Need help with content filtering? Join our support community to get advice from experienced moderators and safety experts!