Introduction
Search lists are powerful tools in Aid4Mail that enable batch filtering using multiple keywords, patterns, and expressions. They're essential for large-scale investigations where you need to search for hundreds or thousands of terms efficiently.
When to Use Search Lists
Key Advantage
Aid4Mail processes a search list from top to bottom and stops at the first term that matches. Because of this "first-match-wins" behavior, placing the terms most likely to match near the top means most messages are resolved after only a few comparisons, which can greatly improve search performance.
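To make the first-match-wins behavior concrete, here is a small Python model of it. It is a sketch, not Aid4Mail's implementation. It assumes every term is a plain, case-insensitive substring test, and the function `first_match` is a hypothetical name. It returns the position of the first matching term and how many terms were tested before the scan stopped.

```python
from typing import Optional, Tuple


def first_match(text: str, terms: list[str]) -> Tuple[Optional[int], int]:
    """Return (index of first matching term or None, number of terms tested).

    Models top-to-bottom, first-match-wins evaluation using plain,
    case-insensitive substring tests (a simplifying assumption).
    """
    haystack = text.lower()
    tested = 0
    for index, term in enumerate(terms):
        tested += 1
        if term.lower() in haystack:
            return index, tested  # stop here: remaining terms are skipped
    return None, tested


terms = ["meeting", "confidential", "insider trading"]
print(first_match("Agenda for Monday's meeting", terms))  # (0, 1)
print(first_match("Re: insider trading rumors", terms))   # (2, 3)
print(first_match("Lunch?", terms))                       # (None, 3)
```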
Basic Rules
Search lists follow specific formatting rules to ensure proper parsing and optimal performance.
File Format
- Plain text file (.txt extension)
- UTF-8 encoding recommended
- One search term per line
- Blank lines allowed for organization
What NOT to Include
- No Boolean operators (AND, OR, NOT)
- No search operators in the list
- No comments or annotations
- No quotes unless searching for them literally
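Before you load a large list, a quick automated check can catch lines that break these rules. The sketch below is a hypothetical helper, not an Aid4Mail feature, and its checks are heuristics:

- It flags AND, OR and NOT only when they are written in capitals as separate words.
- It flags lines that start with a quote.
- It suggests # where a term contains an apostrophe.
- It skips regex terms ({[R]=...}) entirely.

```python
import re
from pathlib import Path

# Heuristic: AND, OR, NOT written in capitals as standalone words.
BOOLEAN_OP = re.compile(r"\b(AND|OR|NOT)\b")


def check_search_list(text: str) -> list[str]:
    """Return warnings for lines that appear to break the basic rules."""
    warnings = []
    for number, line in enumerate(text.splitlines(), start=1):
        term = line.strip()
        if not term or term.startswith("{[R]="):
            continue  # blank lines are allowed; regex terms are not checked
        if BOOLEAN_OP.search(term):
            warnings.append(f"line {number}: Boolean operator in {term!r}")
        if term[0] in "\"\u201c":
            warnings.append(f"line {number}: quotes are matched literally in {term!r}")
        if "'" in term or "\u2019" in term:
            warnings.append(f"line {number}: use # instead of an apostrophe in {term!r}")
    return warnings


def check_file(path: str) -> list[str]:
    # Read as UTF-8, the recommended encoding for search lists.
    return check_search_list(Path(path).read_text(encoding="utf-8"))


sample = "fraud*\n\ninsider AND trading\n\"secret\"\ndon't\n"
for warning in check_search_list(sample):
    print(warning)
```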
Example Search List File
confidential
proprietary
trade secret
insider trading

corrupt*
fraud*
embezzle~

money<+3>laundering
{[R]=\b(classified|restricted)\b}
Notice the organization with blank lines between related terms. This improves readability without affecting functionality.
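Matching runs over one ordered list, so blank lines only change how the file reads. This sketch reads the groups for review, then flattens them into the single top-to-bottom sequence that matching would see. The function names are hypothetical, and trimming surrounding whitespace is an assumption.

```python
def load_groups(text: str) -> list[list[str]]:
    """Split a search list into blank-line-separated groups of terms."""
    groups: list[list[str]] = []
    current: list[str] = []
    for line in text.splitlines():
        term = line.strip()
        if term:
            current.append(term)
        elif current:
            groups.append(current)  # a blank line closes the current group
            current = []
    if current:
        groups.append(current)
    return groups


def flatten(groups: list[list[str]]) -> list[str]:
    # Matching sees one flat, ordered list; grouping has no effect on it.
    return [term for group in groups for term in group]


example = "confidential\nproprietary\n\n\ncorrupt*\nfraud*\n\nmoney<+3>laundering\n"
groups = load_groups(example)
print(groups)
print(flatten(groups))
```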
Wildcards and Operators
Search lists support the same powerful wildcards and operators as Aid4Mail's regular filtering, including PCRE2 regular expressions.
Fast Performance
- Plain text (fastest): especially fast with terms of 20+ characters. Example: intellectual property theft
- Asterisk (*): zero or more characters. Example: corrupt*
- Question mark (?): exactly one character. Example: organi?e
- Hash (#): zero or one non-alphanumeric character. Example: don#t
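To preview what a wildcard term will match, you can approximate it with a Python regular expression. The translation is an assumption about Aid4Mail's semantics, not its actual matcher:

- * and ? match word characters only, so they do not run across spaces.
- # is an optional non-alphanumeric character.
- Matching is whole-word and case-insensitive.

```python
import re


def wildcard_to_regex(term: str) -> re.Pattern:
    """Approximate the *, ? and # wildcards with a Python regular expression."""
    parts = []
    for char in term:
        if char == "*":
            parts.append(r"\w*")            # zero or more characters (assumed: within one word)
        elif char == "?":
            parts.append(r"\w")             # exactly one character (assumed: a word character)
        elif char == "#":
            parts.append(r"[^A-Za-z0-9]?")  # zero or one non-alphanumeric character
        else:
            parts.append(re.escape(char))   # everything else is literal
    # Whole-word, case-insensitive matching is also an assumption of this sketch.
    return re.compile(r"\b" + "".join(parts) + r"\b", re.IGNORECASE)


examples = [
    ("corrupt*", "Evidence of corruption"),
    ("organi?e", "Please organise the files"),
    ("organi?e", "Meeting of organizers"),
    ("don#t", "Please don\u2019t forward this"),
]
for term, text in examples:
    print(f"{term!r} in {text!r}: {bool(wildcard_to_regex(term).search(text))}")
```

The last example shows why # suits apostrophes. It accepts the straight apostrophe, the typographic ’ that many mail clients insert, and no apostrophe at all.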
Medium Performance
- Stemming (~): finds word variations using a dictionary. Example: steal~ → steal, stole, stolen
Slower Performance (Use Sparingly)
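- Ordered proximity (<+n>): terms near each other, in the given order. Example: money<+3>laundering
- Any-order proximity (<n>): terms near each other, in either order. Example: insider<5>trading
- Same sentence (<.>): both terms in the same sentence. Example: confidential<.>project
- Same paragraph (<*>): both terms in the same paragraph. Example: trade<*>secret
- Complex regex ({[R]=pattern}): a PCRE2 regular expression. Example: {[R]=\b[A-Z]{2,4}-\d{4,6}\b}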
Warning: Limit proximity operators to 2-3 per search term. Excessive use significantly impacts performance.
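To see why proximity costs more than a plain lookup, here is a toy model of <+n> and <n> over word tokens. It rests on these assumptions, and Aid4Mail may count distance differently:

- n is the maximum number of words between the two terms.
- Words are runs of letters, digits or apostrophes.
- Matching is case-insensitive.

The sentence and paragraph operators are not modeled.

```python
import re


def tokens(text: str) -> list[str]:
    # Lowercased word tokens; punctuation is dropped.
    return re.findall(r"[a-z0-9']+", text.lower())


def near(text: str, first: str, second: str, n: int, ordered: bool) -> bool:
    """True if `second` occurs with at most n words between it and `first`.

    ordered=True models first<+n>second; ordered=False models first<n>second.
    """
    words = tokens(text)
    first_at = [i for i, w in enumerate(words) if w == first.lower()]
    second_at = [j for j, w in enumerate(words) if w == second.lower()]
    for i in first_at:
        for j in second_at:
            if i == j:
                continue
            if ordered and j < i:
                continue  # <+n> requires the second term to come after the first
            between = abs(j - i) - 1  # words separating the two terms
            if between <= n:
                return True
    return False


s = "They moved money through shell firms before laundering it"
print(near(s, "money", "laundering", 3, ordered=True))
print(near(s, "money", "laundering", 4, ordered=True))
print(near("Trading by an insider was suspected", "insider", "trading", 5, ordered=False))
```

Even this simplified version has to tokenize the whole message and compare word positions, which a plain-text search avoids.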
Performance Optimization
Understanding how Aid4Mail processes search lists is crucial for creating high-performance filters that can handle millions of emails efficiently.
Primary Performance Rule
Aid4Mail processes the list from top to bottom. As soon as a match is found, remaining terms are skipped. This means term ordering directly impacts performance.
- Most likely matches → Place at the TOP
- Least likely matches → Place at the BOTTOM
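One practical way to apply the rule is to count how often each term hits in a representative sample, then sort by that count. In this sketch the sample, the substring matching and the helper names are all assumptions. `average_tests` reports how many terms a first-match-wins scan examines per message, so you can compare two orderings directly.

```python
def hits(term: str, messages: list[str]) -> int:
    # How many sample messages contain the term (case-insensitive substring test).
    return sum(term.lower() in m.lower() for m in messages)


def order_by_hit_rate(terms: list[str], messages: list[str]) -> list[str]:
    # Most frequent first; sorted() stays stable with reverse=True, so ties keep their order.
    return sorted(terms, key=lambda t: hits(t, messages), reverse=True)


def average_tests(terms: list[str], messages: list[str]) -> float:
    """Average number of terms examined per message under first-match-wins."""
    total = 0
    for message in messages:
        text = message.lower()
        tested = len(terms)  # a message with no match is checked against every term
        for position, term in enumerate(terms, start=1):
            if term.lower() in text:
                tested = position
                break
        total += tested
    return total / len(messages)


sample = [
    "Weekly meeting notes",
    "Meeting moved to Friday",
    "Project update and meeting agenda",
    "Wire transfer approved",
    "Lunch plans",
]
terms = ["wire transfer", "meeting"]
better = order_by_hit_rate(terms, sample)
print(better)                                                         # ['meeting', 'wire transfer']
print(average_tests(terms, sample), average_tests(better, sample))   # 1.8 1.4
```

Messages that match nothing are still checked against every term. Ordering therefore only speeds up matching messages, and the overall gain depends on how many messages hit.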
Optimization Technique
Group related plain terms into a single regex for better efficiency:
❌ Inefficient:
inappropriate
unwelcome
unwanted
✅ Efficient:
{[R]=\b(inappropriate|unwelcome|unwanted)\b}
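Building the combined term by hand gets error-prone for long lists. This sketch escapes each plain term and joins them into the {[R]=...} form shown above; the helper name is hypothetical. It assumes Aid4Mail's PCRE2 engine reads the escapes from Python's `re.escape` literally. `re.escape` only places a backslash before non-alphanumeric characters, and PCRE2 treats such escapes as literals.

```python
import re


def combine_terms(terms: list[str]) -> str:
    """Join plain terms into one whole-word alternation in {[R]=...} form."""
    # re.escape makes punctuation and spaces literal.
    alternatives = "|".join(re.escape(term) for term in terms)
    return r"{[R]=\b(" + alternatives + r")\b}"


combined = combine_terms(["inappropriate", "unwelcome", "unwanted"])
print(combined)  # {[R]=\b(inappropriate|unwelcome|unwanted)\b}

# Sanity-check the inner pattern with Python's re before pasting it into a list.
inner = combined[len("{[R]="):-1]
print(bool(re.search(inner, "an unwelcome comment")))  # True
```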
Performance Impact
With proper ordering, processing can be up to 10x faster (a 90% reduction in processing time), and common terms match almost immediately.
Search List Ordering
Proper ordering is the key to maximizing search list performance. Follow these prioritization rules for optimal results.
Ordering Priority (Top to Bottom)
- Most common terms: terms appearing in the majority of emails. Examples: meeting, report, update, project
- Industry- or context-specific terms: terms common in your investigation domain. Examples: confidential, proprietary, contract
- Simple wildcards: basic pattern matching. Examples: corrupt*, fraud*, *@company.com
- Stemming terms: word variations. Examples: steal~, discriminate~, harass~
- Proximity searches: terms near each other. Examples: insider<5>trading, money<+3>laundering
- Complex regex: resource-intensive patterns. Example: {[R]=\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b}
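Because the tiers follow from each term's syntax, they can be assigned mechanically when you re-sort a long list. This sketch uses hypothetical names and only inspects syntax. It cannot tell the most common plain terms from domain-specific ones; that still needs sample data. One deliberate choice: a regex that only ORs whole words, as recommended under Optimization Technique, is ranked with plain text rather than with complex regex.

```python
import re

# Tiers in recommended top-to-bottom order (lower number = nearer the top).
PLAIN, WILDCARD, STEMMING, PROXIMITY, REGEX = range(5)

PROXIMITY_OP = re.compile(r"<\+?\d+>|<\.>|<\*>")
# A regex that only ORs whole words, like {[R]=\b(meeting|report)\b},
# stands in for several plain terms, so it is ranked as plain text.
WORD_ALTERNATION = re.compile(r"\{\[R\]=\\b\([\w |]+\)\\b\}")


def tier(term: str) -> int:
    """Classify a search-list term by the syntax it uses."""
    if WORD_ALTERNATION.fullmatch(term):
        return PLAIN
    if term.startswith("{[R]="):
        return REGEX
    if PROXIMITY_OP.search(term):
        return PROXIMITY
    if "~" in term:
        return STEMMING
    if any(ch in term for ch in "*?#"):
        return WILDCARD
    return PLAIN


def sort_by_tier(terms: list[str]) -> list[str]:
    # Stable sort: terms keep their hand-chosen order within a tier.
    return sorted(terms, key=tier)


terms = [
    r"{[R]=\b[A-Z]{2,4}-\d{4,8}\b}",
    "insider<5>trading",
    "steal~",
    "fraud*",
    r"{[R]=\b(meeting|report)\b}",
    "confidential",
]
for term in sort_by_tier(terms):
    print(term)
```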
Best Practices
Follow these proven practices to create effective, maintainable, and high-performance search lists.
Do's
- Use # for apostrophes (don#t)
- Group related terms with blank lines
- Combine similar plain terms into regex
- Test with sample data before production
- Use stemming for language variations
Don'ts
- Don't use overly broad terms alone
- Don't include redundant variations
- Don't use more than three proximity operators in a single term
- Don't mix AND/OR/NOT in the list
- Don't forget to test edge cases
Advanced Techniques
Leverage regular expressions and advanced patterns for sophisticated filtering scenarios.
Email Address Pattern
{[R]=\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b}
Matches most common email address formats. Useful for finding communications with external parties.
Phone Number Pattern
{[R]=\b(\+?1[-.\s]?)?\(?[0-9]{3}\)?[-.\s]?[0-9]{3}[-.\s]?[0-9]{4}\b}
Matches common US phone number formats such as 555-123-4567, (555) 123-4567 and 555.123.4567, with or without the +1 country code.
Case/Reference Numbers
{[R]=\b[A-Z]{2,4}-\d{4,8}\b}
Matches patterns like CASE-2024001, REF-123456. Adjust the ranges for your specific format.
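Testing a pattern against known positives and negatives before adding it to a list catches mistakes early. Python's `re` module is not PCRE2, but the email and case-number patterns use only syntax both engines share, so Python is reasonable for a quick check. Compiling the email pattern with `re.IGNORECASE` is an assumption. The pattern lists only uppercase letters, so it relies on case-insensitive matching to find typical lowercase addresses. The `check` helper is hypothetical.

```python
import re

# Patterns from the examples above as Python raw strings; the {[R]= ... }
# wrapper is Aid4Mail syntax and is not part of the regex itself.
EMAIL = re.compile(r"\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b", re.IGNORECASE)
CASE_REF = re.compile(r"\b[A-Z]{2,4}-\d{4,8}\b")


def check(pattern: re.Pattern, should_match: list[str], should_not: list[str]) -> list[str]:
    """Return the samples the pattern gets wrong (an empty list means all passed)."""
    wrong = [s for s in should_match if not pattern.search(s)]
    wrong += [s for s in should_not if pattern.search(s)]
    return wrong


print(check(EMAIL, ["Contact jane.doe@example.com today"], ["no address here", "user@localhost"]))
print(check(CASE_REF, ["See CASE-2024001", "Ref REF-123456"], ["ID-123", "CASE2024001"]))
```

Adding known false positives from your own data (for example, product codes that look like case numbers) is where this kind of check pays off.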
Real-World Examples
Complete search list examples for common investigation scenarios, optimized for performance.
Financial Fraud Investigation
Blank lines separate the groups, which run from top to bottom: common business terms (high frequency), financial terms (medium frequency), fraud indicators (simple wildcards), stemming for variations, proximity searches (slower) and complex patterns (slowest).

{[R]=\b(meeting|report|update|invoice|payment)\b}

{[R]=\b(account|transfer|wire|deposit|withdrawal)\b}
data retention policy
audit trail

embezzl*
fraud*
launder*
misappropriat*

steal~
manipulate~
falsify~

money<+3>laundering
offshore<5>account
insider<.>trading

{[R]=\$[0-9]{1,3}(,[0-9]{3})*(\.[0-9]{2})?}
{[R]=\b[A-Z]{2,4}-\d{6,8}\b}
Performance Note: This list is optimized with common terms first, progressing to more complex patterns. Expected to process millions of emails efficiently.
HR Investigation
Blank lines separate the groups, which run from top to bottom: common workplace terms, HR-specific terms, potential issues (simple wildcards), stemming and context searches.

{[R]=\b(employee|staff|team|office|department)\b}

{[R]=\b(performance|review|feedback|evaluation)\b}
human resources
personnel file

discriminat*
harass*
hostile*
retaliat*

intimidate~
threaten~
bully~

inappropriate<.>behavior
hostile<+3>environment
sexual<5>harassment
Tip: For HR investigations, consider creating separate lists for different violation types to improve precision and reduce false positives.
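You can run separate lists side by side so that each message is labeled with the violation types it touches. This sketch makes three assumptions: each list is scanned on its own with first-match-wins, matching is a plain case-insensitive substring test, and the list names and terms are illustrative.

```python
def matching_lists(text: str, lists: dict[str, list[str]]) -> dict[str, str]:
    """Return {list name: first matching term} for every list that hits."""
    haystack = text.lower()
    result = {}
    for name, terms in lists.items():
        for term in terms:  # each list is scanned top to bottom on its own
            if term.lower() in haystack:
                result[name] = term  # first match wins within this list
                break
    return result


# Hypothetical lists; prefixes such as "harass" stand in for wildcard terms like harass*.
hr_lists = {
    "harassment": ["harass", "hostile environment"],
    "retaliation": ["retaliat"],
    "discrimination": ["discriminat"],
}
print(matching_lists("He filed a complaint about harassment and now fears retaliation.", hr_lists))
```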