In digital forensics and eDiscovery, duplicate emails can inflate costs by 30–60%. When investigations span multiple platforms—RelativityOne, Nuix, EnCase, X-Ways—traditional tools force you to reprocess everything through a single vendor, wasting time and money.
The Electronic Discovery Reference Model (EDRM) solved this with the DupeID project, creating a universal standard for email identification. Aid4Mail was one of the first forensic tools to implement this standard—and went further with MIH+, an enhancement that guarantees deduplication coverage for 100% of emails, not just 80–90%.
Why This Matters
This guide explains the EDRM MIH standard, Aid4Mail’s MIH+ implementation, and how to use these tools to transform your email investigations.
Understanding EDRM and the DupeID Project
What is the EDRM?
The Electronic Discovery Reference Model (EDRM) is a globally recognized framework that defines best practices for handling electronic data in legal proceedings. Developed by industry leaders, it guides organizations through eight distinct phases:
- Identification: Locating potential sources of ESI
- Preservation: Ensuring data integrity
- Collection: Gathering relevant data
- Processing: Preparing data for review
- Review: Evaluating data for relevance
- Analysis: Deeper examination of evidence
- Production: Delivering evidence in required formats
- Presentation: Displaying evidence in court
The DupeID Project: Solving Deduplication
In February 2023, EDRM launched the Duplicate Identification (DupeID) Project, led by Beth Patterson. Its mission: create a standardized, cross-platform method for generating unique identifiers for email messages.
Goals of DupeID:
- Enable consistent email identification across different systems
- Support deduplication without vendor lock-in
- Facilitate cross-referencing and comparison of datasets
- Reduce costs and timelines in investigations
- Create an open standard complementing proprietary methods
The outcome was the EDRM Message Identification Hash (MIH)—a simple yet elegant solution to a decades-old problem.
The Cross-Platform Deduplication Challenge
The Problem: Vendor Lock-In and Data Redundancy
In multi-custodian investigations involving multiple email platforms, duplicate emails dramatically inflate data volumes. A typical case might involve:
- Multiple email platforms (Gmail, Microsoft 365, Yahoo, IMAP servers)
- Various mailbox formats (PST, OST, mbox, EML, MSG)
- Different custodian accounts across organizations
- Historical archives from legacy systems
The result? Massive redundancy. The same email thread might appear dozens of times across different custodians, formats, and systems.
The Historical Solution: Costly Reprocessing
Historically, specialized tools offered deduplication—but only within their own proprietary ecosystems. Each vendor used unique algorithms that couldn’t communicate across platforms.
The traditional approach required:
- Collect data from all sources
- Ingest everything into a single platform
- Deduplicate within that platform’s ecosystem
- Accept vendor lock-in or face costly data migration
The Cost of the Status Quo
If you needed to compare datasets from different vendors (for example Nuix, RelativityOne, and X-Ways), the only option was to reprocess all data through a single platform. That meant:
- Weeks of additional processing time
- Exponentially higher infrastructure costs
- Massive storage requirements
- Vendor dependency for entire case lifecycle
The Real-World Impact
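Consider a typical multi-custodian investigation. The figures below imply a collection of about 500,000 emails with a 30–60% duplication rate:
Cost Implications:
• Review at $1–$3 per email = $500,000–$1,500,000 without deduplication
• Review with deduplication = $200,000–$1,050,000
💰 Savings: $150,000–$900,000 on a single case, depending on the duplication rate and the per-email review cost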
The lack of a standardized, cross-platform method for email identification was costing the industry billions of dollars annually.
The EDRM Message Identification Hash (MIH)
What is the EDRM MIH?
The EDRM MIH is an MD5 hash value generated from the Message-ID field in an email’s SMTP header:
EDRM MIH = MD5(Message-ID)
Why the Message-ID Field?
The Message-ID is an identifier assigned to an email by the sending mail client or server when the message is sent. Defined in RFC 822 (and later RFC 2822 and RFC 5322), it’s designed to be globally unique:
Message-ID: <20250315123045.abc123@mail.example.com>
Key characteristics:
- Globally unique across all email systems
- Assigned at message creation time
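- Preserved when a message is delivered to multiple recipients, copied, or migrated between systems (replies and forwards are new messages that receive their own Message-ID)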
- Present in the vast majority of received emails
By hashing this field with MD5, the EDRM MIH creates a 128-bit fingerprint that uniquely identifies an email across any platform or vendor.
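As a rough illustration of the formula above, the following Python sketch hashes a Message-ID value with MD5. The normalization shown here (trimming whitespace and the enclosing angle brackets, then encoding as UTF-8) is an assumption made for this example, and the function name edrm_mih is hypothetical. Check the EDRM MIH specification for the exact canonical form before comparing values across tools.

```python
import hashlib
from typing import Optional


def edrm_mih(message_id: Optional[str]) -> Optional[str]:
    """Return the MD5 hex digest of a Message-ID, or None if it is missing.

    Assumption: surrounding whitespace and angle brackets are removed
    before hashing. The EDRM specification defines the authoritative rules.
    """
    if message_id is None:
        return None
    normalized = message_id.strip().strip("<>").strip()
    if not normalized:
        # Mirrors the "null value" case described for standard MIH.
        return None
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    print(edrm_mih("<20250315123045.abc123@mail.example.com>"))
    print(edrm_mih(None))
```

The result is a 32-character hexadecimal string (128 bits), which is why MIH values can be used directly as list entries or file names.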
The MIH’s Limitation: Null Values
While elegant, the EDRM MIH specification has one significant limitation:
Critical Limitation
Emails without a Message-ID field produce a null value.
This affects:
- Draft emails (not yet sent, no Message-ID assigned)
- Outgoing messages (some systems don’t preserve Message-ID)
- Corrupted or incomplete email headers
- Certain proprietary email formats
📊 Impact: In a typical collection, 10–20% of messages may lack a Message-ID, making them unidentifiable using standard EDRM MIH alone.
This creates a significant gap in cross-platform deduplication capabilities—a gap that Aid4Mail’s MIH+ was designed to close.
Aid4Mail’s MIH+ Implementation
Introducing MIH+: Guaranteed Non-Null Hash Values
To address the limitations of EDRM MIH while maintaining full compatibility with the standard, Aid4Mail developed MIH+—an enhanced variant that guarantees a non-null value for every email.
MIH+ Algorithm
1. For emails WITH a Message-ID field:
MIH+ = EDRM MIH (identical)
MIH+ = MD5(Message-ID)
2. For emails WITHOUT a Message-ID:
MIH+ uses alternative metadata:
MIH+ = MD5(sender + date + subject)
MIH+ = MD5(entire SMTP header)
How Aid4Mail Generates MIH+ for Messages Without Message-ID
When the Message-ID field is missing, Aid4Mail constructs a hash source using available metadata in a specific order of precedence:
Sender Field
- From field
- Sender field
- Reply-To field
Date Field
- Date field
- Most recent Received field (topmost)
Subject Field
- Subject field (no fallback)
Example: For a draft email without a Message-ID:
From: alice@company.com
Date: 2025-03-15 14:30:00
Subject: Q1 Budget Review
MIH+ = MD5("alice@company.com" + "2025-03-15 14:30:00" + "Q1 Budget Review")
This approach ensures:
- Every email gets a unique identifier
- Cross-platform comparison is always possible
- Deduplication can occur for all message types
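The order of precedence described above can be sketched in Python as follows. This is an illustration only: the plain concatenation without separators follows the example in this guide, but the exact normalization Aid4Mail applies to each field is not documented here, so values produced by this sketch should not be expected to match Aid4Mail's MIH+ output. The names first_header and mih_plus_sketch are hypothetical, and the whole topmost Received value is used as the date fallback for simplicity.

```python
import hashlib
from email import message_from_string
from email.message import Message
from typing import Iterable


def first_header(msg: Message, names: Iterable[str]) -> str:
    """Return the first non-empty header value, trying names in order."""
    for name in names:
        value = msg.get(name)  # get() returns the first (topmost) occurrence
        if value and str(value).strip():
            return str(value).strip()
    return ""


def mih_plus_sketch(raw_email: str) -> str:
    """Hash the Message-ID if present, otherwise sender + date + subject."""
    msg = message_from_string(raw_email)
    message_id = str(msg.get("Message-ID") or "").strip().strip("<>").strip()
    if message_id:
        source = message_id  # same input as the standard MIH
    else:
        sender = first_header(msg, ["From", "Sender", "Reply-To"])
        date = first_header(msg, ["Date", "Received"])
        subject = first_header(msg, ["Subject"])  # no fallback
        source = sender + date + subject  # assumed concatenation format
    return hashlib.md5(source.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    draft = (
        "From: alice@company.com\n"
        "Date: 2025-03-15 14:30:00\n"
        "Subject: Q1 Budget Review\n"
        "\n"
        "Draft body\n"
    )
    print(mih_plus_sketch(draft))
```

Because the fallback depends on three mutable fields, two drafts with the same sender, timestamp, and subject would hash to the same value in this sketch, which is the intended behavior for deduplication.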
MIH+ vs. Standard MIH: Compatibility
Critical Compatibility Point
- For emails with a Message-ID, MIH+ produces identical values to EDRM MIH
- This ensures full interoperability with other vendors supporting EDRM MIH
- Only emails without a Message-ID produce different values—and in those cases, standard MIH would have returned null anyway
✅ Result: Aid4Mail’s MIH+ extends the EDRM standard without breaking compatibility, enabling true cross-platform deduplication for 100% of emails rather than just 80–90%.
Performance-Optimized Architecture
While MIH+ provides critical compatibility and cross-platform identification capabilities, Aid4Mail uses a performance-optimized approach for actual deduplication operations during processing.
Why not use MIH+ directly for deduplication?
MD5 hash generation and comparison, while reliable, are computationally expensive when processing hundreds of thousands or millions of emails. Aid4Mail therefore implements a two-tier hashing system:
High-Speed Int64 Hash
Purpose: Internal deduplication
- Lightning-fast comparisons
- 64-bit integer format
- Optimal memory usage
- Handles massive datasets
MIH+ MD5 Hash
Purpose: Cross-platform compatibility
- Export & metadata
- Search & file naming
- EDRM compliance
- Vendor interoperability
Benefits of this architecture:
- 10× faster deduplication during processing
- Minimal memory footprint for large collections
- Full EDRM MIH+ compatibility for cross-platform workflows
- Best of both worlds: performance AND interoperability
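The general shape of a two-tier design like this can be sketched in Python. Aid4Mail's actual Int64 hash function is not described in this guide, so the sketch substitutes an 8-byte BLAKE2b digest purely as a stand-in for "some 64-bit hash". The function names are hypothetical, and the sketch shows the structure only, not the performance characteristics.

```python
import hashlib
from typing import Iterable, List, Tuple


def int64_key(identity: str) -> int:
    """Compact signed 64-bit key used only for in-memory duplicate checks.

    Assumption: BLAKE2b truncated to 8 bytes stands in for the real hash.
    """
    digest = hashlib.blake2b(identity.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big", signed=True)


def md5_hex(identity: str) -> str:
    """Interoperable MD5 value, computed only when it needs to be exported."""
    return hashlib.md5(identity.encode("utf-8")).hexdigest()


def deduplicate(identities: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (identity, md5) pairs for the first occurrence of each identity."""
    seen = set()  # holds 64-bit integers rather than 32-character strings
    unique = []
    for identity in identities:
        key = int64_key(identity)
        if key in seen:
            continue
        seen.add(key)
        unique.append((identity, md5_hex(identity)))
    return unique


if __name__ == "__main__":
    for identity, md5 in deduplicate(["a@example.com", "b@example.com", "a@example.com"]):
        print(identity, md5)
```

A 64-bit key has a small but non-zero collision probability on very large collections, so a production design would need a policy for confirming matches; that detail is outside what this guide documents.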
Using MIH+ in Aid4Mail
Aid4Mail provides comprehensive MIH+ support across multiple features, enabling forensic examiners and eDiscovery professionals to leverage EDRM standards throughout their workflows.
1. Search and Filtering on MIH+ Values
Available in: Aid4Mail Investigator and Enterprise editions
The MIH_Plus search operator enables precise filtering based on MIH+ hash values. This is particularly powerful when working with datasets from multiple vendors.
Basic syntax:
MIH_Plus:7b7e8488d0b11ff6dd30064fa5ff79c1
Advanced syntax with search lists:
MIH_Plus:{exact=C:\Cases\Case-001\MIH-List.txt}
Use cases:
- Deduplication across vendors: Import MIH+ values from RelativityOne, Nuix, or other platforms and exclude matching emails from Aid4Mail processing
- Targeted collection: Use MIH+ lists to identify and collect specific emails across multiple custodians
- Cross-reference verification: Confirm that specific messages exist in both your dataset and a vendor’s production
- Privileged document tracking: Maintain MIH+ lists of privileged communications and automatically filter them across all custodians
Example Workflow:
- Export MIH+ values from Relativity for already-reviewed emails
- Save as MIH-Reviewed.txt
- In Aid4Mail, use:
NOT MIH_Plus:{exact=C:\MIH-Reviewed.txt}
- Process only emails not yet reviewed in Relativity
Result: Eliminates redundant processing and review, saving significant time and cost.
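To sanity-check an exclusion list outside Aid4Mail, the logic of the workflow above can be approximated in a few lines of Python. This is a sketch of the set-difference idea only, not of how the NOT MIH_Plus:{exact=...} operator is implemented. The function names are hypothetical, and lowercasing the hashes before comparison is an assumption intended to tolerate exports that use uppercase hex.

```python
from typing import Iterable, List, Set, Tuple


def load_mih_list(lines: Iterable[str]) -> Set[str]:
    """Build a lookup set from one-hash-per-line text, ignoring blank lines."""
    return {line.strip().lower() for line in lines if line.strip()}


def exclude_reviewed(
    items: Iterable[Tuple[str, str]], reviewed: Set[str]
) -> List[Tuple[str, str]]:
    """Keep (hash, label) pairs whose hash is not in the reviewed set."""
    return [(mih, label) for mih, label in items if mih.strip().lower() not in reviewed]


if __name__ == "__main__":
    reviewed = load_mih_list(["7B7E8488D0B11FF6DD30064FA5FF79C1\n", "\n"])
    collection = [
        ("7b7e8488d0b11ff6dd30064fa5ff79c1", "already reviewed"),
        ("3d4f2a1e9c8b7a6d5e4f3a2b1c0d9e8f", "not yet reviewed"),
    ]
    for mih, label in exclude_reviewed(collection, reviewed):
        print(mih, label)
```

In practice the list would come from a file, for example by passing an open file object to load_mih_list, since iterating over a text file yields its lines.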
2. File Naming with MIH+ Signatures
When exporting emails to .eml, .msg, or .txt formats, Aid4Mail can use MIH+ values as file names.
Advantages:
- Consistent naming: Same email always has the same filename across all exports
- Cross-platform identification: Files can be matched across different processing runs
- Predictable length: Unlike subject-based naming, MIH+ always produces fixed-length 32-character names
- Special character handling: No illegal characters or truncation issues
- Deduplication at OS level: Operating system tools can identify duplicates by filename
How to enable:
In Aid4Mail session settings:
Target > File name > Use MIH+ signature
Resulting file names:
7b7e8488d0b11ff6dd30064fa5ff79c1.eml
3d4f2a1e9c8b7a6d5e4f3a2b1c0d9e8f.msg
a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6.txt
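Because the file name carries the identifier, duplicates across export folders can be spotted without opening any file. The Python sketch below groups paths by their 32-character hexadecimal stem. It is a hypothetical helper for illustration (group_by_mih is not an Aid4Mail feature) and assumes files were named with the MIH+ signature option described above.

```python
import re
from collections import defaultdict
from pathlib import PurePosixPath
from typing import Dict, Iterable, List

# A 32-character lowercase hex string, the shape of an MD5 digest.
MIH_NAME = re.compile(r"^[0-9a-f]{32}$")


def group_by_mih(paths: Iterable[str]) -> Dict[str, List[str]]:
    """Return {mih: [paths]} for every MIH-named file that appears more than once."""
    groups = defaultdict(list)
    for path in paths:
        # Normalize Windows separators so the stem is extracted consistently.
        stem = PurePosixPath(path.replace("\\", "/")).stem.lower()
        if MIH_NAME.match(stem):
            groups[stem].append(path)
    return {mih: files for mih, files in groups.items() if len(files) > 1}


if __name__ == "__main__":
    exported = [
        "run1/7b7e8488d0b11ff6dd30064fa5ff79c1.eml",
        "run2\\7B7E8488D0B11FF6DD30064FA5FF79C1.msg",
        "run1/3d4f2a1e9c8b7a6d5e4f3a2b1c0d9e8f.eml",
        "run1/notes.txt",
    ]
    for mih, files in group_by_mih(exported).items():
        print(mih, files)
```

Files whose names do not look like an MD5 digest are ignored rather than reported, so mixed folders can be scanned safely.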
3. Template Token: {MIH_Plus}
The {MIH_Plus} template token inserts MIH+ values into custom file and folder names, email headers, and metadata exports.
Example uses:
Custom file naming:
{CustodianName}_{MIH_Plus}.eml
Result: Alice-Smith_7b7e8488d0b11ff6dd30064fa5ff79c1.eml
Metadata extraction to CSV:
Column Configuration: Subject, From, Date, MIH_Plus
Adding X-EDRM-MIH header to emails:
Email Header Configuration: Include X-EDRM-MIH field
4. Metadata Extraction
Aid4Mail’s Column Configuration Editor includes the EDRM.MIH_Plus token for extracting MIH+ values to CSV, XML, JSON, and TSV files.
Typical metadata export:
Subject,From,Date,MIH_Plus,Folder
Q1 Budget Review,alice@company.com,2025-03-15,7b7e8488...,Inbox
Re: Q1 Budget Review,bob@company.com,2025-03-16,3d4f2a1e...,Sent Items
This enables:
- Cross-platform deduplication in Excel or databases
- Custom analysis scripts using MIH+ as the join key
- Import into other tools for further processing
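As an example of a custom analysis script, the Python sketch below merges several metadata exports and keeps the first row seen for each MIH+ value. It assumes CSV files with an MIH_Plus column like the export shown above; the function name merge_unique and the short placeholder hash values are hypothetical. Rows with an empty MIH_Plus value are kept, since they cannot be matched against anything.

```python
import csv
import io
from typing import Dict, Iterable, List


def merge_unique(csv_texts: Iterable[str], key: str = "MIH_Plus") -> List[Dict[str, str]]:
    """Merge CSV exports, dropping rows whose key value was already seen."""
    seen = set()
    merged = []
    for text in csv_texts:
        for row in csv.DictReader(io.StringIO(text)):
            value = (row.get(key) or "").strip().lower()
            if value and value in seen:
                continue  # duplicate of an earlier row, possibly from another export
            if value:
                seen.add(value)
            merged.append(row)
    return merged


if __name__ == "__main__":
    # Placeholder hash values ("aaa", "bbb") stand in for real 32-character digests.
    export_a = (
        "Subject,From,Date,MIH_Plus,Folder\n"
        "Q1 Budget Review,alice@company.com,2025-03-15,aaa,Inbox\n"
    )
    export_b = (
        "Subject,From,Date,MIH_Plus,Folder\n"
        "Q1 Budget Review,alice@company.com,2025-03-15,AAA,Archive\n"
        "Re: Q1 Budget Review,bob@company.com,2025-03-16,bbb,Sent Items\n"
    )
    for row in merge_unique([export_a, export_b]):
        print(row["MIH_Plus"], row["Subject"], row["Folder"])
```

The same join-key idea carries over to a spreadsheet lookup or a SQL join on the MIH_Plus column.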
5. Exporting MIH+ Lists
Aid4Mail can generate plain-text lists of MIH+ values—ideal for sharing with other platforms or vendors.
How to export:
- Select Plain Text as target format
- Enable Export to a single text file
- Under Email Header Configuration, choose Only EDRM MIH values
- Apply desired filters to define scope
- Process
Result: A text file with one MIH+ value per line
7b7e8488d0b11ff6dd30064fa5ff79c1
3d4f2a1e9c8b7a6d5e4f3a2b1c0d9e8f
a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6
Use cases:
- Share with opposing counsel for agreed deduplication
- Import into Relativity, Nuix, or other platforms
- Create privilege logs or production indexes
- Document chain of custody
Real-World Benefits and ROI
1. Cost Savings
Reduced Review Costs
- 30–60% fewer emails to review
- $1–$3 per email cost eliminated for duplicates
- Faster case resolution
Example Case:
500,000 emails with 40% duplication
Savings: $400,000
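The arithmetic behind this example can be reproduced with a short Python sketch, assuming a review cost of $2 per email (the midpoint of the $1–$3 range quoted above). The function name review_savings is hypothetical, and the model deliberately ignores hosting and processing costs.

```python
def review_savings(total_emails: int, duplicate_rate: float, cost_per_email: float) -> float:
    """Estimate review cost avoided by not reviewing duplicate emails."""
    if not 0.0 <= duplicate_rate <= 1.0:
        raise ValueError("duplicate_rate must be between 0 and 1")
    duplicates = round(total_emails * duplicate_rate)
    return duplicates * cost_per_email


if __name__ == "__main__":
    # 500,000 emails, 40% duplicates, $2 per email (assumed midpoint)
    print(f"${review_savings(500_000, 0.40, 2.00):,.0f}")  # prints $400,000
```

Changing the duplicate rate or the per-email cost shows how sensitive the estimate is to both inputs.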
Reduced Hosting Costs
- 30–60% smaller datasets
- Lower cloud platform fees
- Reduced infrastructure needs
Hosting Savings:
$27,000–$45,000 per TB annually
1 TB → 100 GB (a 90% reduction) = roughly $24,000–$41,000 saved per year
Eliminated Reprocessing Costs
- No need to reprocess datasets from multiple vendors
- Avoid costly platform lock-in
- Maintain flexibility across tools and systems
2. Time Efficiency
- Faster Investigations: Immediate deduplication across all datasets without waiting for single-platform reprocessing
- Accelerated Review: Smaller, deduplicated datasets enable focus on unique, responsive content
- Streamlined Workflows: Consistent identifiers across all tools simplify communication with co-counsel
3. Improved Accuracy and Defensibility
Complete Coverage
- MIH+ handles 100% of emails
- No gaps in deduplication
- Comprehensive chain of custody
EDRM Compliance
- Industry-standard identification
- Recognized by courts
- Defensible in proceedings
Cross-Platform Integrity
- Same hash values across systems
- Verifiable methodology
- Transparent and auditable
4. Flexibility and Vendor Independence
- Multi-Platform Workflows: Use best-in-class tools for each task without forced platform lock-in
- Collaborative Investigations: Share datasets with confidence and coordinate with multiple law firms
- Future-Proofing: EDRM standard ensures long-term compatibility and protects investment in processed data
Real-World Use Cases
Multi-Vendor Litigation
Scenario:
A law firm handling complex litigation involving:
- 20 custodians across three companies
- Email data already processed in RelativityOne for Company A
- Company B’s data in Nuix Workstation
- Company C using internal tools
Challenge:
How to deduplicate across all three datasets without reprocessing?
Solution with Aid4Mail MIH+:
- Export MIH+ lists from RelativityOne (Company A)
- Export MIH+ lists from Nuix (Company B)
- Process Company C data with Aid4Mail, excluding matches
- Ingest only unique Company C emails into review platform
Result:
- ✅ 60% reduction in Company C review volume
- ✅ $300,000 saved in hosting and review costs
- ✅ Two weeks faster case resolution
Government Investigation with Multiple Agencies
Scenario:
A regulatory investigation involving FBI (EnCase), SEC (custom tools), and DOJ (Aid4Mail)
Challenge:
Multiple agencies need to coordinate without duplicating review efforts
Solution with Aid4Mail MIH+:
- FBI exports MIH+ list of already-reviewed emails
- SEC exports MIH+ list from their system
- DOJ uses Aid4Mail to process new custodian data, excluding reviewed items
- All agencies maintain separate MIH+ lists for coordination
Result:
- ✅ No duplicate review across agencies
- ✅ Faster investigation timeline
- ✅ Clear audit trail for all parties
- ✅ Defensible methodology for court proceedings
M&A Due Diligence
Scenario:
Company acquiring a competitor needs to review 10 years of email archives from multiple legacy systems (Exchange, Gmail, PST files)
Challenge:
Avoid re-reviewing emails already cleared by target company’s counsel
Solution with Aid4Mail MIH+:
- Receive MIH+ list of cleared emails from target company
- Collect all email sources with Aid4Mail
- Filter using MIH+ search to exclude cleared messages
- Focus review on new, uncleared content
Result:
- ✅ 70% reduction in review volume
- ✅ $500,000 saved in due diligence costs
- ✅ Three-week faster deal closure
Getting Started with MIH+ in Aid4Mail
Edition Requirements
MIH+ functionality is available in all Aid4Mail editions:
Aid4Mail Converter
299/year
Basic MIH+ support
- File naming
- Metadata extraction
- Template tokens
Aid4Mail Investigator
999/year
Full MIH+ features
- All Converter features
- Search operators
- Search lists
Aid4Mail Enterprise
4999/year
Unlimited scale
- All Investigator features
- CLI automation
- Batch processing
Basic Configuration
Aid4Mail’s default configuration already generates MIH+ values optimally. No special setup is required.
Default Configuration:
App Settings > Sessions > File naming & duplicate detection
Generate hash value from: Message-ID header (EDRM MIH)
Using MIH+ for Cross-Platform Deduplication
Scenario: You want to exclude emails already reviewed in RelativityOne.
1. Export MIH+ values from Relativity (or request them from the vendor)
2. Save as a text file, one MIH+ value per line:
C:\Cases\Case-001\Relativity-MIH.txt
3. In Aid4Mail, add to your filter script under Session Settings > Filters > Item filtering > Search query:
NOT MIH_Plus:{exact=C:\Cases\Case-001\Relativity-MIH.txt}
4. Process normally. Aid4Mail will exclude all matching emails
Exporting MIH+ Lists
To create a MIH+ list from your Aid4Mail collection:
- Select Plain Text as target format
- Enable Export to a single text file
- Under Email header configuration, choose Only EDRM MIH values
- Apply desired filters (e.g., Class:responsive if using AI classification)
- Process
Result: A text file with one MIH+ value per line, ready to share with other platforms.
Frequently Asked Questions
What’s the difference between MIH and MIH+?
EDRM MIH generates hash values from the Message-ID field. If an email lacks a Message-ID (drafts, outgoing messages), it returns a null value.
MIH+ is Aid4Mail’s enhancement that guarantees a non-null hash for every email by using alternative metadata (sender + date + subject) when Message-ID is missing. For emails with Message-ID, MIH+ produces identical values to EDRM MIH, ensuring full compatibility.
Can I use Aid4Mail’s MIH+ with other eDiscovery platforms?
Yes, absolutely. Aid4Mail’s MIH+ is fully compatible with the EDRM MIH standard. You can:
- Export MIH+ lists and import them into Relativity, Nuix, or other platforms
- Import MIH lists from other vendors and use them in Aid4Mail searches
- Share MIH+ values with opposing counsel or co-counsel
- Use MIH+ for cross-platform deduplication without data loss
Does using MIH+ slow down processing?
No. Aid4Mail uses a dual-hash architecture:
- During processing: Aid4Mail uses a high-speed Int64 hash for lightning-fast deduplication (10× faster than MD5)
- For export/compatibility: MIH+ MD5 hashes are generated on-demand for cross-platform use
This ensures you get the best of both worlds: maximum speed during processing and full EDRM compatibility for cross-platform workflows.
Which Aid4Mail edition do I need for MIH+ search operators?
Aid4Mail Investigator or Enterprise.
MIH+ search operators (like MIH_Plus:{exact=file.txt}) require advanced filtering capabilities available in Investigator (999/year) or Enterprise (4999/year) editions.
Aid4Mail Converter (299/year) supports MIH+ for file naming, metadata extraction, and template tokens, but not search operators.
How much can I save using MIH+ deduplication?
Typical savings range from $300,000 to $500,000 per case.
This comes from:
- 30–60% reduction in review volume = lower attorney fees
- 80–90% hosting cost savings = $24,000–$41,000 per TB annually
- Eliminated reprocessing costs across vendors
- Faster case resolution = weeks saved
Example: A 500,000-email case with 40% duplication and $2/email review cost saves approximately $400,000 using MIH+ deduplication.
Is EDRM MIH accepted in court?
Yes. The EDRM is a globally recognized framework developed by industry leaders and accepted by courts worldwide. The MIH standard:
- Follows industry best practices
- Provides transparent, auditable methodology
- Maintains defensible chain of custody
- Supports workflows governed by FRCP, GDPR, and other regulations
Aid4Mail’s MIH+ implementation extends the standard while maintaining full compatibility, ensuring defensibility in legal proceedings.
Can I use MIH+ with emails that don’t have Message-IDs?
Yes—that’s exactly what MIH+ was designed for.
Standard EDRM MIH returns null values for emails without Message-IDs (drafts, outgoing messages, corrupted headers). MIH+ solves this by:
- Using alternative metadata (sender + date + subject) to generate hash values
- Ensuring 100% of emails have identifiable hash values
- Maintaining compatibility with EDRM MIH for emails that do have Message-IDs
This means you get complete deduplication coverage instead of the 80–90% coverage of standard MIH.