When 110,000 documents need to be ingested, analyzed, and cited in a legal context, "pretty accurate" isn't good enough. Every document needs a cryptographic fingerprint. Every answer needs a chain of custody. Every inference needs to be defensible in court.
Auris Intelligence was built to support civil litigation defense — specifically, to ingest, organize, extract, and analyze a large and heterogeneous document corpus spanning financial records, internal communications, email, and mobile communications across multiple years of business operations.
The challenge wasn't just volume. It was format heterogeneity, document quality variance, and a legal requirement that every piece of evidence be cryptographically verifiable before it could be used in court. PDFs. Excel spreadsheets. Outlook MSG email files. Word documents. Scanned images of physical records. iOS SMS backups. All of it needed to be ingested, fingerprinted, extracted, and made queryable — without ever compromising the evidentiary integrity of a single file.
A law firm was going to rely on this system's output. The standard was: every citation must be traceable to a specific document, and every document must have a verifiable chain of custody.
Every document in the Auris corpus carries a SHA-256 hash computed at ingest, stored in a persistent manifest alongside source path, file size, modification timestamp, ingest timestamp, and copy verification status. The ingestion pipeline is idempotent — re-runs skip already-hashed files by SHA match, preventing duplicate processing and maintaining manifest integrity across multiple ingest passes.
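A minimal Python sketch of that idempotent, hash-keyed ingest loop. The file layout, manifest location, and field names here are illustrative assumptions, not Auris internals:

```python
import hashlib
import json
import os
import time

MANIFEST_PATH = "manifest.json"  # illustrative; the real manifest store is not described above

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream the file through SHA-256 so large evidence files never load whole into memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def load_manifest() -> dict:
    if os.path.exists(MANIFEST_PATH):
        with open(MANIFEST_PATH) as f:
            return json.load(f)
    return {}

def ingest(paths: list[str]) -> dict:
    """Idempotent ingest: files whose SHA-256 already appears in the manifest are skipped."""
    manifest = load_manifest()
    known = {entry["sha256"] for entry in manifest.values()}
    for path in paths:
        digest = sha256_of(path)
        if digest in known:
            continue  # already fingerprinted on an earlier pass
        stat = os.stat(path)
        manifest[digest] = {
            "sha256": digest,
            "source_path": path,
            "size_bytes": stat.st_size,
            "modified_at": stat.st_mtime,
            "ingested_at": time.time(),
            "copy_verified": False,  # set True once the working copy re-hashes to the same digest
        }
        known.add(digest)
    with open(MANIFEST_PATH, "w") as f:
        json.dump(manifest, f, indent=2)
    return manifest
```

Because the manifest is keyed by digest and re-runs consult it first, running ingest twice over the same tree is a no-op on the second pass.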
The pipeline is format-agnostic across ten-plus file types: PDF, DOCX, XLSX, XLS, CSV, HTML, plain text, Outlook MSG, PNG, JPG — and iOS SMS backup archives via iMazing API integration. Every channel of digital communication relevant to the case is represented and indexed.
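Format-agnostic dispatch of this kind often reduces to a table from file extension to extractor. A sketch under that assumption, with placeholder extractors standing in for the real format-specific libraries (none of which are named above):

```python
from pathlib import Path
from typing import Callable

# Placeholder extractors -- in a real pipeline these would wrap format-specific
# libraries (PDF parsers, spreadsheet readers, MSG parsers, OCR engines).
def extract_pdf(path: Path) -> str: ...
def extract_docx(path: Path) -> str: ...
def extract_spreadsheet(path: Path) -> str: ...
def extract_msg(path: Path) -> str: ...
def extract_image_ocr(path: Path) -> str: ...
def extract_plain(path: Path) -> str:
    return path.read_text(errors="replace")

# One dispatch table covers the corpus: supporting a new format is adding a row.
EXTRACTORS: dict[str, Callable[[Path], str]] = {
    ".pdf": extract_pdf,
    ".docx": extract_docx,
    ".xlsx": extract_spreadsheet,
    ".xls": extract_spreadsheet,
    ".csv": extract_plain,
    ".html": extract_plain,
    ".txt": extract_plain,
    ".msg": extract_msg,
    ".png": extract_image_ocr,
    ".jpg": extract_image_ocr,
}

def extract_text(path: Path) -> str:
    extractor = EXTRACTORS.get(path.suffix.lower())
    if extractor is None:
        raise ValueError(f"unsupported format: {path.suffix}")
    return extractor(path)
```

An unrecognized extension fails loudly rather than passing silently, which matters when a skipped file is potential evidence.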
The naive approach to analyzing a large document corpus is to feed as much as possible to the AI coordinator and ask it to reason across everything. This approach fails at scale — it's expensive, slow, produces worse answers because relevant signal is diluted by noise, and hits context limits immediately on a corpus of this size.
Auris uses a deliberate subagent scaffold designed around a single principle: all formatting, OCR, extraction, and keyword analysis work happens in worker subagents and never touches the coordinator's context window.
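That isolation principle can be sketched as a typed boundary: workers may return only a compact structured result, so raw text and OCR output never reach the coordinator. The shapes and caps below are illustrative assumptions, not Auris internals:

```python
from dataclasses import dataclass, field

@dataclass
class WorkerResult:
    """The only shape allowed to flow upward: never raw page text or OCR dumps."""
    sha256: str
    summary: str
    extracts: list[str] = field(default_factory=list)  # short, keyword-anchored excerpts
    tables: list[dict] = field(default_factory=list)   # structured rows, not formatting markup

def worker_process(doc_sha: str, raw_text: str, keywords: list[str]) -> WorkerResult:
    """Runs in a subagent: all heavy text handling stays here."""
    lines = raw_text.splitlines()
    hits = [ln for ln in lines if any(k.lower() in ln.lower() for k in keywords)]
    return WorkerResult(
        sha256=doc_sha,
        summary=f"{len(hits)} keyword-bearing lines of {len(lines)} total",
        extracts=hits[:20],  # cap what travels upward
    )

def coordinator(results: list[WorkerResult]) -> str:
    """Sees only compact WorkerResults; its context grows with relevance, not corpus size."""
    return "\n".join(f"{r.sha256[:12]}: {r.summary}" for r in results)
```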
For low-quality scan batches of 5,000+ images, full extraction before relevance assessment is prohibitively expensive. Auris runs a lightweight keyword-first filtering pass after initial enhancement — only documents containing confirmed keyword markers are selected for full extraction. Targeted hydration then expands context only around confirmed keyword zones before committing to a full document extraction pass. All OCR and formatting work runs in designated worker subagents. Only extracts, summaries, and structured tables are returned upward to the coordinator for analysis. Compute cost is proportional to relevance, not to corpus size.
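The keyword-first pass, targeted hydration, and the decision to commit to full extraction could look roughly like this; window sizes and thresholds are invented for illustration:

```python
def find_keyword_zones(text: str, keywords: list[str]) -> list[int]:
    """Pass 1: cheap scan -- record character offsets of keyword hits only."""
    lowered = text.lower()
    zones = []
    for kw in keywords:
        start = 0
        while (i := lowered.find(kw.lower(), start)) != -1:
            zones.append(i)
            start = i + 1
    return sorted(zones)

def hydrate(text: str, zones: list[int], window: int = 200) -> list[str]:
    """Pass 2: expand context only around confirmed hits, merging overlapping windows."""
    spans: list[list[int]] = []
    for z in zones:
        lo, hi = max(0, z - window), min(len(text), z + window)
        if spans and lo <= spans[-1][1]:
            spans[-1][1] = max(spans[-1][1], hi)  # merge into the previous window
        else:
            spans.append([lo, hi])
    return [text[lo:hi] for lo, hi in spans]

def should_fully_extract(zones: list[int], min_hits: int = 2) -> bool:
    """Commit to the expensive full-document pass only on confirmed relevance."""
    return len(zones) >= min_hits
```

Irrelevant documents exit after the cheap pass, so cost tracks the number of keyword-bearing documents rather than the 5,000-image batch size.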
The AI orchestration layer uses Claude Opus via the Anthropic API with domain-encoded system prompts that include forensic accounting disambiguation logic — specifically, instructions that prevent the system from flagging same-date same-reference duplicate journal entries as suspicious, because that's normal double-entry bookkeeping. The system is prompted to distinguish legitimate accounting practice from genuine financial anomaly. That distinction is not in any general-purpose LLM prompt. It came from domain expertise encoded into the system architecture.
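To make the disambiguation concrete, here is a toy version of the rule in code rather than prompt form: a same-date, same-reference group that nets to zero is a normal double-entry pair, and only repeated amounts that fail to balance get flagged. This is a deliberate simplification for illustration, not the system's actual logic:

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class JournalEntry:
    date: str
    reference: str
    account: str
    amount: float  # sign convention assumed here: positive = debit, negative = credit

def flag_suspicious_duplicates(entries: list[JournalEntry]) -> list[tuple]:
    """Group by (date, reference). A balanced group is ordinary double-entry
    bookkeeping; a group of repeated amounts that does NOT net to zero
    (e.g. the same debit posted twice) is worth a human look."""
    groups = defaultdict(list)
    for e in entries:
        groups[(e.date, e.reference)].append(e)
    flagged = []
    for key, group in groups.items():
        if len(group) < 2:
            continue
        if abs(sum(e.amount for e in group)) > 0.005:  # fails to balance
            flagged.append(key)
    return flagged
```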
Auris generates professional legal memoranda in attorney-ready DOCX format, with branded design, subpoena target tables, SHA-cited document references, and embedded chain-of-custody attestations. Every citation in every memo is traceable to a specific document in the manifest by SHA-256 hash.
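Citation rendering against such a manifest might look like this sketch. The entry fields mirror the ones listed earlier; the rendered citation format is invented for illustration:

```python
def render_citation(manifest: dict, sha256: str) -> str:
    """Every memo citation resolves to a manifest entry; an unknown hash is a
    hard error, never a silent omission."""
    entry = manifest.get(sha256)
    if entry is None:
        raise KeyError(f"citation {sha256[:12]}... has no chain-of-custody record")
    return (f"[Doc {sha256[:12]}] {entry['source_path']} "
            f"(ingested {entry['ingested_at']}, custody verified: {entry['copy_verified']})")
```

Failing closed on an unresolvable hash is the point: a memo can never cite a document the manifest cannot vouch for.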
Beyond retrieval and citation, the system has surfaced novel forensic hypotheses through document pattern analysis — lines of inquiry that typically require weeks of forensic accountant review. Analyzing behavioral patterns across 110,000 documents simultaneously, with domain-encoded analytical logic, surfaces insights on a timescale that human review of a corpus this size cannot match.
Auris is currently in active production use by Wickens, Herzer, Panza LLP in civil litigation defense. The platform is being developed for broader legal market deployment as a standalone forensic intelligence product.