Fail-closed PII ingestion gate (WIP)

CAIO Vault Gate

A Layer 2 security gate that independently re-scans for PII regardless of source declarations. It was hardened through 3 adversarial red-team rounds (Unicode, homoglyph, and spread evasion) and never logs PII values.

Internal operations tool — no public live URL. Below is the architecture diagram.

[Architecture diagram: Patient data ingest → (1) OS sandbox → (2) PreToolUse hook → (3) n8n ingest gate → Protected knowledge base; external egress blocked. Caption: Three-layer PII defense, quarantine on uncertainty.]

Overview

In Korea's medical and beauty vertical, patient data flows in from many sources, each carrying its own claim about what it contains. The dangerous assumption is to trust those declarations: a feed that says "no PII here" is exactly how a resident registration number or card number slips into a system that was never cleared to hold it. CAIO Vault Gate refuses that trust model entirely.

It sits as a Layer 2 checkpoint that treats every upstream contract as a hint, not a guarantee, and independently re-scans each payload before anything is admitted. The design is fail-closed: when classification is uncertain, the content is quarantined rather than waved through. Routing runs as four discrete stages (contract validation, PII scan, sensitivity tiering, then deduplication), so each concern is testable in isolation rather than tangled in one pass.
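
The staged, fail-closed routing can be sketched roughly as follows. This is a minimal illustration, not the tool's actual code: the `Context` fields, the `Uncertain` exception, the `RRN_SHAPE` pattern, the tier names, and every stage body are hypothetical stand-ins. It only tries to show the control flow: the upstream declaration is recorded but never consulted, and any stage that raises sends the payload to quarantine.

```python
import hashlib
import json
import re
from dataclasses import dataclass, field
from enum import Enum


class Verdict(Enum):
    ADMIT = "admit"
    QUARANTINE = "quarantine"


class Uncertain(Exception):
    """A stage could not classify the payload with confidence."""


@dataclass
class Context:
    payload: dict
    declared_pii_free: bool = False  # upstream claim: a hint, never trusted
    findings: dict = field(default_factory=dict)  # PII type name -> count
    tier: str = ""
    is_duplicate: bool = False


# Hypothetical pattern: the 6+7 digit shape of a resident registration number.
RRN_SHAPE = re.compile(r"\b\d{6}-?\d{7}\b")
_seen_digests: set[str] = set()


def validate_contract(ctx: Context) -> None:
    if not isinstance(ctx.payload.get("source"), str):
        raise Uncertain("contract lacks a source field")


def scan_pii(ctx: Context) -> None:
    # Runs on every payload; ctx.declared_pii_free is deliberately ignored.
    for value in ctx.payload.values():
        if not isinstance(value, str):
            raise Uncertain("value type the scanner cannot inspect")
        hits = len(RRN_SHAPE.findall(value))
        if hits:
            ctx.findings["rrn"] = ctx.findings.get("rrn", 0) + hits


def assign_tier(ctx: Context) -> None:
    ctx.tier = "restricted" if ctx.findings else "general"


def dedup(ctx: Context) -> None:
    raw = json.dumps(ctx.payload, sort_keys=True).encode()
    digest = hashlib.sha256(raw).hexdigest()
    ctx.is_duplicate = digest in _seen_digests
    _seen_digests.add(digest)


STAGES = (validate_contract, scan_pii, assign_tier, dedup)


def run_gate(ctx: Context, stages=STAGES) -> Verdict:
    for stage in stages:
        try:
            stage(ctx)
        except Exception:
            # Fail closed: doubt or a crash in any stage means quarantine.
            return Verdict.QUARANTINE
    return Verdict.ADMIT
```

Because each stage is a plain function over a shared context, it can be unit-tested on its own, which is the isolation property the design aims for.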

What makes it notable from an engineering standpoint is the adversarial posture. Detection was hardened against attackers who actively disguise identifiers (Unicode tricks, homoglyph substitution, and values spread across fields to dodge naive matching), using NFKC normalization and Luhn checks as defensive primitives. Just as deliberate: the gate logs type names and counts only, never the sensitive values themselves, so the safety layer can't become the leak.
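
As a rough sketch of how the two primitives named here can combine, the snippet below folds text with NFKC (via the standard library's `unicodedata.normalize`) and then applies a Luhn checksum to card-shaped digit runs. The `CARD_CANDIDATE` pattern and the `contains_card` helper are assumptions made for illustration, not the gate's real detectors. Note that NFKC folds compatibility forms such as fullwidth digits but does not unify cross-script lookalikes, so homoglyph handling would presumably need an additional mapping that is not shown.

```python
import re
import unicodedata

# Assumed candidate shape: 13-19 digits, each optionally followed by
# a single space or hyphen, not starting in the middle of a digit run.
CARD_CANDIDATE = re.compile(r"(?<!\d)(?:\d[ -]?){13,19}")


def luhn_valid(digits: str) -> bool:
    """Luhn checksum over a string of decimal digits."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit from the right
            d = d * 2 - 9 if d > 4 else d * 2
        total += d
    return total % 10 == 0


def contains_card(*parts: str) -> bool:
    # Join the parts so a number split across fields is seen as one run,
    # then fold compatibility forms (e.g. fullwidth digits) with NFKC.
    folded = unicodedata.normalize("NFKC", " ".join(parts))
    for match in CARD_CANDIDATE.finditer(folded):
        digits = re.sub(r"\D", "", match.group())
        if luhn_valid(digits):
            return True
    return False
```

Calling `contains_card("4111 1111", "1111-1111")` treats the two field values as one candidate, which is one simple way to approach the spread case. A real detector would likely also need windowing, since this sketch checks only the longest run it matches.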

Highlights

  • Fail-closed design — quarantine on uncertainty, ignore source declarations and re-scan independently
  • 3 adversarial red-team rounds (16+ payloads/round: Unicode, homoglyph, spread)
  • Zero PII leakage — logs type names and counts only, never values
  • Contract-based 4-stage routing (contract validation → PII scan → sensitivity tier → dedup)
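
The "type names and counts only" logging rule can be illustrated with a small sketch. It assumes findings arrive as `(type_name, matched_value)` pairs; that shape, the `vault_gate` logger name, and the message format are all hypothetical. The matched value is discarded before anything reaches the logger.

```python
import logging
from collections import Counter

logger = logging.getLogger("vault_gate")  # hypothetical logger name


def summarize(findings) -> dict:
    """Reduce (type_name, matched_value) pairs to {type_name: count}."""
    counts = Counter(type_name for type_name, _value in findings)
    return dict(sorted(counts.items()))


def log_findings(findings) -> dict:
    summary = summarize(findings)
    # Only the summary is formatted into the record; values never are.
    logger.warning(
        "pii_detected types=%s total=%d", summary, sum(summary.values())
    )
    return summary
```

Keeping the value out of the summary, rather than redacting it at the handler, means a misconfigured formatter cannot reintroduce it.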

Metrics

  • 126 tests
  • 3 red-team rounds
  • 6 PII types
  • 0 false positives

Tech stack

Python 3.13, pytest, PyYAML, NFKC normalization, Luhn