JD Codec’s privacy posture is built around one constraint: PII never leaves the customer’s machine. The codec lives in the cloud. The redaction layer lives on-device. The two are designed so the cloud cannot see PII even by accident.
A concrete example
Imagine your agent is helping a customer sign up for a service. The form asks for name, email, phone, date of birth, and credit card. The agent reads the page. The connector, running on your machine, scans every value before anything leaves the machine. Some values match a known PII pattern and get replaced with a category placeholder; others stay as-is.
What stays local
The connector runs a Privacy Shield over every snapshot before it’s sent. The Shield is a regex-and-rule pack that matches and replaces:
- Email addresses
- Credit-card numbers (with Luhn verification + safe-list for known test PANs)
- Phone numbers (US, E.164, AU national, and permissive variants)
- Postal addresses
- API keys for major providers (OpenAI, Anthropic, Google, etc. — to prevent customer credentials leaking through agent screenshots)
- Tax / government ID numbers (TFN, SSN-shaped)
- IP addresses
- Several other categories — see the connector’s pii-ruleset for the full list
Each match is replaced with a category placeholder such as {{REDACTED_EMAIL}} or {{REDACTED_CC}}: double curly braces, category name in caps. The placeholder is what the cloud sees. The original value never leaves your machine.
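To make the match-and-replace step concrete, here is a minimal sketch of two of the categories above: emails, and card numbers with Luhn verification and a test-PAN safe-list. The regexes, the contents of `TEST_PANS`, the function names, and the count keys are assumptions for illustration, not the connector's actual rule pack.

```python
import re
from collections import Counter

# Illustrative patterns only; a production rule pack would be far more thorough.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")  # 13 to 19 digits, optional separators

# Assumed safe-list of well-known test PANs that pass through unredacted.
TEST_PANS = {"4242424242424242", "4111111111111111"}


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:      # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def redact(text: str):
    """Replace matches with category placeholders; return (text, category counts)."""
    counts = Counter()

    def replace_email(match):
        counts["email"] += 1
        return "{{REDACTED_EMAIL}}"

    def replace_card(match):
        digits = re.sub(r"\D", "", match.group(0))
        # Leave safe-listed test PANs and non-Luhn digit runs untouched.
        if digits in TEST_PANS or not luhn_valid(digits):
            return match.group(0)
        counts["CC_GENERIC"] += 1
        return "{{REDACTED_CC}}"

    text = EMAIL_RE.sub(replace_email, text)
    text = CARD_RE.sub(replace_card, text)
    return text, dict(counts)
```

Calling `redact("jane@example.com paid with 4539 5787 6362 1486")` in this sketch returns the text with both values replaced, plus counts per category. A digit run that fails the Luhn check is left alone, which keeps order numbers and similar values from being over-redacted.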
URLs are scanned and redacted on the same pass — path segments that look like IDs, tokens, or session keys get replaced before the URL crosses the network boundary.
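As a rough illustration of path-segment redaction, the sketch below replaces segments that look like long numeric IDs, long hex strings, or UUIDs. The heuristics and the `{{REDACTED_ID}}` placeholder name are assumptions; the connector's real rules may differ, and this sketch leaves query strings untouched.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Assumed heuristics for "looks like an ID, token, or session key".
ID_LIKE = re.compile(
    r"^(?:"
    r"\d{6,}"                                                  # long numeric IDs
    r"|[0-9a-fA-F]{16,}"                                       # long hex tokens
    r"|[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}"   # UUIDs
    r")$"
)


def redact_url(url: str) -> str:
    """Replace ID-like path segments with a placeholder (hypothetical name)."""
    parts = urlsplit(url)
    segments = [
        "{{REDACTED_ID}}" if ID_LIKE.match(segment) else segment
        for segment in parts.path.split("/")
    ]
    return urlunsplit(parts._replace(path="/".join(segments)))
```

Ordinary path words such as `orders` or `receipt` pass through unchanged, so the redacted URL still shows which page the agent was on.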
What reaches the cloud
For each snapshot, the cloud receives:
- The redacted YAML representation of the page
- The redacted URL
- An audit signal (client_redacted: true) proving the Shield ran
- Category-level counts of what was redacted (e.g. { email: 2, CC_GENERIC: 1 }): counts only, never values
- Standard request metadata (your API key id, session id, task id, step number)
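The items above could be assembled into a request body along these lines. Only `client_redacted` is a name taken from this page; every other field name, and the function itself, is a hypothetical stand-in for whatever the real API contract specifies.

```python
import json


def build_snapshot_request(redacted_yaml, redacted_url, counts,
                           api_key_id, session_id, task_id, step):
    """Assemble the body sent to the cloud. Every input is already redacted."""
    return {
        "snapshot_yaml": redacted_yaml,     # hypothetical field name
        "url": redacted_url,                # hypothetical field name
        "client_redacted": True,            # audit signal named on this page
        "redaction_counts": dict(counts),   # counts only, never values
        "api_key_id": api_key_id,           # public half of the key
        "session_id": session_id,
        "task_id": task_id,
        "step": step,
    }


body = build_snapshot_request(
    redacted_yaml="- textbox 'Email': '{{REDACTED_EMAIL}}'",
    redacted_url="https://shop.example/signup",
    counts={"email": 1},
    api_key_id="key_example",
    session_id="sess_example",
    task_id="task_example",
    step=3,
)
print(json.dumps(body, indent=2))
```

Nothing in the body carries an original value: the counts say how many items were redacted per category, not what they were.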
Requests that arrive without the audit signal are rejected with 400 privacy_shield_missing. This is by design. A misconfigured connector (say, a client integrator forgetting to wire the Shield, or a build that accidentally stripped it) gets a hard failure on the very first request rather than silently shipping PII to the cloud. Loud failure on misconfiguration is the entire point: the system is designed so the unsafe path is impossible to take by accident.
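The fail-closed check can be pictured as a small gate that runs before any other processing. This is a sketch under assumptions: the function name and the response body shape are invented here; only the 400 status, the privacy_shield_missing code, and the `client_redacted` flag come from this page.

```python
def check_audit_signal(payload: dict):
    """Return (status, body); refuse any request lacking the audit signal."""
    # Require the literal boolean True so a missing key, None, or a truthy
    # string such as "yes" all fail closed.
    if payload.get("client_redacted") is not True:
        return 400, {"error": "privacy_shield_missing"}  # assumed body shape
    return 200, {}
```

Checking for the exact value rather than truthiness is the design choice that matters: an absent or malformed flag is treated the same as an explicit `false`.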
What’s persisted
The cloud persists metadata only. Specifically:
- One UsageEvent per snapshot, retained for 90 days. Includes session/task/step IDs, compression numbers (input/output chars, codec time), redaction category counts, and the redacted URL. No snapshot bodies. No compressed output. No PII.
- API key metadata (your public api_key_id, key status, configured TTL overrides). Only the public half is stored; the secret half of your key is never persisted, only its hash.
What is never persisted:
- Raw snapshot YAML
- Compressed output
- Per-step DOM content
- In-memory session state (cleared on TTL or process restart)
- Any PII (it never reached the cloud)
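To show how little a persisted record needs to hold, here is a sketch of a UsageEvent-like structure with the 90-day retention window. The field names, types, and the `is_expired` helper are assumptions based on the description above, not the actual schema.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=90)  # global default stated on this page


@dataclass(frozen=True)
class UsageEvent:
    """Metadata-only record; hypothetical field names."""
    session_id: str
    task_id: str
    step: int
    input_chars: int
    output_chars: int
    codec_time_ms: float
    redaction_counts: dict   # e.g. {"email": 2}; counts only
    redacted_url: str
    created_at: datetime

    def is_expired(self, now: datetime) -> bool:
        """True once the event is older than the retention window."""
        return now - self.created_at > RETENTION
```

The record has no field for a snapshot body or compressed output, so there is nowhere for page content to be stored even by mistake.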
What logs see
Logs include api_key_id, request IDs, session/task/step IDs, compression numbers, and error codes. They do not include snapshot bodies, compressed output, redacted values, or the secret half of any API key.
Error responses follow the same rule. Error messages are category-level — they never echo any portion of the submitted snapshot.
Audit trail
This posture is enforced, not just documented:
- Privacy Shield is mandatory. The connector cannot ship a snapshot without it. The cloud refuses requests that don’t carry the audit signal.
- A grep-gate runs on every connector commit to prevent codec-internal vocabulary from leaking into customer-visible code.
- A weekly audit rescans every published artefact (npm, PyPI, public source repo) for the same patterns — catches anything that slips past the local gate.
- The connector source is public at github.com/jdcodec/connector. You can verify the redaction logic yourself.
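A grep-gate of the kind described above can be as simple as a script that scans the tree for forbidden patterns and reports a failing status when it finds any. The pattern list, the file suffixes, and the function names below are placeholders; the real forbidden vocabulary and tooling are not described on this page.

```python
import re
from pathlib import Path

# Placeholder term; the actual forbidden vocabulary is not public.
FORBIDDEN = [re.compile(r"\bexample_internal_term\b", re.IGNORECASE)]
SCANNED_SUFFIXES = {".py", ".ts", ".js", ".json", ".md"}


def scan_tree(root, patterns=FORBIDDEN):
    """Return (path, line number, pattern) for every forbidden match under root."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in SCANNED_SUFFIXES:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for lineno, line in enumerate(text.splitlines(), start=1):
            for pattern in patterns:
                if pattern.search(line):
                    hits.append((str(path), lineno, pattern.pattern))
    return hits


def main(root="."):
    """Print each hit and return 1 if any were found, 0 otherwise."""
    hits = scan_tree(root)
    for path, lineno, pattern in hits:
        print(f"{path}:{lineno}: matches {pattern}")
    return 1 if hits else 0
```

A commit hook or CI job would pass the return value of `main()` to `sys.exit`, so any match blocks the commit. The same function could be pointed at an unpacked published package for a periodic rescan.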
What’s not yet covered
- Server-side secondary check: a future feature where the cloud sniffs for PII patterns as a defence-in-depth layer (in addition to refusing requests without the audit signal). Reserved in the API contract as privacy_shield_violation; not enforced today.
- Customer-controlled retention: the 90-day window is a global default. Per-customer override is on the roadmap.
- Custom redaction rules: the Shield’s rule pack is shared across all customers today. Allow-listing custom domains or adding custom redaction rules is on the roadmap.