
JD Codec’s privacy posture is built around one constraint: PII never leaves the customer’s machine. The codec lives in the cloud. The redaction layer lives on-device. The two are designed so the cloud cannot see PII even by accident.

A concrete example

Imagine your agent is helping a customer sign up for a service. The form asks for name, email, phone, date of birth, and credit card. The agent reads the page. The connector — running on your machine — scans every value before anything leaves the machine. Some values match a known PII pattern and get replaced with a category placeholder; others stay as-is:
On the page:                          What leaves the machine:
  Name:    Alice Tan                    Name:    Alice Tan          ← unchanged (see below)
  Email:   [email protected]            Email:   {{REDACTED_EMAIL}}
  Phone:   (415) 555-0123               Phone:   {{REDACTED_PHONE}}
  DOB:     born 1985                    DOB:     born {{REDACTED_DOB}}
  Card:    4111 1111 1111 1111          Card:    {{REDACTED_CC}}

What the cloud sees:
  page structure (forms, labels, layout) ✓
  unchanged values like "Alice Tan"      ✓
  redacted placeholders                  ✓
  category counts: { EMAIL: 1, PHONE: 1, DOB: 1, CC: 1 }   ✓
  raw redacted values                    ✗ (never)
The agent still gets a usable representation back from the codec: it knows there’s an email field, that the field has been filled with something email-shaped, and what to do next. The cloud can still compress the snapshot effectively, and the pattern-matched values never cross the network boundary.

About the unchanged fields. The Privacy Shield matches structured patterns: values with a predictable shape, such as emails, phone numbers, credit cards, dates of birth, addresses, API keys, tax IDs, and IP addresses. It does not match arbitrary names, free-text comments, or any value without a regex-detectable shape. If your application processes data that doesn’t fit a known pattern but still needs to be kept private, talk to support; custom redaction rules are on the roadmap.
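
To make the replace-and-count step concrete, here is a minimal Python sketch. It assumes two deliberately simplified regexes (the real rule pack lives in the connector’s pii-ruleset and covers far more), and the names PATTERNS, redact, and redact_form are hypothetical, not the connector’s API. It shows the three properties from the example: matches become {{REDACTED_<CATEGORY>}} placeholders, values with no recognisable shape (such as names) pass through untouched, and only per-category counts are kept.

```python
import re
from collections import Counter

# Hypothetical, deliberately simplified patterns for illustration only.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\(\d{3}\)\s*\d{3}-\d{4}"),  # US "(415) 555-0123" shape only
}


def redact(value: str) -> tuple[str, Counter]:
    """Replace each match with {{REDACTED_<CATEGORY>}} and count matches per category."""
    counts: Counter = Counter()
    for category, pattern in PATTERNS.items():
        placeholder = "{{REDACTED_" + category + "}}"
        value, n = pattern.subn(placeholder, value)
        if n:
            counts[category] += n
    return value, counts


def redact_form(fields: dict[str, str]) -> tuple[dict[str, str], dict[str, int]]:
    """Redact every field value; return the redacted fields plus aggregate category counts."""
    totals: Counter = Counter()
    redacted: dict[str, str] = {}
    for label, value in fields.items():
        redacted[label], counts = redact(value)
        totals.update(counts)
    return redacted, dict(totals)
```

Calling redact_form({"Name": "Alice Tan", "Email": "alice@example.com", "Phone": "(415) 555-0123"}) (a made-up address) leaves the name unchanged, turns the email and phone into placeholders, and returns the counts {"EMAIL": 1, "PHONE": 1}. Only the redacted fields and the counts would be sent onward.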

What stays local

The connector runs a Privacy Shield over every snapshot before it’s sent. The Shield is a regex-and-rule pack that matches and replaces:
  • Email addresses
  • Credit-card numbers (with Luhn verification + safe-list for known test PANs)
  • Phone numbers (US, E.164, AU national, and permissive variants)
  • Postal addresses
  • API keys for major providers (OpenAI, Anthropic, Google, etc. — to prevent customer credentials leaking through agent screenshots)
  • Tax / government ID numbers (TFN, SSN-shaped)
  • IP addresses
  • Several other categories — see the connector’s pii-ruleset for the full list
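
The Luhn verification on credit-card numbers is a standard checksum that lets a card rule skip digit runs that merely look like card numbers. The sketch below assumes a simple 13 to 19 digit candidate regex (hypothetical, not the connector’s actual rule), and the test-PAN safe-list is not modelled here.

```python
import re

# Hypothetical candidate pattern: 13-19 digits, optionally separated by spaces or dashes.
CC_CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(candidate: str) -> bool:
    """Return True if the digits in `candidate` pass the Luhn checksum."""
    digits = [int(c) for c in candidate if c.isdigit()]
    if not 13 <= len(digits) <= 19:
        return False
    total = 0
    # Walk right to left, doubling every second digit and folding results above 9.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def redact_cards(text: str) -> str:
    """Replace only candidates that pass Luhn; other long digit runs are left alone."""
    def replace(match: re.Match) -> str:
        return "{{REDACTED_CC}}" if luhn_valid(match.group()) else match.group()
    return CC_CANDIDATE.sub(replace, text)
```

With this sketch, redact_cards("Card: 4111 1111 1111 1111") returns "Card: {{REDACTED_CC}}", while changing the last digit to 2 makes the checksum fail and the text is returned unchanged.
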
Each match becomes a category-level placeholder such as {{REDACTED_EMAIL}} or {{REDACTED_CC}}: double curly braces around the category name in caps. The placeholder is what the cloud sees; the original value never leaves your machine.

URLs are scanned and redacted in the same pass. Path segments that look like IDs, tokens, or session keys are replaced before the URL crosses the network boundary.
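
The URL pass can be pictured as splitting the path and testing each segment. This sketch rests on assumptions: the heuristics in ID_LIKE (long numeric runs, UUIDs, long digit-bearing tokens) and the {{REDACTED_ID}} placeholder are hypothetical, and only the path is rewritten. The query string and fragment are passed through untouched here.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Hypothetical heuristics for "looks like an ID, token, or session key".
ID_LIKE = [
    re.compile(r"\d{4,}"),  # long numeric IDs
    re.compile(r"[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}"),  # UUIDs
    re.compile(r"(?=.*\d)[A-Za-z0-9_-]{24,}"),  # long opaque tokens containing a digit
]


def redact_url(url: str, placeholder: str = "{{REDACTED_ID}}") -> str:
    """Replace ID-like path segments; scheme, host, query and fragment are kept as-is."""
    parts = urlsplit(url)
    segments = [
        placeholder if any(p.fullmatch(seg) for p in ID_LIKE) else seg
        for seg in parts.path.split("/")
    ]
    return urlunsplit((parts.scheme, parts.netloc, "/".join(segments),
                       parts.query, parts.fragment))
```

For example, https://shop.example.com/orders/84213377/receipt becomes https://shop.example.com/orders/{{REDACTED_ID}}/receipt. The digit requirement on the token heuristic keeps long human-readable slugs such as how-to-configure-your-account-settings from being redacted.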

What reaches the cloud

For each snapshot, the cloud receives:
  • The redacted YAML representation of the page
  • The redacted URL
  • An audit signal (client_redacted: true) attesting that the Shield ran
  • Category-level counts of what was redacted (e.g. { email: 2, CC_GENERIC: 1 }) — counts only, never values
  • Standard request metadata (your API key id, session id, task id, step number)
If the audit signal is missing or false, the cloud rejects the request with a 400 privacy_shield_missing error. This is by design. A misconfigured connector (say, a client integrator forgot to wire in the Shield, or a build accidentally stripped it) fails hard on the very first request instead of silently shipping PII to the cloud. Loud failure on misconfiguration is the entire point: the system is designed so that the unsafe path cannot be taken by accident.
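
On the cloud side, this check amounts to a fail-closed guard. The sketch below assumes the request body has already been parsed into a dict. The client_redacted field and the privacy_shield_missing code come from this page, while the function name and the {"error": ...} response shape are hypothetical.

```python
from http import HTTPStatus
from typing import Optional, Tuple


def check_privacy_shield(payload: dict) -> Optional[Tuple[int, dict]]:
    """Return a 400 error response unless client_redacted is exactly True; otherwise None."""
    # Fail closed: a missing key, False, or a truthy non-bool such as "true" all count as absent.
    if payload.get("client_redacted") is not True:
        return int(HTTPStatus.BAD_REQUEST), {"error": "privacy_shield_missing"}
    return None
```

Comparing with `is not True` rather than a plain truthiness test is a design choice of this sketch. A string "true" or the integer 1 is rejected rather than trusted.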

What’s persisted

The cloud persists metadata only. Specifically:
  • One UsageEvent per snapshot, retained for 90 days. Includes session/task/step IDs, compression numbers (input/output chars, codec time), redaction category counts, and the redacted URL. No snapshot bodies. No compressed output. No PII.
  • API key metadata (your public api_key_id, key status, configured TTL overrides). The public half of your key is stored as-is; the secret half is never persisted, only its hash.
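
Storing only a hash of the secret half is a common pattern for API keys. The sketch below assumes a hypothetical key format of <api_key_id>.<secret> and uses SHA-256 with a constant-time comparison. The real key format and hashing scheme are not described on this page.

```python
import hashlib
import hmac
import secrets


def issue_key() -> tuple[str, dict]:
    """Mint a key; return the full key (shown to the customer once) and the record to persist."""
    api_key_id = "key_" + secrets.token_hex(8)  # public half (hypothetical format)
    secret = secrets.token_urlsafe(32)  # secret half; never stored
    record = {
        "api_key_id": api_key_id,
        "secret_sha256": hashlib.sha256(secret.encode()).hexdigest(),
    }
    return f"{api_key_id}.{secret}", record


def verify_key(presented: str, record: dict) -> bool:
    """Check a presented key against the stored record without ever storing the secret."""
    api_key_id, _, secret = presented.partition(".")
    if api_key_id != record["api_key_id"]:
        return False
    digest = hashlib.sha256(secret.encode()).hexdigest()
    return hmac.compare_digest(digest, record["secret_sha256"])
```

An unsalted fast hash is reasonable here only because the secret is long and randomly generated. It would not be appropriate for user-chosen passwords.
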
Not persisted:
  • Raw snapshot YAML
  • Compressed output
  • Per-step DOM content
  • In-memory session state (cleared when its TTL expires or the process restarts)
  • Any PII (it never reached the cloud)
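
A metadata-only event record and the 90-day window might look like the sketch below. The field names are hypothetical stand-ins for the items listed above. The type has no field that could hold snapshot YAML or compressed output, so neither can be persisted through it by accident.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

RETENTION = timedelta(days=90)


@dataclass
class UsageEvent:
    """Metadata only: there is deliberately no field for snapshot bodies or codec output."""
    session_id: str
    task_id: str
    step: int
    input_chars: int
    output_chars: int
    codec_ms: float
    redaction_counts: dict[str, int]  # e.g. {"EMAIL": 1}; counts, never values
    redacted_url: str
    created_at: datetime


def purge_expired(events: list[UsageEvent], now: Optional[datetime] = None) -> list[UsageEvent]:
    """Keep only events younger than the retention window."""
    now = now or datetime.now(timezone.utc)
    return [e for e in events if now - e.created_at < RETENTION]
```
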

What logs see

Logs include api_key_id, request IDs, session/task/step IDs, compression numbers, and error codes. They do not include snapshot bodies, compressed output, redacted values, or the secret half of any API key. Error responses follow the same rule. Error messages are category-level — they never echo any portion of the submitted snapshot.
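
One way to keep logs to this shape is an allow-list. Serialise only fields known to be safe and drop everything else, so a new field (or an accidentally passed snapshot body) stays out of the logs by default. In this minimal sketch, LOGGABLE_FIELDS and log_line are hypothetical names, and the field set mirrors the list above.

```python
import json

# Hypothetical allow-list mirroring the fields this page says logs may contain.
LOGGABLE_FIELDS = frozenset({
    "api_key_id", "request_id", "session_id", "task_id", "step",
    "input_chars", "output_chars", "codec_ms", "error_code",
})


def log_line(**fields: object) -> str:
    """Serialise only allow-listed fields; snapshot bodies, outputs and secrets are dropped."""
    safe = {k: v for k, v in fields.items() if k in LOGGABLE_FIELDS}
    return json.dumps(safe, sort_keys=True)
```

The returned string can be handed to any logger, for example `logging.getLogger(__name__).info(log_line(api_key_id=..., step=...))`. An allow-list fails closed, whereas a deny-list would leak any sensitive field nobody thought to add to it.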

Audit trail

This posture is enforced, not just documented:
  • Privacy Shield is mandatory. The connector cannot ship a snapshot without it. The cloud refuses requests that don’t carry the audit signal.
  • A grep-gate runs on every connector commit to prevent codec-internal vocabulary from leaking into customer-visible code.
  • A weekly audit rescans every published artefact (npm, PyPI, public source repo) for the same patterns, catching anything that slips past the local gate.
  • The connector source is public at github.com/jdcodec/connector. You can verify the redaction logic yourself.
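
A grep-gate of this kind is essentially a scanner that fails the commit when a forbidden term appears. The sketch below uses placeholder terms, since the actual codec-internal vocabulary is deliberately not published. scan_text, scan_tree and main are hypothetical names. A pre-commit hook could call main, and the same scanner could in principle be pointed at an unpacked npm or PyPI artefact.

```python
import re
import sys
from pathlib import Path

# Placeholder vocabulary for illustration; the real forbidden-term list is not published.
FORBIDDEN = re.compile(r"\b(?:internal_term_a|internal_term_b)\b", re.IGNORECASE)


def scan_text(text: str) -> list[tuple[int, str]]:
    """Return (line number, matched term) for every forbidden term in `text`."""
    return [
        (lineno, m.group())
        for lineno, line in enumerate(text.splitlines(), start=1)
        for m in FORBIDDEN.finditer(line)
    ]


def scan_tree(root: Path) -> list[tuple[Path, int, str]]:
    """Scan every file under `root`, dropping bytes that are not valid UTF-8."""
    hits = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        text = path.read_text(encoding="utf-8", errors="ignore")
        hits.extend((path, lineno, term) for lineno, term in scan_text(text))
    return hits


def main(argv: list[str]) -> int:
    """Print each hit and return a non-zero exit code if anything matched."""
    hits = scan_tree(Path(argv[0]) if argv else Path("."))
    for path, lineno, term in hits:
        print(f"{path}:{lineno}: forbidden term {term!r}", file=sys.stderr)
    return 1 if hits else 0
```

When wired up as `sys.exit(main(sys.argv[1:]))`, any hit produces a non-zero exit, which is what lets a hook or CI job block the change.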

What’s not yet covered

  • Server-side secondary check — a future feature where the cloud sniffs for PII patterns as a defence-in-depth layer (in addition to refusing requests without the audit signal). Reserved in the API contract as privacy_shield_violation; not enforced today.
  • Customer-controlled retention — the 90-day window is a global default. Per-customer override is on the roadmap.
  • Custom redaction rules — the Shield’s rule pack is shared across all customers today. Allow-listing custom domains or adding custom redaction rules is on the roadmap.
If your use case has a privacy posture not covered here, reach out at [email protected].