JD Codec’s privacy posture is built around one constraint: PII never leaves the customer’s machine by default. The codec lives in the cloud. The redaction layer lives on-device. The two are designed so the cloud cannot see PII even by accident. There is one deliberate, opt-in exception — turning the Shield off — covered at the end of this page.

A concrete example

Imagine your agent is helping a customer sign up for a service. The form asks for name, email, phone, date of birth, and credit card. The agent reads the page. The connector — running on your machine — scans every value before anything leaves the machine. Some values match a known PII pattern and get replaced with a category placeholder; others stay as-is:
On the page:                          What leaves the machine:
  Name:    Alice Tan                    Name:    Alice Tan          ← unchanged (see below)
  Email:   [email protected]            Email:   {{REDACTED_EMAIL}}
  Phone:   (415) 555-0123               Phone:   {{REDACTED_PHONE}}
  DOB:     born 1985                    DOB:     born {{REDACTED_DOB}}
  Card:    4111 1111 1111 1111          Card:    {{REDACTED_CC}}

What the cloud sees:
  page structure (forms, labels, layout) ✓
  unchanged values like "Alice Tan"      ✓
  redacted placeholders                  ✓
  category counts: { EMAIL: 1, PHONE: 1, DOB: 1, CC: 1 }   ✓
  raw redacted values                    ✗ (never)
The agent still gets a usable representation back from the codec: it knows there’s an email field, that the field holds something email-shaped, and what to do next. The cloud still compresses the snapshot effectively. The pattern-matched values never cross the network boundary.

About the unchanged fields. The Privacy Shield matches structured patterns: values with a predictable shape such as emails, phone numbers, credit cards, dates of birth, addresses, API keys, tax IDs, and IP addresses. It does not match arbitrary names, free-text comments, or any value without a regex-detectable shape. If your application processes data that doesn’t fit a known pattern but still needs to be kept private, talk to support. Custom redaction rules are on the roadmap.
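To make the pattern-matching idea concrete, here is a minimal sketch of category-level redaction. It is not the connector's actual rule pack: the two regexes, the `PATTERNS` table, and the `redact` function are simplified assumptions chosen for illustration. Only the placeholder format and the idea of counting matches per category come from this page.

```python
import re
from collections import Counter

# Illustrative patterns only (assumption); the real rule pack covers
# many more categories and edge cases.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\(\d{3}\)\s?\d{3}-\d{4}"),
}


def redact(text: str) -> tuple[str, dict[str, int]]:
    """Replace each pattern match with a category placeholder.

    Returns the redacted text plus per-category counts. The counts
    contain no raw values.
    """
    counts: Counter = Counter()
    for category, pattern in PATTERNS.items():
        def replace(match: re.Match, category: str = category) -> str:
            counts[category] += 1
            return "{{REDACTED_" + category + "}}"

        text = pattern.sub(replace, text)
    return text, dict(counts)
```

A name such as "Alice Tan" passes through unchanged in this sketch because no pattern describes its shape, which mirrors the behaviour described above.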

What stays local

The connector runs a Privacy Shield over every snapshot before it’s sent. The Shield is a regex-and-rule pack that matches and replaces:
  • Email addresses
  • Credit-card numbers (with Luhn verification + safe-list for known test PANs)
  • Phone numbers (US, E.164, AU national, and permissive variants)
  • Postal addresses
  • API keys for major providers (OpenAI, Anthropic, Google, etc. — to prevent customer credentials leaking through agent screenshots)
  • Tax / government ID numbers (TFN, SSN-shaped)
  • IP addresses
  • Several other categories — see the connector’s pii-ruleset for the full list
Each match becomes a category-level placeholder like {{REDACTED_EMAIL}} or {{REDACTED_CC}}: double curly braces, category name in caps. The placeholder is what the cloud sees. The original value never leaves your machine.

URLs are scanned and redacted on the same pass. Path segments that look like IDs, tokens, or session keys get replaced before the URL crosses the network boundary.
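The credit-card rule above mentions Luhn verification. The sketch below shows how a Luhn check can keep a card rule from redacting arbitrary 16-digit numbers. The candidate regex and the function names are assumptions for illustration, and the safe-list for known test PANs is left out because this page does not specify how it behaves.

```python
import re

# Candidate matcher (assumption): 13 to 19 digits, optionally separated
# by single spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(digits: str) -> bool:
    """Standard Luhn checksum over a string of decimal digits."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def redact_cards(text: str) -> str:
    """Replace only those candidates that pass the Luhn check."""
    def replace(match: re.Match) -> str:
        digits = re.sub(r"\D", "", match.group())
        return "{{REDACTED_CC}}" if luhn_valid(digits) else match.group()

    return CARD_CANDIDATE.sub(replace, text)
```

Checking the checksum before replacing reduces false positives on order numbers and similar long digit runs, at the cost of missing mistyped card numbers.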

What reaches the cloud

For each snapshot, the cloud receives:
  • The redacted YAML representation of the page
  • The redacted URL
  • An audit signal (client_redacted: true) proving the Shield ran
  • Category-level counts of what was redacted (e.g. { email: 2, CC_GENERIC: 1 }) — counts only, never values
  • Standard request metadata (your API key id, session id, task id, step number)
If the audit signal is missing or false, the cloud refuses the request with 400 privacy_shield_missing. This is by design. A misconfigured connector — say, a client integrator forgetting to wire the Shield, or a build that accidentally stripped it — gets a hard failure on the very first request rather than silently shipping PII to the cloud. Loud failure on misconfiguration is the entire point: the system is designed so the unsafe path is impossible to take by accident.
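The refusal rule can be pictured as a guard that runs before any other processing. This is a sketch under stated assumptions, not the service's code: the function name and the response body shape are hypothetical, and only the `client_redacted` field, the 400 status, and the `privacy_shield_missing` code come from this page.

```python
def check_audit_signal(request_body: dict) -> tuple[int, dict]:
    """Return (status, body); reject anything without client_redacted: true.

    The response shape here is an assumption for illustration.
    """
    # A missing key, False, or any non-boolean value is treated as "Shield did not run".
    if request_body.get("client_redacted") is not True:
        return 400, {"error": "privacy_shield_missing"}
    return 200, {}
```

Comparing with `is not True` rather than truthiness means a string such as "false" or a stray `1` cannot satisfy the check by accident.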

What’s persisted

The cloud persists metadata only. Specifically:
  • One UsageEvent per snapshot, retained for 90 days. Includes session/task/step IDs, compression numbers (input/output chars, codec time), redaction category counts, and the redacted URL. No snapshot bodies. No compressed output. No PII.
  • API key metadata (your public api_key_id, key status, configured TTL overrides). Only the public half is stored; the secret half of your key is never persisted — only its hash.
Not persisted:
  • Raw snapshot YAML
  • Compressed output
  • Per-step DOM content
  • In-memory session state (cleared on TTL or process restart)
  • Any PII (it never reached the cloud)
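As a rough picture of what a metadata-only record might look like, the sketch below models a usage event and its 90-day retention window. Every field name is a hypothetical stand-in derived from the description above; the actual schema is not published on this page.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=90)  # global default stated on this page


@dataclass(frozen=True)
class UsageEvent:
    # Field names are assumptions for illustration, not the real schema.
    session_id: str
    task_id: str
    step: int
    input_chars: int
    output_chars: int
    codec_ms: float
    redaction_counts: dict  # category counts only, never raw values
    redacted_url: str
    created_at: datetime


def is_expired(event: UsageEvent, now: datetime) -> bool:
    """True once the event is older than the retention window."""
    return now - event.created_at > RETENTION
```

Note that the record has no field for snapshot bodies or compressed output, so there is nowhere for page content to be stored even by mistake.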

What logs see

Logs include api_key_id, request IDs, session/task/step IDs, compression numbers, and error codes. They do not include snapshot bodies, compressed output, redacted values, or the secret half of any API key. Error responses follow the same rule. Error messages are category-level — they never echo any portion of the submitted snapshot.

Audit trail

This posture is enforced, not just documented:
  • Privacy Shield is on by default and the connector cannot bypass it on its own. The cloud refuses requests that don’t carry the audit signal. The one exception, turning the Shield off, requires both a key we provisioned for it and a deliberate two-step opt-in in your connector.
  • A grep-gate runs on every connector commit to prevent codec-internal vocabulary from leaking into customer-visible code.
  • A weekly audit rescans every published artefact (npm, PyPI, public source repo) for the same patterns — catches anything that slips past the local gate.
  • The connector source is public at github.com/jdcodec/connector. You can verify the redaction logic yourself.
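For readers unfamiliar with the term, a grep-gate is a check that fails a commit when any file contains a forbidden term. The sketch below shows the general shape only. The function name, the in-memory `files` mapping, and the example terms in the usage note are hypothetical; the real gate's term list and tooling are not described here.

```python
import re


def find_violations(files: dict[str, str], forbidden: list[str]) -> list[tuple[str, int, str]]:
    """Return (path, line number, matched term) for every forbidden term found.

    `files` maps a path to its text content (an assumption made so the
    sketch stays self-contained).
    """
    if not forbidden:
        return []
    pattern = re.compile("|".join(re.escape(term) for term in forbidden), re.IGNORECASE)
    hits = []
    for path, content in files.items():
        for lineno, line in enumerate(content.splitlines(), start=1):
            match = pattern.search(line)
            if match:
                hits.append((path, lineno, match.group()))
    return hits
```

A commit hook or CI step could call this with the changed files and exit non-zero when the returned list is non-empty.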

What’s not yet covered

  • Server-side secondary check — a future feature where the cloud sniffs for PII patterns as a defence-in-depth layer (in addition to refusing requests without the audit signal). Reserved in the API contract as privacy_shield_violation; not enforced today.
  • Customer-controlled retention — the 90-day window is a global default. Per-customer override is on the roadmap.
  • Custom redaction rules — the Shield’s rule pack is shared across all customers today. Allow-listing custom domains or adding custom redaction rules is on the roadmap.

Turning the Shield off

By default the Shield runs on every snapshot, and that is the right setting for almost everyone. Some customers, though, value JD Codec purely for compression and have their own data boundary: they run their own model endpoint, hold consent for the data they process, or are building a dataset where the real values are the point. For those cases the Shield can be turned off, so snapshots are sent unredacted. This is opt-in and deliberately hard to switch on by accident. Two independent things must both be true:
  1. Your key must be provisioned for it. Bypass is enabled per key, on our side. There is no self-service toggle — contact support to enable it for your key.
  2. Your connector must opt in, in two steps. Set both:
    JDC_PRIVACY_SHIELD=off
    JDC_PRIVACY_SHIELD_BYPASS_ACK=1
    
    (or the ~/.jdcodec/config.json equivalents: "privacy_shield": "off", "privacy_shield_bypass_ack": true). The off-switch alone does nothing — without the acknowledgement the connector keeps the Shield on and logs a warning that both are required.
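The opt-in logic described above can be sketched as a single decision function. This is an illustration under assumptions, not the connector's implementation: the function name, the logger name, and the `key_allows_bypass` parameter are hypothetical, and how the connector learns whether a key is provisioned for bypass is not specified on this page. Only the two environment variable names and their values come from the text.

```python
import logging

log = logging.getLogger("connector")  # hypothetical logger name


def shield_enabled(env: dict, key_allows_bypass: bool) -> bool:
    """Return True when the Shield should run.

    The Shield is skipped only if the off-switch, the acknowledgement,
    and the per-key provisioning are all present.
    """
    wants_off = env.get("JDC_PRIVACY_SHIELD", "").strip().lower() == "off"
    acked = env.get("JDC_PRIVACY_SHIELD_BYPASS_ACK", "").strip() == "1"

    if not wants_off:
        return True
    if not acked:
        # The off-switch alone does nothing.
        log.warning("JDC_PRIVACY_SHIELD=off ignored: JDC_PRIVACY_SHIELD_BYPASS_ACK=1 is also required")
        return True
    if not key_allows_bypass:
        return True
    log.warning("Privacy Shield bypass active: snapshot will be sent unredacted")
    return False
```

Passing `os.environ` as `env` would wire this to the real environment. Every path that lacks one of the three conditions falls back to the shielded default.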
When bypass is active you’ll see a warning line in your connector logs on every snapshot. That’s intentional: an unredacted send should never be silent.

What you’re accepting, stated plainly: unredacted page content and URLs reach the cloud, and from there your downstream LLM. The Shield does not run at all; bypass is all-or-nothing, not per-field.

What does not change: we still never persist your snapshot bodies, the default stays fully shielded for every other key, and every bypassed request is audit-tagged on our side.

See Turning the Privacy Shield off for the full walkthrough, including how to turn the Shield back on and what we do and don’t retain under bypass. If your use case has a privacy posture not covered here, reach out at [email protected].