Analysis · 8 min read

A Look at OpenAI's Privacy Filter — And the V4 and V7 Taxonomies Hidden in the Code

A review of OpenAI's newly released Privacy Filter model — its architecture, its entity taxonomy, and the unreleased V4 and V7 label spaces left in the source code.

On April 22nd, OpenAI released Privacy Filter under the Apache 2.0 license — a 1.5-billion-parameter (50M active) bidirectional token classifier for PII detection, with a 128,000-token context window and a CLI tool called opf.

We build PII Eraser, so more open work in this space is something we’re genuinely happy to see. The release is thoughtful, the model card is quite detailed, and the architectural choices are interesting enough to be worth walking through.

This post covers four things: what’s clever about the architecture, the entity taxonomy (including a discovery in the source code that nobody seems to have picked up on yet), the accuracy story, and what deployment actually looks like today.

Architecture at a Glance

Similar to how many embedding and diffusion models are created today, Privacy Filter starts from a gpt-oss-style autoregressive checkpoint and converts it — post-training — into a BERT-style bidirectional token classifier. The causal mask is relaxed to bidirectional within a banded window (effective attention span of 257 tokens), the language-model head is replaced with a classification head over BIOES-tagged privacy labels, and the model is fine-tuned with a supervised token-classification loss.

At inference time, rather than taking a naive argmax per token, a constrained Viterbi decoder enforces valid BIOES transitions and scores complete label paths. Users can tune decoding parameters at runtime to trade precision for recall — a nice touch that avoids the usual “one fixed operating point” limitation.
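As a concrete illustration, here is a minimal sketch of constrained BIOES decoding. The tag names and scores are invented, and the real decoder presumably operates on the model's logits rather than toy dictionaries; the point is only the mechanism, where invalid transitions are pruned so a high-scoring but malformed path can never win:

```python
def is_valid_transition(prev: str, nxt: str) -> bool:
    """BIOES transition rules: after O/E/S an entity may start (B/S) or stay
    outside (O); after B/I the same entity must continue (I/E, same type)."""
    if prev == "O" or prev[0] in ("E", "S"):
        return nxt == "O" or nxt[0] in ("B", "S")
    return nxt[0] in ("I", "E") and nxt[2:] == prev[2:]

def viterbi_decode(token_scores):
    """token_scores: list of {tag: log_prob} per token.
    Returns the highest-scoring label path that is valid under BIOES."""
    tags = list(token_scores[0])
    # A sequence may only start with O, B-*, or S-*.
    dp = {t: (token_scores[0][t], [t]) for t in tags
          if t == "O" or t[0] in ("B", "S")}
    for scores in token_scores[1:]:
        nxt_dp = {}
        for t in tags:
            candidates = [(s + scores[t], path + [t])
                          for p, (s, path) in dp.items()
                          if is_valid_transition(p, t)]
            if candidates:
                nxt_dp[t] = max(candidates, key=lambda c: c[0])
        dp = nxt_dp
    # A sequence may only end with O, E-*, or S-*.
    finals = {t: v for t, v in dp.items() if t == "O" or t[0] in ("E", "S")}
    return max(finals.values(), key=lambda v: v[0])[1]
```

With two tokens whose per-token argmax would be the invalid pair (B-per, O), the decoder instead returns the valid path (B-per, E-per), which is exactly the failure mode path-level scoring exists to prevent.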

The 128K context window allows long inputs in a single forward pass, which is great for high-throughput use cases. The caveat is that the model has only 8 transformer layers and a 257-token attention window. That’s an efficient design, but it also means long-range reasoning across a 128K input is limited in practice — the model can see all of it, but it cannot deeply integrate signals that are far apart.
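The release doesn't spell out how the banded mask is wired internally, but its shape is easy to sketch (pure Python for clarity; a half-window of 128 yields the 257-token effective span):

```python
def banded_bidirectional_mask(seq_len: int, half_window: int = 128):
    """Boolean attention mask: position i may attend to position j
    iff |i - j| <= half_window, in both directions (no causal constraint).
    Effective span per token: 2 * half_window + 1 tokens."""
    return [[abs(i - j) <= half_window for j in range(seq_len)]
            for i in range(seq_len)]
```

A token in the middle of a 512-token input sees exactly 257 positions; tokens near the edges see fewer. Stacking 8 layers widens the receptive field somewhat, but signals 128K tokens apart still never interact directly, which is the limitation described above.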

Another architectural detail worth noting is that the model does not currently support detecting nested entities (for example, identifying a private_person nested inside a private_email).

The Entity Taxonomy — and a Gem Hidden in the Code

The shipped model uses a V2 taxonomy of 8 categories.

  • private_person: The name of a private person, including usernames and handles that identify a specific person.
  • account_number: A credit card number, bank account number, or other account identifier.
  • private_url: A web URL or IP address that is meant for a private audience or identifies a private person.
  • private_email: An email address used for personal communication or that identifies a private person.
  • private_phone: A phone number associated with a private person.
  • private_address: A specific location or address associated with a private person.
  • secret: An API key, password, or other credential.
  • private_date: The date of birth, birth year, or other datetime that identifies a private person.

A few observations on V2:

  • There are no dedicated categories for government IDs, passport numbers, or tax numbers — these are all collapsed into account_number.
  • There is no coverage for organization-level sensitive data: company names, counterparties, deal values, or project codenames. Privacy Filter is firmly oriented towards personal privacy, not confidential business information.
  • The model is trained to skip placeholder entities like “John Doe” or example API keys and to skip public information. Distinguishing real from public information is a useful capability — but it’s also a real-world minefield. Is an employee’s name private? What about a lawyer’s name in a contract? Context is often ambiguous, and missing a real identifier because it looked placeholder-shaped is a worse failure mode than the reverse.

V4 and V7: Taxonomies OpenAI Left in the Source

Buried in opf/_common/label_space.py, there are two additional taxonomies defined alongside V2 — named V4 and V7 — that appear to be unreleased checkpoints or internal roadmap artifacts. To our knowledge, nobody has highlighted this yet.

DEFAULT_CATEGORY_VERSION: Final[str] = "v2"
"""Default category taxonomy used when a checkpoint provides no label-space hint."""

SPAN_CLASS_NAMES_BY_CATEGORY_VERSION: Final[dict[str, tuple[str, ...]]] = {
    "v2": (
        BACKGROUND_CLASS_LABEL,
        "account_number",
        "private_address",
        "private_date",
        "private_email",
        "private_person",
        "private_phone",
        "private_url",
        "secret",
    ),
    "v4": (
        BACKGROUND_CLASS_LABEL,
        "private_person",
        "other_person",
        "personal_url",
        "other_url",
        "personal_location",
        "other_location",
        "personal_email",
        "other_email",
        "personal_phone",
        "other_phone",
        "personal_date",
        "other_date",
        "personal_id",
        "secret",
    ),
    "v7": (
        BACKGROUND_CLASS_LABEL,
        "personal_name",
        "personal_handle",
        "other_person",
        "personal_email",
        "other_email",
        "personal_phone",
        "other_phone",
        "personal_location",
        "other_location",
        "personal_url",
        "other_url",
        "personal_org",
        "personal_gov_id",
        "personal_fin_id",
        "personal_health_id",
        "personal_device_id",
        "personal_vehicle_id",
        "personal_property_id",
        "personal_edu_id",
        "personal_emp_id",
        "personal_membership_id",
        "personal_registry_id",
        "personal_date",
        "secret",
        "secret_url",
    ),
}
"""Span-level label taxonomy for each supported category version."""

V4 introduces a systematic personal_ vs other_ distinction across most entity types — an elegant way to formalise the “private vs public” framing the model card alludes to in its limitations section.
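If V4 checkpoints ever ship, downstream consumers that only understand V2 will need a collapse step. The mapping below is our guess, not anything OpenAI has published; note that the other_* classes have no V2 home at all and fall back to background:

```python
# Hypothetical V4 -> V2 collapse for consumers pinned to the shipped taxonomy.
# This mapping is an assumption inferred from the label names alone.
V4_TO_V2 = {
    "private_person": "private_person",
    "personal_url": "private_url",
    "personal_location": "private_address",
    "personal_email": "private_email",
    "personal_phone": "private_phone",
    "personal_date": "private_date",
    "personal_id": "account_number",
    "secret": "secret",
}

def collapse_to_v2(label):
    """Map a V4 label to its closest V2 label, or None (treat as background)
    for the other_* classes, which V2 cannot express."""
    return V4_TO_V2.get(label)
```

The lossy part is instructive: a V2-only consumer would silently discard every "public" entity V4 distinguishes, which is arguably the whole point of the V4 redesign.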

V7 is much more granular: 26 classes that split personal identifiers into personal_gov_id, personal_fin_id, personal_health_id, personal_vehicle_id, personal_membership_id, personal_edu_id, personal_emp_id, personal_property_id, personal_registry_id, and more. A secret_url class is also introduced — a reasonable addition given how frequently credentials are embedded in URLs.

It’s worth noting that even V7 remains firmly personal-identifier-oriented. There’s a personal_org category, but no category for organization-level confidential information: company identifiers, deal values, customer names used in a B2B context. The roadmap, as visible in code, does not appear to extend into that space.

We have no idea whether V4 or V7 checkpoints will be released — but seeing them sitting there in the source is a useful signal about where OpenAI sees this work going.

Training Data and Accuracy

Privacy Filter is trained on a mix of public data and internally generated synthetic data. For public datasets with missing ground truth, OpenAI used a GPT-5-family model with a 2×2 annotation protocol (two prompt formats × two reasoning settings).
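The model card doesn't say how the four annotation runs are reconciled; one plausible aggregation, sketched here purely as an assumption, is a span-level majority vote across the 2×2 grid:

```python
from collections import Counter

def majority_vote(runs, threshold=3):
    """runs: four sets of (start, end, label) spans, one per
    (prompt format, reasoning setting) combination. Keep only spans
    that at least `threshold` of the runs agree on exactly."""
    counts = Counter(span for run in runs for span in run)
    return {span for span, n in counts.items() if n >= threshold}
```

Exact-span agreement is a strict criterion; a real pipeline would likely also merge near-identical boundaries before voting.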

Synthetic Data Quality Problems

The model card highlights a common problem when building PII detection models: synthetic-data quality. In the PII-Masking-300k evaluation, OpenAI found enough mislabelled examples that they ran a conservative rule-based cleanup pass and a reasoning-model adjudication step before reporting “corrected” metrics. The corrected token-level F1 on PII-Masking-300k is a strong 0.974, but the dataset lacks the nuances of real-world data, such as those OpenAI describe in their adversarial testing section.

Stress Testing Findings

A couple of findings from the stress-testing section are worth flagging.

Clue Position Matters

In Table 5 of the model card, performance on “PII + Clue” (where the contextual clue follows the entity) is meaningfully worse than on “Clue + PII”: overall recall of 0.705 vs 0.863. This is a common weakness of PII detection models, and it is likely exacerbated by autoregressive pretraining — left-to-right priors don’t fully disappear just because the attention mask is relaxed during post-training.

Real-World Formatting Hurts Performance

Table 11 shows some substantial performance drops on adversarial formats commonly found in real-world data:

  • Phonetic alphabet (“charlie oscar lima…”): precision drops to 0.273.
  • Line breaks in URLs and addresses: precision of 0.453, with URLs frequently misclassified as private_person after being split across lines.
  • Digit-words (“two six eight…”): precision 0.715, recall 0.674.
  • Spacing variants: precision 0.699.

The practical implication here is important: Privacy Filter in its current form is not a great fit for ASR transcripts and scanned documents, where digit-word spellings and non-standard spacing are routine. It’s also going to struggle on layout-heavy documents where entities get split across lines.

These are not exotic attack vectors — they’re the kind of formatting artefacts you find in real enterprise data.
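Until the model handles these formats natively, a pragmatic mitigation is to normalise transcript-style text before detection. This is our own sketch, not part of the release, and the word lists are deliberately minimal:

```python
import re

# Illustrative word lists only. A whole-word rewrite like this is blind to
# context: "India" the country would also be collapsed to "i".
DIGIT_WORDS = {"zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
               "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9"}
PHONETIC = {"alfa": "a", "alpha": "a", "bravo": "b", "charlie": "c", "delta": "d",
            "echo": "e", "foxtrot": "f", "golf": "g", "hotel": "h", "india": "i",
            "juliett": "j", "kilo": "k", "lima": "l", "mike": "m", "november": "n",
            "oscar": "o", "papa": "p", "quebec": "q", "romeo": "r", "sierra": "s",
            "tango": "t", "uniform": "u", "victor": "v", "whiskey": "w",
            "xray": "x", "yankee": "y", "zulu": "z"}

def normalise(text: str) -> str:
    """Rewrite digit-words and phonetic-alphabet words to their literal forms,
    so a downstream detector sees '2 6 8' instead of 'two six eight'."""
    def repl(match):
        word = match.group(0).lower()
        return DIGIT_WORDS.get(word, PHONETIC.get(word, match.group(0)))
    return re.sub(r"[A-Za-z]+", repl, text)
```

A production version would need span-offset bookkeeping so that entities detected in the normalised text can be mapped back to positions in the original.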

Deployment: What Actually Ships

What ships today is the model weights and a CLI utility:

$ opf "Ben Morgan lives at 12 3rd St. Call him at 123 456 7890."
<PRIVATE_PERSON> lives at <PRIVATE_ADDRESS>. Call him at <PRIVATE_PHONE>.

There’s no containerized REST API, no HTTP server mode, and no batch support yet (though there is an open GitHub issue tracking the latter).
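In the meantime, a thin wrapper can fake batching. This sketch assumes opf is on PATH and behaves like the example above (text in as a positional argument, redacted text out on stdout); it pays start-up cost on every invocation, so it is a stopgap, not a throughput solution:

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

def redact(text, cmd=("opf",)):
    """Run one document through a redaction CLI and return its stdout.
    The cmd tuple is parameterised so the wrapper can be tested without opf."""
    result = subprocess.run([*cmd, text], capture_output=True, text=True, check=True)
    return result.stdout.strip()

def redact_batch(texts, cmd=("opf",), workers=4):
    """Naive batching: one CLI invocation per document, a few in flight at
    once. Order of results matches the order of inputs."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda t: redact(t, cmd), texts))
```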

Crucially, Privacy Filter only accepts raw text strings. There is no native support for structured conversational formats like OpenAI chats. If you want to use this to build a PII guardrail for LLM agents, you will need to process messages individually, which hurts accuracy.
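The per-message loop itself is trivial to write; the point is what it costs. In this sketch, redact stands in for whatever text-level filter you use, and each message is processed blind to the others:

```python
def redact_chat(messages, redact):
    """Apply a text-level redaction function to each message of an
    OpenAI-style chat transcript ([{"role": ..., "content": ...}, ...]).
    Because each message is filtered in isolation, a contextual clue in
    one message cannot help the model resolve an entity in another."""
    return [{**m, "content": redact(m["content"])} for m in messages]
```

Given the clue-position findings above, this isolation is not a cosmetic limitation: chat turns are exactly the setting where the identifying context often arrives in a different message than the identifier itself.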

Teams that want to deploy Privacy Filter as a service will need to build the deployment layer themselves — HTTP framing, request validation, chat-format parsing, concurrency, observability, base-image hardening, and dependency reviews. That’s the normal gap between an open-source model release and a deployable service, and it’s fair for OpenAI to leave it there — the model is the artefact they wanted to share for the preview release.

Closing Thoughts

OpenAI has done a great job with the Privacy Filter preview release. It shows that small, fast, bidirectional models are effective for data minimization tasks, and it is a reminder that in 2026 there is still a massive need to detect PII before model training and before sending data to cloud-based LLMs. If you have an internal ML team ready to fine-tune a model on your specific domain data and build the surrounding API infrastructure, it is an excellent foundation.

The V4 and V7 taxonomies sitting in the source are the most interesting thing about the release from our perspective. V7 in particular — with its granular split of personal identifiers into government, financial, health, vehicle, education, employment, membership, and registry categories — points to where this kind of work is going. Given this is a preview release, there’s a good chance we’ll see them formally released in the future.

However, for teams that need immediate, out-of-the-box deployment, complete with air-gapped REST APIs, robust European localizations, corporate entity detection, and native LLM chat support, purpose-built enterprise solutions remain the most direct path to production.

Either way, it’s a good time to be working on this problem.
