OpenAI published a package of teen safety measures on Monday that does something the company hasn’t done before: it releases the actual tools developers need to build safer products for minors, rather than just asking them to follow policies.
The centerpiece is gpt-oss-safeguard, a pair of open-weight reasoning models — a 120-billion-parameter version and a 20-billion-parameter version — released under the Apache 2.0 license. Any developer can take them, modify them, and deploy them for free. The models are designed to sit between a user’s message and the AI’s response, classifying the content according to a developer-defined policy in real time. Feed either model a teen safety policy and it will flag messages involving self-harm, dangerous challenges, graphic content, or romantic roleplay before a response ever reaches the user.
That technical release is paired with an update to OpenAI’s Model Spec — the document that defines how ChatGPT and other OpenAI models are supposed to behave. The new version adds explicit Under-18 Principles, a set of behavioral rules developed with Common Sense Media and everyone.ai, grounded in what the company describes as developmental science. The principles instruct the model to apply extra caution across several categories: self-harm and suicide, sexualized content, dangerous activities and substances, body image, disordered eating, and any request to keep unsafe behavior secret from adults.
The safeguard model works by interpreting a developer-written policy at inference time rather than being hard-coded to a fixed set of rules. That’s a meaningful design choice. It means a developer running a tutoring app can write a more conservative policy than someone building a general-purpose assistant, and the same underlying model adapts to both.
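A minimal sketch of that pattern: the developer-written policy travels with each request rather than being baked into the model. The message packaging and the ALLOW/BLOCK label convention below are assumptions for illustration, not gpt-oss-safeguard’s documented interface.

```python
# Hypothetical gating layer: the policy is interpreted at inference
# time, so a stricter or looser policy swaps in with no retraining.

# A deliberately conservative policy a tutoring app might write.
TUTORING_POLICY = (
    "You are a content safety classifier for a tutoring app used by "
    "teens. Label the user's message ALLOW or BLOCK. BLOCK self-harm, "
    "dangerous challenges, graphic content, and romantic roleplay."
)

def build_classifier_messages(policy: str, user_message: str) -> list[dict]:
    """Package the developer-written policy and the user's message
    for the safeguard model; the policy rides in the system role."""
    return [
        {"role": "system", "content": policy},
        {"role": "user", "content": user_message},
    ]

def parse_verdict(model_output: str) -> bool:
    """Read the classifier's reply. Assumes a leading ALLOW/BLOCK
    token, which is an illustrative convention only."""
    return model_output.strip().split()[0].upper().startswith("ALLOW")
```

In deployment, the packaged messages would go to a locally hosted safeguard model, and the main assistant would answer only when the verdict comes back ALLOW; a general-purpose assistant could reuse the same two functions with a more permissive policy string.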
OpenAI is also deploying an age prediction model directly on ChatGPT consumer accounts. When the system lacks confidence about whether an account belongs to a minor, it defaults to the under-18 experience. The company didn’t specify how the prediction works, but acknowledged it would use behavioral signals — which raises its own questions about what data gets analyzed and how that process is disclosed to users.
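OpenAI hasn’t published the decision rule, but the behavior it describes — default to the under-18 experience whenever confidence is low — amounts to a simple threshold check. Everything in this sketch, including the signal names and the 0.8 floor, is a hypothetical illustration, not the company’s actual mechanism.

```python
def resolve_experience(p_minor: float, confidence: float,
                       confidence_floor: float = 0.8) -> str:
    """Pick which ChatGPT experience an account gets.

    p_minor: estimated probability the account belongs to a minor.
    confidence: how sure the age model is about its own estimate.
    Both inputs and the 0.8 floor are assumed for illustration.
    """
    # The safety-first default: uncertainty resolves to the teen experience.
    if confidence < confidence_floor or p_minor >= 0.5:
        return "under_18"
    return "adult"
```

The asymmetry is the point of the design: a confident adult classification is required to unlock the adult experience, while any doubt falls through to the restricted one.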
The teen safety push comes as regulators in the US and Europe are increasingly scrutinizing what AI companies do when minors are in the room. OpenAI has faced sustained criticism over the gap between its stated safety commitments and its actions, and this announcement is at least partly a response to that pressure. Several bills pending in Congress would require age verification or mandatory safety standards for AI products used by children.
The gpt-oss-safeguard models are available now on Hugging Face. OpenAI published a companion technical report detailing the models’ benchmark performance on harm classification tasks. The teen safety prompt templates that work with the models are available as part of a developer toolkit the company is calling the Teen Safety Blueprint.
More important than the release itself is whether any of it changes how developers actually build. Most apps that teens use don’t disclose which underlying AI models they run, let alone whether they’ve implemented content policies for minors. The pattern at OpenAI has been to publish safety frameworks and tools while leaving adoption voluntary — a dynamic that critics say consistently favors deployment speed over enforceable standards.
The Apache 2.0 license on gpt-oss-safeguard means there’s no mechanism for OpenAI to monitor or audit how the models get used once they’re downloaded.

