Use Case

AI Applications

Filter LLM inputs and outputs. Protect users from harmful model responses and your model from malicious prompts.

The fastest-growing moderation need

Every LLM-powered application needs content moderation on two fronts:

  • Input filtering: Block harmful, illegal, or manipulative user prompts before they reach the LLM.

  • Output filtering: Catch dangerous or inappropriate model responses before users see them.

Using a single provider for both creates a single point of failure and blind spots. OpenModeration lets you layer multiple providers for defense in depth.
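As a rough illustration of the layering idea, the sketch below combines verdicts from several providers. It is not OpenModeration's internal logic: the `layered_verdict` helper, the provider names, and the rule "block if any provider flags, or if every provider fails" are all assumptions chosen for the example.

```python
from typing import Callable, Dict

# A provider check takes text and returns True when it flags the text.
Check = Callable[[str], bool]


def layered_verdict(text: str, providers: Dict[str, Check]) -> Dict[str, object]:
    """Ask every provider and merge the results (hypothetical helper)."""
    flagged_by = []
    failed = []
    for name, check in providers.items():
        try:
            if check(text):
                flagged_by.append(name)
        except Exception:
            # One provider being down should not take the whole check down.
            failed.append(name)
    # Assumed policy: block when any provider flags the text, or when no
    # provider could answer at all (this also covers an empty provider map).
    blocked = bool(flagged_by) or len(failed) == len(providers)
    return {"blocked": blocked, "flagged_by": flagged_by, "failed": failed}
```

With this rule, an outage at one provider degrades coverage instead of removing it, and text missed by one classifier can still be caught by another.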

Input → Moderated → LLM

User prompt -> {"input": prompt} -> POST /v1/moderations
  -> if not flagged -> forward to GPT-4, Claude...
  -> if flagged -> return "Content policy violation"
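A minimal sketch of this input gate in Python might look like the following. Only the `/v1/moderations` path and the `{"input": ...}` body come from the flow above; the base URL, the top-level `flagged` field in the reply, and the helper names are assumptions, so check them against your deployment's actual response schema.

```python
import json
import urllib.request
from typing import Callable, Dict

# Assumed address of a locally deployed instance; adjust for your setup.
MODERATION_URL = "http://localhost:8080/v1/moderations"


def post_json(url: str, payload: Dict) -> Dict:
    """POST a JSON body and decode the JSON reply (standard library only)."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def is_flagged(text: str, post: Callable[[str, Dict], Dict] = post_json) -> bool:
    reply = post(MODERATION_URL, {"input": text})
    # Assumption: the reply carries a top-level boolean "flagged".
    # A missing key raises KeyError instead of silently passing the text.
    return bool(reply["flagged"])


def guarded_completion(
    prompt: str,
    call_llm: Callable[[str], str],
    post: Callable[[str, Dict], Dict] = post_json,
) -> str:
    """Forward the prompt to the LLM only if moderation does not flag it."""
    if is_flagged(prompt, post):
        return "Content policy violation"
    return call_llm(prompt)
```

Passing `post` and `call_llm` as parameters keeps the gate independent of any particular HTTP client or model SDK, and makes it easy to exercise with stubs.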

LLM Output → Moderated → User

LLM response -> {"input": response} -> POST /v1/moderations
  -> if not flagged -> display to user
  -> if flagged -> show fallback response
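The output side can be sketched the same way. Here `is_flagged` stands for whatever function calls the moderation endpoint and returns a boolean; that function, the fallback wording, and the choice to show the fallback when the moderation call itself fails are assumptions made for this example.

```python
from typing import Callable

# Hypothetical fallback text; use whatever fits your product.
FALLBACK = "Sorry, I can't help with that."


def moderated_reply(
    prompt: str,
    call_llm: Callable[[str], str],
    is_flagged: Callable[[str], bool],
    fallback: str = FALLBACK,
) -> str:
    """Generate a response, then moderate it before it reaches the user."""
    response = call_llm(prompt)
    try:
        flagged = is_flagged(response)
    except Exception:
        # Assumed design choice: if moderation is unreachable, fail closed
        # and treat the response as flagged.
        flagged = True
    return fallback if flagged else response
```

Failing closed trades availability for safety. An application that prefers to stay responsive during a moderation outage could log the error and return the response instead.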

Defense in depth for AI safety

Prompt injection

Detect jailbreak attempts and prompt manipulation.

Harmful content

Block hate, violence, and sexual content in both inputs and outputs.

PII leakage

Prevent models from generating or exposing personally identifiable information.

Policy compliance

Enforce custom policies using an LLM as the classifier, with fully configurable categories.
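To make the LLM-as-classifier idea concrete, here is one way such a check could be structured. This is a generic sketch, not OpenModeration's configuration format: the `POLICY` categories, the prompt wording, and the JSON reply shape are all hypothetical, and `llm` is any function that sends a prompt to a model and returns its text.

```python
import json
from typing import Callable, Dict, List

# Hypothetical policy: category name -> plain-language rule.
POLICY: Dict[str, str] = {
    "medical_advice": "Gives a diagnosis or a dosage recommendation.",
    "competitor_mention": "Names a competing product.",
}


def build_classifier_prompt(text: str, policy: Dict[str, str]) -> str:
    """Turn the policy into instructions for the classifying model."""
    rules = "\n".join(f"- {name}: {rule}" for name, rule in policy.items())
    return (
        "Classify the text against the policy categories below.\n"
        f"{rules}\n"
        'Reply with JSON only, for example {"violations": ["category_name"]}.\n'
        f"Text: {text}"
    )


def classify(text: str, policy: Dict[str, str], llm: Callable[[str], str]) -> List[str]:
    """Return the policy categories the model says the text violates."""
    raw = llm(build_classifier_prompt(text, policy))
    try:
        reply = json.loads(raw)
    except json.JSONDecodeError:
        # Unparseable reply: fail closed by reporting every category.
        return sorted(policy)
    named = reply.get("violations") if isinstance(reply, dict) else None
    if not isinstance(named, list):
        return sorted(policy)
    # Ignore any category the model invented that is not in the policy.
    return sorted(c for c in named if isinstance(c, str) and c in policy)
```

Adding or changing a category then means editing the policy mapping, with no retraining step.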

Ready to simplify your moderation stack?

Deploy in minutes with Docker or start a free trial. One API for every moderation provider, with no vendor lock-in.