The Lists Management endpoints in the Moderation Service API let app owners and collaborators create and manage lists of keywords or regex patterns for message moderation. Each list defines terms, phrases, or patterns that trigger a moderation action when they are detected in user-generated content. The next section describes these capabilities in detail.
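This page does not cover the service's endpoints or request formats. As a rough mental model of what a keyword or regex list does, the following Python sketch checks a message against keywords (matched as whole words, case-insensitively) and regex patterns, and returns an action when anything matches. `KeywordList`, `moderate`, and the `"block"`/`"allow"` actions are hypothetical names chosen for illustration. This is not the Moderation Service API, and the service's actual matching rules (case handling, word boundaries, normalization) may differ.

```python
import re
from dataclasses import dataclass, field


@dataclass
class KeywordList:
    """Hypothetical in-memory stand-in for a moderation list (not the service's data model)."""
    name: str
    keywords: list[str] = field(default_factory=list)
    patterns: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Keywords are escaped and matched as whole words, case-insensitively,
        # so the keyword "ass" does not match inside "class".
        self._compiled = [
            re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
            for word in self.keywords
        ]
        # Regex entries are used as written, also case-insensitively.
        self._compiled += [re.compile(p, re.IGNORECASE) for p in self.patterns]

    def find_matches(self, text: str) -> list[str]:
        """Return the first matched substring for each entry that hits, in list order."""
        hits = []
        for rx in self._compiled:
            m = rx.search(text)
            if m:
                hits.append(m.group(0))
        return hits


def moderate(text: str, lists: list[KeywordList], action: str = "block") -> tuple[str, list[str]]:
    """Return (action, hits) if any list matches, otherwise ("allow", [])."""
    hits = [hit for lst in lists for hit in lst.find_matches(text)]
    return (action if hits else "allow", hits)


if __name__ == "__main__":
    circumvention = KeywordList(
        name="custom-circumvention",  # hypothetical list name
        keywords=["whatsapp", "telegram"],
        patterns=[r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"],  # phone-number-like strings
    )
    print(moderate("Text me on WhatsApp at 555-123-4567", [circumvention]))
    # prints ('block', ['WhatsApp', '555-123-4567'])
```

Wrapping keywords in word boundaries trades recall for fewer false positives. A real list may instead rely on regex entries when partial-word or obfuscated matches (for example, "wh@tsapp") need to be caught.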
Default lists are predefined lists of words, patterns, and sentences that are available for immediate use on your platform. The following default lists are available:
The default list is a comprehensive compilation of predefined profane words and phrases. It supports message moderation by automatically identifying and flagging inappropriate language.
The Platform Circumvention list contains a curated set of words and sentences for identifying attempts to circumvent platform rules and policies. The AI Platform Circumvention Rule uses these phrases to detect and prevent attempts to bypass restrictions, helping maintain compliance and platform integrity.
The Default Spam Detection List identifies repetitive or irrelevant messages promoting products, services, or schemes without user consent. It helps filter out unwanted content, including bulk messages, phishing attempts, and fraudulent offers, ensuring a cleaner and more secure communication experience.
The Default Scam Detection List includes messages crafted to deceive users by creating a sense of urgency, promising false rewards, or impersonating trusted entities. These messages often aim to manipulate users into sharing personal information, making payments, or clicking on malicious links. The list helps identify and block scams, protecting users from fraud, phishing attempts, and other deceptive practices.
The Fraud or Scam Indicators list is designed to detect manipulated images used for fraudulent or deceptive activities. It helps flag fake documents, counterfeit products, and misleading visuals that could be used to scam users or spread misinformation.
The Terrorism or Extremist Promotion list identifies imagery that endorses terrorism, violent extremism, or radical ideologies. It helps prevent the spread of extremist propaganda, recruitment materials, and content that incites violence.
The Minor Safety and Exploitation list is used to detect sexualized or exploitative imagery of minors. It helps prevent child abuse, grooming, and the sharing of harmful content, ensuring compliance with child protection policies.
The Privacy or Personal Data list flags images that expose sensitive or private information, such as identification documents, financial details, or personal records. This helps prevent identity theft, unauthorized data leaks, and privacy violations.
The Graphic Violence or Gore list identifies violent or gory imagery, including depictions of severe injuries, crime scenes, or graphic deaths. It helps limit exposure to disturbing content and ensures a safer viewing experience.
The Explicit or Sexual Content list is designed to detect nudity, sexually explicit imagery, or highly suggestive content. It helps enforce platform guidelines by filtering out inappropriate material.
The Hate or Harassment list flags imagery containing hateful symbols, offensive gestures, or harassment. It helps identify and prevent content that promotes discrimination, hate speech, or targeted abuse.
The Hate and Harassment list detects messages that contain hate speech, threats, slurs, or harassment directed at individuals or groups. It helps create a respectful and safe online environment by preventing abusive behavior.
The Explicit or Inappropriate Content list identifies text that includes explicit sexual descriptions, extreme violence, or other unsuitable material. It helps ensure compliance with content policies and maintains platform integrity.
The Impersonation or Fraud list detects deceptive attempts to impersonate individuals, businesses, or organizations. It helps prevent identity theft, scam attempts, and fraudulent activities.
The Non-Consensual Sexual Content or Exploitation list flags messages that depict or encourage non-consensual sexual acts, grooming, or coercion. It helps protect users from exploitation and ensures adherence to safety policies.
The Privacy and Sensitive Info list identifies messages that share personal or sensitive information without consent. It helps protect user privacy by preventing unauthorized data exposure.
The Self-Harm or Suicidal Content list detects messages indicating self-harm, suicidal thoughts, or encouragement of self-injury. It helps enable early intervention and supports the mental health and safety of users.
The Spam and Scam list identifies spam messages, phishing attempts, and fraudulent schemes. It helps filter out unwanted content, including bulk messages and misleading offers, ensuring a cleaner and more secure communication environment.
The Violent or Terroristic Threats list detects content that promotes violence, terrorism, or extremist actions. It helps prevent harmful speech, glorification of violence, and threats against individuals or groups.