Censor bad words using Python

A guide to implementing a profanity text filter in Python.

If you are building a social media website, or any site that accepts user-generated text, that content usually needs to pass a profanity filter.

Profanity is a socially offensive use of language, which may also be called cursing, cussing, swearing, or expletives. Accordingly, profanity is language use that is sometimes deemed impolite, rude, indecent, or culturally offensive. - Wikipedia

Using the better-profanity package

It's a Python package that censors bad words, including words from a custom list, in text.

It is inspired by the profanity package maintained by Ben Friedland, and it is much faster than the original.

Install the better-profanity package

pip install better-profanity
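
After installing, it can help to confirm that the package is visible to the Python interpreter you are actually running. Here is a minimal sketch using only the standard library; is_installed is a hypothetical helper name made up for this post, not part of better-profanity.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if the named top-level module can be found for import."""
    return importlib.util.find_spec(module_name) is not None


# The name on PyPI uses a hyphen, but the import name uses an underscore.
print(is_installed("better_profanity"))
```

If this prints False, pip most likely installed the package into a different environment than the one running the script.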

How does this work?

The better-profanity package ships with a predefined set of bad words by default. It uses string comparison to match the given text against the predefined words.
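
To get a feel for the idea, here is a simplified sketch of wordlist matching with a little leetspeak handling. This is only an illustration and not the library's actual implementation; LEET_MAP, BAD_WORDS, normalize, and censor_words are hypothetical names, and the two mild words stand in for a real wordlist.

```python
# Simplified illustration only; NOT better-profanity's actual implementation.
# All names below are hypothetical and exist only for this sketch.

# Map a few common leetspeak characters back to plain letters.
LEET_MAP = str.maketrans(
    {"1": "i", "3": "e", "4": "a", "0": "o", "5": "s", "@": "a", "$": "s"}
)
BAD_WORDS = {"darn", "heck"}  # stand-in wordlist


def normalize(word: str) -> str:
    """Lowercase a word and undo the leetspeak substitutions in LEET_MAP."""
    return word.lower().translate(LEET_MAP)


def censor_words(text: str, censor_char: str = "*") -> str:
    """Replace each space-separated word found in BAD_WORDS with four censor characters."""
    out = []
    for word in text.split(" "):
        out.append(censor_char * 4 if normalize(word) in BAD_WORDS else word)
    return " ".join(out)


print(censor_words("Well d4rn it, what the H3CK"))
# Output: Well **** it, what the ****
```

Note that this sketch splits on spaces only, so a word with punctuation attached (for example "heck.") would slip through.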

We can load a custom wordlist using the load_censor_words() function.

Default wordlist

The default wordlist is the file better_profanity/profanity_wordlist.txt on the master branch of the snguyenthanh/better_profanity repository.

Censor bad words

To censor bad words, use the censor() method of profanity. It replaces each swear word in the text with four asterisks.

from better_profanity import profanity

text = 'You piec3 of sHIT.'

censored = profanity.censor(text)
print(censored)

# Output: You **** of ****.

Censor words with word dividers

The better-profanity package masks words separated not just by spaces but also by dividers such as _, , and .

from better_profanity import profanity

if __name__ == "__main__":
    text = "...sh1t...hello_cat_fuck,,,,123"

    censored_text = profanity.censor(text)
    print(censored_text)

# Output: ...****...hello_cat_****,,,,123
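
One way to picture divider handling is to split the text on runs of non-alphanumeric characters while keeping those runs, then censor only the word pieces. The sketch below assumes that approach; it is not how better-profanity is actually written, and BAD_WORDS, DIVIDERS, and censor_with_dividers are hypothetical names.

```python
import re

# Simplified illustration only; NOT better-profanity's actual implementation.
BAD_WORDS = {"darn", "heck"}  # hypothetical stand-in wordlist

# Anything that is not a letter or digit counts as a divider here. The
# underscore is listed explicitly because \W does not match it.
DIVIDERS = re.compile(r"([\W_]+)")


def censor_with_dividers(text: str, censor_char: str = "*") -> str:
    """Censor listed words while leaving the dividers between them untouched."""
    # The capturing group makes split() return the dividers as list items too,
    # so joining the pieces back together reproduces the original layout.
    pieces = DIVIDERS.split(text)
    return "".join(
        censor_char * 4 if piece.lower() in BAD_WORDS else piece
        for piece in pieces
    )


print(censor_with_dividers("...darn...hello_cat_heck,,,,123"))
# Output: ...****...hello_cat_****,,,,123
```

The optional censor_char argument in this sketch plays the same role as the second parameter of .censor().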
    

Censor words with custom character

The character passed as the second parameter to .censor() is used to replace the swear words.

from better_profanity import profanity

if __name__ == "__main__":
    text = "You p1ec3 of sHit."

    censored_text = profanity.censor(text, '-')
    print(censored_text)
# Output: You ---- of ----.

Adding custom censor words

The load_censor_words function takes a list of strings to use as censored words. The provided list replaces the default wordlist.

from better_profanity import profanity

if __name__ == "__main__":
    custom_badwords = ['happy', 'jolly', 'merry']
    profanity.load_censor_words(custom_badwords)

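    # contains_profanity() only reports whether a listed word is present.
    print(profanity.contains_profanity("Have a merry day! :)"))
    # Output: True

    # censor() returns the text with the listed words masked.
    print(profanity.censor("Have a merry day! :)"))
    # Output: Have a **** day! :)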

Conclusion

We have seen how to use a profanity filter with Python. If you like the post, please share it on social media and with your friends.
