C# - Filtering bad words and all permutations of intentionally misspelled words?
What's a good way of using regular expressions to filter curse words out of a block of text?
I don't want to replace "ass" in "classic" (a clbuttic mistake), so it needs to be able to search on a word boundary.
Additionally, I need to catch permutations such as l33tspeak, spaces in the word, etc. It doesn't have to be perfect (the system is going to have message flagging capabilities), but it should catch the majority of the cursing people may use.
PG-13 example: if I'm trying to block the word "moist", it should be able to match "moist", "m01st", "MOIST", "m0ist", and "m oist".
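As a rough sketch of the matching described above (whole-word only, leetspeak substitutes, optional spaces inside the word), one possible shape is shown below. Everything named here (`LeetPatternSketch`, `Substitutes`, `BuildPattern`, `Censor`) is hypothetical, the substitution table is a small illustrative sample, and the bad word is assumed to consist of plain letters. Lookarounds are used instead of `\b` because `\b` does not fire next to symbol substitutes such as `@` or `$`.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class LeetPatternSketch
{
    // Hypothetical substitution table; extend it to cover the variants you care about.
    private static readonly Dictionary<char, string> Substitutes = new Dictionary<char, string>
    {
        { 'a', "a@4" },
        { 'e', "e3" },
        { 'i', "i1!" },
        { 'o', "o0" },
        { 's', "s$5" },
        { 't', "t7" },
    };

    // Builds a case-insensitive pattern that matches the word only as a whole
    // token, allowing substitutes and optional whitespace between letters.
    // Assumes the word contains only letters.
    public static Regex BuildPattern(string word)
    {
        var letters = word.ToLowerInvariant().Select(c =>
        {
            string set = Substitutes.TryGetValue(c, out var s) ? s : c.ToString();
            // Escape each character so symbols such as '$' are taken literally.
            return "[" + string.Concat(set.Select(ch => Regex.Escape(ch.ToString()))) + "]";
        });

        // Not preceded or followed by a letter or digit, so "ass" inside
        // "classic" is left alone.
        string pattern = @"(?<![\p{L}\p{N}])"
                       + string.Join(@"\s*", letters)
                       + @"(?![\p{L}\p{N}])";

        return new Regex(pattern, RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);
    }

    // Replaces every match with asterisks of the same length.
    public static string Censor(string text, string word)
    {
        return BuildPattern(word).Replace(text, m => new string('*', m.Length));
    }
}
```

Allowing whitespace between letters can produce false positives where two ordinary words happen to spell a blocked word across a space, which is one reason to pair a filter like this with message flagging rather than rely on it alone.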
Here's the C# equivalent of the closed thread located at "bad words" filter, based off the answer @unknwntech provided:
    using System.Text.RegularExpressions;

    // Replaces each bad word (and its intentional misspellings) with asterisks
    // and reports how many matches were found.
    public string ReplaceBadWords(string data, string[] badWords, out int badWordCount)
    {
        int count = 0;
        Regex r;
        string op = data;
        foreach (var word in badWords)
        {
            var expword = ExpandBadWordToIncludeIntentionalMisspellings(word);
            // Requires whitespace before the word, and whitespace, "!?" or "." after it.
            r = new Regex(@"(?<Pre>\s+)(?<Word>" + expword + @")(?<Post>\s+|\!\?|\.)");
            var matches = r.Matches(data);
            foreach (Match match in matches)
            {
                string pre = match.Groups["Pre"].Value;
                string post = match.Groups["Post"].Value;
                string output = pre + new string('*', word.Length) + post;
                op = op.Replace(match.Value, output);
                count++;
            }
        }
        badWordCount = count;
        return op;
    }

    // Expands a lowercase word into one bracketed set per letter,
    // e.g. "ab" becomes "[a][b]", then swaps each set for its variants.
    public string ExpandBadWordToIncludeIntentionalMisspellings(string word)
    {
        var chars = word
            .ToCharArray();

        var op = "[" + string.Join("][", chars) + "]";

        return op
            .Replace("[a]", "[A a @]")
            .Replace("[b]", "[B b I3 l3 i3]")
            .Replace("[c]", "(?:[C c \\(]|[k K])")
            .Replace("[d]", "[D d]")
            .Replace("[e]", "[E e 3]")
            .Replace("[f]", "(?:[F f]|[ph pH Ph PH])")
            .Replace("[g]", "[G g 6]")
            .Replace("[h]", "[H h]")
            .Replace("[i]", "[i l ! 1]")
            .Replace("[j]", "[J j]")
            .Replace("[k]", "(?:[C c \\(]|[k K])")
            .Replace("[l]", "[L l 1 ! i]")
            .Replace("[m]", "[M m]")
            .Replace("[n]", "[N n]")
            .Replace("[o]", "[O o 0]")
            .Replace("[p]", "[P p]")
            .Replace("[q]", "[Q q 9]")
            .Replace("[r]", "[R r]")
            .Replace("[s]", "[S s $ 5]")
            .Replace("[t]", "[T t 7]")
            .Replace("[u]", "[U u V v]")
            .Replace("[v]", "[V v U u]")
            .Replace("[w]", "[W w vv VV]")
            .Replace("[x]", "[X x]")
            .Replace("[y]", "[Y y]")
            .Replace("[z]", "[Z z 2]")
            ;
    }
This does a good job at preventing clbuttic mistakes (yes, Google it) as long as you have a good bad word list.
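One detail of the expansion table is worth checking: a bracketed character class matches exactly one character, so two-letter substitutes such as "ph" or "vv" placed inside brackets are read as separate single characters, not as sequences. A minimal sketch of one way to handle multi-character substitutes, using an alternation group instead of a class, follows. The names `AlternationSketch` and `ToAlternation` are hypothetical and not part of the code above.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class AlternationSketch
{
    // Turns a list of substitutes (single- or multi-character) into one
    // non-capturing alternation, e.g. { "f", "ph" } becomes (?:ph|f).
    public static string ToAlternation(params string[] substitutes)
    {
        var escaped = substitutes
            // Longest first, so "vv" is tried before "v".
            .OrderByDescending(s => s.Length)
            // Escape so symbols such as '(' or '$' are taken literally.
            .Select(s => Regex.Escape(s));

        return "(?:" + string.Join("|", escaped) + ")";
    }
}
```

For example, `"^" + AlternationSketch.ToAlternation("f", "ph") + "un$"` matches both "fun" and "phun", whereas `^[ph]un$` matches neither, because the class consumes only a single 'p' or 'h'.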