Re: [GD-Consoles] username dictionaries

On Tue, 27 May 2003 18:21:27 +0800
"Research \(GameBrains\)" <res...@ga...> wrote:

> I'm trying to find a dictionary of vulgar, profane and obscene
> usernames so that we can prevent users from signing up for an
> account using one.  This must be a solved problem but I can't
> seem to find any resources for this.  I thought perhaps the
> console people that hang out in this forum might be more likely
> to know something about this?

    If by "solved problem" you mean "quagmire of madness in which
many a good programmer has been lost", or perhaps "a solved problem
the same way natural language parsing is a solved problem", then
yes.  You'll get to something that handles some of the trivial stuff
quite quickly, but you'll never get it all.

    It's the strong AI problem, and you've got people working
against you trying to see what they can slip by your validator.  Not
only that, but you have people whose legitimate names may well
contain substrings that match against your "bad word" dictionary
(Sexton, Crapper...).  The best you can hope for is to flag
suspicious names for later evaluation by a human.
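
    To make the false-positive problem concrete, here's a minimal
sketch in Python.  The word list and the function name are made up for
illustration; the point is only that a plain substring test flags
perfectly legitimate surnames.

```python
# Hypothetical, deliberately tiny word list; a real one would be far larger.
BAD_WORDS = ["sex", "crap"]


def naive_flag(username):
    """Return the listed words that occur as substrings (case-insensitive)."""
    lowered = username.lower()
    return [word for word in BAD_WORDS if word in lowered]


if __name__ == "__main__":
    # Two real surnames and one harmless control.
    for name in ["Sexton", "Crapper", "Todd"]:
        print(name, naive_flag(name))
```

    Matching on whole words instead would let these two through, but
it would also miss anything a user glues onto another word, so you
trade false positives for false negatives.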

    It gets that much worse if you have to internationalize the
thing; "shite" (shitay) is the te-form of "suru" ("to do") in
Japanese (used for casual requests), and "phoque" is French for "seal".  Every
language in the world used for human discourse has its fair share of
the vulgar, the profane and the obscene, and in many cases there's
bad phonetic crosstalk with "good" words in other languages.

    You also have the problem that if you do this kind of filtering,
you've legally taken on a policy, which may have wider implications
than you think.  For example, if you're filtering what people say
in the slightest way (even just username validation), in some of the
more litigious parts of the world you might find that it has opened you up
to liability if some legal dispute (harassment?  slander?  mp3
trading?) came up between some of your users, or between one of your
users and the outside world.

    Fundamentally, however, your biggest problem is your users.  Anyone
who was going to try to use a "bad word" as their user name is going
to try to do the same within whatever limits your system imposes.  You'll
wind up with standard h4x0r speak, rude combinations of allowed words
(how do you plan on blocking something like "HamsterStuffer" or
"ManPole" or...?), and words that you won't know are offensive until you
get mail from the offended.  Do you really think you can easily assemble
a dictionary of all the racial slurs in the world?
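
    As a rough illustration of why this is a losing game, here's a
sketch that undoes a few common character substitutions before checking
a word list.  The substitution table, the word list and the function
names are all assumptions for the example, not a recommended set.

```python
# Hypothetical substitution table: maps a few "leet" characters to letters.
LEET = str.maketrans({
    "4": "a", "@": "a", "3": "e", "1": "l",
    "0": "o", "5": "s", "$": "s", "7": "t",
})

# Hypothetical one-entry word list, for illustration only.
BAD_WORDS = ["crap"]


def normalize(username):
    """Lowercase the name and undo the substitutions listed in LEET."""
    return username.lower().translate(LEET)


def flag(username):
    """Return the listed words found in the normalized name."""
    norm = normalize(username)
    return [word for word in BAD_WORDS if word in norm]


if __name__ == "__main__":
    for name in ["Cr4p", "HamsterStuffer", "ManPole"]:
        print(name, flag(name))
```

    The digit-swapped spelling gets caught, but the compound names
sail straight through, because every piece of them is an allowed word.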

    If you really must filter user names, you're going to need a person
to do it, and you're probably going to want a tool that deals with batches
of names and categorizes them based on suspiciousness.  You'll still have
lots of misses, the human reviewer will make mistakes and be subject to
sliding standards based on their mood, but that's about the best you
can hope for.
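
    A batch tool along those lines might look something like the
sketch below.  The watch list, the scoring rules, the bucket labels and
the function names are all invented for the example; real rules would
need tuning against real sign-up data, and a person still makes the
final call on every bucket.

```python
import re
from collections import defaultdict

# Hypothetical watch list; hits are only a reason to look, not a verdict.
WATCH_LIST = ["sex", "crap"]

# A digit or symbol wedged between letters (e.g. "cr4p") is a weak hint
# of character substitution.  Trailing digits ("bob42") do not match.
SUBSTITUTION = re.compile(r"[a-z][0-9@$]+[a-z]")


def suspicion(username):
    """Return a crude score: 2 for a watch-list hit, plus 1 for substitution."""
    lowered = username.lower()
    score = 0
    if any(word in lowered for word in WATCH_LIST):
        score += 2
    if SUBSTITUTION.search(lowered):
        score += 1
    return score


def triage(names):
    """Group a batch of names into buckets for a human reviewer."""
    buckets = defaultdict(list)
    for name in names:
        score = suspicion(name)
        if score >= 2:
            label = "review first"
        elif score == 1:
            label = "review later"
        else:
            label = "probably fine"
        buckets[label].append(name)
    return dict(buckets)


if __name__ == "__main__":
    print(triage(["Sexton", "cr4p", "bob42", "Todd"]))
```

    Note that "Sexton" lands in the most suspicious bucket, which is
exactly why the output has to go to a person rather than straight to a
rejection message.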

    Or you could just assign a name, or give them whatever name you find
on the billing address.  Most users will hate that, though.

    If it's for kids, and you really, really want to sand off all the
corners, you could always make the user name something like "adjective
adjective noun", and you supply the lists from which to pick in
clickable form.  That solves the profanity problem (unless people can
chat in-game, in which case you're screwed anyways...), at the expense
of making initial name selection a trying experience for the user:

    "Sorry, user name 'happy fluffy bunny' is already in use.  Sorry, user
name 'fluffy happy bunny' is already in use.  Sorry, user name...".
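
    A quick sketch shows how small that name space is and one way to
soften the collisions.  The word lists and function names here are
placeholders I made up; the idea is simply to offer only names that are
still free instead of letting the user guess.

```python
import itertools

# Hypothetical pick lists; a real service would want far longer ones.
ADJECTIVES = ["happy", "fluffy", "brave", "sleepy"]
NOUNS = ["bunny", "tiger", "robot"]


def all_names():
    """Yield every 'adjective adjective noun' name with two distinct adjectives."""
    for first, second in itertools.permutations(ADJECTIVES, 2):
        for noun in NOUNS:
            yield f"{first} {second} {noun}"


def available_names(taken):
    """Return the names not already in the set of taken names."""
    return [name for name in all_names() if name not in taken]


if __name__ == "__main__":
    taken = {"happy fluffy bunny"}
    free = available_names(taken)
    print(len(list(all_names())), "possible names,", len(free), "still free")
```

    With 4 adjectives and 3 nouns there are only 4 * 3 * 3 = 36
possible names, so the lists have to be long before this scales to any
real user base.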

							Todd.

-- 
  Todd Showalter
  to...@ro...