Re: [Algorithms] Algorithm for determining 'word difficulty'
From: jhorton <jh...@ro...> - 2010-06-18 17:54:19
My daughter has brought home dozens of books at various difficulty levels from the school. I'm sure you could search for things like grade 1 word lists and so on. They are mostly Scholastic books, it seems.

On Fri, Jun 18, 2010 at 12:18:14PM -0500, John Ratcliff wrote:
> I have an interesting little project I'm working on and I thought I would
> solicit the list to see if anyone else has some ideas.
>
> I'm creating an educational word game that focuses on spelling and
> vocabulary; it is designed to run on mobile devices (iPad, iPhone, Droid,
> etc.). This is just a fun little side project I'm doing so my son can learn
> more hands-on programming. My daughter is doing the artwork so we are
> making it a little family project.
>
> I first wrote this game for an Apple II in 1983 so it's kind of fun to be
> making a new version for today's devices. Back then, I didn't have enough
> memory to store a really large word list. Today I have the ability to store
> the entire English dictionary. And, not just the words, but also every
> component associated with each word (synonyms, etymology, definitions, etc.)
>
> The algorithm I am looking for is how to automatically come up with a
> 'difficulty' metric for each word in the English language.
>
> My thoughts are that I could consider the following:
>
> (1) Length of the word, though to be honest very short words can be
> difficult too if they are obscure.
> (2) Number of definitions.
> (3) Field of study of the word (biology, physics, etc.) The open source
> English dictionary I have access to provides this data.
> (4) Whether the word is a verb, noun, etc.
> (5) Cross reference each word against a thesaurus and consider the
> difficulty/obscurity based on how many synonyms and antonyms there are
> total.
>
> One thing that would help immensely is if I had access to a word list of the
> 'most common' words in the English language. Hopefully I can find such a
> list and this would provide me an excellent first guess at whether or not a
> word is obscure.
>
> When you play the game you get to choose the difficulty level you want to
> play at. There really could be two metrics: difficulty to spell, or
> difficulty in terms of recognizing the word. (The game itself more or less
> works like Wheel of Fortune or Hangman; you are just trying to guess a
> single word rather than a phrase.)
>
> Any thoughts on an algorithm which could more or less automatically score
> the entire English language by 'difficulty to spell' and 'difficulty to
> recognize'? Assuming you have as input all of the data in a standard
> dictionary and thesaurus?
>
> Thanks,
>
> John
> _______________________________________________
> GDAlgorithms-list mailing list
> GDA...@li...
> https://lists.sourceforge.net/lists/listinfo/gdalgorithms-list
> Archives:
> http://sourceforge.net/mailarchive/forum.php?forum_name=gdalgorithms-list
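
For what it's worth, here is a rough sketch of how the two metrics from the quoted message ('difficulty to recognize' and 'difficulty to spell') might be scored once a 'most common words' list is in hand. Nothing here comes from the thread itself: the sample ranks, the UNRANKED cutoff, the rare-letter set, the weights, and the function names are all assumptions made up for illustration, and a real version would want tuning against graded word lists like the school ones mentioned above.

```python
import math

# Hypothetical sample data: word -> rank in a 'most common words' list
# (1 = most common). A real list would have thousands of entries.
FREQUENCY_RANK = {"the": 1, "house": 250, "quiz": 4200, "rhythm": 6100}

UNRANKED = 100_000      # assumed rank for any word missing from the list
RARE_LETTERS = set("jqxzkv")  # assumed set of letters that are harder to guess


def recognition_difficulty(word, specialized=False):
    """Score 0..1: higher means the word is less likely to be recognized.

    Uses the log of the frequency rank so the gap between rank 10 and
    rank 100 counts as much as the gap between 1,000 and 10,000.
    `specialized` marks words with a field-of-study label (biology, etc.).
    """
    rank = FREQUENCY_RANK.get(word.lower(), UNRANKED)
    score = math.log10(rank) / math.log10(UNRANKED)
    if specialized:
        score += 0.2  # assumed bump for field-specific vocabulary
    return min(score, 1.0)


def spelling_difficulty(word):
    """Score 0..1: higher means the word is harder to spell or guess.

    Combines word length (capped at 12 letters) with the share of
    rare letters. The 0.7 / 0.3 weights are arbitrary starting points.
    """
    w = word.lower()
    length_part = min(len(w), 12) / 12
    rare_part = sum(ch in RARE_LETTERS for ch in w) / max(len(w), 1)
    return min(1.0, 0.7 * length_part + 0.3 * rare_part)


if __name__ == "__main__":
    for w in ["the", "house", "quiz", "rhythm", "zyzzyva"]:
        print(w, round(recognition_difficulty(w), 2),
              round(spelling_difficulty(w), 2))
```

Keeping the two scores separate lets the game pick words by either metric, and the other dictionary signals from the list above (number of definitions, part of speech, synonym counts) could be folded in as extra weighted terms in the same way.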