Re: [Algorithms] Algorithm for determining 'word difficulty'
From: Samuel M. <sam...@go...> - 2010-06-18 18:12:32
I think a really good metric for difficulty would be how often other people
spell this particular word wrong. So you could take a lot of text that
*includes misspelled words* and count how often each word is misspelled.
Public forums or other websites would be perfect. You can even pick your
target group ;)

The difficulty here is to find out what word was actually meant by a
misspelled one. But on the other hand, just take any reasonable metric for
distance between words (there should be literature on that regarding
automatic spell correction) and assume the closest one (or the N closest
words that are not too far away) was meant.

Maybe you could also use Google... generate a few variations on each word
and count the number of Google results ;)

Greets,
Samuel

On Fri, Jun 18, 2010 at 7:35 PM, jhorton <jh...@ro...> wrote:
> My daughter has brought home dozens of books at various difficulty levels
> from the school. I'm sure you could search for things like grade 1 word
> lists and so on. Scholastic books mostly, it seems.
>
>
> On Fri, Jun 18, 2010 at 12:18:14PM -0500, John Ratcliff wrote:
>> I have an interesting little project I'm working on and I thought I would
>> solicit the list to see if anyone else has some ideas.
>>
>> I'm creating an educational word game that focuses on spelling and
>> vocabulary; it is designed to run on mobile devices (iPad, iPhone, Droid,
>> etc.). This is just a fun little side project I'm doing so my son can learn
>> more hands-on programming. My daughter is doing the artwork so we are
>> making it a little family project.
>>
>> I first wrote this game for an Apple II in 1983 so it's kind of fun to be
>> making a new version for today's devices. Back then, I didn't have enough
>> memory to store a really large word list. Today I have the ability to store
>> the entire English dictionary. And, not just the words, but also every
>> component associated with each word (synonyms, etymology, definitions, etc.)
>>
>> The algorithm I am looking for is how to automatically come up with a
>> 'difficulty' metric for each word in the English language.
>>
>> My thoughts are that I could consider the following:
>>
>> (1) Length of the word, though to be honest very short words can be
>> difficult too if they are obscure.
>> (2) Number of definitions.
>> (3) Field of study of the word (biology, physics, etc.) The open source
>> English dictionary I have access to provides this data.
>> (4) Whether the word is a verb, noun, etc.
>> (5) Cross reference each word against a thesaurus and consider the
>> difficulty/obscurity based on how many synonyms and antonyms there are
>> total.
>>
>> One thing that would help immensely is if I had access to a word list of the
>> 'most common' words in the English language. Hopefully I can find such a
>> list and this would provide me an excellent first guess at whether or not a
>> word is obscure.
>>
>> When you play the game you get to choose the difficulty level you want to
>> play at. There really could be two metrics: difficulty to spell, or
>> difficulty in terms of recognizing the word. (The game itself more or less
>> works like Wheel of Fortune or hangman; you are just trying to guess a
>> single word rather than a phrase.)
>>
>> Any thoughts on an algorithm which could more or less automatically score
>> the entire English language by 'difficulty to spell' and 'difficulty to
>> recognize'? Assuming you have as input all of the data in a standard
>> dictionary and thesaurus?
>>
>> Thanks,
>>
>> John
>
>> ------------------------------------------------------------------------------
>> ThinkGeek and WIRED's GeekDad team up for the Ultimate
>> GeekDad Father's Day Giveaway. ONE MASSIVE PRIZE to the
>> lucky parental unit. See the prize list and enter to win:
>> http://p.sf.net/sfu/thinkgeek-promo
>
>> _______________________________________________
>> GDAlgorithms-list mailing list
>> GDA...@li...
>> https://lists.sourceforge.net/lists/listinfo/gdalgorithms-list
>> Archives:
>> http://sourceforge.net/mailarchive/forum.php?forum_name=gdalgorithms-list
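
For what it's worth, the misspelling-frequency idea from the reply at the top of this message could be sketched roughly as below. This is only an illustration, not something from the thread: plain Levenshtein distance stands in for "any reasonable metric for distance between words", and the names `edit_distance`, `misspelling_rates` and `max_distance` are made up for the example. The sample tokens and dictionary are invented, too. The brute-force nearest-word search is fine for a toy dictionary but would need an index (or a real spell-correction library) for a full English word list.

```python
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute ca -> cb
        prev = curr
    return prev[-1]


def misspelling_rates(tokens, dictionary, max_distance=2):
    """Estimate, per dictionary word, the fraction of its uses that are misspelled.

    A token not in the dictionary is assumed to be a misspelling of the
    closest dictionary word, unless that word is more than max_distance
    edits away, in which case the token is ignored.
    """
    correct = Counter()
    wrong = Counter()
    for tok in tokens:
        tok = tok.lower()
        if tok in dictionary:
            correct[tok] += 1
            continue
        # Closest dictionary word; ties are broken alphabetically.
        dist, best = min((edit_distance(tok, w), w) for w in dictionary)
        if dist <= max_distance:
            wrong[best] += 1
    rates = {}
    for word in dictionary:
        total = correct[word] + wrong[word]
        if total:  # skip words that never occur in the sample
            rates[word] = wrong[word] / total
    return rates


# Invented sample data, standing in for text scraped from a public forum.
sample_dictionary = {"definitely", "cat", "receive"}
sample_tokens = ["definately", "definitely", "cat", "cat", "recieve", "xyzzyq"]
sample_rates = misspelling_rates(sample_tokens, sample_dictionary)
for word in sorted(sample_rates, key=sample_rates.get, reverse=True):
    print(word, sample_rates[word])
```

A higher rate would then be read as "harder to spell". With a small corpus the estimate is noisy for rare words, so in practice one would probably want a minimum occurrence count before trusting a rate.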