Re: [Algorithms] Algorithm for determining 'word difficulty'
From: Binh N. <ng...@gm...> - 2010-06-18 21:33:21
You can also use some online metrics:
+ Google News (syndicated of most common "news" + rank)
+ Number of hits per search

On Fri, Jun 18, 2010 at 2:12 PM, Matthew Harmon <ma...@ev...> wrote:

> As nice as "automatic" would be, I think if you are targeting very young
> kids you'll also want to add a "human tuned" metric to the weighting
> system. As you've no doubt seen, each school grade level has a list of
> "sight words" that kids are supposed to know, as well as other common or
> high-frequency words that they are supposed to be picking up. I'd guess
> this information is available somewhere/somehow - even if it means asking
> some teachers for help. (For that matter, it's possible there are
> state-mandated lists of words that kids must know at different grade
> levels.)
>
> I'd guess that the younger the age target, the more "human tuned" the word
> selection is going to need to be. As they get older, you can probably rely
> more on purely statistical metrics, etc.
>
> On Fri, Jun 18, 2010 at 12:18 PM, John Ratcliff <jra...@gm...> wrote:
>
>> I have an interesting little project I'm working on and I thought I would
>> solicit the list to see if anyone else has some ideas.
>>
>> I'm creating an educational word game that focuses on spelling and
>> vocabulary; it is designed to run on mobile devices (Ipad, Iphone, Droid,
>> etc.). This is just a fun little side project I'm doing so my son can learn
>> more hands on programming. My daughter is doing the artwork so we are
>> making it a little family project.
>>
>> I first wrote this game for an Apple II in 1983 so it's kind of fun to be
>> making a new version for today's devices. Back then, I didn't have enough
>> memory to store a really large word list. Today I have the ability to store
>> the entire English dictionary. And, not just the words, but also every
>> component associated with each word (synonyms, etymology, definitions, etc.)
>>
>> The algorithm I am looking for is how to automatically come up with a
>> 'difficulty' metric for each word in the English language.
>>
>> My thoughts are that I could consider the following:
>>
>> (1) Length of the word, though to be honest very short words can be
>> difficult too if they are obscure.
>> (2) Number of definitions.
>> (3) Field of study of the word (biology, physics, etc.) The open source
>> English dictionary I have access to provides this data.
>> (4) Whether the word is a verb, noun, etc.
>> (5) Cross reference each word against a thesaurus and consider the
>> difficulty/obscurity based on how many synonyms and antonyms there are
>> total.
>>
>> One thing that would help immensely is if I had access to a word list of
>> the 'most common' words in the English language. Hopefully I can find such
>> a list and this would provide me an excellent first guess at whether or not
>> a word is obscure.
>>
>> When you play the game you get to choose the difficulty level you want to
>> play at. This really could have two metrics: difficulty to spell, or
>> difficulty in terms of recognizing the word. (The game itself more or less
>> works like Wheel of Fortune or hangman; you are just trying to guess a
>> single word rather than a phrase.)
>>
>> Any thoughts on an algorithm which could more or less automatically score
>> the entire English language by 'difficulty to spell' and 'difficulty to
>> recognize'? Assuming you have as input all of the data in a standard
>> dictionary and thesaurus?
>>
>> Thanks,
>>
>> John
>>
>>
>> ------------------------------------------------------------------------------
>> ThinkGeek and WIRED's GeekDad team up for the Ultimate
>> GeekDad Father's Day Giveaway. ONE MASSIVE PRIZE to the
>> lucky parental unit. See the prize list and enter to win:
>> http://p.sf.net/sfu/thinkgeek-promo
>> _______________________________________________
>> GDAlgorithms-list mailing list
>> GDA...@li...
>> https://lists.sourceforge.net/lists/listinfo/gdalgorithms-list
>> Archives:
>> http://sourceforge.net/mailarchive/forum.php?forum_name=gdalgorithms-list
>>
>

--
--------------------------------------------------
Binh Nguyen
Computer Science Department
Rensselaer Polytechnic Institute
Troy, NY, 12180
--------------------------------------------------
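
For anyone who wants to experiment, here is a minimal sketch of how the suggestions in this thread might fit together: a frequency rank from a 'most common words' list as the first guess for recognition difficulty, a hand-tuned sight-word table that overrides it for young players, and word length as a crude stand-in for spelling difficulty. Nothing here comes from an actual implementation posted to the list. The names freq_rank, sight_words, vocab_size and max_len are hypothetical, and the log scaling of the rank is an assumption chosen only so that common words spread out over the low end of the scale.

```python
import math


def recognition_difficulty(word, freq_rank, sight_words=None, vocab_size=100_000):
    """Return a score in [0, 1]; higher means harder to recognize.

    freq_rank   -- dict of word -> 1-based rank in a 'most common words'
                   list (hypothetical input; the thread names no specific list)
    sight_words -- optional dict of word -> hand-tuned score in [0, 1],
                   standing in for the "human tuned" grade-level lists
    vocab_size  -- rank assigned to words missing from freq_rank (assumption)
    """
    word = word.lower()
    # A human-tuned entry always wins over the statistical guess.
    if sight_words and word in sight_words:
        return sight_words[word]
    # Unknown words are treated as maximally obscure.
    rank = freq_rank.get(word, vocab_size)
    rank = min(max(rank, 1), vocab_size)
    # Log scale (assumption): rank 1 -> 0.0, rank vocab_size -> 1.0.
    return math.log(rank) / math.log(vocab_size)


def spelling_difficulty(word, max_len=15):
    """Crude length-only proxy in [0, 1]; max_len is an arbitrary cap."""
    return min(len(word), max_len) / max_len
```

The two scores are kept separate on purpose, since the original question asks for 'difficulty to spell' and 'difficulty to recognize' as distinct metrics. Length alone is a weak spelling signal (short obscure words exist, as noted above), so the other dictionary features from the list, such as number of definitions or field of study, would presumably be added as further weighted terms.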