Re: [Algorithms] Algorithm for determining 'word difficulty'
From: John R. <jra...@gm...> - 2010-06-18 17:37:01
Great idea Jeff, I didn't think about that. I could download a number of open source books at various reading levels, and then build a database against each. That actually sounds like the perfect solution! Project Gutenberg, here I come with my big-ass Perl script...

On Fri, Jun 18, 2010 at 12:26 PM, Jeff Russell <je...@8m...> wrote:

> My thought was the same as yours as far as the frequency of use entering
> into it. If you can't find any good data on that, it might not be hard to
> generate it if you have a good deal of text on hand. Run a number of novels
> or something through some simple app that tracks word frequencies, and you
> might have the start to a database at least.
>
> Jeff
>
> On Fri, Jun 18, 2010 at 12:18 PM, John Ratcliff <jra...@gm...> wrote:
>
>> I have an interesting little project I'm working on and I thought I would
>> solicit the list to see if anyone else has some ideas.
>>
>> I'm creating an educational word game that focuses on spelling and
>> vocabulary; it is designed to run on mobile devices (iPad, iPhone, Droid,
>> etc.). This is just a fun little side project I'm doing so my son can learn
>> more hands-on programming. My daughter is doing the artwork so we are
>> making it a little family project.
>>
>> I first wrote this game for an Apple II in 1983 so it's kind of fun to be
>> making a new version for today's devices. Back then, I didn't have enough
>> memory to store a really large word list. Today I have the ability to store
>> the entire English dictionary. And, not just the words, but also every
>> component associated with each word (synonyms, etymology, definitions, etc.)
>>
>> The algorithm I am looking for is how to automatically come up with a
>> 'difficulty' metric for each word in the English language.
>>
>> My thoughts are that I could consider the following:
>>
>> (1) Length of the word, though to be honest very short words can be
>> difficult too if they are obscure.
>> (2) Number of definitions.
>> (3) Field of study of the word (biology, physics, etc.) The open source
>> English dictionary I have access to provides this data.
>> (4) Whether the word is a verb, noun, etc.
>> (5) Cross reference each word against a thesaurus and consider the
>> difficulty/obscurity based on how many synonyms and antonyms there are
>> total.
>>
>> One thing that would help immensely is if I had access to a word list of
>> the 'most common' words in the English language. Hopefully I can find such
>> a list and this would provide me an excellent first guess at whether or not
>> a word is obscure.
>>
>> When you play the game you get to choose the difficulty level you want to
>> play at, which really could have two metrics: difficulty to spell, or
>> difficulty in terms of knowing or recognizing the word. (The game itself
>> more or less works like Wheel of Fortune or hangman; you are just trying to
>> guess a single word rather than a phrase.)
>>
>> Any thoughts on an algorithm which could more or less automatically score
>> the entire English language by 'difficulty to spell' and 'difficulty to
>> recognize'? Assuming you have as input all of the data in a standard
>> dictionary and thesaurus?
>>
>> Thanks,
>>
>> John
>>
>>
>> ------------------------------------------------------------------------------
>> ThinkGeek and WIRED's GeekDad team up for the Ultimate
>> GeekDad Father's Day Giveaway. ONE MASSIVE PRIZE to the
>> lucky parental unit. See the prize list and enter to win:
>> http://p.sf.net/sfu/thinkgeek-promo
>> _______________________________________________
>> GDAlgorithms-list mailing list
>> GDA...@li...
>> https://lists.sourceforge.net/lists/listinfo/gdalgorithms-list
>> Archives:
>> http://sourceforge.net/mailarchive/forum.php?forum_name=gdalgorithms-list
>>
>
> --
> Jeff Russell
> Engineer, 8monkey Labs
> www.8monkeylabs.com
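
The word-frequency idea from this thread could be sketched roughly as below. This is only an illustration in Python, not the Perl script John mentions. The function names (count_words, recognition_difficulty), the tokenizing regex, and the log-scaled score are all assumptions made for the example; nothing in the thread specifies them. It covers only the 'difficulty to recognize' metric and treats a word that never appears in the corpus as maximally obscure.

```python
import math
import re
from collections import Counter

# Assumption: a "word" is a run of ASCII letters, optionally with one
# internal apostrophe (e.g. "don't"). Real book text may need more care.
WORD_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")


def count_words(texts):
    """Count how often each lowercased word occurs across an iterable of strings."""
    counts = Counter()
    for text in texts:
        counts.update(WORD_RE.findall(text.lower()))
    return counts


def recognition_difficulty(word, counts):
    """Return a hypothetical obscurity score between 0.0 and 1.0.

    0.0 means the word is as frequent as the most common word in the corpus;
    1.0 means it never appears. Counts are compared on a log scale because
    word frequencies in natural text are heavily skewed.
    """
    seen = counts.get(word.lower(), 0)
    if seen == 0:
        return 1.0
    top = max(counts.values())
    return 1.0 - math.log(seen + 1) / math.log(top + 1)


if __name__ == "__main__":
    # Tiny stand-in corpus; in practice this would be the text of many books.
    corpus = ["The cat sat on the mat.", "The dog saw the cat."]
    counts = count_words(corpus)
    for w in ("the", "cat", "dog", "zyzzyva"):
        print(w, round(recognition_difficulty(w, counts), 3))
```

Following John's suggestion of books at various reading levels, one table per level could be built by calling count_words once per group of books and comparing where a word first shows up with a meaningful count.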