From: Nick H. <NH...@SE...> - 2001-04-18 21:49:48
Some thoughts...

Should there be a score, or should it be pass/fail? This is the Word of God, you know. Is anything less than perfect acceptable?

How about a visual summary, showing the user's entry compared to the correct verse? All errors could be highlighted. This might simplify the task of specifically identifying which words were out of order, which words were omitted, etc.

==========================================================================
Your entry:
All scripture is *insired* by God, * *profitable for teaching, *for correction, for reproof*, for training in righteousness; so that the man of God may be adequate, equipped for every good work.

Verse:
All scripture is inspired by God, and profitable for teaching, for reproof, for correction, for training in righteousness; so that the man of God may be adequate, equipped for every good work.
===========================================================================

The highlights (represented by *) reveal that:
- "insired" is a spelling error.
- " " before profitable shows "and" was omitted.
- "profitable...reproof" is out of order. "for training" is where the order resumes correctly.

Different colored highlights can be used, too.

Does this help?

Nick

-----Original Message-----
From: Patrick Lacson [mailto:pa...@la...]
Sent: Wednesday, April 18, 2001 2:22 PM
To: Mike Lucas
Cc: htw-list
Subject: [Tsaphan-developers] Verse Diff program

Mike,

I'm ccing the htw list because we need everybody's feedback on this Diff algorithm. The basic requirements need feedback too:

1) Compare 2 String types
2) Allow a comparison preference level (check for punctuation, CaPITilaZaTion, etc. -- how accurate)
3) Compute the percentage based on how accurate/inaccurate the attempted verse is vs. the actual verse

So here's a suggested test case (for 2Tim 3:16-17):

##################################################################
All Scripture is inspired by God and profitable for teaching, for reproof, for correction, for training in righteousness; so that the man of God may be adequate, equipped for every good work.
##################################################################

Attempted verse:
----------------
All scripture is insired by God, profitable for teaching, for correction, for reproof, for training in righteousness; so that the man of God may be adequate, equipped for every good work.

RESPONSE:
----------
a) Misspelled -- "insired"
b) Incorrect -- "God, profitable for teaching, for correction, for reproof, "
c) Score is 70%

My basic approach would be to use 2 arrays and tokenize the 2 strings into the arrays:

    actual_arr[0] = "All";
    actual_arr[1] = "Scripture";
    actual_arr[2] = "is";
    actual_arr[3] = "inspired";
    actual_arr[4] = "by"
    ...
    attempt_arr[0] = "All";
    attempt_arr[1] = "scripture";
    attempt_arr[2] = "is";
    attempt_arr[3] = "insired"; // red flag here for misspelled
    ... // continue to process

Compare the 2 arrays (attempted/actual) for word matches, misspelled words, and punctuation marks. So familiarize yourself with the StringTokenizer class and the Diff algo in the jcore.utils.Diff package.

This may not be the best way to do this, but it at least allows some easy answers right off the bat regarding the accuracy of their verse attempt: missing words/punctuation (basically any token) and non-matching words. However, the algo *may* get lost after a few missing words, so we have to make it smarter in figuring out where the remaining words are.. this is where the real challenge of this diff algo lies: maintaining context and pattern matching (via regular expressions??).

So think about this approach and let me know what pros/cons you see -- shoot your ideas out and let me know cuz I need as much feedback from everybody on this as I can..
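To make the two-array idea concrete, here is a minimal sketch, not a finished design. The class name VerseDiffSketch and all of its methods are hypothetical and have nothing to do with jcore.utils.Diff. To "catch up" after a missing word it uses a longest-common-subsequence (LCS) alignment rather than regular expressions, and the score (in-order matched tokens divided by tokens in the actual verse) is just one assumed formula among many possible ones.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical sketch; names are illustrative only.
class VerseDiffSketch {

    // Split on whitespace. When strict is false, punctuation and case
    // are ignored (a crude stand-in for the "preference level").
    static List<String> tokenize(String text, boolean strict) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(text);
        while (st.hasMoreTokens()) {
            String t = st.nextToken();
            if (!strict) {
                t = t.replaceAll("\\p{Punct}", "").toLowerCase();
            }
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // lcs[i][j] = length of the longest common subsequence of
    // actual[i..] and attempt[j..].
    static int[][] lcsTable(List<String> actual, List<String> attempt) {
        int n = actual.size(), m = attempt.size();
        int[][] lcs = new int[n + 1][m + 1];
        for (int i = n - 1; i >= 0; i--) {
            for (int j = m - 1; j >= 0; j--) {
                lcs[i][j] = actual.get(i).equals(attempt.get(j))
                        ? lcs[i + 1][j + 1] + 1
                        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
            }
        }
        return lcs;
    }

    // Walk the table: "  w" = matched, "- w" = in the actual verse but
    // not matched in the attempt, "+ w" = unmatched token in the attempt.
    static List<String> diff(List<String> actual, List<String> attempt) {
        int[][] lcs = lcsTable(actual, attempt);
        List<String> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < actual.size() && j < attempt.size()) {
            if (actual.get(i).equals(attempt.get(j))) {
                out.add("  " + actual.get(i));
                i++;
                j++;
            } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
                out.add("- " + actual.get(i++));
            } else {
                out.add("+ " + attempt.get(j++));
            }
        }
        while (i < actual.size()) out.add("- " + actual.get(i++));
        while (j < attempt.size()) out.add("+ " + attempt.get(j++));
        return out;
    }

    // Assumed scoring rule: percent of actual-verse tokens matched in order.
    static int score(List<String> actual, List<String> attempt) {
        if (actual.isEmpty()) {
            return attempt.isEmpty() ? 100 : 0;
        }
        return 100 * lcsTable(actual, attempt)[0][0] / actual.size();
    }

    public static void main(String[] args) {
        List<String> actual = tokenize("All Scripture is inspired by God", false);
        List<String> attempt = tokenize("All scripture is insired by God,", false);
        diff(actual, attempt).forEach(System.out::println);
        System.out.println("Score: " + score(actual, attempt) + "%");
    }
}
```

With punctuation and case ignored, the 2Tim 3:16-17 test case above scores 87 under this rule, not the 70% suggested earlier, so the formula would need tuning to whatever we decide counts as an error. Note that a misspelling like "insired" shows up only as a "- inspired" / "+ insired" pair; telling a typo from a wrong word would need an extra step (for example an edit-distance check on such pairs).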
Here are some questions I had to ask myself about this design:

1) Does it make sense to use 2 token-arrays for comparison?
2) How do we maintain context if the 2nd array is missing words? How do we catch up to the original?
3) Should we even do this approach?
4) Are there other systems out there that have a text parser already available that we can reuse?
5) How should we score the attempt?? Amount of words, misspelled words, missing words, etc..

Sorry if I'm being a bit verbose, but I'm very excited that we have another developer on the squad to help us out with this.. I'd like to bounce all ideas to the list and get everybody involved, developer or not, just to see if things are making sense -- I tend to think too short-term and neglect long-term implications.

-P

Michael Lucas wrote:
>
> Pat,
>
> I've done a limited amount of network programming and know nothing
> on threads. I don't mind trying to work on it, if you are not too
> constrained on time while I learn about it. Either project is good for
> me.
>
> - Mike

_______________________________________________
Tsaphan-developers mailing list
Tsa...@li...
http://lists.sourceforge.net/lists/listinfo/tsaphan-developers