Samudra Manthan uses C and MPI for finding interesting n-grams(terms) in a large corpus of data. We use the GigaWord corpus to find top m interesting n-grams using TF*IDF measure.
Be the first to post a text review of Samudra-Manthan. Rate and review a project by clicking thumbs up or thumbs down in the right column.
Added functionality to provide your stoplist file and change the stoplist file
Be the first person to add a text review.
Copyright © 2009 Geeknet, Inc. All rights reserved. Terms of Use
Thanks for your rating!
Would you also like to write a review?
Thanks for your review!
Get credit for your review by logging in via OpenID. Click your account provider: