Search Relevance Scoring
Relevance scoring is based on a combination of several factors:
- Term Frequency - The more often a term appears in a document's field(s), the higher the document scores.
e.g., in a search for 'rss', a project named 'RSS Bandit' with 'RSS' in the description twice scores higher than a project named 'RSS Reader' with 'RSS' in the description only once.
- Inverse Document Frequency - The document frequency is the number of documents that contain the term. The score rises with the inverse of that number, so rare terms contribute more to a document's score than common ones.
e.g., in a search for 'rss podcast', projects matching 'podcast' (~150 projects) score higher than those matching 'rss' (~1k projects).
- Coordination Factor - The more query terms a document matches, the higher it scores.
e.g., in a search for 'rss podcast', projects matching both 'rss' and 'podcast' score higher than those matching only 'rss' or only 'podcast'.
- Field Length - The shorter the matching field, the higher the score.
e.g., in a search for 'rss java', a project description of 'rss reader in java' scores higher than a description of 'rss and atom reader written in java using spring framework and jsp'.
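The four factors above can be combined in a small sketch. The exact formulas the search engine uses are not stated on this page; the ones below (square-root term frequency, logarithmic inverse document frequency, matched/total coordination, and 1/sqrt(length) normalization) are assumptions modeled on a classic TF-IDF scheme, and the function name and sample counts are hypothetical.

```python
import math

def score(query_terms, field_tokens, doc_freq, total_docs):
    """Toy relevance score for one field of one document (assumed formulas)."""
    if not query_terms or not field_tokens:
        return 0.0
    matched = 0
    total = 0.0
    for term in query_terms:
        freq = field_tokens.count(term)
        if freq == 0:
            continue
        matched += 1
        # Term frequency: more occurrences raise the score, with diminishing returns.
        tf = math.sqrt(freq)
        # Inverse document frequency: rarer terms weigh more.
        idf = 1.0 + math.log(total_docs / (doc_freq.get(term, 0) + 1))
        total += tf * idf
    # Coordination factor: fraction of query terms that matched.
    coord = matched / len(query_terms)
    # Field length: shorter fields score higher.
    length_norm = 1.0 / math.sqrt(len(field_tokens))
    return total * coord * length_norm

# Hypothetical index statistics, loosely based on the examples above.
doc_freq = {"rss": 1000, "podcast": 150}
total_docs = 10000

short_desc = "rss reader in java".split()
long_desc = "rss and atom reader written in java using spring framework and jsp".split()
print(score(["rss", "java"], short_desc, doc_freq, total_docs) >
      score(["rss", "java"], long_desc, doc_freq, total_docs))
```

The comparison at the end mirrors the Field Length example: both descriptions match both terms once, so only the length normalization separates them.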
For Project Search, the textual analysis factors above are weighted per field as follows:
- Project Unix Group Name - 2x boost
- Project Name - 1x (baseline) boost
- Project Description - 0.5x boost
Note: The field length factor doesn't apply to the description field, so longer descriptions aren't penalized.
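As a rough illustration of the field boosts, the sketch below multiplies a per-field text score by the boosts listed above and skips the length factor for the description. The field keys, the simple hit-count score, and the choice to sum the fields are all assumptions made for the example; this page does not say how the engine combines per-field scores.

```python
import math

# Boost values come from the list above; the field keys are hypothetical.
FIELD_BOOSTS = {"unix_name": 2.0, "name": 1.0, "description": 0.5}
# Fields for which the field length factor is skipped.
NO_LENGTH_NORM = {"description"}

def field_score(query_terms, tokens, use_length_norm):
    """Simplified text score: count of query-term hits, optionally length-normalized."""
    hits = sum(tokens.count(term) for term in query_terms)
    if hits == 0:
        return 0.0
    norm = 1.0 / math.sqrt(len(tokens)) if use_length_norm else 1.0
    return hits * norm

def project_score(query_terms, project):
    """Sum the boosted per-field scores (summing is an assumption)."""
    total = 0.0
    for field, boost in FIELD_BOOSTS.items():
        tokens = project.get(field, "").lower().split()
        total += boost * field_score(query_terms, tokens, field not in NO_LENGTH_NORM)
    return total

# A match in the Unix group name outweighs the same match in the description.
print(project_score(["rss"], {"unix_name": "rss"}))
print(project_score(["rss"], {"description": "rss and atom reader"}))
```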
Additional factors for Project Search:
- Query Term Count - For queries of 3 or 4 terms, the project must match all but 1 of them; for queries of 5 or more terms, the project must match 80% of them.
- File Releases
- Number of Downloads
- Project Thumbs Up/Down Rating
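The Query Term Count rule above can be sketched as a small filter. Two details are assumptions not stated on this page: queries of 1 or 2 terms are treated as requiring every term, and the 80% threshold is rounded down to a whole number of terms. The function names are hypothetical.

```python
def min_terms_required(n_terms):
    """Minimum number of query terms a project must match."""
    if n_terms <= 2:
        # Assumption: short queries must match every term.
        return n_terms
    if n_terms <= 4:
        # 3 or 4 terms: all but 1.
        return n_terms - 1
    # 5 or more terms: 80%, rounded down (assumption); integer math avoids float error.
    return (n_terms * 80) // 100

def passes_term_count(query_terms, project_tokens):
    """True if the project matches enough distinct query terms."""
    unique_terms = set(query_terms)
    matched = sum(1 for term in unique_terms if term in project_tokens)
    return matched >= min_terms_required(len(unique_terms))

# A 3-term query tolerates one missing term but not two.
print(passes_term_count(["rss", "podcast", "java"], {"rss", "java", "reader"}))
print(passes_term_count(["rss", "podcast", "java"], {"rss", "reader"}))
```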
Improving your Search Scoring:
- Put precise target terms in the Project Name and Description
- Make file releases
- Generate more downloads
- Receive positive ratings from users