Search Relevance Scoring
Relevance scoring is based on a combination of several factors:
- Term Frequency - The more times a term appears in a document's field(s), the higher the document scores.
e.g., in a search for 'rss', a project named 'RSS Bandit' with 'RSS' in the description twice scores higher than a project named 'RSS Reader' with 'RSS' in the description only once.
- Inverse Document Frequency - The document frequency is the number of documents that contain the term. The score rises with the inverse of that number, so rare terms contribute more to a document's score than common ones.
e.g., in a search for 'rss podcast', projects matching 'podcast' (~150 projects) score higher than those matching 'rss' (~1k projects).
- Co-ordination Factor - The greater the number of query terms matched, the greater the score.
e.g., in a search for 'rss podcast', projects matching both 'rss' and 'podcast' score higher than those matching only 'rss' or only 'podcast'.
- Field Length - The shorter the matching field is, the greater the score.
e.g., in a search for 'rss java', a project description of 'rss reader in java' scores higher than a description of 'rss and atom reader written in java using spring framework and jsp'.
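As a rough illustration, the four factors above can be combined into a single toy score. This is only a sketch: the exact formulas the search engine uses are not stated here, so the square-root term frequency, logarithmic inverse document frequency, matched-fraction co-ordination factor, and inverse-square-root length normalization below are assumptions borrowed from common TF-IDF scoring. The function name and its parameters are hypothetical.

```python
import math

def toy_score(query_terms, field_tokens, doc_freq, num_docs):
    """Toy relevance score of one tokenized field for a query.

    All formulas here are illustrative assumptions, not the
    search engine's actual implementation.
    """
    matched = [t for t in query_terms if t in field_tokens]
    if not matched:
        return 0.0

    total = 0.0
    for term in matched:
        # Term frequency: more occurrences in the field -> higher score.
        tf = math.sqrt(field_tokens.count(term))
        # Inverse document frequency: rarer terms -> higher score.
        idf = 1.0 + math.log(num_docs / (doc_freq[term] + 1))
        total += tf * idf

    # Co-ordination factor: fraction of query terms that matched.
    coord = len(matched) / len(query_terms)
    # Field length: shorter fields -> higher score.
    length_norm = 1.0 / math.sqrt(len(field_tokens))
    return total * coord * length_norm
```

With made-up document counts, e.g. `doc_freq = {"rss": 1000, "java": 5000}` and `num_docs = 100000`, calling `toy_score(["rss", "java"], "rss reader in java".split(), doc_freq, num_docs)` gives a higher value than the same call on the longer example text, because only the length factor differs.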
For Project Search, the textual analysis factors above are weighted by field as follows:
- Project Unix Group Name - 2x boost
- Project Name - 1x (baseline) boost
- Project Description - 0.5x boost
Note: the field length factor doesn't apply to the description field, so longer descriptions aren't penalized.
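To make the boosts concrete, here is a minimal sketch of how per-field scores might be weighted. The boost values come from the list above; the field keys and the idea of summing the weighted scores are assumptions made for illustration, since the way the engine combines fields is not described here.

```python
# Boost values from the list above; the key names are hypothetical.
FIELD_BOOSTS = {
    "unix_group_name": 2.0,
    "name": 1.0,          # baseline
    "description": 0.5,
}

def boosted_score(field_scores):
    """Combine per-field text scores using the field boosts.

    field_scores maps a field key to that field's raw text score.
    Summing the weighted scores is an assumption of this sketch.
    """
    return sum(FIELD_BOOSTS[field] * score
               for field, score in field_scores.items())
```

Under these assumptions, an identical raw match scores four times as much in the Unix group name as in the description (2.0 versus 0.5).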
Additional factors for project search:
- Query Term Count - If 3 or 4 query terms are entered, the project must match all but 1; if 5 or more query terms are entered, the project must match 80% of the query terms entered
- File Releases
- Number of Downloads
- Project Thumbs Up/Down Rating
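The Query Term Count rule in the list above can be sketched as a small function. Two details are assumptions, because the rule does not specify them: queries of 1 or 2 terms are taken to require every term, and the 80% threshold is rounded down to a whole number of terms. The function name is hypothetical.

```python
def required_matches(num_terms):
    """Minimum number of query terms a project must match."""
    if num_terms <= 2:
        # Assumption: short queries must match every term.
        return num_terms
    if num_terms <= 4:
        # 3 or 4 terms: all but one must match.
        return num_terms - 1
    # 5 or more terms: 80% must match (rounded down, an assumption).
    return (num_terms * 80) // 100
```

For example, a 4-term query needs 3 matching terms and a 10-term query needs 8.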
So, to improve your project's search score:
- Put precise target terms in the Project Name & Description
- Make file releases
- Generate more downloads
- Receive positive ratings from users