I’ve contacted Bram re sites indexed by Google Scholar. We have a relatively small and new repository (http://scholar.sun.ac.za, 3618 items and 1 year old), but Google returned 136,000 results when I conducted the site: search. Bram’s reply:
the more pages that are indexed, the more keywords might lead to your repository, and eventually the more traffic you will be able to direct to the repository from search engines. However, it's strange that google was able to index over 130.000 pages from your repository, while I currently see only 3618 items. Is there anything else aside from dspace, hosted under the domain scholar.sun.ac.za<http://scholar.sun.ac.za> ?
maybe it could be:
- links to all of your individual full texts
- links to different pages in the browse indexes
the google webmaster toolkit might be able to tell you more ( http://www.google.com/webmasters/ )
lirias.kuleuven.be<http://lirias.kuleuven.be> has over 200.000 items, but has "only" 257.000 pages indexed in google, so the difference is a lot smaller there. But this site has a pagerank of 7, which means that when, for example, a search for "marketing" returns a result from lirias and one from sunscholar, the result from lirias will be listed higher in the google search results than the result from sunscholar.
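To make the comparison above concrete, here is a minimal sketch that works out the number of indexed pages per repository item from the figures quoted in this thread ("over 130.000" pages for 3618 items, and 257.000 pages for "over 200.000" items). The function name pages_per_item is only an illustration, and the inputs are the rounded figures from the mail, so the results are rough.

```python
def pages_per_item(indexed_pages, items):
    """Return how many indexed pages a search engine holds per repository item."""
    return indexed_pages / items


# Rounded figures quoted in this thread, not exact counts.
sunscholar = pages_per_item(130_000, 3_618)
lirias = pages_per_item(257_000, 200_000)

print(f"sunscholar: {sunscholar:.1f} indexed pages per item")  # 35.9
print(f"lirias: {lirias:.1f} indexed pages per item")          # 1.3
```

A ratio far above 1 suggests that most of the indexed pages are not item pages, which fits the guess that full-text links and browse index pages are being counted.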
These kinds of metrics are very useful, IF one of the goals of the repository is to enhance the online exposure & prestige for the institution. There are some more metrics in the attached report. You can generate such reports for free with the tool from seoquake.com<http://seoquake.com>. (I'm also still learning in this field, so I'm not an expert on all of these values. But for instance, the google pagerank of your site is 4; this is not bad but could be a lot better, for example by increasing the number of incoming links to your repository).
Of course, this has nothing to do with other goals, for example reaching a certain percentage of coverage of your institution's output (which percentage of the intellectual output, let's say from 2010, is already captured in the repository ? Are you particularly strong/weak in certain disciplines, ...)
We thought it might be useful to the rest of the DSpace community as well. Thanks Bram for your reply!
From: Bram Luyten [mailto:bram@...]
Sent: 08 December 2010 14:29
Subject: [Dspace-general] Coming up: The January Webometrics Repository Ranking - Is your DSpace repository ready ?
(This message was originally written as a facebook note<http://on.fb.me/gAfEBn> . The images and layout of the original messages were lost in the copy below)
Early 2011, the CSIC Cybermetrics Lab (Spain) will harvest, analyze and publish data about online visibility of IRs in its Top 800 Institutional Repository Ranking<http://repositories.webometrics.info/>. The January ranking will reflect the visibility of repositories in different online search engines. A limited time window snapshot of data is being used, as the ranking production schedule shows:
Data collection: 1st - 10th of January
Analysis: 10th - 24th of January
Publication: week of the 25th of January
Why would you or your management care ?
The aim of the Ranking is to support Open Access initiatives and therefore the free access to scientific publications in an electronic form and to other academic material. The web indicators are used to measure the global visibility and impact of the scientific repositories.
When increasing exposure for your digital research output is one of the objectives of your repository project, it's a logical consequence that you should attempt to measure how well you are doing in attaining this objective. Although internal metrics are the primary tools to track the progress in this area, the ranking offers great opportunities to identify and learn from other successful repositories.
Zooming in on your own repository metrics, comparing them over time, allows you to demonstrate progress in attaining certain targets. If online exposure is the target you want to measure, it can be very useful to take a look at how many pages of your repository are indexed in the most popular search engines. This is also a very important metric in the repository ranking methodology.
To do this, you can dig into the search engines yourself. For example, entering the query "site:<<repo url>>" in Google will show you the number of repository pages indexed. By doing this every month or week, you can keep track of how the exposure of your repository grows. Handy tools, such as the SEO Quake<http://www.seoquake.com/> browser add-on, help you to automate this process.
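As a rough illustration of the monthly check described above, the sketch below builds the search URL for a site: query and computes the change between successive counts. It assumes you read the result count from the results page by hand; the function names and the numbers in the example history are hypothetical, not real measurements.

```python
from datetime import date
from urllib.parse import urlencode


def site_query_url(repo_host):
    """Return a Google search URL for the query site:<repo_host>."""
    return "https://www.google.com/search?" + urlencode({"q": "site:" + repo_host})


def growth(history):
    """Given (date, indexed_pages) pairs, return (date, change since previous) pairs."""
    ordered = sorted(history)
    return [(d2, n2 - n1) for (_, n1), (d2, n2) in zip(ordered, ordered[1:])]


if __name__ == "__main__":
    # Open this URL in a browser and note the reported number of results.
    print(site_query_url("scholar.sun.ac.za"))

    # Hypothetical counts, noted by hand on the first of each month.
    history = [
        (date(2011, 1, 1), 1000),
        (date(2011, 2, 1), 1250),
        (date(2011, 3, 1), 1200),
    ]
    for day, delta in growth(history):
        print(day.isoformat(), f"{delta:+d}")
```

Keeping the dates alongside the counts makes it easy to spot a sudden drop, which is usually worth investigating before the data collection window opens.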
The repository ranking totally disregards repository usage (pageviews, downloads), simply because those data are not easily accessible for the people at the Cybermetrics Lab. However, tools like Google Analytics<http://analytics.google.com/> or the internal DSpace SOLR statistics<http://bit.ly/bXCyFb> enable you to keep track of your repository usage.
Learning from the repository ranking
When internal metrics are the primary tools, already offering a wide range of options to track your repository's progress on certain metrics, why bother with the ranking ? Although there are a few pitfalls, there is definitely an opportunity to learn from successful repositories.
When comparing the ranking for your repository to its previous ranking, you could get strange results (e.g. very big jumps) because of changes in the ranking algorithm. This was the case when comparing the rankings from January vs July 2010<http://www.facebook.com/note.php?note_id=410508203767>. However, for this edition, the Cybermetrics lab has given assurance that the ranking algorithm won't be tweaked compared to the July 2010 edition.
Although it's clear from the ranking methodology<http://repositories.webometrics.info/methodology_rep.html> that repositories with thousands of items generally score higher than very poorly populated ones, having the highest number of pages is no automatic ticket to the top. For example, although the University of Sao Paulo<http://www.teses.usp.br/index.php?lang=en> has 26.166 items indexed, it's placed higher than the Kyoto University's KURENAI repository<http://repository.kulib.kyoto-u.ac.jp/dspace/browse-title> with almost 100.000 items. The scores on four different indicators (size, visibility, rich files, scholar) show in which areas a repository can improve in order to improve its overall ranking.
Including your repository in the Webometrics Repository Ranking
If your repository hasn't been included in any of the previous rankings, that doesn't necessarily mean it didn't perform well enough to make the top 800; it could also indicate that the Cybermetrics lab is not aware of your repository. Send an email with your repository URL to isidro.aguillo@..., well before the 1st of January, to ensure that your URL gets included in the data collection phase.
Only repositories with an autonomous web domain or subdomain are included:
Although it will take some technical work to change your URL while ensuring proper redirects for your older URLs, it's definitely worth going through this trouble to get an autonomous web domain or subdomain.
Those repositories consisting only of one or several electronic journals (journal portals), or devoted to non-scientific papers, or focusing on archival material are excluded.
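For the redirects mentioned above, the core of the work is a mapping from each old URL to its new location; the web server would then answer requests for the old URL with a permanent (HTTP 301) redirect to the mapped one. The sketch below shows only that mapping. The host names and the /dspace path prefix are hypothetical placeholders, so substitute your own.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical old and new locations; replace these with your own values.
OLD_HOST = "www.example.edu"
OLD_PREFIX = "/dspace"
NEW_HOST = "repository.example.edu"


def redirect_target(old_url):
    """Return the new URL for an old repository URL, or None if it is not one."""
    parts = urlsplit(old_url)
    if parts.netloc != OLD_HOST:
        return None
    if parts.path != OLD_PREFIX and not parts.path.startswith(OLD_PREFIX + "/"):
        return None
    # Drop the old path prefix; the rest of the path, query and fragment are kept.
    new_path = parts.path[len(OLD_PREFIX):] or "/"
    return urlunsplit((parts.scheme, NEW_HOST, new_path, parts.query, parts.fragment))


if __name__ == "__main__":
    print(redirect_target("http://www.example.edu/dspace/handle/123456789/42"))
```

Checking a sample of old item URLs against such a mapping before the switch helps make sure that existing incoming links keep working.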
Apart from these basics, more best practices can be consulted here<http://repositories.webometrics.info/best_practices.html>.
We wish you the best of success in optimizing your repository for increasing online exposure and hope to see an increasing number of DSpace installations in the new top 800.
with kindest regards,
@mire - http://www.atmire.com
Technologielaan 9 - 3001 Heverlee - Belgium
533 2nd Street - Encinitas, CA 92024 - USA
http://www.togather.eu - Before getting together, get Tog@...