Local and Global Citation Counts

One of the questions frequently asked by new users of CiteSpace is why the citation frequency of a paper shown in CiteSpace is often lower than what you see on Google Scholar or the Web of Science. The follow-up questions usually go like this: what are we missing, and what can we do about it?
The discrepancies are due to the scope and the depth of sampling. The most common way to use CiteSpace begins with a dataset that you have collected from the Web of Science, Dimensions, the Lens, or some other repository of scholarly publications. These repositories typically contain hundreds of millions of scholarly publications. They are like oceans of papers. The dataset that you are interested in and intend to analyze, on the other hand, is a subset of one of these vast oceans. As long as there are papers in the ocean that fall outside the scope of your study, you are dealing with a subset of the pool, i.e. the ocean, of publications. Your scope is typically much narrower than the scope implicitly drawn by any individual ocean and, of course, even narrower than a combination of all the available oceans of papers.
Now let's see what difference this makes when you count the citations of a paper within your dataset and within the original ocean from which the dataset was drawn. For the sake of argument, assume that ocean is Dimensions. The argument remains the same if you compare citation counts with other giants such as the Web of Science and Google Scholar. Assume Dimensions currently has 200,000,000 bibliographic records and you collected a dataset of interest consisting of 10,000 of them. Counting a paper's citations based on the 10,000-item dataset gives you a local citation score, whereas counting based on the 200 million records of Dimensions gives you a global citation score. Yes, even the 'global citation score' would be different if you chose a different ocean as your base. The citation frequency of a paper you see in CiteSpace is a local citation score, which is typically lower, and possibly much lower, than a global citation score.
The difference between the local and global citation scores comes from citing articles that are outside your dataset but still in your base ocean.
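To make the distinction concrete, here is a minimal sketch of the two counts on a toy citation graph. The paper identifiers, the graph, and the `citation_count` helper are all made up for illustration; this is not how CiteSpace computes its figures internally, only the arithmetic behind the argument above.

```python
# Toy "ocean": each key is a citing paper, each value is the set of papers it cites.
# All identifiers are hypothetical.
citations = {
    "A": {"P"},
    "B": {"P"},
    "C": {"P", "A"},
    "X": {"P"},
    "Y": {"P", "B"},
    "Z": {"P"},
}

# The whole ocean: every paper that cites or is cited.
ocean = set(citations) | {ref for refs in citations.values() for ref in refs}

# The dataset you collected is a subset of the ocean.
dataset = {"P", "A", "B", "C"}


def citation_count(target, citing_pool, graph):
    """Count the papers in citing_pool that cite target."""
    return sum(1 for paper in citing_pool if target in graph.get(paper, set()))


global_score = citation_count("P", ocean, citations)    # counted over the whole ocean
local_score = citation_count("P", dataset, citations)   # counted within the dataset only
from_outsiders = global_score - local_score             # citations made by papers outside the dataset

print(global_score, local_score, from_outsiders)
```

In this toy case the gap between the two scores is exactly the number of citing papers (X, Y, and Z) that sit in the ocean but not in the dataset.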
So is it a good idea to expand your original dataset so that you can include these 'outsiders'? Not necessarily - it depends. The devil here is the concept of relevance. The relevance of these 'outsiders' tends to drop quickly, which is largely why they were not captured by your initial dataset in the first place. Including 'outsiders' with diminishing relevance may dilute the focus of your analysis, and your study may drift away from your original interest. That said, one can never rule out the potential benefits of drifting away from the original plan - the value of a surprise may outweigh that of the planned course. (We could sidetrack here to another interesting topic - I should probably write about it in another piece.)
If, after carefully considering the implications, you decide it is indeed a good idea to pull these 'outsiders' into your initial dataset, you may use the Cascading Citation Expansion strategy to expand your dataset incrementally. This way you can still apply some selection criteria to keep the relevance at a reasonable level, as opposed to opening your gate for the entire ocean to flood in.
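The general idea of expanding step by step while filtering for relevance can be sketched as follows. This is only a toy illustration under stated assumptions, not the Cascading Citation Expansion procedure as implemented in CiteSpace or described in the paper listed under Further Reading: it only follows citing papers, the graph is made up, and the relevance rule `cites_at_least_two` is a hypothetical stand-in for whatever selection criteria you choose.

```python
def expand_once(dataset, graph, is_relevant):
    """Add outside papers that cite something in the dataset and pass the relevance test."""
    candidates = {
        paper for paper, refs in graph.items()
        if paper not in dataset and refs & dataset
    }
    return dataset | {p for p in candidates if is_relevant(p, graph, dataset)}


def cascade(seed, graph, is_relevant, max_rounds=3):
    """Repeat the expansion until nothing new is admitted or max_rounds is reached."""
    current = set(seed)
    for _ in range(max_rounds):
        expanded = expand_once(current, graph, is_relevant)
        if expanded == current:
            break
        current = expanded
    return current


def cites_at_least_two(paper, graph, dataset):
    """Hypothetical relevance rule: the paper must cite at least two papers already in the dataset."""
    return len(graph[paper] & dataset) >= 2


# Toy graph: citing paper -> set of cited papers. All identifiers are hypothetical.
graph = {
    "A": {"P"},
    "B": {"P"},
    "X": {"P", "A"},
    "Y": {"P"},
    "Z": {"X", "Y"},
    "W": {"X", "A"},
}

expanded = cascade({"P", "A", "B"}, graph, cites_at_least_two)
print(sorted(expanded))
```

Here X is admitted in the first round and W in the second, once X has joined the dataset, while Y and Z never meet the threshold and stay outside. A stricter or looser rule changes how far the dataset grows, which is the trade-off between coverage and relevance discussed above.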
In summary, the answer to the local vs global citation score discrepancy depends on how you want to delineate the scope of the topic of interest and how you want to construct your dataset so that it is as representative and as efficient as possible. In other words, the optimal dataset would be the smallest in size with the highest relevance.

Further Reading:
Chen, C. and Song, M. (2019) Visualizing a Field of Research: A Methodology of Systematic Scientometric Reviews. PLoS One 14 (10), e0223994.
https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0223994

Posted by Chaomei Chen 2020-07-30
