One of the (many!) unique vizzes of CiteSpace that intrigue me is the 'entropy' one. But I don't yet fully grasp it. The way I (naively) interpret this metric is that it tries to capture the "wow factor" (more than just 'uncertainty', which is what your 2008 article seems to emphasize): do new publications contain information that (even just lexically, based on the terms in their abstracts, titles or keywords) diverges from the 'cumulative knowledge' in a field, a discipline, etc. in such a way that it (potentially) enhances our understanding? So new/different pieces of information make the reader go 'wow, I never thought about THAT / looked at it from THAT point of view'.
And so I interpret this viz from your 2008 paper as showing that after the Oklahoma City bombing and 9/11, the terrorism literature became 'richer', more diverse in its vocabulary, and that at other moments in this subfield, the literature became poorer / LESS surprising. Is that intuition even halfway accurate? BTW - our own vizzes on our corpora just look like this. So they don't have the thicker red line for the spike in entropy. Is that normal? I.e. did you add those thick red lines manually, or did we miss something in the settings that prevents us from seeing the thick line(s)?
But so in one of your presentations, you also use this slide, in which you compare 4 different information indices applied to the same dataset: frequency-based, entropy-based, relative entropy-based, and information bias-based. Here I interpret frequency as just salience ("how important is this term in this dataset?") and entropy as identifying the terms that 'enriched' the scientific discussion most. But could you spell out the intuition behind 'relative entropy' in some more detail, and also its difference from 'non-relative' (?) entropy? And the same for information bias?
Finally, this viz is also hard to interpret. The y-axis, at least, is the (to me not yet intelligible) 'relative entropy', but what's on the x- and z-axes?
I think I speak on behalf of many users of CiteSpace if I say that the video you shared on some of the functionality of CiteSpace was fantastically useful. So the further uptake of CiteSpace would, IMO, be greatly enhanced by a number of 10-15' explanatory videos on various parts of CiteSpace - like the 'entropy' one I mention here.
Last edit: Stephan De Spiegeleire 2020-12-26
Hi Stephan, thanks for posting. Good suggestions on making short videos to explain the underlying theories, concepts, and metrics behind CiteSpace. I will put it on my agenda. In the meantime, here are some short answers:
Figure 2: information entropy measures the uncertainty of a concept/topic in the sense of its semantic scope. An increased entropy after a major event indicates that the possible ways for that concept to play out, e.g., in terms of the number of distinct semantic contexts, increased. In other words, we would now have to ask more questions before we can determine the exact role of the concept in the literature, which is consistent with the common way of explaining the meaning of information entropy.
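(A minimal sketch of the idea, purely for illustration and not CiteSpace's actual code: if we represent a concept's semantic scope in a given time slice as a distribution over the distinct contexts - say, co-occurring terms - in which it appears, its Shannon entropy grows as that distribution spreads over more, and more evenly used, contexts. The counts below are made up.)

```python
import math

def shannon_entropy(context_counts):
    """Shannon entropy (in bits) of a concept's distribution over semantic contexts."""
    total = sum(context_counts.values())
    probs = [c / total for c in context_counts.values() if c > 0]
    return -sum(p * math.log2(p) for p in probs)

# Hypothetical counts of contexts in which 'physical injuries' co-occurs, before and
# after a major event that activates many previously quiet topics.
before = {"emergency care": 40, "trauma": 10}
after = {"emergency care": 20, "trauma": 15, "mass casualties": 18,
         "bioterrorism": 12, "healthcare capacity": 14, "policy": 9}

print(round(shannon_entropy(before), 2))  # ~0.72 bits: narrow semantic scope
print(round(shannon_entropy(after), 2))   # ~2.54 bits: broader scope, more uncertainty
```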
The last figure on relative entropy is a comparison of the topics over the years. It tries to answer this question: to what extent do the topics in a particular year differ from those in another year? If the plane is flat, then not much changes over time; if the landscape is rugged, the topics are unstable over time. So this can be seen as a portrait of the stability of the underlying literature in terms of focal topics.
Last edit: Chaomei Chen 2020-12-26
Thanks Chaomei! But that still doesn't answer all of my questions:
Last edit: Stephan De Spiegeleire 2020-12-26
The wow interpretation is not wrong. It covers a major scenario, but it is not all, which is why I emphasized the uncertainty; and this uncertainty differs from the one I use in my work presented in Representing Scientific Knowledge: The Role of Uncertainty. That is another story for later. The uncertainty defined by information entropy in this context reflects the expanding dimensionality of the underlying concept. For example, the Oklahoma City bombing significantly broadened the implications of 'physical injuries'. The major 'wow' effect was due to the new recognition that 'physical injuries' may reach a scale that exceeds the capacity of the emergency response system and the entire healthcare system. This new recognition triggered new research on the policy and practical implications of biological and chemical weapons, which would be one of several foreseeable scenarios from the above standpoint. The stress of the current COVID-19 pandemic on the healthcare system is unfortunately the most recent example along this line of worries.
So in this way, the entropy can flag an expansion, or sometimes an explosion, of the dimensionality of what we think may be possible. What we can say is that a sharp increase in the entropy of a concept is a sign that a large number of previously quiet or unnoticed topics have been activated and elevated. One may certainly go 'wow' at one of these newly found connections. The concept of uncertainty is more fundamental for explaining what we see than identifying a surprise, because a surprise is just one of the possibly many lessons learned from the indicator. To elaborate further, we can refer to the notion of a concept tree rooted at each concept, such as 'physical injuries'. The size of the tree will expand after a large increase of entropy. The number of branches will become larger, corresponding to the increased dimensionality. If you focus on the wow effect, you probably capture a good part of the pragmatic value.
In layman's terms, relative entropy measures the difference between two probability distributions: it tells you how far one distribution diverges from the other. It is commonly used to compare two probability distributions.
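(Again just an illustrative sketch, not CiteSpace's internals: relative entropy in the standard Kullback-Leibler sense, computed here between two made-up keyword distributions for two years.)

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """Relative entropy D(P || Q) in bits between two distributions over the same vocabulary.

    p, q: dicts mapping term -> probability. eps smooths terms that Q never uses,
    since D(P || Q) blows up when P assigns mass to a term with zero mass in Q.
    """
    vocab = set(p) | set(q)
    return sum(p.get(t, 0.0) * math.log2((p.get(t, 0.0) + eps) / (q.get(t, 0.0) + eps))
               for t in vocab if p.get(t, 0.0) > 0)

# Hypothetical keyword distributions for two years of a corpus.
year_a = {"terrorism": 0.5, "hostages": 0.3, "kibbutz": 0.2}
year_b = {"terrorism": 0.6, "hostages": 0.1, "kibbutz": 0.3}
print(round(kl_divergence(year_a, year_b), 2))  # ~0.23 bits: year A seen against year B
```

Note that, unlike plain entropy, this is a pairwise and asymmetric quantity: D(P || Q) is generally not the same as D(Q || P).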
Information bias highlights the 'odd' ones in your data. It is useful when you do not want to follow the 'more like this' type of recommendations.
'Thick red' line: I will check the code. Probably I need to polish it and release it. It will look the same as what you see in the citation history visualization with a period of burst. If I remember correctly, you are among the few who have explored this part of CiteSpace.
Last figure: x and y are years of publication, and the z value is the year-to-year relative entropy across the entire vocabulary, i.e. all the keywords extracted from your dataset. I will also put this on my to-do list to check this function.
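(For concreteness, here is how I read that description - an illustrative sketch only, not the actual CiteSpace implementation: for every pair of years, compare the two keyword distributions with relative entropy, and the resulting matrix gives the z value of each (x, y) cell of the surface.)

```python
import math
from collections import Counter

def kl(p, q, eps=1e-9):
    """Relative entropy D(P || Q) in bits; eps smooths terms missing from Q."""
    vocab = set(p) | set(q)
    return sum(p.get(t, 0.0) * math.log2((p.get(t, 0.0) + eps) / (q.get(t, 0.0) + eps))
               for t in vocab if p.get(t, 0.0) > 0)

def keyword_distribution(keywords):
    """List of keywords from one year's papers -> probability distribution."""
    counts = Counter(keywords)
    total = sum(counts.values())
    return {k: c / total for k, c in counts.items()}

def relative_entropy_surface(keywords_by_year):
    """Year-by-year matrix of D(P_x || P_y): the z values of the 3D landscape."""
    years = sorted(keywords_by_year)
    dists = {y: keyword_distribution(keywords_by_year[y]) for y in years}
    return years, [[kl(dists[x], dists[y]) for y in years] for x in years]

# Toy corpus, purely hypothetical: flat patches of the surface mean two years share
# much of their vocabulary; peaks appear where the vocabularies diverge sharply.
toy = {
    1994: ["terrorism", "hostages", "terrorism", "negotiation"],
    1995: ["terrorism", "physical injuries", "emergency response", "terrorism"],
    1996: ["terrorism", "physical injuries", "bioterrorism", "policy"],
}
years, surface = relative_entropy_surface(toy)
for x, row in zip(years, surface):
    print(x, [round(z, 2) for z in row])
```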
Thanks Chaomei! Fascinating... I do think I may have experimented with more parts of CiteSpace than many, but I still keep discovering these new, truly amazing layers of functionality that are really the dream of any analyst who wants to REALLY dig deeper, not only into more technical bibliometric analysis but also into how a careful mapping of epistemic landscapes can help us better leverage the 'state of a field'. Including things like epistemic holes (what you call structural holes), epistemic progress (or regress) in fields, etc.
But so on entropy (and increased dimensionality) - where exactly in CiteSpace can we find out that, in the case of terrorism, for instance, it was terms like 'physical injuries' that 'branched out'? In the step-backward/forward walkthrough of clusters in the viz (new topics that emerge)? In changes in the year-to-year cluster terms ('new' terms in some clusters)? Somewhere else (because it seems to me that this has nothing to do with co-citation networks, but just with changes in all the extracted n-grams over time)? Also, would it be possible to generate some of the key terms behind some of the major spikes in entropy? (And if possible also declines - e.g. terms that seem to atrophy or become sclerotic, because these can be equally revealing.)
On relative entropy - I still don't quite get it. How would you explain to your mother-in-law (for instance :)) the difference between the entropy, relative entropy and information bias graph vizzes in this slide? What does it really mean that forensic science is the top term on entropy, kibbutz on relative entropy and hostages on information bias?
On the 3D viz on entropy (on terrorism) - I guess I'll wait for the video. But to me, optically, my first analytical impression from that viz is that it seems like the relative entropy was the highest in the early 90s (which makes sense because of 9/11). But what I can NOT intuit is what it means that the 1990 values for the z-axis seem to be in the '10' range in the early 90s, then go up to something like 13 (the y-axis marks aren't labelled, so it's hard to tell), and then go down again from about 1994 or so (the x-axis marks also aren't (re-)labelled on top, so that is also hard to see). Or that the y-values for the more recent years on the z-axis are the highest (in red); then seem to be going down for a few years, then up again by the time that (presumably) the first academic articles about 9/11 came out, but then pretty much down again until 2006. I'm not sure whether I'm making myself understood here. But again, the 'big picture' of mostly declining entropy since 9/11 I 'get', because the whole 'plane' seems to be going down as your eyes pan from the back, left and right, to the bottom right (the most recent years). But I don't understand, for instance, what the very high 'red' peaks on the front left really mean, and how they compare to the very low navy blue patch in the back on the right.
on the thick red line - great, thanks!
Last edit: Stephan De Spiegeleire 2020-12-27
There are also some detailed introductions and examples of these and relevant metrics in the following chapter of our 2017 book:
Chen, C., & Song, M. (2017). Measuring Scholarly Impact. In: Representing Scientific Knowledge. Springer. https://doi.org/10.1007/978-3-319-62543-0_4
https://link.springer.com/chapter/10.1007/978-3-319-62543-0_4
Great. Thanks Chaomei! The entire book looks great. I'll definitely take a closer look at this. But it also reminded me of another issue: export of vizzes in vector format! To give but one example from the book (p. 82) - this is what the timeline view looks like in raster format - it's just not legible. Would it be hard to give us some options for exporting vizzes from CiteSpace in SVG (or EPS or PDF or AI) format? And on a related topic: the ability to export vizzes with a transparent (as opposed to a white or black) background would also be great.
Last edit: Stephan De Spiegeleire 2021-01-07
Still picking up on the regular Entropy viz within CiteSpace - is there a way to make sure the x-axis becomes visible? Right now it always ends up like this.
Oh, and also - I have yet to find a piece of Windows software that can open CiteSpace gml files. I've tried many, but there is always some invalid gml-parsing error. Do you have any suggestions for that?
Thanks!
Last edit: Stephan De Spiegeleire 2022-06-06
I will add these two to my to-do list.
For the gml case, what was the original data source, WoS? Can you find the error message and the corresponding lines in the gml file?
I can't find an example now; but next time I encounter this problem, I'll look into it and will share the gml.
On the actual years not showing up on the x-axis in the entropy viz - I'm not sure whether this is related, and whether you're aware of this, but the years are not visible in the main graph viz either - see