
WoS and Scopus records

2021-03-15
2021-11-13
  • Miquel Angel Plaza

    Dr. Chen,
    I suppose this topic has been explained before, but I haven't found it.
    Is it possible to work with records from WoS and Scopus simultaneously?
    I have seen that records from Scopus must be processed in CiteSpace to be converted to a WoS format.
    Can these converted records be used at the same time as other WoS records?
    Thanks
    Miquel

     
  • Md. Mahiuddin Sabbir

    I suppose you can do so by putting both the WoS files and the converted Scopus files in the data folder of .citespace (your input data folder for the current project).

     
  • Miquel Angel Plaza

    I will give it a try. Thanks!
    Miquel

     
  • Miquel Angel Plaza

    It seems to work, but I would like to know Dr. Chen's opinion: does he recommend this option, or is it necessary to do something more before merging WoS and Scopus records?
    Thanks

    Miquel

     
  • Stephan De Spiegeleire

    Any recommendations for better deduplication here? Scopus is all uppercase, and WoS is not, leading to duplicates staying in the final dataset. Example:

     
  • Miquel Angel Plaza

    One option is to build an alias field in CiteSpace where you can specify which duplicates to merge. This is easy if you have only a few cases, but if you have a lot of cases, it is a lot of work. Maybe Dr. Chen can suggest a better idea.

     
  • Stephan De Spiegeleire

    Yes, the idea is of course that this could be done algorithmically. E.g. EndNote does a pretty good job with this, but you can't export the deduped records in a bibliometric format. The most sophisticated genuinely bibliometric solution to the deduping nightmare I know is VantagePoint. But that's a commercial solution. It's of course totally scandalous that the providers of the (very expensive) databases (especially Elsevier and Clarivate) get away with such lousy practices: not providing DOIs (which would make deduping a breeze), and the poor INTERNAL consistency and quality of these datasets (titles sometimes uppercase, sometimes not, inconsistent rules for first names, etc.). Anyway, things ARE starting to look up with open abstracts and open metadata, but for the time being deduplication remains a massive pain in the rear end. But I am pretty confident Chaomei can 'fix' at least the matching algorithm to ignore case. And I really wish that as a community we could then also come up with better selection algos, e.g. for abstracts, first names, etc., select the version with the most chars, AND also the ability to pull in metadata from Crossref and other emerging solutions.
    Having said that, I realize that Chaomei has a lot of other items on his to-do list. And that this is really an 'edge' case for people who are trying to take the recall issue seriously and therefore want to go the extra mile to get the full scholarly record into CiteSpace: "No pub left behind!" :) But hopefully, at SOME point...
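
    To make the two ideas above concrete (ignore case when matching, then keep the version of each field with the most characters), here is a minimal Python sketch. It is not how CiteSpace works internally; the record layout (plain dicts with "title", "year", "author", "abstract" keys) and the sample records are made up for illustration.

```python
from collections import defaultdict


def title_key(title):
    """Case- and punctuation-insensitive key for grouping likely duplicates."""
    return "".join(ch for ch in title.casefold() if ch.isalnum())


def merge_group(records):
    """For each field, keep the longest non-empty value found in the group.

    On a tie in length, the value seen first is kept.
    """
    merged = {}
    for rec in records:
        for field, value in rec.items():
            value = (value or "").strip()
            if len(value) > len(merged.get(field, "")):
                merged[field] = value
    return merged


def dedupe(records):
    """Group records by (normalized title, year) and merge each group."""
    groups = defaultdict(list)
    for rec in records:
        key = (title_key(rec.get("title", "")), rec.get("year", ""))
        groups[key].append(rec)
    return [merge_group(group) for group in groups.values()]


# Hypothetical records: the same paper as an all-uppercase export
# and as a mixed-case export.
records = [
    {"title": "A HYPOTHETICAL STUDY OF CITATION MAPS", "year": "2020",
     "author": "SMITH J", "abstract": ""},
    {"title": "A Hypothetical Study of Citation Maps", "year": "2020",
     "author": "Smith, Jane", "abstract": "An illustrative abstract."},
]
deduped = dedupe(records)
```

    Grouping on title and year alone can merge distinct papers that share a title, so a human check of the merged groups is still advisable.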

     
    • Chaomei Chen

      Chaomei Chen - 2021-11-13

      In general, integrating bibliographic records from distinct sources requires matching records at two levels: the article level and the cited reference level.
      At the article level, if DOIs are available, then matching is straightforward. If not, then sampling from different bits and pieces of a record is a good choice.
      At the cited reference level, things can differ wildly, and the same reference can evolve beyond recognition and beyond the most creative imagination.
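
      As a rough illustration of article-level matching, the sketch below builds a match key that prefers the DOI and otherwise falls back on a few pieces of the record. The field names ("doi", "authors", "year", "title") and the choice of first-author surname, year and the first 40 normalized title characters are assumptions made for this example, not CiteSpace's actual rule.

```python
import re


def normalize(text):
    """Lowercase and drop everything except ASCII letters and digits."""
    return re.sub(r"[^a-z0-9]+", "", text.casefold())


def match_key(record):
    """Return a DOI-based key when possible, else a composite key."""
    doi = (record.get("doi") or "").strip().casefold()
    if doi:
        # Strip common DOI prefixes so URL and bare forms compare equal.
        doi = re.sub(r"^(https?://(dx\.)?doi\.org/|doi:)", "", doi)
        return ("doi", doi)
    first_author = (record.get("authors") or [""])[0]
    # First token of the first author, e.g. "Smith, J." and "SMITH J" -> "smith".
    surname = normalize(re.split(r"[,\s]+", first_author.strip())[0])
    title = normalize(record.get("title", ""))[:40]
    return ("composite", surname, str(record.get("year", "")), title)


# Hypothetical records without DOIs, as two exports might present them.
a = {"authors": ["Smith, J."], "year": 2020,
     "title": "A hypothetical study: of things"}
b = {"authors": ["SMITH J"], "year": "2020",
     "title": "A HYPOTHETICAL STUDY OF THINGS"}
same_article = match_key(a) == match_key(b)
```

      The ASCII-only normalization drops accented letters, and multi-word surnames are reduced to their first word, so the composite key is best treated as a candidate match to be inspected rather than a final decision.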
      If you are dealing with a small number of references, the easiest solution is to use the alias method in CiteSpace. You specify variants of the same reference one by one for CiteSpace to treat them as the same reference.
      Currently, CiteSpace provides another function to scale up the merge. However, this is not a deterministic method and it may lead to mismatches, so human inspection is strongly recommended. This is essentially the method I used in my 2018 paper on Gene Garfield's scholarly impact: https://link.springer.com/article/10.1007/s11192-017-2594-5
      I probably won't have time to revisit this until either Thanksgiving or Christmas, but in the meantime, you can try the following hybrid method, which requires a MySQL server at localhost.
      Suppose you have a Scopus dataset A and a WoS dataset B.
      1. Load both A and B to the same project on the database through Data>Import/Export>Database
      2. Use References > resolve references
      3. Once it is completed, use the "Save Results" button to save the content to a file named citespace.alias in the project folder for the merged data project.
      4. Here you may want to inspect the citespace.alias file manually in a text editor and make changes as you see fit. Each line contains two parts, separated by #. The first one is the primary alias, which is the one to retain. The second one is the variant that will be converted to the primary alias at runtime in CiteSpace.
      5. Configure the project folder, which contains the citespace.alias file, and the data folder, which contains both A and B. Then press the GO! button.
      I will put this on my calendar to make a video on this during Thanksgiving/Christmas. Let me know if you have any feedback.
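
      For the manual inspection in step 4, a small script can flag suspicious lines before you open the file in an editor. This is only a sketch based on the format described above (primary alias, then #, then the variant); anything beyond that, such as how CiteSpace treats blank lines, is an assumption, and the sample lines are invented.

```python
def check_alias_lines(lines):
    """Parse 'primary#variant' lines and report lines that look wrong.

    Returns (mapping, problems), where mapping is variant -> primary and
    problems is a list of (line_number, message) tuples.
    """
    mapping = {}
    problems = []
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue  # assumption: blank lines are harmless
        primary, sep, variant = line.partition("#")
        primary, variant = primary.strip(), variant.strip()
        if not sep or not primary or not variant:
            problems.append((number, "expected 'primary#variant'"))
        elif mapping.get(variant, primary) != primary:
            problems.append(
                (number, "variant already mapped to a different primary"))
        else:
            mapping[variant] = primary
    return mapping, problems


# Invented sample lines; a real file would be read with
# open("citespace.alias", encoding="utf-8").
sample_lines = [
    "Smith J, 2010, J HYPOTHETICAL STUD, V1, P1#SMITH J, 2010, J HYPOTH STUD, V1, P1",
    "",
    "a line without a separator",
    "Jones K, 2011, J HYPOTHETICAL STUD, V2, P5#SMITH J, 2010, J HYPOTH STUD, V1, P1",
]
mapping, problems = check_alias_lines(sample_lines)
```

      Here line 3 is reported for its missing separator and line 4 because its variant was already assigned to another primary alias; those are the lines worth a closer look by hand.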

       
