Dr. Chen,
I suppose this topic has been explained before, but I haven't found it.
Is it possible to work with records from WoS and Scopus at the same time?
I have seen that records from Scopus must be processed in CiteSpace to be converted to WoS format.
Can these converted records be used together with other WoS records?
Thanks
Miquel
I suppose you can do so by putting both the WoS files and the converted Scopus files in the data folder of .citespace (your input data folder for the current project).
I will give it a try. Thanks!
Miquel
That seems to work, but I would like to know Dr. Chen's opinion: does he recommend this option, or is it necessary to do something more before merging WoS and Scopus records?
Thanks
Miquel
Any recommendations for better deduplication here? Scopus is all uppercase, and WoS is not, leading to duplicates staying in the final dataset. Example:
One option is to build an alias field in CiteSpace where you specify which duplicates to merge. This is easy if you have only a few cases, but a lot of work if you have many. Maybe Dr. Chen can suggest a better idea.
Yes, the idea is of course that this could be done algorithmically. E.g. EndNote does a pretty good job with this, but you can't export the deduped records in a bibliometric format. The most sophisticated genuinely bibliometric solution to the deduping nightmare I know is VantagePoint. But that's a commercial solution. It's of course totally scandalous that the providers of the (very expensive) databases (especially Elsevier and Clarivate) get away with such lousy practices: not providing DOIs (which would make deduping a breeze), and the poor INTERNAL consistency and quality even within these datasets (titles sometimes uppercase, sometimes not; inconsistent rules for first names; etc.). Anyway, things ARE starting to look up with open abstracts and open metadata, but for the time being deduplication remains a massive pain in the rear end. But I am pretty confident Chaomei can 'fix' at least the matching algorithm to ignore case. And I really wish that as a community we could then also come up with better selection algos (e.g. for abstracts, first names, etc., select the version with the most characters) AND also the ability to pull in metadata from Crossref and other emerging solutions.
Having said that, I realize that Chaomei has a lot of other items on his to-do list. And that this is really an 'edge' case for people who are trying to take the recall issue seriously and therefore want to go the extra mile to get the full scholarly record into CiteSpace: "No pub left behind!" :) But hopefully, at SOME point...
In general, integrating bibliographic records from distinct sources requires matching records at two levels: the article level and the cited reference level.
At the article level, matching is straightforward if DOIs are available. If not, sampling different bits and pieces of a record and comparing those is a good choice.
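One possible reading of the article-level advice, as a hedged Python sketch: use the DOI as the match key when there is one, otherwise build a key from a few pieces of the record. The field names (`doi`, `first_author`, `year`, `title`) and the choice of pieces (first author's surname, year, first five title words) are assumptions made for this example, not CiteSpace's actual matching rule.

```python
import re


def match_key(record):
    """Return a DOI-based key if possible, else a key sampled from other fields."""
    doi = record.get("doi", "").strip().casefold()
    if doi:
        return "doi:" + doi
    # Surname is taken as the first token, so "DOE J" and "Doe, Jane" agree.
    surname = re.split(r"[,\s]+", record.get("first_author", "").strip().casefold())[0]
    year = record.get("year", "").strip()
    title_words = re.findall(r"\w+", record.get("title", "").casefold())
    return "|".join([surname, year, " ".join(title_words[:5])])


# Invented records: the same paper as it might appear in two exports.
scopus_like = {"first_author": "DOE J", "year": "2018",
               "title": "A STUDY OF CITATION NETWORKS IN SCIENCE"}
wos_like = {"first_author": "Doe, Jane", "year": "2018",
            "title": "A study of citation networks in science."}
print(match_key(scopus_like) == match_key(wos_like))
```

Note that a record carrying a DOI in one export but not in the other gets two different keys here, so in practice you may want to compute the sampled key for every record as a second pass.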
At the cited reference level, things can differ wildly: the same reference can mutate beyond recognition, and beyond even the most creative imagination.
If you are dealing with a small number of references, the easiest solution is to use the alias method in CiteSpace. You specify variants of the same reference one by one for CiteSpace to treat them as the same reference.
Currently, CiteSpace provides another function to scale up the merge. However, this is not a deterministic method and it may lead to mismatched cases, so human inspection is strongly recommended. This is essentially the method I used in my 2018 paper on Gene Garfield's scholarly impact: https://link.springer.com/article/10.1007/s11192-017-2594-5
I probably won't have time to revisit this until either Thanksgiving or Christmas, but in the meantime, you can try the following hybrid method, which requires a MySQL server running at localhost.
Suppose you have a Scopus dataset A and a WoS dataset B.
1. Load both A and B into the same project in the database through Data > Import/Export > Database
2. Use References > resolve references
3. Once it is completed, use the "Save Results" button to save the content to a file named citespace.alias in the project folder for the merged data project.
4. Here you may want to inspect the citespace.alias file manually in a text editor and make changes as you see fit. Each line contains two parts, separated by #. The first part is the primary alias, which is the one to retain. The second part is the variant that will be converted to the primary alias at runtime in CiteSpace.
5. Configure the project folder, which contains the citespace.alias file, and the data folder, which contains both A and B. Then press the GO! button.
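For the manual inspection in step 4, a small script can help before opening the file in an editor. The sketch below relies only on the format described above (two parts per line, separated by #, primary first). The UTF-8 encoding, the helper names, and the sample lines are assumptions for illustration, and it assumes the reference strings themselves contain no # character.

```python
from collections import defaultdict
from pathlib import Path


def parse_alias_lines(lines):
    """Group variants under their primary alias and collect malformed lines."""
    groups = defaultdict(list)
    problems = []
    for number, line in enumerate(lines, start=1):
        text = line.strip()
        if not text:
            continue  # ignore blank lines
        primary, separator, variant = text.partition("#")
        if not separator or not primary.strip() or not variant.strip():
            problems.append((number, text))  # no '#' found, or one side is empty
            continue
        groups[primary.strip()].append(variant.strip())
    return dict(groups), problems


def load_alias_file(path):
    """Read an alias file (UTF-8 assumed) and parse it."""
    return parse_alias_lines(Path(path).read_text(encoding="utf-8").splitlines())


# Invented sample lines in the "primary#variant" layout.
sample = [
    "Doe J, 2018, J EXAMPLE, V1, P1#DOE J, 2018, J EXAMPLE, V1, P1",
    "Doe J, 2018, J EXAMPLE, V1, P1#Doe J., 2018, J. Example, V1",
    "a line without a separator",
]
groups, problems = parse_alias_lines(sample)
for primary, variants in groups.items():
    print(primary, "<-", len(variants), "variant(s)")
print("lines to check by hand:", problems)
```

Listing the variants per primary alias makes it easier to spot a mismatched pair, which matters because the resolve step is not deterministic.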
I will put it on my calendar to make a video about this during Thanksgiving/Christmas. Let me know if you have any feedback.