One of the things I like about CiteSpace is that it goes beyond 'just' bibliometric analysis by also allowing us to do some analysis of full-text files. What confuses me a bit, however, is where these functions live within the program. Even without building any networks, there is a 'Text' option in the main interface. SOME of these functions let us select a single text file, or multiple text files from one or more folders, and work with those files directly (e.g. extract terms). OTHERS use the folder set in the project properties to find already extracted terms and then work with those. But these projects require files in WoS format, so they will not work on, for instance, just one large text file.
So my question is whether it is possible, for instance, to generate n-grams from just one text file or a set of text files.
Last edit: Stephan De Spiegeleire 2021-01-07
I may not have explained this intelligibly enough. So here's a 'second take' :)
Whenever we use WoS files, we do get great terms through the 'List Ranked Terms' option, as you can see in the project panes on the left (and especially bottom left) of my screenshot. These are bigrams (like Cold War) and trigrams (like nuclear deterrence morality). I'm not quite sure why no unigrams are included.
But whenever we use 'Extract Term from a FullText', we only get unigrams. Is there a setting somewhere that we are overlooking? Thanks...
Last edit: Stephan De Spiegeleire 2021-01-19
Update: writing it up like this gave me an idea about why there are no unigrams in the networks. I checked it, and I guess this is the answer to my own question.
But that still doesn't answer the question of how to specify the minimum and maximum n-gram length in the "Extract Terms from a FullText file" option.
Last edit: Stephan De Spiegeleire 2021-01-19
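In case it helps to make the question concrete: outside CiteSpace, generating n-grams with a chosen minimum and maximum length from one or more plain text files can be sketched in a few lines of Python. This is only an illustration of what I am after, not how CiteSpace extracts terms. The function names (`tokenize`, `ngram_counts`, `ngrams_from_files`) and the `min_n`/`max_n` parameters are my own hypothetical names, and the tokenizer is a deliberately naive assumption (lowercased runs of letters and digits, no stop-word filtering, n-grams allowed to cross sentence boundaries).

```python
import re
import sys
from collections import Counter
from pathlib import Path


def tokenize(text):
    """Naive tokenizer (an assumption): lowercase runs of letters/digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def ngram_counts(tokens, min_n=1, max_n=3):
    """Count every n-gram with min_n <= n <= max_n in a token list."""
    counts = Counter()
    for n in range(min_n, max_n + 1):
        # Slide a window of width n over the tokens.
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


def ngrams_from_files(paths, min_n=1, max_n=3):
    """Sum n-gram counts over one or more plain text files."""
    total = Counter()
    for path in paths:
        text = Path(path).read_text(encoding="utf-8", errors="ignore")
        total.update(ngram_counts(tokenize(text), min_n, max_n))
    return total


if __name__ == "__main__":
    # File paths are taken from the command line; bigrams and trigrams only.
    counts = ngrams_from_files(sys.argv[1:], min_n=2, max_n=3)
    for term, freq in counts.most_common(20):
        print(freq, term)
```

Setting `min_n=1` would include unigrams as well; a real term extractor would probably also filter stop words and respect sentence boundaries, which this sketch does not.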