I'm trying to learn how to use Galago, and I stumbled upon an issue that I cannot seem to solve on my own. I have a file with several trectext-formatted documents, like this one:
<DOC>
<DOCNO>ug7v899j</DOCNO>
<TITLE>Clinical features of culture-proven Mycoplasma pneumoniae infections at King Abdulaziz University Hospital, Jeddah, Saudi Arabia</TITLE>
<TEXT>OBJECTIVE: This retrospective chart review describes the epidemiology and clinical features of 40 patients with culture-proven Mycoplasma pneumoniae infections at King Abdulaziz University Hospital, Jeddah, Saudi Arabia. METHODS: Patients with positive M. pneumoniae cultures from respiratory specimens from January 1997 through December 1998 were identified through the Microbiology records. Charts of patients were reviewed. RESULTS: 40 patients were identified, 33 (82.5%) of whom required admission. Most infections (92.5%) were community-acquired. The infection affected all age groups but was most common in infants (32.5%) and pre-school children (22.5%). It occurred year-round but was most common in the fall (35%) and spring (30%). More than three-quarters of patients (77.5%) had comorbidities. Twenty-four isolates (60%) were associated with pneumonia, 14 (35%) with upper respiratory tract infections, and 2 (5%) with bronchiolitis. Cough (82.5%), fever (75%), and malaise (58.8%) were the most common symptoms, and crepitations (60%), and wheezes (40%) were the most common signs. Most patients with pneumonia had crepitations (79.2%) but only 25% had bronchial breathing. Immunocompromised patients were more likely than non-immunocompromised patients to present with pneumonia (8/9 versus 16/31, P = 0.05). Of the 24 patients with pneumonia, 14 (58.3%) had uneventful recovery, 4 (16.7%) recovered following some complications, 3 (12.5%) died because of M pneumoniae infection, and 3 (12.5%) died due to underlying comorbidities. The 3 patients who died of M pneumoniae pneumonia had other comorbidities. CONCLUSION: our results were similar to published data except for the finding that infections were more common in infants and preschool children and that the mortality rate of pneumonia in patients with comorbidities was high.</TEXT>
</DOC>
... several other documents follow in the same file
I ran the following command:
$ galago-3.19-bin/bin/galago build --inputPath=title-and-abstracts.gz --indexPath=./index --fileType=trectext --tokenizer/fields+TITLE --tokenizer/fields+TEXT --tokenizer/fields+DOCNO
The output of galago is:
Created executor: org.lemurproject.galago.tupleflow.execution.LocalCheckpointedStageExecutor@662f5666
Running without server!
Use --server=true to enable web-based status page.
/home/user/galago-experiments/title-and-abstracts.gz detected as trectext
Stage inputSplit completed with 0 errors.
May 26, 2021 6:22:04 PM org.lemurproject.galago.core.parse.UniversalParser process
INFO: Processing split: /home/user/galago-experiments/title-and-abstracts.gz with: org.lemurproject.galago.core.parse.TrecTextParser
May 26, 2021 6:22:09 PM org.lemurproject.galago.core.parse.UniversalParser process
INFO: Processed 0 total in split: /home/user/galago-experiments/title-and-abstracts.gz with class org.lemurproject.galago.core.parse.TrecTextParser
Stage parsePostings completed with 0 errors.
Stage writeExtentPostings completed with 0 errors.
Stage writeExtentPostings-krovetz completed with 0 errors.
Stage writeFields completed with 0 errors.
Stage writeExtents completed with 0 errors.
Stage writeNames completed with 0 errors.
Stage writeCorpusKeys completed with 0 errors.
Stage writeLengths completed with 0 errors.
Stage writePostings completed with 0 errors.
Stage writePostings-krovetz completed with 0 errors.
Stage writeNamesRev completed with 0 errors.
Done Indexing.
- 0.00 Hours
- 0.14 Minutes
- 8.48 Seconds
Documents Indexed: 0.
No matter what I do, I always get 0 documents indexed. I've tried supplying a single file containing a single document, running with and without the --fileType flag, and using both the compressed and the uncompressed file, but the result is always the same. Can anyone point me in the right direction to index my collection? Thanks.