Name | Modified | Size | Downloads / Week |
---|---|---|---|
medline_updates2017_from1052_to1071.tsv.gz | 2017-03-05 | 12.6 MB | |
medline_README.txt | 2017-03-05 | 2.0 kB | |
medline2017_base_and_updates_until_1051_part4.tsv.gz | 2017-03-04 | 66.1 MB | |
medline2017_base_and_updates_until_1051_part3.tsv.gz | 2017-03-04 | 170.8 MB | |
medline2017_base_and_updates_until_1051_part2.tsv.gz | 2017-03-04 | 167.8 MB | |
medline2017_base_and_updates_until_1051_part1.tsv.gz | 2017-03-04 | 163.8 MB | |
medline_updates2017_from0893_to1051.tsv.gz | 2017-03-01 | 37.4 MB | |
Totals: 7 Items | | 618.5 MB | 0 |

This folder contains gene mentions detected by GNAT in Medline abstracts. The file name of each archive indicates which chunk of Medline was analyzed; the chunks were obtained from either ftp://ftp.ncbi.nlm.nih.gov/pubmed/baseline/ or ftp://ftp.ncbi.nlm.nih.gov/pubmed/updatefiles/.

The command we ran was similar to the following:

    for f in /myfolder/pubmed/baseline/medline17n*.xml.gz ; do \
      bash scripts/annotateMedline_toTSV.sh $f --outfile $f.tsv ; \
    done

You can find that script inside the GNAT distribution; it wraps a call to the class gnat.client.AnnotateMedline_TsvOutput.

Format of results:

    #PMID    Start  Stop  Mention  Tool    CandidateIds     FinalId  GeneSymbol  Species  Confidence
    249317   65     67    CEN      GNAT    \N               1068     CETN1       9606     1.0
    1038438  64     66    GSA      GNAT    \N               2778     GNAS        9606     1.0
    1039012  40     43    ERIC     GNATGM  104355217;10460  \N       \N          \N       \N

Start/Stop: character offsets in the text; 0 refers to the first character. For Medline citations, we construct one chunk of text per citation for GNAT to analyze, as follows:
  1) concatenate title + " " + abstract
  2) if the title does not end in one of the punctuation marks . ! ? ; : then add a period to the end of the title
  3) if the title is enclosed in square brackets (which indicates that the original paper was published in a non-English journal), remove them

Mention: copy of the snippet detected as a gene name

GeneSymbol: official gene symbol (Entrez/HUGO)

CandidateIds: IDs of all genes that could potentially have the given 'Mention' as their name

FinalId: if GNAT was able to decide on one ID among all candidates, that ID is listed here

Species: 9606 for human. We ran GNAT on human gene names only. If this column is empty (NULL, \N), the mention matches a human gene name, but GNAT found indications that the article discusses the gene in another species. In that case the gene ID columns are also empty, since they would refer to human gene IDs.
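
The text construction and the offset convention described above can be sketched in Python as follows. This is a minimal illustration and not code taken from GNAT: the function names `build_text` and `extract_mention` are hypothetical, the order in which the three steps are applied is an assumption, and the inclusive Stop offset is inferred only from the sample rows (for example, the three-character mention CEN spans 65 to 67).

```python
PUNCTUATION = ".!?;:"


def build_text(title: str, abstract: str) -> str:
    """Rebuild the text chunk for one citation (hypothetical helper).

    Assumption: brackets are stripped before the punctuation check,
    so that "[Some title]" becomes "Some title." before concatenation.
    """
    title = title.strip()
    # Step 3: remove enclosing square brackets (non-English original).
    if title.startswith("[") and title.endswith("]"):
        title = title[1:-1].strip()
    # Step 2: make sure the title ends in a punctuation mark.
    if title and title[-1] not in PUNCTUATION:
        title += "."
    # Step 1: title, a single space, then the abstract.
    return title + " " + abstract


def extract_mention(text: str, start: int, stop: int) -> str:
    """Return the snippet at [start, stop].

    Assumption: Stop is inclusive, as the sample rows suggest.
    """
    return text[start:stop + 1]
```

Comparing `extract_mention(text, start, stop)` with the Mention column is a quick way to check whether a locally reconstructed text matches the offsets in these files; a mismatch would suggest that one of the assumptions above does not hold for that citation.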

Tool:
  - R=GNATGM=NER: GNAT was not able to decide on a final ID; only candidate IDs are given
  - N=GNAT=NEI: GNAT was able to decide on a final ID
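
To work with these files programmatically, a row can be parsed along the following lines. This is a sketch under stated assumptions, not part of the GNAT distribution: the names `GeneMention`, `parse_line`, and `iter_mentions` are hypothetical, columns are assumed to be tab-separated (as the .tsv extension suggests), candidate IDs are assumed to be separated by semicolons (as in the sample row), and the files are assumed to be UTF-8 encoded.

```python
import gzip
from dataclasses import dataclass
from typing import Iterator, List, Optional

NULL = r"\N"  # marker used for empty columns in the result files


@dataclass
class GeneMention:
    pmid: str
    start: int
    stop: int
    mention: str
    tool: str                  # "GNAT" or "GNATGM" in the sample rows
    candidate_ids: List[str]   # empty list if the column is \N
    final_id: Optional[str]
    gene_symbol: Optional[str]
    species: Optional[str]
    confidence: Optional[float]


def _nullable(value: str) -> Optional[str]:
    """Map the \\N marker (or an empty string) to None."""
    return None if value in ("", NULL) else value


def parse_line(line: str) -> Optional[GeneMention]:
    """Parse one row; return None for blank lines and the '#' header."""
    line = line.rstrip("\r\n")
    if not line or line.startswith("#"):
        return None
    fields = line.split("\t")
    if len(fields) != 10:
        raise ValueError(f"expected 10 columns, got {len(fields)}")
    (pmid, start, stop, mention, tool,
     candidates, final_id, symbol, species, confidence) = fields
    candidates = _nullable(candidates)
    confidence = _nullable(confidence)
    return GeneMention(
        pmid=pmid,
        start=int(start),
        stop=int(stop),
        mention=mention,
        tool=tool,
        candidate_ids=candidates.split(";") if candidates else [],
        final_id=_nullable(final_id),
        gene_symbol=_nullable(symbol),
        species=_nullable(species),
        confidence=float(confidence) if confidence else None,
    )


def iter_mentions(path: str) -> Iterator[GeneMention]:
    """Stream mentions from one gzipped result file."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            record = parse_line(line)
            if record is not None:
                yield record
```

For example, `iter_mentions("medline_updates2017_from1052_to1071.tsv.gz")` would stream the records of the smallest archive in this folder. Rows where `final_id` is `None` correspond to mentions for which only candidate IDs are available.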