refdb-devel Mailing List for RefDB (Page 2)
Status: Beta
Brought to you by: mhoenicka
Messages per month:

Year | Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec
---|---|---|---|---|---|---|---|---|---|---|---|---
2001 | | | | | | | | | | | 14 |
2002 | | | | 8 | 1 | 1 | 1 | 2 | 1 | | | 1
2003 | | 1 | 5 | 6 | 6 | 4 | 11 | | 3 | | | 174
2004 | 10 | 2 | | | 2 | 2 | | 1 | | | |
2005 | 2 | 6 | 11 | | | | 2 | | 25 | 18 | 16 | 19
2006 | 6 | | | 21 | 9 | 5 | 51 | 89 | 42 | 19 | 47 | 4
2007 | 8 | 1 | | 1 | | | 4 | 4 | 5 | | 7 | 4
2008 | | | | 14 | 1 | | | | | | | 2
2009 | | 21 | 8 | 5 | 6 | 2 | 5 | | 3 | 14 | |
2010 | 18 | 5 | | | 4 | 3 | | | | | |
2011 | | | 2 | | | | | | 4 | | |
2012 | | | | | | | | | | 9 | |
From: SourceForge.net <no...@so...> - 2010-02-04 23:54:12
Bugs item #2945806, was opened at 2010-02-04 09:48
Message generated for change (Comment added) made by mhoenicka
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=385991&aid=2945806&group_id=26091

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Torsten Bronger (bronger)
Assigned to: Nobody/Anonymous (nobody)
Summary: RISX of a conference entry should use "periodical name"

Initial Comment:
Currently the T2 field is used as the publication title in RISX exports of CONF entries. According to the Refman specs, this is the conference name; however, the proceedings title, which is in the JO/JF field, seems more appropriate.

----------------------------------------------------------------------

>Comment By: Markus Hoenicka (mhoenicka)
Date: 2010-02-05 00:54

Message:
I seem to be a bit dense here. Could you please provide an example RIS input file, along with the current risx output and the risx output that you'd prefer? I'll be happy to look into this.

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=385991&aid=2945806&group_id=26091
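To make the request concrete, here is a minimal Python sketch of the field choice proposed in the initial comment: for a CONF record, take the proceedings title from JF or JO and fall back to the conference name in T2 only when neither is present. The function name, the dict-of-tags record representation and the JF-before-JO order are assumptions for illustration; this is not how RefDB's risx export is implemented.

```python
from __future__ import annotations


def conf_publication_title(record: dict[str, str]) -> str | None:
    """Choose the publication title for a CONF record (hypothetical helper).

    Prefers the proceedings title (JF, then JO) and falls back to the
    conference name in T2 only if neither is present.
    """
    if record.get("TY") != "CONF":
        raise ValueError("expected a CONF record")
    for tag in ("JF", "JO", "T2"):
        value = record.get(tag, "").strip()
        if value:
            return value
    return None


record = {
    "TY": "CONF",
    "T2": "Hypothetical Conference on Examples",
    "JF": "Proceedings of the Hypothetical Conference on Examples",
}
print(conf_publication_title(record))  # Proceedings of the Hypothetical Conference on Examples
```

An example RIS file with current and desired risx output, as requested in the comment, would settle which of these fields the export should actually use.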
From: SourceForge.net <no...@so...> - 2010-02-04 08:48:23
Bugs item #2945806, was opened at 2010-02-04 09:48
Message generated for change (Tracker Item Submitted) made by bronger
Submitted By: Torsten Bronger (bronger)
Summary: RISX of a conference entry should use "periodical name"

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=385991&aid=2945806&group_id=26091
From: SourceForge.net <no...@so...> - 2010-01-30 09:02:10
Bugs item #2935197, was opened at 2010-01-19 19:59
Message generated for change (Comment added) made by akusmin
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=385991&aid=2935197&group_id=26091

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: refdbd
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: akusmin (akusmin)
Assigned to: Markus Hoenicka (mhoenicka)
Summary: AU field: Once capitalized, always capitalized

Initial Comment:
I add citations to my RefDB databases using the command "addref foo.ris", where foo.ris is the file containing the citations. When two citations in one file share the same author, and the author name is capitalized in the first citation (FOONAME,F.B) but not in the second (Fooname,F.B), then after the citations are added, the command getref returns the *capitalized* author name for both citations. Moreover, if three citations share the same author, with the AU field Fooname,Frank B. in the first, FOONAME,F.B in the second, and Fooname,F.B in the third, then after adding the references from this file, getref again returns FOONAME,F.B.

Why I am complaining: I get citations from ISI Web of Science, and as you know, author names and titles there are capitalized until 1995/1996. Thus some of my citations have capitalized names and some do not. When I make a bibliography for my LaTeX document, I sometimes see capitalized author names. Maybe this is already fixed in the SVN version? Could you please check? Thank you.

P.S. Attached is a zip file with three RIS files. In ex1.ris the citation has AU Stuhrmann, H.B; the second file contains one more citation (the very first) where AU is STUHRMANN,H.B; and the third file contains three citations, where the first has STUHRMANN,Heinrich.B.
----------------------------------------------------------------------

Comment By: akusmin (akusmin)
Date: 2010-01-30 09:02

Message:
I've just uploaded a zip file with samples and some extra information.

----------------------------------------------------------------------

Comment By: Markus Hoenicka (mhoenicka)
Date: 2010-01-25 21:06

Message:
This depends on the size of your data. The mailing list has an attachment size limit, so I wouldn't send large files this way. Either use the file uploader in this bug tracker, or send the data to me off-list. Take your time if you think your data will be more representative in a couple of weeks.

----------------------------------------------------------------------

Comment By: akusmin (akusmin)
Date: 2010-01-25 19:55

Message:
** That is, your suggested fix would catch only about 15% of the relevant cases, while not fixing the problem for the vast majority of cases **

Good point. OK, I give up. A filter is the best way.

Concerning ISI input data: I created a more or less representative sample of citations from ISI (in my research field: physical chemistry and physics). What's the best way to send you this file? The mailing list? Of course I continue to add new citations, so in the next several weeks I'll have more citations for another sample file.

----------------------------------------------------------------------

Comment By: Markus Hoenicka (mhoenicka)
Date: 2010-01-24 12:18

Message:
***The only problem is, of course, what if in the database there is only one citation with AU field FOONAME, we make a bibliography, and oops, we will have to correct the capitalization by hand.***

I've checked my database at work as a real-world example. It contains almost 1900 references and 6665 authors. Of these, 5560 (roughly 85%) occur only in a single reference. That is, your suggested fix would catch only about 15% of the relevant cases, while not fixing the problem for the vast majority of cases.
I still maintain that the proper approach is to clean up the input files. I'll be happy to develop a script for this particular case, either as an ISI import filter or as a general-purpose post-processor of RIS or risx data. I'd appreciate it if you could send me representative samples of ISI data.

----------------------------------------------------------------------

Comment By: akusmin (akusmin)
Date: 2010-01-23 18:00

Message:
1) *** Now assume that in a twist of fate you first add the all-caps version of the author and then the properly capitalized one. ***

I thought about it; I just did not mention how to take it into account. Besides, it seems I was not clear. Inside a refdbd database, every citation would have two fields: the AU field and a capitalized AU field, let's call it AU-CAP. Suppose the first citation with the author Fooname had the AU field FOONAME. Then both the AU and AU-CAP fields are set to FOONAME. However, if at some point a citation is added where the AU field for Fooname is properly capitalized, then the AU fields of all citations whose AU-CAP field equals FOONAME are updated with the new AU value, Fooname. The only problem is, of course, what if in the database there is only one citation with AU field FOONAME, we make a bibliography, and oops, we will have to correct the capitalization by hand.

2) **You may have mistaken my suggestion about an input filter. I'm not asking every user to develop such scripts.**

No, I understood that you meant a script which would be included in the RefDB sources. And I agree, this is a good idea. However, there will then be one filter for Pubmed, another for ISI, one for something else, etc. I agree, one can automatically detect the format (ISI, Pubmed, etc.) and invoke a suitable script. If the only problem is capitalization, this can be solved, for example, in a way similar to what I wrote above.

By the way, you may know that ISI uses a bunch of non-standard tags, e.g. "SO" means source, which is like JF but not always; CY means conference year, etc. In principle, a script that converts ISI citations to RIS would be a nice add-on for RefDB users. I wrote such a script in shell + awk (pretty naive), but I guess you could write a better script using Perl.

OK, here is the summary: I think in general a script for ISI, a script for PUBMED, etc. is probably the best way, because there are probably database-related differences other than capitalization. But the capitalization problem could also be solved by changing the internal mechanisms of refdbd, and this method is independent of the source of the citations.

----------------------------------------------------------------------

Comment By: Markus Hoenicka (mhoenicka)
Date: 2010-01-22 00:38

Message:
Now assume that in a twist of fate you first add the all-caps version of the author and then the properly capitalized one. If refdbd was changed in the way you suggest, you'd end up having all appearances of this author in all-caps. I'm afraid there's no way around manual intervention.

You may have mistaken my suggestion about an input filter. I'm not asking every user to develop such scripts. I was rather thinking about including such a script in the RefDB sources. It is fairly easy to automatically invoke import filters before adding your data. I personally use Makefiles to deal with Pubmed data. "make" converts existing Pubmed data to a ris file. "make edit" allows you to enter the reprint status, path to an offprint, etc. in an editor. "make install" finally adds the data to the database. Once set up properly, you don't even have to know the names of your input filters.

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2010-01-21 20:09

Message:
Regarding the two drawbacks:

1) So far I have dealt with this drawback by looking for a particular citation, fixing the author names in this citation, and using updateref.
Of course, in the long run, this is not as efficient as de-capitalizing all author names with a script before using addref <filename>.

2) I thought that one could use a non-capitalized name as the author name. Example: suppose there is only one citation where the author is Fooname,K. Associated with this citation is another field: the capitalized author name. We use addref to add another citation where the name is capitalized, FOONAME,K. In the process of adding this citation, its capitalized name is compared with the available capitalized names; once a match is found, the non-capitalized name of the new citation is set to be the same as in the first citation. It seems to me that a similar method is already used in RefDB for some other field (but I am not sure). Thus, the argument about internal normalization does not look convincing to me.

*** Wouldn't it be more straightforward to develop a Perl script which touches up older data from ISI Web? ***

Yes, it would. In fact, I have an awk script which does this for the AU and JF fields. Of course, it is easy to write such a script. I am not arguing for the sake of argument; I am trying to express the point of view of a somewhat lazy "average user" who can't write a Perl/Python/awk script and just wants to add citations from some source and have various fields decapitalized. I agree with the Unix philosophy "One job - one tool"; according to this philosophy, maybe it is better to have a script which is applied to a file with citations before this file is added by addref. However, maybe it is not too bad to use a mechanism like the one described in 2).

Anyway, don't take it too seriously. It is not a big issue.

regards,
André

----------------------------------------------------------------------

Comment By: Markus Hoenicka (mhoenicka)
Date: 2010-01-20 13:18

Message:
Please don't get me wrong, I'd be happy to change the code if this fixed your problem once and for all.
However, the obvious fix has two drawbacks:

1) Some of your author names will still be all uppercase, which is simply not desirable. If you create bibliographies from such entries, they'll look odd.
2) It breaks the database normalization, as it would allow the same author to be represented by two separate entries which differ only in case.

That is, the fix deals with one problem, leaves another one untouched, and creates a different one. This is why I'm reluctant here. Wouldn't it be more straightforward to develop a Perl script which touches up older data from ISI Web? We have similar tools, e.g. for the broken RIS that EndNote exports. Feel free to post a bunch of sample reference data to check whether it is worth a try.

----------------------------------------------------------------------

Comment By: akusmin (akusmin)
Date: 2010-01-20 11:58

Message:
1) Regarding example3.ris: I see your point and I agree.

2) Regarding the capitalized name overriding the non-capitalized one:

*** I still think it is wisest to clean up your input data, the additional effort notwithstanding.***

Again, I see your point, and I was thinking that I could do that with all my *old* citation files using awk, sed and grep. (For *new* files I don't have this problem, because when I add citations from ISI Web, I decapitalize the author names.) However, in general, if this problem can be fixed with relatively small effort and without making addref significantly slower, it should be fixed. It is not a big issue, sure; maybe we should call it a "feature request".

Anyway: I have been using RefDB since 2006 and I like it. Thank you for your work!

----------------------------------------------------------------------

Comment By: Markus Hoenicka (mhoenicka)
Date: 2010-01-19 22:57

Message:
example3.ris is misleading in your case, because "Stuhrmann,H.B." and "Stuhrmann,Heinrich B." must be treated as separate entries by RefDB: only a human being can decide whether these are the same person, regardless of capitalization. If you expect the database to treat these strings as the same author, you'd have to maintain your input files accordingly and settle on one version, preferably the one with the first name spelled out.

This leaves the problem of "STUHRMANN,H.B." apparently outsmarting "Stuhrmann,H.B.". When adding a reference entry, refdbd tries to find existing authors in the database. If the author already exists, refdbd simply sets a "link" to that author. Only if it doesn't find an existing author is a new author entry created. Testing for duplicates is done by an SQL expression using the "=" operator. Apparently some database engines treat this as a case-insensitive string comparison. I can reproduce your results with MySQL, but not with PostgreSQL. I have to admit that I never figured this might pose a problem.

I still think it is wisest to clean up your input data, the additional effort notwithstanding. In order to fix this problem programmatically, I'd have to replace the "=" comparison with something that causes a case-sensitive comparison on all database engines.

regards,
Markus

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=385991&aid=2935197&group_id=26091
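The lookup-then-link step described in the last comment (search for the author with "=", link to it if found, otherwise insert a new row) can be reproduced with Python's built-in sqlite3 module. SQLite is only a stand-in here: its default BINARY collation compares case-sensitively, as PostgreSQL's "=" does on text, while COLLATE NOCASE mimics MySQL's default case-insensitive collations. The table and column names and both helper functions are hypothetical and do not reflect RefDB's actual schema or code.

```python
from __future__ import annotations

import sqlite3


def add_author(conn: sqlite3.Connection, name: str, collation: str) -> int:
    """Return the id of an existing author matching `name`, or insert one.

    `collation` decides how "=" treats case: "NOCASE" ignores ASCII case,
    "BINARY" (SQLite's default) compares exactly.
    """
    if collation not in ("NOCASE", "BINARY"):
        raise ValueError("unsupported collation")  # never interpolate untrusted text
    row = conn.execute(
        f"SELECT id FROM authors WHERE name = ? COLLATE {collation}", (name,)
    ).fetchone()
    if row is not None:
        return row[0]  # author already exists: just link to it
    cur = conn.execute("INSERT INTO authors (name) VALUES (?)", (name,))
    return cur.lastrowid


def stored_spellings(names: list[str], collation: str) -> list[str]:
    """Add each name in order and report the spelling each one ends up linked to."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT)")
        ids = [add_author(conn, n, collation) for n in names]
        return [
            conn.execute("SELECT name FROM authors WHERE id = ?", (i,)).fetchone()[0]
            for i in ids
        ]
    finally:
        conn.close()


names = ["STUHRMANN,H.B.", "Stuhrmann,H.B."]
print(stored_spellings(names, "NOCASE"))  # ['STUHRMANN,H.B.', 'STUHRMANN,H.B.']
print(stored_spellings(names, "BINARY"))  # ['STUHRMANN,H.B.', 'Stuhrmann,H.B.']
```

With the case-insensitive comparison, whichever spelling arrives first is stored and every later spelling is linked to it, which is the "once capitalized, always capitalized" symptom. With a case-sensitive comparison the two spellings become separate author rows, which is the normalization drawback raised in the 2010-01-20 comment.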
From: SourceForge.net <no...@so...> - 2010-01-25 21:06:17
Bugs item #2935197, was opened at 2010-01-19 20:59
Message generated for change (Comment added) made by mhoenicka
Summary: AU field: Once capitalized, always capitalized

>Comment By: Markus Hoenicka (mhoenicka)
Date: 2010-01-25 22:06

Message:
This depends on the size of your data. The mailing list has an attachment size limit, so I wouldn't send large files this way. Either use the file uploader in this bug tracker, or send the data to me off-list. Take your time if you think your data will be more representative in a couple of weeks.

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=385991&aid=2935197&group_id=26091
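The input filter discussed in this thread could start out as small as the Python sketch below: it rewrites author lines whose value is entirely uppercase (for example "STUHRMANN,H.B." becomes "Stuhrmann,H.B.") and passes every other line through unchanged. The script name, the set of author tags (AU, A1 to A3), the "TAG  - value" line layout and the UTF-8 input encoding are assumptions; this is not a filter shipped with RefDB, and the naive capitalization gets names such as McDonald or "van der Waals" wrong, so the output still needs a quick review.

```python
from __future__ import annotations

import re
import sys

# Assumed RIS author tags and layout: "AU  - Name" (tag, two spaces, hyphen, space).
AUTHOR_LINE = re.compile(r"^(AU|A[1-3])  - (.*?)\s*$")
# A run of letters, including non-ASCII ones such as "Ü".
LETTER_RUN = re.compile(r"[^\W\d_]+")


def decapitalize_name(name: str) -> str:
    """Turn an all-caps name into 'Capitalized' form; leave mixed case alone."""
    if not name.isupper():
        return name
    # Single-letter runs (initials) stay uppercase because "H".capitalize() == "H".
    return LETTER_RUN.sub(lambda m: m.group(0).capitalize(), name)


def filter_ris_line(line: str) -> str:
    match = AUTHOR_LINE.match(line)
    if not match:
        return line  # not an author line: pass through untouched
    tag, name = match.groups()
    return f"{tag}  - {decapitalize_name(name)}\n"


def main(paths: list[str]) -> None:
    for path in paths:
        with open(path, encoding="utf-8") as handle:
            for line in handle:
                sys.stdout.write(filter_ris_line(line))


if __name__ == "__main__":
    main(sys.argv[1:])
```

Usage would be along the lines of `python decap_ris.py old_isi.ris > cleaned.ris`, after which cleaned.ris is added with addref as before (both file names are placeholders). Names that already contain lowercase letters are never touched, so running the filter on partly cleaned files is harmless.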
From: SourceForge.net <no...@so...> - 2010-01-25 19:55:21
Bugs item #2935197, was opened at 2010-01-19 19:59 Message generated for change (Comment added) made by akusmin You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=385991&aid=2935197&group_id=26091 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: refdbd Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: akusmin (akusmin) Assigned to: Markus Hoenicka (mhoenicka) Summary: AU field: Once capitalized, always capitalized Initial Comment: I add citations to my RefDB databases using the command "addref foo.ris " where foo.ris is the file containing citations. When in one file there are two citations sharing the same author, and if in the first such citation the author name is capitalized (FOONAME,F.B while in the second it is not (Fooname,F.B) then, after citations are added, the command getref returns the *capitalized* author name for both citations. Moreover, if there are three citations sharing the same author, the first citation contains AU field Fooname,Frank B. the second FOONAME,F.B and the third Fooname,F.B, then, after adding references from this file, again, the command getref returns FOONAME,F.B. Why I am complaining: I get citations from ISI Web of Science, as you know, until 1995/1996 Author names and Titles are capitalized. Thus, some of citations have capitalized names and some not. When I make bibliography for my LaTeX document, sometimes I see capitalized author names. May be this is already fixed in the SVN version? could you please check? Thank you. P.S. Attached is the zip file with three ris files. In ex1.ris the citation with AU Stuhrmann, H.B, the second file contains one more citation (the very first) where AU has STUHRMANN,H.B, and the third files contains three citations, where the 1st has STUHRMANN,Heinrich.B. 
---------------------------------------------------------------------- Comment By: akusmin (akusmin) Date: 2010-01-25 19:55 Message: ** That is, your suggested fix would catch only about 15% of the relevant cases, while not fixing the problem for the vast majority of cases ** Good point. OK, I give up. A filter is the best way. Concerning ISI input data: I created a more or less representative sample of citations from ISI (in my research field: physical chemistry and physics ). What's the best way to send you this file ? Mailing list? Of course I continue to add new citations, so in the next several weeks I'll have more citations for another sample file. ---------------------------------------------------------------------- Comment By: Markus Hoenicka (mhoenicka) Date: 2010-01-24 12:18 Message: ***The only problem is, of course, what if in the database there is only one citation with AU field FOONAME, we make a bibliography, and oops, we will have to correct the capitalization by hand.*** I've checked my database at work as a real-world example. It contains almost 1900 references and 6665 authors. Of these, 5560 (roughly 85%) occur only in a single reference. That is, your suggested fix would catch only about 15% of the relevant cases, while not fixing the problem for the vast majority of cases. I still maintain that the proper approach is to clean up the input files. I'll be happy to develop a script for this particular case, either as an ISI import filter, or as a general-purpose post-processor of RIS or risx data. I'd appreciate if you could send me representative samples of ISI data. ---------------------------------------------------------------------- Comment By: akusmin (akusmin) Date: 2010-01-23 18:00 Message: 1) *** Now assume that in a twist of fate you first add the all-caps version of the author and then the properly capitalized one. *** I thought about it; I just did not mention how to take it into account. Besides, it seems I was not clear. 
Inside a refdbd database, every citation would have two fields: the AU field and a capitalized (all-uppercase) copy of it, let's call it AU-CAP. Suppose the first citation with the author Fooname had the AU field FOONAME. Then both the AU and AU-CAP fields are set to FOONAME. However, if at some point a citation is added where the AU field for Fooname is properly capitalized, then the AU fields of all citations whose AU-CAP field equals FOONAME are updated to the new value, Fooname. The only problem is, of course, what if in the database there is only one citation with AU field FOONAME, we make a bibliography, and oops, we will have to correct the capitalization by hand. 2) ***You may have mistaken my suggestion about an input filter. I'm not asking every user to develop such scripts.*** No, I understood that you meant a script which will be included in the RefDB sources. And I agree, this is a good idea. However, then there will be one filter for Pubmed, another for ISI, one for something else, etc. I agree, one can automatically detect the format (ISI, Pubmed, etc.) and invoke a suitable script. If the only problem is capitalization, it can be solved, for example, in a way similar to what I wrote above. By the way, you may know that ISI uses a bunch of non-standard tags, e.g. "SO" means source, which is like JF but not always; CY means conference year, etc. In principle, a script that converts ISI citations to RIS would be a nice add-on for RefDB users. I wrote such a script in shell + awk (pretty naive), but I guess you could write a better one in Perl. OK, here is the summary: I think that, in general, a script for ISI, a script for Pubmed, etc. is probably the best way, because there are probably database-specific differences other than capitalization. But the capitalization problem could also be solved by changing the internal mechanisms of refdbd, and that method is independent of the source of the citations.
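The AU-CAP scheme proposed in the comment above can be sketched in a few lines. This is a minimal, hypothetical model, not refdbd code: the author table is an in-memory dict keyed by the upper-cased name (standing in for the AU-CAP column), references are assumed to store that key, and the names `AuthorTable`, `add_author`, and `display` are invented for illustration. It also shows the limitation Markus raises: an author who only ever arrives in all caps stays in all caps.

```python
class AuthorTable:
    """Hypothetical in-memory author store keyed by an upper-cased "AU-CAP" value.

    Not refdbd code: references are assumed to store the key, so changing the
    display form of a key changes it for every reference linked to that key.
    """

    def __init__(self) -> None:
        self._display = {}  # AU-CAP key -> AU display form

    @staticmethod
    def _all_caps(name: str) -> bool:
        # str.isupper(): at least one cased character and no lowercase ones
        return name.isupper()

    def add_author(self, name: str) -> str:
        """Return the AU-CAP key for name, upgrading an all-caps display form."""
        key = name.upper()
        current = self._display.get(key)
        if current is None:
            # First sighting: keep whatever spelling arrived, even all caps.
            self._display[key] = name
        elif self._all_caps(current) and not self._all_caps(name):
            # A properly capitalized variant arrived later: prefer it.
            self._display[key] = name
        return key

    def display(self, key: str) -> str:
        return self._display[key]


if __name__ == "__main__":
    table = AuthorTable()
    a = table.add_author("FOONAME,F.B")
    b = table.add_author("Fooname,F.B")
    print(a == b, table.display(a))  # True Fooname,F.B
    lone = table.add_author("STUHRMANN,H.B")
    print(table.display(lone))  # STUHRMANN,H.B (no better spelling ever arrives)
```

Note that "Fooname,Frank B." still gets its own key here, which matches Markus's point that only a human can decide whether two differently spelled names are the same person.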
---------------------------------------------------------------------- Comment By: Markus Hoenicka (mhoenicka) Date: 2010-01-22 00:38 Message: Now assume that in a twist of fate you first add the all-caps version of the author and then the properly capitalized one. If refdbd were changed in the way you suggest, you'd end up having all appearances of this author in all caps. I'm afraid there's no way around manual intervention. You may have mistaken my suggestion about an input filter. I'm not asking every user to develop such scripts. I was rather thinking about including such a script in the RefDB sources. It is fairly easy to invoke import filters automatically before adding your data. I personally use Makefiles to deal with Pubmed data: "make" converts existing Pubmed data to a RIS file, "make edit" lets you enter the reprint status, the path to an offprint, etc. in an editor, and "make install" finally adds the data to the database. Once this is set up properly, you don't even have to know the names of your input filters. ---------------------------------------------------------------------- Comment By: Nobody/Anonymous (nobody) Date: 2010-01-21 20:09 Message: Regarding the two drawbacks: 1) So far I have dealt with this drawback by looking up the particular citation, fixing the author names in it, and using updateref. Of course, in the long run this is not as efficient as de-capitalizing all author names with a script before using addref <filename>. 2) I thought that one could use the non-capitalized name as the author name. Example: suppose there is only one citation where the author is Fooname,K. Associated with this citation is another field: the capitalized author name. We use addref to add another citation where the name is capitalized, FOONAME,K. In the process of adding this citation, its capitalized name is compared with the available capitalized names; once a match is found, the non-capitalized name for the new citation is set to be the same as in the first citation.
It seems to me that a similar method is already used in RefDB for some other field (but I am not sure). Thus, the argument about internal normalization does not look convincing to me. *** Wouldn't it be more straightforward to develop a Perl script which touches up older data from ISI Web? *** Yes, it would. In fact, I have an awk script which does this for the AU and JF fields. Of course, it is easy to write such a script. I am not arguing for the sake of argument; I am trying to express the point of view of a somewhat lazy "average user" who can't write a Perl/Python/awk script and just wants to add citations from some source and have various fields de-capitalized. I agree with the Unix philosophy "one job, one tool"; according to this philosophy, it may be better to have a script which is applied to a file of citations before that file is added with addref. However, maybe it is not too bad to use a mechanism like the one described in 2). Anyway, don't take it too seriously. It is not a big issue. regards, André ---------------------------------------------------------------------- Comment By: Markus Hoenicka (mhoenicka) Date: 2010-01-20 13:18 Message: Please don't get me wrong, I'd be happy to change the code if this fixed your problem once and for all. However, the obvious fix has two drawbacks: 1) some of your author names will still be all uppercase, which is simply not desirable. If you create bibliographies from such entries, they'll look odd. 2) it breaks database normalization, as it allows the same author to be represented by two separate entries which differ only in case. That is, the fix deals with one problem, leaves another one untouched, and creates a different one. This is why I'm reluctant here. Wouldn't it be more straightforward to develop a Perl script which touches up older data from ISI Web? We have similar tools, e.g. for the broken RIS that EndNote exports.
Feel free to post a bunch of sample reference data so we can check whether it is worth a try. ---------------------------------------------------------------------- Comment By: akusmin (akusmin) Date: 2010-01-20 11:58 Message: 1) Regarding example3.ris: I see your point and I agree. 2) Regarding the capitalized name overriding the non-capitalized one: *** I still think it is wisest to clean up your input data, the additional effort notwithstanding.*** Again, I see your point, and I was thinking that I could do that for all my *old* citation files using awk, sed, and grep. (For *new* files I don't have this problem, because when I add citations from ISI Web, I de-capitalize the author names.) However, in general, if this problem can be fixed with relatively little effort and without making addref significantly slower, it should be fixed. It is not a big issue, sure; maybe we should call it a "feature request". Anyway: I have been using RefDB since 2006 and I like it. Thank you for your work! ---------------------------------------------------------------------- Comment By: Markus Hoenicka (mhoenicka) Date: 2010-01-19 22:57 Message: example3.ris is misleading in your case because "Stuhrmann,H.B." and "Stuhrmann,Heinrich B." must be treated as separate entries by RefDB: only a human being can decide whether these are the same person, regardless of capitalization. If you expect the database to treat these strings as the same author, you'd have to maintain your input files accordingly and settle on one version, preferably the one with the first name spelled out. This leaves the problem of "STUHRMANN,H.B." apparently outsmarting "Stuhrmann,H.B.". When adding a reference entry, refdbd tries to find existing authors in the database. If the author already exists, refdbd simply sets a "link" to that author. Only if it doesn't find an existing author is a new author entry created. Testing for duplicates is done by an SQL expression using the "=" operator.
Apparently some database engines treat this as a case-insensitive string comparison. I can reproduce your results with MySQL, but not with PostgreSQL. I have to admit that I never figured this might pose a problem. I still think it is wisest to clean up your input data, the additional effort notwithstanding. In order to fix this problem programmatically, I'd have to replace the "=" comparison with something that causes a case-sensitive comparison on all database engines. regards, Markus ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=385991&aid=2935197&group_id=26091 |
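The find-or-insert step Markus describes, and why a case-insensitive "=" produces "once capitalized, always capitalized", can be reproduced with Python's built-in sqlite3 module. This is an illustrative sketch under assumptions, not refdbd's schema or SQL: the `authors` table and the helper names are hypothetical, and SQLite's `COLLATE NOCASE` (which folds ASCII letters only) stands in for a case-insensitive MySQL collation such as the usual `_ci` defaults, while SQLite's default `BINARY` collation behaves like PostgreSQL's case-sensitive `=`. On MySQL itself a case-sensitive match is commonly forced with the `BINARY` operator or a `_bin` collation.

```python
import sqlite3


def make_db(case_insensitive: bool) -> sqlite3.Connection:
    """Create a hypothetical one-table author store in memory."""
    conn = sqlite3.connect(":memory:")
    # NOCASE folds ASCII letters when comparing (standing in for a MySQL "_ci"
    # collation); BINARY is SQLite's default, case-sensitive collation.
    collation = "NOCASE" if case_insensitive else "BINARY"
    conn.execute(
        f"CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT COLLATE {collation})"
    )
    return conn


def find_or_add_author(conn: sqlite3.Connection, name: str) -> int:
    """Link to an existing author found with '=', or insert a new one."""
    row = conn.execute("SELECT id FROM authors WHERE name = ?", (name,)).fetchone()
    if row is not None:
        return row[0]  # existing entry wins, with its stored spelling
    return conn.execute("INSERT INTO authors (name) VALUES (?)", (name,)).lastrowid


def stored_names(conn: sqlite3.Connection, names: list) -> list:
    """Add each AU value in order and report the spelling each one ends up linked to."""
    ids = [find_or_add_author(conn, n) for n in names]
    return [
        conn.execute("SELECT name FROM authors WHERE id = ?", (i,)).fetchone()[0]
        for i in ids
    ]


if __name__ == "__main__":
    aus = ["Fooname,Frank B.", "FOONAME,F.B", "Fooname,F.B"]
    print(stored_names(make_db(case_insensitive=True), aus))
    # ['Fooname,Frank B.', 'FOONAME,F.B', 'FOONAME,F.B']
    print(stored_names(make_db(case_insensitive=False), aus))
    # ['Fooname,Frank B.', 'FOONAME,F.B', 'Fooname,F.B']
```

The case-sensitive variant stops the all-caps spelling from taking over, but, as Markus notes, it does so by storing two author entries that differ only in case.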
From: SourceForge.net <no...@so...> - 2010-01-24 12:29:38
|
Bugs item #2935197, was opened at 2010-01-19 20:59 Message generated for change (Comment added) made by mhoenicka You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=385991&aid=2935197&group_id=26091
---------------------------------------------------------------------- >Comment By: Markus Hoenicka (mhoenicka) Date: 2010-01-24 13:18 Message: ***The only problem is, of course, what if in the database there is only one citation with AU field FOONAME, we make a bibliography, and oops, we will have to correct the capitalization by hand.*** I've checked my database at work as a real-world example. It contains almost 1900 references and 6665 authors. Of these, 5560 (roughly 85%) occur only in a single reference. That is, your suggested fix would catch only about 15% of the relevant cases, while not fixing the problem for the vast majority of cases. I still maintain that the proper approach is to clean up the input files. I'll be happy to develop a script for this particular case, either as an ISI import filter, or as a general-purpose post-processor of RIS or risx data. I'd appreciate if you could send me representative samples of ISI data. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=385991&aid=2935197&group_id=26091 |
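A general-purpose RIS post-processor of the kind discussed in this thread could start as small as the filter below, written in Python rather than Perl or awk. It is a minimal sketch under stated assumptions, not a RefDB tool: it reads RIS on standard input and writes it to standard output, the set of author tags (`AU`, `A1`, `A2`, `A3`, `ED`) is a guess to adjust, and `str.title()` is a crude rule that mishandles names such as "MCDONALD" or "VAN DER WAALS" and suffixes like "III", so its output still deserves a look before `addref`. Titles are deliberately left alone.

```python
import re
import sys

# Author-like RIS tags to touch; this tag set is an assumption, adjust as needed.
AUTHOR_TAGS = {"AU", "A1", "A2", "A3", "ED"}

# A RIS tag line: two-character tag, two spaces, a hyphen, a space, then the value,
# optionally followed by the line ending (LF or CRLF).
RIS_LINE = re.compile(r"^([A-Z][A-Z0-9])  - (.*?)(\r?\n)?$")


def fix_name(value: str) -> str:
    """Title-case an author name only if it is entirely upper case."""
    if value.isupper():
        return value.title()  # "STUHRMANN,H.B" -> "Stuhrmann,H.B"
    return value  # leave mixed-case names exactly as they are


def fix_line(line: str) -> str:
    """Rewrite one RIS line, touching only all-caps values of author tags."""
    m = RIS_LINE.match(line)
    if m and m.group(1) in AUTHOR_TAGS:
        tag, value, eol = m.group(1), m.group(2), m.group(3) or ""
        return f"{tag}  - {fix_name(value)}{eol}"
    return line  # every other line passes through unchanged


def main() -> None:
    for line in sys.stdin:
        sys.stdout.write(fix_line(line))


if __name__ == "__main__":
    main()
```

Saved under a hypothetical name such as `fixcaps.py`, it would slot in before the import step, e.g. `python3 fixcaps.py < isi.ris > clean.ris` followed by `addref clean.ris`, which is also easy to wire into a Makefile-driven workflow.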
From: SourceForge.net <no...@so...> - 2010-01-23 18:00:21
|
Bugs item #2935197, was opened at 2010-01-19 19:59 Message generated for change (Comment added) made by akusmin You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=385991&aid=2935197&group_id=26091
---------------------------------------------------------------------- Comment By: akusmin (akusmin) Date: 2010-01-23 18:00 Message: 1) *** Now assume that in a twist of fate you first add the all-caps version of the author and then the properly capitalized one. *** I thought about it; I just did not mention how to take it into account. Besides, it seems I was not clear. Inside a refdbd database, every citation would have two fields: the AU field and a capitalized (all-uppercase) copy of it, let's call it AU-CAP. Suppose the first citation with the author Fooname had the AU field FOONAME. Then both the AU and AU-CAP fields are set to FOONAME. However, if at some point a citation is added where the AU field for Fooname is properly capitalized, then the AU fields of all citations whose AU-CAP field equals FOONAME are updated to the new value, Fooname. The only problem is, of course, what if in the database there is only one citation with AU field FOONAME, we make a bibliography, and oops, we will have to correct the capitalization by hand. 2) ***You may have mistaken my suggestion about an input filter. I'm not asking every user to develop such scripts.*** No, I understood that you meant a script which will be included in the RefDB sources. And I agree, this is a good idea. However, then there will be one filter for Pubmed, another for ISI, one for something else, etc. I agree, one can automatically detect the format (ISI, Pubmed, etc.) and invoke a suitable script. If the only problem is capitalization, it can be solved, for example, in a way similar to what I wrote above. By the way, you may know that ISI uses a bunch of non-standard tags, e.g. "SO" means source, which is like JF but not always; CY means conference year, etc. In principle, a script that converts ISI citations to RIS would be a nice add-on for RefDB users. I wrote such a script in shell + awk (pretty naive), but I guess you could write a better one in Perl.
OK, here is the summary: I think in general, a scipt for ISI, script for PUBMED etc is the probably the best way because there are probably database-related differences other than capitalization. But capitalization problem could also be solved by changing the internal mechanisms of refdbd, and this method is source-of-citations-independent. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=385991&aid=2935197&group_id=26091 |
From: SourceForge.net <no...@so...> - 2010-01-22 01:06:13
|
Bugs item #2935197, was opened at 2010-01-19 20:59 Message generated for change (Comment added) made by mhoenicka You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=385991&aid=2935197&group_id=26091
---------------------------------------------------------------------- >Comment By: Markus Hoenicka (mhoenicka) Date: 2010-01-22 01:38 Message: Now assume that in a twist of fate you first add the all-caps version of the author and then the properly capitalized one. If refdbd was changed in the way you suggest, you'd end up having all appearances of this author in all-caps. I'm afraid there's no way around manual intervention. You may have mistaken my suggestion about an input filter. I'm not asking every user to develop such scripts. I was rather thinking about including such a script in the RefDB sources. It is fairly easy to automatically invoke import filters before adding your data. I personally use Makefiles to deal with Pubmed data. "make" converts existing Pubmed data to a ris file. "make edit" allows to enter the reprint status, path to an offprint etc in an editor. "make install" finally adds the data to the database. Once set up properly, you don't even have to know the names of your input filters. ---------------------------------------------------------------------- Comment By: Nobody/Anonymous (nobody) Date: 2010-01-21 21:09 Message: Regarding the two drawbacks: 1) So far I dealt with this drawback by looking for a particular citation, fixing author names in this citation, and using updateref. Of course, in the long run, it is not as efficient as de-capitalizing all author names by a script prior using addref <filename> 2) I thougt that one could use as author name a non-capitalized name. Example: Suppose there is only one citation where author is Fooname,K. Associated with this citation is another field: capitalized author name. We use addref to add another citation , where the name is capitalized, FOONAME,K. In the process of adding this citation, its capitalized name is compared with available capitalize names, after the match is found, for the new citation the non-capitalized name is set to be the same as in the first citation. 
It seems to me that similar method is already used in RefDB for some other field (but I am not sure). Thus, the argument about internal normalization does not look convincing to me. *** Wouldn't it be more straightforward to develop a Perl script which touches up older data from ISI Web? *** Yes, it would. In fact, I have an awk script which does this for AU and JF fields. Of course, it is easy to write such a script. I am not arguing for the sake of argument; I am trying to express the point of view of some what lazy "average user", who can't write a Perl/Python/awk etc script, who just wants to add citations from some source and wants various fields to be decapitalized. I agree with Unix philosophy "One job - one tool", according to this philosophy, may be it is better to have some script, which is applied on a file with citations before this file is added by addref. However, maybe it is not too bad to use some mechanism like described in 2) Anyway, don't take it too seriously. It is not a big issue. regards, André ---------------------------------------------------------------------- Comment By: Markus Hoenicka (mhoenicka) Date: 2010-01-20 14:18 Message: Please don't take me wrong, I'd be happy to change the code if this fixed your problem once and for all. However, the obvious fix has two drawbacks: 1) some of your author names will still be all uppercase, which is simply not desirable. If you create bibliographies from such entries, they'll look odd. 2) it breaks the database normalization as it will allow to have the same author be represented by two separate entries which differ only in case. That is, the fix deals with one problem, leaves another one untouched and creates a different one. This is why I'm reluctant here. Wouldn't it be more straightforward to develop a Perl script which touches up older data from ISI Web? We have similar tools, e.g. for the broken RIS that EndNote exports. 
Feel free to post a bunch of sample reference data to check whether it is worth a try. ---------------------------------------------------------------------- Comment By: akusmin (akusmin) Date: 2010-01-20 12:58 Message: 1) Regarding example3.ris : I see your point and I agree. 2) Regarding capitalized name overriding the non-capitalized one. *** I still think it is wisest to clean up your input data, the additional effort notwithstanding.*** Again; I see your point, and I was thinking that I could that with all my *old* files with citations using awk/ sed and grep . (For *new* files I don't have this problem because when I add citation from ISI Web , I decapitalize author names). However, in general, if this problem can be fixed with a relatively small effort and without making addref significantly slow, it should be fixed. It is not a big issue, sure, may be we should call it a "feature request". Anyway: I use RefDB since 2006 and I like it. Thank you for your work! ---------------------------------------------------------------------- Comment By: Markus Hoenicka (mhoenicka) Date: 2010-01-19 23:57 Message: example3.ris is misleading in your case because "Stuhrmann,H.B." and "Stuhrmann,Heinrich B." must be treated as separate entries by RefDB because only a human being can decide whether these are the same persons, regardless of capitalization. If you expect the database to treat these strings as the same author, you'd have to maintain your input files accordingly and settle on one version, preferably the one with the first name spelled out. This leaves the problem of "STUHRMANN,H.B." apparently outsmarting "Stuhrmann.H.B.". When adding a reference entry, refdbd tries to find existing authors in the database. If the author already exists, refdbd simply sets a "link" to that author. Only if it doesn't find an existing author, a new author entry is created. Testing for duplicates is done by a SQL expression using the "=" operator. 
Apparently some database engines treat this as a case-insensitive string comparison. I can reproduce your results with MySQL, but not with PostgreSQL. I have to admit that I've never figured this might pose a problem. I still think it is wisest to clean up your input data, the additional effort notwithstanding. In order to fix this problem programmatically, I'd have to replace the "=" comparison with something that causes a case-sensitive comparison on all database engines. regards, Markus ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=385991&aid=2935197&group_id=26091 |
From: SourceForge.net <no...@so...> - 2010-01-21 20:10:00
Bugs item #2935197, was opened at 2010-01-19 19:59
Message generated for change (Comment added) made by nobody
You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=385991&aid=2935197&group_id=26091

Category: refdbd
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: akusmin (akusmin)
Assigned to: Markus Hoenicka (mhoenicka)
Summary: AU field: Once capitalized, always capitalized

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2010-01-21 20:09

Message:
Regarding the two drawbacks:

1) So far I have dealt with this drawback by looking up the particular citation, fixing the author names in it, and using updateref. Of course, in the long run this is not as efficient as de-capitalizing all author names with a script prior to using addref <filename>.

2) I thought that one could use a non-capitalized name as the author name. Example: suppose there is only one citation whose author is Fooname,K. Associated with this citation is another field: the capitalized author name. We use addref to add another citation where the name is capitalized, FOONAME,K. While this citation is being added, its capitalized name is compared with the available capitalized names; once a match is found, the non-capitalized name of the new citation is set to the same one as in the first citation. It seems to me that a similar method is already used in RefDB for some other field (but I am not sure). Thus, the argument about internal normalization does not look convincing to me.

*** Wouldn't it be more straightforward to develop a Perl script which touches up older data from ISI Web? ***

Yes, it would. In fact, I have an awk script which does this for the AU and JF fields. Of course, it is easy to write such a script. I am not arguing for the sake of argument; I am trying to express the point of view of a somewhat lazy "average user" who can't write a Perl/Python/awk script and just wants to add citations from some source and have the various fields de-capitalized. I agree with the Unix philosophy "one job, one tool"; according to this philosophy, maybe it is better to have a script that is applied to a file with citations before the file is added with addref. However, maybe a mechanism like the one described in 2) would not be too bad either.

Anyway, don't take it too seriously. It is not a big issue.

regards,
André

----------------------------------------------------------------------
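To make proposal 2) concrete, here is a minimal in-memory sketch. It is not RefDB code: the class `AuthorIndex` and the helper `link_all` are hypothetical names, and the model assumes each author entry carries an uppercased match key next to its display spelling, with the first spelling seen for a key becoming the stored one. Running both insertion orders shows the order dependence Markus points out in his 2010-01-22 comment.

```python
class AuthorIndex:
    """Hypothetical in-memory model of proposal 2): each author entry
    carries an uppercased match key next to its display spelling."""

    def __init__(self):
        self._display_by_key = {}  # uppercased key -> stored display spelling

    def link(self, name):
        """Return the spelling a newly added citation ends up linked to."""
        key = name.upper()
        # Only the first spelling seen for a key creates the entry;
        # later spellings with the same key reuse the stored one.
        return self._display_by_key.setdefault(key, name)


def link_all(names):
    """Add names in order to a fresh index and collect the linked spellings."""
    index = AuthorIndex()
    return [index.link(n) for n in names]


print(link_all(["Fooname,K.", "FOONAME,K."]))  # ['Fooname,K.', 'Fooname,K.']
print(link_all(["FOONAME,K.", "Fooname,K."]))  # ['FOONAME,K.', 'FOONAME,K.']
```

With the mixed-case spelling added first, the all-caps citation is folded into it as André intends; with the all-caps spelling added first, every citation of that author ends up in all caps.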
From: SourceForge.net <no...@so...> - 2010-01-20 13:18:37
Bugs item #2935197, was opened at 2010-01-19 20:59
Message generated for change (Comment added) made by mhoenicka
You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=385991&aid=2935197&group_id=26091

Category: refdbd
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: akusmin (akusmin)
Assigned to: Markus Hoenicka (mhoenicka)
Summary: AU field: Once capitalized, always capitalized

----------------------------------------------------------------------

>Comment By: Markus Hoenicka (mhoenicka)
Date: 2010-01-20 14:18

Message:
Please don't get me wrong, I'd be happy to change the code if this fixed your problem once and for all. However, the obvious fix has two drawbacks:

1) Some of your author names will still be all uppercase, which is simply not desirable. If you create bibliographies from such entries, they'll look odd.

2) It breaks the database normalization, as it allows the same author to be represented by two separate entries which differ only in case.

That is, the fix deals with one problem, leaves another one untouched, and creates a new one. This is why I'm reluctant here. Wouldn't it be more straightforward to develop a Perl script which touches up older data from ISI Web? We have similar tools, e.g. for the broken RIS that EndNote exports. Feel free to post a bunch of sample reference data so we can check whether it is worth a try.

----------------------------------------------------------------------
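The input filter Markus suggests could look roughly like the following Python sketch (André mentions using awk for the same job). It is not a RefDB tool: the script name `decap_authors.py` is hypothetical, and it assumes RIS tag lines of the form `AU  - Name` (two-character tag, two spaces, hyphen, space) and that the author tags to fix are AU and A1 to A3. Only values without any lowercase letter are touched, so already mixed-case names pass through unchanged.

```python
import re
import sys

# Assumed RIS layout: two-character tag, two spaces, "-", space, value.
AUTHOR_LINE = re.compile(r"^(AU|A[1-3])  - (.*)$")


def fix_line(line):
    """Title-case an author value if it is entirely upper case."""
    body = line.rstrip("\r\n")
    ending = line[len(body):]  # keep the original line ending
    m = AUTHOR_LINE.match(body)
    if m is None:
        return line  # not an author line
    tag, name = m.groups()
    if name != name.upper() or not any(c.isalpha() for c in name):
        return line  # already mixed case (or no letters): leave untouched
    # str.title() upper-cases the first letter of each alphabetic run and
    # lower-cases the rest, so "STUHRMANN,H.B." becomes "Stuhrmann,H.B.".
    return f"{tag}  - {name.title()}{ending}"


if __name__ == "__main__":
    # Usage: python decap_authors.py < old.ris > fixed.ris
    for line in sys.stdin:
        sys.stdout.write(fix_line(line))
```

`str.title()` is a blunt instrument: a name such as MCDONALD becomes "Mcdonald", so the filtered file is worth a quick review before running addref on it.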
From: SourceForge.net <no...@so...> - 2010-01-20 11:58:03
Bugs item #2935197, was opened at 2010-01-19 19:59
Message generated for change (Comment added) made by akusmin
You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=385991&aid=2935197&group_id=26091

Category: refdbd
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: akusmin (akusmin)
Assigned to: Markus Hoenicka (mhoenicka)
Summary: AU field: Once capitalized, always capitalized

----------------------------------------------------------------------

Comment By: akusmin (akusmin)
Date: 2010-01-20 11:58

Message:
1) Regarding example3.ris: I see your point and I agree.

2) Regarding the capitalized name overriding the non-capitalized one:

*** I still think it is wisest to clean up your input data, the additional effort notwithstanding. ***

Again, I see your point, and I was thinking that I could do that with all my *old* citation files using awk, sed, and grep. (For *new* files I don't have this problem, because when I add citations from ISI Web, I de-capitalize the author names.) However, in general, if this problem can be fixed with relatively little effort and without making addref significantly slower, it should be fixed. It is not a big issue, sure; maybe we should call it a "feature request".

Anyway: I have used RefDB since 2006 and I like it. Thank you for your work!

----------------------------------------------------------------------
From: SourceForge.net <no...@so...> - 2010-01-19 22:57:44
Bugs item #2935197, was opened at 2010-01-19 20:59
Message generated for change (Comment added) made by mhoenicka
You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=385991&aid=2935197&group_id=26091

>Category: refdbd
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: akusmin (akusmin)
>Assigned to: Markus Hoenicka (mhoenicka)
Summary: AU field: Once capitalized, always capitalized

----------------------------------------------------------------------

>Comment By: Markus Hoenicka (mhoenicka)
Date: 2010-01-19 23:57

Message:
example3.ris is misleading in your case because "Stuhrmann,H.B." and "Stuhrmann,Heinrich B." must be treated as separate entries by RefDB: only a human being can decide whether these are the same person, regardless of capitalization. If you expect the database to treat these strings as the same author, you'd have to maintain your input files accordingly and settle on one version, preferably the one with the first name spelled out.

This leaves the problem of "STUHRMANN,H.B." apparently outsmarting "Stuhrmann,H.B.". When adding a reference entry, refdbd tries to find existing authors in the database. If the author already exists, refdbd simply sets a "link" to that author; only if it doesn't find an existing author is a new author entry created. The test for duplicates is done by an SQL expression using the "=" operator. Apparently some database engines treat this as a case-insensitive string comparison: I can reproduce your results with MySQL, but not with PostgreSQL. I have to admit that I never figured this might pose a problem.

I still think it is wisest to clean up your input data, the additional effort notwithstanding. In order to fix this problem programmatically, I'd have to replace the "=" comparison with something that causes a case-sensitive comparison on all database engines.

regards,
Markus

----------------------------------------------------------------------
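The lookup-then-insert step described above, and why its outcome depends on the database engine, can be reproduced with Python's built-in sqlite3 module. SQLite is only a stand-in here: its default BINARY collation makes "=" case-sensitive (matching what Markus sees with PostgreSQL), while a column declared `COLLATE NOCASE` compares ASCII text case-insensitively (matching the MySQL result, whose default collations are case-insensitive). The table, column, and function names are hypothetical and do not reflect RefDB's actual schema or code.

```python
import sqlite3


def link_authors(names, collation):
    """Mimic the duplicate-author test: reuse an existing author if "="
    finds one, otherwise create a new author row. Returns the stored
    spelling each citation ends up linked to."""
    con = sqlite3.connect(":memory:")
    # Hypothetical table; the column collation decides how "=" compares text.
    # The collation name is interpolated, so only pass trusted constants.
    con.execute(
        f"CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT COLLATE {collation})"
    )
    linked = []
    for name in names:
        row = con.execute(
            "SELECT id, name FROM authors WHERE name = ?", (name,)
        ).fetchone()
        if row is None:
            con.execute("INSERT INTO authors (name) VALUES (?)", (name,))
            linked.append(name)
        else:
            linked.append(row[1])  # link to the spelling already stored
    con.close()
    return linked


names = ["STUHRMANN,H.B.", "Stuhrmann,H.B."]
print(link_authors(names, "BINARY"))  # ['STUHRMANN,H.B.', 'Stuhrmann,H.B.']
print(link_authors(names, "NOCASE"))  # ['STUHRMANN,H.B.', 'STUHRMANN,H.B.']
```

Under the case-insensitive comparison, the spelling that was added first wins for every later citation, which is exactly the "once capitalized, always capitalized" symptom. On MySQL, a case-sensitive comparison would require a binary (`_bin`) collation; whether refdbd should do that is the open question in this thread.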
From: SourceForge.net <no...@so...> - 2010-01-19 19:59:34
Bugs item #2935197, was opened at 2010-01-19 19:59
Message generated for change (Tracker Item Submitted) made by akusmin
You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=385991&aid=2935197&group_id=26091

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: akusmin (akusmin)
Assigned to: Nobody/Anonymous (nobody)
Summary: AU field: Once capitalized, always capitalized

Initial Comment:
I add citations to my RefDB databases using the command "addref foo.ris", where foo.ris is the file containing the citations. When one file contains two citations sharing the same author, and the author name is capitalized in the first such citation (FOONAME,F.B) but not in the second (Fooname,F.B), then after the citations are added, the command getref returns the *capitalized* author name for both citations. Moreover, if three citations share the same author, and the first citation contains the AU field Fooname,Frank B., the second FOONAME,F.B, and the third Fooname,F.B, then after adding the references from this file, the command getref again returns FOONAME,F.B.

Why I am complaining: I get citations from ISI Web of Science where, as you know, author names and titles were capitalized until 1995/1996. Thus some of my citations have capitalized names and some don't. When I make a bibliography for my LaTeX document, I sometimes see capitalized author names.

Maybe this is already fixed in the SVN version? Could you please check? Thank you.

P.S. Attached is a zip file with three RIS files. In ex1.ris there is the citation with AU Stuhrmann, H.B; the second file contains one more citation (the very first) where AU is STUHRMANN,H.B; and the third file contains three citations, where the first has STUHRMANN,Heinrich.B.

----------------------------------------------------------------------
From: SourceForge.net <no...@so...> - 2010-01-19 19:33:41
|
Bugs item #2927787, was opened at 2010-01-07 20:09 Message generated for change (Comment added) made by akusmin You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=385991&aid=2927787&group_id=26091 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: refdbd Group: None Status: Open Resolution: Fixed Priority: 5 Private: No Submitted By: akusmin (akusmin) Assigned to: Markus Hoenicka (mhoenicka) Summary: L1 - L4 fields are not displayed in HTML output Initial Comment: Until recently I used pdfroot and the AV field to store the location of my articles in pdf. The command refdbc -C getref '(:AU:~Mason) -t html -o my.html -S PY -d test1' would produce my.html where for all citations there was a link to a pdf file. Now the article locations are stored in the L1, L2, L3, L4 fields. The same command produces an HTML file where there is a line REPRINT: IN FILE but there are no links to the file. But when I run the command without HTML output, I see refdbc -C getref '(:AU:~Mason) -S PY -d test1' .. .. REPRINT: IN FILE PDF: PATH:work/lit/ms/S/SteinerSaenger1991.pdf FULLTEXT: PATH:work/lit/ms/S/Ste991.pdf RELATED: PATH:work/lit/ms/S/Ste99aaaaa1.pdf IMAGE: PATH:work/lit/ms/S/eaaaa991.pdf ---------------------------------------------------------------------- >Comment By: akusmin (akusmin) Date: 2010-01-19 19:33 Message: Right. I am sorry, I read about it in the RefDB manual, but it slipped my mind. Thank you. ---------------------------------------------------------------------- Comment By: Markus Hoenicka (mhoenicka) Date: 2010-01-17 17:04 Message: If you continue to use data from previous RefDB releases, you'll have to migrate these data properly in order to get rid of the "PATH:" prefix. The exact procedure is described in the file UPGRADING in the top level directory of the RefDB sources.
In brief, you'll have to replace the string "PATH:" with a valid URI prefix like "file://" to make the links work on your box. UPGRADING contains an example of a sed command which can do this conversion on your exported data. After the substitution, re-import your data. ---------------------------------------------------------------------- Comment By: akusmin (akusmin) Date: 2010-01-17 14:20 Message: Hi! Thanks for the quick response. Yes, adding UR into refdbcrc did result in displaying the links in the HTML output. But there is one problem left (and I should have mentioned it before): the links (with or without HTML output, see the example in the original bug description) still contain the word PATH, which is not substituted by the pdfroot. Thus, I cannot open a pdf file simply by clicking on the link in a browser. Did you fix it as well? Should this problem be reported as a separate bug? best regards, A. Kusmin ---------------------------------------------------------------------- Comment By: Markus Hoenicka (mhoenicka) Date: 2010-01-09 01:01 Message: Ok, I forgot to mention that the proper way to request L1-L4 output using the current SVN revision (i.e. after fixing the bug) is to use "-s LX" or an appropriate entry in your refdbc config file. ---------------------------------------------------------------------- Comment By: Markus Hoenicka (mhoenicka) Date: 2010-01-09 00:58 Message: Thanks for reporting this problem. There was indeed an inconsistency in evaluating the string which specifies optional components of the screen and html outputs. I've fixed this problem in SVN. For the time being, please use "-s UR" or an equivalent entry in your refdbc config file to explicitly request L1-L4 output from the html backend. This solution works at least with the previous svn revision. Please shout again if this workaround doesn't help in 0.9.9-1.
---------------------------------------------------------------------- Comment By: akusmin (akusmin) Date: 2010-01-07 20:15 Message: Some more info on the version of RefDB I use: refdba: viewstat You are served by: refdb 0.9.9-1 SVN revision: 531 Client IP: 127.0.0.1 Connected via mysql driver (dbd_mysql v0.8.3-1) to: 5.0.84 db version: 3 serverip: localhost timeout: 180 dbs_port: 3306 ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=385991&aid=2927787&group_id=26091 |
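The thread only says that UPGRADING contains a sed command for the "PATH:" conversion; what follows is an independent shell sketch of the same idea, not the command from UPGRADING. It assumes an RIS export in the usual "TAG  - value" layout and that the old PATH: values were relative to pdfroot, as in the example above (PATH:work/lit/...). Replacing PATH: with a bare "file://" would give file://work/lit/..., which a browser reads as a host named "work"; prefixing the old pdfroot directory instead yields an absolute file:/// URI. The function name migrate_path_prefix and the directory /home/user/ are hypothetical.

```shell
# Hypothetical helper (not taken from UPGRADING): rewrite a legacy "PATH:"
# prefix at the start of any RIS field value into a file:// URI.
# $1 is the directory your old pdfroot pointed to, absolute and with a
# trailing slash, e.g. /home/user/ (an assumption; use your own value).
migrate_path_prefix() {
    pdfroot=$1
    # "|" is the sed delimiter, so slashes in $pdfroot need no escaping.
    # Only "TAG  - PATH:" at the start of a line is rewritten; a "PATH:"
    # elsewhere in a title or abstract is left alone.
    sed "s|^\([A-Z][A-Z0-9]  - \)PATH:|\1file://${pdfroot}|"
}

# Usage, on data exported e.g. with "getref -t ris":
#   migrate_path_prefix /home/user/ < old.ris > new.ris
```

After converting, re-import the result as described in UPGRADING. The sketch assumes the pdfroot value contains no "|", "&" or backslash characters, which sed would interpret specially in the replacement.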
From: SourceForge.net <no...@so...> - 2010-01-17 17:04:37
|
Bugs item #2927787, was opened at 2010-01-07 21:09 Message generated for change (Comment added) made by mhoenicka You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=385991&aid=2927787&group_id=26091 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: refdbd Group: None Status: Open Resolution: Fixed Priority: 5 Private: No Submitted By: akusmin (akusmin) Assigned to: Markus Hoenicka (mhoenicka) Summary: L1 - L4 fields are not displayed in HTML output Initial Comment: Until recently I used pdfroot and the AV field to store the location of my articles in pdf. The command refdbc -C getref '(:AU:~Mason) -t html -o my.html -S PY -d test1' would produce my.html where for all citations there was a link to a pdf file. Now the article locations are stored in the L1, L2, L3, L4 fields. The same command produces an HTML file where there is a line REPRINT: IN FILE but there are no links to the file. But when I run the command without HTML output, I see refdbc -C getref '(:AU:~Mason) -S PY -d test1' .. .. REPRINT: IN FILE PDF: PATH:work/lit/ms/S/SteinerSaenger1991.pdf FULLTEXT: PATH:work/lit/ms/S/Ste991.pdf RELATED: PATH:work/lit/ms/S/Ste99aaaaa1.pdf IMAGE: PATH:work/lit/ms/S/eaaaa991.pdf ---------------------------------------------------------------------- >Comment By: Markus Hoenicka (mhoenicka) Date: 2010-01-17 18:04 Message: If you continue to use data from previous RefDB releases, you'll have to migrate these data properly in order to get rid of the "PATH:" prefix. The exact procedure is described in the file UPGRADING in the top level directory of the RefDB sources. In brief, you'll have to replace the string "PATH:" with a valid URI prefix like "file://" to make the links work on your box. UPGRADING contains an example of a sed command which can do this conversion on your exported data.
After the substitution, re-import your data. ---------------------------------------------------------------------- Comment By: akusmin (akusmin) Date: 2010-01-17 15:20 Message: Hi! Thanks for the quick response. Yes, adding UR into refdbcrc did result in displaying the links in the HTML output. But there is one problem left (and I should have mentioned it before): the links (with or without HTML output, see the example in the original bug description) still contain the word PATH, which is not substituted by the pdfroot. Thus, I cannot open a pdf file simply by clicking on the link in a browser. Did you fix it as well? Should this problem be reported as a separate bug? best regards, A. Kusmin ---------------------------------------------------------------------- Comment By: Markus Hoenicka (mhoenicka) Date: 2010-01-09 02:01 Message: Ok, I forgot to mention that the proper way to request L1-L4 output using the current SVN revision (i.e. after fixing the bug) is to use "-s LX" or an appropriate entry in your refdbc config file. ---------------------------------------------------------------------- Comment By: Markus Hoenicka (mhoenicka) Date: 2010-01-09 01:58 Message: Thanks for reporting this problem. There was indeed an inconsistency in evaluating the string which specifies optional components of the screen and html outputs. I've fixed this problem in SVN. For the time being, please use "-s UR" or an equivalent entry in your refdbc config file to explicitly request L1-L4 output from the html backend. This solution works at least with the previous svn revision. Please shout again if this workaround doesn't help in 0.9.9-1.
---------------------------------------------------------------------- Comment By: akusmin (akusmin) Date: 2010-01-07 21:15 Message: Some more info on the version of RefDB I use: refdba: viewstat You are served by: refdb 0.9.9-1 SVN revision: 531 Client IP: 127.0.0.1 Connected via mysql driver (dbd_mysql v0.8.3-1) to: 5.0.84 db version: 3 serverip: localhost timeout: 180 dbs_port: 3306 ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=385991&aid=2927787&group_id=26091 |
From: SourceForge.net <no...@so...> - 2010-01-17 14:20:01
|
Bugs item #2927787, was opened at 2010-01-07 20:09 Message generated for change (Comment added) made by akusmin You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=385991&aid=2927787&group_id=26091 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: refdbd Group: None Status: Closed Resolution: Fixed Priority: 5 Private: No Submitted By: akusmin (akusmin) Assigned to: Markus Hoenicka (mhoenicka) Summary: L1 - L4 fields are not displayed in HTML output Initial Comment: Until recently I used pdfroot and the AV field to store the location of my articles in pdf. The command refdbc -C getref '(:AU:~Mason) -t html -o my.html -S PY -d test1' would produce my.html where for all citations there was a link to a pdf file. Now the article locations are stored in the L1, L2, L3, L4 fields. The same command produces an HTML file where there is a line REPRINT: IN FILE but there are no links to the file. But when I run the command without HTML output, I see refdbc -C getref '(:AU:~Mason) -S PY -d test1' .. .. REPRINT: IN FILE PDF: PATH:work/lit/ms/S/SteinerSaenger1991.pdf FULLTEXT: PATH:work/lit/ms/S/Ste991.pdf RELATED: PATH:work/lit/ms/S/Ste99aaaaa1.pdf IMAGE: PATH:work/lit/ms/S/eaaaa991.pdf ---------------------------------------------------------------------- Comment By: akusmin (akusmin) Date: 2010-01-17 14:20 Message: Hi! Thanks for the quick response. Yes, adding UR into refdbcrc did result in displaying the links in the HTML output. But there is one problem left (and I should have mentioned it before): the links (with or without HTML output, see the example in the original bug description) still contain the word PATH, which is not substituted by the pdfroot. Thus, I cannot open a pdf file simply by clicking on the link in a browser. Did you fix it as well? Should this problem be reported as a separate bug? best regards, A.
Kusmin ---------------------------------------------------------------------- Comment By: Markus Hoenicka (mhoenicka) Date: 2010-01-09 01:01 Message: Ok, I forgot to mention that the proper way to request L1-L4 output using the current SVN revision (i.e. after fixing the bug) is to use "-s LX" or an appropriate entry in your refdbc config file. ---------------------------------------------------------------------- Comment By: Markus Hoenicka (mhoenicka) Date: 2010-01-09 00:58 Message: Thanks for reporting this problem. There was indeed an inconsistency in evaluating the string which specifies optional components of the screen and html outputs. I've fixed this problem in SVN. For the time being, please use "-s UR" or an equivalent entry in your refdbc config file to explicitly request L1-L4 output from the html backend. This solution works at least with the previous svn revision. Please shout again if this workaround doesn't help in 0.9.9-1. ---------------------------------------------------------------------- Comment By: akusmin (akusmin) Date: 2010-01-07 20:15 Message: Some more info on the version of RefDB I use: refdba: viewstat You are served by: refdb 0.9.9-1 SVN revision: 531 Client IP: 127.0.0.1 Connected via mysql driver (dbd_mysql v0.8.3-1) to: 5.0.84 db version: 3 serverip: localhost timeout: 180 dbs_port: 3306 ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=385991&aid=2927787&group_id=26091 |
From: SourceForge.net <no...@so...> - 2010-01-09 02:16:26
|
Bugs item #2927787, was opened at 2010-01-07 21:09 Message generated for change (Comment added) made by mhoenicka You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=385991&aid=2927787&group_id=26091 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: refdbd Group: None Status: Closed Resolution: Fixed Priority: 5 Private: No Submitted By: akusmin (akusmin) Assigned to: Markus Hoenicka (mhoenicka) Summary: L1 - L4 fields are not displayed in HTML output Initial Comment: Until recently I used pdfroot and the AV field to store the location of my articles in pdf. The command refdbc -C getref '(:AU:~Mason) -t html -o my.html -S PY -d test1' would produce my.html where for all citations there was a link to a pdf file. Now the article locations are stored in the L1, L2, L3, L4 fields. The same command produces an HTML file where there is a line REPRINT: IN FILE but there are no links to the file. But when I run the command without HTML output, I see refdbc -C getref '(:AU:~Mason) -S PY -d test1' .. .. REPRINT: IN FILE PDF: PATH:work/lit/ms/S/SteinerSaenger1991.pdf FULLTEXT: PATH:work/lit/ms/S/Ste991.pdf RELATED: PATH:work/lit/ms/S/Ste99aaaaa1.pdf IMAGE: PATH:work/lit/ms/S/eaaaa991.pdf ---------------------------------------------------------------------- >Comment By: Markus Hoenicka (mhoenicka) Date: 2010-01-09 02:01 Message: Ok, I forgot to mention that the proper way to request L1-L4 output using the current SVN revision (i.e. after fixing the bug) is to use "-s LX" or an appropriate entry in your refdbc config file. ---------------------------------------------------------------------- Comment By: Markus Hoenicka (mhoenicka) Date: 2010-01-09 01:58 Message: Thanks for reporting this problem. There was indeed an inconsistency in evaluating the string which specifies optional components of the screen and html outputs.
I've fixed this problem in SVN. For the time being, please use "-s UR" or an equivalent entry in your refdbc config file to explicitly request L1-L4 output from the html backend. This solution works at least with the previous svn revision. Please shout again if this workaround doesn't help in 0.9.9-1. ---------------------------------------------------------------------- Comment By: akusmin (akusmin) Date: 2010-01-07 21:15 Message: Some more info on the version of RefDB I use: refdba: viewstat You are served by: refdb 0.9.9-1 SVN revision: 531 Client IP: 127.0.0.1 Connected via mysql driver (dbd_mysql v0.8.3-1) to: 5.0.84 db version: 3 serverip: localhost timeout: 180 dbs_port: 3306 ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=385991&aid=2927787&group_id=26091 |
From: SourceForge.net <no...@so...> - 2010-01-09 00:58:40
|
Bugs item #2927787, was opened at 2010-01-07 21:09 Message generated for change (Settings changed) made by mhoenicka You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=385991&aid=2927787&group_id=26091 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. >Category: refdbd Group: None >Status: Closed >Resolution: Fixed Priority: 5 Private: No Submitted By: akusmin (akusmin) >Assigned to: Markus Hoenicka (mhoenicka) Summary: L1 - L4 fields are not displayed in HTML output Initial Comment: Until recently I used pdfroot and the AV field to store the location of my articles in pdf. The command refdbc -C getref '(:AU:~Mason) -t html -o my.html -S PY -d test1' would produce my.html where for all citations there was a link to a pdf file. Now the article locations are stored in the L1, L2, L3, L4 fields. The same command produces an HTML file where there is a line REPRINT: IN FILE but there are no links to the file. But when I run the command without HTML output, I see refdbc -C getref '(:AU:~Mason) -S PY -d test1' .. .. REPRINT: IN FILE PDF: PATH:work/lit/ms/S/SteinerSaenger1991.pdf FULLTEXT: PATH:work/lit/ms/S/Ste991.pdf RELATED: PATH:work/lit/ms/S/Ste99aaaaa1.pdf IMAGE: PATH:work/lit/ms/S/eaaaa991.pdf ---------------------------------------------------------------------- >Comment By: Markus Hoenicka (mhoenicka) Date: 2010-01-09 01:58 Message: Thanks for reporting this problem. There was indeed an inconsistency in evaluating the string which specifies optional components of the screen and html outputs. I've fixed this problem in SVN. For the time being, please use "-s UR" or an equivalent entry in your refdbc config file to explicitly request L1-L4 output from the html backend. This solution works at least with the previous svn revision. Please shout again if this workaround doesn't help in 0.9.9-1.
---------------------------------------------------------------------- Comment By: akusmin (akusmin) Date: 2010-01-07 21:15 Message: Some more info on the version of RefDB I use: refdba: viewstat You are served by: refdb 0.9.9-1 SVN revision: 531 Client IP: 127.0.0.1 Connected via mysql driver (dbd_mysql v0.8.3-1) to: 5.0.84 db version: 3 serverip: localhost timeout: 180 dbs_port: 3306 ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=385991&aid=2927787&group_id=26091 |
From: SourceForge.net <no...@so...> - 2010-01-07 20:15:45
|
Bugs item #2927787, was opened at 2010-01-07 20:09 Message generated for change (Comment added) made by akusmin You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=385991&aid=2927787&group_id=26091 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: None Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: akusmin (akusmin) Assigned to: Nobody/Anonymous (nobody) Summary: L1 - L4 fields are not displayed in HTML output Initial Comment: Until recently I used pdfroot and the AV field to store the location of my articles in pdf. The command refdbc -C getref '(:AU:~Mason) -t html -o my.html -S PY -d test1' would produce my.html where for all citations there was a link to a pdf file. Now the article locations are stored in the L1, L2, L3, L4 fields. The same command produces an HTML file where there is a line REPRINT: IN FILE but there are no links to the file. But when I run the command without HTML output, I see refdbc -C getref '(:AU:~Mason) -S PY -d test1' .. .. REPRINT: IN FILE PDF: PATH:work/lit/ms/S/SteinerSaenger1991.pdf FULLTEXT: PATH:work/lit/ms/S/Ste991.pdf RELATED: PATH:work/lit/ms/S/Ste99aaaaa1.pdf IMAGE: PATH:work/lit/ms/S/eaaaa991.pdf ---------------------------------------------------------------------- >Comment By: akusmin (akusmin) Date: 2010-01-07 20:15 Message: Some more info on the version of RefDB I use: refdba: viewstat You are served by: refdb 0.9.9-1 SVN revision: 531 Client IP: 127.0.0.1 Connected via mysql driver (dbd_mysql v0.8.3-1) to: 5.0.84 db version: 3 serverip: localhost timeout: 180 dbs_port: 3306 ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=385991&aid=2927787&group_id=26091 |
From: SourceForge.net <no...@so...> - 2010-01-07 20:09:51
|
Bugs item #2927787, was opened at 2010-01-07 20:09 Message generated for change (Tracker Item Submitted) made by akusmin You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=385991&aid=2927787&group_id=26091 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: None Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: akusmin (akusmin) Assigned to: Nobody/Anonymous (nobody) Summary: L1 - L4 fields are not displayed in HTML output Initial Comment: Until recently I used pdfroot and the AV field to store the location of my articles in pdf. The command refdbc -C getref '(:AU:~Mason) -t html -o my.html -S PY -d test1' would produce my.html where for all citations there was a link to a pdf file. Now the article locations are stored in the L1, L2, L3, L4 fields. The same command produces an HTML file where there is a line REPRINT: IN FILE but there are no links to the file. But when I run the command without HTML output, I see refdbc -C getref '(:AU:~Mason) -S PY -d test1' .. .. REPRINT: IN FILE PDF: PATH:work/lit/ms/S/SteinerSaenger1991.pdf FULLTEXT: PATH:work/lit/ms/S/Ste991.pdf RELATED: PATH:work/lit/ms/S/Ste99aaaaa1.pdf IMAGE: PATH:work/lit/ms/S/eaaaa991.pdf ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=385991&aid=2927787&group_id=26091 |
From: SourceForge.net <no...@so...> - 2009-10-14 22:40:25
|
Bugs item #2877685, was opened at 2009-10-13 08:58 Message generated for change (Settings changed) made by mhoenicka You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=385991&aid=2877685&group_id=26091 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: refdbd Group: None Status: Open >Resolution: Fixed Priority: 5 Private: No Submitted By: Torsten Bronger (bronger) Assigned to: Markus Hoenicka (mhoenicka) Summary: Note links of foreign user don't work properly Initial Comment: If I try to retrieve all references connected with a public extended note of another user, it fails: refdbc: getnote :CK:=Juka2001 AND :NCK:=django-refdb-global-pdfs Note ID:51 (Mon Oct 05 2009) Key: django-refdb-global-pdfs 999:1 retrieved:0 failed refdbc: getref :NCK:=django-refdb-global-pdfs 999:0 retrieved:0 failed ---------------------------------------------------------------------- >Comment By: Markus Hoenicka (mhoenicka) Date: 2009-10-15 00:40 Message: This is supposed to be fixed in revision 705. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=385991&aid=2877685&group_id=26091 |
From: SourceForge.net <no...@so...> - 2009-10-13 06:58:53
|
Bugs item #2877685, was opened at 2009-10-13 08:58 Message generated for change (Tracker Item Submitted) made by bronger You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=385991&aid=2877685&group_id=26091 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: refdbd Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: Torsten Bronger (bronger) Assigned to: Markus Hoenicka (mhoenicka) Summary: Note links of foreign user don't work properly Initial Comment: If I try to retrieve all references connected with a public extended note of another user, it fails: refdbc: getnote :CK:=Juka2001 AND :NCK:=django-refdb-global-pdfs Note ID:51 (Mon Oct 05 2009) Key: django-refdb-global-pdfs 999:1 retrieved:0 failed refdbc: getref :NCK:=django-refdb-global-pdfs 999:0 retrieved:0 failed ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=385991&aid=2877685&group_id=26091 |
From: SourceForge.net <no...@so...> - 2009-10-05 20:40:28
|
Feature Requests item #1836589, was opened at 2007-11-22 18:41 Message generated for change (Settings changed) made by mhoenicka You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=385994&aid=1836589&group_id=26091 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: web interface Group: None >Status: Closed Priority: 5 Private: No Submitted By: Nobody/Anonymous (nobody) Assigned to: Markus Hoenicka (mhoenicka) Summary: show all references feature Initial Comment: It would be very useful to have a 'show all references' feature to be able to look at the full list of references in your bibliography. ---------------------------------------------------------------------- Comment By: Markus Hoenicka (mhoenicka) Date: 2007-11-23 14:20 Message: Logged In: YES user_id=85809 Originator: NO Something along these lines will be available as soon as the query results can be chunked. Currently all matching datasets are returned in one long list, which causes PHP to max out at a certain number of references (approx. 300 on my box). Therefore, in the current implementation a "show all" button would be equivalent to a "crash me" button for every serious database. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=385994&aid=1836589&group_id=26091 |
From: SourceForge.net <no...@so...> - 2009-10-05 20:38:30
|
Feature Requests item #2872243, was opened at 2009-10-03 16:36 Message generated for change (Settings changed) made by mhoenicka You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=385994&aid=2872243&group_id=26091 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: refdbd Group: None >Status: Closed Priority: 5 Private: No Submitted By: Torsten Bronger (bronger) Assigned to: Markus Hoenicka (mhoenicka) Summary: Get IDs fast Initial Comment: Currently, it takes 40ms per reference to get the ID of a found reference: $ time refdbc -u refdb -w Sonne -d biblio -C getref -s ID -t ris ":ID:>0" > /dev/null 999:96 retrieved:0 failed real 0m4.026s user 0m0.000s sys 0m0.004s This is problematic for a web frontend because even if you work with aggressive caching, you have to know at least the IDs of found references. Therefore, I request that the ID-only request be optimised. ---------------------------------------------------------------------- Comment By: Torsten Bronger (bronger) Date: 2009-10-05 18:03 Message: No further configuration is necessary. In my test case, the time dropped from 3.8 seconds to 0.16 seconds. Now, caching is real fun. Great! ---------------------------------------------------------------------- Comment By: Markus Hoenicka (mhoenicka) Date: 2009-10-04 23:36 Message: I've tried to track down where refdbd spends its time returning the ID list. It looks like a lot of time is wasted doing the client/server messaging, as refdbd, by default, returns reference data one dataset at a time. If you return ID lists, which consist of RIS datasets with 4 lines each, the overhead is out of proportion. Please have a look at refdbdgetref.c as of revision 703. There is a tunable at line 2841 which is set to default values according to the type of query a few lines further down.
The idea is to group references before sending them to the client. This requires more memory, but reduces the overhead of client/server messaging. I've arrived at values of 100 for ID queries and 10 for other queries empirically, looking only at RIS data. These values certainly depend on the speed and memory of the machine refdbd runs on. Feel free to play with these numbers and see if it helps. If it does, I could turn this into configurable parameters. I've managed to reduce the time for retrieving 100 IDs to 0.732s from 10.66s and the time for retrieving 100 RIS datasets to 3.96s from 12.42s using the current defaults. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=385994&aid=2872243&group_id=26091 |
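Markus's explanation can be put into a rough cost model. This is a back-of-the-envelope sketch, not a measurement of refdbd: assume a fixed overhead c_m for each client/server message, a per-reference cost c_r, n references, and a batch size b (the tunable in refdbdgetref.c, where b = 1 is the old one-dataset-at-a-time behaviour). It further assumes c_r is unchanged by batching and that Torsten's later 3.8 s / 0.16 s test used the same 96-reference query as his original report.

```latex
\[
T(b) = \left\lceil \frac{n}{b} \right\rceil c_m + n\,c_r ,
\qquad
T(1) - T(b) = \Bigl(n - \left\lceil \tfrac{n}{b} \right\rceil\Bigr)\, c_m
\]
\[
\begin{aligned}
\text{Torsten, IDs } (n = 96,\ b = 100):&\quad 3.8 - 0.16 = 95\,c_m &&\Rightarrow c_m \approx 38\ \mathrm{ms}\\
\text{Markus, IDs } (n = 100,\ b = 100):&\quad 10.66 - 0.732 = 99\,c_m &&\Rightarrow c_m \approx 100\ \mathrm{ms}\\
\text{Markus, RIS } (n = 100,\ b = 10):&\quad 12.42 - 3.96 = 90\,c_m &&\Rightarrow c_m \approx 94\ \mathrm{ms}
\end{aligned}
\]
```

For comparison, Torsten's original figure works out to 4.026 s / 96 ≈ 42 ms per reference. Under this model the per-message overhead dominates small records such as 4-line ID datasets, which fits the much larger speedup for ID lists (10.66 / 0.732 ≈ 14.6x) than for full RIS datasets (12.42 / 3.96 ≈ 3.1x). The two machines differ, so their c_m estimates are not directly comparable, and a larger b trades the saved messages for the extra memory Markus mentions.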
From: SourceForge.net <no...@so...> - 2009-10-05 19:47:28
|
Bugs item #2872544, was opened at 2009-10-04 14:37 Message generated for change (Settings changed) made by mhoenicka You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=385991&aid=2872544&group_id=26091 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: refdbd Group: None >Status: Closed >Resolution: Fixed Priority: 5 Private: No Submitted By: Torsten Bronger (bronger) Assigned to: Markus Hoenicka (mhoenicka) Summary: Docs: xnote's "target" attribute only accepts citation keys Initial Comment: On refdb.sourceforge.net/xnote/elements/link.html it says that you can use the ID to denote references in the "target" attribute of xnote's "link" element. However, my observation is that you must use citation keys (which makes things simpler for me by the way). ---------------------------------------------------------------------- >Comment By: Markus Hoenicka (mhoenicka) Date: 2009-10-05 21:47 Message: I see. I've updated the xnote DTD to allow "refid" as a type to specify IDs, in addition to "reference" which is used for citation keys. The documentation was updated accordingly, see revision 704. ---------------------------------------------------------------------- Comment By: Torsten Bronger (bronger) Date: 2009-10-05 11:14 Message: Sorry, my mistake. The link should have been http://refdb.sourceforge.net/manual/ch07s05.html (the bottom of the page). ---------------------------------------------------------------------- Comment By: Markus Hoenicka (mhoenicka) Date: 2009-10-05 10:25 Message: Unless I'm dense, refdb.sourceforge.net/xnote/elements/link.html does not explicitly claim that you can use the ID. It talks about the "name" of a reference, which should certainly be rephrased to say "citation key" explicitly in order to avoid confusion. However, internally refdbd is supposed to be able to use ID values as well. 
It is only the xnote DTD which currently does not allow the "refid" type value at this point ("reference" is treated as citation key). I'll add this to the DTD shortly. As for addlink, IDs should be supported since revision 696. If that still does not work, let me know. ---------------------------------------------------------------------- Comment By: Torsten Bronger (bronger) Date: 2009-10-04 16:44 Message: Apparently, even with "addlink" you can't use IDs: refdbc: deletelink :NCK:=refdb-refdb :ID:=1 419:refdb-refdb -> REFERENCE:1 999:1 removed:0 skipped:0 failed refdbc: addlink :NCK:=refdb-refdb :ID:=1 414:refdb-refdb -> REFERENCE:1 999:0 added:0 skipped:1 failed ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=385991&aid=2872544&group_id=26091 |