Thread: [Treebase-devel] study_id vs submission_id issues

Hi all,

I'm a bit confused by an issue that has cropped up. 

The data model contains two tables, study and submission, and the submission table carries a study_id column as a foreign key. I cannot fathom why one study could have two or more submissions, so I think we are to assume a one-to-one connection between these tables. I'm not clear why we need separate study and submission tables at all -- possibly it is because an early iteration envisioned a scenario where new submissions are tracked with a submission_id, and the submitting person is not informed of the study_id until the data get vetted. Or perhaps we envisioned that the editing of a new submission could be taken over by a co-author on the same existing study_id record. At any rate, for whatever reason, we have study_ids and submission_ids to contend with. 
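To make the assumed one-to-one relationship concrete, here is a minimal sketch using Python's built-in sqlite3 module. It is not the actual TreeBASE schema: only the table names and the study_id / submission_id columns come from the description above, while the UNIQUE constraint and the example id values are my assumptions about how a one-to-one link could be enforced.

```python
import sqlite3

# In-memory database; this is an illustrative schema, not the real TreeBASE one.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE study (
    study_id INTEGER PRIMARY KEY
);
CREATE TABLE submission (
    submission_id INTEGER PRIMARY KEY,
    -- UNIQUE is the assumption that turns the FK into a one-to-one link.
    study_id INTEGER NOT NULL UNIQUE REFERENCES study(study_id)
);
""")

# Example values only.
conn.execute("INSERT INTO study (study_id) VALUES (10060)")
conn.execute(
    "INSERT INTO submission (submission_id, study_id) VALUES (10050, 10060)"
)


def second_submission_rejected():
    """Return True if a second submission for the same study is refused."""
    try:
        conn.execute(
            "INSERT INTO submission (submission_id, study_id) VALUES (10051, 10060)"
        )
    except sqlite3.IntegrityError:
        return True
    return False


print(second_submission_rejected())
```

Without the UNIQUE constraint the same foreign key would allow many submissions per study, which is the case I cannot see a use for.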

To date, all migrated data, and all new test submissions created at SDSC, have their study_id equal to their submission_id. So up until now, I hadn't really noticed an issue because study number and submission number were completely interchangeable.  I had assumed that when a new submission is created, and the submission_id says "1234", then if it ever gets published, the study_id will also be "1234".  Now, however, this has changed -- the two ids have become out of sync. And once we change over to a common sequence table, there is no hope of the two being in sync ever again. 
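A small sketch of why a common sequence rules out keeping the two ids in sync. The counter below is a hypothetical stand-in for a shared database sequence, and the order in which the submission and study rows draw their ids (and the number of other rows inserted in between) is an assumption for illustration only.

```python
from itertools import count

# Hypothetical stand-in for one sequence shared by every table.
shared_sequence = count(10050)


def create_submission(other_inserts=0):
    """Draw a submission_id and a study_id from the same sequence.

    other_inserts simulates rows in unrelated tables that consume
    sequence values between the two draws.
    """
    submission_id = next(shared_sequence)
    for _ in range(other_inserts):
        next(shared_sequence)  # value used by some other table
    study_id = next(shared_sequence)
    return {"submission_id": submission_id, "study_id": study_id}


record = create_submission(other_inserts=9)
print(record)
```

Because every table draws from the same counter, the two ids can never be equal for a single record, and the gap between them depends on whatever else was inserted in the meantime.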

In terms of database integrity, there's probably no problem.  In terms of confusion with our users, however, we may have problems. 

For example, I think it was Rutger who kindly added a radio button to the searchBySubmissionID.html page so that an Admin person can find a submission/study to edit it (see attached image). The radio button seems to say "search by TreeBASE2 Id" -- but in fact, it is really searching by submission_id, which happens to work for 99% of studies (because the two ids are in sync), but will not work from here on. For this to be useful, the dialog box should be changed to offer three choices: TreeBASE2 study_id, TreeBASE2 submission_id, and TreeBASE1 legacy study_id. 

When a user creates a new submission record, a new study record is also instantly created. The submission summary page lists the submission_id (e.g. 10050, below) as well as the study_id embedded in the "right-click and copy me" link (in this case "S10060" -- out of sync by a small but significant value from 10050). When an editor clicks the "Contact Submitter" link, an email with the subject header "TreeBASE Submission S10050" is generated on a local email client. So, in all email discussions between submitter and editor, the submission_id is being used to make it clear what the discussion is about. The risk here is that some submitters will go ahead and cite the meaningless "S10050" number in their manuscripts instead of the "S10060" number (or a phylows URL containing that number). 

I guess we have a couple of solutions:

1. mechanically make it so that submission_id and study_id are always in sync, thereby making them interchangeable and the whole issue moot. The problem is that this may be difficult to fix without having to change lots of code. 

2. hide the submission_id from *everyone* -- submitters, users, and editors alike. Wherever the submission_id is shown, have it show the study_id instead. In email discussions between submitter and editor, only the study_id is cited. The downside is that this will probably take a lot of work. 

3. fix the searchBySubmissionID.html page so that there are three radio buttons to choose from (study_id, submission_id, and legacy_id). Then add a "Study Accession URL" to the submission summary page (in addition to the Reviewer Access URL) so that the user knows exactly what to cite and does not confuse submission_id with study_id. 
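For solution 3, the search behind the three radio buttons could look roughly like the sketch below. Everything here is hypothetical: the Record class, the in-memory list, the sample legacy id, and the find_record function are made up for illustration and are not taken from the TreeBASE code base. The point is only that the caller states which kind of id it is passing.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    study_id: int
    submission_id: int
    legacy_id: Optional[int] = None  # TreeBASE1 study id, if migrated


# Made-up sample data: one new record, one migrated record.
RECORDS = [
    Record(study_id=10060, submission_id=10050),
    Record(study_id=1234, submission_id=1234, legacy_id=999),
]

# One entry per radio button on the search page.
FIELD_FOR_KIND = {
    "study_id": "study_id",
    "submission_id": "submission_id",
    "legacy_id": "legacy_id",
}


def find_record(kind, value):
    """Look up a record by an explicitly named kind of id."""
    if kind not in FIELD_FOR_KIND:
        raise ValueError(f"unknown id kind: {kind!r}")
    field = FIELD_FOR_KIND[kind]
    for record in RECORDS:
        if getattr(record, field) == value:
            return record
    return None


print(find_record("submission_id", 10050))
print(find_record("study_id", 10050))
```

With the sample data, searching for 10050 as a submission_id finds the new record, while the same number as a study_id finds nothing, which is exactly the distinction the current single radio button hides.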

I guess I vote for solution 3, since that requires the least amount of coding. The risk is that users will contact editors with questions like "what happened to the data in S1234" and we won't know if they mean submission S1234 or study S1234 -- the result is an additional back-and-forth of emails to clarify. One thing that might mitigate this: everywhere the submission_id appears on our web pages, we write it like "Sub1234", making it more likely that in future communication we know what the integer means. 
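The "Sub1234" convention could be sketched as a pair of formatting helpers plus a parser that tells the two kinds of label apart. The function names are hypothetical, and I am assuming the study label keeps its existing "S" prefix.

```python
import re


def format_submission_id(submission_id):
    """Render a submission_id with the proposed 'Sub' prefix."""
    return f"Sub{submission_id}"


def format_study_id(study_id):
    """Render a study_id with the existing 'S' prefix."""
    return f"S{study_id}"


# 'Sub' is listed first so that it is not mistaken for a bare 'S'.
_LABEL = re.compile(r"^(Sub|S)(\d+)$")


def parse_label(label):
    """Return ('submission' | 'study', integer id) for a prefixed label."""
    match = _LABEL.match(label.strip())
    if match is None:
        raise ValueError(f"not a recognised id label: {label!r}")
    prefix, digits = match.groups()
    kind = "submission" if prefix == "Sub" else "study"
    return kind, int(digits)


print(format_submission_id(10050), parse_label("Sub10050"))
print(format_study_id(10060), parse_label("S10060"))
```

A bare integer such as "1234" is rejected rather than guessed at, since that is the ambiguous case we are trying to avoid.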

Bill

This is the Search Submission page which is used by editors and admin users. Currently the "TreeBASE2 Id" actually searches the submission_id.  Either both ids should be totally in sync or we should add another radio button to distinguish the two.

Summary for current study page for submitting users and for admin editors. Here the reviewer access URL correctly uses the study_id, whereas the submission_id is presented to the user. Either both ids should be totally in sync, or we should be clear on what the final "Study Accession URL" will look like (with the study_id embedded therein). 
