#408 Handle missing input columns in assign-confidence

post v2.0
open
Kaipo
None
2016-07-02
2016-07-02
No

Feb 25:

Kaipo,

There is a bug in assign-confidence: if the input tab-delimited file is missing some columns, the program will happily print those columns in the output anyway, filling them in with junk values. I would like to fix this, but I'm not sure of the best way to do it, so I thought I'd see if you have a suggestion. Basically, AssignConfidence.cpp currently contains a big chunk of code enclosed in an "if" statement that sets the values of

vector<bool> cols_to_print(NUMBER_MATCH_COLUMNS);

Most of these values are hard-coded to true. I would like, instead, to set many of them based on whether the value exists in the matches stored in the collection. The only way I can see to do this would be to get the first match out of the collection and then see which fields it contains. This is difficult, though, because many of the required get/set functions are missing in the interface to Match.h. So I was thinking instead of writing a function in MatchCollection.cpp that takes a match collection and returns a vector of Booleans of length NUMBER_MATCH_COLUMNS, where the value is true or false depending on whether the matches in the collection have had the corresponding field filled in. Do you see any drawback to doing it this way?

Thanks.
Bill
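
A minimal sketch of the helper proposed above, for illustration only. The names MatchColumn, SimpleMatch, hasField and getColumnsPresent are hypothetical stand-ins and are not the actual Match or MatchCollection interface; the real column enum lives in io/MatchColumns.h and has many more entries.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical, shortened column enum (assumption: the real one is in
// io/MatchColumns.h and ends with NUMBER_MATCH_COLUMNS).
enum MatchColumn {
  SCAN_COL,
  CHARGE_COL,
  XCORR_SCORE_COL,
  SP_SCORE_COL,
  NUMBER_MATCH_COLUMNS
};

// Hypothetical simplified match: only fields read from the input are stored.
struct SimpleMatch {
  std::map<MatchColumn, std::string> fields;
  bool hasField(MatchColumn col) const { return fields.count(col) > 0; }
};

// Returns one flag per column: true if the first match in the collection
// has that field filled in. An empty collection yields all false.
std::vector<bool> getColumnsPresent(const std::vector<SimpleMatch>& matches) {
  std::vector<bool> cols_to_print(NUMBER_MATCH_COLUMNS, false);
  if (matches.empty()) {
    return cols_to_print;
  }
  const SimpleMatch& first = matches.front();
  for (int col = 0; col < NUMBER_MATCH_COLUMNS; ++col) {
    cols_to_print[col] = first.hasField(static_cast<MatchColumn>(col));
  }
  return cols_to_print;
}
```

Looking only at the first match assumes every match in the collection carries the same set of fields, which should hold when they all come from the same tab-delimited file.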


Hi Bill,
That sounds like a reasonable way to do it, though it might be good to implement the function at the Match level, and then just call it for the first Match?


On second thought, is there any reason not to have all of this logic hidden inside the MatchCollection and print module? It seems like we should just keep track of which fields have been set and only print those. The user shouldn't have to specify which fields get printed. Do you agree?

Then the question is how to store this information. Essentially, we need a Boolean vector like

vector<bool> cols_to_print(NUMBER_MATCH_COLUMNS);

associated with each match collection. One problem, I guess, is that the NUMBER_MATCH_COLUMNS is inside io/MatchColumns.h.

What do you think?
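
One way the "keep track of which fields have been set" idea might look, as a rough sketch. TrackedCollection, setField, toTabDelimited and the column names are hypothetical and do not reflect the real MatchCollection or print module; the point is only that the flag vector is updated whenever a field is stored, so the caller never specifies which columns to print.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical, shortened column enum (assumption, see io/MatchColumns.h).
enum MatchColumn {
  SCAN_COL,
  CHARGE_COL,
  XCORR_SCORE_COL,
  SP_SCORE_COL,
  NUMBER_MATCH_COLUMNS
};

// Hypothetical header names, one per column above.
const char* const kColumnNames[NUMBER_MATCH_COLUMNS] = {
  "scan", "charge", "xcorr score", "sp score"
};

class TrackedCollection {
 public:
  TrackedCollection() : cols_set_(NUMBER_MATCH_COLUMNS, false) {}

  // Adds an empty row and returns its index.
  std::size_t addRow() {
    rows_.push_back(std::vector<std::string>(NUMBER_MATCH_COLUMNS));
    return rows_.size() - 1;
  }

  // Stores a value and records that this column has been set.
  void setField(std::size_t row, MatchColumn col, const std::string& value) {
    rows_.at(row)[col] = value;
    cols_set_[col] = true;
  }

  // Prints a header and all rows, skipping columns that were never set.
  std::string toTabDelimited() const {
    std::ostringstream out;
    std::vector<std::string> header(kColumnNames,
                                    kColumnNames + NUMBER_MATCH_COLUMNS);
    writeRow(out, header);
    for (std::size_t i = 0; i < rows_.size(); ++i) {
      writeRow(out, rows_[i]);
    }
    return out.str();
  }

 private:
  void writeRow(std::ostream& out, const std::vector<std::string>& row) const {
    bool first = true;
    for (int col = 0; col < NUMBER_MATCH_COLUMNS; ++col) {
      if (!cols_set_[col]) continue;
      if (!first) out << '\t';
      out << row[col];
      first = false;
    }
    out << '\n';
  }

  std::vector<std::vector<std::string> > rows_;
  std::vector<bool> cols_set_;  // one flag per column, owned by the collection
};
```

With this layout the flags live inside the collection, so the dependency on NUMBER_MATCH_COLUMNS stays confined to the collection and its printing code.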


Hi Bill,
Do we need to keep track of what fields have been set? I think that we can determine it just by inspecting the other fields in a MatchCollection, using the getScoredType function for most of the columns in question.
Kaipo
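
A sketch of this alternative, deriving the column flags from what the collection reports as scored instead of tracking each field. Only the name getScoredType comes from the thread; its signature here, the ScoreType values, the column enum and the mapping between them are all assumptions made for illustration.

```cpp
#include <set>
#include <vector>

// Hypothetical score types and columns (assumptions, not the real enums).
enum ScoreType { XCORR, SP, PERCOLATOR_SCORE };
enum MatchColumn {
  SCAN_COL,
  XCORR_SCORE_COL,
  SP_SCORE_COL,
  PERCOLATOR_SCORE_COL,
  NUMBER_MATCH_COLUMNS
};

// Hypothetical stand-in for a match collection that knows which score
// types have been computed or read for its matches.
struct ScoredCollection {
  std::set<ScoreType> scored;
  // Assumed signature: true if the collection has been scored for `type`.
  bool getScoredType(ScoreType type) const { return scored.count(type) > 0; }
};

// Builds the per-column print flags from the scored types alone.
std::vector<bool> columnsFromScoredTypes(const ScoredCollection& collection) {
  std::vector<bool> cols_to_print(NUMBER_MATCH_COLUMNS, false);
  cols_to_print[SCAN_COL] = true;  // assumed to be always present
  cols_to_print[XCORR_SCORE_COL] = collection.getScoredType(XCORR);
  cols_to_print[SP_SCORE_COL] = collection.getScoredType(SP);
  cols_to_print[PERCOLATOR_SCORE_COL] =
      collection.getScoredType(PERCOLATOR_SCORE);
  return cols_to_print;
}
```

This covers only columns that correspond to a score type; any other optional column would still need a separate check.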
