svn+ssh://bugman@.../svn/relax/trunk
........
r27246 | bugman | 2015-01-20 15:44:29 +0100 (Tue, 20 Jan 2015) | 3 lines
Fix for the Relax_disp.test_bug_23186_cluster_error_calc_dw system test on 32-bit and Python <= 2.5 systems.
........
r27248 | bugman | 2015-01-21 09:54:28 +0100 (Wed, 21 Jan 2015) | 6 lines
Better error handling in the structure.align user function.
If no common atoms can be found between the structures, a RelaxError is now raised for better user
feedback.
........
r27249 | bugman | 2015-01-21 10:07:23 +0100 (Wed, 21 Jan 2015) | 6 lines
Created an empty lib.sequence_alignment relax library package.
This may be used in the future for implementing more advanced structural alignments (the current
method simply skips missing atoms, and changes in sequence numbering are not handled).
........
r27250 | bugman | 2015-01-21 11:23:41 +0100 (Wed, 21 Jan 2015) | 3 lines
Added the sequence_alignment package to the lib package __all__ list.
........
r27251 | bugman | 2015-01-21 11:25:26 +0100 (Wed, 21 Jan 2015) | 3 lines
Added the unit testing infrastructure for the new lib.sequence_alignment package.
........
r27252 | bugman | 2015-01-21 11:37:37 +0100 (Wed, 21 Jan 2015) | 6 lines
Implementation of the Needleman-Wunsch sequence alignment algorithm.
This is located in the lib.sequence_alignment.needleman_wunsch module. This is implemented as
described in the Wikipedia article https://en.wikipedia.org/wiki/Needleman%E2%80%93Wunsch_algorithm.
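For reference, here is a minimal sketch of the Needleman-Wunsch recurrence with the linear scoring used in the Wikipedia example (match +1, mismatch -1, gap -1). The function name, defaults, and return layout are hypothetical. They are not the signature of relax's needleman_wunsch_align().

```python
def needleman_wunsch(seq1, seq2, match=1, mismatch=-1, gap=-1):
    """Hypothetical global alignment with a linear gap penalty (Wikipedia-style scoring)."""

    def s(a, b):
        # Substitution score for aligning residue a with residue b.
        return match if a == b else mismatch

    n, m = len(seq1), len(seq2)
    # F[i][j] is the best score for aligning seq1[:i] with seq2[:j].
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + s(seq1[i - 1], seq2[j - 1]),
                          F[i - 1][j] + gap,
                          F[i][j - 1] + gap)

    # Traceback from the bottom-right cell, preferring diagonal moves.
    align1, align2 = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + s(seq1[i - 1], seq2[j - 1]):
            align1.append(seq1[i - 1])
            align2.append(seq2[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            align1.append(seq1[i - 1])
            align2.append("-")
            i -= 1
        else:
            align1.append("-")
            align2.append(seq2[j - 1])
            j -= 1
    return "".join(reversed(align1)), "".join(reversed(align2)), F[n][m]
```

With these scores, aligning the Wikipedia sequences "GCATGCU" and "GATTACA" gives an optimal score of 0.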
........
r27253 | bugman | 2015-01-21 11:39:24 +0100 (Wed, 21 Jan 2015) | 8 lines
Created a unit test for checking the Needleman-Wunsch sequence alignment algorithm.
This uses the DNA data from the example in the Wikipedia article at
https://en.wikipedia.org/wiki/Needleman%E2%80%93Wunsch_algorithm. The test shows that the
implementation of the lib.sequence_alignment.needleman_wunsch.needleman_wunsch_align() function is
correct.
........
r27254 | bugman | 2015-01-21 12:15:53 +0100 (Wed, 21 Jan 2015) | 6 lines
Created the lib.sequence_alignment.substitution_matrices module.
This is for storing substitution matrices for use in sequence alignment. The module currently only
includes the BLOSUM62 matrix.
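One simple way to store such a matrix is sketched below. A string of residue codes sets the row and column order, and a nested list holds the scores. The BLOSUM62_SEQ string mentioned in r27256 hints at a similar code string, but this exact layout is an assumption. The toy +5/-4 DNA values are illustrative only and are not BLOSUM62 data.

```python
# Hypothetical layout: residue codes give the row/column order of the score matrix.
DNA_SEQ = "ACGT"
DNA_MATRIX = [
    [ 5, -4, -4, -4],
    [-4,  5, -4, -4],
    [-4, -4,  5, -4],
    [-4, -4, -4,  5],
]


def substitution_score(res1, res2, codes=DNA_SEQ, matrix=DNA_MATRIX):
    """Look up the score for aligning res1 with res2 (illustrative helper)."""
    return matrix[codes.index(res1)][codes.index(res2)]
```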
........
r27255 | bugman | 2015-01-21 12:21:53 +0100 (Wed, 21 Jan 2015) | 3 lines
Corrected the spelling of the BLOSUM62 matrix in lib.sequence_alignment.substitution_matrices.
........
r27256 | bugman | 2015-01-21 14:03:24 +0100 (Wed, 21 Jan 2015) | 3 lines
Fix for the lib.sequence_alignment.substitution_matrices.BLOSUM62_SEQ string.
........
r27257 | bugman | 2015-01-21 15:36:43 +0100 (Wed, 21 Jan 2015) | 7 lines
Modification of the Needleman-Wunsch sequence alignment algorithm implementation.
This is in the lib.sequence_alignment.needleman_wunsch functions. Scoring matrices are now
supported, as well as a user-supplied non-integer gap penalty. A bug in the algorithm for walking
through the traceback matrix, which occurred under certain conditions, has also been fixed.
........
r27258 | bugman | 2015-01-21 15:40:56 +0100 (Wed, 21 Jan 2015) | 9 lines
Created the lib.sequence_alignment.align_protein module for the sequence alignment of proteins.
This general module currently implements the align_pairwise() function for the pairwise alignment of
protein sequences. It provides the infrastructure for specifying gap starting and extension
penalties, choosing the alignment algorithm (currently only the Needleman-Wunsch sequence alignment
algorithm as 'NW70'), and choosing the substitution matrix (currently only BLOSUM62). The function
provides lots of printouts for user feedback.
........
r27259 | bugman | 2015-01-21 15:52:03 +0100 (Wed, 21 Jan 2015) | 6 lines
Created a unit test for lib.sequence_alignment.align_protein.align_pairwise().
This is to test the pairwise alignment of two protein sequences using the Needleman-Wunsch sequence
alignment algorithm, BLOSUM62 substitution matrix, and gap penalty of 10.0.
........
r27260 | bugman | 2015-01-21 15:58:43 +0100 (Wed, 21 Jan 2015) | 5 lines
Added more printouts to the Test_align_protein.test_align_pairwise unit test.
This test is in the unit test module _lib._sequence_alignment.test_align_protein.
........
r27261 | bugman | 2015-01-21 15:59:28 +0100 (Wed, 21 Jan 2015) | 3 lines
Fix for the Needleman-Wunsch sequence alignment algorithm when the substitution matrix is absent.
........
r27262 | bugman | 2015-01-21 16:01:27 +0100 (Wed, 21 Jan 2015) | 5 lines
The lib.sequence_alignment.align_protein.align_pairwise() function now returns data.
This includes both alignment strings as well as the gap matrix.
........
r27263 | bugman | 2015-01-22 15:45:25 +0100 (Thu, 22 Jan 2015) | 3 lines
Annotated the BLOSUM62 substitution matrix with the amino acid codes for easy reading.
........
r27264 | bugman | 2015-01-22 15:53:43 +0100 (Thu, 22 Jan 2015) | 5 lines
Updated the gap penalties in the Test_align_protein.test_align_pairwise unit test.
This is from the unit test module _lib._sequence_alignment.test_align_protein.
........
r27265 | bugman | 2015-01-22 15:57:15 +0100 (Thu, 22 Jan 2015) | 8 lines
Modified the Needleman-Wunsch sequence alignment algorithm.
The previous attempt was buggy. The algorithm has been modified to match the logic of the GPL
licenced EMBOSS software (http://emboss.sourceforge.net/) to allow for gap opening and extension
penalties, as well as end penalties. No code was copied; only the algorithm for creating the
scoring and penalty matrices, as well as the traceback matrix, was followed.
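The gap opening and extension scheme can be sketched with the classic three-matrix (Gotoh) recurrence. This is a score-only sketch, not the EMBOSS or relax code. It assumes that a gap of length k costs gap_open + (k - 1) * gap_extend. It also penalises end gaps exactly like internal gaps, so the separate end penalties are not modelled, and it omits the traceback. The name affine_global_score() is hypothetical.

```python
NEG_INF = float("-inf")


def affine_global_score(seq1, seq2, score, gap_open, gap_extend):
    """Best global alignment score with affine gaps (hypothetical sketch)."""
    n, m = len(seq1), len(seq2)
    # M: ends with seq1[i-1] aligned to seq2[j-1].
    # X: ends with seq1[i-1] aligned to a gap.  Y: ends with seq2[j-1] aligned to a gap.
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0.0
    # Leading gaps: one opening plus (length - 1) extensions.
    for i in range(1, n + 1):
        X[i][0] = -gap_open - (i - 1) * gap_extend
    for j in range(1, m + 1):
        Y[0][j] = -gap_open - (j - 1) * gap_extend
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = score(seq1[i - 1], seq2[j - 1])
            M[i][j] = s + max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            # Either open a new gap or extend an existing one in the same direction.
            X[i][j] = max(M[i - 1][j] - gap_open, X[i - 1][j] - gap_extend, Y[i - 1][j] - gap_open)
            Y[i][j] = max(M[i][j - 1] - gap_open, Y[i][j - 1] - gap_extend, X[i][j - 1] - gap_open)
    return max(M[n][m], X[n][m], Y[n][m])
```

For example, use match +1, mismatch -1, gap_open=10.0 and gap_extend=0.5. Aligning "AAAA" with "AA" then scores -8.5. That comes from two matches and a single two-residue gap (2 - 10.5), which beats two separate one-residue gaps (2 - 20).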
........
r27266 | bugman | 2015-01-22 16:01:39 +0100 (Thu, 22 Jan 2015) | 3 lines
Added a DNA similarity matrix to lib.sequence_alignment.substitution_matrices.
........
r27267 | bugman | 2015-01-22 16:09:34 +0100 (Thu, 22 Jan 2015) | 6 lines
Added sanity checks to the Needleman-Wunsch sequence alignment algorithm.
The residues of both sequences are now checked in needleman_wunsch_align() to make sure that they
are present in the substitution matrix.
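A sketch of such a check follows, assuming the matrix residue codes are available as a string. SequenceError is a hypothetical stand-in for relax's RelaxError, and check_residues() is an illustrative name.

```python
class SequenceError(ValueError):
    """Hypothetical stand-in for relax's RelaxError."""


def check_residues(seq, codes, label):
    """Raise SequenceError if any residue of seq is absent from the matrix codes."""
    missing = sorted(set(res for res in seq if res not in codes))
    if missing:
        raise SequenceError("The %s sequence residues %s are not present in the "
                            "substitution matrix." % (label, missing))
```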
........
r27268 | bugman | 2015-01-22 16:34:06 +0100 (Thu, 22 Jan 2015) | 5 lines
Added the NUC 4.4 nucleotide substitution matrix from ftp://ftp.ncbi.nih.gov/blast/matrices/.
Uracil was added to the table as a copy of T.
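Copying T to U amounts to appending one column and one row that duplicate the T scores. The sketch below assumes a hypothetical code-string plus nested-list layout, and add_uracil() is an illustrative name.

```python
def add_uracil(codes, matrix):
    """Return new codes and matrix with a 'U' row and column copied from 'T'."""
    t = codes.index("T")
    # New column: every residue scores against U as it does against T.
    new_matrix = [row + [row[t]] for row in matrix]
    # New row: a copy of the extended T row, so U-U also receives the T-T score.
    new_matrix.append(new_matrix[t][:])
    return codes + "U", new_matrix
```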
........
r27269 | bugman | 2015-01-22 16:35:49 +0100 (Thu, 22 Jan 2015) | 5 lines
Added the header from ftp://ftp.ncbi.nih.gov/blast/matrices/BLOSUM62.
This is to document the BLOSUM62 substitution matrix.
........
r27270 | bugman | 2015-01-22 16:39:34 +0100 (Thu, 22 Jan 2015) | 6 lines
Added the PAM 250 amino acid substitution matrix.
This was taken from ftp://ftp.ncbi.nih.gov/blast/matrices/PAM250 and added to
lib.sequence_alignment.substitution_matrices.PAM250.
........
r27271 | bugman | 2015-01-22 16:55:21 +0100 (Thu, 22 Jan 2015) | 6 lines
Modified the Test_needleman_wunsch.test_needleman_wunsch_align_DNA unit test to pass.
This is from the unit test module _lib._sequence_alignment.test_needleman_wunsch. The DNA sequences
were simplified so that the behaviour can be better predicted.
........
r27272 | bugman | 2015-01-22 16:56:35 +0100 (Thu, 22 Jan 2015) | 6 lines
Created the Test_needleman_wunsch.test_needleman_wunsch_align_NUC_4_4 unit test.
This is in the unit test module _lib._sequence_alignment.test_needleman_wunsch. This tests the
Needleman-Wunsch sequence alignment for two DNA sequences using the NUC 4.4 matrix.
........
r27273 | bugman | 2015-01-22 17:00:03 +0100 (Thu, 22 Jan 2015) | 7 lines
Created a unit test for demonstrating a failure in the Needleman-Wunsch sequence alignment algorithm.
The test is Test_needleman_wunsch.test_needleman_wunsch_align_NUC_4_4b from the
_lib._sequence_alignment.test_needleman_wunsch module. The problem is that the start of the
alignment is truncated if any gaps are present.
........
r27274 | bugman | 2015-01-22 17:07:58 +0100 (Thu, 22 Jan 2015) | 5 lines
Fix for the Needleman-Wunsch sequence alignment algorithm.
The starts of the sequences are no longer truncated when leading gaps are encountered.
........
r27275 | bugman | 2015-01-22 17:20:07 +0100 (Thu, 22 Jan 2015) | 5 lines
The needleman_wunsch_align() function now accepts the end gap penalty arguments.
These are passed on to the needleman_wunsch_matrix() function.
........
r27276 | bugman | 2015-01-22 17:21:34 +0100 (Thu, 22 Jan 2015) | 3 lines
Added the end gap penalty arguments to lib.sequence_alignment.align_protein.align_pairwise().
........
r27277 | bugman | 2015-01-22 17:30:20 +0100 (Thu, 22 Jan 2015) | 9 lines
Created the Structure.test_align_CaM_BLOSUM62 system test.
This will be used for expanding the functionality of the structure.align user function to perform
true sequence alignment via the new lib.sequence_alignment package. The test aligns 3 calmodulin
(CaM) structures from different organisms, hence the sequence numbering is different and the current
structure.align user function design fails. The structure.align user function has been expanded in
the test to include a number of arguments for advanced sequence alignment.
........
r27278 | bugman | 2015-01-22 18:01:40 +0100 (Thu, 22 Jan 2015) | 5 lines
Added support for the PAM250 substitution matrix to the protein pairwise sequence alignment function.
This is the function lib.sequence_alignment.align_protein.align_pairwise().
........
r27279 | bugman | 2015-01-22 19:54:17 +0100 (Thu, 22 Jan 2015) | 6 lines
Bug fix for the Needleman-Wunsch sequence alignment algorithm.
Part of the scoring system was functioning incorrectly when the gap penalty scores were non-integer,
as some scores were being stored in an integer array. Now the array is a float array.
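This failure mode is easy to reproduce, assuming numpy arrays (relax is built on numpy): assigning a float into an integer array silently truncates it. The helper store_score() is hypothetical.

```python
import numpy


def store_score(dtype, value):
    """Store value in a one-element numpy array of the given dtype and read it back."""
    scores = numpy.zeros(1, dtype=dtype)
    scores[0] = value  # silently truncated when dtype is an integer type
    return scores[0]
```

Here store_score(int, 10.5) gives 10, while store_score(float, 10.5) keeps 10.5.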
........
r27280 | bugman | 2015-01-22 19:55:49 +0100 (Thu, 22 Jan 2015) | 7 lines
Created the Test_align_protein.test_align_pairwise_PAM250 unit test.
This is in the unit test module _lib._sequence_alignment.test_align_protein. It checks the protein
alignment function lib.sequence_alignment.align_protein.align_pairwise() together with the PAM250
substitution matrix.
........
r27281 | bugman | 2015-01-23 09:38:45 +0100 (Fri, 23 Jan 2015) | 3 lines
Small docstring expansion for lib.sequence_alignment.align_protein.align_pairwise().
........
r27282 | bugman | 2015-01-23 09:40:55 +0100 (Fri, 23 Jan 2015) | 8 lines
Added the sequence alignment arguments to the structure.align user function front end.
This includes the 'matrix', 'gap_open_penalty', 'gap_extend_penalty', 'end_gap_open_penalty', and
'end_gap_extend_penalty' arguments. The 'algorithm' argument has not been added to save room, as
there is only one choice of 'NW70'. A paragraph has been added to the user function description to
explain the sequence alignment part of the user function.
........
r27283 | bugman | 2015-01-23 09:42:22 +0100 (Fri, 23 Jan 2015) | 6 lines
Added the sequence alignment arguments to the back end of the structure.align user function.
This allows the code in trunk to remain functional until sequence alignment prior to
superimposition has been implemented.
........
r27284 | bugman | 2015-01-23 09:46:40 +0100 (Fri, 23 Jan 2015) | 6 lines
Removed the 'algorithm' argument from the Structure.test_align_CaM_BLOSUM62 system test script.
This is for the structure.align user function. The argument has not been implemented to save room
in the GUI, and as 'NW70' is currently the only choice.
........
r27285 | bugman | 2015-01-23 10:05:12 +0100 (Fri, 23 Jan 2015) | 5 lines
The sequence alignment arguments are now passed all the way to the internal structural object backend.
These are the arguments of the structure.align user function.
........
r27286 | bugman | 2015-01-23 10:45:59 +0100 (Fri, 23 Jan 2015) | 3 lines
Copyright notice updates to 2015.
........
r27287 | bugman | 2015-01-23 11:02:05 +0100 (Fri, 23 Jan 2015) | 7 lines
Created the lib.sequence.aa_codes_three_to_one() function.
The lib.sequence module now contains the AA_CODES dictionary, which is a translation table from the
3-letter amino acid codes to the one-letter codes. The new aa_codes_three_to_one() function performs
the conversion.
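A minimal sketch of the translation follows, covering only the 20 standard amino acids. The '*' fallback for any other residue follows the later fix described in r27291. The real relax function signature may differ.

```python
# 3-letter to 1-letter codes for the 20 standard amino acids.
AA_CODES = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def aa_codes_three_to_one(code):
    """Convert a 3-letter residue name to its 1-letter code, '*' if non-standard."""
    # str.upper() is the method form; the string.upper() function is gone in Python 3.
    return AA_CODES.get(code.upper(), "*")
```

For example, "".join(aa_codes_three_to_one(r) for r in ["MET", "ALA", "MSE"]) gives "MA*".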
........
r27288 | bugman | 2015-01-23 11:03:35 +0100 (Fri, 23 Jan 2015) | 5 lines
Implemented the internal structural object MolContainer.loop_residues() method.
This generator method is used to quickly loop over all residues of the molecule.
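Below is a sketch of such a generator over a hypothetical flat atom list of (res_num, res_name, atom_name) tuples. It assumes that the atoms of each residue are stored contiguously. The internal structural object's real data layout is not reproduced here.

```python
from typing import Iterator, List, Tuple

# Hypothetical atom record: (residue number, residue name, atom name).
Atom = Tuple[int, str, str]


def loop_residues(atoms: List[Atom]) -> Iterator[Tuple[int, str]]:
    """Yield each (res_num, res_name) pair once, in the order first encountered."""
    last = None
    for res_num, res_name, _atom_name in atoms:
        if (res_num, res_name) != last:
            last = (res_num, res_name)
            yield last
```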
........
r27289 | bugman | 2015-01-23 11:06:53 +0100 (Fri, 23 Jan 2015) | 7 lines
Implemented the internal structural object one_letter_codes() method.
This will create a string of one-letter residue codes for the given molecule. Only proteins are
currently supported. This method uses the new lib.sequence.aa_codes_three_to_one() relax library
function.
........
r27290 | bugman | 2015-01-23 11:09:41 +0100 (Fri, 23 Jan 2015) | 7 lines
Sequence alignment is now performed in lib.structure.internal.coordinates.assemble_coord_array().
This is a pairwise alignment to the first molecule of the list. The alignments are not yet used for
anything. The assemble_coord_array() function is used by the structure.align user function, as well
as a few other structure user functions.
........
r27291 | bugman | 2015-01-23 15:38:21 +0100 (Fri, 23 Jan 2015) | 7 lines
Fix for the lib.sequence.aa_codes_three_to_one() function.
Non-standard residues are now converted to the '*' code. Using 'X' prevented any alignment of a
stretch of X residues, as the X-to-X scores in both the BLOSUM62 and PAM250 substitution matrices
are set to -1.
........
r27293 | bugman | 2015-01-23 17:49:29 +0100 (Fri, 23 Jan 2015) | 6 lines
Modified the gap penalty arguments for the structure.align user function.
These now must always be supplied, as None is not handled by the backend
lib.sequence_alignment.needleman_wunsch module. The previous defaults of None are now set to 0.0.
........
r27294 | bugman | 2015-01-26 10:47:26 +0100 (Mon, 26 Jan 2015) | 7 lines
Updated the artificial diffusion tensor test suite data.
This is the data in test_suite/shared_data/diffusion_tensor. The residues in the PDB files are now
proper amino acids, so the HETATM records are now ATOM records, and the CONECT records have been
eliminated.
........
r27295 | bugman | 2015-01-26 10:50:14 +0100 (Mon, 26 Jan 2015) | 6 lines
Another update for the artificial diffusion tensor test suite data.
The number of increments on the sphere has been increased from 5 to 6, to make the vector
distribution truly uniform. All PDB files and relaxation data have been updated.
........
r27296 | bugman | 2015-01-26 11:06:30 +0100 (Mon, 26 Jan 2015) | 7 lines
Bug fix for the printouts from the relax_data.read user function.
This problem was introduced in the last relax release (at r26588). The problem is that the spin ID
in the loaded relaxation data printout is the same for all data, being the spin ID of the first
spin. This has no effect on how relax runs, it is only incorrect feedback.
........
r27297 | bugman | 2015-01-26 11:26:15 +0100 (Mon, 26 Jan 2015) | 7 lines
Changed the synthetic PDB for the artificial diffusion tensor test suite data.
The nitrogen and proton positions are now shifted 10 Angstrom along the distribution vectors. This
is to avoid having all nitrogens positioned at the origin which causes the internal structural
object algorithm for determining which atoms are connected to fail.
........
r27298 | bugman | 2015-01-26 11:29:38 +0100 (Mon, 26 Jan 2015) | 7 lines
Reintroduced the CONECT PDB records into the artificial diffusion tensor test suite data.
The uniform vector distributions have overlapping vectors. This causes the internal structural
object algorithm for determining atom connectivity to fail, as it is currently distance-based rather
than using the PDB amino acid definitions.
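A toy version shows why overlapping vectors break distance-based connectivity: it bonds every atom pair closer than a cutoff. The 1.2 Angstrom cutoff, the 1.0 Angstrom N-H spacing in the example, and connected_pairs() are all hypothetical. This is not relax's actual algorithm.

```python
import math


def connected_pairs(positions, cutoff=1.2):
    """Return index pairs of atoms closer than cutoff (hypothetical distance rule)."""
    pairs = []
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            if math.dist(positions[i], positions[j]) < cutoff:
                pairs.append((i, j))
    return pairs
```

Take two N-H pairs placed along distinct vectors. Only the two genuine N-H bonds are found. When two residues share the same vector, the second N sits on top of the first, and spurious N-N and N-H bonds appear.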
........
r27299 | bugman | 2015-01-26 11:45:40 +0100 (Mon, 26 Jan 2015) | 7 lines
Bug fix for the structure.read_pdb user function parsing of CONECT records.
CONECT records pointing to ATOM records were not being read by the user function. As ATOM records
should not require CONECT records by their definition, this is only a minor problem affecting
synthetic edge cases.
........
r27300 | bugman | 2015-01-26 14:25:42 +0100 (Mon, 26 Jan 2015) | 6 lines
Updates for the Structure.test_create_diff_tensor_pdb_sphere system test.
The test now uses the sphere synthetic relaxation data rather than the ellipsoid data, and the PDB
checking has been updated for the new data.
........
r27301 | bugman | 2015-01-26 14:33:32 +0100 (Mon, 26 Jan 2015) | 6 lines
Updates for the Structure.test_create_diff_tensor_pdb_prolate system test.
The test now uses the spheroid synthetic relaxation data rather than the ellipsoid data, and the PDB
checking has been updated for the new data.
........
r27302 | bugman | 2015-01-26 14:42:15 +0100 (Mon, 26 Jan 2015) | 7 lines
Updates for the Structure.test_create_diff_tensor_pdb_oblate system test.
The test now uses the spheroid synthetic relaxation data rather than the ellipsoid data, and the PDB
checking has been updated for the new data. The oblate tensor is now forced in the system test
script.
........
r27303 | bugman | 2015-01-26 14:48:45 +0100 (Mon, 26 Jan 2015) | 5 lines
Updates for the Structure.test_create_diff_tensor_pdb_ellipsoid system test.
The PDB checking has been updated for the new data.
........
r27304 | bugman | 2015-01-26 14:58:55 +0100 (Mon, 26 Jan 2015) | 6 lines
Updated the Structure.test_delete_atom system test for the changed PDB structures.
The test_suite/shared_data/diffusion_tensor/spheroid/uniform.pdb file now has more residues, and the
atomic positions are different.
........
r27305 | bugman | 2015-01-26 15:03:00 +0100 (Mon, 26 Jan 2015) | 6 lines
Updated the Structure.test_align system test for the changed PDB structures.
The test_suite/shared_data/diffusion_tensor/spheroid/uniform.pdb file now has more residues, and the
atomic positions are different.
........
r27306 | bugman | 2015-01-26 15:04:12 +0100 (Mon, 26 Jan 2015) | 6 lines
Updated the Structure.test_align_molecules system test for the changed PDB structures.
The test_suite/shared_data/diffusion_tensor/spheroid/uniform.pdb file now has more residues, and the
atomic positions are different.
........
r27307 | bugman | 2015-01-26 15:19:18 +0100 (Mon, 26 Jan 2015) | 5 lines
Python 3 fix for the lib.sequence module.
The string.upper() function no longer exists.
........
r27308 | bugman | 2015-01-26 15:20:09 +0100 (Mon, 26 Jan 2015) | 5 lines
Python 3 fix for the lib.sequence_alignment.align_protein module.
The string.upper() function no longer exists.
........
r27309 | bugman | 2015-01-26 15:22:12 +0100 (Mon, 26 Jan 2015) | 5 lines
Modified the generate_data.py script for creating relaxation data from a diffusion tensor.
The NH vectors are no longer truncated to match the PDB.
........
r27310 | bugman | 2015-01-26 15:22:43 +0100 (Mon, 26 Jan 2015) | 5 lines
Python 3 fix for the generate_data.py script for creating relaxation data from a diffusion tensor.
The string.upper() function no longer exists.
........
r27311 | bugman | 2015-01-26 16:10:14 +0100 (Mon, 26 Jan 2015) | 6 lines
Reintroduced the simulated PDB truncation into the artificial diffusion tensor test suite data.
This is different to the previous implementation which was deleted recently. It now simulates the
truncation of both the N and H positions in the PDB and reconstructs the expected vector.
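The precision loss can be simulated as sketched below, assuming the standard PDB coordinate precision of three decimal places (the %8.3f fields of ATOM records). The name pdb_truncated_vector() is hypothetical.

```python
import math


def pdb_truncated_vector(n_pos, h_pos):
    """Round N and H coordinates to PDB precision and rebuild the unit N-H vector."""
    # PDB ATOM records store coordinates with three decimal places (%8.3f).
    n = [float("%8.3f" % x) for x in n_pos]
    h = [float("%8.3f" % x) for x in h_pos]
    diff = [hx - nx for hx, nx in zip(h, n)]
    norm = math.sqrt(sum(c * c for c in diff))
    return [c / norm for c in diff]
```

The reconstructed vector is still normalised, but it differs slightly from the exact input vector. This is the effect the simulated truncation is meant to reproduce.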
........
r27312 | bugman | 2015-01-26 16:52:56 +0100 (Mon, 26 Jan 2015) | 8 lines
Updates for some of the Structure.test_create_diff_tensor_pdb_* system tests.
This includes Structure.test_create_diff_tensor_pdb_ellipsoid,
Structure.test_create_diff_tensor_pdb_oblate, and Structure.test_create_diff_tensor_pdb_prolate.
The new simulated PDB truncation in the test data causes the PDB files created in these tests to be
slightly different.
........
r27313 | bugman | 2015-01-26 17:44:46 +0100 (Mon, 26 Jan 2015) | 21 lines
The pairwise sequence alignment is now active in the structure.align user function.
This is implemented in the lib.structure.internal.coordinates.assemble_coord_array() function for
assembling atomic coordinates. It will also automatically be used by many of the structure user
functions which operate on multiple structures.
The atomic coordinate assembly logic has been completely changed. Instead of grouping atomic
information by the molecule, it is now grouped per residue. This allows the residue based sequence
alignments to find matching coordinate information.
The assemble_coord_array() function will also handle the algorithm argument set to None and assume
that the residue sequences are identical between the structures, but this should be avoided.
A new function, common_residues(), has been created as a work-around for not having a multiple
sequence alignment implementation. It will take the pairwise sequence alignment information and
construct a special data structure specifying which residues are present in all structures.
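The idea can be sketched as follows. The sketch assumes that each pairwise alignment is a pair of gapped strings using '-' for gaps, with the first string always from the reference (first) molecule. The returned list of index tuples is a hypothetical stand-in for relax's special data structure.

```python
def common_residues(alignments):
    """Find reference residues aligned, without gaps, in every pairwise alignment.

    alignments: list of (ref_aligned, other_aligned) gapped strings.
    Returns tuples of 0-based residue indices, the reference index first.
    """
    maps = []
    for ref_aln, other_aln in alignments:
        mapping = {}
        ref_i = other_i = 0
        for a, b in zip(ref_aln, other_aln):
            # Record only columns where both molecules have a residue.
            if a != "-" and b != "-":
                mapping[ref_i] = other_i
            if a != "-":
                ref_i += 1
            if b != "-":
                other_i += 1
        maps.append(mapping)
    if not maps:
        return []
    common = sorted(set.intersection(*(set(m) for m in maps)))
    return [tuple([i] + [m[i] for m in maps]) for i in common]
```

For example, suppose the reference "ACDE" is aligned with "ADE" and with "ACDF". Then only reference residues 0 and 2 are present in all three structures.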
The logic for skipping missing atoms remains in place, but it now operates on the residue rather
than molecule level and simply uses the atom name rather than atom ID to identify common atoms.
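Within a single matched residue, finding the common atoms by name reduces to a set intersection. The helper common_atom_names() is hypothetical, and keeping the first structure's atom order is a design choice of this sketch.

```python
def common_atom_names(residue_atoms):
    """Return atom names present in every structure's copy of one residue."""
    if not residue_atoms:
        return []
    shared = set(residue_atoms[0]).intersection(*residue_atoms[1:])
    # Preserve the atom order of the first structure.
    return [name for name in residue_atoms[0] if name in shared]
```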
........
r27314 | bugman | 2015-01-26 17:45:27 +0100 (Mon, 26 Jan 2015) | 3 lines
Changed the gap opening penalty to 10 in the N-state model structure_align.py system test script.
........
r27315 | bugman | 2015-01-26 17:46:13 +0100 (Mon, 26 Jan 2015) | 5 lines
Docstring update for the pipe_control.structure.main.assemble_coordinates() function.
This is for the algorithm argument which can now be set to None.
........
r27316 | bugman | 2015-01-26 18:29:06 +0100 (Mon, 26 Jan 2015) | 8 lines
Fix for the sequence alignment for assembling atomic coordinates.
This caused the Structure.test_superimpose_fit_to_mean system test to fail. The problem was in the
new logic of the lib.structure.internal.coordinates.assemble_coord_array() function. The coordinate
assembly now terminates when either the end of the first molecule or the current molecule is
reached.
........
r27317 | bugman | 2015-01-26 19:11:26 +0100 (Mon, 26 Jan 2015) | 6 lines
Bug fixes for the new lib.structure.internal.coordinates.common_residues() function.
This function for determining the common residues between multiple sets of pairwise alignments was
failing in quite a number of cases. The logic has been updated to handle these.
........
r27318 | bugman | 2015-01-26 19:38:48 +0100 (Mon, 26 Jan 2015) | 5 lines
Another fix for the lib.structure.internal.coordinates.common_residues() function.
The wrong index was being used to skip residues in the second sequence.
........
r27319 | bugman | 2015-01-26 19:39:36 +0100 (Mon, 26 Jan 2015) | 3 lines
Removal of debugging printouts.
........