#46 issues with large PDB files > 99,999 atoms

Status: open
Priority: 5
Created: 2010-08-24
Updated: 2010-08-24
Reported by: Raik Gruenberg
Private: No

1) Reading large PDB files

PDB files with more than 99,999 atoms may not be loaded correctly (full example pending).
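Until a full example is available, a quick sanity check after loading is to look for duplicated serial numbers, which hint at a wrapped field. A hedged sketch; the file name is hypothetical::

>>> from Biskit import PDBModel
>>> import numpy as N
>>> m = PDBModel( 'huge.pdb' )  ## hypothetical file with > 99,999 atoms
>>> len( N.unique( m['serial_number'] ) ) == len( m )  ## False suggests wrapped serials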

2) Writing large PDB files

Writing works, but atom serial numbers above 99,999 are reset to 0, 1, 2, etc.

Example::

>>> from Biskit import PDBModel
>>> m1 = PDBModel( '1ffk' )  ## ribosome structure with 64,000 atoms
>>> m2 = m1.clone()
>>> m3 = m1.concat( m2 )     ## works; 'serial_number' fields keep their original values

>>> import numpy as N
>>> m3['serial_number'] = N.arange( len(m3) ) + 1 ## consecutive numbering
>>> m3.writePdb('~/test_large.pdb')

>>> m4 = PDBModel('~/test_large.pdb') ## serial_number profile is now DIFFERENT from m3
>>> m4['serial_number'] == m3['serial_number']
array([ True, True, True, ..., False, False, False], dtype=bool)
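A possible stop-gap until the parser is fixed is to keep every written file below the limit, e.g. by splitting the model into chunks. A sketch only: chunk size and file names are illustrative, and it assumes PDBModel.take() with a list of atom indices::

>>> chunk = 90000                       ## stay below the 99,999-atom limit
>>> for i in range( 0, len( m3 ), chunk ):
...     part = m3.take( range( i, min( i + chunk, len( m3 ) ) ) )
...     part.writePdb( '~/test_large_%02i.pdb' % ( i / chunk ) )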

Diagnosis

This must be a limitation in the Scientific.IO.PDB parser module (PDBFile). It uses Fortran fixed-format parsing, and the PDB format itself restricts the serial field to 5 digits (columns 7-11), so values above 99,999 cannot be represented.
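The truncation is easy to reproduce with plain string slicing, independent of any parser. A minimal sketch; everything after the serial field is made up::

>>> line = 'ATOM  100000  CA  ALA A   1      11.104  13.207   2.100  1.00  0.00'
>>> line[6:11]   ## PDB columns 7-11 hold the atom serial number (Fortran I5)
'10000'

A six-digit serial simply does not fit into the 5-column field, so the last digit spills into the neighbouring column and is lost on re-reading.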

Discussion