From: John M. <joh...@gm...> - 2013-10-16 13:02:40
> What does this do? And how will this make things better/faster?

It's a micro-optimisation, see http://docs.oracle.com/javase/6/docs/api/java/lang/String.html#intern%28%29

Basically it allows you to have a single reference for the same string. The compiler does this for inline strings (literals) but not for strings read from IO. This does use permgen space, but permgen is becoming metaspace in Java 1.8 (http://java.dzone.com/articles/java-8-permgen-metaspace), so don't worry about that.

The example shows that different references get replaced by the same reference:

    String a = new String("Carbon");
    String b = new String("Carbon");

    a == b                   : false
    a.intern() == b.intern() : true

> Yeah, some more indices could make sense, but particularly if the
> class is a singleton, so that the indices get reused when ever the
> factory is used. Or not?

Even if not, the indices are relatively small.

> OK, another corner of Java I do not know. What a fixed precision
> decimal? How do I use that?

Using arbitrary precision (floating point):

    1.0 - 0.9 = 0.09999999999999998

Fixed precision means the values are accurate to a fixed number of decimal places; in this case we would need 1 decimal place. To work to 1 decimal place we multiply by a factor of 10 and can use integers:

    10 - 9 = 1
    10/10 - 9/10 = 1/10

It depends on what you want: how accurate do the masses need to be? Fixed precision is really only good if you need to do numerical operations.

> Good idea. I have little experience with binary formats, but worth learning...

Yep, no need for record separators either.

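To make the fixed-precision arithmetic above a bit more concrete, here is a minimal sketch in plain Java. It is only an illustration: the class name FixedPrecisionDemo, the SCALE constant and the toFixed/toDouble helpers are made up for this example and are not part of the CDK, and it assumes one decimal place is enough (real isotope masses would need a much larger scale factor).

```java
public class FixedPrecisionDemo {

    // Scale factor for one decimal place (an assumption for this sketch).
    public static final long SCALE = 10;

    // Convert a decimal value to a scaled integer, rounding to the nearest unit.
    public static long toFixed(double value) {
        return Math.round(value * SCALE);
    }

    // Convert a scaled integer back to a double, e.g. for display.
    public static double toDouble(long fixed) {
        return (double) fixed / SCALE;
    }

    public static void main(String[] args) {
        // Floating point: the difference is not exactly 0.1.
        System.out.println(1.0 - 0.9);

        // Fixed precision: the subtraction is done on integers (10 - 9),
        // so the result is exactly one unit of 1/SCALE.
        long diff = toFixed(1.0) - toFixed(0.9);
        System.out.println(diff + "/" + SCALE);
    }
}
```

The values are only converted at the edges (when reading and when displaying); everything in between is plain integer arithmetic, which is why this mainly pays off when you do numerical operations on the masses.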
Using streams:

    // needs java.io.* plus the CDK isotope classes
    IsotopeFactory isotopeFactory = IsotopeFactory.getInstance(SilentChemObjectBuilder.getInstance());
    String path = System.getProperty("user.home") + "/bodr-isotopes";

    // write: record count first, then one record per isotope
    FileOutputStream fos = new FileOutputStream(path);
    DataOutput dout = new DataOutputStream(fos);
    IIsotope[] isotopes = isotopeFactory.getIsotopes();
    dout.writeInt(isotopes.length);
    for (IIsotope isotope : isotopes) {
        dout.writeUTF(isotope.getSymbol());
        dout.writeInt(isotope.getAtomicNumber());
        dout.writeInt(isotope.getMassNumber());
        dout.writeDouble(isotope.getExactMass());
        dout.writeDouble(isotope.getNaturalAbundance());
    }
    fos.close();

    // read: same field order as written
    FileInputStream fin = new FileInputStream(path);
    DataInput din = new DataInputStream(fin);
    int n = din.readInt();
    for (int i = 0; i < n; i++) {
        String symbol = din.readUTF().intern();
        int elem = din.readInt();
        int mass = din.readInt();
        double exactMass = din.readDouble();
        double natAbund = din.readDouble();
    }
    fin.close();

Or using buffers: strings are a little tricky, but you can actually just omit them and load the symbols elsewhere. Note that buffers + memory mapping is really, really fast :-). The file size is about the same as the text, as '0.0' takes up 8 bytes when written as binary.

    // needs java.io.*, java.nio.ByteBuffer and java.nio.channels.FileChannel
    IsotopeFactory isotopeFactory = IsotopeFactory.getInstance(SilentChemObjectBuilder.getInstance());
    String path = System.getProperty("user.home") + "/bodr-isotopes";
    IIsotope[] isotopes = isotopeFactory.getIsotopes();

    // write: fill a buffer, then hand it to the channel
    ByteBuffer bout = ByteBuffer.allocate(100000);
    bout.putInt(isotopes.length);
    for (IIsotope isotope : isotopes) {
        // chars a little more tricky
        bout.putInt(isotope.getAtomicNumber());
        bout.putInt(isotope.getMassNumber());
        bout.putDouble(isotope.getExactMass());
        bout.putDouble(isotope.getNaturalAbundance());
    }
    // same as limit(position()).position(0): switch from filling to draining
    bout.flip();
    FileChannel fc = new FileOutputStream(path).getChannel();
    // a single write() is not guaranteed to drain the buffer
    while (bout.hasRemaining()) {
        fc.write(bout);
    }
    fc.close();

    // read: memory-map the file and pull the fields back out in order
    FileChannel fcIn = new FileInputStream(path).getChannel();
    ByteBuffer bin = fcIn.map(FileChannel.MapMode.READ_ONLY, 0, new File(path).length());
    int n = bin.getInt();
    for (int i = 0; i < n; i++) {
        int elem = bin.getInt();
        int mass = bin.getInt();
        double exactMass = bin.getDouble();
        double natAbund = bin.getDouble();
    }
    fcIn.close();

On 16 Oct 2013, at 11:52, Egon Willighagen <ego...@gm...> wrote:

> John,
>
> On Wed, Oct 16, 2013 at 12:45 PM, John May <joh...@gm...> wrote:
>> Do you want me to patch now? Changes suggested below can be done afterwards
>
> I like to do them now. So that I learn, but please educate me a bit...
>
>> - intern the string symbol, 'fields[0].trim().intern()'
>
> What does this do? And how will this make things better/faster?
>
>> - HashMaps for symbol/element lookup, TreeMap for the range lookups.
>
> Yeah, some more indices could make sense, but particularly if the
> class is a singleton, so that the indices get reused when ever the
> factory is used. Or not?
>
>> - could store the decimal numbers as fixed precision rather than arbitrary precision (floating point). Probably not worth it though.
>
> OK, another corner of Java I do not know. What a fixed precision
> decimal? How do I use that?
>
>> - I don't think there is much benefit to have it as a singleton, if it loads faster enough let the invokee decided when to
>> keep it around.
>
> Possibly. What about indices? See above...
>
>> - unsupported methods could throw UnsupportedOperationException
>
> Yeah, I have considered that... I think post 1.5 I will propose my
> patch to split mutable/immutable CDK interfaces...
>
>> - if the code generate the file from the XML maybe writing in binary instead - smaller, faster to read, can only be changed using the BODR xml?
>
> Good idea. I have little experience with binary formats, but worth learning...
>
> Egon
>
> --
> Dr E.L. Willighagen
> Postdoctoral Researcher
> Department of Bioinformatics - BiGCaT
> Maastricht University (http://www.bigcat.unimaas.nl/)
> Homepage: http://egonw.github.com/
> LinkedIn: http://se.linkedin.com/in/egonw
> Blog: http://chem-bla-ics.blogspot.com/
> PubList: http://www.citeulike.org/user/egonw/tag/papers
> ORCID: 0000-0001-7542-0286
>
> _______________________________________________
> Cdk-devel mailing list
> Cdk...@li...
> https://lists.sourceforge.net/lists/listinfo/cdk-devel