<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Recent changes to Blog</title><link>https://sourceforge.net/p/nmrshiftdb2/wiki/Blog/</link><description>Recent changes to Blog</description><atom:link href="https://sourceforge.net/p/nmrshiftdb2/wiki/Blog/feed" rel="self"/><language>en</language><lastBuildDate>Tue, 24 Sep 2013 20:39:31 -0000</lastBuildDate><atom:link href="https://sourceforge.net/p/nmrshiftdb2/wiki/Blog/feed" rel="self" type="application/rss+xml"/><item><title>Blog modified by Stefan Kuhn</title><link>https://sourceforge.net/p/nmrshiftdb2/wiki/Blog/</link><description>&lt;div class="markdown_content"&gt;&lt;pre&gt;--- v18
+++ v19
@@ -1,84 +1 @@
-# 2013-9-14 javadbchem release
-
-A short message only indirectly related to nmrshiftdb2: I made the chemical database system behind nmrshiftdb2 a project of its own, to be found on [sourceforge.net](http://sourceforge.net/projects/javadbchem/). It can serve as a chemical database "cartridge", but works a bit differently from the common systems: properties can be assigned to bonds and atoms at the database level using referential integrity constraints. If anybody is interested in using it, I will provide any assistance that might be necessary. The principles are tested, since they are used in nmrshiftdb2, but the build system of the new project might need some tuning.
-
-# 2013-8-31 On licencing
-
-I did some brainstorming on nmrshiftdb2 software licencing (the licence for the data is a different matter). Until now, the software has been licenced under an "artistic licence", which you can read [here](https://sourceforge.net/p/nmrshiftdb2/code/520/tree/trunk/nmrshiftdb2/License.txt). This was chosen 11 years ago, when the NMRShiftDB project started, because it was considered a liberal licence. Whilst there have not been any real problems so far, there are at least two potential pitfalls: a) this licence is [not considered a free licence by the Free Software Foundation](http://www.gnu.org/licenses/license-list.html#ArtisticLicense); b) GPL software cannot be integrated.
-
-Therefore, I am thinking about relicencing. I think the new licence will be the GPL, since I like the idea and I don't think that the "viral" effect is a problem here. Relicencing is a tricky issue and I can't give legal advice, but to my understanding the situation is as follows:
-
-* I (Stefan Kuhn) can license the material I wrote under as many licences as I like
-* Contributions by other people cannot be relicensed without the consent of their authors
-* Old versions of the software will always remain available under the conditions (i.e. the licence) under which they were originally released
-
-So, unless there are other ideas, nmrshiftdb2 will be licenced under the GPL from the next release on (hopefully coming soon). The old releases will stay as they are, and material by third parties included in the new releases will retain its original Artistic Licence.
-
-EDIT: I just noticed that the OSI has approved the Artistic Licence 1.0 (http://opensource.org/licenses/Artistic-1.0). Still, I like the idea of using a more popular licence.
-
-# 2013-2-13 Removal of Duplicates
-
-Since last year, I have done some work on removing duplicate structures from the database (the spectra were kept, of course, and all assigned to one structure). In case somebody wonders how duplicate structures came about, here is the main reason (the other reasons are simple software bugs, e. g. in SMILES generation or in the code saving to the database): nmrshiftdb2 requires information about which configurations around double bonds are actual E/Z configurations as drawn and which are unspecified and just drawn one way at random (I still think that without this information, real decisions about structure identity cannot be made). In the past, this information was also asked for in small rings, where actual E/Z configurations cannot occur. If the bond was declared "unspecified" in one case and "as drawn" in the other, two entries with different IDs were created for the same structure. A typical example is [molecule ID 2505](http://nmrshiftdb.nmr.uni-koeln.de/portal/js_pane/P-Results/nmrshiftdbaction/showDetailsFromHome/molNumber/2505): it existed twice (the other ID was 20039308, which has now disappeared; this is really the only case where IDs disappear), with the bond from 5 to 6 considered Z-configured in one entry and unspecified in the other. These have now been merged, and 2505 has two spectra (both 13C here, but many of the merges lead to structures now having both 13C and 1H spectra). In total, 90 such merges were made.
-
-# 2012-11-28 Thomson Reuters co-operation
-
-As announced earlier in the news, nmrshiftdb2 is part of the Thomson Reuters Data Citation Index (DCI). I have not yet had a chance to test it myself (no subscription at the local university), but I can show some screenshots of the entry for nmrshiftdb2 (top) and of two example molecules with links to nmrshiftdb2. I think the DCI is a good start towards extending bibliographies beyond "old style" literature. Click on the thumbnails below for example screenshots.
-
-[&lt;img src="http://sourceforge.net/p/nmrshiftdb2/wiki/Blog/attachment/Nmrshiftdb2_dci.jpg" width="200" /&gt;](http://sourceforge.net/p/nmrshiftdb2/wiki/Blog/attachment/Nmrshiftdb2_dci.jpg) [&lt;img src="https://sourceforge.net/p/nmrshiftdb2/wiki/Blog/attachment/Nmrshiftdb2_dci_example1.jpg" width="200" /&gt;](https://sourceforge.net/p/nmrshiftdb2/wiki/Blog/attachment/Nmrshiftdb2_dci_example1.jpg) [&lt;img src="http://sourceforge.net/p/nmrshiftdb2/wiki/Blog/attachment/Nmrshiftdb2_dci_example2.jpg" width="200" /&gt;](http://sourceforge.net/p/nmrshiftdb2/wiki/Blog/attachment/Nmrshiftdb2_dci_example2.jpg)
-
-
-# 2012-11-26 Improved prediction in nmrshiftdb2 1.4.2
-
-A new release came out over the weekend. The [changelog](http://nmrshiftdb2.svn.sourceforge.net/viewvc/nmrshiftdb2/trunk/nmrshiftdb2/CHANGELOG?revision=365&amp;amp;view=markup) gives details. I want to highlight the improved prediction. It uses an extended HOSE code system in which chiral configurations are encoded into the HOSE code. They are given relative to the chiral centre, or to the double bond in the case of E/Z configurations, so the code does not depend on 3D coordinates. I think this overcomes a major problem with schemes based on 3D coordinates, where it is difficult to establish the coordinates of the conformation actually measured (of course it's easy to establish some coordinates). Examples of what the prediction can do are shown in the [help](http://www.nmrshiftdb.org/portal/js_pane/P-Help?URL=using.html#predict). Of course, quality depends on what is in the database, and chiral centres must be specified by wedge bonds. So, what does this give altogether? A 10-fold cross-validation gives the following results for the old prediction:
-
-Atoms with prediction error &amp;gt;10 ppm: 16382
-10&amp;gt;prediction error&amp;gt;=5: 35675
-prediction error &amp;lt;5: 280206 
-
-Using the new prediction: 
-
-Atoms with prediction error &amp;gt;10 ppm: 15106
-10&amp;gt;prediction error&amp;gt;=5: 34553
-prediction error &amp;lt;5: 282448
-
-So there is a clear improvement, though it does not look like much. This confirms that, overall, 3D configurations do not play a major role for 13C shifts; but of course in some compounds they do, and if we want to distinguish stereoisomers we need to consider stereo configurations. So I think this is an improvement and a good step forward.
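The bin counts above can be turned into cumulative percentages with a few lines of Python. This is a minimal sketch and not part of nmrshiftdb2. The dictionary keys are hypothetical labels. Atoms with an error of exactly 10 ppm fall into none of the three bins and are therefore not counted.

```python
# Counts of atoms per absolute prediction-error bin, copied from the
# 10-fold cross-validation figures above (old vs. new HOSE code prediction).
# The key names are hypothetical labels chosen for this sketch.
OLD = {"over_10": 16382, "from_5_to_10": 35675, "under_5": 280206}
NEW = {"over_10": 15106, "from_5_to_10": 34553, "under_5": 282448}


def cumulative_fractions(counts):
    """Return (fraction under 5 ppm, fraction under 10 ppm) for one run.

    The "under 10 ppm" fraction is the under-5 bin plus the 5-to-10 bin,
    divided by the total number of atoms in all three bins.
    """
    total = sum(counts.values())
    under_5 = counts["under_5"] / total
    under_10 = (counts["under_5"] + counts["from_5_to_10"]) / total
    return under_5, under_10


for label, counts in (("old", OLD), ("new", NEW)):
    under_5, under_10 = cumulative_fractions(counts)
    print(f"{label}: {under_5:.1%} under 5 ppm, {under_10:.1%} under 10 ppm")
```

Run as is, the script reports 84.3% of atoms under 5 ppm and 95.1% under 10 ppm for the old prediction, against 85.0% and 95.5% for the new one.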
-
-# 2012-8-10 Bibliographic data in nmrshiftdb2
-
-Over the past few days, I reworked the bibliographic data in nmrshiftdb2 for our new collaboration with Thomson Reuters (see the [nmrshiftdb2 news](https://sourceforge.net/news/?group_id=348458)). Some of the older literature references were free-text fields and therefore not in a specific format. These have all been converted to the newer model, where authors, title etc. are stored separately. In addition, some obvious errors have been corrected. It is now possible to export in any desired format, e. g. BibTeX. The file download on the [nmrshiftdb2 download page](http://nmrshiftdb2.svn.sourceforge.net/viewvc/nmrshiftdb2/trunk/snapshots/nmrshiftdb2.xml) actually contains BibTeX, so automated processing is now possible. Of course this is no guarantee that there are no typos etc., but it definitely is a step forward.
-
-# 2012-6-28 Chemaxon NMR predictor based on nmrshiftdb2 data
-
-[Chemaxon](http://www.chemaxon.com) recently announced an NMR predictor for their Marvin sketcher. It is based on nmrshiftdb2 data. I came across it at the really great User Group meeting in Budapest, and I give my impressions here. Many thanks to Chemaxon for properly acknowledging nmrshiftdb2; the criticism below is meant as encouragement, of course.
-
-According to their presentation, Chemaxon took the nmrshiftdb2 data and calculated physicochemical and topological descriptors for the atoms. They use these to train multilinear least-squares regression (MLR) and support vector machine (SVM) models and make predictions based on these. So this is, to my understanding, an artificial intelligence approach.
-
-They give this quality measurement for carbon shifts:
-
-Error &amp;lt; 5 ppm for 77% of all atoms
-
-Error &amp;lt; 10 ppm for 93% of all atoms
-
-At first sight, I would say this is not overwhelming. To see better how this compares, I did a 100-fold cross-validation of the current nmrshiftdb2 database using the ordinary HOSE code prediction. This is what I get:
-
-Error &amp;lt; 5 ppm for 84% of all atoms
-
-Error &amp;lt; 10 ppm for 95% of all atoms
-
-So it looks as if the HOSE code prediction actually does better. Note that this is not a strict comparison, since the evaluations might differ (Chemaxon doesn't say whether they do cross-validation, though I suppose they do; even then, it matters how the chunks are defined etc.). Still, my overall impression is that their method is roughly as good as HOSE codes.
-
-Some particular observations (this was done using NMR Prediction Beta 5.8):
-
-* A prediction for pyrrole gives shifts of 113.92 and 122.53. Pyrrole is of course in the data, and its shifts are 116.5/117.3 and 106.5/107.6 (two carbon spectra exist). So Marvin is more than 5 ppm off here for a known structure. In my experience, this is a major problem with AI methods: they can go wrong on data that are part of the training set (HOSE codes never can).
-* I tried a structure that is not in nmrshiftdb2 and where the HOSE code prediction gives very good results because 4 or more spheres are used everywhere. Here, the Chemaxon results are mixed: some shifts are good, some rather bad.
-* For another structure that is not in nmrshiftdb2, where the HOSE code prediction performs very badly on some atoms, Marvin gives better (but still not good) results for those atoms, and good results for the atoms where the HOSE code prediction does well. So it looks like the strengths and the weaknesses are similar. On the other hand, there are improvements, which I think are due to AI methods doing better "interpolation".
-* Marvin does not seem to use stereochemistry. The current HOSE code does not use it either, but it is something a physicochemical model could or should do.
-* The Marvin interface says "Frequency 500.0". Considering that for many (probably most) spectra in nmrshiftdb2 we unfortunately do not have the frequencies, it seems rather bold to claim a prediction is for a particular frequency (plus I can't find a way to change it).
-
-A nice feature is that they give a quality mark for each predicted shift.
-
-Generally, this confirms my long-held impression that AI methods have the advantage of better "interpolation", but can also give bad results in cases where they should work. As I said, this is not meant to be negative; these are just my observations. It's good to see people working in this field. Chemaxon might improve, and so will nmrshiftdb2: an improved prediction is in the pipeline, although unfortunately limited by my time constraints.
-
-# 2012-6-28 Introduction
-
-I (Stefan Kuhn) use this page as a sort of blog. I will post occasional observations, remarks and thoughts about nmrshiftdb2 or its environment here. These are meant to be neither news (which are on the [nmrshiftdb home page](http://www.nmrshiftdb.org)) nor help topics (which are in the [help pages](http://www.nmrshiftdb.org/portal/media-type/html/user/anon/page/default.psml/js_pane/P-Help)) nor purely technical topics, which are elsewhere in this wiki; really everything else goes here. I will see over time whether this becomes a proper blog or something else.
+In the past, I maintained a pseudo blog here. Since SourceForge now offers blog software, I have started a real blog and moved the old posts there as well. The new blog is at [https://sourceforge.net/p/nmrshiftdb2/blog/](https://sourceforge.net/p/nmrshiftdb2/blog/).
&lt;/pre&gt;
&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Stefan Kuhn</dc:creator><pubDate>Tue, 24 Sep 2013 20:39:31 -0000</pubDate><guid>https://sourceforge.net576232698edbc34be5a4ebaa8cf65fbc88459c1d</guid></item><item><title>Blog modified by Stefan Kuhn</title><link>https://sourceforge.net/p/nmrshiftdb2/wiki/Blog/</link><description>&lt;div class="markdown_content"&gt;&lt;pre&gt;--- v17
+++ v18
@@ -1,3 +1,7 @@
+# 2013-9-14 javadbchem release
+
+A short message only indirectly related to nmrshiftdb2: I made the chemical database system behind nmrshiftdb2 a project of its own, to be found on [sourceforge.net](http://sourceforge.net/projects/javadbchem/). It can serve as a chemical database "cartridge", but works a bit differently from the common systems: properties can be assigned to bonds and atoms at the database level using referential integrity constraints. If anybody is interested in using it, I will provide any assistance that might be necessary. The principles are tested, since they are used in nmrshiftdb2, but the build system of the new project might need some tuning.
+
 # 2013-8-31 On licencing

 I did some brainstorming on nmrshiftdb2 software licencing (the licence for the data is a different matter). Until now, the software has been licenced under an "artistic licence", which you can read [here](https://sourceforge.net/p/nmrshiftdb2/code/520/tree/trunk/nmrshiftdb2/License.txt). This was chosen 11 years ago, when the NMRShiftDB project started, because it was considered a liberal licence. Whilst there have not been any real problems so far, there are at least two potential pitfalls: a) this licence is [not considered a free licence by the Free Software Foundation](http://www.gnu.org/licenses/license-list.html#ArtisticLicense); b) GPL software cannot be integrated.
&lt;/pre&gt;
&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Stefan Kuhn</dc:creator><pubDate>Sat, 14 Sep 2013 20:03:49 -0000</pubDate><guid>https://sourceforge.netb974e5cc138e2c746137e143dd67d6ea0aca9d28</guid></item><item><title>Blog modified by Stefan Kuhn</title><link>https://sourceforge.net/p/nmrshiftdb2/wiki/Blog/</link><description>&lt;div class="markdown_content"&gt;&lt;pre&gt;--- v16
+++ v17
@@ -9,6 +9,8 @@
 * Old versions of the software will always remain available under the conditions (i.e. the licence) under which they were originally released

 So, unless there are other ideas, nmrshiftdb2 will be licenced under the GPL from the next release on (hopefully coming soon). The old releases will stay as they are, and material by third parties included in the new releases will retain its original Artistic Licence.
+
+EDIT: I just noticed that the OSI has approved the Artistic Licence 1.0 (http://opensource.org/licenses/Artistic-1.0). Still, I like the idea of using a more popular licence.

 # 2013-2-13 Removal of Duplicates

&lt;/pre&gt;
&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Stefan Kuhn</dc:creator><pubDate>Mon, 09 Sep 2013 19:20:26 -0000</pubDate><guid>https://sourceforge.net8e203930d41759041b501ae6da586973c09606b6</guid></item><item><title>Blog modified by Stefan Kuhn</title><link>https://sourceforge.net/p/nmrshiftdb2/wiki/Blog/</link><description>&lt;div class="markdown_content"&gt;&lt;pre&gt;--- v15
+++ v16
@@ -4,9 +4,9 @@

 Therefore, I am thinking about relicencing. I think the new licence will be the GPL, since I like the idea and I don't think that the "viral" effect is a problem here. Relicencing is a tricky issue and I can't give legal advice, but to my understanding the situation is as follows:

-*I (Stefan Kuhn) can license the material I wrote under as many licences as I like
-*Contributions by other people cannot be relicensed without the consent of their authors
-*Old versions of the software will always remain available under the conditions (i.e. the licence) under which they were originally released
+* I (Stefan Kuhn) can license the material I wrote under as many licences as I like
+* Contributions by other people cannot be relicensed without the consent of their authors
+* Old versions of the software will always remain available under the conditions (i.e. the licence) under which they were originally released

 So, unless there are other ideas, nmrshiftdb2 will be licenced under the GPL from the next release on (hopefully coming soon). The old releases will stay as they are, and material by third parties included in the new releases will retain its original Artistic Licence.

&lt;/pre&gt;
&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Stefan Kuhn</dc:creator><pubDate>Sat, 31 Aug 2013 12:56:50 -0000</pubDate><guid>https://sourceforge.net6a8e61a543ab0d398f63617069f26c87c20cd8af</guid></item><item><title>Blog modified by Stefan Kuhn</title><link>https://sourceforge.net/p/nmrshiftdb2/wiki/Blog/</link><description>&lt;div class="markdown_content"&gt;&lt;pre&gt;--- v14
+++ v15
@@ -1,3 +1,15 @@
+# 2013-8-31 On licencing
+
+I did some brainstorming on nmrshiftdb2 software licencing (the licence for the data is a different matter). Until now, the software has been licenced under an "artistic licence", which you can read [here](https://sourceforge.net/p/nmrshiftdb2/code/520/tree/trunk/nmrshiftdb2/License.txt). This was chosen 11 years ago, when the NMRShiftDB project started, because it was considered a liberal licence. Whilst there have not been any real problems so far, there are at least two potential pitfalls: a) this licence is [not considered a free licence by the Free Software Foundation](http://www.gnu.org/licenses/license-list.html#ArtisticLicense); b) GPL software cannot be integrated.
+
+Therefore, I am thinking about relicencing. I think the new licence will be the GPL, since I like the idea and I don't think that the "viral" effect is a problem here. Relicencing is a tricky issue and I can't give legal advice, but to my understanding the situation is as follows:
+
+*I (Stefan Kuhn) can license the material I wrote under as many licences as I like
+*Contributions by other people cannot be relicensed without the consent of their authors
+*Old versions of the software will always remain available under the conditions (i.e. the licence) under which they were originally released
+
+So, unless there are other ideas, nmrshiftdb2 will be licenced under the GPL from the next release on (hopefully coming soon). The old releases will stay as they are, and material by third parties included in the new releases will retain its original Artistic Licence.
+
 # 2013-2-13 Removal of Duplicates

 Since last year, I have done some work on removing duplicate structures from the database (the spectra were kept, of course, and all assigned to one structure). In case somebody wonders how duplicate structures came about, here is the main reason (the other reasons are simple software bugs, e. g. in SMILES generation or in the code saving to the database): nmrshiftdb2 requires information about which configurations around double bonds are actual E/Z configurations as drawn and which are unspecified and just drawn one way at random (I still think that without this information, real decisions about structure identity cannot be made). In the past, this information was also asked for in small rings, where actual E/Z configurations cannot occur. If the bond was declared "unspecified" in one case and "as drawn" in the other, two entries with different IDs were created for the same structure. A typical example is [molecule ID 2505](http://nmrshiftdb.nmr.uni-koeln.de/portal/js_pane/P-Results/nmrshiftdbaction/showDetailsFromHome/molNumber/2505): it existed twice (the other ID was 20039308, which has now disappeared; this is really the only case where IDs disappear), with the bond from 5 to 6 considered Z-configured in one entry and unspecified in the other. These have now been merged, and 2505 has two spectra (both 13C here, but many of the merges lead to structures now having both 13C and 1H spectra). In total, 90 such merges were made.
&lt;/pre&gt;
&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Stefan Kuhn</dc:creator><pubDate>Sat, 31 Aug 2013 12:56:21 -0000</pubDate><guid>https://sourceforge.netc69d59329fa0d4f78cd00ab0f32c0d4bc1e31fd9</guid></item><item><title>Blog modified by Stefan Kuhn</title><link>https://sourceforge.net/p/nmrshiftdb2/wiki/Blog/</link><description>&lt;div class="markdown_content"&gt;&lt;pre&gt;--- v13
+++ v14
@@ -4,7 +4,10 @@

 # 2012-11-28 Thomson Reuters co-operation

-As announced earlier in the news, nmrshiftdb2 is part of the Thomson Reuters Data Citation Index (DCI). I have not yet had a chance to test it myself (no subscription at the local university), but I can show some screenshots of the entry for nmrshiftdb2 (top) and of two example molecules with links to nmrshiftdb2. I think the DCI is a good start towards extending bibliographies beyond "old style" literature. See the attachments for example screenshots.
+As announced earlier in the news, nmrshiftdb2 is part of the Thomson Reuters Data Citation Index (DCI). I have not yet had a chance to test it myself (no subscription at the local university), but I can show some screenshots of the entry for nmrshiftdb2 (top) and of two example molecules with links to nmrshiftdb2. I think the DCI is a good start towards extending bibliographies beyond "old style" literature. Click on the thumbnails below for example screenshots.
+
+[&lt;img src="http://sourceforge.net/p/nmrshiftdb2/wiki/Blog/attachment/Nmrshiftdb2_dci.jpg" width="200" /&gt;](http://sourceforge.net/p/nmrshiftdb2/wiki/Blog/attachment/Nmrshiftdb2_dci.jpg) [&lt;img src="https://sourceforge.net/p/nmrshiftdb2/wiki/Blog/attachment/Nmrshiftdb2_dci_example1.jpg" width="200" /&gt;](https://sourceforge.net/p/nmrshiftdb2/wiki/Blog/attachment/Nmrshiftdb2_dci_example1.jpg) [&lt;img src="http://sourceforge.net/p/nmrshiftdb2/wiki/Blog/attachment/Nmrshiftdb2_dci_example2.jpg" width="200" /&gt;](http://sourceforge.net/p/nmrshiftdb2/wiki/Blog/attachment/Nmrshiftdb2_dci_example2.jpg)
+

 # 2012-11-26 Improved prediction in nmrshiftdb2 1.4.2

&lt;/pre&gt;
&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Stefan Kuhn</dc:creator><pubDate>Tue, 25 Jun 2013 17:51:06 -0000</pubDate><guid>https://sourceforge.net193fecbafcf1e59ed028d5a3f025d75525edf4ca</guid></item><item><title>Blog modified by Stefan Kuhn</title><link>https://sourceforge.net/p/nmrshiftdb2/wiki/Blog/</link><description>&lt;div class="markdown_content"&gt;&lt;pre&gt;--- v12
+++ v13
@@ -24,7 +24,7 @@

 # 2012-8-10 Bibliographic data in nmrshiftdb2

-Over the past few days, I reworked the bibliographic data in nmrshiftdb2 for our new collaboration with Thomson-Reuters (see the [nmrshiftdb2 news](https://sourceforge.net/news/?group_id=348458)). Some of the older literature references where freetext fields and therefore not in a specific format. These have all been changed to the newer model where authors, title etc. are saved separately. Plus some obvious errors have been corrected. It is now possible to export in any desired format, e. g. bibtex. The file download at [nmrshiftdb2 download page](http://nmrshiftdb2.svn.sourceforge.net/viewvc/nmrshiftdb2/trunk/snapshots/nmrshiftdb2.xml) actually contains bibtex, so an automated processing is now possible. Of course this is not a guarantee that there are no typos etc., but it definitely is a step forward.
+Over the past few days, I reworked the bibliographic data in nmrshiftdb2 for our new collaboration with Thomson Reuters (see the [nmrshiftdb2 news](https://sourceforge.net/news/?group_id=348458)). Some of the older literature references were free-text fields and therefore not in a specific format. These have all been converted to the newer model, where authors, title etc. are stored separately. In addition, some obvious errors have been corrected. It is now possible to export in any desired format, e. g. BibTeX. The file download on the [nmrshiftdb2 download page](http://nmrshiftdb2.svn.sourceforge.net/viewvc/nmrshiftdb2/trunk/snapshots/nmrshiftdb2.xml) actually contains BibTeX, so automated processing is now possible. Of course this is no guarantee that there are no typos etc., but it definitely is a step forward.

 # 2012-6-28 Chemaxon NMR predictor based on nmrshiftdb2 data

&lt;/pre&gt;
&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Stefan Kuhn</dc:creator><pubDate>Tue, 25 Jun 2013 14:06:08 -0000</pubDate><guid>https://sourceforge.netf7930ed0fdb356b442736741bb3b491cefa37b0d</guid></item><item><title>Blog modified by Stefan Kuhn</title><link>https://sourceforge.net/p/nmrshiftdb2/wiki/Blog/</link><description>&lt;div class="markdown_content"&gt;&lt;pre&gt;--- v11
+++ v12
@@ -4,14 +4,60 @@

 # 2012-11-28 Thomson Reuters co-operation

-As announced earlier in the news, nmrshiftdb2 is part of the Thomson Reuters Data Citation Index (DCI). I have not yet had a chance to test it myself (no subscription at the local university), but I can show some screenshots of the entry for nmrshiftdb2 (top) and of two example molecules with links to nmrshiftdb2. I think the DCI is a good start towards extending bibliographies beyond "old style" literature. &amp;lt;gallery&amp;gt; File:Nmrshiftdb2_dci.jpg|nmrshiftdb dci entry File:Nmrshiftdb2_dci_example1.jpg|example 1 File:Nmrshiftdb2_dci_example2.jpg|example 2 &amp;lt;/gallery&amp;gt;
+As announced earlier in the news, nmrshiftdb2 is part of the Thomson Reuters Data Citation Index (DCI). I have not yet had a chance to test it myself (no subscription at the local university), but I can show some screenshots of the entry for nmrshiftdb2 (top) and of two example molecules with links to nmrshiftdb2. I think the DCI is a good start towards extending bibliographies beyond "old style" literature. See the attachments for example screenshots.

 # 2012-11-26 Improved prediction in nmrshiftdb2 1.4.2

 A new release came out over the weekend. The [changelog](http://nmrshiftdb2.svn.sourceforge.net/viewvc/nmrshiftdb2/trunk/nmrshiftdb2/CHANGELOG?revision=365&amp;amp;view=markup) gives details. I want to highlight the improved prediction. It uses an extended HOSE code system in which chiral configurations are encoded into the HOSE code. They are given relative to the chiral centre, or to the double bond in the case of E/Z configurations, so the code does not depend on 3D coordinates. I think this overcomes a major problem with schemes based on 3D coordinates, where it is difficult to establish the coordinates of the conformation actually measured (of course it's easy to establish some coordinates). Examples of what the prediction can do are shown in the [help](http://www.nmrshiftdb.org/portal/js_pane/P-Help?URL=using.html#predict). Of course, quality depends on what is in the database, and chiral centres must be specified by wedge bonds. So, what does this give altogether? A 10-fold cross-validation gives the following results for the old prediction:

-Atoms with prediction error &amp;gt;10 ppm: 16382 10&amp;gt;prediction error&amp;gt;=5: 35675 prediction error &amp;lt;5: 280206 
+Atoms with prediction error &amp;gt;10 ppm: 16382
+10&amp;gt;prediction error&amp;gt;=5: 35675
+prediction error &amp;lt;5: 280206 

 Using the new prediction: 

-Atoms with prediction error &amp;gt;10 ppm: 15106 10&amp;gt;prediction error&amp;gt;=5: 34553 prediction error 
+Atoms with prediction error &amp;gt;10 ppm: 15106
+10&amp;gt;prediction error&amp;gt;=5: 34553
+prediction error &amp;lt;5: 282448
+
+So there is a clear improvement, though it does not look like much. This confirms that, overall, 3D configurations do not play a major role for 13C shifts; but of course in some compounds they do, and if we want to distinguish stereoisomers we need to consider stereo configurations. So I think this is an improvement and a good step forward.
+
+# 2012-8-10 Bibliographic data in nmrshiftdb2
+
+Over the past few days, I reworked the bibliographic data in nmrshiftdb2 for our new collaboration with Thomson Reuters (see the [nmrshiftdb2 news](https://sourceforge.net/news/?group_id=348458)). Some of the older literature references were free-text fields and therefore not in a specific format. These have all been converted to the newer model, where authors, title etc. are stored separately. In addition, some obvious errors have been corrected. It is now possible to export in any desired format, e. g. BibTeX. The file download on the [nmrshiftdb2 download page](http://nmrshiftdb2.svn.sourceforge.net/viewvc/nmrshiftdb2/trunk/snapshots/nmrshiftdb2.xml) actually contains BibTeX, so automated processing is now possible. Of course this is no guarantee that there are no typos etc., but it definitely is a step forward.
+
+# 2012-6-28 Chemaxon NMR predictor based on nmrshiftdb2 data
+
+[Chemaxon](http://www.chemaxon.com) recently announced an NMR predictor for their Marvin sketcher. It is based on nmrshiftdb2 data. I came across it at the really great User Group meeting in Budapest, and I give my impressions here. Many thanks to Chemaxon for properly acknowledging nmrshiftdb2; the criticism below is meant as encouragement, of course.
+
+According to their presentation, Chemaxon took the nmrshiftdb2 data and calculated physicochemical and topological descriptors for the atoms. They use these to train multilinear least-squares regression (MLR) and support vector machine (SVM) models and make predictions based on these. So this is, to my understanding, an artificial intelligence approach.
+
+They give this quality measurement for carbon shifts:
+
+Error &amp;lt; 5 ppm for 77% of all atoms
+
+Error &amp;lt; 10 ppm for 93% of all atoms
+
+At first sight, I would say this is not overwhelming. To see better how this compares, I did a 100-fold cross-validation of the current nmrshiftdb2 database using the ordinary HOSE code prediction. This is what I get:
+
+Error &amp;lt; 5 ppm for 84% of all atoms
+
+Error &amp;lt; 10 ppm for 95% of all atoms
+
+So it looks as if the HOSE code prediction actually does better. Note that this is not a strict comparison, since the evaluations might differ (Chemaxon doesn't say whether they do cross-validation, though I suppose they do; even then, it matters how the chunks are defined etc.). Still, my overall impression is that their method is roughly as good as HOSE codes.
+
+Some particular observations (this was done using NMR Prediction Beta 5.8):
+
+* A prediction for pyrrole gives shifts of 113.92 and 122.53 ppm. Pyrrole is, of course, in the data, and its shifts are 116.5/117.3 and 106.5/107.6 ppm (two carbon spectra exist). So here Marvin is more than 5 ppm off for a known structure. In my experience, this is a major problem with AI methods: they can go wrong on data that are part of the training set (HOSE codes never can).
+* I tried a structure which is not in nmrshiftdb2 and for which the HOSE code prediction gives very good results, because codes with 4 or more spheres are matched for every atom. Here, the Chemaxon results are mixed: some shifts are good, some rather bad.
+* For another structure which is not in nmrshiftdb2, and where the HOSE code prediction performs very badly on some atoms, Marvin gives better (but still not good) results for those atoms and good results for the atoms where the HOSE code prediction does well. So it looks like the strengths and weaknesses are similar. On the other hand, there are improvements, which I think are due to AI methods doing better "interpolation".
+* Marvin does not seem to use stereochemistry. The current HOSE code does not use it either, but it is something a physicochemical model could, or should, do.
+* The Marvin interface says "Frequency 500.0". Since for many (probably most) spectra in nmrshiftdb2 we unfortunately do not have the frequency, it seems rather bold to say a prediction is for a particular frequency (and I could not find a way to change it).
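
A minimal sketch can illustrate why a HOSE code lookup reproduces its own training data. Shifts are stored per sphere count and environment code. Prediction falls back from the most specific sphere that has data. For a structure in the database, the full code matches, so the recorded shift (or the average over several spectra) comes back. The code strings are placeholders, not real HOSE codes, and this is not the nmrshiftdb2 implementation.

```python
# Hypothetical sketch of a HOSE-code-style lookup with sphere fallback.
# Code strings are placeholders; this is not the nmrshiftdb2 implementation.
from collections import defaultdict
from statistics import mean


def build_table(training_atoms):
    """training_atoms: iterable of (codes_by_sphere, shift), where
    codes_by_sphere[n - 1] is the environment code using n spheres."""
    table = defaultdict(list)
    for codes, shift in training_atoms:
        for spheres, code in enumerate(codes, start=1):
            table[(spheres, code)].append(shift)
    return table


def predict_shift(table, codes):
    """Return (shift, spheres_used), trying the largest sphere count first."""
    for spheres in range(len(codes), 0, -1):
        shifts = table.get((spheres, codes[spheres - 1]))
        if shifts:
            return mean(shifts), spheres
    return None, 0  # no environment of any size was found
```

For example, if the two pyrrole values 116.5 and 117.3 are stored under the same full code, the lookup returns their mean, 116.9, rather than an interpolated value.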
+
+A nice feature is that they give a quality mark for each predicted shift.
+
+Generally, this confirms my long-standing impression that AI methods have the advantage of better "interpolation", but can also give bad results in cases where they should work. As said, this is not meant to be negative, just my observations. It's good to see the field being worked on. Chemaxon might improve, and so will nmrshiftdb2: an improved prediction is in the pipeline, though unfortunately limited by my time constraints.
+
+# 2012-6-28 Introduction
+
+This page is used as a sort of blog by me (that is, Stefan Kuhn). I will post occasional observations, remarks and thoughts about nmrshiftdb2 or its environment here. These should be neither news (which are on the [nmrshiftdb home page](http://www.nmrshiftdb.org)) nor help topics (which are in the [help pages](http://www.nmrshiftdb.org/portal/media-type/html/user/anon/page/default.psml/js_pane/P-Help)) nor purely technical topics, which are elsewhere in this wiki; really, everything else goes in here. I will see whether this becomes a proper blog or something else over time.
&lt;/pre&gt;
&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Stefan Kuhn</dc:creator><pubDate>Thu, 20 Jun 2013 14:57:44 -0000</pubDate><guid>https://sourceforge.netfcd544214d10ddcd32fa98b72240456d56d793ed</guid></item><item><title>Blog modified by Stefan Kuhn</title><link>https://sourceforge.net/p/nmrshiftdb2/wiki/Blog/</link><description>&lt;div class="markdown_content"&gt;&lt;pre&gt;--- v10
+++ v11
@@ -1,3 +1,7 @@
+# 2013-2-13 Removal of Duplicates
+
+Since last year, I have done some work on removing duplicate structures from the database (the spectra were kept, of course, and all assigned to one structure). In case somebody wonders how duplicate structures came about, here is the main reason (the other reasons are simple software bugs, e.g. in SMILES generation or in the code saving to the database): nmrshiftdb2 requires information about which configurations around double bonds are actual E/Z configurations as drawn and which are unspecified and just drawn one way at random (I still think that without this information, real decisions about structure identity cannot be made). In the past, this information was also requested for double bonds in small rings, where actual E/Z configurations cannot occur. If the bond was declared "unspecified" in one case and "as drawn" in the other, two entries with different IDs were made for the same structure. A typical example is [molecule ID 2505](http://nmrshiftdb.nmr.uni-koeln.de/portal/js_pane/P-Results/nmrshiftdbaction/showDetailsFromHome/molNumber/2505) \- this existed twice (the other ID was 20039308, which has now disappeared; this is really the only case in which IDs disappear), with the bond from atom 5 to atom 6 considered Z in one entry and unspecified in the other. These have now been merged, and 2505 has two spectra (both 13C here, but many of the merges lead to structures now having both 13C and 1H spectra). 90 such merges happened.
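
A minimal sketch of such a merge follows. It assumes each entry carries a stereo-free structure key plus flags for its double bonds. E/Z flags on bonds in small rings are dropped before comparing, entries with equal keys are grouped, and all spectra go to one ID. The field names are hypothetical. Keeping the lowest ID is an assumption that happens to match the 2505/20039308 example.

```python
# Hypothetical sketch of merging duplicate structures whose only difference is
# an E/Z flag on a small-ring double bond. Field names are illustrative.
from collections import defaultdict


def identity_key(entry):
    """Ignore E/Z flags on small-ring double bonds, where they cannot be real."""
    flags = tuple(
        (bond, flag)
        for bond, flag, in_small_ring in entry["double_bonds"]
        if not in_small_ring
    )
    return entry["skeleton"], flags


def merge_duplicates(entries):
    """Group entries with the same identity key; keep the lowest ID (an
    assumption) and attach all spectra to it. Returns {kept_id: [spectra]}."""
    groups = defaultdict(list)
    for entry in entries:
        groups[identity_key(entry)].append(entry)
    merged = {}
    for group in groups.values():
        keep = min(e["id"] for e in group)
        merged[keep] = [s for e in group for s in e["spectra"]]
    return merged
```

Outside small rings, the flags are still part of the key, so "E as drawn" and "unspecified" entries stay separate there, as the identity rule above requires.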
+
 # 2012-11-28 Thomson Reuters co-operation

 As announced earlier in the news, nmrshiftdb2 is part of the Thomson Reuters Data Citation Index (DCI). I have not yet had a chance to test it myself (no subscription at the local university), but I can show some screenshots of the entry for nmrshiftdb2 (top) and of two example molecules with links to nmrshiftdb2. I think DCI is a good start to extend bibliographies beyond "old style" literature. &amp;lt;gallery&amp;gt; File:Nmrshiftdb2_dci.jpg|nmrshiftdb dci entry File:Nmrshiftdb2_dci_example1.jpg|example 1 File:Nmrshiftdb2_dci_example2.jpg|example 2 &amp;lt;/gallery&amp;gt;
&lt;/pre&gt;
&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Stefan Kuhn</dc:creator><pubDate>Fri, 07 Jun 2013 20:36:28 -0000</pubDate><guid>https://sourceforge.net62b8c7e155106e9f8202eeb6d28e1810c7c9295e</guid></item><item><title>Blog modified by Stefan Kuhn</title><link>https://sourceforge.net/p/nmrshiftdb2/wiki/Blog/</link><description>&lt;div class="markdown_content"&gt;&lt;pre&gt;&lt;/pre&gt;
&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Stefan Kuhn</dc:creator><pubDate>Fri, 07 Jun 2013 20:36:28 -0000</pubDate><guid>https://sourceforge.netc26c135ee86014a364275408375571f39488b03a</guid></item></channel></rss>