From: <dfl...@us...> - 2013-09-09 10:12:59
Revision: 4100
          http://sourceforge.net/p/dl-learner/code/4100
Author:   dfleischhacker
Date:     2013-09-09 10:12:56 +0000 (Mon, 09 Sep 2013)
Log Message:
-----------
Improve document content cleanup

Modified Paths:
--------------
    trunk/components-core/src/main/java/org/dllearner/algorithms/isle/index/TextDocument.java

Modified: trunk/components-core/src/main/java/org/dllearner/algorithms/isle/index/TextDocument.java
===================================================================
--- trunk/components-core/src/main/java/org/dllearner/algorithms/isle/index/TextDocument.java	2013-09-09 10:12:21 UTC (rev 4099)
+++ trunk/components-core/src/main/java/org/dllearner/algorithms/isle/index/TextDocument.java	2013-09-09 10:12:56 UTC (rev 4100)
@@ -18,9 +18,10 @@
      */
     public TextDocument(String content) {
         this.rawContent = content;
-        this.content = content.replaceAll("[^A-Za-z ]", " ");
+        this.content = content.toLowerCase();
+        this.content = this.content.replaceAll("[^a-z ]", " ");
         this.content = this.content.replaceAll("\\s{2,}", " ");
-        this.content = content.toLowerCase();
+        this.content = content.trim();
     }
 
     @Override
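
Note on the hunk above: the last added line assigns from the constructor parameter `content`, not from `this.content`, so as written it appears to discard the lowercasing and the two replaceAll steps and keep only the trimmed raw input. The log message suggests the intent was to chain all four steps. A minimal sketch of that presumed intent follows. The class name and both helper methods are hypothetical and not part of DL-Learner, and `Locale.ROOT` is an assumption added here (the commit calls `toLowerCase()` with the default locale).

```java
import java.util.Locale;

public class TextDocumentCleanupSketch {

    /** Hypothetical helper: each step operates on the result of the previous one. */
    static String clean(String raw) {
        // Lowercase first so the character class below only needs a-z.
        // Locale.ROOT is an assumption; it avoids locale-specific case mappings.
        String s = raw.toLowerCase(Locale.ROOT);
        // Replace every character that is not a-z or a space with a space.
        s = s.replaceAll("[^a-z ]", " ");
        // Collapse runs of two or more whitespace characters into one space.
        s = s.replaceAll("\\s{2,}", " ");
        // Strip leading and trailing whitespace from the cleaned text.
        return s.trim();
    }

    /** Mirrors the assignments in the rev 4100 hunk, for comparison only. */
    static String asCommitted(String content) {
        String field = content.toLowerCase();
        field = field.replaceAll("[^a-z ]", " ");
        field = field.replaceAll("\\s{2,}", " ");
        // Reads the raw parameter again, so the three steps above are overwritten.
        field = content.trim();
        return field;
    }

    public static void main(String[] args) {
        String raw = "  Hello, World! 42 times. ";
        System.out.println("[" + clean(raw) + "]");       // prints "[hello world times]"
        System.out.println("[" + asCommitted(raw) + "]"); // prints "[Hello, World! 42 times.]"
    }
}
```

If chaining is indeed what was intended, the corresponding change in the constructor would be to call `this.content.trim()` on the final line.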