Thread: [Htmlparser-cvs] htmlparser/src/org/htmlparser/util ParserUtils.java,1.45,1.46
From: Alberto N. <an...@us...> - 2004-08-27 09:54:37
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/util
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv26947

Modified Files:
    ParserUtils.java
Log Message:
Bug fixing and trimAllTags method added.

Index: ParserUtils.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/util/ParserUtils.java,v
retrieving revision 1.45
retrieving revision 1.46
diff -C2 -d -r1.45 -r1.46
*** ParserUtils.java 31 Jul 2004 16:42:34 -0000 1.45
--- ParserUtils.java 27 Aug 2004 09:54:27 -0000 1.46
***************
*** 93,97 ****
      /**
       * Split the input string considering as string separator
!      * all the non numerical characters
       * with the only exception of the characters specified in charsDoNotBeRemoved param.
       * <BR>For example if you call splitButDigits("<DIV> +12.5, +3.4 </DIV>", "+."),
--- 93,97 ----
      /**
       * Split the input string considering as string separator
!      * all the not numerical characters
       * with the only exception of the characters specified in charsDoNotBeRemoved param.
       * <BR>For example if you call splitButDigits("<DIV> +12.5, +3.4 </DIV>", "+."),
***************
*** 154,158 ****
  
      /**
!      * Remove from the input string all the non numerical characters
       * with the only exception of the characters specified in charsDoNotBeRemoved param.
       * <BR>For example if you call trimButDigits("<DIV> +12.5 </DIV>", "+."),
--- 154,158 ----
  
      /**
!      * Remove from the input string all the not numerical characters
       * with the only exception of the characters specified in charsDoNotBeRemoved param.
       * <BR>For example if you call trimButDigits("<DIV> +12.5 </DIV>", "+."),
***************
*** 185,189 ****
  
      /**
!      * Remove from the input string all the non numerical characters
       * with the only exception of the characters specified in charsDoNotBeRemoved param.
       * <BR>The removal process removes only chars at the beginning and at the end of the string.
--- 185,189 ----
  
      /**
!      * Remove from the beginning and the end of the input string all the not numerical characters
       * with the only exception of the characters specified in charsDoNotBeRemoved param.
       * <BR>The removal process removes only chars at the beginning and at the end of the string.
***************
*** 191,195 ****
       * <BR>you obtain a string "+12.5" as output (1,2 and 5 are digits and +,. are chars that do not be removed).
       * <BR>For example if you call trimButDigitsBeginEnd("<DIV> +1 2 . 5 </DIV>", "+."),
!      * <BR>you obtain a string "+1 2 . 5" as output (the spaces inside the string are not removed).
       * @param input - The string in input.
       * @param charsDoNotBeRemoved - The chars that do not be removed.
--- 191,195 ----
       * <BR>you obtain a string "+12.5" as output (1,2 and 5 are digits and +,. are chars that do not be removed).
       * <BR>For example if you call trimButDigitsBeginEnd("<DIV> +1 2 . 5 </DIV>", "+."),
!      * <BR>you obtain a string "+1 2 . 5" as output (the spacess inside the string are not removed).
       * @param input - The string in input.
       * @param charsDoNotBeRemoved - The chars that do not be removed.
***************
*** 238,242 ****
      /**
       * Split the input string considering as string separator
!      * all the space and tabs like chars and
       * the chars specified in the input variable charsToBeRemoved.
       * <BR>For example if you call splitSpaces("<DIV> +12.5, +3.4 </DIV>", "<>DIV/,"),
--- 238,242 ----
      /**
       * Split the input string considering as string separator
!      * all the spaces and tabs like chars and
       * the chars specified in the input variable charsToBeRemoved.
       * <BR>For example if you call splitSpaces("<DIV> +12.5, +3.4 </DIV>", "<>DIV/,"),
***************
*** 299,308 ****
  
      /**
!      * Remove from the input string all the space and tabs like chars.
!      * <BR>Remove also the chars specified in the input variable charsToBeRemoved.
       * <BR>For example if you call trimSpaces("<DIV> +12.5 </DIV>", "<>DIV/"),
       * <BR>you obtain a string "+12.5" as output (space chars and <,>,D,I,V,/ are chars that must be removed).
       * <BR>For example if you call trimSpaces("<DIV> Trim All Spaces Also The Ones Inside The String </DIV>", "<>DIV/"),
!      * <BR>you obtain a string "TrimAllSpacesAlsoTheOnesInsideTheString" as output (all the space inside the string are removed).
       * @param input The string in input.
       * @param charsToBeRemoved The chars to be removed.
--- 299,308 ----
  
      /**
!      * Remove from the input string all the spaces and tabs like chars.
!      * Remove also the chars specified in the input variable charsToBeRemoved.
       * <BR>For example if you call trimSpaces("<DIV> +12.5 </DIV>", "<>DIV/"),
       * <BR>you obtain a string "+12.5" as output (space chars and <,>,D,I,V,/ are chars that must be removed).
       * <BR>For example if you call trimSpaces("<DIV> Trim All Spaces Also The Ones Inside The String </DIV>", "<>DIV/"),
!      * <BR>you obtain a string "TrimAllSpacesAlsoTheOnesInsideTheString" as output (all the spaces inside the string are removed).
       * @param input The string in input.
       * @param charsToBeRemoved The chars to be removed.
***************
*** 330,340 ****
  
      /**
!      * Remove from the input string all the space and tabs like chars.
!      * <BR>Remove also the chars specified in the input variable charsToBeRemoved.
       * <BR>The removal process removes only chars at the beginning and at the end of the string.
       * <BR>For example if you call trimSpacesBeginEnd("<DIV> +12.5 </DIV>", "<>DIV/"),
       * <BR>you obtain a string "+12.5" as output (space chars and <,>,D,I,V,/ are chars that must be removed).
       * <BR>For example if you call trimSpacesBeginEnd("<DIV> Trim all spaces but not the ones inside the string </DIV>", "<>DIV/"),
!      * <BR>you obtain a string "Trim all spaces but not the ones inside the string" as output (all the space inside the string are preserved).
       * @param input The string in input.
       * @param charsToBeRemoved The chars to be removed.
--- 330,340 ----
  
      /**
!      * Remove from the beginning and the end of the input string all the spaces and tabs like chars.
!      * Remove also the chars specified in the input variable charsToBeRemoved.
       * <BR>The removal process removes only chars at the beginning and at the end of the string.
       * <BR>For example if you call trimSpacesBeginEnd("<DIV> +12.5 </DIV>", "<>DIV/"),
       * <BR>you obtain a string "+12.5" as output (space chars and <,>,D,I,V,/ are chars that must be removed).
       * <BR>For example if you call trimSpacesBeginEnd("<DIV> Trim all spaces but not the ones inside the string </DIV>", "<>DIV/"),
!      * <BR>you obtain a string "Trim all spaces but not the ones inside the string" as output (all the spaces inside the string are preserved).
       * @param input The string in input.
       * @param charsToBeRemoved The chars to be removed.
***************
*** 369,373 ****
                  if (charsToBeRemoved.charAt(charsCount)==input.charAt(index))
                      charFound=true;
!             if (!( (Character.isWhitespace(input.charAt(index))) || (Character.isSpaceChar(input.charAt(index-1))) || (charFound) ))
              {
                  end=index;
--- 369,373 ----
                  if (charsToBeRemoved.charAt(charsCount)==input.charAt(index))
                      charFound=true;
!             if (!( (Character.isWhitespace(input.charAt(index))) || (Character.isSpaceChar(input.charAt(index))) || (charFound) ))
              {
                  end=index;
***************
*** 475,479 ****
  
      /**
!      * Remove from the input string all the characters
       * with the only exception of the characters specified in charsDoNotBeRemoved param.
       * <BR>The removal process removes only chars at the beginning and at the end of the string.
--- 475,479 ----
  
      /**
!      * Remove from the beginning and the end of the input string all the characters
       * with the only exception of the characters specified in charsDoNotBeRemoved param.
       * <BR>The removal process removes only chars at the beginning and at the end of the string.
***************
*** 592,596 ****
       * <BR>you obtain a string "+12.5" as output (<,>,D,I,V,/ and space char are chars that must be removed).
       * <BR>For example if you call trimChars("<DIV> Trim All Chars Also The Ones Inside The String </DIV>", "<>DIV/ "),
!      * <BR>you obtain a string "TrimAllCharsAlsoTheOnesInsideTheString" as output (all the space inside the string are removed).
       * @param input The string in input.
       * @param charsToBeRemoved The chars to be removed.
--- 592,596 ----
       * <BR>you obtain a string "+12.5" as output (<,>,D,I,V,/ and space char are chars that must be removed).
       * <BR>For example if you call trimChars("<DIV> Trim All Chars Also The Ones Inside The String </DIV>", "<>DIV/ "),
!      * <BR>you obtain a string "TrimAllCharsAlsoTheOnesInsideTheString" as output (all the spaces inside the string are removed).
       * @param input The string in input.
       * @param charsToBeRemoved The chars to be removed.
***************
*** 618,627 ****
  
      /**
!      * Remove from the input string all the chars specified in the input variable charsToBeRemoved.
       * <BR>The removal process removes only chars at the beginning and at the end of the string.
       * <BR>For example if you call trimCharsBeginEnd("<DIV> +12.5 </DIV>", "<>DIV/ "),
       * <BR>you obtain a string "+12.5" as output (' ' is a space char and <,>,D,I,V,/ are chars that must be removed).
       * <BR>For example if you call trimCharsBeginEnd("<DIV> Trim all spaces but not the ones inside the string </DIV>", "<>DIV/ "),
!      * <BR>you obtain a string "Trim all spaces but not the ones inside the string" as output (all the space inside the string are preserved).
       * @param input The string in input.
       * @param charsToBeRemoved The chars to be removed.
--- 618,627 ----
  
      /**
!      * Remove from the beginning and the end of the input string all the chars specified in the input variable charsToBeRemoved.
       * <BR>The removal process removes only chars at the beginning and at the end of the string.
       * <BR>For example if you call trimCharsBeginEnd("<DIV> +12.5 </DIV>", "<>DIV/ "),
       * <BR>you obtain a string "+12.5" as output (' ' is a space char and <,>,D,I,V,/ are chars that must be removed).
       * <BR>For example if you call trimCharsBeginEnd("<DIV> Trim all spaces but not the ones inside the string </DIV>", "<>DIV/ "),
!      * <BR>you obtain a string "Trim all spaces but not the ones inside the string" as output (all the spaces inside the string are preserved).
       * @param input The string in input.
       * @param charsToBeRemoved The chars to be removed.
***************
*** 900,905 ****
  
      /**
!      * Trim the input string in a string array,
!      * considering the tags as delimiter for splitting.
       * @see ParserUtils#trimTags (String input, String[] tags, boolean recursive, boolean insideTag).
       */
--- 900,945 ----
  
      /**
!      * Trim the input string, removing all the tags in the input string.
!      * <BR>The method trims all the substrings included in the input string of the following type:
!      * "<XXX>", where XXX could be a string of any type.
!      * <BR>If you set to true the inside parameter, the method deletes also the YYY string in the following input string:
!      * "<XXX>YYY<ZZZ>", note that ZZZ is not necessary the closing tag of XXX.
!      * @param input The string in input.
!      * @param inside If true, it forces the method to delete also what is inside the tags.
!      * @return The string without tags.
!      */
!     public static String trimAllTags (String input, boolean inside)
!     {
!
!         StringBuffer output = new StringBuffer();
!
!         if (inside) {
!             if ((input.indexOf('<')==-1) || (input.lastIndexOf('>')==-1) || (input.lastIndexOf('>')<input.indexOf('<'))) {
!                 output.append(input);
!             } else {
!                 output.append(input.substring(0, input.indexOf('<')));
!                 output.append(input.substring(input.lastIndexOf('>')+1, input.length()));
!             }
!         } else {
!             boolean write = true;
!             for (int index=0; index<input.length(); index++)
!             {
!                 if (input.charAt(index)=='<' && write)
!                     write = false;
!                 if (write)
!                     output.append(input.charAt(index));
!                 if (input.charAt(index)=='>' && (!write))
!                     write = true;
!             }
!         }
!
!         return output.toString();
!     }
!
!
!     /**
!      * Trim all tags in the input string and
!      * return a string like the input one
!      * without the tags and their content.
       * @see ParserUtils#trimTags (String input, String[] tags, boolean recursive, boolean insideTag).
       */
***************
*** 988,993 ****
  
      /**
!      * Trim the input string in a string array,
!      * considering the tags as delimiter for splitting.
       * <BR>Use Class class as input parameter
       * instead of tags[] string array.
--- 1028,1034 ----
  
      /**
!      * Trim all tags in the input string and
!      * return a string like the input one
!      * without the tags and their content.
       * <BR>Use Class class as input parameter
       * instead of tags[] string array.
***************
*** 1001,1006 ****
  
      /**
!      * Trim the input string in a string array,
!      * considering the tags as delimiter for splitting.
       * <BR>Use Class class as input parameter
       * instead of tags[] string array.
--- 1042,1048 ----
  
      /**
!      * Trim all tags in the input string and
!      * return a string like the input one
!      * without the tags and their content (optional).
       * <BR>Use Class class as input parameter
       * instead of tags[] string array.
***************
*** 1014,1019 ****
  
      /**
!      * Trim the input string in a string array,
!      * considering the tags as delimiter for splitting.
       * <BR>Use NodeFilter class as input parameter
       * instead of tags[] string array.
--- 1056,1062 ----
  
      /**
!      * Trim all tags in the input string and
!      * return a string like the input one
!      * without the tags and their content.
       * <BR>Use NodeFilter class as input parameter
       * instead of tags[] string array.
***************
*** 1027,1032 ****
  
      /**
!      * Trim the input string in a string array,
!      * considering the tags as delimiter for splitting.
       * <BR>Use NodeFilter class as input parameter
       * instead of tags[] string array.
--- 1070,1076 ----
  
      /**
!      * Trim all tags in the input string and
!      * return a string like the input one
!      * without the tags and their content (optional).
       * <BR>Use NodeFilter class as input parameter
       * instead of tags[] string array.
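
For readers who want to try the new trimAllTags behaviour without checking out the tree, here is a minimal standalone sketch. It copies the logic from the 1.46 hunk above into a hypothetical class named TrimAllTagsSketch (not part of htmlparser) and, as an assumption of convenience, uses StringBuilder where the committed code uses StringBuffer. It is meant only to illustrate the difference between inside=false and inside=true, not to replace ParserUtils.

```java
// Standalone sketch of the trimAllTags logic shown in the diff above.
// TrimAllTagsSketch is a hypothetical name; the real method lives in
// org.htmlparser.util.ParserUtils and uses StringBuffer.
final class TrimAllTagsSketch {

    static String trimAllTags(String input, boolean inside) {
        StringBuilder output = new StringBuilder();
        if (inside) {
            // Drop everything from the first '<' to the last '>', inclusive.
            int first = input.indexOf('<');
            int last = input.lastIndexOf('>');
            if (first == -1 || last == -1 || last < first) {
                output.append(input); // no "<...>" span to remove
            } else {
                output.append(input, 0, first);
                output.append(input, last + 1, input.length());
            }
        } else {
            // Drop only the "<...>" substrings, keep the text between them.
            boolean write = true;
            for (int index = 0; index < input.length(); index++) {
                char c = input.charAt(index);
                if (c == '<' && write) {
                    write = false;
                }
                if (write) {
                    output.append(c);
                }
                if (c == '>' && !write) {
                    write = true;
                }
            }
        }
        return output.toString();
    }

    public static void main(String[] args) {
        String html = "a<B>bold</B>c";
        System.out.println(trimAllTags(html, false)); // prints "aboldc"
        System.out.println(trimAllTags(html, true));  // prints "ac"
    }
}
```

One consequence of the character-by-character loop, as far as this sketch goes: with inside=false an unterminated '<' suppresses the rest of the string, so "1 > 0 and <x" comes back as "1 > 0 and ".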