[jflex-users] Issue in Tokenizing
From: Denis W. <ddw...@gm...> - 2009-12-28 07:19:15
Hi all,

I wrote a .flex file to tokenize an XML document. The tokens should be things like elements, attributes, and values in the XML document. My .flex file is attached. I used the following code to test the input. I would be happy if anyone could give me a clue on how to resolve this issue.

Best Regards,
Denis.

    String text = "<stringtype id=\"h:NAME\">\n" +
            " </stringtype>";
    try {
        InputStream is = new ByteArrayInputStream(text.getBytes("UTF-8"));
        Scanner sc = new Scanner(is);
        try {
            for (int i = 0; i < 45; i++)
                System.out.println(sc.yytext() + ":" + sc.next_token());
        } catch (Exception ex) {
            ex.printStackTrace();
        }
    } catch (UnsupportedEncodingException e) {
        e.printStackTrace();
    } catch (java.io.IOException e) {
        e.printStackTrace(); //To change body of catch statement use File | Settings | File Templates.
    }

The output always looks like this (I want to show that tokenizing happens at the character level):

    :#106
    <:#104
    s:#104
    t:#104
    r:#104
    i:#104
    n:#104
    g:#104
    t:#104
    y:#104
    p:#104
    e:#104
    i:#104
    d:#106
    =:#106
    ":#104
    h:#106
    ::#104
    N:#104
    A:#104
    M:#104
    E:#106
    ":#106
    >:#106
    <:#106
    /:#104
    s:#104
    t:#104
    r:#104
    i:#104
    n:#104
    g:#104
    t:#104
    y:#104
    p:#104
    e:#106
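
A side note on reading the output of the test loop in this post: Java evaluates the operands of `+` from left to right, so `sc.yytext()` is called before `sc.next_token()`, and each printed line pairs the text of the previous match with the id of the next token. That would explain why the first line has no text in front of the colon. The sketch below illustrates this with a hypothetical `StubScanner` (not the generated JFlex class). It only borrows the method names `yytext()` and `next_token()` from the post, emits one token per character, and uses made-up ids 104 and 106.

```java
import java.util.ArrayList;
import java.util.List;

public class TokenPrintOrder {
    /** Hypothetical stand-in for the generated scanner; emits one token per character. */
    static class StubScanner {
        private final String input;
        private int pos = 0;
        private String text = "";

        StubScanner(String input) { this.input = input; }

        /** Text of the most recent match (empty before the first match). */
        String yytext() { return text; }

        /** Consumes one character and returns a made-up token id, or -1 at end of input. */
        int next_token() {
            if (pos >= input.length()) { text = ""; return -1; }
            text = String.valueOf(input.charAt(pos++));
            return Character.isLetter(text.charAt(0)) ? 104 : 106;
        }
    }

    /** Same operand order as in the post: yytext() runs before next_token(). */
    static List<String> textFirst(String input) {
        StubScanner sc = new StubScanner(input);
        List<String> out = new ArrayList<>();
        for (int i = 0; i < input.length(); i++) {
            out.add(sc.yytext() + ":#" + sc.next_token());
        }
        return out;
    }

    /** Advance first, then read the text that belongs to the token just returned. */
    static List<String> tokenFirst(String input) {
        StubScanner sc = new StubScanner(input);
        List<String> out = new ArrayList<>();
        for (int i = 0; i < input.length(); i++) {
            int token = sc.next_token();
            out.add(sc.yytext() + ":#" + token);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(textFirst("<st"));
        System.out.println(tokenFirst("<st"));
    }
}
```

Storing the token in a local variable before calling `yytext()` keeps each text paired with its own token id. This only affects how the output is labelled; it does not by itself change the character-level tokenizing, which depends on the rules in the .flex file.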