[Flex-help] multipart form lexer sample
From: Brian M. <mcq...@gm...> - 2007-01-27 00:28:49
Here is something that extracts a file from stdin, where stdin is the request sent by a web client. So stdin has all of the protocol headers and such; a sample is appended at the bottom. It's a lexer that parses that and extracts an embedded file.

Maybe you guys could improve it. I don't like it much. I don't like the fact that line ending characters may be either \n or \r\n. I don't like the possible memory consumption when scanning for the end of the FILE_DATA.

%{
#include <stdio.h>
#include <stdlib.h>   /* exit() */
#include <string.h>

char boundary[1024];  /* multipart boundary taken from the Content-Type header */
%}

%option noyywrap

%s HEADERS
%s BODY
%s BOUNDARY
%s PARM_NAME
%s PARM_VALUE
%s FILE_TYPE
%s FILE_DATA

EOL \r?\n
DOT [^\r\n]

%%

<INITIAL>^(POST|GET|HEAD)" "{DOT}*"HTTP/1."[01]{DOT}*{EOL}  BEGIN(HEADERS);

<HEADERS>^"Content-Length: "[0-9]+{EOL}  ;
<HEADERS>^"Content-Type: multipart/form-data; boundary="  BEGIN(BOUNDARY);
<HEADERS>^{EOL}  BEGIN(BODY);
<HEADERS>{DOT}|{EOL}  { /* skip other headers; matching EOL here keeps the default rule from echoing it */ }

<PARM_NAME>{DOT}+"filename="{DOT}*{EOL}  BEGIN(FILE_TYPE);
<PARM_NAME>{DOT}+{EOL}  ;
<PARM_NAME>^{EOL}  BEGIN(PARM_VALUE);

<PARM_VALUE>{DOT}+{EOL}  BEGIN(BODY);

<BOUNDARY>{DOT}*{EOL}  {
    /* Copy the boundary without its line ending (\n or \r\n),
       bounded by the buffer size and always NUL-terminated. */
    size_t n = strcspn(yytext, "\r\n");
    if (n >= sizeof boundary)
        n = sizeof boundary - 1;
    memcpy(boundary, yytext, n);
    boundary[n] = '\0';
    BEGIN(HEADERS);
}

<FILE_TYPE>{DOT}+{EOL}  ;
<FILE_TYPE>^{EOL}  BEGIN(FILE_DATA);

<FILE_DATA>{DOT}  ECHO;
<FILE_DATA>^--{DOT}*{EOL}  {
    /* A line starting with "--" ends the file only if it carries the boundary. */
    if (strncmp(yytext + 2, boundary, strlen(boundary)) == 0) {
        BEGIN(PARM_NAME);
    } else {
        REJECT;
    }
}

<BODY>^--{DOT}*{EOL}  {
    if (strncmp(yytext + 2, boundary, strlen(boundary)) == 0) {
        BEGIN(PARM_NAME);
    } else {
        REJECT;
    }
}
<BODY>{DOT}|{EOL}  { /* discard everything else outside the file data */ }

<<EOF>>  yyterminate();

%%

int main(void)
{
    yylex();
    exit(0);
}

Sample STDIN from a web client:

POST /form1?some_stuff HTTP/1.1
Host: whatever
Content-Length: 98398
Content-Type: ...
Other Headers: ...

--boundary
..
EOF
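On the \n versus \r\n complaint: one option is to move the boundary test out of the two duplicated actions and into a plain C helper that strips the line ending before comparing. What follows is only a sketch under some assumptions. The names line_length and boundary_kind are made up for illustration and are not part of flex. The helper compares the whole line rather than a prefix (stricter than the strncmp test in the rules), it recognises the closing "--boundary--" form that multipart bodies end with, and it ignores the optional trailing whitespace a delimiter line may carry.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Length of a line without its trailing "\n" or "\r\n". */
static size_t line_length(const char *line)
{
    size_t n = strlen(line);

    if (n > 0 && line[n - 1] == '\n')
        n--;
    if (n > 0 && line[n - 1] == '\r')
        n--;
    return n;
}

/*
 * Classify a line against a multipart boundary (hypothetical helper):
 *   0 = not a delimiter
 *   1 = "--boundary"    (start of the next part)
 *   2 = "--boundary--"  (closing delimiter)
 */
static int boundary_kind(const char *line, const char *boundary)
{
    size_t blen, n;

    assert(line != NULL && boundary != NULL);
    blen = strlen(boundary);
    n = line_length(line);

    /* Must start with "--" followed by the exact boundary string. */
    if (n < 2 + blen || line[0] != '-' || line[1] != '-')
        return 0;
    if (memcmp(line + 2, boundary, blen) != 0)
        return 0;

    if (n == 2 + blen)
        return 1;
    if (n == 4 + blen && line[n - 2] == '-' && line[n - 1] == '-')
        return 2;
    return 0;  /* extra characters after the boundary: treat as data */
}
```

In the scanner this could be called as boundary_kind(yytext, boundary) from the FILE_DATA and BODY actions, with REJECT on a return value of 0. It does nothing for the memory concern, since the rule still has to match a whole line before the action runs.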