#31 Saving the files from POST

Milestone: Future_Release
Status: open
Owner: Davis
Priority: 5
Created: 2010-02-24
Updated: 2010-02-24
Creator: Mihai Marinescu
Private: No

Hi, it would be great if the HTTP server could also parse the FILES posted using an "upload" (file) input.

I attached some requests which show how the headers are built and how the boundaries are set, hoping that helps in parsing them.

Discussion

    Attachments
  • Archive containing some information
  • Davis
    2010-02-24

    • milestone: 468311 --> Future_Release
     
  • Davis
    2010-02-24

    Thanks. I'll look into this. Probably not in the next release but maybe in the one that follows it.

     
  • Thanks.

     
  • Hi again,

    Sorry, I forgot to say that when "Content-Type: application/x-www-form-urlencoded" is used, the request will not contain files; the variables simply appear at the end of the request, like this (sorry, maybe you already knew all of this, but I just want to point out that the two cases need to be treated differently depending on the content type: application/x-www-form-urlencoded versus multipart/form-data):

    POST /upload.html HTTP/1.1
    Accept: image/gif, image/jpeg, image/pjpeg, application/x-ms-application, application/vnd.ms-xpsdocument, application/xaml+xml, application/x-ms-xbap, application/x-shockwave-flash, */*
    Referer: http://192.168.64.128/
    Accept-Language: en-us
    User-Agent: Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0; SLCC1; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; AskTB5.5)
    Content-Type: application/x-www-form-urlencoded
    Accept-Encoding: gzip, deflate
    Host: 192.168.64.128
    Content-Length: 35
    Connection: Keep-Alive
    Cache-Control: no-cache

    aaausername=uuu&aaapassword=ppp
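
    Splitting such a body into variables is mostly a matter of cutting on '&' and '='. Here is a minimal standalone sketch of that step (parse_form_body is just a name made up for this illustration, and percent-decoding of %XX escapes is left out):

    #include <map>
    #include <string>

    // Split "aaausername=uuu&aaapassword=ppp" into key/value pairs.
    // (percent-decoding is omitted to keep the sketch short)
    std::map<std::string, std::string> parse_form_body(const std::string& body)
    {
        std::map<std::string, std::string> vars;
        std::string::size_type pos = 0;
        while (pos < body.size())
        {
            std::string::size_type amp = body.find('&', pos);
            if (amp == std::string::npos)
                amp = body.size();

            std::string pair = body.substr(pos, amp - pos);
            std::string::size_type eq = pair.find('=');
            if (eq != std::string::npos)
                vars[pair.substr(0, eq)] = pair.substr(eq + 1);

            pos = amp + 1;
        }
        return vars;
    }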

     
  • Davis
    2010-02-25

    Thanks for the extra info :)

     
  • Davis
    2010-02-25

    I just realized that the code doesn't have any input validation on the Content-Length. So as it stands someone could tell the HTTP server that a huge amount of data was coming and the server would attempt to allocate a correspondingly huge amount of memory. Obviously this should be addressed.

    I'm putting this note here just so that I remember to come back and add a user-supplied upper bound on the size of the POST data.
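
    For reference, the kind of guard being described might look roughly like this (a standalone sketch only; max_post_size is a hypothetical user-supplied limit, not an existing option of the server):

    #include <cstdlib>
    #include <string>

    // Hypothetical check: run after the Content-Length header has been read,
    // before any buffer for the body is allocated.
    bool post_size_ok(const std::string& content_length_value,
                      unsigned long max_post_size)
    {
        unsigned long content_length =
            std::strtoul(content_length_value.c_str(), 0, 10);
        return content_length <= max_post_size;
    }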

     
  • Right, an empty line comes after all the headers are finished, and then comes the content itself (which has the length defined by Content-Length). The content is a list of variables (in case Content-Type = application/x-www-form-urlencoded) or a list of data sections separated by the boundary line (in case Content-Type = multipart/form-data).

    Your idea of adding a kind of MaxPostSize is nice. One way of limiting the request size would be: read the headers until you reach an empty line (which means the headers are over), then check the already-parsed value of "Content-Length" and see whether that value is bigger than allowed. If it is not, read the whole body and parse it. If there are files in the content, save them.

    Please note that the HTTP protocol allows the same header to be sent more than once. There are 2 ways to handle this: 1) if it is a "unique" header (like Content-Length or Host), we take it only once; 2) if it is a header whose values can be accumulated (like Accept-Language or Accept), we add together all the values of that header, so that:

    Accept: image/gif
    Accept: image/jpeg

    will have the same effect as:

    Accept: image/gif, image/jpeg

    One more thing: the names of the headers are case insensitive.
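
    A rough sketch of the header handling described above (a standalone illustration only; none of these names come from the actual server code) could lower-case each header name before storing it and append repeated values:

    #include <algorithm>
    #include <cctype>
    #include <map>
    #include <string>

    // Lower-case the header name so "Content-Length", "content-length" and
    // "CONTENT-LENGTH" all land in the same map slot.
    std::string to_lower(std::string s)
    {
        std::transform(s.begin(), s.end(), s.begin(), ::tolower);
        return s;
    }

    // Store a header, appending with ", " when the same name shows up again.
    // (Good for headers like Accept; for "unique" headers such as Content-Length
    // a real implementation would keep only the first occurrence.)
    void add_header(std::map<std::string, std::string>& headers,
                    const std::string& name, const std::string& value)
    {
        std::string key = to_lower(name);
        if (headers.count(key))
            headers[key] += ", " + value;
        else
            headers[key] = value;
    }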

     
  • Sorry for posting so many comments, but while going through the code of the server, in "server_http_1.h", I found that everything I described is already done, except for the "Content-Type: multipart/form-data" case. In my opinion, lines 336-340 should be replaced by this small piece of code:

    Filename: server_http_1.h

    if (strings_equal_ignore_case(incoming.request_type, "POST"))
    {
        if (strings_equal_ignore_case(left_substr(content_type,";"), "application/x-www-form-urlencoded"))
        {
            parse_url(incoming.body, incoming.queries);
        }
        else if (strings_equal_ignore_case(left_substr(content_type,";"), "multipart/form-data"))
        {
            // Here comes the boundary-separated data, which I cannot handle myself...
            // Maybe the user should be able to set a path for the uploaded files, a kind of "UploadPath".
            // Maybe there should be an array "incoming.files", which could look like this:
            // incoming.files[0] = "128129381293.tmp" (the user will find this file at "<UploadPath>/128129381293.tmp")
            // incoming.files[1] = "kfgjdfgdfg.tmp"   (the user will find this file at "<UploadPath>/kfgjdfgdfg.tmp")
            // For simple handling, the name "128129381293.tmp" could be composed of a randomly generated name plus the original file extension.
        }
    }
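
    To give an idea of what the missing branch could do, here is a deliberately simplified, standalone sketch of splitting a multipart/form-data body on its boundary and saving each part to a file (get_boundary, save_uploaded_files, and upload_path are all made up for this illustration, and it ignores the Content-Disposition headers that distinguish file parts from ordinary form fields):

    #include <fstream>
    #include <sstream>
    #include <string>
    #include <vector>

    // Pull the boundary token out of a header such as
    // "multipart/form-data; boundary=----SomeBoundary123".
    std::string get_boundary(const std::string& content_type)
    {
        const std::string key = "boundary=";
        std::string::size_type pos = content_type.find(key);
        if (pos == std::string::npos)
            return "";
        return "--" + content_type.substr(pos + key.size());
    }

    // Very simplified: split the body on the boundary and dump each part's
    // payload (the bytes after the blank line that ends the part headers)
    // into a file under upload_path.  Returns the generated file names.
    std::vector<std::string> save_uploaded_files(const std::string& body,
                                                 const std::string& boundary,
                                                 const std::string& upload_path)
    {
        std::vector<std::string> saved;
        std::string::size_type pos = body.find(boundary);
        int counter = 0;
        while (pos != std::string::npos)
        {
            std::string::size_type next = body.find(boundary, pos + boundary.size());
            if (next == std::string::npos)
                break;  // the closing "--boundary--" marker has been reached

            std::string part = body.substr(pos + boundary.size(),
                                           next - pos - boundary.size());
            std::string::size_type data_start = part.find("\r\n\r\n");
            if (data_start != std::string::npos)
            {
                std::string data = part.substr(data_start + 4);
                if (data.size() >= 2)
                    data.erase(data.size() - 2);  // drop the CRLF before the next boundary

                std::ostringstream name;
                name << "upload_" << counter++ << ".tmp";
                std::ofstream fout((upload_path + "/" + name.str()).c_str(),
                                   std::ios::binary);
                fout.write(data.data(), data.size());
                saved.push_back(name.str());
            }
            pos = next;
        }
        return saved;
    }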

     
  • Davis
    2010-02-25

    All of the data that gets posted is already one of the arguments to on_request(). It is in incoming.body. So I was thinking that there would just be a global function or some other tool that exists outside the server itself that takes the body, parses it, and outputs the files in some form. I like this because it is very simple and modular, as it doesn't require adding additional complexity to the HTTP server itself. However, it has the obvious disadvantage that any posted data needs to be small enough to fit into the server's memory.

    I think a better approach is probably more similar to what you proposed. For example, it should be possible to introduce a new event, say on_file_posted(), that is triggered for each file that gets posted. I could arrange for this event to contain an std::istream that would supply the user with the data in the file. This way you could handle files of arbitrary size by streaming the data immediately to a file on disk. But you could also do something else with it, which is nice.
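
    Just to picture it, the hypothetical event could look something like this (purely illustrative; neither the name nor the signature exists in the server yet):

    #include <fstream>
    #include <istream>
    #include <string>

    // Hypothetical handler: called once for each uploaded file while the
    // request is still being read, so a file never has to fit entirely in memory.
    void on_file_posted(
        const std::string& file_name,  // original file name sent by the browser
        std::istream& data             // the file's bytes, streamed off the socket
    )
    {
        // e.g. send the upload straight to disk
        std::ofstream fout(file_name.c_str(), std::ios::binary);
        fout << data.rdbuf();
    }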

     
  • These days I am trying to learn C++ as much as time allows. I saw incoming.body and thought the same as you said: this way is simple and it lets the user handle it himself. :)

    On the other hand, it would be nice to have all the variables placed into incoming.queries no matter which Content-Type was sent, since you already put all the variables into incoming.queries whether the request type is GET or POST, right? :)

    Your approach is indeed very professional, and not even close to what I suggested; yours is much better. :)

    Now let's look at the practical situation: when using the server, the user will not be interested in one file on its own, but in handling that file in the context of the whole request.
    For example, the web page has a <FORM ...> from which the visitor has sent 2 regular variables and one uploaded file, and the server needs, let's say, to handle that file/std::istream depending on (one of) those 2 other variables from the FORM. So maybe an event will not easily link the file to the original request, or, even if it could be linked, a request containing files couldn't be processed by a kind of general "ProcessRequests" function/method.

    What if, as you said, you put the files into an array of streams, with items like incoming.files["myfile1"] = "<here comes the stream content>"? That way the files would be available in the same way as the other variables (incoming.queries), so using and validating them on the server side would be very straightforward... (a tiny sketch of what I mean follows at the end of this comment)

    This is just a little suggestion; I am not sure it's the best approach, of course.
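
    To make the suggestion a bit more concrete, the idea is roughly a map keyed by the form field name, sitting next to the regular variables (purely hypothetical; incoming_request_sketch and its members are invented for this illustration):

    #include <map>
    #include <string>

    // Hypothetical layout: uploaded files exposed alongside the regular
    // form variables, keyed by the name of the <input type="file"> field.
    struct incoming_request_sketch
    {
        std::map<std::string, std::string> queries;  // regular form variables
        std::map<std::string, std::string> files;    // field name -> file contents (or temp file path)
    };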

     
  • Davis
    2010-02-26

    At the moment the queries get populated for normal GET or POST with application/x-www-form-urlencoded content. But the multipart/form-data content type is totally unhandled and just goes to the body field. So in that case the queries aren't dealt with.

    I see what you mean about putting it all in one event. That's definitely better. Although I don't think an array of streams is exactly the right way to go, something basically equivalent should be good. Something like a special istream that has a member function that advances the stream to the next content block. It would also need a method to query the headers associated with the current content block.

    The only negative I can see is that any queries won't end up in the incoming.queries field but will have to come in through the special istream in the case of multipart/form-data content. It's either that or we would need to buffer the entire data stream in memory to parse it before triggering the on_request() event, since it's possible for queries to arrive after files. But I think it's definitely worth being able to deal with large POST requests. And it should be possible to put some kind of nice interface on this "super stream" that somehow makes dealing with the multipart/form-data queries simple anyway.
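
    A rough illustration of the sort of interface being discussed (all of these names are invented for the sketch; nothing here exists in the server):

    #include <istream>
    #include <map>
    #include <string>

    // Hypothetical interface for walking the boundary-separated blocks of a
    // multipart/form-data body without buffering the whole request in memory.
    class multipart_reader_sketch
    {
    public:
        virtual ~multipart_reader_sketch() {}

        // headers of the block currently positioned on
        // (Content-Disposition, Content-Type, ...)
        virtual const std::map<std::string, std::string>& block_headers() const = 0;

        // stream that yields the current block's data (a file or a query value)
        virtual std::istream& block_data() = 0;

        // advance to the next block; returns false after the last one
        virtual bool next_block() = 0;
    };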

    Thanks for all this input. Now that I'm thinking about this more this seems like something that will really make the HTTP server a lot more useful :)

    Cheers,
    Davis

     
  • Hi Davis,

    as long as it's easy for the user to use and understand, it's surely very welcome :)

    I am here when you need some support, at least from the point of view of ideas :)

    Thanks for your patience,
    Mihai

     
  • Sorry, but I really have a question - a little bit related to the request parsing method:

    In "/server/server_http_1.h", at lines 231 and 234 you read 2 strings (one is the request type and the other is the path). But those 2 informations are sent together (for example "POST /path/of/file HTTP/1.1", so how comes that your code works? I am very curious how could this work.....

    Thanks :)

     
  • Davis
    2010-02-27

    It's just that the >> operator for std::istream reads single words at a time. So since there is whitespace between POST and /path it works out correctly.

    If the path has spaces in it then the browser would convert them into %20. And now that I'm looking at this I see that nothing here automatically takes care of any url-encoding on the actual path. That should probably be added as well.
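
    For example, here is a tiny standalone illustration of how operator>> splits the request line on whitespace (not the server code itself):

    #include <iostream>
    #include <sstream>
    #include <string>

    int main()
    {
        std::istringstream line("POST /path/of%20file HTTP/1.1");

        std::string request_type, path, protocol;
        line >> request_type >> path >> protocol;   // each >> stops at whitespace

        // prints: POST | /path/of%20file | HTTP/1.1
        std::cout << request_type << " | " << path << " | " << protocol << "\n";
        return 0;
    }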

    Cheers,
    Davis

     
  • WOW... right, right, right :)
    C++ seems so cool... thank you Davis...

    Yes, I didn't notice that the URL is not decoded :) Anyway it's easy, the decoding function is already there :)

     
  • Davis
    2010-02-27

    No problem.

    Yeah, it is pretty cool :)