From: Thorsten S. <tsc...@am...> - 2010-07-15 11:06:18
Hello,

we use SOAP::Lite to transfer binary files as Base64-encoded strings, passed as an ordinary parameter like any other within a normal SOAP message. We are not using SOAP with Attachments or anything similar, and for compatibility reasons we don't plan to. For historical reasons the Base64 encoding is done before the data reaches SOAP::Lite, so SOAP::Lite only ever sees one very large, multi-line string.

The problem is that this approach scales pretty badly. Transferring a 40 MB binary file exceeds the amount of addressable RAM of the perl process on 32-bit systems. The HTTP communication itself is not a problem, but serialization and deserialization by SOAP::Lite seem to be. I don't think that using more than 2 GB of memory for handling maybe 100 MB of data is really necessary, so I tried to find my way through the serializer and deserializer to locate the places that obviously blow up memory consumption. "envelope" in the serializer is one method after which the used memory had increased to several hundred MB; the really big increase happens afterwards, though.

The main thing envelope seems to do is create SOAP::Data instances for existing data. Looking at SOAP::Data::set_value, it seems to me that it copies data, but I'm not sure whether it always stores the data exactly as it receives it or always dereferences it and thereby produces possibly unneeded copies.

My SOAP call to SOAP::Lite looks like this:

    eval {
        $t = $soap->elrev_EndUpload(
            SOAP::Data->new('name' => 'sessionid',   'type' => 'string', 'value' => $sessionid),
            SOAP::Data->new('name' => 'mandant',     'type' => 'string', 'value' => $mandant),
            SOAP::Data->new('name' => 'satzid',      'type' => 'string', 'value' => $dsnummer),
            SOAP::Data->new('name' => 'dateiinhalt', 'type' => 'string', 'value' => $$datei_ref),
            SOAP::Data->new('name' => 'einreichart', 'type' => 'string', 'value' => $art),
            SOAP::Data->new('name' => 'mailadresse', 'type' => 'string', 'value' => $email),
        );
    };

$$datei_ref holds the Base64-encoded file data.

Calling the method this way, I assumed that envelope only handles references to objects and that copying those doesn't really matter. But if SOAP::Data::set_value copied the contents behind the references, the huge memory consumption could easily be explained, because SOAP::Data objects seem to be created very often.

Am I right that set_value copies the content into the existing SOAP::Data objects? Do you see any other apparent reasons why the memory consumption could be this high?

Thanks to all.

Best regards,

Thorsten Schöning

--
Thorsten Schöning
AM-SoFT IT-Systeme - Hameln | Potsdam | Leipzig
Phone: Potsdam: 0331-743881-0
E-Mail: tsc...@am...
Web: http://www.am-soft.de
AM-SoFT GmbH IT-Systeme, Konsumhof 1-5, 14482 Potsdam
Amtsgericht Potsdam HRB 21278 P, Managing Director: Andreas Muchow
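The difference between handing over `$$datei_ref` and handing over `$datei_ref` can be checked with plain Perl, independent of SOAP::Lite. The following is only a minimal sketch under stated assumptions: the hashes `%by_value` and `%by_ref` are hypothetical stand-ins for an object that stores a `value` attribute, and nothing here shows what SOAP::Data actually does internally. Note that newer perls may defer the physical copy (copy-on-write) until one side is modified, so the sketch demonstrates the semantics rather than measuring memory.

```perl
use strict;
use warnings;

# Stand-in for the large Base64 payload (small here so the sketch runs quickly).
my $payload   = 'QUJD' x 1000;
my $datei_ref = \$payload;

# Dereferencing hands over the string itself; storing it in a hash slot
# (as a constructor typically would) creates an independent copy.
my %by_value = (value => $$datei_ref);

# Storing the reference keeps one buffer shared by both holders.
my %by_ref = (value => $datei_ref);

# Mutating the original shows which holder owns its own copy.
substr($payload, 0, 4) = 'XXXX';

my $copy_unchanged = substr($by_value{value}, 0, 4) eq 'QUJD' ? 1 : 0;
my $ref_sees_edit  = substr(${ $by_ref{value} }, 0, 4) eq 'XXXX' ? 1 : 0;

print "by value: independent copy = $copy_unchanged\n";
print "by ref:   shared buffer    = $ref_sees_edit\n";
```

With a payload of 50 MB or more, every place that stores or returns the string by value multiplies that amount, which is why the number of such places matters far more than the size of the references.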
From: Martin K. <mar...@fe...> - 2010-07-17 18:42:16
Hi Thorsten,

SOAP::Lite indeed copies the request data several times: in the envelope creation (the one you suspected), in the serializer (it keeps both the original data and the XML form in memory), in the deserializer, and, as far as I know, twice in the transport layer (once for converting the XML string to bytes, and once for passing the data to LWP::UserAgent by value). That adds up to at least 4 copies for sending a request. Base64 encoding also inflates the data by a factor of about 1.33 (4 output characters for every 3 input bytes, plus line breaks), so the factor of > 20 you're reporting seems quite plausible.

This behavior, however, is unlikely to change in the future: passing values by reference (instead of by value, which means copying) would require API changes and possibly break existing applications.

Eliminating all (really) unnecessary copies would probably reduce memory usage by a factor of between 2 and 10. Unfortunately, even this is unlikely to happen: SOAP::Lite's test suite is far from being perfect, and memory optimizations (and testing they don't break anything) would require quite some effort - and it's not even certain memory usage would drop below a level acceptable for you.

The best approach would be to avoid the memory usage problem altogether and stream the request to the server.

Unfortunately, SOAP::Lite doesn't have any means for streaming requests. To my knowledge, there's also no other SOAP library in perl which has direct support for streaming requests, so you may need to roll your own.

Martin
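A hand-rolled streaming sender would need, as one building block, a way to Base64-encode the file piece by piece instead of in one go. The sketch below shows only that building block, using the core module MIME::Base64; `make_base64_chunker` is a hypothetical helper name, and the in-memory file handle stands in for the real binary file. Reading in multiples of 57 bytes follows the advice in the MIME::Base64 documentation, since 57 input bytes produce exactly one 76-character output line. Wrapping the chunks in a SOAP envelope and feeding them to the HTTP layer is not shown; to my understanding LWP can take a code reference as request content, but treat that as an assumption to verify.

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64);

# Returns a closure that yields the next Base64-encoded chunk of $fh,
# or an empty string at end of input. Reading a multiple of 57 bytes keeps
# every chunk on a 76-character line boundary, so the concatenated chunks
# equal the Base64 encoding of the whole input.
sub make_base64_chunker {
    my ($fh, $lines_per_chunk) = @_;
    my $chunk_bytes = 57 * ($lines_per_chunk || 60);
    return sub {
        my $n = read($fh, my $buf, $chunk_bytes);
        die "read failed: $!" unless defined $n;
        return $n ? encode_base64($buf) : '';
    };
}

# Demo with an in-memory file handle standing in for the real binary file.
my $binary = join '', map { chr($_ % 256) } 0 .. 9_999;
open(my $fh, '<:raw', \$binary) or die "cannot open in-memory handle: $!";

my $next_chunk = make_base64_chunker($fh, 10);
my ($streamed, $chunks) = ('', 0);
while (length(my $piece = $next_chunk->())) {
    $streamed .= $piece;    # a real sender would write $piece out instead
    $chunks++;
}
close $fh;

my $matches_whole = $streamed eq encode_base64($binary) ? 1 : 0;
print "chunks: $chunks, identical to one-shot encoding: $matches_whole\n";
```

The demo accumulates the chunks in `$streamed` only to compare them against the one-shot encoding; in a streaming sender each chunk would be written out and discarded, so peak memory stays near the chunk size rather than the file size.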
From: Thorsten S. <tsc...@am...> - 2010-07-18 18:42:59
Hello Martin Kutter,

on Saturday, 17 July 2010 at 20:41 you wrote:

> Eliminating all (really) unnecessary copies would probably reduce memory
> usage by a factor of between 2 and 10. Unfortunately, even this is
> unlikely to happen: SOAP::Lite's test suite is far from being perfect,
> and memory optimizations (and testing they don't break anything) would
> require quite some effort - and it's not even certain memory usage would
> drop below a level acceptable for you.

Hi Martin,

thanks for your answer. With some optimizations we can at least buy enough time until everyone runs 64-bit systems. ;-) Today I reimplemented SOAP::Serializer::tag and ::xmlize in my own serializer and was able to reduce memory usage by a few hundred MB by using array refs and string refs as elements instead of strings, which are often copied twice or more.

One approach I thought of was using our own data types which hold references to their data, and recognizing them everywhere SOAP::Lite needs to handle this data. tag and xmlize in particular are places where those data types may be recognized and converted to the standard ones, but that's no problem because SOAP::Lite makes it very easy to override them. If I find some of those places in the deserializer and maybe the transport layer, too, I'm optimistic that I can decrease memory usage to an acceptable level without breaking too much.

> Unfortunately, SOAP::Lite doesn't have any means for streaming requests.
> To my knowledge, there's also no other SOAP library in perl which has
> direct support for streaming requests, so you may need to roll your own.

I don't think this will happen; it's more likely that we will just use 64-bit systems and optimize the serializer and deserializer to fit our needs. It's not even clear whether we will stick with perl on the server side. We did a project with Java and Axis2, and this could be an option for new things, too.
Best regards,

Thorsten Schöning
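The idea of using string refs and array refs as elements instead of strings can be illustrated without SOAP::Lite. This is only a sketch of the general technique, not the reimplemented serializer from the message above: `tag_parts` and `write_parts` are hypothetical helpers and do not reflect the real signatures of SOAP::Serializer::tag or ::xmlize. An element is represented as a list of scalar references, and the payload is dereferenced only at the moment it is written out.

```perl
use strict;
use warnings;

# Hypothetical helper (not SOAP::Lite's real tag()): returns the scalar
# references that make up one element, instead of one big string.
sub tag_parts {
    my ($name, $content_ref) = @_;
    my ($open, $close) = ("<$name>", "</$name>");
    return (\$open, $content_ref, \$close);
}

# Writes every fragment to the handle; the payload is only dereferenced at
# print time and is never concatenated into a larger intermediate string.
sub write_parts {
    my ($out, @parts) = @_;
    print {$out} $$_ for @parts;
    return scalar @parts;
}

my $payload = 'QUJD' x 1000;    # stand-in for the large Base64 string
my @parts   = tag_parts('dateiinhalt', \$payload);

# The middle fragment is the very same scalar as $payload, not a copy.
my $shares_payload = $parts[1] == \$payload ? 1 : 0;

# Demo sink: an in-memory handle; a real sender would use a socket or file.
my $xml = '';
open(my $out, '>', \$xml) or die "cannot open in-memory handle: $!";
my $written = write_parts($out, @parts);
close $out;

print "fragments written: $written, payload shared: $shares_payload\n";
print "output length: ", length($xml), "\n";
```

In this scheme nested elements only pass small references around, so the number of serializer layers no longer multiplies the payload size; the single unavoidable copy is the one made by the output sink.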