Windows CE is VERY slow with memory allocations, loading a 90kb xml file took around 7.5s, so I did some searching, found these 2 threads on slow loading
Which brought my XML loading down to 0.5s, I was wondering if anyone had any other tips on speeding the loading up even more?
I'm not sure if it does this yet, but TinyXML would be A LOT faster if it used local stack buffers when scanning in strings, then switching to a TiXmlString after the buffer is full or the string is complete, this way only 1 allocation is made for most of the strings.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
I tried putting local buffers in TiXmlUnknown::Parse and TiXmlAttribute::Parse but it didn't seem to help much, with only that change my parse went from 7.5s to 5.5s, but with both the assign_new_size change and the local buffer change it stayed about 0.55s, most of the allocations must be happening somewhere else.
Where do most of the TiXmlString allocations occur?
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
BTW here my local buffer code for TiXmlAttribute::Parse, TiXmlUnknown::Parse is very similiar...
const int buffsize = 1023;
char buff[buffsize+1] = "";
int at = 0;
// All attribute values should be in single or double quotes.
// But this is such a common error that the parser will try
// its best, even without them.
//value = "";
while ( p && *p // existence
&& !IsWhiteSpace( *p ) && *p != '\n' && *p != '\r' // whitespace
&& *p != '/' && *p != '>' ) // tag end
{
if(at < buffsize)
{
buff[at++] = *p;
}
else if(at == buffsize)
{
buff[buffsize] = 0;
value = buff;
}
else
value += *p;
++p;
}
No I don't, I've just been using some homebrew stuff for profiling right now. One very easy way would be to add some counters next to any place that makes a new string, or appends to a string which causes a memory allocation, then change the places with the most allocations to use local buffers. Or just add one counter next each "newstring = new char [newlen];", this would give you a pretty decent profile to optimize against.
After some more looking through the code TiXmlBase::ReadText seems to be called from a few important places, does anyone know if it is called a lot when a file is loaded?
I also am going to try some different assign_new_size settings, does anyone have any recommendations?
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
I looked at the files in CVS, and noticed that tinystr.h and tinystr.cpp were completely revamped. I am attempting to make a similar change to the new "reserve" operation as the original "assign_new_size" operation. Any thoughts?
ruanep: I don't think your operator = optimization would do anything, I highly doubt that case is called a lot. I took another look through the code and noticed a few other optimizations you could do, which are probably called a lot.
In my version of tinyxmlparser, the very first line in both TiXmlBase::ReadName and TiXmlBase::ReadText use the operator =. So for _every element_, we do an extra new+delete for the name. For _every leaf_ in the XML tree, we do yet another unnecessary new+delete for the value.
For an XML file of any significant size, that results in hundreds of extra new+deletes. The operator == might occur more frequently, but new+delete will usually be much more costly than a strcmp.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
before additional optz tinystr.cpp file 2.15s
modified tinystr.cpp file 1.98s
So not as big an improvement as I had hoped, but not too bad. My changes are condensing all string creating and appending to append(char*, int) so all allocations use the assign new size function, the == optimization, and your zero length string optimization. I will include my full file in my next post.
One intersting thing is out of 65k allocations for my file being loaded, 22k of them are zero length strings, which is why I would have thought your change would have a bigger impact.
Also all tests above include the following changes to tinystr.h
// Return the length of a TiXmlString
__forceinline unsigned length () const
{
return current_length;
}
/*
www.sourceforge.net/projects/tinyxml
Original file by Yves Berquin.
This software is provided 'as-is', without any express or implied
warranty. In no event will the authors be held liable for any
damages arising from the use of this software.
Permission is granted to anyone to use this software for any
purpose, including commercial applications, and to alter it and
redistribute it freely, subject to the following restrictions:
1. The origin of this software must not be misrepresented; you must
not claim that you wrote the original software. If you use this
software in a product, an acknowledgment in the product documentation
would be appreciated but is not required.
2. Altered source versions must be plainly marked as such, and
must not be misrepresented as being the original software.
3. This notice may not be removed or altered from any source
distribution.
*/
// TiXmlString = operator. Safe when assign own content
void TiXmlString ::operator = (const char * content)
{
// don't free our current allocated space as we can reuse it
current_length = 0;
append(content, strlen(content));
}
// = operator. Safe when assign own content
void TiXmlString ::operator = (const TiXmlString & copy)
{
// don't free our current allocated space as we can reuse it
current_length = 0;
append(copy.c_str(), copy.length());
}
// append a const char * to an existing TiXmlString
void TiXmlString::append( const char* str, int len )
{
if(!str || str[0] == '\0' || len == 0)
{
#ifdef TINYSTR_PROFILE
zeroCount[4]++;
#endif
return;
}
char * new_string;
unsigned new_alloc, new_size;
new_size = length () + len + 1;
// check if we need to expand
if (new_size > allocated)
{
// compute new size
new_alloc = assign_new_size (new_size);
// allocate new buffer
new_string = new char [new_alloc];
new_string [0] = 0;
#ifdef TINYSTR_PROFILE
profileCount[4]++;
#endif
if(!new_string)
return;
// copy the previous allocated buffer into this one
if (allocated && cstring)
// strcpy (new_string, cstring);
memcpy (new_string, cstring, length ());
// append the suffix. It does exist, otherwize we wouldn't be expanding
// strncat (new_string, str, len);
memcpy (new_string + length (),
str,
len);
// return previsously allocated buffer if any
if (allocated && cstring)
delete [] cstring;
// update member variables
cstring = new_string;
allocated = new_alloc;
}
else
{
// we know we can safely append the new string
// strncat (cstring, str, len);
memcpy (cstring + length (),
str,
len);
}
current_length = new_size - 1;
cstring [current_length] = 0;
}
// append a const char * to an existing TiXmlString
void TiXmlString::append( const char * suffix )
{
append(suffix, strlen(suffix));
}
Hmm -- I'm getting ready to release a new version of TinyXml in about 2 weeks. Does anyone have the speed improvements against the new (in CVS) version of TiXmlString? It is potentially a much faster implementation, so I'm not sure how much these changes will pan out.
thanks,
lee
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Hi Lee, just make sure the following things are in there:
1) add an early out in == like this
if(length() != compare.length())
return false;
2) the assign_new_size change as above
3) do a check for a 0 len string (this is different than NULL) as above
In the code i posted above I merged all string allocations to one function, which is much cleaner and easier to optimize, but its not thouroughly tested.
Also whenever possible in the rest of the TinyXML classes when comparing strings, try to make full use of the length() information, this can speed it up a lot, most of the time this involves simply adding an early out as in 1) above, or computing the strlen outside of a loop for a char*, then using it as an early out inside the loop. also avoid casting to a char* simply for comparisons and such, as the extra memory allocation will usually far outweigh the speed improvement.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
#2 and #3 needs to be tested against the new string class. I'd like to put in the change, but I don't have the time to check performance changes. (Need to fix some bugs!) Could you either pull the current string code from CVS and check, or wait for the next release (end of the month) and patch against that?
thanks,
lee
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Windows CE is VERY slow with memory allocations, loading a 90kb xml file took around 7.5s, so I did some searching, found these 2 threads on slow loading
http://sourceforge.net/forum/forum.php?thread_id=764135&forum_id=42748
http://sourceforge.net/forum/forum.php?thread_id=797723&forum_id=42748
Which seem to be outdated because TiXmlString now uses its own length variable (good job). So I looked in the code and changed the following
unsigned assign_new_size (unsigned minimum_to_allocate)
{
if(minimum_to_allocate < 16)
return minimum_to_allocate + 32;
else
return minimum_to_allocate * 2;
}
Which brought my XML loading down to 0.5s, I was wondering if anyone had any other tips on speeding the loading up even more?
I'm not sure if it does this yet, but TinyXML would be A LOT faster if it used local stack buffers when scanning in strings, then switching to a TiXmlString after the buffer is full or the string is complete, this way only 1 allocation is made for most of the strings.
I tried putting local buffers in TiXmlUnknown::Parse and TiXmlAttribute::Parse but it didn't seem to help much, with only that change my parse went from 7.5s to 5.5s, but with both the assign_new_size change and the local buffer change it stayed about 0.55s, most of the allocations must be happening somewhere else.
Where do most of the TiXmlString allocations occur?
BTW here my local buffer code for TiXmlAttribute::Parse, TiXmlUnknown::Parse is very similiar...
const int buffsize = 1023;
char buff[buffsize+1] = "";
int at = 0;
// All attribute values should be in single or double quotes.
// But this is such a common error that the parser will try
// its best, even without them.
//value = "";
while ( p && *p // existence
&& !IsWhiteSpace( *p ) && *p != '\n' && *p != '\r' // whitespace
&& *p != '/' && *p != '>' ) // tag end
{
if(at < buffsize)
{
buff[at++] = *p;
}
else if(at == buffsize)
{
buff[buffsize] = 0;
value = buff;
}
else
value += *p;
++p;
}
if(at < buffsize)
{
buff[buffsize] = 0;
value = buff;
}
Very cool...
Do you know of any profiling tools for CE? That way we could zoom right in to the exact areas taking all the CPU?
No I don't, I've just been using some homebrew stuff for profiling right now. One very easy way would be to add some counters next to any place that makes a new string, or appends to a string which causes a memory allocation, then change the places with the most allocations to use local buffers. Or just add one counter next each "newstring = new char [newlen];", this would give you a pretty decent profile to optimize against.
After some more looking through the code TiXmlBase::ReadText seems to be called from a few important places, does anyone know if it is called a lot when a file is loaded?
I also am going to try some different assign_new_size settings, does anyone have any recommendations?
Good change to the memory allocator. I'll try to get that in to the next version -- performance on devices comes up a bunch.
The next version won't be out until end of September, so once you have time to tune the performance a bit, send me a diff.
lee
What about this optimization?
void TiXmlString ::operator = (const char * content)
{
size_t newlen;
char * newstring;
if (! content)
{
empty_it ();
return;
}
// mm: begin
if (content[0] == '\0')
{
// empty string string passed in
empty_it();
return;
}
// mm: end
newlen = strlen (content) + 1;
newstring = new char [newlen];
// strcpy (newstring, content);
memcpy (newstring, content, newlen);
empty_it ();
allocated = newlen;
cstring = newstring;
current_length = newlen - 1;
}
I looked at the files in CVS, and noticed that tinystr.h and tinystr.cpp were completely revamped. I am attempting to make a similar change to the new "reserve" operation as the original "assign_new_size" operation. Any thoughts?
TiXmlString::reserve (size_type cap)
{
if((cap < 16)||(capacity()<16))
{
cap = 32;
}
if (cap > capacity())
{
TiXmlString tmp;
tmp.init(length(), cap);
memcpy(tmp.start(), data(), length());
swap(tmp);
}
}
ruanep: I don't think your operator = optimization would do anything, I highly doubt that case is called a lot. I took another look through the code and noticed a few other optimizations you could do, which are probably called a lot.
in == add this
if(length() != compare.length())
return false;
and in < I think you can add something like this
if(length() < compare.length())
return true;
if(length() > compare.length())
return false;
> is same with signs reversed
NOTE none of this is tested.
In my version of tinyxmlparser, the very first line in both TiXmlBase::ReadName and TiXmlBase::ReadText use the operator =. So for _every element_, we do an extra new+delete for the name. For _every leaf_ in the XML tree, we do yet another unnecessary new+delete for the value.
For an XML file of any significant size, that results in hundreds of extra new+deletes. The operator == might occur more frequently, but new+delete will usually be much more costly than a strcmp.
Wow, good catch then, I did some testing...
Dell Axim x5 300mhz
189k xml file with 2400 lines
before additional optz tinystr.cpp file 2.15s
modified tinystr.cpp file 1.98s
So not as big an improvement as I had hoped, but not too bad. My changes are condensing all string creating and appending to append(char*, int) so all allocations use the assign new size function, the == optimization, and your zero length string optimization. I will include my full file in my next post.
One intersting thing is out of 65k allocations for my file being loaded, 22k of them are zero length strings, which is why I would have thought your change would have a bigger impact.
Also all tests above include the following changes to tinystr.h
// Return the length of a TiXmlString
__forceinline unsigned length () const
{
return current_length;
}
unsigned assign_new_size (unsigned minimum_to_allocate)
{
if(minimum_to_allocate < 16)
return 16;
else
return minimum_to_allocate + 16;
}
new tinystr.cpp...
/*
www.sourceforge.net/projects/tinyxml
Original file by Yves Berquin.
This software is provided 'as-is', without any express or implied
warranty. In no event will the authors be held liable for any
damages arising from the use of this software.
Permission is granted to anyone to use this software for any
purpose, including commercial applications, and to alter it and
redistribute it freely, subject to the following restrictions:
1. The origin of this software must not be misrepresented; you must
not claim that you wrote the original software. If you use this
software in a product, an acknowledgment in the product documentation
would be appreciated but is not required.
2. Altered source versions must be plainly marked as such, and
must not be misrepresented as being the original software.
3. This notice may not be removed or altered from any source
distribution.
*/
#include "tinyxml.h"
#ifndef TIXML_USE_STL
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#include "tinystr.h"
#define TINYSTR_PROFILE
#ifdef TINYSTR_PROFILE
unsigned int profileCount[15] = {0,0,0,0,0, 0,0,0,0,0, 0,0,0,0,0};
unsigned int zeroCount[15] = {0,0,0,0,0, 0,0,0,0,0, 0,0,0,0,0};
#endif
// TiXmlString constructor, based on a C string
TiXmlString::TiXmlString (const char* instring)
{
allocated = 0;
cstring = NULL;
current_length = 0;
append(instring, strlen(instring));
}
// TiXmlString copy constructor
TiXmlString::TiXmlString (const TiXmlString& copy)
{
allocated = 0;
cstring = NULL;
current_length = 0;
append(copy.c_str(), copy.length());
}
// TiXmlString = operator. Safe when assign own content
void TiXmlString ::operator = (const char * content)
{
// don't free our current allocated space as we can reuse it
current_length = 0;
append(content, strlen(content));
}
// = operator. Safe when assign own content
void TiXmlString ::operator = (const TiXmlString & copy)
{
// don't free our current allocated space as we can reuse it
current_length = 0;
append(copy.c_str(), copy.length());
}
// append a const char * to an existing TiXmlString
void TiXmlString::append( const char* str, int len )
{
if(!str || str[0] == '\0' || len == 0)
{
#ifdef TINYSTR_PROFILE
zeroCount[4]++;
#endif
return;
}
char * new_string;
unsigned new_alloc, new_size;
new_size = length () + len + 1;
// check if we need to expand
if (new_size > allocated)
{
// compute new size
new_alloc = assign_new_size (new_size);
// allocate new buffer
new_string = new char [new_alloc];
new_string [0] = 0;
#ifdef TINYSTR_PROFILE
profileCount[4]++;
#endif
if(!new_string)
return;
// copy the previous allocated buffer into this one
if (allocated && cstring)
// strcpy (new_string, cstring);
memcpy (new_string, cstring, length ());
// append the suffix. It does exist, otherwize we wouldn't be expanding
// strncat (new_string, str, len);
memcpy (new_string + length (),
str,
len);
// return previsously allocated buffer if any
if (allocated && cstring)
delete [] cstring;
// update member variables
cstring = new_string;
allocated = new_alloc;
}
else
{
// we know we can safely append the new string
// strncat (cstring, str, len);
memcpy (cstring + length (),
str,
len);
}
current_length = new_size - 1;
cstring [current_length] = 0;
}
// append a const char * to an existing TiXmlString
void TiXmlString::append( const char * suffix )
{
append(suffix, strlen(suffix));
}
// Check for TiXmlString equuivalence
//bool TiXmlString::operator == (const TiXmlString & compare) const
//{
// return (! strcmp (c_str (), compare . c_str ()));
//}
//unsigned TiXmlString::length () const
//{
// if (allocated)
// // return strlen (cstring);
// return current_length;
// return 0;
//}
unsigned TiXmlString::find (char tofind, unsigned offset) const
{
char * lookup;
if (offset >= length ())
return (unsigned) notfound;
for (lookup = cstring + offset; * lookup; lookup++)
if (* lookup == tofind)
return lookup - cstring;
return (unsigned) notfound;
}
bool TiXmlString::operator == (const TiXmlString & compare) const
{
if ( allocated && compare.allocated )
{
assert( cstring );
assert( compare.cstring );
#ifdef TINYSTR_PROFILE
profileCount[7]++;
#endif
if(length() != compare.length())
return false;
return ( strcmp( cstring, compare.cstring ) == 0 );
}
return false;
}
bool TiXmlString::operator < (const TiXmlString & compare) const
{
if ( allocated && compare.allocated )
{
assert( cstring );
assert( compare.cstring );
return ( strcmp( cstring, compare.cstring ) > 0 );
}
return false;
}
bool TiXmlString::operator > (const TiXmlString & compare) const
{
if ( allocated && compare.allocated )
{
assert( cstring );
assert( compare.cstring );
return ( strcmp( cstring, compare.cstring ) < 0 );
}
return false;
}
#endif // TIXML_USE_STL
Also I just tried our your reserve change, out of 65k+ allocations only 3 are coming from reserve, perhaps it would help with other files though...
I think the next biggest change might come from:
1) adding local static buffers to TiXmlString so there are no allocations until the string goes over the buffer size
2) changing strings to be read into local storage buffers, then only allocate a TiXmlString once
3) using a custom memory handler on CE
Performance is good enough for me though, so I won't be pursing them yet...
Hmm -- I'm getting ready to release a new version of TinyXml in about 2 weeks. Does anyone have the speed improvements against the new (in CVS) version of TiXmlString? It is potentially a much faster implementation, so I'm not sure how much these changes will pan out.
thanks,
lee
Hi Lee, just make sure the following things are in there:
1) add an early out in == like this
if(length() != compare.length())
return false;
2) the assign_new_size change as above
3) do a check for a 0 len string (this is different than NULL) as above
In the code i posted above I merged all string allocations to one function, which is much cleaner and easier to optimize, but its not thouroughly tested.
Also whenever possible in the rest of the TinyXML classes when comparing strings, try to make full use of the length() information, this can speed it up a lot, most of the time this involves simply adding an early out as in 1) above, or computing the strlen outside of a loop for a char*, then using it as an early out inside the loop. also avoid casting to a char* simply for comparisons and such, as the extra memory allocation will usually far outweigh the speed improvement.
*correction
"also avoid casting to a char* simply for comparisons and such"
should be
also avoid casting a char* to a TiXmlString simply for comparisons and such
Hmm - #1 has been checked in to CVS.
#2 and #3 needs to be tested against the new string class. I'd like to put in the change, but I don't have the time to check performance changes. (Need to fix some bugs!) Could you either pull the current string code from CVS and check, or wait for the next release (end of the month) and patch against that?
thanks,
lee