Hi,
I was wondering why you chose to set this constant (DEFAULT_PROD_ARRAY_DIM) to 10. In the function CreateBuiltInElementGrammar, malloc is called twice, each time allocating an array of size sizeof(Production) * DEFAULT_PROD_ARRAY_DIM. The size of the Production struct is 16 bytes, so these two statements allocate 320 bytes of heap memory for each new element grammar. This severely limits its use for big files on an embedded system where there is only about 10K of free heap memory available (in my case, anyway).
Wahab
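As a quick check of the arithmetic in the post above, here is a minimal sketch in C. It assumes the figures quoted there (a 16-byte Production and two production arrays per built-in element grammar) and takes "10K" to mean 10240 bytes. The macro and function names are made up for illustration and are not part of exip; allocator overhead and the rest of the grammar structure are ignored.

```c
#include <assert.h>
#include <stddef.h>

/* Figures quoted in the post; nothing here is read from the exip sources. */
#define PRODUCTION_SIZE_BYTES 16u   /* reported sizeof(Production) */
#define ARRAYS_PER_GRAMMAR     2u   /* two mallocs per built-in element grammar */

/* Bytes reserved for production arrays when one element grammar is created. */
static size_t prod_bytes_per_grammar(size_t prod_array_dim)
{
    return ARRAYS_PER_GRAMMAR * PRODUCTION_SIZE_BYTES * prod_array_dim;
}

/* Upper bound on the number of element grammars whose production arrays
 * fit in heap_bytes (hypothetical helper, ignores all other allocations). */
static size_t max_grammars_in_heap(size_t heap_bytes, size_t prod_array_dim)
{
    assert(prod_array_dim > 0);
    return heap_bytes / prod_bytes_per_grammar(prod_array_dim);
}
```

Under these assumptions, prod_bytes_per_grammar(10) gives 320 bytes, so a 10240-byte heap holds at most 32 such grammars before anything else is allocated. With a dimension of 3 the bound rises to 106.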
Hi Wahab,
It is not based on empirical tests but more on intuition, and as you might guess, the best value is very much application specific. The reason for not choosing a smaller value is to avoid the costly realloc of the array (and hence dynamic memory fragmentation) when the number of added productions exceeds DEFAULT_PROD_ARRAY_DIM. You could safely reduce DEFAULT_PROD_ARRAY_DIM to 2 or 3, for example, especially if the EXI files are not that big or you are not prioritizing processing speed. There are two mallocs in CreateBuiltInElementGrammar because there are two grammar rules: one with StartTagContent on the left-hand side and another with ElementContent.
In any case, with 10K of RAM it is very challenging to use EXI with the default options. Probably only some trivial messages would fit in that RAM in schema-less mode. My advice is to use schema-informed mode and, if possible, eliminate the need for built-in grammars.
Also, strict=true (in schema-informed mode) and valuePartitionCapacity=0 can significantly reduce memory usage. Other optimizations are also possible, such as reducing the size of some types.
I hope that helps!
Rumen
Hi Rumen,
The default mode is a combination of schema-informed and schema-less mode, which seems to be working well at the moment. I have a question regarding the schema-informed mode. How can I use it when no schema information is available for the input files? Also, I reduced the DEFAULT_GRAMMAR_TABLE constant from 300 to 100 in the file initSchemaInstance.c. If I am dealing with a maximum file size of 4K, how much further can I reduce this value? And what other constants can I reduce?
Wahab
Hi Wahab,
There is nothing you can do if the input files are encoded without schema information. It is possible to encode only certain types more efficiently even if there is no XML schema describing the document. This is controlled by the schemaId option, but again that choice is made by the encoder; the decoder must follow the rules set by the encoder.
As for DEFAULT_GRAMMAR_TABLE, this should be fixed if you use the right build configuration for resource-constrained devices. I am not sure how you build your code, but if you use:
make TARGET=contiki all
DEFAULT_GRAMMAR_TABLE should already be set to 40. There are other config optimizations done in build/gcc/contiki/exipConfig.h
Compiler settings are in build-params.mk
Regards,
Rumen
Hi Rumen,
Can you help me understand the difference between a grammar rule and a production? I have set DEFAULT_GRAMMAR_TABLE to 100. Each time a new element is encountered, a new rule should be added to the grammar table. Have I understood that correctly? What I see, though, is that the production array of one particular rule keeps growing with each new element, while the production arrays of the other rules stay untouched. Shouldn't the new production be added to the newly created grammar rule for the new element?
Regards,
Wahab
Hi Wahab,
I was looking for some documentation that explains the exip terminology for grammar rule and production. Please have a look at this page from the exip developer documentation:
http://exip.sourceforge.net/doxygen/grammars.html
It shows how exip stores the grammar information and what grammar rule and grammar production mean according to the exip documentation and code comments.
DEFAULT_GRAMMAR_TABLE sets the capacity of the dynamic array that holds all grammars (each grammar itself contains grammar rules and productions). So yes, each time a new element is encountered in the EXI stream, a new GRAMMAR should be added to the grammar table. The new grammar, in turn, consists of grammar rules and productions.
The new element appears in some context, i.e., it is encountered while some grammar is in effect during processing. For that current grammar, the occurrence of the new element often results (depending on the type of grammar) in the addition of a new production to the current grammar rule of the current grammar.
Please note that in the computer science literature, and even in the EXI spec, the terms grammar rule and grammar production are used interchangeably. In exip, a grammar rule is used as a container for grammar productions. I know it is confusing, but this terminology was adopted early in the project and it is a bit late to change it now.
I hope this brings some clarity on the issue!
Regards,
Rumen
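To make the containment described above concrete, here is a minimal sketch in C. All type and function names are hypothetical simplifications for illustration and do not match the real exip structs or API. It only models the two effects mentioned in the reply: a new element adds a production to the current rule of the current grammar, and a new grammar for that element is appended to the grammar table.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified types; the real exip structs differ. */
enum { RULE_START_TAG_CONTENT = 0, RULE_ELEMENT_CONTENT = 1, RULES_PER_GRAMMAR = 2 };

typedef struct { int event_code; } Prod;

typedef struct {                 /* a "rule" is a container of productions */
    Prod  *prods;
    size_t count, capacity;
} Rule;

typedef struct {                 /* an element grammar holds two rules */
    Rule rules[RULES_PER_GRAMMAR];
} Grammar;

typedef struct {                 /* the table holds all grammars */
    Grammar *grammars;
    size_t   count, capacity;
} GrammarTable;

/* Appends a production to a rule, growing the array when it is full. */
static int rule_add_prod(Rule *r, Prod p)
{
    if (r->count == r->capacity) {
        size_t new_cap = r->capacity ? r->capacity * 2 : 2;
        Prod *tmp = realloc(r->prods, new_cap * sizeof *tmp);
        if (tmp == NULL) return -1;
        r->prods = tmp;
        r->capacity = new_cap;
    }
    r->prods[r->count++] = p;
    return 0;
}

/* Appends an empty grammar; returns its index, or -1 on allocation failure. */
static long table_add_grammar(GrammarTable *t)
{
    const Grammar empty = { { { NULL, 0, 0 }, { NULL, 0, 0 } } };
    if (t->count == t->capacity) {
        size_t new_cap = t->capacity ? t->capacity * 2 : 4;
        Grammar *tmp = realloc(t->grammars, new_cap * sizeof *tmp);
        if (tmp == NULL) return -1;
        t->grammars = tmp;
        t->capacity = new_cap;
    }
    t->grammars[t->count] = empty;
    return (long)t->count++;
}

/* A new element seen while grammar `current` is in effect:
 * 1) the given rule of the current grammar gains a production,
 * 2) the table gains a new grammar for the element itself.
 * Returns the index of the new grammar, or -1 on failure. */
static long on_new_element(GrammarTable *t, size_t current, int rule, int event_code)
{
    assert(current < t->count && rule >= 0 && rule < RULES_PER_GRAMMAR);
    Prod p = { event_code };
    if (rule_add_prod(&t->grammars[current].rules[rule], p) != 0) return -1;
    return table_add_grammar(t);
}

/* Releases every production array and the table itself. */
static void table_free(GrammarTable *t)
{
    for (size_t g = 0; g < t->count; g++)
        for (int r = 0; r < RULES_PER_GRAMMAR; r++)
            free(t->grammars[g].rules[r].prods);
    free(t->grammars);
    t->grammars = NULL;
    t->count = t->capacity = 0;
}
```

In this sketch, calling on_new_element three times with the same current grammar leaves that grammar's rule with three productions while the three newly created grammars stay empty, which is the pattern observed in the question. Which rule is current at any point is decided by the caller here; in a real processor it depends on the parsing state.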
Hi Rumen,
I understand the concept clearly now. So the sub-elements of an element get added to the production list of the element's grammar rule represented by GR_START_TAG_CONTENT. Here's my dilemma: some elements have up to 50 sub-elements while others have none. If I assign a large value to DEFAULT_PROD_ARRAY_DIM, I avoid fragmentation and excessive calls to EXIP_REALLOC, but I end up wasting a lot of memory. If I keep this constant very small, 1 for instance, I waste no memory but end up with a lot of memory fragmentation.
I've thought of a solution to this problem. How about making the Production array size variable? The size can then be passed as an argument to the createBuiltInGrammar() function. The argument may come from the XML file itself. For example, we can append to the name of every element the number of sub-elements it has. Here's an example:
<root__2>
  <child1__3>
    <element0__0>000</element0__0>
    <element1__0>000</element1__0>
    <element2__0>000</element2__0>
  </child1__3>
  <child2__3>
    <element0__0>000</element0__0>
    <element1__0>000</element1__0>
    <element2__0>000</element2__0>
  </child2__3>
</root__2>
Regards,
Wahab
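The naming convention proposed above could be read back with something like the following minimal sketch in C. The function prod_dim_hint and the FALLBACK_PROD_ARRAY_DIM constant are hypothetical and not part of exip. How the resulting value would be passed on to createBuiltInGrammar() is left out, since that would require changing a signature that is not shown in this thread.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical default used when an element name carries no size hint. */
#define FALLBACK_PROD_ARRAY_DIM 2u

/* Returns the sub-element count encoded as a trailing "__N" in an element
 * name, or FALLBACK_PROD_ARRAY_DIM when the name carries no valid hint.
 * Overflow of very long digit strings is not handled in this sketch. */
static size_t prod_dim_hint(const char *element_name)
{
    const char *sep = NULL;
    const char *p = element_name;
    const char *digits;
    char *end = NULL;
    unsigned long n;

    assert(element_name != NULL);

    /* Find the last "__" so that earlier underscores in the name are ignored. */
    while ((p = strstr(p, "__")) != NULL) {
        sep = p;
        p += 1;
    }
    if (sep == NULL) return FALLBACK_PROD_ARRAY_DIM;

    /* The suffix must consist of decimal digits only. */
    digits = sep + 2;
    if (*digits < '0' || *digits > '9') return FALLBACK_PROD_ARRAY_DIM;
    n = strtoul(digits, &end, 10);
    if (*end != '\0') return FALLBACK_PROD_ARRAY_DIM;

    return (size_t)n;
}
```

For the example document, prod_dim_hint("root__2") yields 2, prod_dim_hint("child1__3") yields 3 and prod_dim_hint("element0__0") yields 0, so a leaf element would not need a production array up front. A name without a suffix falls back to the default.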
Hi Wahab,
I think making the Production array size variable is a very good idea. Indeed, in most cases there is just one rule in a grammar that has a lot of productions, so it would be good to be able to set different sizes for different Production arrays in a grammar. This should not be a big change from what is happening now. I just have one concern: in most cases, I guess, you would need to know in advance both the size and exactly which Production array needs to be large. The example you provide is a perfect fit for such a solution, but in real-world scenarios I think you would not get that information in advance.
Some heuristic approaches are also possible: start small (2-3) and increase the size of the dynamic array exponentially rather than linearly as it is now.
Certainly there is room for improvement in this area.
Best regards,
Rumen
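The difference between the two growth heuristics mentioned above can be illustrated with a minimal sketch in C. It is not exip code: Production is a 16-byte stand-in, the type and function names are hypothetical, and the linear policy is assumed to grow by the initial capacity each time, which may differ from what exip actually does.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct { unsigned char bytes[16]; } Production;  /* 16-byte stand-in */

typedef enum { GROW_LINEAR, GROW_DOUBLE } GrowthPolicy;

typedef struct {
    Production *items;
    size_t count, capacity;
    size_t step;       /* increment used by GROW_LINEAR (assumed: initial size) */
    size_t reallocs;   /* number of times the array had to grow */
} ProdArray;

static int prod_array_init(ProdArray *a, size_t initial)
{
    assert(initial > 0);
    a->items = malloc(initial * sizeof *a->items);
    if (a->items == NULL) return -1;
    a->count = 0;
    a->capacity = initial;
    a->step = initial;
    a->reallocs = 0;
    return 0;
}

/* Appends one production, growing the array first if it is full. */
static int prod_array_push(ProdArray *a, Production p, GrowthPolicy policy)
{
    if (a->count == a->capacity) {
        size_t new_cap = (policy == GROW_DOUBLE) ? a->capacity * 2
                                                 : a->capacity + a->step;
        Production *tmp = realloc(a->items, new_cap * sizeof *tmp);
        if (tmp == NULL) return -1;
        a->items = tmp;
        a->capacity = new_cap;
        a->reallocs++;
    }
    a->items[a->count++] = p;
    return 0;
}

/* Appends n productions and returns how many reallocations were needed,
 * or (size_t)-1 on allocation failure. Optionally reports final capacity. */
static size_t reallocs_needed(size_t initial, size_t n, GrowthPolicy policy,
                              size_t *final_capacity)
{
    ProdArray a;
    Production p = { { 0 } };
    size_t i, result;

    if (prod_array_init(&a, initial) != 0) return (size_t)-1;
    for (i = 0; i < n; i++) {
        if (prod_array_push(&a, p, policy) != 0) {
            free(a.items);
            return (size_t)-1;
        }
    }
    result = a.reallocs;
    if (final_capacity != NULL) *final_capacity = a.capacity;
    free(a.items);
    return result;
}
```

For an element with 50 sub-elements and an initial capacity of 2, the linear policy in this sketch reallocates 24 times and ends with exactly 50 slots, while doubling reallocates 5 times and ends with 64 slots, leaving 14 slots (224 bytes at 16 bytes each) unused. That is the trade-off between fragmentation and wasted memory discussed in this thread.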