Thread: Re: [Parser-devel] [token.c] A new functionality
From: Noam P. <npo...@uw...> - 2006-10-03 03:33:46
Jinesh K J wrote:
> Hi Noam,
>
> As I've mentioned in the previous mail, I'm working on integrating
> gettoken() and puttoken() into the trunk. Meanwhile, I would like to
> have one more function, as given below:
>
>     char *groupname(char *input)
>
> Here, input would be a string which contains a group name within
> '[]'. Examples of input are given below:
>
>     @[group1]@  -> @group1@
>     @[group 1]@ -> @group 1@
>
> Please note that '[' and ']' are included in the input (which is why
> I've used '@' as a cover). That is, whatever comes between the square
> brackets is to be extracted as a single token. But even inside the
> square brackets, quoted values must be possible. For example:
>
>     @["group"]@          -> @group@
>     @["group 1"]@        -> @group 1@
>     @["group [H]"]@      -> @group [H]@
>     @[group "[H]"]@      -> @group [H]@
>     @[group "[H\\T]"]@   -> @group [H\T]@
>     ...
>
> All the rules for quoted values we used for gettoken() are valid here
> also. So I wish we could use gettoken() here too.

Ok, I uploaded my first attempt, but I'm wondering whether the input
always has '[' and ']'? Can there be stuff outside the brackets? E.g.

    @[group 1] hello@ -> @group 1@

> Also, in token.c, the return values of all the memory allocation
> functions are unchecked. For example, in puttoken(), str is not
> checked for NULL after calling calloc():
>
>     char *str = calloc(2+1, sizeof(char));
>     str[0] = str[1] = '"';
>     str[2] = '\0';
>     return str;
>
> I wish we could just do an exit(-1) whenever such an error occurs.

I put that in, but isn't there any better way to signal errors?

Noam

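The bracket and quoting rules requested above can be sketched as follows. This is an illustration only, not the code that was uploaded: the name getgroup_sketch is hypothetical, and it assumes that both quote characters are recognised and that a backslash inside quotes makes the next character literal, which is how the `[H\\T]` example reads. Text after the closing ']' is ignored and a missing '[' yields NULL.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch, not the project's getgroup(). Returns a malloc'ed
 * copy of the text between '[' and the first unquoted ']', with quotes
 * removed. Returns NULL if the input does not start with '[', if there is
 * no closing ']', or if allocation fails. */
char *getgroup_sketch(const char *input)
{
    assert(input != NULL);
    if (input[0] != '[')
        return NULL;

    /* The result is always shorter than the input, so this is enough. */
    char *out = malloc(strlen(input));
    if (out == NULL)
        return NULL;

    size_t n = 0;
    char quote = '\0';              /* active quote character, if any */
    for (const char *p = input + 1; *p != '\0'; p++) {
        if (quote != '\0') {
            if (*p == '\\' && p[1] != '\0')
                out[n++] = *++p;    /* assumed rule: escaped char is literal */
            else if (*p == quote)
                quote = '\0';       /* closing quote */
            else
                out[n++] = *p;
        } else if (*p == '"' || *p == '\'') {
            quote = *p;             /* opening quote */
        } else if (*p == ']') {
            out[n] = '\0';
            return out;             /* anything after ']' is ignored */
        } else {
            out[n++] = *p;
        }
    }

    free(out);
    return NULL;                    /* ran off the end without a ']' */
}
```

Because the closing bracket is only honoured outside quotes, `["group [H]"]` yields `group [H]`, matching the examples in the request.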
From: Jinesh K J <jin...@gm...> - 2006-10-04 02:17:05
> Ok, I uploaded my first attempt, but I'm wondering whether the input
> always has '[' and ']'? Can there be stuff outside the brackets? E.g.
> @[group 1] hello@ -> @group 1@

If anything comes outside the brackets, let us ignore it. What do you
say? Could there be a better way?

Also, note that I've updated the test cases. I hope the assertion below
can be avoided; I've updated the test cases to cover those cases.

    char * getgroup(const char *string)
    {
        assert(string[0] == '[');

> > I wish we could just do an exit(-1) whenever such an error occurs.
>
> I put that in, but isn't there any better way to signal errors?

Yes! This policy must be changed. But let's keep it aside for now, since
memory allocation is very unlikely to fail on Linux. We can clean this
up as soon as we get our functionality right.

Regards,
Jinesh.

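One alternative to exit(-1) that needs no new machinery is to report the failure through the return value and let the caller decide. A minimal sketch based on the calloc() fragment quoted earlier in the thread; the function name and the 0 / -1 convention are assumptions made for illustration, not something agreed in the thread.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: builds the two-character token "" (an empty quoted
 * value). Returns 0 and stores the malloc'ed string in *out on success,
 * or returns -1 and leaves *out untouched if calloc() fails. */
int make_empty_quoted(char **out)
{
    assert(out != NULL);            /* a NULL out pointer is a caller bug */

    char *str = calloc(2 + 1, sizeof(char));
    if (str == NULL)
        return -1;                  /* report the failure, do not exit */

    str[0] = str[1] = '"';
    str[2] = '\0';
    *out = str;
    return 0;
}
```

A caller would write `if (make_empty_quoted(&tok) != 0) { /* handle failure */ }`, which keeps the decision to abort, retry or skip the entry in the application rather than in the library.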
From: Noam P. <npo...@uw...> - 2006-10-04 05:07:50
Jinesh K J wrote:
>> Ok, I uploaded my first attempt, but I'm wondering whether the input
>> always has '[' and ']'? Can there be stuff outside the brackets? E.g.
>> @[group 1] hello@ -> @group 1@
> If anything comes outside the brackets, let us ignore it. What do you
> say? Could there be a better way?

Well, that is consistent with gettoken's behaviour, although if I were
writing a config file and left it invalid I would at least want to be
warned about it...

> Also, note that I've updated the test cases.

Right, I've updated getgroup to pass them.

Noam

From: Jinesh K J <jin...@gm...> - 2006-10-04 09:32:29
On 10/4/06, Noam Postavsky <npo...@uw...> wrote:
> Jinesh K J wrote:
> >> Ok, I uploaded my first attempt, but I'm wondering whether the input
> >> always has '[' and ']'? Can there be stuff outside the brackets? E.g.
> >> @[group 1] hello@ -> @group 1@
> > If anything comes outside the brackets, let us ignore it. What do you
> > say? Could there be a better way?
> >
> Well, that is consistent with gettoken's behaviour, although if I were
> writing a config file and left it invalid I would at least want to be
> warned about it...

Yes, it's true. But we are writing a library; we cannot print any error
or warning message ourselves. The best way, I think, is to set some
global status variable. Even a callback method wouldn't be that harmful,
I believe.

> > Also, note that I've updated the test cases.
> Right, I've updated getgroup to pass them.

I'm getting a fatal error. Can you go through it? It's given below:

[jinesh@jinesh np]$ ./a.out
Testing gettoken()
VERIFIED : < >
VERIFIED : <>
VERIFIED : <Hello>
VERIFIED : < Hello >
VERIFIED : < Hello World >
VERIFIED : < "Hello" World>
VERIFIED : < "Hello World" >
VERIFIED : < " Hello World " >
VERIFIED : < "Hello \" World " >
VERIFIED : < 'Hello' World>
VERIFIED : < 'Hello World' >
VERIFIED : < ' Hello World ' >
VERIFIED : < 'Hello \' World ' >
VERIFIED : < " Hello ' World " >
VERIFIED : < ' Hello " World ' >
VERIFIED : < " Hello \\ World " >
VERIFIED : < ' Hello \\ World ' >
VERIFIED : < " Hello World "Pista >
VERIFIED : < " Hello World ""Pista" >
VERIFIED : < " Hello World " "Pista" >
VERIFIED : < Pista" Hello World " >
VERIFIED : < " " >
VERIFIED : < "" >
VERIFIED : < " \z " >
VERIFIED : < \ >
VERIFIED : < " \ " >
VERIFIED : <hello\ world>
VERIFIED : <hello\" world">
VERIFIED : <"hello world >
Testing puttoken()
VERIFIED : <Hello>
VERIFIED : < Hello >
VERIFIED : <Hello World>
VERIFIED : < Hello World >
VERIFIED : <Hello " World >
VERIFIED : <Hello ' World >
VERIFIED : < Hello \ World >
VERIFIED : < >
VERIFIED : <>
VERIFIED : < z >
VERIFIED : <\>
VERIFIED : < \ >
VERIFIED : <hello\>
VERIFIED : <hello">
Testing getgroup()
VERIFIED : < [] >
*** glibc detected *** ./a.out: munmap_chunk(): invalid pointer: 0x0804933e ***
======= Backtrace: =========
/lib/libc.so.6(__libc_free+0x179)[0x4cb744f0]
./a.out[0x80490c9]
./a.out[0x8049226]
/lib/libc.so.6(__libc_start_main+0xdc)[0x4cb22724]
./a.out[0x80485f1]
======= Memory map: ========
08048000-0804a000 r-xp 00000000 03:02 1481 /home/jinesh/work/parser/branches/np/a.out
0804a000-0804b000 rw-p 00001000 03:02 1481 /home/jinesh/work/parser/branches/np/a.out
0804b000-0806c000 rw-p 0804b000 00:00 0 [heap]
4509c000-450a7000 r-xp 00000000 03:03 1240382 /lib/libgcc_s-4.1.1-20060525.so.1
450a7000-450a8000 rw-p 0000a000 03:03 1240382 /lib/libgcc_s-4.1.1-20060525.so.1
4caf0000-4cb09000 r-xp 00000000 03:03 1240355 /lib/ld-2.4.so
4cb09000-4cb0a000 r--p 00018000 03:03 1240355 /lib/ld-2.4.so
4cb0a000-4cb0b000 rw-p 00019000 03:03 1240355 /lib/ld-2.4.so
4cb0d000-4cc3a000 r-xp 00000000 03:03 1240359 /lib/libc-2.4.so
4cc3a000-4cc3c000 r--p 0012d000 03:03 1240359 /lib/libc-2.4.so
4cc3c000-4cc3d000 rw-p 0012f000 03:03 1240359 /lib/libc-2.4.so
4cc3d000-4cc40000 rw-p 4cc3d000 00:00 0
b7f09000-b7f0a000 rw-p b7f09000 00:00 0
b7f28000-b7f2a000 rw-p b7f28000 00:00 0
b7f2a000-b7f2b000 r-xp b7f2a000 00:00 0 [vdso]
bffc1000-bffd7000 rw-p bffc1000 00:00 0 [stack]
Aborted

--------
Regards,
Jinesh.

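The two reporting options mentioned above, a global status variable and a callback, could look roughly like this in C. Every name below is invented for illustration; the thread does not define any such API.

```c
#include <assert.h>
#include <stddef.h>

/* All names here are hypothetical; nothing below is the project's API. */
typedef enum {
    PARSER_OK = 0,
    PARSER_TRAILING_TEXT,       /* e.g. "[group 1] hello" */
    PARSER_NO_MEMORY
} parser_status;

typedef void (*parser_warn_fn)(parser_status code, const char *where);

static parser_status last_status = PARSER_OK;
static parser_warn_fn warn_callback = NULL;

/* Option 1: the application polls a status value after each call. */
parser_status parser_last_status(void)
{
    return last_status;
}

/* Option 2: the application registers a function to be told at once. */
void parser_set_warning_callback(parser_warn_fn fn)
{
    warn_callback = fn;
}

/* Called from inside the library instead of printing a message. */
void parser_report(parser_status code, const char *where)
{
    assert(where != NULL);
    last_status = code;
    if (warn_callback != NULL)
        warn_callback(code, where);
}

/* Example application callback: it only counts warnings, but it could
 * just as well log them or abort. */
static int warning_count = 0;

static void count_warning(parser_status code, const char *where)
{
    (void)code;
    (void)where;
    warning_count++;
}
```

The library never prints anything itself; the application decides what a warning means. Note that a single global status value is not thread-safe, which is one reason the callback variant may be preferable.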
From: Noam P. <npo...@uw...> - 2006-10-04 21:54:40
Jinesh K J wrote:
> Yes, it's true. But we are writing a library; we cannot print any error
> or warning message ourselves. The best way, I think, is to set some
> global status variable. Even a callback method wouldn't be that
> harmful, I believe.

Hmm, now I can appreciate the usefulness of exceptions...

>> > Also, note that I've updated the test cases.
>> Right, I've updated getgroup to pass them.
>
> I'm getting a fatal error. Can you go through it? It's given below:

OK, fixed it: I was returning a non-malloc'd string. The error didn't
show up unless I set the environment variable MALLOC_CHECK_, so I didn't
notice.

Noam

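The fix is about ownership: the test program passes every returned string to free(), so each return path has to hand back heap memory. The pointer in the glibc report (0x0804933e) lies inside the read-only a.out mapping (08048000-0804a000) in the memory map, which is consistent with a string literal having been returned. A minimal sketch of the safe pattern, using a hypothetical helper name:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: always returns heap memory the caller may free().
 *
 * The buggy pattern is a shortcut such as
 *     if (nothing_to_copy) return "";
 * which hands out a pointer to static storage; calling free() on that
 * pointer is undefined behaviour. */
char *copy_result(const char *text)
{
    assert(text != NULL);

    size_t size = strlen(text) + 1;     /* include the terminating '\0' */
    char *copy = malloc(size);
    if (copy != NULL)
        memcpy(copy, text, size);
    return copy;                        /* NULL only if malloc() failed */
}
```

With this helper even the empty result is written as `return copy_result("");`, so the caller's free() is valid on every path.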
From: Jinesh K J <jin...@gm...> - 2006-10-05 07:30:43
> Hmm, now I can appreciate the usefulness of exceptions...

I really miss that one here.

> OK, fixed it: I was returning a non-malloc'd string. The error didn't
> show up unless I set the environment variable MALLOC_CHECK_, so I didn't
> notice.

It seems to be working now. There is some more work to be done, as
follows:

=====1=====

First, let us modify gettoken(); its prototype needs a slight change:

    char * gettoken(const char *str, const char **end);

When gettoken() returns, *end needs to hold the address of the character
immediately after the current token in str. For example, for an input of
[Hello World], gettoken() would return the malloc'ed [Hello], whereas
*end should contain the address of the ' ' (space) character after
'Hello'. If the end of the line has been reached, *end should hold the
address of the terminating '\0'.

Note: *end needs to be assigned only if we have found a valid token.
That is, if gettoken() returns NULL, *end need not be modified.

======2=======

Second, we need a new function to dissect a comment from a line:

    char *cutcomment(char *str);

The function searches a string for a comment (starting with '#'). If
one is found, these are the steps to be performed:

1. Replace the '#' with a '\0' in the original string (str).
2. Return a malloc'ed string that contains whatever follows the '#'.

If no comment is found, the function needs to return NULL.

PS: The original string is modified in this case if a comment is found.

=====end=====

Try these and let me know.

Jinesh.

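The *end contract from item 1 can be shown with a deliberately simplified tokenizer. This sketch splits on blanks only and ignores the quoting rules that the real gettoken() implements; the name next_word is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for gettoken(): tokens are separated
 * by spaces or tabs and quotes are NOT interpreted. On success *end is
 * set to the character right after the token; on NULL it is untouched. */
char *next_word(const char *str, const char **end)
{
    assert(str != NULL && end != NULL);

    while (*str == ' ' || *str == '\t')
        str++;                          /* skip leading blanks */
    if (*str == '\0')
        return NULL;                    /* no token: *end is not modified */

    const char *start = str;
    while (*str != '\0' && *str != ' ' && *str != '\t')
        str++;

    size_t len = (size_t)(str - start);
    char *tok = malloc(len + 1);
    if (tok == NULL)
        return NULL;
    memcpy(tok, start, len);
    tok[len] = '\0';

    *end = str;                         /* a blank or the final '\0' */
    return tok;
}
```

A caller can then walk a whole line with `while ((tok = next_word(p, &p)) != NULL) { ...; free(tok); }`, since each call resumes where the previous token ended.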
From: Jinesh K J <jin...@gm...> - 2006-10-06 05:00:18
On 10/6/06, Noam Postavsky <npo...@uw...> wrote:
> Jinesh K J wrote:
> > There is some more work to be done, as follows:
> >
> > =====1=====
> >
> > First, let us modify gettoken(); its prototype needs a slight change:
> >
> > char * gettoken(const char *str, const char **end);
> >
> > ======2=======
> >
> > Second, we need a new function to dissect a comment from a line:
> >
> > char *cutcomment(char *str);
> >
> > =====end=====
> >
> > Try these and let me know.
> >
> > Jinesh.
>
> Done and Done.

Both functions are working fine. While gettoken() is perfect, there's a
flaw in cutcomment(). I think I didn't take enough time to explain what
'str' can contain. 'str' contains a line from the configuration file.
For example:

    server=192.168.0.1 port=2183 # This is our server

A line can also contain quoted values like:

    server="192.168.0.1" port="2183" # This is our server

In any case, the '#' inside the quote should be counted as the start
of the comment. I've added a few test cases for that too. All the
rules for quotes that we've followed till now apply here also. I
hope things are clear now. What do you say?

> Noam
>

Jinesh.

From: Noam P. <npo...@uw...> - 2006-10-06 12:28:13
Jinesh K J wrote:
> Both functions are working fine. While gettoken() is perfect, there's a
> flaw in cutcomment(). I think I didn't take enough time to explain what
> 'str' can contain. 'str' contains a line from the configuration file.
> For example:
>
>     server=192.168.0.1 port=2183 # This is our server
>
> A line can also contain quoted values like:
>
>     server="192.168.0.1" port="2183" # This is our server
>
> In any case, the '#' inside the quote should be counted as the start

OK, I think you mean should NOT be counted; that's what the test cases
seem to indicate.

> of the comment. I've added a few test cases for that too. All the
> rules for quotes that we've followed till now apply here also. I
> hope things are clear now. What do you say?
>
> Jinesh.

I've changed the function to pass the test cases. I think this is what
you meant; if not, a more thorough explanation is needed.

Noam

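A quote-aware comment cutter along the lines agreed here might look like the sketch below. It is not the function from the repository: the name is hypothetical, and it assumes that a backslash makes the following character literal both inside and outside quotes, which the thread does not spell out.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch of cutcomment(): finds the first '#' that is not
 * inside quotes, truncates str there and returns a malloc'ed copy of the
 * text after it. Returns NULL when there is no comment. If allocation
 * fails it also returns NULL and leaves str unmodified. */
char *cutcomment_sketch(char *str)
{
    assert(str != NULL);

    char quote = '\0';                  /* active quote character, if any */
    for (char *p = str; *p != '\0'; p++) {
        if (*p == '\\' && p[1] != '\0') {
            p++;                        /* assumed rule: skip escaped char */
        } else if (quote != '\0') {
            if (*p == quote)
                quote = '\0';           /* closing quote */
        } else if (*p == '"' || *p == '\'') {
            quote = *p;                 /* opening quote */
        } else if (*p == '#') {
            size_t size = strlen(p + 1) + 1;
            char *comment = malloc(size);
            if (comment == NULL)
                return NULL;
            memcpy(comment, p + 1, size);
            *p = '\0';                  /* cut the comment off the line */
            return comment;
        }
    }
    return NULL;                        /* no unquoted '#' found */
}
```

Because str is modified in place, the argument has to be a writable buffer (for example a `char line[]` array), never a string literal.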
From: Jinesh K J <jin...@gm...> - 2006-10-07 12:47:31
On 10/6/06, Noam Postavsky <npo...@uw...> wrote:
> OK, I think you mean should NOT be counted; that's what the test cases
> seem to indicate.

Yup! What I meant was 'NOT BE COUNTED'. Now our cutcomment() is working
perfectly.

> I've changed the function to pass the test cases. I think this is what
> you meant; if not, a more thorough explanation is needed.

I think we have got it cleared up. Good work!

So, one more functional change (I very much hope so) is all that we
need to merge these changes into the main tree. A behavioural change
to gettoken() is needed. I don't really know how hard it would be for
you to make this change. It is as follows:

======begin========

char *gettoken(char *str, char **kv, char **end);

str  - input string
*end - the next character after our extracted token
*kv  - the value part of our token (if it's a key/value pair) - malloc'ed

Returns -> the key/value part - malloc'ed

======end==========

Explanation:

Our str can contain tokens of two types:

- A normal token, which we can call a 'value'
- A key/value pair, which has a 'key' part and a 'value' part that are
  joined by an '='

Till now we've been worrying only about a few words in a line. The
words could also be grouped by using quotes. Those rules are to remain
the same for our keys and values as well. The only change to be made to
gettoken() is to make it recognise a key/value pair and return its
parts.

Below are examples giving str, the function's return value, and *kv
respectively. The function of *end remains the same.

=1=
    [Hello World]    [Hello]    NULL

    Since Hello and World do not form a key/value pair, *kv should be
    NULL.

=2=
    [Hello=World]    [Hello]    [World]

    This is a very simple and clear example of a key/value pair.

=3=
    [Hello = World]    [Hello]    [World]

    There can be spaces between a key, a value and the '='.

=4=
    ["Hello" = "World"]    [Hello]    [World]

    Keys and values can also be quoted tokens.

=5=
    ["Hello=World" = "Parser=Library"]    [Hello=World]    [Parser=Library]

    The rules for quotes remain the same: they are opaque objects.

=6=
    [Hello === World]      [Hello]    [World]
    [Hello = = = World]    [Hello]    [World]

    More than one '=' can be joined together.

=7=
    [Hello = World = Parser = Library]    [Hello]    [World Parser Library]

    Cascading of '=' can cause the values to be joined. Can you think of
    a better behaviour for this case? Should we have it this way:

    [Hello = World = Parser = Library]    [Hello World Parser]    [Library]

=8=
    [Hello = ]    [Hello]    []

I think by now our requirements are almost clear and complete. What is
your opinion? Any suggestions?

Jinesh.

From: Noam P. <npo...@uw...> - 2006-10-10 03:56:50
Jinesh K J wrote:
> So, one more functional change (I very much hope so) is all that we
> need to merge these changes into the main tree. A behavioural change
> to gettoken() is needed. I don't really know how hard it would be for
> you to make this change. It is as follows:
>
> ======begin========
>
> char *gettoken(char *str, char **kv, char **end);
>
> str  - input string
> *end - the next character after our extracted token
> *kv  - the value part of our token (if it's a key/value pair) - malloc'ed
>
> Returns -> the key/value part - malloc'ed
>
> ======end==========

> I think by now our requirements are almost clear and complete. What is
> your opinion? Any suggestions?
>
>
> Jinesh.

I feel like this is too much functionality to be putting into gettoken().
It's true I don't really have much of the big picture, i.e. how these
functions are going to be used in the library, but I think that
splitting tokens should be a separate task from interpreting those
tokens.

That's what it looks like to me, anyway.

Noam

From: Jinesh K J <jin...@gm...> - 2006-10-10 04:37:36
>
> I feel like this is too much functionality to be putting into gettoken().
> It's true I don't really have much of the big picture, i.e. how these
> functions are going to be used in the library, but I think that
> splitting tokens should be a separate task from interpreting those
> tokens.

This is how I thought these functions would work in the library:

1. All the lines in the file will be converted to an array of strings.
2. cutcomment() operates on a line and extracts/removes the comment
   part.
3. getgroup() checks for and extracts the group name if the line
   represents a group declaration.
4. Lastly, gettoken() will extract each token from the line (if step 3
   fails) until the end of the line is reached.

All the information obtained is stored in the internal structure called
'entry'. The ability of gettoken() to identify data items connected by
'=' as a key/value pair would help very much in reducing the complexity
of the function transform() in transform.c. The additional test for
checking the key/value pair is better not placed in transform(). So it
needs to be done in some other function: if not gettoken(), then
something else. I thought that since gettoken() has done so much for us,
maybe we can get it to do this one as well.

>
> That's what it looks like to me, anyway.

I think we can proceed in either of two ways:

1. Modify gettoken() as I had mentioned previously to get the
   key/value pairs too.
2. Write another function getkeyval() that utilises gettoken() to do
   our job. The prototype I suggested remains the same.

Do you think there is any other option?

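Option 2, a getkeyval() layered on a separate tokenizer, could be sketched as below. Both function names are hypothetical, the prototype uses const pointers where the proposed one does not, and the tokenizer is a stand-in that additionally treats an unquoted '=' as a delimiter. The sketch covers examples =1= to =6= and =8= from the earlier message and leaves the unresolved cascading case =7= alone: it stops after the first value.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

static const char *skip_blanks(const char *p)
{
    while (*p == ' ' || *p == '\t')
        p++;
    return p;
}

/* Hypothetical stand-in for gettoken(): copies one token, stopping at an
 * unquoted blank or '='. Quotes are stripped; a backslash inside quotes
 * makes the next character literal (assumed rule). On NULL, *end is
 * left untouched. */
static char *token_sketch(const char *str, const char **end)
{
    const char *p = skip_blanks(str);
    if (*p == '\0' || *p == '=')
        return NULL;

    char *out = malloc(strlen(p) + 1);
    if (out == NULL)
        return NULL;

    size_t n = 0;
    char quote = '\0';
    for (; *p != '\0'; p++) {
        if (quote != '\0') {
            if (*p == '\\' && p[1] != '\0')
                out[n++] = *++p;
            else if (*p == quote)
                quote = '\0';
            else
                out[n++] = *p;
        } else if (*p == '"' || *p == '\'') {
            quote = *p;
        } else if (*p == ' ' || *p == '\t' || *p == '=') {
            break;
        } else {
            out[n++] = *p;
        }
    }
    out[n] = '\0';
    *end = p;
    return out;
}

/* Hypothetical getkeyval(): returns the key (or the plain value) and
 * stores the value part in *kv, or NULL in *kv when no '=' follows. */
char *getkeyval_sketch(const char *str, char **kv, const char **end)
{
    assert(str != NULL && kv != NULL && end != NULL);

    const char *p;
    char *key = token_sketch(str, &p);
    if (key == NULL)
        return NULL;

    *kv = NULL;
    const char *q = skip_blanks(p);
    if (*q == '=') {
        while (*q == '=' || *q == ' ' || *q == '\t')
            q++;                        /* examples =3= and =6= */
        *kv = token_sketch(q, &p);
        if (*kv == NULL) {
            *kv = calloc(1, 1);         /* example =8=: empty value */
            p = q;
        }
    }
    *end = p;
    return key;
}
```

Keeping the '=' handling in getkeyval_sketch() leaves the tokenizer unaware of key/value semantics, which is the separation Noam asks for, while transform() still receives key and value from a single call.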
From: Jinesh K J <jin...@gm...> - 2006-10-11 05:19:12
On 10/10/06, Noam Postavsky <npo...@uw...> wrote:
> Jinesh K J wrote:
> > I think we can proceed in either of two ways:
> >
> > 1. Modify gettoken() as I had mentioned previously to get the
> >    key/value pairs too.
> > 2. Write another function getkeyval() that utilises gettoken() to do
> >    our job. The prototype I suggested remains the same.
>
> Okay, I've done that.

We're almost done. I have made a few changes to the test cases. Please
go through them.

Jinesh.