My gripes about performance continue...
I have given myself the gift of a dual Opteron box with 2 GB of RAM, and JCF still performs like a Pentium 1...
I have profiled JCF beta 12. The results for the top 15 are below.
The one thing that spends the most time is the ExtractFirst() routine. It really must be looked into. All that code is doing is taking the first node from a long TObjectList and then deleting that first node. This is an incredibly wasteful approach. You should either use a heap or first sort that TObjectList so that the original first item is at the bottom. That way you can avoid continuously shifting the contents of TObjectList's internal array.
Then there are things that needn't be function calls at all, such as ChildNodeCount(), HasParentNode(), etc. These should be simple properties.
One other thing I noticed is that you use TObjectLists as internal lists for simple classes derived from TObject. I think those classes would be best derived directly from TObjectList to avoid unnecessary second-level calls.
Anyway, here are the results, in percentage terms.
% Time
TSourceTokenList.ExtractFirst 62.71
TParseTreeNode.VisitTree 7.65
TParseTreeNode.ChildNodeCount 3.37
TParseTreeNode.HasParentNode 2.43
TParseTreeNode.GetChildNodes 2.40
TParseTreeNode.IndexOfChild 1.84
TBuildTokenList.TryWord 1.33
TParseTreeNode.SolidChildCount 1.07
TSourceToken.AcceptVisitor 1.04
TParseTreeNode.AcceptVisitor 0.99
TParseTreeNode.Destroy 0.96
TNestingLevelList.Destroy 0.96
ClearVisitResult 0.88
TNestingLevelList.Create 0.83
% Hit Count
TParseTreeNode.ChildNodeCount 22.24
TParseTreeNode.HasParentNode 11.28
TParseTreeNode.GetChildNodes 10.04
TParseTreeNode.IndexOfChild 6.85
TParseTreeNode.VisitTree 6.54
ClearVisitResult 6.54
TParseTreeNode.AcceptVisitor 3.52
TSourceToken.AcceptVisitor 3.02
TCodeReader.Current 2.85
TSourceToken.IsSolid 2.78
TParseTreeNode.SolidChildCount 1.71
TSwitchableVisitor.CheckEnabled 1.32
TSwitchableVisitor.VisitSourceToken 1.32
TSourceToken.SolidChildCount 1.31
Interesting. The odd thing is that ExtractFirst is a two-liner:
function TSourceTokenList.ExtractFirst: TSourceToken;
begin
  Result := SourceTokens[0];
  fcList.Extract(Result);
end;
And it's only called from two places in TBuildParseTree, when turning a token list into a parse tree.
Perhaps the overhead is from removing the first item from the list, and moving the rest up each time. Could be quicker to keep a current index and just increment it each time one is "extracted", then free the list all at once when the end is reached?
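A minimal sketch of that current-index idea, assuming a token list derived from TObjectList (uses Contnrs) and JCF's TSourceToken class; the names here are illustrative, and the list would need OwnsObjects = False since the extracted tokens live on in the parse tree:

type
  { hypothetical cursor-based token list: tokens are handed out but never removed }
  TSourceTokenCursorList = class(TObjectList)
  private
    fiCurrentIndex: integer; // index of the next token to hand out
  public
    function ExtractFirst: TSourceToken; // same name as before, but O(1) now
    function IsExhausted: boolean;
  end;

function TSourceTokenCursorList.ExtractFirst: TSourceToken;
begin
  // hand out the token at the cursor and advance; nothing is deleted,
  // so the remaining items are never shifted down
  Result := TSourceToken(Items[fiCurrentIndex]);
  Inc(fiCurrentIndex);
end;

function TSourceTokenCursorList.IsExhausted: boolean;
begin
  Result := fiCurrentIndex >= Count;
end;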
Yes, that 2-liner is taking the lion's share. It would be most wise --IMHO-- to get rid of *all* that TObjectList.Extract() crap in whatever way possible.
Similarly, TObjectList.Delete() and TObjectList.Remove() should be avoided too.
Actually, I was so appalled by the way TObjectList was performing that I attempted to replace it with a TDoubleLinkedObjectList, which does not suffer the same delete penalty. But you use a lot of indexed accesses too. In the end, I would have had to either get rid of the indexed accesses or do a full iteration every time one was needed. I gave up.
I think you need to look at the whole thing again and choose either indexed access (in which case you don't really need a TObjectList; a straightforward array will do) or a linked list of some sort...
My 2 cents or whatever...
I have tried an iterator instead of an extract for that and related procedures - no big difference in run time over the last beta.
OK, I tried a few ways. Finally, I have got JCF to go around 10-12 times faster. Now the performance is nearer to bearable.
I have derived TSourceTokenList from TObjectList and then made it into some sort of stack + list.
I have also slightly modified TNestingLevelList (it too descends directly from TObjectList). It helps things go a little faster.
There are a few others to take care of. Now the worst offenders are:
TParseTreeNode.VisitTree() --> 21 % (Time)
TParseTreeNode.IndexOfChild() --> 14 % (Time)
Hit Count ranking:
TParseTreeNode.ChildNodeCount()
TParseTreeNode.GetChildNodes()
TParseTreeNode.HasParentNode()
TParseTreeNode.IndexOfChild()
As you see, all the interesting stuff (performance-wise) is happening in TParseTreeNode.
Before anything else, it needs to be optimized.
And... I don't see why this should be a function call. All it does is set a few defaults for lrVisitResult.
ClearVisitResult(lrVisitResult);
It should be inlined.
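For what it is worth, a minimal sketch of inlining it by hand; the field names (Action, NewItem, NewItem2) are taken from the VisitTree code quoted later in this thread, so treat the exact record layout as an assumption:

// instead of the call ...
//   ClearVisitResult(lrVisitResult);
// ... assign the defaults directly at the call site
// (there is no inline directive in Delphi 7, so manual inlining is the only option);
// assuming aNone is the default action and the record has these three fields:
lrVisitResult.Action := aNone;
lrVisitResult.NewItem := nil;
lrVisitResult.NewItem2 := nil;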
What I still cannot believe is why it is monopolizing a whole CPU. ProcessMessages() cannot help that problem; it needs a more thorough treatment.
Finally, putting aside the X and Y jargon, can you tell me what exactly AdvanceTextPos() does, in plain English? That routine is also taking a sizable chunk of time and I can't tell what it does. If I did, I could optimize it too.
Very impressive!
I will have to look at these changes in detail outside of company time and incorporate them soon.
AdvanceTextPos's role is simple. Given a current text position of, say, line 20, indent 5, i.e. (x, y) = (5, 20): if a text string is appended at the current page position, what is the resulting page position at the end of that string?
- If the added text string is 5 chars, say "begin", then the resulting position will be line 20, indent 10.
- If the added text string is a return, then the resulting text pos is line 21, indent 1.
- If the added text string is a multiline comment, then the resulting position is:
-- the line is incremented by the number of lines in the comment
-- the indent is set equal to the number of chars on the last line of the comment.
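To make those cases concrete, here is a small usage sketch against the version of AdvanceTextPos posted later in this thread (variable names are mine, and the Ansi* constants come from the JCL units the routine already uses; the original JCF signature may differ):

procedure Example;
var
  liIndent, liLine: integer;
begin
  liIndent := 5;   // x
  liLine := 20;    // y
  // case 1: plain text with no line break: indent advances by Length('begin') = 5
  AdvanceTextPos('begin', liIndent, liLine);        // -> indent 10, line 20
  // case 2: a single return: line advances, indent resets to 1
  AdvanceTextPos(AnsiLineBreak, liIndent, liLine);  // -> indent 1, line 21
  // case 3: a multiline string: line advances by the number of breaks,
  // and indent is recomputed from the text after the last break
  AdvanceTextPos('{x' + AnsiLineBreak + 'yz}', liIndent, liLine);
end;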
This is used to scan through the parse tree, to work out the line and indent (x, y pos) of all tokens on that tree. Just start at the first token, and advance the text position for each token in order.
This X, Y data allows you to do all kinds of things, like inspect indent and line length, and give messages like "empty begin end block on line 812".
This data has to be recalculated for the whole unit after any process that changes the spacing and line breaking. Calls to this could probably be reduced if it's a bottleneck; I was erring on the side of caution by putting TVisitSetXY in the token process pipeline wherever it *might* be needed.
TParseTreeNode.ChildNodeCount could perhaps be cached, if the cached value could be invalidated whenever a child is added or removed underneath...
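A minimal sketch of that caching idea; the class and field names are invented for illustration, and it assumes the expensive case is a recursive count of the nodes underneath (a plain child count is already O(1) via TObjectList.Count):

type
  TCachedNode = class(TObject)
  private
    fcParent: TCachedNode;
    fcChildNodes: TObjectList;   // of TCachedNode; created in the constructor (omitted here)
    fiCachedDescendantCount: integer;
    fbCountValid: boolean;
  public
    function DescendantCount: integer;   // recursive count of nodes underneath
    procedure InvalidateCount;           // call on this node whenever a child is added or removed
  end;

procedure TCachedNode.InvalidateCount;
begin
  fbCountValid := False;
  // the count of every ancestor includes this subtree, so walk up and invalidate them too
  if fcParent <> nil then
    fcParent.InvalidateCount;
end;

function TCachedNode.DescendantCount: integer;
var
  liLoop: integer;
begin
  if not fbCountValid then
  begin
    fiCachedDescendantCount := fcChildNodes.Count;
    for liLoop := 0 to fcChildNodes.Count - 1 do
      Inc(fiCachedDescendantCount, TCachedNode(fcChildNodes[liLoop]).DescendantCount);
    fbCountValid := True;
  end;
  Result := fiCachedDescendantCount;
end;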
ProcessMessages has to go in every now and again, or else the UI looks frozen during a long run, as though it had crashed.
OK, I was wrong: the perf improvement is more than 10 times :-)
Under the exact same compile and run-time conditions, here is how it turned out. These are GetTickCount figures (converted from milliseconds to seconds for clarity), so they are as accurate as Windows supplies that value.
Current CVS: 261.656 Seconds
My Mod: 11.922 Seconds
So, the improvement is approx 22 times.
I have optimized AdvanceTextPos() so that it is about 3 times faster than before, but I haven't yet optimized the places it is called from.
I used 'MSHTML_TLB.pas' from my test corpus as it was the largest file (1,699,065 bytes) in there.
I still think there is some way to go. It should be able to do this file in under 2 secs.
I'll send you my mods (the whole project actually) when I am more satisfied with the results.
Regarding ProcessMessages(): actually, whenever you need ProcessMessages(), what you really need to do is put that piece of code in a new thread. Having said that, I have to study the code to see whether that applies here --though it must.
One final gripe: TFileConverter should really be a TComponent and should have events instead of all those message boxes. Otherwise the whole thing becomes a nightmare when you need to do timing, or when you wish to use JCF from some other code.
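A minimal sketch of that event-based shape (uses Classes), assuming nothing about the real TFileConverter beyond its name; the component and event names here are invented for illustration:

type
  TConverterMessageEvent = procedure(Sender: TObject; const psMessage: string) of object;

  { hypothetical component wrapper: the host decides what to do with messages,
    so a GUI can show a dialog while a timing harness or IDE plugin can just log }
  TFileConverterComponent = class(TComponent)
  private
    FOnStatusMessage: TConverterMessageEvent;
  protected
    procedure SendStatusMessage(const psMessage: string);
  published
    property OnStatusMessage: TConverterMessageEvent read FOnStatusMessage write FOnStatusMessage;
  end;

procedure TFileConverterComponent.SendStatusMessage(const psMessage: string);
begin
  if Assigned(FOnStatusMessage) then
    FOnStatusMessage(Self, psMessage);
  // no ShowMessage fallback here: a silent default keeps batch runs unattended
end;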
> the improvement is approx 22 times.
....
> I'll send you my mods when I am more satisfied with the results
Why wait? I could do another beta within the week, and 22 times faster is not to be scorned.
If it passes all the DUnit tests and doesn't leak memory, let's have a look - you have access to the developers' forum and can post it there. You can also check it into CVS if it's good.
> RE: 222 times is not enough?
Well, if only I could get it there :-)
> If it passes all the DUnit tests and doesn't leak memory
I haven't yet done any DUnit tests. I'll do them after I am done with the modifications.
But be warned: I have changed a lot, not least the overloaded stuff --I have always hated overloading, and I know why: it simply makes code slower. I have had to rename each and every overloaded routine. And the repercussion is: every unit has seen changes.
> overloading ... makes code slower
I thought it didn't matter to run speed at all, since which overload to call is resolved when compiling.
The D7 help says "the compiler determines which function to invoke by looking at the actual parameters passed in the call"
I may be wrong, but I did see a performance improvement. OK, not much -- something like 10-15 percent. But I have done it all :-(
I have just done a test - source below, and found no difference with/without overloads.
Perhaps making some of the lists into TObjectList subclasses, rather than wrappers around it, will benefit the next beta substantially?
--------------------------
unit Unit1;

interface

uses
  Windows, Messages, SysUtils, Variants, Classes, Graphics, Controls, Forms,
  Dialogs, StdCtrls;

type
  TForm1 = class(TForm)
    btntime: TButton;
    procedure btntimeClick(Sender: TObject);
    procedure FormCreate(Sender: TObject);
  private
    { Private declarations }
    fiTotal: double;
    procedure Add(const pi: integer); overload;
    { overloads
    procedure Add(const pf: double); overload;
    procedure Add(const ps: string); overload;
    procedure Add(const piX, piY: integer); overload;
    procedure Add(var foo: TObject); overload;
    procedure Add(var foo: TObject; pix: integer); overload;
    }
  public
  end;

var
  Form1: TForm1;

implementation

{$R *.dfm}

const
  LOOP_MAX = 500000000;

procedure TForm1.Add(const pi: integer);
begin
  fiTotal := fiTotal + pi;
end;

{ overloads
procedure TForm1.Add(const ps: string);
begin
  Add(1);
end;

procedure TForm1.Add(const pf: double);
begin
  Add(round(pf));
end;

procedure TForm1.Add(var foo: TObject);
begin
  Add(1);
end;

procedure TForm1.Add(const piX, piY: integer);
begin
  Add(piX + piY);
end;

procedure TForm1.Add(var foo: TObject; pix: integer);
begin
  Add(piX);
end;
}

procedure TForm1.btntimeClick(Sender: TObject);
var
  liLoop: integer;
  dwStart, dwEnd: DWord;
begin
  dwStart := GetTickCount;
  for liLoop := 0 to LOOP_MAX do
  begin
    // how long does it take to call this? does the presence of overloads matter?
    Add(liLoop);
  end;
  dwEnd := GetTickCount;
  ShowMessage('Time taken is ' + IntToStr(dwEnd - dwStart) + ' ticks');
end;

procedure TForm1.FormCreate(Sender: TObject);
begin
  fiTotal := 0;
end;

end.
Hi,
just a remark on the performance issue: what I read here reminded me of a comment Julian Bucknall made about TList performance starting with D5. Deleting seems to be an O(n) operation from D5 on (I don't know if Borland has fixed this in D7). If you have Julian's book 'Algorithms and Data Structures', look at Chapter 2, pp. 42-43, where he describes the problem and develops a better class.
best regards,
Ulrich
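As an aside, the reason that delete is linear: conceptually, removing element Index from a TList-style array means shifting everything above it down by one slot, roughly like this (a simplified sketch of the idea, not the actual VCL source):

procedure ArrayDelete(var AItems: array of Pointer; var ACount: integer; AIndex: integer);
begin
  Dec(ACount);
  if AIndex < ACount then
    // every element above AIndex is copied down one place: O(n) per delete,
    // so repeatedly deleting item 0 from an N-item list costs O(N^2) overall
    Move(AItems[AIndex + 1], AItems[AIndex], (ACount - AIndex) * SizeOf(Pointer));
end;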
Ulrich, unfortunately, I don't have immediate access to that book. I was, however, aware of a similar issue with TCollection, which seems to have been fixed in D7.
Anthony, you're right about overloads; I back off from that now. Actually, I have backed off a little more in order to make things more manageable.
Here is a brief summary of what I did.
First, I have made quite a few changes in TSourceTokenList. It is now derived directly from TObjectList.
There is more to it, though. In order to accommodate ExtractFirst --which is now Extract(Index: Integer)-- I have introduced a property called StackIndex. You can see its effects in TBuildParseTree.Recognise() and elsewhere in the BuildParseTree.pas unit.
In order to gain some more performance, I have also modified these routines in the JcfMiscFunctions.pas unit:
Procedure AdvanceTextPos()
Function LastLineLength()
There are some minor mods in the PreProcessorExpressionParser.pas and PreProcessorExpressionTokens.pas units.
You can see what they are, as I have tried to mark them with '{AdemBaba}'. You can delete that marker when you see fit.
Finally, since I have modified TSourceTokenList to descend from TObjectList, I have taken the liberty of getting rid of SourceTokens[] and replacing it with the default Items[] property. And since it is a default property, all occurrences of '.SourceTokens[' should be replaced with '[' everywhere else in the project.
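For anyone unfamiliar with default array properties, a minimal sketch of the idiom being described (assuming TSourceTokenList descends from TObjectList and stores TSourceToken instances; the getter name is illustrative):

type
  TSourceTokenList = class(TObjectList)
  private
    function GetToken(piIndex: integer): TSourceToken;
  public
    // redeclares the inherited Items property with a typed getter;
    // 'default' lets callers write lcList[0] instead of lcList.Items[0]
    property Items[piIndex: integer]: TSourceToken read GetToken; default;
  end;

function TSourceTokenList.GetToken(piIndex: integer): TSourceToken;
begin
  Result := TSourceToken(inherited Items[piIndex]);
end;

// so a call that used to read  lcList.SourceTokens[0]  becomes simply  lcList[0]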
The files mentioned above are at
http://sourceforge.net/tracker/index.php?func=detail&aid=900836&group_id=41564&atid=430782
Note: after these modifications I get 5 failures in the DUnit tests. I did look at them and I could not see what was wrong. Can you take a look, please?
Anyway, I hope you like them.
I have imported those files, all test cases are passing, and checked them in to CVS.
Will clean them up a bit before next beta.
My speed tests were conducted on 125 files of various sizes, from 8K to 1660K (MSHTML_TLB.pas being the biggest one). The files held 7,307,885 bytes in total.
The run averaged 110 seconds, as opposed to the test with beta 12, which took 1246 seconds.
i.e. the new version took around 9% of the time of the previous one.
I could be wrong, but I get the feeling that this change does not much influence the time taken to parse small files, but has a big effect on large ones.
To time runs like this, put the test files in a directory, and parse all files in that dir using JCFGui. On settings|registry settings|Log file check the options for "View log file after each run" and "Log time taken to process".
Thank you. It is only 11 times the previous speed because I did not submit the other mods I had done; they would have caused too many alterations.
And it is only natural that performance becomes an issue for the larger files. The smaller ones slip under the radar, but the larger ones are the ones likely to cause the most grief.
Now the K question :-)
I have spent the entire week studying the JCF code. My focus was not on the parsing algos and such, but on the structures used. IMO, there is absolutely no need for indexed access anywhere in the code. IOW, it would benefit immensely if you scrapped all the indexed accesses and went for a more OO solution: trees.
The interesting thing is, you seemed to aim for trees most of the time but then appear to have felt lazy and used TObjectLists internally :-) --that seems to be why, for example, TParseTreeNode is named in such a confusing way. Is it a node or is it a tree? :-)
My question is: if I contributed a tree class (one which I think is quite suitable --well, I tailor-made it and checked it for performance and all), would you use it?
TObjectList is used on each tree node to store the ordered list of that node's child nodes. Is there a faster way to do this?
The way I coded it is in line with the normal definition of an ordered tree data structure (http://en.wikipedia.org/wiki/Tree_data_structure).
As is usual here, a node is a tree - or rather, each node (having zero or one parent, and zero or more children) is potentially the root of a subtree.
I'm not sure how much you can do differently while still being able to, from any tree node, inspect parent and child nodes, and from any leaf node get the previous and next leaf.
The trouble with TObjectList is this: TObjectList is deceptively simple.
It is very useful if you are dealing with a few hundred items, but if it gets to be more than that, or you are doing things to it unforeseen by its creator --like deleting items at the top or in the middle-- you are misusing it.
Naturally, Nature has its ways of penalizing you for that sort of thing :-)
Linked lists or trees are what it is trying to tell you to use...
You don't get penalized for removing a node in those structures. You do only when you try to use indexed access. You don't want that, and you don't need indexed access anyway.
OK, here is what it looks like. See the notes below for further info.
Type
  TBaseTree = Class;
  TBaseLinkedItems = Class;
  TBaseLinkedItemsClass = Class Of TBaseLinkedItems;

  TBaseNode = Class(TObject)
  Private
  Public
    Constructor Create(ATree: TBaseTree); Reintroduce; Virtual;
    Destructor Destroy; Override;
    Procedure Assign(ASource: TObject); Virtual;
    Procedure Clear; Virtual;
    Procedure JoinAfter(ANode: TBaseNode); Virtual;
    Procedure JoinBefore(ANode: TBaseNode); Virtual;
    Procedure LeaveParent; Virtual;
    Property Peer: TObject Read FPeer Write SetPeer;
    Property Index: Integer Read FIndex; {see Notes}
    Property Level: Integer Read FLevel;
    Property Tag: Integer Read FTag Write FTag;
    Property Items: TBaseLinkedItems Read FItems;
    Property NextSibling: TBaseNode Read FNextSibling;
    Property ParentItems: TBaseLinkedItems Read FParentItems;
    Property PrevSibling: TBaseNode Read FPrevSibling;
    Property HasChild: Boolean Read FHasChild;
    Property Tree: TBaseTree Read FTree;
    Property PosterityCount: Integer Read FPosterityCount; {see Notes}
    Property GUID: TGUID Read FGUID; {see Notes}
  End;

  TBaseNodeClass = Class Of TBaseNode;

  TBaseLinkedItems = Class(TObject)
  Private
  Public
    Constructor Create(AParentNode: TBaseNode); Reintroduce; Virtual;
    Destructor Destroy; Override;
    Function AddChild: TBaseNode;
    Procedure Assign(ASource: TObject); Virtual;
    Procedure Clear; Virtual;
    Procedure ReIndex(ARecursive: Boolean = False); {see Notes}
    Property Count: Integer Read FCount;
    Property FirstChild: TBaseNode Read FFirstChild;
    Property LastChild: TBaseNode Read FLastChild;
    Property Level: Integer Read FLevel;
    Property ParentNode: TBaseNode Read FParentNode;
  End;

  TBaseTree = Class(TObject)
  Private
  Protected
  Public
    Property Cells[ACol, ARow: Integer]: TBaseNode Read GetCell Write SetCell; {Deprecated;} {see Notes}
  Published
    Constructor Create(Const ANodeClass: TBaseNodeClass = Nil; Const AItemListClass: TBaseLinkedItemsClass = Nil); Virtual;
    Destructor Destroy; Override;
    Procedure FullRefresh; Virtual; {see Notes}
    Function ItemByGUID(AGUID: TGUID): TBaseNode; {see Notes}
    Function GetPrevSibling(ANode: TBaseNode): TBaseNode; {Deprecated;}
    Function GetNextSibling(ANode: TBaseNode): TBaseNode; {Deprecated;}
    Property FullRefreshNeeded: Boolean Read FFullRefreshNeeded; {see Notes}
    Property Root: TBaseNode Read FRoot;
    Property OwnsObjects: Boolean Read FOwnsObjects Write SetOwnsObjects Default True;
  End;
Now, the interesting thing here is that TBaseTree holds all of it neatly together. TBaseTree.Root is how you access the root element of the tree. You don't store any information in the root.
Notes:
TBaseNode.PosterityCount is there so that you know how many nodes exist below that particular node. It needs to be updated every time the tree is changed. It is a performance sucker if used excessively.
Anyway, unless you are hell-bent on needing and using TBaseNode.PosterityCount and TBaseNode.Index (and I would really have no idea why you would need the latter), you can get rid of TBaseTree.ReIndex(), TBaseTree.FullRefresh() and TBaseTree.Cells[] altogether --these are a PITA for performance.
Again, unless you would die without a GUID identifier, you could get rid of TBaseNode.GUID and TBaseTree.ItemByGUID() and have a very, very lean and fast structure.
You'll also notice that TBaseTree.GetPrevSibling() and TBaseTree.GetNextSibling() are deprecated, for good reason: no one in their right mind would want to use them. TBaseNode.NextSibling and TBaseNode.PrevSibling are faster and more elegant.
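To show what "no indexed access" looks like in practice, here is a minimal sketch of walking a node's children using only the linked properties declared above (no Count, no [i] anywhere):

procedure VisitChildren(AParent: TBaseNode);
var
  lcChild: TBaseNode;
begin
  // follow the sibling links instead of indexing into an array:
  // each step is O(1) and removing a node never shifts anything
  lcChild := AParent.Items.FirstChild;
  while lcChild <> nil do
  begin
    // ... do something with lcChild here ...
    lcChild := lcChild.NextSibling;
  end;
end;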
I forgot: I am not expecting to write this thing (though, of course, you're free to do so). It's already written.
I just wanted you to take a look at it.
IMHO, it will improve the speed in a MAJOR way:
TParseTreeNode.VisitTree() needs it badly. VisitTree() is used by TAllProcesses.ApplyVisitorType() which, in turn, is used by every Tom, Dick and Harry in JCF.
Here is something that has been bugging me for some time: why did you decide to use IVisitParseTree (an interface)? Is it really necessary at all?
> Why did you decide use IVisitParseTree (an Interface). Is it really necessary at all?
Ah, the beauty of open source - it all gets questioned sooner or later, not just left as is "because it's working".
I used an interface here because I wanted to decouple the definition from the implementation, and avoid circular references and the like.
Did it seem like a good idea at the time? Definitely.
Is it necessary? Maybe not.
Does it significantly impact the run speed? Maybe.
Should it be reinvestigated at some time? Definitely.
Also worth considering that most of the processes don't use the pre and post visits, and just tap into VisitSourceToken, and so could use a simpler interface and supporting structure, with 1 virtual method call, not 3.
This is the visitor pattern (http://en.wikipedia.org/wiki/Visitor_pattern) to walk the tree - think of the process as visiting each node on the tree in turn. Perhaps it is overkill.
You are quite correct that this process of visiting and modifying the source tree is what drives all of the JCF processes that do the actual work of transforming JCF input to JCF output.
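A minimal sketch of what that slimmer visitor might look like; the name and signature are purely illustrative, not the real JCF declaration:

type
  { one method instead of three: enough for the processes that only look at
    leaf tokens and never need the pre/post node visits
    (in JCF the parameter would be a TSourceToken rather than a plain TObject) }
  ITokenOnlyVisitor = interface
    procedure VisitSourceToken(const pcToken: TObject);
  end;

A visitor implementing only this would cost one virtual dispatch per leaf token, and the tree walk could skip the pre/post calls entirely for such visitors.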
> > Why did you decide use IVisitParseTree (an
> > Interface). Is it really necessary at all?
> Ah, the beauty of open source - it all gets questioned
> sooner or later, not just left as is "because it's working".
:-)
Too true... But my objection did not come from a purist point of view; I simply did not understand it. Otherwise, as we all know, the new sexy thing is to use interfaces all over the place ;-)
> Did it seem like a good idea at the time? Definitely.
> Is it necessary? Maybe not.
> Does it significantly impact the run speed? Maybe.
> Should it be reinvestigated at some time? Definitely.
Actually, what I was (and still am) questioning was not these. These are architectural decisions, and one has to make a decision one way or the other. Plus, I don't think using interfaces slows things down too much.
Anyway, my issue is with the implementation in TParseTreeNode.VisitTree().
Here is what I believe TParseTreeNode.VisitTree() should look like in order to speed the whole thing up.
This is, IMHO, the most critical place for speed. (That is, after TParseTreeNode is turned into a 'proper' tree structure).
Procedure TParseTreeNode.VisitTree(Const AVisitor: IVisitParseTree);
Const
  MAX_NODE_CHILDREN = 32768;
Var
  Node1: TBaseNode;
  Node2: TBaseNode;
  VisitResult1: TRVisitResult;
Begin
  ClearVisitResult(VisitResult1);
  AcceptVisitor(AVisitor, VisitResult1);
  Case VisitResult1.Action Of // Process the results
    aNone: ;
    aDelete: Begin // Destroy Self
        Self.Free;
        Exit; // Can't go on. No more Self
      End;
    aInsertAfter: Begin // must have a new item
        Assert(VisitResult1.NewItem <> Nil);
        Node1 := TParseTreeNode(VisitResult1.NewItem);
        Node1.JoinAfter(Self);
        Node2 := TParseTreeNode(VisitResult1.NewItem2);
        If Node2 <> Nil Then Node2.JoinAfter(Node1);
      End;
    aInsertBefore: Begin // must have a new item
        Assert(VisitResult1.NewItem <> Nil);
        TParseTreeNode(VisitResult1.NewItem).JoinBefore(Self); {Insert before me}
        Node2 := TParseTreeNode(VisitResult1.NewItem2);
        If Node2 <> Nil Then Node2.JoinBefore(Self); {Insert before me. Again}
      End;
  Else
    Assert(False, 'Unhandled action ' + IntToStr(Ord(VisitResult1.Action)));
  End;
  Node1 := Items.FirstChild;
  While Node1 <> Nil Do Begin
    Node2 := Node1.PrevSibling; {keep a reference to the PrevSibling, in case Node1 gets removed in VisitTree()}
    TParseTreeNode(Node1).VisitTree(AVisitor);
    If Items.Count > MAX_NODE_CHILDREN Then Raise Exception.Create('Too many child nodes ' + IntToStr(Items.Count)); // some parse or insert process has gone berserk
    If Assigned(Node1) Then Node1 := Node1.NextSibling {If Node1 has not been destroyed by VisitTree(), then move on to Node1.NextSibling}
    Else Begin {If Node1 has been destroyed by VisitTree()}
      If Node2 = Nil Then Node1 := Items.FirstChild {Node1.PrevSibling was Nil, so start with the FirstChild again}
      Else Node1 := Node2.NextSibling; {This is actually Node1.PrevSibling.NextSibling}
    End;
  End;
  AVisitor.PostVisitParseTreeNode(Self);
End;
> IVisitParseTree ... is it really necessary at all?
No. I have rewritten this with a simpler architecture, using no interface. It seems to be around 15% faster.
To recap on speeds, my tests showed times to do the same set of files as follows:
Beta 12: Around 1240 seconds
After mods to sourcetokenList and buildParseTree: around 125 seconds
After mods to AdvanceTextPos and LastLineLen: Around 108 seconds. This is Beta 13.
With new treewalker: Around 90 seconds.
I will check this into CVS when it passes all the DUnit regression tests.
Now checked in.
> I have optimized AdvanceTextPos() so that it is about 3 times faster than before, but I havent yet optimized the places it is called from.
This is likely to be an isolated change, and could possibly be included with little disruption?
OK. Here it is; or rather they are.
I have replaced StrLastPos() with 2 different routines, i.e. PosLast() and PosLastAndCount(). The latter has the benefit of doing 2 things in one call. Actually, it might be a little faster if PosLastAndCount() was a local procedure in AdvanceTextPos().
Before I go, here is one more comment: I know that you have your own style of naming variables (which, of course, you are more than entitled to), but this being Delphi, do we really need Hungarian-style prefixes? Would you not consider Borland's style?
Ah, another one :-) How about formatting JCF code with JCF ;-) so that we would get properly indented units :-)
Anyway, here are the routines. You will of course check them before applying :-)
Function PosLast(Const ASubString, AString: AnsiString; Const ALastPos: Integer = 0): Integer; {AdemBaba}
Var {This is at least two or three times faster than Jedi's StrLastPos. I tested it}
  LastChar1: Char;
  Index1: Integer;
  Index2: Integer;
  Index3: Integer;
  Length1: Integer;
  Length2: Integer;
  Found1: Boolean;
Begin
  Result := 0;
  Length1 := Length(AString);
  Length2 := Length(ASubString);
  If ALastPos <> 0 Then Length1 := ALastPos;
  If Length2 > Length1 Then Exit
  Else Begin
    LastChar1 := ASubString[Length2];
    Index1 := Length1;
    While Index1 > 0 Do Begin
      If (AString[Index1] = LastChar1) Then Begin
        Index2 := Index1;
        Index3 := Length2;
        Found1 := Index2 >= Length2;
        While Found1 And (Index2 > 0) And (Index3 > 0) Do Begin
          Found1 := (AString[Index2] = ASubString[Index3]);
          Dec(Index2);
          Dec(Index3);
        End;
        If Found1 Then Begin
          Result := Index2 + 1;
          Exit;
        End;
      End;
      Dec(Index1);
    End;
  End;
End;
Procedure PosLastAndCount(Const ASubString, AString: AnsiString; Var ALastPos: Integer; Var ACount: Integer);
Var {This gets the last occurrence and count in one go. It saves time}
  LastChar1: Char;
  Index1: Integer;
  Index2: Integer;
  Index3: Integer;
  Length1: Integer;
  Length2: Integer;
  Found1: Boolean;
Begin
  ACount := 0;
  ALastPos := 0;
  Length1 := Length(AString);
  Length2 := Length(ASubString);
  If Length2 > Length1 Then Exit
  Else Begin
    LastChar1 := ASubString[Length2];
    Index1 := Length1;
    While Index1 > 0 Do Begin
      If (AString[Index1] = LastChar1) Then Begin
        Index2 := Index1;
        Index3 := Length2;
        Found1 := Index2 >= Length2;
        While Found1 And (Index2 > 0) And (Index3 > 0) Do Begin
          Found1 := (AString[Index2] = ASubString[Index3]);
          Dec(Index2);
          Dec(Index3);
        End;
        If Found1 Then Begin
          If ALastPos = 0 Then ALastPos := Index2 + 1;
          Inc(ACount);
          Index1 := Index2;
          Continue;
        End;
      End;
      Dec(Index1);
    End;
  End;
End;
Procedure AdvanceTextPos(Const AText: String; Var ARow, ACol: integer); {AdemBaba}
Var
  Length1: Integer;
  Count1: Integer;
  Pos1: integer;
Begin {This is more than 3 times faster than the original. I have meticulously checked that it conforms with the original}
  Length1 := Length(AText);
  Case Length1 Of
    0: ; {Trivial case}
    1: Begin
      Case AText[1] Of
        AnsiCarriageReturn, AnsiLineFeed: Begin {#13 or #10}
          Inc(ACol);
          ARow := 1; // XPos is indexed from 1
        End;
      Else
        Inc(ARow, Length1)
      End;
    End;
    2: Begin
      If (AText[1] = AnsiCarriageReturn) And (AText[2] = AnsiLineFeed) Then Begin
        Inc(ACol);
        ARow := 1; // XPos is indexed from 1
      End Else Inc(ARow, Length1);
    End;
  Else
    PosLastAndCount(AnsiLineBreak, AText, Pos1, Count1);
    If Pos1 <= 0 Then Inc(ARow, Length1)
    Else Begin // multiline
      Inc(ACol, Count1);
      ARow := Length1 - (Pos1 + 2); {2 = Length(AnsiLineBreak)}
    End;
  End;
End;
Function LastLineLength(Const AString: String): integer;
Var { in a multiline string, how many chars on the last line (after the last return) }
  Pos1: integer;
Begin
  Pos1 := PosLast(AnsiLineBreak, AString); {AdemBaba}
  If Pos1 <= 0 Then Result := Length(AString)
  Else Result := Length(AString) - (Pos1 + Length(AnsiLineBreak));
End;