Thread: [CEDET-devel] Why project-specific caches?
Brought to you by:
zappo
From: Daniel C. <da...@da...> - 2014-03-18 01:40:28
Attachments:
signature.asc
Why does every project type I've seen maintain its own cache of projects, usually managed in the implementation of the load-type slot, instead of just relying on the global one managed by auto.el?
From: Eric M. L. <er...@si...> - 2014-03-22 02:41:20
On 03/17/2014 09:10 PM, Daniel Colascione wrote:
> Why does every project type I've seen maintain its own cache of
> projects, usually managed in the implementation of the load-type slot,
> instead of just relying on the global one managed by auto.el?

For some projects it is necessary, such as ede-project-root. For others it is primarily for performance.

If a project was already detected, you can save a bunch of time by testing against existing projects.

Some projects can only be detected from the root of the project. For such a project, EDE will not see your project unless it checks the roots of previously found projects of the same type.

If you have a long list of different kinds of projects, there is no sense testing projects not of the same type you are in.

Some of it is historical too. The independence between the projects has something to do with it. In retrospect, I have also thought it would be better to search the core list only once from the core instead of asking each project to do it one at a time.

Eric
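To make the root-only case concrete, here is a minimal Python model. It is not EDE's Emacs Lisp API, and the class, method, and marker-file names are hypothetical. It assumes the core probes only the directory being visited. Under that assumption, a project type whose marker lives only at the root can match a subdirectory only by consulting the roots it has already found, which is the per-type list Eric describes.

```python
import os
import tempfile


class RootOnlyProjectType:
    """Hypothetical project type whose marker file exists only at its root."""

    def __init__(self, marker):
        self.marker = marker
        self.known_roots = []  # per-type cache of roots detected so far

    def detect(self, directory):
        directory = os.path.abspath(directory)
        # 1. Is this directory inside a root this type has already found?
        for root in self.known_roots:
            if os.path.commonpath([root, directory]) == root:
                return root
        # 2. Otherwise probe only this directory, mirroring a core that
        #    does not scan upward on its own.
        if os.path.isfile(os.path.join(directory, self.marker)):
            self.known_roots.append(directory)
            return directory
        return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as top:
        sub = os.path.join(top, "src")
        os.makedirs(sub)
        open(os.path.join(top, "ROOT.marker"), "w").close()

        ptype = RootOnlyProjectType("ROOT.marker")
        print(ptype.detect(sub))  # None: the root has not been seen yet
        print(ptype.detect(top))  # the root, which is now remembered
        print(ptype.detect(sub))  # the root again, found via known_roots
```

Opening a file in `src` first finds nothing. Once the root has been visited, the remembered root answers for every directory beneath it.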
From: Daniel C. <da...@da...> - 2014-03-24 20:54:06
Attachments:
signature.asc
On 03/21/2014 07:41 PM, Eric M. Ludlam wrote:
> Some of it is historical too. The independence between the projects has
> something to do with it. In retrospect, I have also thought it would be
> better to search the core list only once from the core instead of asking
> each project to do it one at a time.

Isn't that what the code in auto.el does now? In ede-load-project-file, we check whether we have a project in ede-projects corresponding to a directory. If we don't, we call into ede-auto-load-project, which builds a project and adds it to ede-projects via ede-add-project-to-global-list.

So why does each project type have to redundantly maintain its own list of projects? We already have a global list.

All this complexity is very confusing when trying to create a new project type. The latest problem I'm having is that there are weird state dependencies: sometimes detection fails with ede-object-project being nil, and sometimes with both ede-object-project and ede-object-root-project being nil. (My project type has no subprojects, and ede-project-root-directory works fine.)

I wish there were a much simpler way to just wire up a simple project (for a type for which we don't have some kind of existing XXX-root thing pre-built).
From: Eric M. L. <er...@si...> - 2014-03-25 00:47:47
On 03/24/2014 04:53 PM, Daniel Colascione wrote:
> Isn't that what the code in auto.el does now? In ede-load-project-file,
> we check whether we have a project in ede-projects corresponding to a
> directory. If we don't, we call into ede-auto-load-project, which builds
> a project and adds it to ede-projects via ede-add-project-to-global-list.
>
> So why does each project type have to redundantly maintain its own list
> of projects? We already have a global list.

Hi Daniel,

I agree that project loading is a bit confusing. The vast majority of the complexity is due to performance optimization. As EDE started providing facilities to other operations, such as Semantic finding header files, the poor performance forced those optimizations.

I also spent time trying to get EDE to work where I am employed. We have networked file systems there, and the old behavior actually started to crash our filers: too many Emacs users were querying for project files that didn't exist at the filer root, where the automounter kicks in. As a result, EDE now ONLY checks the current directory for the different projects. It doesn't scan upward for a project root if no project is found.

Some projects, such as the one that leaves Project.ede files around, let the auto-loader identify a project, and only THEN will it scan upward for the root. This too was pretty slow, and the local variable for the root project was added to speed that up.

Anyway, this means that any project that ONLY has an identifying project file at the root needs to handle the case where the user opens a file in a subdirectory. This used to be the minority case, so it was handled only in the project definitions. I think the ratios have since changed: in the majority of cases, the new project styles I've used have had a unique identifier only at the root.

> All this complexity is very confusing when trying to create a new
> project type.

I agree. I think it would be worthwhile to take this common case and pull some of the logic up into the core of EDE. That is bound to simplify creating new projects.

Fortunately, after initial project identification is done, everything is cached internally in EDE, and your code won't be called again except in new directories.

> The latest problem I'm having is that there are weird state
> dependencies: sometimes detection fails with ede-object-project being
> nil, and sometimes with both ede-object-project and
> ede-object-root-project being nil. (My project type has no
> subprojects, and ede-project-root-directory works fine.) I wish there
> were a much simpler way to just wire up a simple project (for a type for
> which we don't have some kind of existing XXX-root thing pre-built).

If this happens while you are in the middle of testing changes to your EDE project, you may be encountering cached results from a previous test run. You can use ede-flush-directory-hash to clear out any pesky caches.

You can also use ede-flush-project-hash to clear out data from any calls that use ede-locate. That seems like an unlikely cause here, though.

Eric
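Eric's description of Project.ede-style projects implies a two-step order. First, look for a marker only in the current directory. Second, only after a marker is found there, walk upward to the root. The minimal Python sketch below follows that order. It assumes, hypothetically, that every directory of such a project holds the marker and that the root is the topmost directory in that unbroken chain. The function name is invented for illustration.

```python
import os
import tempfile


def find_root_from_local_marker(directory, marker="Project.ede"):
    """Return the project root for `directory`, or None.

    Assumption for illustration: every directory of such a project holds
    `marker`, and the root is the topmost directory of that unbroken chain.
    """
    directory = os.path.abspath(directory)
    # Step 1: look ONLY in the current directory. No marker means no
    # project, and no parent directories are probed at all.
    if not os.path.isfile(os.path.join(directory, marker)):
        return None
    # Step 2: a project was identified here, so now walk upward to its root.
    root = directory
    while True:
        parent = os.path.dirname(root)
        if parent == root or not os.path.isfile(os.path.join(parent, marker)):
            return root
        root = parent


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as base:
        top = os.path.join(base, "proj")
        src = os.path.join(top, "src")
        doc = os.path.join(top, "doc")  # deliberately has no marker
        os.makedirs(src)
        os.makedirs(doc)
        for d in (top, src):
            open(os.path.join(d, "Project.ede"), "w").close()
        print(find_root_from_local_marker(src) == os.path.abspath(top))  # True
        print(find_root_from_local_marker(doc))  # None
```

Because step 1 never touches parent directories, a directory outside any project costs a single probe. In Eric's account, that property is what protects automounted filer roots.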
From: Daniel C. <da...@da...> - 2014-03-25 01:17:58
Attachments:
signature.asc
On 03/24/2014 05:47 PM, Eric M. Ludlam wrote:
> I agree that project loading is a bit confusing. The vast majority of the
> complexity is due to performance optimization. As EDE started providing
> facilities to other operations, such as Semantic finding header
> files, the poor performance forced those optimizations.

I think it'd help to build abstractions for the complexity. Right now, the complexity is scattered throughout the code, which hurts understanding.

> Some projects, such as the one that leaves Project.ede files around, let
> the auto-loader identify a project, and only THEN will it scan
> upward for the root. This too was pretty slow, and the local variable
> for the root project was added to speed that up.
>
> Anyway, this means that any project that ONLY has an identifying project
> file at the root needs to handle the case where the user opens a file in
> a subdirectory. This used to be the minority case, so it was handled
> only in the project definitions. I think the ratios have since changed:
> in the majority of cases, the new project styles I've used have had a
> unique identifier only at the root.

Yes. Lots of other tools, like git, scan upward as well. The "normal", default case should just be scanning upward for a project root every time, for simplicity's sake. It's going to be fast enough on most systems, and the statelessness of the system will go a long way toward simplifying understanding of the code and building new projects.

If you need a stateful cache, please build it as an optional add-on.

Still, what I'm asking about specifically are caches specific to project types, like ede-cpp-root-project-list. I don't understand why *this specific* variable needs to exist at all, and why cpp-root.el has to have its own cache. Anything the cpp-root-specific cache can do, an overload of ede-dir-to-projectfile can do, yes?

> You can also use ede-flush-project-hash to clear out data from any calls
> that use ede-locate. That seems like an unlikely cause here, though.

How are these flush functions supposed to know about private caches maintained by individual project type classes?
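For contrast, the stateless, git-style default Daniel proposes fits in a few lines of Python. Start at the file's directory and walk toward the filesystem root until a marker appears, caching nothing along the way. The function name and the `.git` default marker are illustrative choices, not EDE settings.

```python
import os
import tempfile


def find_project_root(start, markers=(".git",)):
    """Git-style upward search: nearest ancestor of `start` holding a marker.

    Stateless by design: nothing is cached, so each call repeats the
    filesystem probes for every ancestor until a marker is found.
    """
    directory = os.path.abspath(start)
    while True:
        if any(os.path.exists(os.path.join(directory, m)) for m in markers):
            return directory
        parent = os.path.dirname(directory)
        if parent == directory:  # reached the filesystem root: no project
            return None
        directory = parent


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as base:
        proj = os.path.join(base, "proj")
        deep = os.path.join(proj, "a", "b")
        os.makedirs(deep)
        os.mkdir(os.path.join(proj, ".git"))
        print(find_project_root(deep) == os.path.abspath(proj))  # True
```

Each call costs one probe per marker per ancestor directory. That is cheap for identifying a single file. Eric's reply addresses what happens when many functions ask for the root repeatedly.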
From: Eric M. L. <er...@si...> - 2014-03-25 02:08:37
On 03/24/2014 09:17 PM, Daniel Colascione wrote:
> Yes. Lots of other tools, like git, scan upward as well. The "normal",
> default case should just be scanning upward for a project root every
> time, for simplicity's sake. It's going to be fast enough on most
> systems, and the statelessness of the system will go a long way toward
> simplifying understanding of the code and building new projects.
>
> If you need a stateful cache, please build it as an optional add-on.

EDE used to do searches that way. That was 'fast enough' for identifying a file, but the number of other functions that kept asking for the location of the project root made the check far too slow, which required caches. Note that the cache I am talking about here is NOT the same as the per-project-type list of projects you might be thinking of.

>> You can also use ede-flush-project-hash to clear out data from any calls
>> that use ede-locate. That seems like an unlikely cause here, though.
>
> How are these flush functions supposed to know about private caches
> maintained by individual project type classes?

The directory hash tracks directories and their associated projects, so classic searching isn't needed.
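In effect, the directory hash memoizes directory-to-project lookups, and a flush function empties it. The minimal Python sketch below uses hypothetical class and method names that stand in for ede-flush-directory-hash and the table behind it. It shows how a cached answer, including a cached "no project", survives until it is flushed. That is the testing pitfall Eric mentioned earlier in the thread.

```python
class DirectoryHash:
    """Hypothetical memo table mapping a directory to its project (or None)."""

    def __init__(self, detect):
        self.detect = detect  # expensive on-disk detection, called on a miss
        self.table = {}

    def project_for(self, directory):
        # Negative results (None) are cached too, so they can also go stale.
        if directory not in self.table:
            self.table[directory] = self.detect(directory)
        return self.table[directory]

    def flush(self):
        self.table.clear()


if __name__ == "__main__":
    on_disk = {}  # simulated project files on disk
    dirs = DirectoryHash(lambda d: on_disk.get(d))
    print(dirs.project_for("/src/app"))  # None
    on_disk["/src/app"] = "app-project"  # e.g. a project file was just added
    print(dirs.project_for("/src/app"))  # None: stale cached answer
    dirs.flush()
    print(dirs.project_for("/src/app"))  # app-project
```

Flushing this table does nothing to any list that a project type keeps privately, which is the gap behind Daniel's question.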
The project hash uses the locator database, usually something like the Unix "locate" command or perhaps GNU Global, to find files by short name more quickly.

These have nothing to do with the lists of projects maintained in individual project classes like ede-cpp-root.

Eric
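The locator hash can be modeled as a second memo table, keyed by a file's short name and placed in front of a slow search backend such as locate or GNU Global. In this minimal Python sketch, both the `search(short_name, from_dir)` callable and the directory layout are hypothetical. When the table is keyed by short name alone, it returns the first answer to every directory that asks. Later in the thread, Eric identifies that behavior as a flaw. The `per_directory` option shows one possible way around it; it is not a fix that EDE implements.

```python
class LocatorCache:
    """Hypothetical short-name -> full-path memo in front of a slow search.

    `search(short_name, from_dir)` stands in for a locate or GNU Global query
    (hypothetical signature). With per_directory=False the key is the short
    name alone, like one global hash.
    """

    def __init__(self, search, per_directory=False):
        self.search = search
        self.per_directory = per_directory
        self.table = {}
        self.searches = 0  # how many times the slow backend actually ran

    def find(self, short_name, from_dir):
        key = (from_dir, short_name) if self.per_directory else short_name
        if key not in self.table:
            self.searches += 1
            self.table[key] = self.search(short_name, from_dir)
        return self.table[key]

    def flush(self):
        self.table.clear()


def demo_search(short_name, from_dir):
    # Hypothetical layout: each subdirectory has its own include/ directory.
    return f"{from_dir}/include/{short_name}"


if __name__ == "__main__":
    shared = LocatorCache(demo_search)
    print(shared.find("config.h", "/p/a"))  # /p/a/include/config.h
    print(shared.find("config.h", "/p/b"))  # /p/a/include/config.h (first answer reused)
    scoped = LocatorCache(demo_search, per_directory=True)
    print(scoped.find("config.h", "/p/b"))  # /p/b/include/config.h
```

The scoped variant gives up some hit rate, because each directory pays for its own first search.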
From: Daniel C. <da...@da...> - 2014-03-25 02:43:06
Attachments:
signature.asc
On 03/24/2014 07:08 PM, Eric M. Ludlam wrote:
> EDE used to do searches that way. That was 'fast enough' for identifying
> a file, but the number of other functions that kept asking for the
> location of the project root made the check far too slow, which
> required caches. Note that the cache I am talking about here is NOT the
> same as the per-project-type list of projects you might be thinking of.

The choice doesn't have to be between walking the filesystem for each call and caching everything in global data structures forever. You can reference-count projects: use filesystem traversal to find a project for a buffer, then cache that project object in a buffer-local variable. Instead of just keeping that project on a list forever, add a reference for each buffer using it, and delete the project object when the last buffer associated with the project disappears. This way, the global state problem is mitigated and the mental model of the state becomes a lot simpler.

If you want to cache more aggressively than that, you should do it by providing alternate implementations of filesystem functions instead of using Emacs primitives that turn directly into system calls. I really don't see why EDE *core*, for example, has to know anything about inodes. The logic gets in the way of trying to understand both the actual flow of the code and the intended method of operation.

> The directory hash tracks directories and their associated projects, so
> classic searching isn't needed.
>
> The project hash uses the locator database, usually something like the
> Unix "locate" command or perhaps GNU Global, to find files by short name
> more quickly.

Fair enough. So why is it a hash mapping file short names to full paths? Why doesn't each project just maintain a list of files belonging to that project? If we want to find a file not on that list, we can find it the hard way (using locate or whatever) and update the list as we go. Implementing the existing short-name-to-full-path mapping using this list is trivial.

> These have nothing to do with the lists of projects maintained in
> individual project classes like ede-cpp-root.

So why do these individual lists exist? I don't understand what purpose they serve. What would go wrong if we just got rid of, say, ede-emacs-project-list?
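Daniel's reference-counting alternative can be modeled in three parts. Each buffer holds its project directly, standing in for a buffer-local variable. A registry counts the buffers using each project. The project is forgotten when its last buffer goes away. The minimal Python sketch below implements that model. `Buffer`, `RefCountedProjects`, and the `find_root` callable are hypothetical; they are not Emacs or EDE APIs.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class Project:
    root: str
    refcount: int = 0  # number of live buffers using this project


@dataclass
class Buffer:
    name: str
    directory: str
    project: Optional[Project] = None  # stands in for a buffer-local variable


@dataclass
class RefCountedProjects:
    find_root: Callable[[str], Optional[str]]  # e.g. an upward filesystem scan
    live: Dict[str, Project] = field(default_factory=dict)

    def attach(self, buf: Buffer) -> Optional[Project]:
        """Find (or reuse) the buffer's project and take a reference."""
        root = self.find_root(buf.directory)
        if root is None:
            return None
        proj = self.live.get(root)
        if proj is None:
            proj = self.live[root] = Project(root)
        proj.refcount += 1
        buf.project = proj
        return proj

    def detach(self, buf: Buffer) -> None:
        """Drop the buffer's reference, e.g. when the buffer is killed."""
        proj, buf.project = buf.project, None
        if proj is None:
            return
        proj.refcount -= 1
        if proj.refcount == 0:
            del self.live[proj.root]  # last buffer gone: forget the project


if __name__ == "__main__":
    projects = RefCountedProjects(
        find_root=lambda d: "/src/app" if d.startswith("/src/app") else None)
    a = Buffer("main.c", "/src/app")
    b = Buffer("util.c", "/src/app/lib")
    projects.attach(a)
    projects.attach(b)
    print(a.project is b.project, a.project.refcount)  # True 2
    projects.detach(a)
    projects.detach(b)
    print(projects.live)  # {}
```

Eric's reply turns to whether buffer-scoped lifetimes suit every caller that asks for a project.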
From: Eric M. L. <er...@si...> - 2014-03-29 02:50:39
On 03/24/2014 10:42 PM, Daniel Colascione wrote:
> The choice doesn't have to be between walking the filesystem for each
> call and caching everything in global data structures forever. You can
> reference-count projects: use filesystem traversal to find a project
> for a buffer, then cache that project object in a buffer-local variable.
> Instead of just keeping that project on a list forever, add a reference
> for each buffer using it, and delete the project object when the last
> buffer associated with the project disappears. This way, the global state
> problem is mitigated and the mental model of the state becomes a lot simpler.

Hi Daniel,

Sure, there are of course more than two ways to do this. The EDE mechanism for matching a file to a project is not just a simple hash match, either. Projects are asked for a bunch of different reasons. If you just want to know the project for a buffer, that is stored buffer-locally, as you suggest. If you want to know the project for a new buffer when it is first created, we check to see whether that directory has been matched up to a project yet. If so, it is a nice fast answer. If it hasn't been matched up yet, we go through a process of trying to detect a project on disk for it.

> If you want to cache more aggressively than that, you should do it by
> providing alternate implementations of filesystem functions instead of
> using Emacs primitives that turn directly into system calls. I really
> don't see why EDE *core*, for example, has to know anything about
> inodes. The logic gets in the way of trying to understand both the
> actual flow of the code and the intended method of operation.

The inode thing is in the core because, while I was profiling, it was the fastest way I found to resolve symlinks. Many folks were plagued by symlinks, automounter problems, and EDE assigning files to the wrong projects. Once I resolved that with inodes, all has been peaceful. I originally used file-truename, which is very slow, especially on networked file systems, and quite abusive to automounter systems.

>> The project hash uses the locator database, usually something like the
>> Unix "locate" command or perhaps GNU Global, to find files by short name
>> more quickly.
>
> Fair enough. So why is it a hash mapping file short names to full paths?

There is a hash between fully qualified directory names and already-found projects, which is what I was mostly talking about above. There is a second hash in the locator subsystem to speed up cases where programmatic use keeps pinging for the same files many times in a row, usually header files during a smart-completion operation.

It has taken me an extra round on this thread to realize that, taken together, your questions identify a flaw: different subdirectories in your project may need to resolve a short name (i.e., a header file) differently based on their location. I think that is a real problem I hadn't encountered before. Making that work correctly will require some restructuring, or simply ignoring this hash.

> Why doesn't each project just maintain a list of files belonging to that
> project? If we want to find a file not on that list, we can find it the
> hard way (using locate or whatever) and update the list as we go.
> Implementing the existing short-name-to-full-path mapping using this
> list is trivial.

Not all projects have a mechanism for quickly creating the list of files, and some projects may have an external tool for managing that list. The ones that do maintain a list implement expand-file-name as you suggest, just by scanning it quickly. The extra 'locate' support is a handy way for users to get a feature by combining a tool they already have (i.e., locate) with some other independent project they use.

>> These have nothing to do with the lists of projects maintained in
>> individual project classes like ede-cpp-root.
>
> So why do these individual lists exist? I don't understand what purpose
> they serve. What would go wrong if we just got rid of, say,
> ede-emacs-project-list?

Their existence is a side effect of the history behind the code. I am not opposed to getting rid of the list if a suitable replacement for the behavior is proposed. History tells me that performance will be a key test when evaluating any updated system.

I do not actually like most of the code you have been challenging. It started out quite simple and has evolved into something I have a hard time fixing bugs in. It is, however, quite fast for what it does. It also has a good feature set that is important in making smart completion work in the Semantic package, which is what most people use it for.

I would be glad to accept patches for any good ideas that would help simplify it and make it easier to extend. I would also help advise on testing strategies, based on my experience.

Eric
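Eric's inode remark concerns deciding whether two path names denote the same directory without canonicalizing each path, as file-truename does. The minimal Python sketch below illustrates the technique. It keys the directory table by the (device, inode) pair from os.stat, which follows symlinks, so a symlinked path and the real path hit the same entry. The class name is hypothetical, this is not EDE's code, and the sketch assumes a POSIX system where symlinks can be created.

```python
import os
import tempfile


def dir_key(path):
    """Identify a directory by (device, inode); os.stat follows symlinks."""
    st = os.stat(path)
    return (st.st_dev, st.st_ino)


class InodeDirectoryHash:
    """Hypothetical directory -> project table keyed by inode, not by name."""

    def __init__(self):
        self.table = {}

    def add(self, directory, project):
        self.table[dir_key(directory)] = project

    def lookup(self, directory):
        return self.table.get(dir_key(directory))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as base:
        real = os.path.join(base, "real")
        link = os.path.join(base, "link")
        os.mkdir(real)
        os.symlink(real, link)  # POSIX; may need extra privileges on Windows
        projects = InodeDirectoryHash()
        projects.add(real, "my-project")
        print(projects.lookup(link))  # my-project
```

Each lookup still costs one stat call. Whether that beats canonicalizing names depends on the system; Eric reports that it did in his profiling.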