Thread: [Doxygen-users] Compare with already existing and identical output file and *not* refresh its times
From: Marc H. <mar...@gm...> - 2019-02-07 02:45:48
Hi,

I understand incremental doxygen compilation is a hard problem. I've read some of the past discussions. This request is *not* about incremental doxygen compilation and it is not about making doxygen faster at all. It's about something much simpler.

Can doxygen redundantly recompute everything from scratch (even when no input has changed), and then at the very end compare and realise that it just regenerated the exact same (XML, HTML, ...) output file that is already there from the previous run? Is there in this case an option *not* to refresh the timestamp? I mean on a per output file basis of course.

Refreshing the timestamp has a disastrous cascade effect on further processing like Sphinx, which wrongly assumes everything has changed even when no doxygen output has changed at all (except timestamps).

Marc
From: Richard D. <Ri...@Da...> - 2019-02-07 03:15:29
On 2/6/19 9:45 PM, Marc Herbert wrote:
> Can doxygen redundantly recompute everything from scratch (even when
> no input has changed), and then at the very end compare and realise
> that it just regenerated the exact same (XML, HTML,...) output file
> that is already there from the previous run? Is there in this case an
> option /not/ to refresh the timestamp? I mean on a per output file
> basis of course.

The simplest method in my mind would be to have a post-process task that compares the Doxygen output directory with another copy, updating that copy only if there is a 'significant' (not just timestamping) difference. It would be a simple enough program to write and to customize for your own needs.

--
Richard Damon
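A minimal sketch of such a post-process task, assuming Python 3 and two directories of your choosing: a "fresh" one that Doxygen writes into and a "stable" one that downstream tools read. The function name `sync_changed_files` and both directory roles are hypothetical, not anything Doxygen provides. Files are compared byte by byte, and identical files in the stable copy are simply left alone, so their modification times do not move.

```python
import filecmp
import shutil
from pathlib import Path


def sync_changed_files(fresh_dir, stable_dir):
    """Copy files from fresh_dir into stable_dir only when content differs.

    Files whose bytes are identical are not touched, so their modification
    times in stable_dir stay as they were. Returns the relative paths copied.
    """
    fresh_dir, stable_dir = Path(fresh_dir), Path(stable_dir)
    stable_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in sorted(p for p in fresh_dir.rglob("*") if p.is_file()):
        rel = src.relative_to(fresh_dir)
        dst = stable_dir / rel
        # shallow=False forces a content comparison instead of trusting
        # size and mtime alone.
        if dst.is_file() and filecmp.cmp(src, dst, shallow=False):
            continue
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(src, dst)
        copied.append(rel.as_posix())
    # Remove files that the fresh run no longer produces.
    for old in [p for p in stable_dir.rglob("*") if p.is_file()]:
        if not (fresh_dir / old.relative_to(stable_dir)).is_file():
            old.unlink()
    return copied
```

Sphinx (or any other mtime-based tool) would then be pointed at the stable directory rather than at the directory Doxygen writes into.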
From: Sebastien L. (GeometryFactory) <slo...@gm...> - 2019-02-07 07:45:06
At least for html you can set the option HTML_TIMESTAMP = OFF.

Sebastien.
From: Marc H. <mar...@gm...> - 2019-02-07 17:07:23
On Wed, Feb 6, 2019 at 23:46, Sebastien Loriot (GeometryFactory) <slo...@gm...> wrote:
> At least for html you can set the option HTML_TIMESTAMP = OFF.

No, that's not enough, because I'm referring to the modification timestamp on the *filesystem*, not to any timestamp embedded in the generated files. (Subject changed accordingly.)

Marc
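To make the distinction concrete, here is a small Python sketch (the file name `index.html` and the helper `rewrite_same_content` are made up for illustration). It rewrites a file with exactly the bytes it already contains, which is in effect what a regeneration with unchanged input does, and reports the filesystem modification time before and after.

```python
import os
import tempfile
from pathlib import Path


def rewrite_same_content(path):
    """Rewrite a file with the bytes it already holds.

    Returns the (old, new) modification times in nanoseconds.
    """
    path = Path(path)
    before = path.stat().st_mtime_ns
    path.write_bytes(path.read_bytes())  # content is byte-for-byte unchanged
    return before, path.stat().st_mtime_ns


with tempfile.TemporaryDirectory() as tmp:
    page = Path(tmp, "index.html")  # hypothetical output file
    page.write_text("<html>no embedded timestamp</html>")
    # Pretend the file was generated long ago.
    os.utime(page, (1_000_000_000, 1_000_000_000))
    old, new = rewrite_same_content(page)
    print("mtime moved forward:", new > old)
```

The content never changes, yet the modification time does, and that is all an mtime-based consumer looks at.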
From: Marc H. <mar...@gm...> - 2019-02-08 17:21:08
> The simplest method in my mind would be to have a post-process task
> that compares the Doxygen output directory with another copy, updating
> that copy if there is a 'significant' (not just timestamping)
> difference. It would be a simple enough program to write to customize
> for your own needs.

That was my plan B. I've implemented it in Python and it works:
https://github.com/zephyrproject-rtos/zephyr/pull/13159
restore_modifications_times.py
In this particular example it brings the incremental build time down from 70-80 seconds to less than 10 seconds.

It took a surprisingly high number of lines of code: about 100. As usual, file management in Python proved wordy. But hey, data structures suck in shell script and it's not even portable anyway. To be fair, these 100 lines include logging and a decent option parser with help text.

This script is not specific to Doxygen and could be used in other similar situations (Apache license).

Back to Doxygen: I suspect this script could be entirely avoided with, say, 10-20 lines of logic inside Doxygen itself? :-(

Marc
From: Travis E. <tra...@gm...> - 2019-02-08 17:55:55
I wonder if you could just use rsync (without timestamp checking) for this? If your:

- documentation set isn't so large that temporarily having a second set causes problems
- configuration is such that output stays the same until your source or Doxygen versions change

You could keep the real/stable output copy in one location, generate a temporary copy into a separate location, then rsync without timestamp checking (may need to use checksum mode?)

It might be possible to accomplish something similar if you keep the output in git or another VCS, if they're capable of preserving timestamps, but I haven't looked into that.
From: Marc H. <mar...@gm...> - 2019-02-08 19:15:23
On Fri, Feb 8, 2019 at 09:55, Travis Everett <tra...@gm...> wrote:
> You could keep the real/stable output copy in one location, generate a
> temporary copy into a separate location, then rsync without timestamp
> checking (may need to use checksum mode?)

I'm a big fan of rsync and I considered it. In fact one of the functions in my Python script is called "rsync" :-) However, I don't think it can do this particular timestamp restoration job, not unless you invoke it once per file, and by then you have written as much code.

Rsync can create an initial "shadow"/backup copy from the first build just fine. What I don't think rsync can do is take a completely different action depending on whether the file content has changed or not, *on a per file basis*:

  IF the file content has changed
  THEN copy data *and* timestamp TO the backup
  ELSE IF the file content hasn't changed
  THEN copy the old timestamp FROM the backup = in the *opposite* direction!

So opposite directions mean two separate rsync invocations: can you run these two rsync operations one after the other without one breaking the other? Taking into account new files showing up and obsolete files disappearing?

Another portability issue: for some unknown reason rsync also seems to be an undesired dependency for many Windows users; robocopy seems more popular. Maybe robocopy supports some NTFS features better?

Cheers,

Marc
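The per-file IF/ELSE described above can be sketched in a few lines of Python. This is only an illustration of the two-directional idea, not the script from the pull request; the function name `restore_unchanged_mtimes` and the directory arguments are assumptions. The generated output stays where Doxygen wrote it, and a backup directory remembers the content and timestamps of the previous run.

```python
import filecmp
import os
import shutil
from pathlib import Path


def restore_unchanged_mtimes(output_dir, backup_dir):
    """Apply a different action per file depending on whether content changed.

    changed or new -> copy data and timestamp TO the backup
    unchanged      -> copy the old timestamp FROM the backup
    Returns the relative paths whose old timestamp was restored.
    """
    output_dir, backup_dir = Path(output_dir), Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    restored = []
    for out in sorted(p for p in output_dir.rglob("*") if p.is_file()):
        rel = out.relative_to(output_dir)
        bak = backup_dir / rel
        if bak.is_file() and filecmp.cmp(out, bak, shallow=False):
            # Same bytes as last run: put the previous mtime back.
            st = bak.stat()
            os.utime(out, ns=(st.st_atime_ns, st.st_mtime_ns))
            restored.append(rel.as_posix())
        else:
            # New or changed: refresh the backup; copy2 also copies the mtime.
            bak.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(out, bak)
    # Obsolete files: drop backups of outputs that no longer exist.
    for bak in [p for p in backup_dir.rglob("*") if p.is_file()]:
        if not (output_dir / bak.relative_to(backup_dir)).is_file():
            bak.unlink()
    return restored
```

It would run right after each doxygen invocation and before Sphinx, so that Sphinx only sees new modification times on files whose content really changed.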
From: Travis E. <tra...@gm...> - 2019-02-08 21:08:43
This doesn't address the portability issue, but I'm not sure I see why you would need to copy timestamps in two directions?

I'm just suggesting that you use Doxygen to generate a "scratch" copy that you never deploy (and could delete immediately after rsyncing) and use a one-way rsync to another location where you store the "real" copy for use/deployment/processing/etc. Sphinx would run on the "real" copy, not the scratch one.

To illustrate, I wrote two quick files, rsynced them, waited a minute, appended to one file, rewrote one file with the same data, added a third file, and then rsynced again:

    $ echo "test1" > scratch/a
    $ echo "test2" > scratch/b
    $ rsync -rc scratch/ deploy/
    # wait a minute
    $ echo "test3" >> scratch/a
    $ echo "test2" > scratch/b
    $ echo "test3" > scratch/c
    $ rsync -rc scratch/ deploy/
    $ ls -l scratch
    ...
    -rw-r--r-- 1 a staff 12 Feb 8 14:28 a
    -rw-r--r-- 1 a staff 6 Feb 8 14:28 b
    -rw-r--r-- 1 a staff 6 Feb 8 14:28 c
    $ ls -l deploy
    ...
    -rw-r--r-- 1 a staff 12 Feb 8 14:28 a
    -rw-r--r-- 1 a staff 6 Feb 8 14:27 b
    -rw-r--r-- 1 a staff 6 Feb 8 14:28 c

Even though I wrote all 3 files in scratch, rsync only copied the two with changed *content* (the appended file and the new one). The file with no changes keeps its old timestamp in deploy. When you run Sphinx against this copy the next time, it should only process the two files with changed timestamps.
From: Marc H. <mar...@gm...> - 2019-02-08 21:42:06
On Fri, Feb 8, 2019 at 13:08, Travis Everett <tra...@gm...> wrote:
> This doesn't address the portability issue, but I'm not sure I see why you
> would need to copy timestamps in two directions?
>
> I'm just suggesting that you use Doxygen to generate a "scratch" copy that
> you never deploy (and could delete immediately after rsyncing) and using a
> one-way rsync to another location where you store the "real" copy for
> use/deployment/processing/etc. Sphinx would run on the "real" copy, not the
> scratch one.

Thanks Travis, I think you're right. I went for a "two-way" solution because it cost little more code once I opted to sort of "re-implement rsync in Python" instead of using rsync itself. It also saved me from changing the doxygen destination, which was a bit more convenient for testing and comparing. Then I lost track of this non-requirement :-)
From: Marc H. <mar...@gm...> - 2019-02-14 20:40:15
Short of actually implementing complex incremental builds, there's another, unrelated and also much simpler optimization Doxygen could do: just vaguely keep track of modified times on *input* files.

1. Offer some way to --remember the newest modified_time across all input files. Could be stored in some empty file.
2. Then have a new, optional feature that: "--runs only if any input file is newer than this (empty) file"

This would make a huge difference for incremental builds that involve not just doxygen but other (and faster) tools too; they could just skip running doxygen when not needed.

I considered filing a new doxygen feature request for this on github (and also for the previously discussed *output* mtime optimization), however https://github.com/doxygen/doxygen/issues has 1800+ open issues right now, so it feels like a black hole.

PS: If anyone has ideas on how to emulate this with a small number of lines of CMake then please share. For instance this could generate an empty file right before starting doxygen as a decent approximation.
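Outside of Doxygen, the two steps above can be approximated with an empty stamp file whose own modification time plays the role of the remembered time. The sketch below is in Python; the `--remember` and `--runs only if` options quoted above are proposals and do not exist in Doxygen, and the helper names `inputs_newer_than_stamp` and `touch_stamp` are made up for this example.

```python
from pathlib import Path


def inputs_newer_than_stamp(input_files, stamp_file):
    """Return True when a rerun is needed.

    A rerun is needed if the stamp file is missing or if any input file
    has a modification time later than the stamp's.
    """
    stamp = Path(stamp_file)
    if not stamp.exists():
        return True
    stamp_mtime = stamp.stat().st_mtime_ns
    return any(Path(f).stat().st_mtime_ns > stamp_mtime for f in input_files)


def touch_stamp(stamp_file):
    """Create the (empty) stamp file or refresh its modification time."""
    stamp = Path(stamp_file)
    stamp.parent.mkdir(parents=True, exist_ok=True)
    stamp.touch()
```

A wrapper would call `inputs_newer_than_stamp(...)` first and, only when it returns True, call `touch_stamp(...)` and then start doxygen. Touching the stamp right before the run (rather than after) is the approximation mentioned in the PS: an input edited while doxygen is running will still look newer than the stamp next time.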
From: Marc H. <mar...@gm...> - 2019-02-15 22:04:23
> Short of actually implementing complex incremental builds, there's
> another, unrelated and also much simpler optimization Doxygen could do:
> just vaguely keep track of modified times on *input* files.
> [..]
> This would make a huge difference for incremental builds that involve not
> just doxygen but other (and faster) tools too; they could just skip running
> doxygen when not needed.
>
> PS: If anyone has ideas on how to emulate this with a small number of
> lines of CMake then please share. For instance this could generate an empty
> file right before starting doxygen as a decent approximation.

OK, emulating this in CMake was much easier than I thought. It would still be simpler and safer if Doxygen provided it for any build system...

    set(DOXYGEN_RUN_STAMP build/doxygen_run_tstamp)

    # Duplicates Doxyfile, must be kept in sync manually
    file(GLOB_RECURSE DOXYGEN_SOURCES
      ../include/*.[c,h]
      ../tests/*.[c,h]
      ...
    )

    add_custom_command(
      OUTPUT ${DOXYGEN_RUN_STAMP}
      # Touch the stamp right before starting doxygen
      COMMAND ${CMAKE_COMMAND} -E touch ${DOXYGEN_RUN_STAMP}
      COMMAND doxygen ...
      ...
      # Rerun only when one of the globbed inputs is newer than the stamp
      DEPENDS ${DOXYGEN_SOURCES}
    )