Re: [Jython-dev] state of regrtests

Stefan:

You make no reference to it but surely have not overlooked Adam Burke's 
push to get clean-running regression tests on Windows (issue 2393). The 
record of that issue contains some scores from his and my experience.

I'm with you in wanting regression tests that run cleanly. My pattern of 
work is to check that the failing tests are only the usual suspects, but 
it is easy to overlook breakage that way. It also seems fairly easy for 
one of us to make a change that breaks Jython on a platform that the 
author of the change doesn't use. If we had clean-running tests, build-bots 
would be useful again, to make us all aware of that. It may be feasible 
for me to test on Linux locally after a commit and before a push.
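
A minimal sketch of how the "usual suspects" check could be automated, assuming the summary format seen in the ant output quoted further down this thread (a line ending in "tests failed:" or "fails unexpected:", followed by lines of test names). The function names are illustrative and not part of regrtest; the sample data is taken from Jim's OS X and Ubuntu runs.

```python
import re

# Assumption: the summary header looks like the ones quoted in this thread.
_HEADER = re.compile(r"\d+ (?:tests failed|fails unexpected):")


def failed_tests(log_text):
    """Collect the test names listed under a failure summary header."""
    names = set()
    collecting = False
    for raw in log_text.splitlines():
        # Drop the "[exec]" prefix that ant puts in front of each line.
        line = raw.replace("[exec]", "").strip()
        if _HEADER.search(line):
            collecting = True
            continue
        tokens = line.split()
        if collecting and tokens and all(t.startswith("test_") for t in tokens):
            names.update(tokens)
        else:
            # Any other line (including a blank one) ends the list.
            collecting = False
    return names


def compare(baseline, current):
    """Return (newly failing, no longer failing) as sorted lists."""
    return sorted(current - baseline), sorted(baseline - current)


BASELINE_LOG = """\
     [exec] 3 fails unexpected:
     [exec]     test_classpathimporter test_select test_sys_jy
"""

CURRENT_LOG = """\
     [exec] 4 fails unexpected:
     [exec]   test_classpathimporter test_jython_launcher test_select
     [exec]     test_sys_jy
"""

new, gone = compare(failed_tests(BASELINE_LOG), failed_tests(CURRENT_LOG))
print("newly failing: %s" % ", ".join(new))
print("no longer failing: %s" % ", ".join(gone))
```

Because the failures are not deterministic, a comparison like this is probably more useful against the union of several baseline runs than against a single one.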

We mainly use CPython's tests, which is good, but not exactly what we 
need. We have at least three ways of dealing with the differences 
between CPython and Jython:
1. the expected failures of "regrtest -e", varying by platform,
2. the annotation @skipIf(is_jython), and
3. Jython-specific variants of the tests.
I think we aren't consistent in using these. I've made the mistake of 
thinking the lists in regrtest.py and our skips have been carefully 
curated and I shouldn't change them lightly. But now I think we should 
be more aggressive in adding skips and raising a matching issue. I will 
do this more often.

I think adding skips to the individual test cases is a better course 
than moving a whole test module to the expected failure list in 
regrtest.py, because of the value available from the other cases in a 
module.
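
To illustrate the per-case skip, here is a small sketch. The is_jython flag is redefined locally so that the example runs on its own (in the test suite it would come from test.test_support), and the test class, the test names and the issue number placeholder are hypothetical.

```python
import sys
import unittest

# Assumption: same definition as the is_jython flag in test.test_support,
# repeated here so the sketch does not depend on the test package.
is_jython = sys.platform.startswith("java")


class ExampleTest(unittest.TestCase):

    def test_portable_behaviour(self):
        # Still runs on Jython, so the module keeps providing coverage.
        self.assertEqual(sorted([3, 1, 2]), [1, 2, 3])

    @unittest.skipIf(is_jython, "hypothetical: see tracker issue NNNN")
    def test_implementation_detail(self):
        # Stand-in for a case that is known not to work on Jython.
        self.assertEqual("abc".upper(), "ABC")


def run_example():
    """Run the two cases and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ExampleTest)
    result = unittest.TestResult()
    suite.run(result)
    return result


result = run_example()
print("ran %d, skipped %d" % (result.testsRun, len(result.skipped)))
```

Only the one case is skipped on Jython, and the skip reason gives a place to point at the matching issue.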

The variation in the success of regrtest with its environment, and from 
run to run, is frustrating. I can get different answers from "ant 
regrtest" and regrtest.py at the prompt. A careful reading of build.xml 
suggests this is because the ant task enables resources with the "--use" 
flag that are not enabled at the prompt. Tests often pass individually 
but fail under regrtest. It would be good to drive out such differences.

Adam is interested in clean-running tests so that _users_ can run 
regrtest and report incompatibilities with their environment. For this, 
we need a configuration of regrtest that runs all the tests we think are 
reliable across platforms, but this probably isn't the same 
configuration we should be using.

Jeff

On 23/10/2015 01:33, Stefan Richthofer wrote:
> To me it does not appear to be a Java8 vs Java7 issue. I tested with both 
> and get roughly the same result.
> I wouldn't have started this if it were only test_sort and a 
> handful of others failing just on Java8.
> Recent run:
> Java8:
>      [exec] 26 tests failed:
>      [exec]     test_classpathimporter test_cmd_line test_grp test_httpservers
>      [exec]     test_jython_launcher test_list_jy test_logging test_mailbox
>      [exec]     test_marshal test_os test_os_jy test_posix test_posixpath test_pwd
>      [exec]     test_quopri test_shutil test_site test_site_jy test_socket
>      [exec]     test_sort test_subprocess test_subprocess_jy test_sys_jy
>      [exec]     test_tarfile test_threading test_zipimport_jy
> Java7:
>      [exec] 23 tests failed:
>      [exec]     test_classpathimporter test_cmd_line test_grp test_httpservers
>      [exec]     test_jython_launcher test_logging test_mailbox test_os test_os_jy
>      [exec]     test_posix test_posixpath test_pwd test_quopri test_shutil
>      [exec]     test_site test_site_jy test_subprocess test_subprocess_jy
>      [exec]     test_sys_jy test_tarfile test_threading test_weakset
>      [exec]     test_zipimport_jy
> I also considered it might be due to using standalone, which I often 
> do, but without standalone it is also
> the same situation (the posted runs above were done after an ordinary 
> build). I admit that some of the
> failing tests seem to work fine when run on their own rather than with 
> ant regrtest, e.g. test_zipimport_jy
> appears to work fine then. Still - in that case it is an issue with 
> ant regrtest or something - not exactly satisfying!
> test_weakset sometimes passes on its own, sometimes fails with e.g.
> File "Lib/test/test_weakset.py", line 429, in test_weak_destroy_and_mutate_while_iterating
>     self.assertEqual(len(s), len(t))
> AssertionError: 53 != 50
> To me this does not look like an obvious or trivial issue caused by a 
> wrong flag or something.
> This was just a random sample and would need to be investigated 
> systematically.
> @Jim It is interesting that so few tests fail on your system.
> However, it is not a good situation to have such a gap between different 
> systems
> (especially if it's the same platform).
> I remember from PyCon sprints that Alex also had some more tests failing.
> So I am curious whether my system is the exotic one with >20 tests failing.
> It would be good to hear some statistics from others about this!
> How should I proceed with the failing tests? I could go through them, 
> run them in verbose mode and
> create issues where they don't yet exist. If corresponding issues 
> already exist in the tracker, the test
> should be moved to expected failures, if the solution is 
> long-term-pending, shouldn't it?
> Cleaning up these tests is some work, so I would much appreciate it if 
> someone who also experiences more
> issues than Jim could help with this. I will also repeat the tests on 
> other systems as soon as I find time and
> opportunity for this.
> -Stefan
> *Sent:* Thursday, 22 October 2015 at 23:00
> *From:* "Jim Baker" <jim...@py...>
> *To:* "Stefan Richthofer" <Ste...@gm...>
> *Cc:* "Jython Developers" <jyt...@li...>
> *Subject:* Re: [Jython-dev] state of regrtests
> Keeping a stable regrtest is something we continuously need to work 
> on. But it's not at all easy, especially on Windows. The wrap up of 
> the 2.7.0 dev cycle saw *significant* time spent on regrtest failures.
> The biggest issue currently is choice of Java version. Java 7 works 
> better with our regrtests. Java 8 complains much more. Consider the 
> craziness we do in test_sort, which raises exceptions due 
> to greater restrictions on sorting, such as Java 8 mandating that the 
> comparator be well defined. Putting on my triage hat, so to speak: 
> let's focus on building and testing against Java 7, given we have 
> bigger problems to fix, at least now for 2.7.1 and especially given 
> that we are planning a release candidate on *Nov 5*. But definitely 
> something we need to put time into for 2.7.2. Fixing these tests (most 
> likely) or underlying bugs in the runtime is especially important for 
> Java 9, a focus of the 2.7.2 release. (Java 9 is surely not going to 
> make things easier for us after all.) Perhaps some skips should be 
> conditioned on the Java version.
> The second biggest issue is tests that don't properly clean up. It is 
> still noticeable when the tests are run on Windows, between the lack of 
> deterministic collection and the inability to remove files that are in 
> use (Windows specific). We have done some work here, but more needs to 
> be done.
>
> Lastly we have networking tests that by their nature are 
> nondeterministic and can also fail because of running in specific 
> network environments, such as starting up a VPN or running in 
> corporate environments. Maybe do all your testing at Starbucks? ;) 
> It's possible we can do better skips.
> on OS X 10.11, Java 7 (1.7.0_75-b13)
>      [exec] 3 fails unexpected:
>      [exec]     test_classpathimporter test_select test_sys_jy
> I then rerun any failed tests to see if flaky or not, with
> dist/bin/jython regrtest.py --verbose test_classpathimporter test_select test_sys_jy
> At which point only test_classpathimporter fails (usually). There's an 
> open bug on test_classpathimporter, but for Windows only 
> (http://bugs.jython.org/issue2309). We need to look into this. In any 
> event, rerunning just the failed tests, individually as necessary, 
> makes sure we are not having flaky tests obscure important regressions.
> On Ubuntu 15.04 (I will try 15.10 in a few days...), I get the following:
>      [exec] 4 fails unexpected:
>      [exec]   test_classpathimporter test_jython_launcher test_select
>      [exec]     test_sys_jy
> Retrying, I observe test_jython_launcher is problematic on Ubuntu 
> 15.04, in addition to test_classpathimporter. We will want to look 
> into this failure.
> - Jim
> On Wed, Oct 21, 2015 at 8:28 PM, Stefan Richthofer 
> <Ste...@gm...> wrote:
>
>     Hello everybody,
>
>     on my last commit I missed an issue at first, which was mainly due
>     to the overwhelming
>     number of routinely failing regrtests.
>     regrtests are actually not meant to fail, are they? Correct me, if
>     I'm wrong, but our
>     workflow should be:
>
>     1) fix an issue
>     2) check if regrtests pass
>     3) no: Fix it; goto 2)
>        yes: commit
>
>
>     However, the workflow actually is:
>
>     1) fix an issue
>     2) note that so many regrtests fail that you can hardly assess
>     whether you caused some of this
>     3) obtain another clone of Jython in the state before your fix
>     4) run old regrtests
>     5) try to compare results before your fix and after
>     6) note that it is actually not deterministic which regrtests fail
>     7) run again and again to get a feeling for the whole set of tests
>     that potentially fail from time to time
>     8) Note regarding 7: Roughly half the runs hang, e.g. with [exec]
>     error: [Errno 24] Cannot allocate thread pool for server socket
>        In that case you must start over and it takes another 20
>     minutes to get another sample of resulting failures
>     9) try to assess whether there are tests that fail significantly
>     more often after your fix than before
>     10) check whether some of these might be caused by your change
>     ...
>     ?) finally merge and hope you got it right
>
>
>     I remember that we had the number of failing regrtests down to 4-6
>     in 2.7.0, which was still not ideal,
>     but at least somehow trackable. Now I get 22-26 failing regrtests
>     and lots of hangs with the current Jython
>     repository version. E.g. one of my best runs this evening/night
>     resulted in:
>
>          [exec] 22 tests failed:
>          [exec]     test_classpathimporter test_cmd_line test_grp test_httpservers
>          [exec]     test_jython_launcher test_logging test_mailbox test_os test_os_jy
>          [exec]     test_posix test_posixpath test_pwd test_quopri test_shutil
>          [exec]     test_site test_site_jy test_subprocess test_subprocess_jy
>          [exec]     test_sys_jy test_tarfile test_weakset test_zipimport_jy
>
>
>     Seriously, that is about five times as many failing regrtests as in
>     2.7.0. This way regrtesting is basically unusable for its intended
>     purpose. Any suggestions how to improve this? Can we maybe please
>     just remove routinely failing regrtests and make them explicit
>     issues, to get back to the situation that regrtests don't fail by
>     default?
>     I'm not expert enough about every Jython detail to tell which of
>     these tests fail for good
>     reasons, so this is something the community must agree on I
>     suppose. And please let's solve
>     this before 2.7.1 release.
>
>     Best
>
>     Stefan
>
>     ------------------------------------------------------------------------------
>     _______________________________________________
>     Jython-dev mailing list
>     Jyt...@li...
>     https://lists.sourceforge.net/lists/listinfo/jython-dev