From: Jackson, A. <And...@bl...> - 2013-06-07 13:38:35
|
Hi Ilya,

Thanks for the info.

Just to be clear, I am not trying to argue that you or IA should have done anything differently. You are doing what you need to do to get your work done. It’s not reasonable for the rest of us to expect you to implement the tests that we need! Rather, I am trying to argue that if the IIPC members want portable, stable, predictable and heavily tested releases of these tools, then we all need to step up and make it happen (rather than passively relying on IA, or patching and fixing in private). This is not so hard, but does require some communication and technical effort.

My preferred approach is that, starting with Wayback, we make the IIPC GitHub fork into the ‘canonical’ version, and IIPC members work together to take over managing the roadmap, reviewing issues and pull requests, testing and releases. Any member would be free to do their own thing, of course, but the IIPC roadmap would pin particular features (e.g. deduplication variants) to upcoming major/minor releases, and pool bug-fixes into point releases. IA would no longer be responsible for making official releases.

Ideally, an IIPC-funded role would oversee the development, communications, and the release process, so that everyone knows where we are. For example, in the case of the ACC-126 bug, that person could have checked whether the issue was being dealt with, marked the bug as ‘assigned’, and pinned the issue against the 1.8.0 release on the JIRA ACC roadmap [1]. This stuff takes time and effort we don’t always have, but a little feedback makes all the difference to the success of an open source project like this one. It’s very disheartening to feel like your tickets and pull-requests are disappearing into a black hole.

I imagine a pool of ‘core committers’ (drawn from the IIPC membership) would have to be set up to support this individual, agree the roadmap, help review difficult pull requests (e.g. ones that mean changing tests), and so on.
For example, the project lead might be responsible for ensuring that there is an up-to-date roadmap, but would *not* actually be responsible for creating it – the core committers would have to do that. That group could also define policies to make the development coordinator’s job easier, e.g. ‘if your pull request contains a new feature without a new test, it will be rejected’ [2].

I think we all want something like this to work, and that we all want to pool our resources as efficiently as possible (especially when we are all working around or patching the same bugs in private). We’re all just pressed for time, so I think things will work really well if IIPC can invest in making sure the information moves around, a roadmap can be agreed, the issues and pull-requests are reviewed, the tests pass, and the releases happen.

This is just my proposal for discussion, now and at the autumn meeting. I’ll happily go along with whatever structure or process makes this work.

As for my integration tests, I’m planning to set them up in a separate project for now, to see how they work (https://github.com/ukwa/warc-explorer, specifically in the warc-explorer-wayback sub-project, which takes the IA Wayback release and overlays it with a suitable config for local testing). Once this seems to be working, I’d be interested in patching it back into the main Wayback codebase.

Thanks,
Andy

1. https://webarchive.jira.com/browse/ACC#selectedTab=com.atlassian.jira.plugin.system.project%3Aroadmap-panel
2. See for example https://github.com/diaspora/diaspora/wiki/Pull-Request-Guidelines, https://django-admin2.readthedocs.org/en/latest/contributing.html, https://github.com/adobe/brackets/wiki/Pull-Request-Review-Checklist etc.

From: Ilya Kreymer [mailto:il...@ar...]
Sent: 06 June 2013 20:13
To: arc...@li...
Subject: Re: [Archive-access-discuss] Wayback Indexer

Hi Andy,

I totally agree with you regarding the need for additional integration tests.
We have unfortunately not had the resources to devote to ensuring full stability of the snapshot distributions, but we are now focusing on creating a stable 1.8.0 release in the upcoming month(s). If you have any integration tests you would like to contribute or suggest, please let me know.

I am aware of this bug that was filed regarding url-agnostic dedup: https://webarchive.jira.com/browse/ACC-126

This is planned to be addressed before the 1.8.0 release. If there are other bug reports, feel free to file them under this JIRA.

I believe the meeting in the fall is planned to better figure out how to ensure the stability of wayback in the long term for the IIPC.

Thanks,
Ilya
Engineer IA

On 06/06/2013 09:13 AM, Jackson, Andrew wrote:

It's not just the indexer. The front-end logic and the coupling to H3 have all been problematic recently. We have suffered a range of problems deploying recent Wayback versions, due to unintended consequences of recent changes that break functionality that we require. As well as the de-duplication problems I mentioned in a separate email, we've also had issues with Memento access points (which don't return link-format timemaps as they should/used to) and the XML query endpoint failing under certain conditions (due to changes in URL handling/'cleaning').

In my opinion, one of the critical jobs for the future Wayback OS project is to set up proper, automated integration tests that exercise all the functionality the IIPC partners need, and will therefore detect if changes to the source code have unintentionally altered critical behaviour.

It is technically fairly straightforward to make an integration test that, say, indexes a few WARCs, fires up a Wayback instance, and checks the responses to some queries. It does, of course, require some investment of time and effort. However, that investment would enable future modifications to the code base to be carried out with far more confidence.
I've started doing some work in this area, but would appreciate knowing if anyone else is willing to put some effort into building up the testing framework.

Thanks,
Andy

-----Original Message-----
From: Jones, Gina [mailto:gj...@lo...]
Sent: 06 June 2013 13:13
To: arc...@li...
Subject: [Archive-access-discuss] Wayback Indexer

I believe that the wayback indexer is the weakest link to long-term access to our collections. And it isn't obvious sometimes what is going on when you index content until you actually access that content.

One of the projects I want to do this year (or next) is to take the available indexers and index a set of content that we have (2000-now) and review the output.

gina

------------------------------------------------------------------------------
How ServiceNow helps IT people transform IT departments:
1. A cloud service to automate IT design, transition and operations
2. Dashboards that offer high-level views of enterprise services
3. A single system of record for all IT processes
http://p.sf.net/sfu/servicenow-d2d-j
_______________________________________________
Archive-access-discuss mailing list
Arc...@li...
https://lists.sourceforge.net/lists/listinfo/archive-access-discuss

**************************************************************************
Experience the British Library online at http://www.bl.uk/
The British Library’s latest Annual Report and Accounts : http://www.bl.uk/aboutus/annrep/index.html
Help the British Library conserve the world's knowledge. Adopt a Book. http://www.bl.uk/adoptabook
The Library's St Pancras site is WiFi - enabled
*************************************************************************
The information contained in this e-mail is confidential and may be legally privileged. It is intended for the addressee(s) only. If you are not the intended recipient, please delete this e-mail and notify the mailto:pos...@bl... : The contents of this e-mail must not be disclosed or copied without the sender's consent.
The statements and opinions expressed in this message are those of the author and do not necessarily reflect those of the British Library. The British Library does not take any responsibility for the views of the author. ************************************************************************* Think before you print |