From: Tomasz P. <tom...@gm...> - 2013-04-12 18:56:42
I've just committed more test cases. None of the 6 causes an OOM anymore, which is marvellous. However, case1 reports false, but I'm positive these graphs are actually equal.

Thanks,
Tom

On Fri, Apr 12, 2013 at 8:33 PM, Rob Vesse <rv...@do...> wrote:
> Those would be useful
>
> Btw I closed the issue branch so please just add the tests to default
>
> Rob
>
> On 4/12/13 11:23 AM, "Tomasz Pluskiewicz" <tom...@gm...>
> wrote:
>
>>Hi Rob
>>
>>Thanks so much. And yes, I do have 4 or 5 cases which stumble on this
>>same issue. I will add all these to the test fixture.
>>
>>Tom
>>
>>On Fri, Apr 12, 2013 at 8:20 PM, Rob Vesse <rv...@do...> wrote:
>>> Hey Tom
>>>
>>> This should now be fixed for your test case, though I am not 100%
>>> convinced that brute forcing is not still broken.
>>>
>>> What I have done to fix this is to add an intermediate step between the
>>> rules-based and brute-force mapping which takes a divide-and-conquer
>>> approach.
>>>
>>> What this does is break the unmapped blank node portions of the graph
>>> into their constituent isolated sub-graphs (those that share no blank
>>> nodes) and then recursively call Equals() on the candidate matches for
>>> the sub-graphs. This approach reduces the amount of work required and
>>> the likelihood of needing to brute force at all, though we still fall
>>> back to brute force in the worst case.
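The splitting step Rob describes can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not dotNetRDF's GraphMatcher code: triples are modelled as a hypothetical `Triple` record of plain strings, blank nodes are assumed to be labels starting with `_:`, and `BlankNodeComponents.Split` is an invented name. It uses union-find to group the triples that mention blank nodes into sub-graphs that share no blank nodes.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical triple model for illustration: every term is a string and
// blank nodes are assumed to be labels that start with "_:".
public record Triple(string S, string P, string O);

public static class BlankNodeComponents
{
    static bool IsBlank(string term) => term.StartsWith("_:", StringComparison.Ordinal);

    // Union-find lookup with path halving.
    static string Find(Dictionary<string, string> parent, string x)
    {
        while (parent[x] != x)
        {
            parent[x] = parent[parent[x]];
            x = parent[x];
        }
        return x;
    }

    static void Union(Dictionary<string, string> parent, string a, string b)
    {
        string ra = Find(parent, a), rb = Find(parent, b);
        if (ra != rb) parent[ra] = rb;
    }

    // Groups the triples that mention at least one blank node into
    // sub-graphs that share no blank nodes with each other.
    public static List<List<Triple>> Split(IEnumerable<Triple> triples)
    {
        var blankTriples = triples.Where(t => IsBlank(t.S) || IsBlank(t.O)).ToList();
        var parent = new Dictionary<string, string>();

        foreach (var t in blankTriples)
        {
            foreach (var term in new[] { t.S, t.O })
                if (IsBlank(term) && !parent.ContainsKey(term))
                    parent[term] = term;

            // A triple linking two blank nodes puts them in the same sub-graph.
            if (IsBlank(t.S) && IsBlank(t.O))
                Union(parent, t.S, t.O);
        }

        // Key each triple by the representative of one of its blank nodes.
        return blankTriples
            .GroupBy(t => Find(parent, IsBlank(t.S) ? t.S : t.O))
            .Select(g => g.ToList())
            .ToList();
    }
}
```

Matching each sub-graph separately replaces one search over all unmapped blank nodes with several much smaller searches; a real matcher would still have to pair up sub-graphs across the two graphs (for example by size) before comparing them.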
>>>
>>> If you can come up with any more graphs that break GraphMatcher, those
>>> would be much appreciated.
>>>
>>> Rob
>>>
>>> On 4/12/13 10:25 AM, "Rob Vesse" <rv...@do...> wrote:
>>>
>>>>s/not/now
>>>>
>>>>That should be "the test will now complete within the timeout"
>>>>
>>>>Rob
>>>>
>>>>On 4/12/13 10:23 AM, "Rob Vesse" <rv...@do...> wrote:
>>>>
>>>>>Hey Tom
>>>>>
>>>>>So the logic for generating the brute-force mappings was completely
>>>>>broken, causing it to get stuck in a memory-sucking spin cycle :(
>>>>>
>>>>>I rewrote the GenerateMappings() method from scratch to use yield
>>>>>return and the test will not complete within the timeout, but it fails,
>>>>>so I still need to dig further.
>>>>>
>>>>>We may still be generating incorrect possible mappings, or the logic
>>>>>for brute force may be flawed elsewhere.
>>>>>
>>>>>Rob
>>>>>
>>>>>On 4/9/13 10:34 AM, "Rob Vesse" <rv...@do...> wrote:
>>>>>
>>>>>>Hey Tom
>>>>>>
>>>>>>The problem is that no polynomial-time algorithm is known for graph
>>>>>>isomorphism, so sometimes the only option we have is to attempt to
>>>>>>brute force the problem.
>>>>>>
>>>>>>I've started adding some Debug.WriteLine() calls to GraphMatcher to
>>>>>>track down where things go wrong.
>>>>>>
>>>>>>Your graphs may look trivially equal, but to the code they are not.
>>>>>>The reason this worked prior to 0.8.0 is that one of the things we try
>>>>>>is a trivial mapping (assume blank nodes have the same IDs in both
>>>>>>graphs), so in previous releases you would likely have hit this case
>>>>>>and been fine.
>>>>>>
>>>>>>You have 33 blank nodes in the graph, of which only 6 are uniquely
>>>>>>identifiable and mappable. The matcher generates a candidate mapping
>>>>>>for the whole graph, but its best effort is incorrect, so it then falls
>>>>>>back to brute force. I need to dig further into whether the candidate
>>>>>>mapping could be improved, but this is not trivial to debug and will
>>>>>>take some time to resolve.
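To give a sense of scale for the brute-force fallback: in the worst case, if each of the 27 blank nodes that could not be uniquely identified (33 minus 6) were allowed to map to any of the 27 unmapped nodes in the other graph, there would be 27! (about 1.09 × 10^28) complete mappings, so pre-generating them all cannot work. The sketch below shows the general `yield return` idea under stated assumptions; it is not dotNetRDF's GenerateMappings(), and `LazyMappings.Generate` plus its string-label inputs are hypothetical. It is a backtracking enumerator that produces one injective mapping at a time and keeps only the current partial mapping in memory, so a caller can stop at the first mapping that verifies.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class LazyMappings
{
    // Lazily enumerates every injective mapping that sends each source blank
    // node to one of its allowed candidate blank nodes in the other graph.
    // Only the current partial mapping is held in memory.
    public static IEnumerable<Dictionary<string, string>> Generate(
        IReadOnlyList<string> sources,
        IReadOnlyDictionary<string, List<string>> candidates)
    {
        // Fresh state for every enumeration of the returned sequence.
        var current = new Dictionary<string, string>();
        var used = new HashSet<string>();

        foreach (var mapping in Extend(0))
            yield return mapping;

        IEnumerable<Dictionary<string, string>> Extend(int index)
        {
            if (index == sources.Count)
            {
                // Hand the caller a snapshot, since 'current' keeps changing.
                yield return new Dictionary<string, string>(current);
                yield break;
            }

            string source = sources[index];
            foreach (string target in candidates[source])
            {
                if (!used.Add(target)) continue; // already taken by an earlier source
                current[source] = target;

                foreach (var mapping in Extend(index + 1))
                    yield return mapping;

                // Backtrack before trying the next candidate.
                current.Remove(source);
                used.Remove(target);
            }
        }
    }
}
```

Nested `yield return` recursion passes each mapping back through every level, which costs time proportional to the depth per result, but memory stays proportional to the number of blank nodes rather than to the number of mappings.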
>>>>>>
>>>>>>We may be able to reduce the "memory leak" by using yield rather than
>>>>>>pre-generating all possible mappings, but this is a tricky refactor.
>>>>>>It's been a long time since I wrote the code originally, and I
>>>>>>remember that doing the mapping in the yield form proved thorny at the
>>>>>>time, so I chose not to. The code itself for generating the mappings
>>>>>>has some slightly strange things in it, so I really need to spend a
>>>>>>block of time refreshing myself on the logic there to check that it is
>>>>>>sound before I attempt to refactor.
>>>>>>
>>>>>>Rob
>>>>>>
>>>>>>On 4/7/13 11:20 AM, "Tomasz Pluskiewicz" <tom...@gm...>
>>>>>>wrote:
>>>>>>
>>>>>>>Hm, I was wrong actually.
>>>>>>>
>>>>>>>I tried comparing the exact same graphs loaded from Turtle in the
>>>>>>>dotNetRDF test project, but I got the unit test wrong.
>>>>>>>
>>>>>>>I have added the CORE-345 bug and committed a failing test case [1].
>>>>>>>Could you please have a look at this?
>>>>>>>
>>>>>>>Thanks,
>>>>>>>Tom
>>>>>>>
>>>>>>>[1]: https://bitbucket.org/dotnetrdf/dotnetrdf/commits/branch/CORE-345
>>>>>>>
>>>>>>>On Sun, Apr 7, 2013 at 7:36 PM, Tomasz Pluskiewicz
>>>>>>><tom...@gm...> wrote:
>>>>>>>> Hi Rob
>>>>>>>>
>>>>>>>> I finally got back to R2RML to analyze why I am getting that memory
>>>>>>>> leak. It seems connected to the changes you had to introduce for
>>>>>>>> SPARQL 1.1.
>>>>>>>>
>>>>>>>> I have determined that it happens in the
>>>>>>>> GraphMatcher#GenerateMappings method. The graphs are equal and I'm
>>>>>>>> not sure what causes the problem. As soon as TryBruteForceMapping is
>>>>>>>> reached, memory consumption explodes to gigabytes within minutes.
>>>>>>>>
>>>>>>>> The low-level problem is the mappings variable in GenerateMappings,
>>>>>>>> which within a few iterations contains thousands of elements.
>>>>>>>>
>>>>>>>> This problem no longer occurs on trunk. Have you actually been
>>>>>>>> introducing any fixes around that area?
>>>>>>>>
>>>>>>>> Tom
>>>>>>>>
>>>>>>>> On Mon, Jan 14, 2013 at 12:32 PM, Rob Vesse <rv...@do...>
>>>>>>>> wrote:
>>>>>>>>> Comments inline:
>>>>>>>>>
>>>>>>>>> On 1/10/13 7:14 PM, "Tomek Pluskiewicz" <to...@pl...>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>>Hi Rob
>>>>>>>>>>
>>>>>>>>>>I have just updated to the latest dotNetRDF available on NuGet and
>>>>>>>>>>I'm experiencing two issues.
>>>>>>>>>>
>>>>>>>>>>1. In my unit tests I relied on the way the library assigns blank
>>>>>>>>>>node identifiers: autos1, autos2 and so on. When I run the tests
>>>>>>>>>>separately each one passes, but when I batch them they fail because
>>>>>>>>>>in subsequent tests blank nodes are named autos2, autos3, etc.
>>>>>>>>>>However, they don't share the same graph or triple store. Have you
>>>>>>>>>>changed this behavior deliberately?
>>>>>>>>>
>>>>>>>>> Yes, this behavior changed in the 0.8.x releases. The change was
>>>>>>>>> made in order to resolve a bug in SPARQL 1.1 Update support, and it
>>>>>>>>> also uncovered a bug in the graph isomorphism calculation, which was
>>>>>>>>> fixed.
>>>>>>>>>
>>>>>>>>> You shouldn't rely on an internal implementation detail like how the
>>>>>>>>> library assigns blank node identifiers. Blank nodes should always be
>>>>>>>>> identifiable by the triples they appear in, so it should be possible
>>>>>>>>> to formulate API calls or SPARQL queries that validate that you have
>>>>>>>>> produced the data you expected.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>2. There is a bad memory leak during SPARQL execution of this:
>>>>>>>>>
>>>>>>>>> Define "bad memory leak"?
>>>>>>>>>
>>>>>>>>> Updates are transactional, so it may be a side effect of the library
>>>>>>>>> maintaining the state necessary to roll back the transaction should
>>>>>>>>> it fail or be aborted. Also, the fact that you are replacing constant
>>>>>>>>> nodes with blank nodes will assign a lot of new identifiers, and
>>>>>>>>> those identifiers have to be tracked to prevent collisions.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>PREFIX rr: <http://www.w3.org/ns/r2rml#>
>>>>>>>>>>DELETE { ?map rr:graph ?value . }
>>>>>>>>>>INSERT { ?map rr:graphMap [ rr:constant ?value ] . }
>>>>>>>>>>WHERE { ?map rr:graph ?value } ;
>>>>>>>>>>
>>>>>>>>>>DELETE { ?map rr:object ?value . }
>>>>>>>>>>INSERT { ?map rr:objectMap [ rr:constant ?value ] . }
>>>>>>>>>>WHERE { ?map rr:object ?value } ;
>>>>>>>>>>
>>>>>>>>>>DELETE { ?map rr:predicate ?value . }
>>>>>>>>>>INSERT { ?map rr:predicateMap [ rr:constant ?value ] . }
>>>>>>>>>>WHERE { ?map rr:predicate ?value } ;
>>>>>>>>>>
>>>>>>>>>>DELETE { ?map rr:subject ?value . }
>>>>>>>>>>INSERT { ?map rr:subjectMap [ rr:constant ?value ] . }
>>>>>>>>>>WHERE { ?map rr:subject ?value }
>>>>>>>>>>
>>>>>>>>>>The full code is simply:
>>>>>>>>>>
>>>>>>>>>>using VDS.RDF.Parsing;          // SparqlUpdateParser
>>>>>>>>>>using VDS.RDF.Query.Datasets;   // InMemoryDataset
>>>>>>>>>>using VDS.RDF.Update;           // ISparqlUpdateProcessor, LeviathanUpdateProcessor
>>>>>>>>>>
>>>>>>>>>>var dataset = new InMemoryDataset(store, R2RMLMappings.BaseUri);
>>>>>>>>>>ISparqlUpdateProcessor processor = new LeviathanUpdateProcessor(dataset);
>>>>>>>>>>var updateParser = new SparqlUpdateParser();
>>>>>>>>>>
>>>>>>>>>>processor.ProcessCommandSet(updateParser.ParseFromString(ShortcutSubmapsReplaceSparql));
>>>>>>>>>>
>>>>>>>>>>Is this a known problem that has already been fixed, or should I
>>>>>>>>>>investigate more closely?
>>>>>>>>>
>>>>>>>>> This is not a known issue. I would also guess that the data being
>>>>>>>>> used would have some bearing on the severity of the problem. Please
>>>>>>>>> go ahead and investigate, but I would suspect it is the two things I
>>>>>>>>> outlined above which are the culprits here.
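The effect of the four DELETE/INSERT operations quoted above can be simulated in memory to show where the new blank nodes come from. This is an assumption-laden sketch, not how LeviathanUpdateProcessor works: triples are a hypothetical `Triple` record of prefixed-name strings, and `ShortcutExpansion.Expand` is an invented helper. Each shortcut triple (`rr:graph`, `rr:object`, `rr:predicate`, `rr:subject`) is replaced by the corresponding `...Map` triple pointing at a fresh blank node that carries `rr:constant`, and a set of already-issued labels is consulted so that a new label never collides with an existing one, which is the kind of bookkeeping Rob mentions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical triple model: prefixed names as strings, blank nodes as "_:label".
public record Triple(string S, string P, string O);

public static class ShortcutExpansion
{
    // Shortcut predicate -> term-map predicate, mirroring the four operations.
    static readonly Dictionary<string, string> Shortcuts = new()
    {
        ["rr:graph"] = "rr:graphMap",
        ["rr:object"] = "rr:objectMap",
        ["rr:predicate"] = "rr:predicateMap",
        ["rr:subject"] = "rr:subjectMap",
    };

    public static List<Triple> Expand(IEnumerable<Triple> input)
    {
        var triples = input.ToList();

        // Every blank node label already in use; fresh labels must avoid these.
        var issued = new HashSet<string>(
            triples.SelectMany(t => new[] { t.S, t.O })
                   .Where(x => x.StartsWith("_:", StringComparison.Ordinal)));
        int counter = 0;

        string Fresh()
        {
            string label;
            do { label = "_:b" + ++counter; } while (!issued.Add(label));
            return label;
        }

        var output = new List<Triple>();
        foreach (var t in triples)
        {
            if (Shortcuts.TryGetValue(t.P, out var mapPredicate))
            {
                // One new blank node per matching triple, like [ rr:constant ?value ].
                string bnode = Fresh();
                output.Add(new Triple(t.S, mapPredicate, bnode));
                output.Add(new Triple(bnode, "rr:constant", t.O));
            }
            else
            {
                output.Add(t); // not a shortcut: kept unchanged
            }
        }
        return output;
    }
}
```

In this sketch the `issued` set grows by one entry for every rewritten triple, so its size scales with the number of shortcuts in the input.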
>>>>>>>>>
>>>>>>>>> Rob
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>Thanks,
>>>>>>>>>>Tom
>>>>>>>>>>
>>>>>>>>>>_______________________________________________
>>>>>>>>>>dotNetRDF-bugs mailing list
>>>>>>>>>>dot...@li...
>>>>>>>>>>https://lists.sourceforge.net/lists/listinfo/dotnetrdf-bugs