|
From: Tomasz P. <tom...@gm...> - 2013-04-12 18:24:12
|
Hi Rob Thanks so much. And yes, I do have 4 or 5 cases which stumble on this same issue. I will add all these to the test fixture. Tom On Fri, Apr 12, 2013 at 8:20 PM, Rob Vesse <rv...@do...> wrote: > Hey Tom > > This should now be fixed for your test case though I am not 100% convinced > that brute forcing is not still broken > > What I have done to fix this is to add an intermediate step between the > rules based and brute force mapping which does a divide and conquer > approach > > What this does is break the unmapped blank node portions of the graph into > its constituent isolated sub-graphs (those that share no blank nodes) and > then recursively calls Equals() on the candidate matches for the > sub-graphs. This approach reduces the amount of work required and the > likelihood of needing to brute force at all though we still fall back in > the worst case. > > If you can come up with any more graphs that break GraphMatcher those > would be much appreciated > > Rob > > On 4/12/13 10:25 AM, "Rob Vesse" <rv...@do...> wrote: > >>s/not/now >> >>That should be "the test will now complete within the timeout" >> >>Rob >> >>On 4/12/13 10:23 AM, "Rob Vesse" <rv...@do...> wrote: >> >>>Hey Tom >>> >>>So the logic for generating the brute force mappings was completely >>>broken >>>causing it to get stuck in a memory sucking spin cycle :( >>> >>>I rewrote the GenerateMappings() method from scratch to use yield return >>>and the test will not complete within the timeout but it fails so I still >>>need to dig further >>> >>>We may still be generating incorrect possible mappings or the logic for >>>brute force may be flawed elsewhere >>> >>>Rob >>> >>>On 4/9/13 10:34 AM, "Rob Vesse" <rv...@do...> wrote: >>> >>>>Hey Tom >>>> >>>>The problem is that graph isomorphism is NP-hard so sometimes the only >>>>option we have is to attempt to brute force the problem >>>> >>>>I've started added some Debug.WriteLine() to GraphMatcher to track down >>>>where things go wrong >>>> >>>>For your graphs they may look trivially equal but to code they are not, >>>>the reason this worked prior to 0.8.0 is that one of the things we try >>>>is >>>>a trivial mapping (assume blank nodes have same IDs in both graphs) so >>>>in >>>>previous releases you would likely have hit this case and been fine. >>>> >>>>You have 33 blank nodes in the graph of which only 6 are uniquely >>>>identifiable and mappable. The matcher generates a candidate mapping >>>>for >>>>the whole graph but its best effort is incorrect, so then it falls back >>>>to >>>>brute force. I need to dig further into whether the candidate mapping >>>>could be improved but this is not trivial to debug and will take some >>>>time >>>>to resolve. >>>> >>>>We may be able to reduce the "memory leak" by using yield rather than >>>>pre-generating all possible mapping but this is a tricky refactor, it's >>>>been a long time since I wrote the code originally and I remember that >>>>doing the mapping in the yield form proved thorny at the time so I chose >>>>not to. The code itself for generating the mappings has some slightly >>>>strange things in it so I really need to spend a block of time >>>>refreshing >>>>myself on the logic there to check that it is sound before I attempt to >>>>refactor. >>>> >>>>Rob >>>> >>>>On 4/7/13 11:20 AM, "Tomasz Pluskiewicz" <tom...@gm...> >>>>wrote: >>>> >>>>>Hm, I was wrong actually. >>>>> >>>>>I tried comparing the exact same graphs loaded from Turtle in >>>>>dotNetRDF test project but I got the unit test wrong. >>>>> >>>>>I have added the CORE-345 bug and committed a failing test case [1]. >>>>>Could you please have a look at this? >>>>> >>>>>Thanks, >>>>>Tom >>>>> >>>>>[1]: https://bitbucket.org/dotnetrdf/dotnetrdf/commits/branch/CORE-345 >>>>> >>>>>On Sun, Apr 7, 2013 at 7:36 PM, Tomasz Pluskiewicz >>>>><tom...@gm...> wrote: >>>>>> Hi Rob >>>>>> >>>>>> I finally got back to R2RML to analyze why I am getting that memory >>>>>> leak. It seems connected to the changes you had to introduce for >>>>>> SPARQL 1.1. >>>>>> >>>>>> I have determined that it happens in GraphMatcher#GenerateMappings >>>>>> method. The graphs are equal and I'm not sure what causes the >>>>>>problem. >>>>>> As soon as TryBruteForceMapping is reached memory consumption >>>>>>explodes >>>>>> to gigabytes within minutes. >>>>>> >>>>>> The low-level problem is the mappings variable in the >>>>>> GenerateMappings, which within a few iteration contains thousands of >>>>>> elements. >>>>>> >>>>>> This problem no longer occurs on trunk. Have you actually been >>>>>> introducing any fixes around that area? >>>>>> >>>>>> Tom >>>>>> >>>>>> On Mon, Jan 14, 2013 at 12:32 PM, Rob Vesse <rv...@do...> >>>>>>wrote: >>>>>>> Comments inline: >>>>>>> >>>>>>> On 1/10/13 7:14 PM, "Tomek Pluskiewicz" <to...@pl...> >>>>>>>wrote: >>>>>>> >>>>>>>>Hi Rob >>>>>>>> >>>>>>>>I have just updated to latest dotNetRDF available on NuGet and I'm >>>>>>>>experiencing two issues. >>>>>>>> >>>>>>>>1. In my unit tests I relied on the way the library assigns blank >>>>>>>>node >>>>>>>>identifiers: autos1, autos2 and so on. When I run the tests >>>>>>>>separately >>>>>>>>each one passes but when I batch them they fail because in >>>>>>>>subsequent >>>>>>>>tests blank nodes are name autos2, autos3, etc. However they don't >>>>>>>>share the same graph or triple store. Have you changed this behavior >>>>>>>>delbierately? >>>>>>> >>>>>>> Yes this behavior changed in the 0.8.x releases, the change was made >>>>>>>in >>>>>>> order to resolve a bug in SPARQL 1.1 Update support and also >>>>>>>uncovered >>>>>>>a >>>>>>> bug in graph isomorphism calculation which was fixed. >>>>>>> >>>>>>> You shouldn't rely on an internal implementation detail like how the >>>>>>> library assigns blank node identifiers. Blank nodes should always >>>>>>>be >>>>>>> identifiable by the triples they appear in so it should be possible >>>>>>>to >>>>>>> formulate API calls or SPARQL queries that validate that you have >>>>>>>produced >>>>>>> the data you expected. >>>>>>> >>>>>>>> >>>>>>>>2. There is a bad memory leak in during SPARQL execution of this: >>>>>>> >>>>>>> Define bad memory leak? >>>>>>> >>>>>>> Updates are transactional so it may be a side effect of the library >>>>>>> maintaining the state necessary to rollback the transaction should >>>>>>>it >>>>>>>fail >>>>>>> or be aborted. Also the fact that you are replacing constant nodes >>>>>>>with >>>>>>> blank nodes will assign a lot of new identifiers and those >>>>>>>identifiers >>>>>>> have to be tracked to prevent collisions. >>>>>>> >>>>>>>> >>>>>>>>PREFIX rr: <http://www.w3.org/ns/r2rml#> >>>>>>>>DELETE { ?map rr:graph ?value . } >>>>>>>>INSERT { ?map rr:graphMap [ rr:constant ?value ] . } >>>>>>>>WHERE { ?map rr:graph ?value } ; >>>>>>>> >>>>>>>>DELETE { ?map rr:object ?value . } >>>>>>>>INSERT { ?map rr:objectMap [ rr:constant ?value ] . } >>>>>>>>WHERE { ?map rr:object ?value } ; >>>>>>>> >>>>>>>>DELETE { ?map rr:predicate ?value . } >>>>>>>>INSERT { ?map rr:predicateMap [ rr:constant ?value ] . } >>>>>>>>WHERE { ?map rr:predicate ?value } ; >>>>>>>> >>>>>>>>DELETE { ?map rr:subject ?value . } >>>>>>>>INSERT { ?map rr:subjectMap [ rr:constant ?value ] . } >>>>>>>>WHERE { ?map rr:subject ?value } >>>>>>>> >>>>>>>>The full code is simply: >>>>>>>> >>>>>>>>var dataset = new InMemoryDataset(store, R2RMLMappings.BaseUri); >>>>>>>> ISparqlUpdateProcessor processor = new >>>>>>>>LeviathanUpdateProcessor(dataset); >>>>>>>> var updateParser = new SparqlUpdateParser(); >>>>>>>> >>>>>>>> >>>>>>>>processor.ProcessCommandSet(updateParser.ParseFromString(ShortcutSub >>>>>>>>m >>>>>>>>a >>>>>>>>p >>>>>>>>sRe >>>>>>>>placeSparql)); >>>>>>>> >>>>>>>>Is this a know problem and has been already fixed or should I >>>>>>>>investigate closely? >>>>>>> >>>>>>> This is not a known issue, I would also guess that the data being >>>>>>>used >>>>>>> would have some bearing on the severity of the problem. Please go >>>>>>>ahead >>>>>>> and investigate but I would suspect it is the two things I outlined >>>>>>>above >>>>>>> which are the culprits here. >>>>>>> >>>>>>> Rob >>>>>>> >>>>>>>> >>>>>>>>Thanks, >>>>>>>>Tom >>>>>>>> >>>>>>>>-------------------------------------------------------------------- >>>>>>>>- >>>>>>>>- >>>>>>>>- >>>>>>>>--- >>>>>>>>---- >>>>>>>>Master Visual Studio, SharePoint, SQL, ASP.NET, C# 2012, HTML5, CSS, >>>>>>>>MVC, Windows 8 Apps, JavaScript and much more. Keep your skills >>>>>>>>current >>>>>>>>with LearnDevNow - 3,200 step-by-step video tutorials by Microsoft >>>>>>>>MVPs and experts. ON SALE this month only -- learn more at: >>>>>>>>http://p.sf.net/sfu/learnmore_122712 >>>>>>>>_______________________________________________ >>>>>>>>dotNetRDF-bugs mailing list >>>>>>>>dot...@li... >>>>>>>>https://lists.sourceforge.net/lists/listinfo/dotnetrdf-bugs >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>>--------------------------------------------------------------------- >>>>>>>- >>>>>>>- >>>>>>>- >>>>>>>------ >>>>>>> Master Visual Studio, SharePoint, SQL, ASP.NET, C# 2012, HTML5, CSS, >>>>>>> MVC, Windows 8 Apps, JavaScript and much more. Keep your skills >>>>>>>current >>>>>>> with LearnDevNow - 3,200 step-by-step video tutorials by Microsoft >>>>>>> MVPs and experts. SALE $99.99 this month only -- learn more at: >>>>>>> http://p.sf.net/sfu/learnmore_122412 >>>>>>> _______________________________________________ >>>>>>> dotNetRDF-bugs mailing list >>>>>>> dot...@li... >>>>>>> https://lists.sourceforge.net/lists/listinfo/dotnetrdf-bugs >>>>> >>>>>----------------------------------------------------------------------- >>>>>- >>>>>- >>>>>- >>>>>---- >>>>>Minimize network downtime and maximize team effectiveness. >>>>>Reduce network management and security costs.Learn how to hire >>>>>the most talented Cisco Certified professionals. Visit the >>>>>Employer Resources Portal >>>>>http://www.cisco.com/web/learning/employer_resources/index.html >>>>>_______________________________________________ >>>>>dotNetRDF-bugs mailing list >>>>>dot...@li... >>>>>https://lists.sourceforge.net/lists/listinfo/dotnetrdf-bugs >>>> >>>> >>>> >>>> >>>> >>>>------------------------------------------------------------------------ >>>>- >>>>- >>>>---- >>>>Precog is a next-generation analytics platform capable of advanced >>>>analytics on semi-structured data. The platform includes APIs for >>>>building >>>>apps and a phenomenal toolset for data science. Developers can use >>>>our toolset for easy data analysis & visualization. Get a free account! >>>>http://www2.precog.com/precogplatform/slashdotnewsletter >>>>_______________________________________________ >>>>dotNetRDF-bugs mailing list >>>>dot...@li... >>>>https://lists.sourceforge.net/lists/listinfo/dotnetrdf-bugs >>> >>> >>> >>> >>> >>>------------------------------------------------------------------------- >>>- >>>---- >>>Precog is a next-generation analytics platform capable of advanced >>>analytics on semi-structured data. The platform includes APIs for >>>building >>>apps and a phenomenal toolset for data science. Developers can use >>>our toolset for easy data analysis & visualization. Get a free account! >>>http://www2.precog.com/precogplatform/slashdotnewsletter >>>_______________________________________________ >>>dotNetRDF-bugs mailing list >>>dot...@li... >>>https://lists.sourceforge.net/lists/listinfo/dotnetrdf-bugs >> >> >> >> >> >>-------------------------------------------------------------------------- >>---- >>Precog is a next-generation analytics platform capable of advanced >>analytics on semi-structured data. The platform includes APIs for building >>apps and a phenomenal toolset for data science. Developers can use >>our toolset for easy data analysis & visualization. Get a free account! >>http://www2.precog.com/precogplatform/slashdotnewsletter >>_______________________________________________ >>dotNetRDF-bugs mailing list >>dot...@li... >>https://lists.sourceforge.net/lists/listinfo/dotnetrdf-bugs > > > > > > ------------------------------------------------------------------------------ > Precog is a next-generation analytics platform capable of advanced > analytics on semi-structured data. The platform includes APIs for building > apps and a phenomenal toolset for data science. Developers can use > our toolset for easy data analysis & visualization. Get a free account! > http://www2.precog.com/precogplatform/slashdotnewsletter > _______________________________________________ > dotNetRDF-bugs mailing list > dot...@li... > https://lists.sourceforge.net/lists/listinfo/dotnetrdf-bugs |