|
From: Rob V. <rv...@do...> - 2013-04-12 17:26:47
|
s/not/now That should be "the test will now complete within the timeout" Rob On 4/12/13 10:23 AM, "Rob Vesse" <rv...@do...> wrote: >Hey Tom > >So the logic for generating the brute force mappings was completely broken >causing it to get stuck in a memory sucking spin cycle :( > >I rewrote the GenerateMappings() method from scratch to use yield return >and the test will not complete within the timeout but it fails so I still >need to dig further > >We may still be generating incorrect possible mappings or the logic for >brute force may be flawed elsewhere > >Rob > >On 4/9/13 10:34 AM, "Rob Vesse" <rv...@do...> wrote: > >>Hey Tom >> >>The problem is that graph isomorphism is NP-hard so sometimes the only >>option we have is to attempt to brute force the problem >> >>I've started added some Debug.WriteLine() to GraphMatcher to track down >>where things go wrong >> >>For your graphs they may look trivially equal but to code they are not, >>the reason this worked prior to 0.8.0 is that one of the things we try is >>a trivial mapping (assume blank nodes have same IDs in both graphs) so in >>previous releases you would likely have hit this case and been fine. >> >>You have 33 blank nodes in the graph of which only 6 are uniquely >>identifiable and mappable. The matcher generates a candidate mapping for >>the whole graph but its best effort is incorrect, so then it falls back >>to >>brute force. I need to dig further into whether the candidate mapping >>could be improved but this is not trivial to debug and will take some >>time >>to resolve. >> >>We may be able to reduce the "memory leak" by using yield rather than >>pre-generating all possible mapping but this is a tricky refactor, it's >>been a long time since I wrote the code originally and I remember that >>doing the mapping in the yield form proved thorny at the time so I chose >>not to. The code itself for generating the mappings has some slightly >>strange things in it so I really need to spend a block of time refreshing >>myself on the logic there to check that it is sound before I attempt to >>refactor. >> >>Rob >> >>On 4/7/13 11:20 AM, "Tomasz Pluskiewicz" <tom...@gm...> >>wrote: >> >>>Hm, I was wrong actually. >>> >>>I tried comparing the exact same graphs loaded from Turtle in >>>dotNetRDF test project but I got the unit test wrong. >>> >>>I have added the CORE-345 bug and committed a failing test case [1]. >>>Could you please have a look at this? >>> >>>Thanks, >>>Tom >>> >>>[1]: https://bitbucket.org/dotnetrdf/dotnetrdf/commits/branch/CORE-345 >>> >>>On Sun, Apr 7, 2013 at 7:36 PM, Tomasz Pluskiewicz >>><tom...@gm...> wrote: >>>> Hi Rob >>>> >>>> I finally got back to R2RML to analyze why I am getting that memory >>>> leak. It seems connected to the changes you had to introduce for >>>> SPARQL 1.1. >>>> >>>> I have determined that it happens in GraphMatcher#GenerateMappings >>>> method. The graphs are equal and I'm not sure what causes the problem. >>>> As soon as TryBruteForceMapping is reached memory consumption explodes >>>> to gigabytes within minutes. >>>> >>>> The low-level problem is the mappings variable in the >>>> GenerateMappings, which within a few iteration contains thousands of >>>> elements. >>>> >>>> This problem no longer occurs on trunk. Have you actually been >>>> introducing any fixes around that area? >>>> >>>> Tom >>>> >>>> On Mon, Jan 14, 2013 at 12:32 PM, Rob Vesse <rv...@do...> >>>>wrote: >>>>> Comments inline: >>>>> >>>>> On 1/10/13 7:14 PM, "Tomek Pluskiewicz" <to...@pl...> >>>>>wrote: >>>>> >>>>>>Hi Rob >>>>>> >>>>>>I have just updated to latest dotNetRDF available on NuGet and I'm >>>>>>experiencing two issues. >>>>>> >>>>>>1. In my unit tests I relied on the way the library assigns blank >>>>>>node >>>>>>identifiers: autos1, autos2 and so on. When I run the tests >>>>>>separately >>>>>>each one passes but when I batch them they fail because in subsequent >>>>>>tests blank nodes are name autos2, autos3, etc. However they don't >>>>>>share the same graph or triple store. Have you changed this behavior >>>>>>delbierately? >>>>> >>>>> Yes this behavior changed in the 0.8.x releases, the change was made >>>>>in >>>>> order to resolve a bug in SPARQL 1.1 Update support and also >>>>>uncovered >>>>>a >>>>> bug in graph isomorphism calculation which was fixed. >>>>> >>>>> You shouldn't rely on an internal implementation detail like how the >>>>> library assigns blank node identifiers. Blank nodes should always be >>>>> identifiable by the triples they appear in so it should be possible >>>>>to >>>>> formulate API calls or SPARQL queries that validate that you have >>>>>produced >>>>> the data you expected. >>>>> >>>>>> >>>>>>2. There is a bad memory leak in during SPARQL execution of this: >>>>> >>>>> Define bad memory leak? >>>>> >>>>> Updates are transactional so it may be a side effect of the library >>>>> maintaining the state necessary to rollback the transaction should it >>>>>fail >>>>> or be aborted. Also the fact that you are replacing constant nodes >>>>>with >>>>> blank nodes will assign a lot of new identifiers and those >>>>>identifiers >>>>> have to be tracked to prevent collisions. >>>>> >>>>>> >>>>>>PREFIX rr: <http://www.w3.org/ns/r2rml#> >>>>>>DELETE { ?map rr:graph ?value . } >>>>>>INSERT { ?map rr:graphMap [ rr:constant ?value ] . } >>>>>>WHERE { ?map rr:graph ?value } ; >>>>>> >>>>>>DELETE { ?map rr:object ?value . } >>>>>>INSERT { ?map rr:objectMap [ rr:constant ?value ] . } >>>>>>WHERE { ?map rr:object ?value } ; >>>>>> >>>>>>DELETE { ?map rr:predicate ?value . } >>>>>>INSERT { ?map rr:predicateMap [ rr:constant ?value ] . } >>>>>>WHERE { ?map rr:predicate ?value } ; >>>>>> >>>>>>DELETE { ?map rr:subject ?value . } >>>>>>INSERT { ?map rr:subjectMap [ rr:constant ?value ] . } >>>>>>WHERE { ?map rr:subject ?value } >>>>>> >>>>>>The full code is simply: >>>>>> >>>>>>var dataset = new InMemoryDataset(store, R2RMLMappings.BaseUri); >>>>>> ISparqlUpdateProcessor processor = new >>>>>>LeviathanUpdateProcessor(dataset); >>>>>> var updateParser = new SparqlUpdateParser(); >>>>>> >>>>>> >>>>>>processor.ProcessCommandSet(updateParser.ParseFromString(ShortcutSubm >>>>>>a >>>>>>p >>>>>>sRe >>>>>>placeSparql)); >>>>>> >>>>>>Is this a know problem and has been already fixed or should I >>>>>>investigate closely? >>>>> >>>>> This is not a known issue, I would also guess that the data being >>>>>used >>>>> would have some bearing on the severity of the problem. Please go >>>>>ahead >>>>> and investigate but I would suspect it is the two things I outlined >>>>>above >>>>> which are the culprits here. >>>>> >>>>> Rob >>>>> >>>>>> >>>>>>Thanks, >>>>>>Tom >>>>>> >>>>>>--------------------------------------------------------------------- >>>>>>- >>>>>>- >>>>>>--- >>>>>>---- >>>>>>Master Visual Studio, SharePoint, SQL, ASP.NET, C# 2012, HTML5, CSS, >>>>>>MVC, Windows 8 Apps, JavaScript and much more. Keep your skills >>>>>>current >>>>>>with LearnDevNow - 3,200 step-by-step video tutorials by Microsoft >>>>>>MVPs and experts. ON SALE this month only -- learn more at: >>>>>>http://p.sf.net/sfu/learnmore_122712 >>>>>>_______________________________________________ >>>>>>dotNetRDF-bugs mailing list >>>>>>dot...@li... >>>>>>https://lists.sourceforge.net/lists/listinfo/dotnetrdf-bugs >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>>---------------------------------------------------------------------- >>>>>- >>>>>- >>>>>------ >>>>> Master Visual Studio, SharePoint, SQL, ASP.NET, C# 2012, HTML5, CSS, >>>>> MVC, Windows 8 Apps, JavaScript and much more. Keep your skills >>>>>current >>>>> with LearnDevNow - 3,200 step-by-step video tutorials by Microsoft >>>>> MVPs and experts. SALE $99.99 this month only -- learn more at: >>>>> http://p.sf.net/sfu/learnmore_122412 >>>>> _______________________________________________ >>>>> dotNetRDF-bugs mailing list >>>>> dot...@li... >>>>> https://lists.sourceforge.net/lists/listinfo/dotnetrdf-bugs >>> >>>------------------------------------------------------------------------ >>>- >>>- >>>---- >>>Minimize network downtime and maximize team effectiveness. >>>Reduce network management and security costs.Learn how to hire >>>the most talented Cisco Certified professionals. Visit the >>>Employer Resources Portal >>>http://www.cisco.com/web/learning/employer_resources/index.html >>>_______________________________________________ >>>dotNetRDF-bugs mailing list >>>dot...@li... >>>https://lists.sourceforge.net/lists/listinfo/dotnetrdf-bugs >> >> >> >> >> >>------------------------------------------------------------------------- >>- >>---- >>Precog is a next-generation analytics platform capable of advanced >>analytics on semi-structured data. The platform includes APIs for >>building >>apps and a phenomenal toolset for data science. Developers can use >>our toolset for easy data analysis & visualization. Get a free account! >>http://www2.precog.com/precogplatform/slashdotnewsletter >>_______________________________________________ >>dotNetRDF-bugs mailing list >>dot...@li... >>https://lists.sourceforge.net/lists/listinfo/dotnetrdf-bugs > > > > > >-------------------------------------------------------------------------- >---- >Precog is a next-generation analytics platform capable of advanced >analytics on semi-structured data. The platform includes APIs for building >apps and a phenomenal toolset for data science. Developers can use >our toolset for easy data analysis & visualization. Get a free account! >http://www2.precog.com/precogplatform/slashdotnewsletter >_______________________________________________ >dotNetRDF-bugs mailing list >dot...@li... >https://lists.sourceforge.net/lists/listinfo/dotnetrdf-bugs |