|
From: Rob V. <rv...@do...> - 2013-04-09 17:35:31
|
Hey Tom The problem is that graph isomorphism is NP-hard so sometimes the only option we have is to attempt to brute force the problem I've started added some Debug.WriteLine() to GraphMatcher to track down where things go wrong For your graphs they may look trivially equal but to code they are not, the reason this worked prior to 0.8.0 is that one of the things we try is a trivial mapping (assume blank nodes have same IDs in both graphs) so in previous releases you would likely have hit this case and been fine. You have 33 blank nodes in the graph of which only 6 are uniquely identifiable and mappable. The matcher generates a candidate mapping for the whole graph but its best effort is incorrect, so then it falls back to brute force. I need to dig further into whether the candidate mapping could be improved but this is not trivial to debug and will take some time to resolve. We may be able to reduce the "memory leak" by using yield rather than pre-generating all possible mapping but this is a tricky refactor, it's been a long time since I wrote the code originally and I remember that doing the mapping in the yield form proved thorny at the time so I chose not to. The code itself for generating the mappings has some slightly strange things in it so I really need to spend a block of time refreshing myself on the logic there to check that it is sound before I attempt to refactor. Rob On 4/7/13 11:20 AM, "Tomasz Pluskiewicz" <tom...@gm...> wrote: >Hm, I was wrong actually. > >I tried comparing the exact same graphs loaded from Turtle in >dotNetRDF test project but I got the unit test wrong. > >I have added the CORE-345 bug and committed a failing test case [1]. >Could you please have a look at this? > >Thanks, >Tom > >[1]: https://bitbucket.org/dotnetrdf/dotnetrdf/commits/branch/CORE-345 > >On Sun, Apr 7, 2013 at 7:36 PM, Tomasz Pluskiewicz ><tom...@gm...> wrote: >> Hi Rob >> >> I finally got back to R2RML to analyze why I am getting that memory >> leak. It seems connected to the changes you had to introduce for >> SPARQL 1.1. >> >> I have determined that it happens in GraphMatcher#GenerateMappings >> method. The graphs are equal and I'm not sure what causes the problem. >> As soon as TryBruteForceMapping is reached memory consumption explodes >> to gigabytes within minutes. >> >> The low-level problem is the mappings variable in the >> GenerateMappings, which within a few iteration contains thousands of >> elements. >> >> This problem no longer occurs on trunk. Have you actually been >> introducing any fixes around that area? >> >> Tom >> >> On Mon, Jan 14, 2013 at 12:32 PM, Rob Vesse <rv...@do...> >>wrote: >>> Comments inline: >>> >>> On 1/10/13 7:14 PM, "Tomek Pluskiewicz" <to...@pl...> wrote: >>> >>>>Hi Rob >>>> >>>>I have just updated to latest dotNetRDF available on NuGet and I'm >>>>experiencing two issues. >>>> >>>>1. In my unit tests I relied on the way the library assigns blank node >>>>identifiers: autos1, autos2 and so on. When I run the tests separately >>>>each one passes but when I batch them they fail because in subsequent >>>>tests blank nodes are name autos2, autos3, etc. However they don't >>>>share the same graph or triple store. Have you changed this behavior >>>>delbierately? >>> >>> Yes this behavior changed in the 0.8.x releases, the change was made in >>> order to resolve a bug in SPARQL 1.1 Update support and also uncovered >>>a >>> bug in graph isomorphism calculation which was fixed. >>> >>> You shouldn't rely on an internal implementation detail like how the >>> library assigns blank node identifiers. Blank nodes should always be >>> identifiable by the triples they appear in so it should be possible to >>> formulate API calls or SPARQL queries that validate that you have >>>produced >>> the data you expected. >>> >>>> >>>>2. There is a bad memory leak in during SPARQL execution of this: >>> >>> Define bad memory leak? >>> >>> Updates are transactional so it may be a side effect of the library >>> maintaining the state necessary to rollback the transaction should it >>>fail >>> or be aborted. Also the fact that you are replacing constant nodes >>>with >>> blank nodes will assign a lot of new identifiers and those identifiers >>> have to be tracked to prevent collisions. >>> >>>> >>>>PREFIX rr: <http://www.w3.org/ns/r2rml#> >>>>DELETE { ?map rr:graph ?value . } >>>>INSERT { ?map rr:graphMap [ rr:constant ?value ] . } >>>>WHERE { ?map rr:graph ?value } ; >>>> >>>>DELETE { ?map rr:object ?value . } >>>>INSERT { ?map rr:objectMap [ rr:constant ?value ] . } >>>>WHERE { ?map rr:object ?value } ; >>>> >>>>DELETE { ?map rr:predicate ?value . } >>>>INSERT { ?map rr:predicateMap [ rr:constant ?value ] . } >>>>WHERE { ?map rr:predicate ?value } ; >>>> >>>>DELETE { ?map rr:subject ?value . } >>>>INSERT { ?map rr:subjectMap [ rr:constant ?value ] . } >>>>WHERE { ?map rr:subject ?value } >>>> >>>>The full code is simply: >>>> >>>>var dataset = new InMemoryDataset(store, R2RMLMappings.BaseUri); >>>> ISparqlUpdateProcessor processor = new >>>>LeviathanUpdateProcessor(dataset); >>>> var updateParser = new SparqlUpdateParser(); >>>> >>>> >>>>processor.ProcessCommandSet(updateParser.ParseFromString(ShortcutSubmap >>>>sRe >>>>placeSparql)); >>>> >>>>Is this a know problem and has been already fixed or should I >>>>investigate closely? >>> >>> This is not a known issue, I would also guess that the data being used >>> would have some bearing on the severity of the problem. Please go >>>ahead >>> and investigate but I would suspect it is the two things I outlined >>>above >>> which are the culprits here. >>> >>> Rob >>> >>>> >>>>Thanks, >>>>Tom >>>> >>>>----------------------------------------------------------------------- >>>>--- >>>>---- >>>>Master Visual Studio, SharePoint, SQL, ASP.NET, C# 2012, HTML5, CSS, >>>>MVC, Windows 8 Apps, JavaScript and much more. Keep your skills current >>>>with LearnDevNow - 3,200 step-by-step video tutorials by Microsoft >>>>MVPs and experts. ON SALE this month only -- learn more at: >>>>http://p.sf.net/sfu/learnmore_122712 >>>>_______________________________________________ >>>>dotNetRDF-bugs mailing list >>>>dot...@li... >>>>https://lists.sourceforge.net/lists/listinfo/dotnetrdf-bugs >>> >>> >>> >>> >>> >>> >>>------------------------------------------------------------------------ >>>------ >>> Master Visual Studio, SharePoint, SQL, ASP.NET, C# 2012, HTML5, CSS, >>> MVC, Windows 8 Apps, JavaScript and much more. Keep your skills current >>> with LearnDevNow - 3,200 step-by-step video tutorials by Microsoft >>> MVPs and experts. SALE $99.99 this month only -- learn more at: >>> http://p.sf.net/sfu/learnmore_122412 >>> _______________________________________________ >>> dotNetRDF-bugs mailing list >>> dot...@li... >>> https://lists.sourceforge.net/lists/listinfo/dotnetrdf-bugs > >-------------------------------------------------------------------------- >---- >Minimize network downtime and maximize team effectiveness. >Reduce network management and security costs.Learn how to hire >the most talented Cisco Certified professionals. Visit the >Employer Resources Portal >http://www.cisco.com/web/learning/employer_resources/index.html >_______________________________________________ >dotNetRDF-bugs mailing list >dot...@li... >https://lists.sourceforge.net/lists/listinfo/dotnetrdf-bugs |