|
From: Tomek P. <to...@pl...> - 2013-04-12 19:05:47
|
I did, with a little delay. Please check now.

Tom

On Apr 12, 2013 8:59 PM, "Rob Vesse" <rv...@do...> wrote:
> Ok
>
> Can you push the commits up so I can pull them down and take a look at the new test cases
>
> Rob
>
> On 4/12/13 11:55 AM, "Tomasz Pluskiewicz" <tom...@gm...> wrote:
>
> >I've just committed more test cases. Of the 6, none fails with an OOM anymore, which is marvellous.
> >
> >However, case1 reports false but I'm positive these graphs are actually equal.
> >
> >Thanks,
> >Tom
> >
> >On Fri, Apr 12, 2013 at 8:33 PM, Rob Vesse <rv...@do...> wrote:
> >> Those would be useful
> >>
> >> Btw I closed the issue branch so please just add the tests to default
> >>
> >> Rob
> >>
> >> On 4/12/13 11:23 AM, "Tomasz Pluskiewicz" <tom...@gm...> wrote:
> >>
> >>>Hi Rob
> >>>
> >>>Thanks so much. And yes, I do have 4 or 5 cases which stumble on this same issue. I will add all of these to the test fixture.
> >>>
> >>>Tom
> >>>
> >>>On Fri, Apr 12, 2013 at 8:20 PM, Rob Vesse <rv...@do...> wrote:
> >>>> Hey Tom
> >>>>
> >>>> This should now be fixed for your test case, though I am not 100% convinced that brute forcing is not still broken
> >>>>
> >>>> What I have done to fix this is to add an intermediate step between the rules-based and brute-force mapping which takes a divide-and-conquer approach
> >>>>
> >>>> This step breaks the unmapped blank node portions of the graph into their constituent isolated sub-graphs (those that share no blank nodes) and then recursively calls Equals() on the candidate matches for the sub-graphs. This approach reduces the amount of work required and the likelihood of needing to brute force at all, though we still fall back to brute force in the worst case.
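
As an illustration of the divide-and-conquer step Rob describes above, here is a minimal sketch of grouping triples into isolated sub-graphs that share no blank nodes. It is written in Python for brevity and is not dotNetRDF's implementation: the tuple representation of triples, the `_:` prefix convention for blank nodes, and the names `is_blank` and `isolated_subgraphs` are all assumptions made for this example.

```python
from collections import defaultdict


def is_blank(term):
    # Assumption for this sketch: blank nodes are strings such as "_:b1".
    return term.startswith("_:")


def isolated_subgraphs(triples):
    """Group blank-node triples into sub-graphs that share no blank nodes."""
    parent = {}

    def find(node):
        # Union-find lookup with path halving.
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    def union(a, b):
        parent[find(a)] = find(b)

    # Blank nodes that occur in the same triple belong to the same sub-graph.
    for s, _, o in triples:
        blanks = [t for t in (s, o) if is_blank(t)]
        for b in blanks:
            parent.setdefault(b, b)
        if len(blanks) == 2:
            union(blanks[0], blanks[1])

    # Collect each triple under the representative of its blank nodes;
    # triples without blank nodes are ignored because they need no mapping.
    groups = defaultdict(list)
    for triple in triples:
        blanks = [t for t in (triple[0], triple[2]) if is_blank(t)]
        if blanks:
            groups[find(blanks[0])].append(triple)
    return list(groups.values())
```

Each returned group could then be compared against candidate groups from the other graph independently, so that a failed match in one small sub-graph does not force a search over every blank node in the graph.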
> >>>>
> >>>> If you can come up with any more graphs that break GraphMatcher those would be much appreciated
> >>>>
> >>>> Rob
> >>>>
> >>>> On 4/12/13 10:25 AM, "Rob Vesse" <rv...@do...> wrote:
> >>>>
> >>>>>s/not/now
> >>>>>
> >>>>>That should be "the test will now complete within the timeout"
> >>>>>
> >>>>>Rob
> >>>>>
> >>>>>On 4/12/13 10:23 AM, "Rob Vesse" <rv...@do...> wrote:
> >>>>>
> >>>>>>Hey Tom
> >>>>>>
> >>>>>>So the logic for generating the brute force mappings was completely broken, causing it to get stuck in a memory-sucking spin cycle :(
> >>>>>>
> >>>>>>I rewrote the GenerateMappings() method from scratch to use yield return and the test will not complete within the timeout but it fails so I still need to dig further
> >>>>>>
> >>>>>>We may still be generating incorrect possible mappings or the logic for brute force may be flawed elsewhere
> >>>>>>
> >>>>>>Rob
> >>>>>>
> >>>>>>On 4/9/13 10:34 AM, "Rob Vesse" <rv...@do...> wrote:
> >>>>>>
> >>>>>>>Hey Tom
> >>>>>>>
> >>>>>>>The problem is that graph isomorphism has no known polynomial-time algorithm in general, so sometimes the only option we have is to attempt to brute force the problem
> >>>>>>>
> >>>>>>>I've started adding some Debug.WriteLine() calls to GraphMatcher to track down where things go wrong
> >>>>>>>
> >>>>>>>Your graphs may look trivially equal but to the code they are not. The reason this worked prior to 0.8.0 is that one of the things we try is a trivial mapping (assume blank nodes have the same IDs in both graphs), so in previous releases you would likely have hit this case and been fine.
> >>>>>>>
> >>>>>>>You have 33 blank nodes in the graph of which only 6 are uniquely identifiable and mappable.
> >>>>>>>The matcher generates a candidate mapping for the whole graph but its best effort is incorrect, so it then falls back to brute force. I need to dig further into whether the candidate mapping could be improved but this is not trivial to debug and will take some time to resolve.
> >>>>>>>
> >>>>>>>We may be able to reduce the "memory leak" by using yield rather than pre-generating all possible mappings, but this is a tricky refactor. It's been a long time since I wrote the code originally and I remember that doing the mapping in the yield form proved thorny at the time so I chose not to. The code itself for generating the mappings has some slightly strange things in it so I really need to spend a block of time refreshing myself on the logic there to check that it is sound before I attempt to refactor.
> >>>>>>>
> >>>>>>>Rob
> >>>>>>>
> >>>>>>>On 4/7/13 11:20 AM, "Tomasz Pluskiewicz" <tom...@gm...> wrote:
> >>>>>>>
> >>>>>>>>Hm, I was wrong actually.
> >>>>>>>>
> >>>>>>>>I tried comparing the exact same graphs loaded from Turtle in the dotNetRDF test project but I got the unit test wrong.
> >>>>>>>>
> >>>>>>>>I have added the CORE-345 bug and committed a failing test case [1]. Could you please have a look at this?
> >>>>>>>>
> >>>>>>>>Thanks,
> >>>>>>>>Tom
> >>>>>>>>
> >>>>>>>>[1]: https://bitbucket.org/dotnetrdf/dotnetrdf/commits/branch/CORE-345
> >>>>>>>>
> >>>>>>>>On Sun, Apr 7, 2013 at 7:36 PM, Tomasz Pluskiewicz <tom...@gm...> wrote:
> >>>>>>>>> Hi Rob
> >>>>>>>>>
> >>>>>>>>> I finally got back to R2RML to analyze why I am getting that memory leak. It seems connected to the changes you had to introduce for SPARQL 1.1.
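
To illustrate the difference Rob mentions between pre-generating every mapping and yielding them one at a time, here is a minimal Python sketch (a generator plays the role of C#'s `yield return`). It is not the GraphMatcher code: `generate_mappings`, `find_mapping` and the `is_valid` callback are hypothetical names, and the sketch naively enumerates every bijection. With 33 blank nodes of which 6 are already mapped, such a naive enumeration over the remaining 27 could visit up to 27! (roughly 1.09 × 10^28) candidates, which is why holding them all in a list cannot work.

```python
from itertools import permutations
from math import factorial


def generate_mappings(left, right):
    """Lazily yield each candidate bijection from left to right blank nodes."""
    if len(left) != len(right):
        return  # no bijection exists, so yield nothing
    for perm in permutations(right):
        # Only one candidate dictionary is alive at any time.
        yield dict(zip(left, perm))


def find_mapping(left, right, is_valid):
    """Return the first candidate accepted by is_valid, or None if none is."""
    for candidate in generate_mappings(left, right):
        if is_valid(candidate):
            return candidate
    return None


def worst_case_candidates(unmapped):
    """Upper bound on the candidates a naive enumeration may have to try."""
    return factorial(unmapped)
```

Lazy generation keeps memory flat and stops as soon as a valid mapping is found, but it does not reduce the worst-case number of candidates; that still depends on mapping as many nodes as possible before brute force is reached.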
> >>>>>>>>>
> >>>>>>>>> I have determined that it happens in the GraphMatcher#GenerateMappings method. The graphs are equal and I'm not sure what causes the problem. As soon as TryBruteForceMapping is reached, memory consumption explodes to gigabytes within minutes.
> >>>>>>>>>
> >>>>>>>>> The low-level problem is the mappings variable in GenerateMappings, which within a few iterations contains thousands of elements.
> >>>>>>>>>
> >>>>>>>>> This problem no longer occurs on trunk. Have you actually been introducing any fixes around that area?
> >>>>>>>>>
> >>>>>>>>> Tom
> >>>>>>>>>
> >>>>>>>>> On Mon, Jan 14, 2013 at 12:32 PM, Rob Vesse <rv...@do...> wrote:
> >>>>>>>>>> Comments inline:
> >>>>>>>>>>
> >>>>>>>>>> On 1/10/13 7:14 PM, "Tomek Pluskiewicz" <to...@pl...> wrote:
> >>>>>>>>>>
> >>>>>>>>>>>Hi Rob
> >>>>>>>>>>>
> >>>>>>>>>>>I have just updated to the latest dotNetRDF available on NuGet and I'm experiencing two issues.
> >>>>>>>>>>>
> >>>>>>>>>>>1. In my unit tests I relied on the way the library assigns blank node identifiers: autos1, autos2 and so on. When I run the tests separately each one passes, but when I batch them they fail because in subsequent tests blank nodes are named autos2, autos3, etc. However, they don't share the same graph or triple store. Have you changed this behavior deliberately?
> >>>>>>>>>>
> >>>>>>>>>> Yes, this behavior changed in the 0.8.x releases. The change was made in order to resolve a bug in SPARQL 1.1 Update support and also uncovered a bug in graph isomorphism calculation which was fixed.
> >>>>>>>>>>
> >>>>>>>>>> You shouldn't rely on an internal implementation detail like how the library assigns blank node identifiers. Blank nodes should always be identifiable by the triples they appear in, so it should be possible to formulate API calls or SPARQL queries that validate that you have produced the data you expected.
> >>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>2. There is a bad memory leak during SPARQL execution of this:
> >>>>>>>>>>
> >>>>>>>>>> Define bad memory leak?
> >>>>>>>>>>
> >>>>>>>>>> Updates are transactional so it may be a side effect of the library maintaining the state necessary to roll back the transaction should it fail or be aborted. Also, the fact that you are replacing constant nodes with blank nodes will assign a lot of new identifiers, and those identifiers have to be tracked to prevent collisions.
> >>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>PREFIX rr: <http://www.w3.org/ns/r2rml#>
> >>>>>>>>>>>DELETE { ?map rr:graph ?value . }
> >>>>>>>>>>>INSERT { ?map rr:graphMap [ rr:constant ?value ] . }
> >>>>>>>>>>>WHERE { ?map rr:graph ?value } ;
> >>>>>>>>>>>
> >>>>>>>>>>>DELETE { ?map rr:object ?value . }
> >>>>>>>>>>>INSERT { ?map rr:objectMap [ rr:constant ?value ] . }
> >>>>>>>>>>>WHERE { ?map rr:object ?value } ;
> >>>>>>>>>>>
> >>>>>>>>>>>DELETE { ?map rr:predicate ?value . }
> >>>>>>>>>>>INSERT { ?map rr:predicateMap [ rr:constant ?value ] . }
> >>>>>>>>>>>WHERE { ?map rr:predicate ?value } ;
> >>>>>>>>>>>
> >>>>>>>>>>>DELETE { ?map rr:subject ?value . }
> >>>>>>>>>>>INSERT { ?map rr:subjectMap [ rr:constant ?value ] . }
> >>>>>>>>>>>WHERE { ?map rr:subject ?value }
> >>>>>>>>>>>
> >>>>>>>>>>>The full code is simply:
> >>>>>>>>>>>
> >>>>>>>>>>>var dataset = new InMemoryDataset(store, R2RMLMappings.BaseUri);
> >>>>>>>>>>>ISparqlUpdateProcessor processor = new LeviathanUpdateProcessor(dataset);
> >>>>>>>>>>>var updateParser = new SparqlUpdateParser();
> >>>>>>>>>>>
> >>>>>>>>>>>processor.ProcessCommandSet(updateParser.ParseFromString(ShortcutSubmapsReplaceSparql));
> >>>>>>>>>>>
> >>>>>>>>>>>Is this a known problem that has already been fixed, or should I investigate more closely?
> >>>>>>>>>>
> >>>>>>>>>> This is not a known issue. I would also guess that the data being used would have some bearing on the severity of the problem. Please go ahead and investigate, but I would suspect it is the two things I outlined above which are the culprits here.
> >>>>>>>>>>
> >>>>>>>>>> Rob
> >>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>Thanks,
> >>>>>>>>>>>Tom
> >>>>>>>>>>>
> >>>>>>>>>>>_______________________________________________
> >>>>>>>>>>>dotNetRDF-bugs mailing list
> >>>>>>>>>>>dot...@li...
> >>>>>>>>>>>https://lists.sourceforge.net/lists/listinfo/dotnetrdf-bugs
|
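
Rob's advice earlier in the thread, that tests should identify blank nodes by the triples they appear in rather than by generated identifiers such as autos1, can be sketched as follows. This is a plain-Python illustration and not dotNetRDF API: the tuple representation of triples and the helpers `expand_shortcut` and `has_constant_map` are hypothetical, and `expand_shortcut` only mimics what one DELETE/INSERT pair from the update above does.

```python
import itertools

# Shared counter, so blank node labels differ from one call to the next,
# much like identifiers differ when unit tests are run in a batch.
_ids = itertools.count(1)


def expand_shortcut(triples, shortcut, long_form, constant="rr:constant"):
    """Rewrite `?map shortcut ?value` as `?map long_form [ constant ?value ]`."""
    out = []
    for s, p, o in triples:
        if p == shortcut:
            bnode = f"_:b{next(_ids)}"  # fresh, unpredictable label
            out.append((s, long_form, bnode))
            out.append((bnode, constant, o))
        else:
            out.append((s, p, o))
    return out


def has_constant_map(triples, subject, long_form, value, constant="rr:constant"):
    """True if subject --long_form--> some blank node --constant--> value."""
    middles = {
        o for s, p, o in triples
        if s == subject and p == long_form and o.startswith("_:")
    }
    return any((m, constant, value) in triples for m in middles)
```

A test written against `has_constant_map` passes whatever label the blank node received, whereas a test that compares against a literal identifier breaks as soon as the numbering changes.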