From: Rob V. <rv...@do...> - 2013-04-12 18:59:03
OK, can you push the commits up so I can pull them down and take a look at the new test cases?

Rob

On 4/12/13 11:55 AM, "Tomasz Pluskiewicz" <tom...@gm...> wrote:

>I've just committed more test cases. Out of the 6 none fail cause OOM
>anymore, which is marvellous.
>
>However case1 reports false but I'm positive these graphs are actually
>equal.
>
>Thanks,
>Tom
>
>On Fri, Apr 12, 2013 at 8:33 PM, Rob Vesse <rv...@do...> wrote:
>> Those would be useful
>>
>> Btw I closed the issue branch so please just add the tests to default
>>
>> Rob
>>
>> On 4/12/13 11:23 AM, "Tomasz Pluskiewicz" <tom...@gm...> wrote:
>>
>>>Hi Rob
>>>
>>>Thanks so much. And yes, I do have 4 or 5 cases which stumble on this
>>>same issue. I will add all these to the test fixture.
>>>
>>>Tom
>>>
>>>On Fri, Apr 12, 2013 at 8:20 PM, Rob Vesse <rv...@do...> wrote:
>>>> Hey Tom
>>>>
>>>> This should now be fixed for your test case though I am not 100% convinced
>>>> that brute forcing is not still broken
>>>>
>>>> What I have done to fix this is to add an intermediate step between the
>>>> rules based and brute force mapping which does a divide and conquer
>>>> approach
>>>>
>>>> What this does is break the unmapped blank node portions of the graph into
>>>> its constituent isolated sub-graphs (those that share no blank nodes) and
>>>> then recursively calls Equals() on the candidate matches for the
>>>> sub-graphs. This approach reduces the amount of work required and the
>>>> likelihood of needing to brute force at all though we still fall back in
>>>> the worst case.
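
The "isolated sub-graphs" step described above can be sketched roughly as follows. This is only an illustration of the partitioning idea, not the code in GraphMatcher: the tuple-based triples, the `SubGraphSplitter` class, the `Split` method and the convention that blank nodes start with `_:` are all assumptions made for this example, and the recursive Equals() call on candidate matches is not shown.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration only; not part of dotNetRDF.
public static class SubGraphSplitter
{
    // Assumption for this sketch: blank nodes are written with a "_:" prefix.
    private static bool IsBlank(string term)
    {
        return term.StartsWith("_:", StringComparison.Ordinal);
    }

    // Groups blank-node triples so that two triples end up in the same group
    // exactly when they are linked through a chain of shared blank nodes.
    // Triples without any blank node are ignored.
    public static List<List<(string S, string P, string O)>> Split(
        IEnumerable<(string S, string P, string O)> triples)
    {
        // Union-find forest over blank node labels.
        var parent = new Dictionary<string, string>();

        string Find(string x)
        {
            while (parent[x] != x)
            {
                parent[x] = parent[parent[x]]; // path halving
                x = parent[x];
            }
            return x;
        }

        var blankTriples = triples.Where(t => IsBlank(t.S) || IsBlank(t.O)).ToList();

        foreach (var t in blankTriples)
        {
            if (IsBlank(t.S) && !parent.ContainsKey(t.S)) parent[t.S] = t.S;
            if (IsBlank(t.O) && !parent.ContainsKey(t.O)) parent[t.O] = t.O;

            // A triple with two blank nodes ties their components together.
            if (IsBlank(t.S) && IsBlank(t.O))
            {
                string rootS = Find(t.S), rootO = Find(t.O);
                if (rootS != rootO) parent[rootS] = rootO;
            }
        }

        // Each union-find root identifies one isolated sub-graph.
        return blankTriples
            .GroupBy(t => Find(IsBlank(t.S) ? t.S : t.O))
            .Select(g => g.ToList())
            .ToList();
    }
}
```

Each resulting group can then be matched on its own, which is where the reduction in work comes from: the number of candidate mappings grows with the size of the largest group rather than with the total number of unmapped blank nodes.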
>>>>
>>>> If you can come up with any more graphs that break GraphMatcher those
>>>> would be much appreciated
>>>>
>>>> Rob
>>>>
>>>> On 4/12/13 10:25 AM, "Rob Vesse" <rv...@do...> wrote:
>>>>
>>>>>s/not/now
>>>>>
>>>>>That should be "the test will now complete within the timeout"
>>>>>
>>>>>Rob
>>>>>
>>>>>On 4/12/13 10:23 AM, "Rob Vesse" <rv...@do...> wrote:
>>>>>
>>>>>>Hey Tom
>>>>>>
>>>>>>So the logic for generating the brute force mappings was completely broken,
>>>>>>causing it to get stuck in a memory sucking spin cycle :(
>>>>>>
>>>>>>I rewrote the GenerateMappings() method from scratch to use yield return
>>>>>>and the test will not complete within the timeout but it fails so I still
>>>>>>need to dig further
>>>>>>
>>>>>>We may still be generating incorrect possible mappings or the logic for
>>>>>>brute force may be flawed elsewhere
>>>>>>
>>>>>>Rob
>>>>>>
>>>>>>On 4/9/13 10:34 AM, "Rob Vesse" <rv...@do...> wrote:
>>>>>>
>>>>>>>Hey Tom
>>>>>>>
>>>>>>>The problem is that graph isomorphism is NP-hard so sometimes the only
>>>>>>>option we have is to attempt to brute force the problem
>>>>>>>
>>>>>>>I've started adding some Debug.WriteLine() calls to GraphMatcher to track
>>>>>>>down where things go wrong
>>>>>>>
>>>>>>>Your graphs may look trivially equal but to code they are not. The
>>>>>>>reason this worked prior to 0.8.0 is that one of the things we try is
>>>>>>>a trivial mapping (assume blank nodes have the same IDs in both graphs), so
>>>>>>>in previous releases you would likely have hit this case and been fine.
>>>>>>>
>>>>>>>You have 33 blank nodes in the graph of which only 6 are uniquely
>>>>>>>identifiable and mappable. The matcher generates a candidate mapping for
>>>>>>>the whole graph but its best effort is incorrect, so then it falls back to
>>>>>>>brute force.
>>>>>>>I need to dig further into whether the candidate mapping
>>>>>>>could be improved but this is not trivial to debug and will take some time
>>>>>>>to resolve.
>>>>>>>
>>>>>>>We may be able to reduce the "memory leak" by using yield rather than
>>>>>>>pre-generating all possible mappings, but this is a tricky refactor. It's
>>>>>>>been a long time since I wrote the code originally and I remember that
>>>>>>>doing the mapping in the yield form proved thorny at the time so I chose
>>>>>>>not to. The code itself for generating the mappings has some slightly
>>>>>>>strange things in it so I really need to spend a block of time refreshing
>>>>>>>myself on the logic there to check that it is sound before I attempt to
>>>>>>>refactor.
>>>>>>>
>>>>>>>Rob
>>>>>>>
>>>>>>>On 4/7/13 11:20 AM, "Tomasz Pluskiewicz" <tom...@gm...> wrote:
>>>>>>>
>>>>>>>>Hm, I was wrong actually.
>>>>>>>>
>>>>>>>>I tried comparing the exact same graphs loaded from Turtle in the
>>>>>>>>dotNetRDF test project but I got the unit test wrong.
>>>>>>>>
>>>>>>>>I have added the CORE-345 bug and committed a failing test case [1].
>>>>>>>>Could you please have a look at this?
>>>>>>>>
>>>>>>>>Thanks,
>>>>>>>>Tom
>>>>>>>>
>>>>>>>>[1]: https://bitbucket.org/dotnetrdf/dotnetrdf/commits/branch/CORE-345
>>>>>>>>
>>>>>>>>On Sun, Apr 7, 2013 at 7:36 PM, Tomasz Pluskiewicz <tom...@gm...> wrote:
>>>>>>>>> Hi Rob
>>>>>>>>>
>>>>>>>>> I finally got back to R2RML to analyze why I am getting that memory
>>>>>>>>> leak. It seems connected to the changes you had to introduce for
>>>>>>>>> SPARQL 1.1.
>>>>>>>>>
>>>>>>>>> I have determined that it happens in the GraphMatcher#GenerateMappings
>>>>>>>>> method. The graphs are equal and I'm not sure what causes the problem.
>>>>>>>>> As soon as TryBruteForceMapping is reached memory consumption explodes
>>>>>>>>> to gigabytes within minutes.
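
The difference between pre-generating every mapping and yielding them one at a time can be illustrated with a small sketch. This is not the GenerateMappings() implementation from dotNetRDF: the `LazyMappings` class, its method names and the shape of the candidate list are assumptions made for this example. It only shows how a C# iterator keeps a single partial mapping in memory instead of a list holding thousands of complete ones.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration only; not part of dotNetRDF.
public static class LazyMappings
{
    // Lazily enumerates every injective assignment of a source blank node to
    // one of its candidate targets. Nothing is computed until the caller
    // iterates, and the caller can stop at the first mapping that works.
    public static IEnumerable<Dictionary<string, string>> Generate(
        IReadOnlyList<KeyValuePair<string, List<string>>> candidates)
    {
        return Extend(candidates, 0, new Dictionary<string, string>(), new HashSet<string>());
    }

    private static IEnumerable<Dictionary<string, string>> Extend(
        IReadOnlyList<KeyValuePair<string, List<string>>> candidates,
        int index,
        Dictionary<string, string> current,
        HashSet<string> used)
    {
        if (index == candidates.Count)
        {
            // Hand out a copy so the result survives our backtracking.
            yield return new Dictionary<string, string>(current);
            yield break;
        }

        string source = candidates[index].Key;
        foreach (string target in candidates[index].Value)
        {
            if (!used.Add(target)) continue; // target already assigned

            current[source] = target;
            foreach (var mapping in Extend(candidates, index + 1, current, used))
            {
                yield return mapping;
            }

            // Backtrack before trying the next candidate target.
            current.Remove(source);
            used.Remove(target);
        }
    }
}
```

A caller would typically consume this with something like `Generate(candidates).FirstOrDefault(m => Matches(m))`, where `Matches` is a hypothetical predicate that checks whether a mapping makes the two graphs equal. The worst-case running time is unchanged, but memory use stays proportional to the number of blank nodes rather than to the number of possible mappings.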
>>>>>>>>>
>>>>>>>>> The low-level problem is the mappings variable in
>>>>>>>>> GenerateMappings, which within a few iterations contains thousands of
>>>>>>>>> elements.
>>>>>>>>>
>>>>>>>>> This problem no longer occurs on trunk. Have you actually been
>>>>>>>>> introducing any fixes around that area?
>>>>>>>>>
>>>>>>>>> Tom
>>>>>>>>>
>>>>>>>>> On Mon, Jan 14, 2013 at 12:32 PM, Rob Vesse <rv...@do...> wrote:
>>>>>>>>>> Comments inline:
>>>>>>>>>>
>>>>>>>>>> On 1/10/13 7:14 PM, "Tomek Pluskiewicz" <to...@pl...> wrote:
>>>>>>>>>>
>>>>>>>>>>>Hi Rob
>>>>>>>>>>>
>>>>>>>>>>>I have just updated to the latest dotNetRDF available on NuGet and I'm
>>>>>>>>>>>experiencing two issues.
>>>>>>>>>>>
>>>>>>>>>>>1. In my unit tests I relied on the way the library assigns blank node
>>>>>>>>>>>identifiers: autos1, autos2 and so on. When I run the tests separately
>>>>>>>>>>>each one passes but when I batch them they fail because in subsequent
>>>>>>>>>>>tests blank nodes are named autos2, autos3, etc. However they don't
>>>>>>>>>>>share the same graph or triple store. Have you changed this behavior
>>>>>>>>>>>deliberately?
>>>>>>>>>>
>>>>>>>>>> Yes, this behavior changed in the 0.8.x releases. The change was made in
>>>>>>>>>> order to resolve a bug in SPARQL 1.1 Update support and also uncovered a
>>>>>>>>>> bug in graph isomorphism calculation which was fixed.
>>>>>>>>>>
>>>>>>>>>> You shouldn't rely on an internal implementation detail like how the
>>>>>>>>>> library assigns blank node identifiers. Blank nodes should always be
>>>>>>>>>> identifiable by the triples they appear in so it should be possible to
>>>>>>>>>> formulate API calls or SPARQL queries that validate that you have produced
>>>>>>>>>> the data you expected.
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>2. There is a bad memory leak during SPARQL execution of this:
>>>>>>>>>>
>>>>>>>>>> Define bad memory leak?
>>>>>>>>>>
>>>>>>>>>> Updates are transactional so it may be a side effect of the library
>>>>>>>>>> maintaining the state necessary to roll back the transaction should it fail
>>>>>>>>>> or be aborted. Also the fact that you are replacing constant nodes with
>>>>>>>>>> blank nodes will assign a lot of new identifiers and those identifiers
>>>>>>>>>> have to be tracked to prevent collisions.
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>PREFIX rr: <http://www.w3.org/ns/r2rml#>
>>>>>>>>>>>DELETE { ?map rr:graph ?value . }
>>>>>>>>>>>INSERT { ?map rr:graphMap [ rr:constant ?value ] . }
>>>>>>>>>>>WHERE { ?map rr:graph ?value } ;
>>>>>>>>>>>
>>>>>>>>>>>DELETE { ?map rr:object ?value . }
>>>>>>>>>>>INSERT { ?map rr:objectMap [ rr:constant ?value ] . }
>>>>>>>>>>>WHERE { ?map rr:object ?value } ;
>>>>>>>>>>>
>>>>>>>>>>>DELETE { ?map rr:predicate ?value . }
>>>>>>>>>>>INSERT { ?map rr:predicateMap [ rr:constant ?value ] . }
>>>>>>>>>>>WHERE { ?map rr:predicate ?value } ;
>>>>>>>>>>>
>>>>>>>>>>>DELETE { ?map rr:subject ?value . }
>>>>>>>>>>>INSERT { ?map rr:subjectMap [ rr:constant ?value ] . }
>>>>>>>>>>>WHERE { ?map rr:subject ?value }
>>>>>>>>>>>
>>>>>>>>>>>The full code is simply:
>>>>>>>>>>>
>>>>>>>>>>>var dataset = new InMemoryDataset(store, R2RMLMappings.BaseUri);
>>>>>>>>>>>ISparqlUpdateProcessor processor = new LeviathanUpdateProcessor(dataset);
>>>>>>>>>>>var updateParser = new SparqlUpdateParser();
>>>>>>>>>>>
>>>>>>>>>>>processor.ProcessCommandSet(updateParser.ParseFromString(ShortcutSubmapsReplaceSparql));
>>>>>>>>>>>
>>>>>>>>>>>Is this a known problem that has already been fixed, or should I
>>>>>>>>>>>investigate more closely?
>>>>>>>>>>
>>>>>>>>>> This is not a known issue, I would also guess that the data being used
>>>>>>>>>> would have some bearing on the severity of the problem. Please go ahead
>>>>>>>>>> and investigate but I would suspect it is the two things I outlined above
>>>>>>>>>> which are the culprits here.
>>>>>>>>>>
>>>>>>>>>> Rob
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>Thanks,
>>>>>>>>>>>Tom
>>>>>>>>>>>
>>>>>>>>>>>------------------------------------------------------------------------------
>>>>>>>>>>>Master Visual Studio, SharePoint, SQL, ASP.NET, C# 2012, HTML5, CSS,
>>>>>>>>>>>MVC, Windows 8 Apps, JavaScript and much more. Keep your skills current
>>>>>>>>>>>with LearnDevNow - 3,200 step-by-step video tutorials by Microsoft
>>>>>>>>>>>MVPs and experts. ON SALE this month only -- learn more at:
>>>>>>>>>>>http://p.sf.net/sfu/learnmore_122712
>>>>>>>>>>>_______________________________________________
>>>>>>>>>>>dotNetRDF-bugs mailing list
>>>>>>>>>>>dot...@li...
>>>>>>>>>>>https://lists.sourceforge.net/lists/listinfo/dotnetrdf-bugs