Re: [Plone-developers] publishTraverse, acquisition and multiple urls for the same content


Okay, I'll take a quick crack at this with some thoughts.

- The traversal side effects shouldn't matter unless you're directly
linking to those URLs. How come your links are getting messed up?
- Changing Zope 2 traversal will likely have a lot of side effects. That
being said, the current behavior is unpredictable and unusual for most
people. There is a lot of history and a lot of code that perhaps depends on
it. Shooting from the hip, tightening up traversal is something that may be
possible in the next major release of Plone (6). However, maybe by then we
will have some ideas on moving the traversal over to Pyramid? :)
- Some people find acquisition powerful and use it to solve problems in
some wonky ways...
- I would think any changes like this to traversal are potentially very
high risk with low benefits. This might not be an important problem for
you to tackle right now :(
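
To make the behavior under discussion concrete, here is a rough toy model in
plain Python. It is not Zope's Acquisition or ZPublisher code; the Item class
and the traverse() function are made-up names for illustration only. It just
shows why a path like a/b/a resolves when names may be looked up anywhere
along the chain traversed so far, and what a stricter lookup would do:

```python
class Item:
    """Toy content object: a name, a container, and named children."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = {}
        if parent is not None:
            parent.children[name] = self


def traverse(root, path, acquire=True):
    """Resolve a slash-separated path starting at root.

    With acquire=True, each name is searched in every object traversed
    so far (innermost first), loosely mimicking acquisition. With
    acquire=False, only direct children of the current object count.
    """
    chain = [root]
    for name in path.strip("/").split("/"):
        candidates = reversed(chain) if acquire else [chain[-1]]
        for obj in candidates:
            if name in obj.children:
                chain.append(obj.children[name])
                break
        else:
            # Nothing in the searched objects has this name: a 404.
            raise LookupError(path)
    return chain[-1]


# Hypothetical site matching the example: "a" and "b" are siblings.
plone = Item("Plone")
a = Item("a", plone)
b = Item("b", plone)

print(traverse(plone, "a/b/a") is a)   # acquisition-style lookup finds "a"
try:
    traverse(plone, "a/b/a", acquire=False)
except LookupError as exc:
    print("strict lookup fails for:", exc)
```

In this model only the "a" path survives a strict lookup, which is roughly
the behavior being asked for. The catch is everything that relies on the
loose lookup today.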

Anyways, just some thoughts...

-Nathan

On Tue, Aug 19, 2014 at 4:02 AM, Mauro Amico <mau...@gm...> wrote:

> I want to share a problem that I have with ''publishTraverse'' and
> ''acquisition''.
>
> The Problem
> -----------
>
> My problem with “acquisition” and publishTraverse is that the current
> method resolves too many different URLs to the same content. For instance,
> here are some possible URLs for the “kb” page of the plone.org website:
>
> https://plone.org/documentation/kb
> https://plone.org/documentation/manual/kb
> https://plone.org/documentation/kb/manual/kb
> https://plone.org/documentation/manual/spinner.gif/kb
> ...
>
> and here is a generic "Plone" site with two content items "a" and "b"
> (folderish or not)
>
> http://example.com/Plone/a
> http://example.com/Plone/a/b/a
> http://example.com/Plone/a
> http://example.com/Plone/b/a
> ...
>
> All the URLs above return 200 with the same content, while I would like
> only the "canonical URL" to return 200 and the others to return 404.
>
> The behaviour described above constitutes a problem because:
>
> * multiple URLs for the same content are a problem for SEO and are
>   confusing to people. For SEO, the latest versions of Plone introduced
>   the canonical META, but IMHO it's just a workaround. People are
>   confused. For example, sometimes some of my editors ask me: "I can't
>   remove the http://example.com/Plone/a/b/a/ page. Can you do it for me?"
>
> * the page is not really the same on all URLs: if you open
>   https://plone.org/documentation/kb and
>   https://plone.org/documentation/manual/kb the second has a
>   portlet that the first is missing
>
> * removing a page from an external cache (varnish or squid), for example
>   after a content modification, will be a pain, because for the same
>   content there could be multiple URLs without any control or rules
>   (collective.purgebyid solves this)
>
> * when using subsites (or multiple Plone sites on the same Zope app) the
>   problem is even more annoying: suppose that "a" is a subsite (marked
>   with INavigationRoot) for http://a.example.org and "b" for
>   http://b.example.org; opening the URL http://a.example.org/b will
>   probably show the homepage of site "b" inside the "a" site
>   (collective.siteisolation and probably collective.lineage do something
>   to isolate subsites, but IMHO again they are only workarounds)
>
> Are there other people with the same doubts and problems?
>
> Does anybody have a good and stable solution for that?
>
> My analysis
> -----------------
>
> I tried to look in depth and identified a possible source of the problem
> mentioned in:
>
>
> https://github.com/zopefoundation/Zope/blob/2.13.21/src/ZPublisher/BaseRequest.py#L122
>
>                 # And lastly, of there is no view, try acquired attributes,
>                 # but only if there is no __bobo_traverse__:
>                 try:
>                     subobject=getattr(object, name)
>                     # Again, clear any error status created by
>                     # __bobo_traverse__ because we actually found something:
>                     request.response.setStatus(200)
>                 except AttributeError:
>                     pass
>
>
> I found many solutions (like collective.siteisolation) that work on a
> higher level with an IPublishTraverse adapter, but in my opinion the
> problem is with all traversing
> (e.g. https://plone.org/documentation/manual/spinner.gif/kb), so I think
> that in the end the best solution could be to modify the default traverser
> (or something like that).
>
> On a production site running Plone 4.2, where all content is Dexterity and
> there are no portlets, I added some logging:
>
> +    import logging
> +    logger = logging.getLogger('analyze.publishTraverse')
>
> ...
>                 # And lastly, of there is no view, try acquired attributes,
>                 # but only if there is no __bobo_traverse__:
>                 try:
>                     subobject=getattr(object, name)
> +                    logger.warning("obj:%r name:%r meta_type:%r",
> +                        object, name,
> +                        getattr(aq_base(subobject), 'meta_type', '-')
> +                    )
>                     # Again, clear any error status created by
>                     # __bobo_traverse__ because we actually found something:
>                     request.response.setStatus(200)
>                 except AttributeError:
>                     pass
>
> After three weeks I checked the logs. They showed some wrong URLs that I
> would have preferred to answer with 404, and many "false positives", which
> fortunately were all well known: portal_skins objects ("FileSystem Script",
> ...), Registry objects ("portal_css", "portal_javascript", ...) and the
> "Virtual Host Monster" object.
>
> I will probably extend the logging period up to the second week of
> September; after that I'm thinking of monkey patching the method with
> something like:
>
>                 # And lastly, of there is no view, try acquired attributes,
>                 # but only if there is no __bobo_traverse__:
>                 try:
>                     subobject=getattr(object, name)
> +                    # Fall back to '' so objects without a meta_type
> +                    # do not break the string checks below.
> +                    meta_type = getattr(
> +                        aq_base(subobject), 'meta_type', None) or ''
> +                    if (meta_type.startswith('Dexterity ')
> +                            or meta_type == 'Plone Site'):
> +                        subobject = None
> +                        raise AttributeError
>                     # Again, clear any error status created by
>                     # __bobo_traverse__ because we actually found something:
>                     request.response.setStatus(200)
>                 except AttributeError:
>                     pass
>
> Or
>
>                 # And lastly, of there is no view, try acquired attributes,
>                 # but only if there is no __bobo_traverse__:
>                 try:
>                     subobject=getattr(object, name)
> +                    # Fall back to '' so objects without a meta_type
> +                    # do not break the string checks below.
> +                    meta_type = getattr(
> +                        aq_base(subobject), 'meta_type', None) or ''
> +                    if meta_type.startswith('Filesystem '):
> +                        pass  # e.g. object inside portal_skins
> +                    elif meta_type.endswith(' Registry'):
> +                        pass  # e.g. portal_css
> +                    elif meta_type == 'Virtual Host Monster':
> +                        pass  # e.g. VHM
> +                    else:
> +                        subobject = None
> +                        raise AttributeError
>                     # Again, clear any error status created by
>                     # __bobo_traverse__ because we actually found something:
>                     request.response.setStatus(200)
>                 except AttributeError:
>                     pass
>
>
> Opinions?
> Ideas?
> Better solutions (I really don’t like monkey patching Zope2’s ZPublisher)?
>
> Thanks for the patience to read until here.
>
>
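
For what it's worth, the whitelist variant quoted above could be factored
into a small predicate that is testable outside of ZPublisher. This is only
a sketch under assumptions: the function name may_be_acquired is made up,
the three rules are copied from the quoted patch, and the sample meta_type
strings in the usage lines are illustrative rather than taken from a real
site.

```python
# Rules copied from the quoted patch: which meta_types may still be
# reached through the acquired-attribute fallback.
ALLOWED_PREFIXES = ("Filesystem ",)          # e.g. objects in portal_skins
ALLOWED_SUFFIXES = (" Registry",)            # e.g. portal_css
ALLOWED_EXACT = ("Virtual Host Monster",)    # e.g. VHM


def may_be_acquired(meta_type):
    """Return True if an object with this meta_type passes the whitelist.

    A missing or empty meta_type is rejected, so anything not explicitly
    listed ends up as a 404 in the patched traverser.
    """
    if not meta_type:
        return False
    return (
        meta_type.startswith(ALLOWED_PREFIXES)
        or meta_type.endswith(ALLOWED_SUFFIXES)
        or meta_type in ALLOWED_EXACT
    )


# Illustrative values only.
for sample in ("Filesystem Page Template", "Virtual Host Monster",
               "Dexterity Item", "Plone Site", None):
    print(repr(sample), may_be_acquired(sample))
```

Keeping the rules in one function means the patched branch reduces to a
single "if not may_be_acquired(meta_type): raise AttributeError" check, and
the list can be adjusted as the logs turn up more legitimate cases.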

-- 
Nathan Van Gheem
Solutions Architect
Wildcard Corp