Thanks for raising the issue and providing a first set of questions
and answers about AURSR (so, Active User Refinement of Search
My contribution lies below.
On Thu, Sep 08, 2011 at 01:19:09AM -0400, David Mackey wrote:
> In this RFC I will briefly outline the case for adding active user
> refinements to Seeks. I'm going to try to keep this RFC as brief as
> possible, but can flesh it out as objections, questions, or suggestions
> are raised.
> Basic Features:
> * Ability to remove results from search queries.
> * Ability to move specific results up/down within a page.
> * Must remember on a per-user basis the results of active user
As a start, it is correct that Seeks does not currently fully
implement any of those basic features.
The first feature, 'Ability to remove results from search queries' is
partially implemented as follows:
- it is possible to 'reject' a result for a given query (with no
effect on other, even similar, queries). Rejection means that the
system no longer boosts that result. It does not mean that the result
will disappear from the list, as obtained from an external search
engine or data source feed.
The reason for not eradicating a result once and for all for a given
query is that we would need a reverse mechanism in case an
erroneous deletion occurs.
- it is possible to ban a result or a set of results throughout *all*
queries. This can be achieved by writing matching regexps and adding
them to the plugins/websearch/patterns/reject file.
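
To illustrate the idea of banning results with regexps, here is a minimal
sketch in Python. It is not the Seeks implementation: the example patterns,
the helper names (compile_patterns, filter_results) and the handling of
blank and '#' lines are assumptions made for illustration only, and the
actual format of the reject file is not described here.

```python
import re

# Hypothetical reject patterns, one regexp per entry (illustration only).
REJECT_PATTERNS = [
    r"^https?://([^/]+\.)?spam-example\.com/",  # a whole (made-up) domain
    r"/ads?/",                                  # any URL with an /ad/ or /ads/ path
]

def compile_patterns(lines):
    """Compile each non-empty line into a regexp.

    Skipping lines that start with '#' is an assumption of this sketch.
    """
    compiled = []
    for line in lines:
        text = line.strip()
        if text and not text.startswith("#"):
            compiled.append(re.compile(text))
    return compiled

def filter_results(urls, patterns):
    """Return the URLs that match none of the reject patterns, in order."""
    return [u for u in urls if not any(p.search(u) for p in patterns)]
```

Because the patterns apply to URLs rather than to queries, a match removes
the result from every query, which is the 'ban throughout all queries'
behaviour described above.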
Now, the ability to move specific results up/down is not implemented,
mainly because I am not convinced that it makes sense. Let me
explain. Wikipedia articles are built to slowly 'converge' onto a kind
of well-argued truth (if that ever truly exists). Search instead is
less about the facts, and more about the user context, at least
IMO. As such, there isn't really an 'Oracle' for search, as there is
for the kind of truth Wikipedia is after. This means that a user could
spend a good deal of time actively ranking results, in a way that would
not be approved by others. In this case, there would still be a need
for a semi-passive re-ranking algorithm based on each user's active
ranking.
Btw, Google did try this up/down active ranking, and later backed down,
Additionally, some studies indicate that users tend to read groups
of results at once, e.g. the first five, and that ranking within those
groups does not impact their satisfaction and/or searches.
This is to justify why it is not *yet* implemented. The project is
totally open to such changes though.
Finally, for now, per-user storage can only be achieved by running a
Seeks node for each user. This could change in the future, and proposals
on how to move in that direction are welcome.
> Why It Matters:
> * While a large percentage of users are only interested in providing
> passive (if any) feedback to search engine a small, but significant
> minority are interested in active content curation. These individuals can
> provide results categorization which is not easily or quickly acquired via
> algorithmic mechanisms.
True, though I'm concerned about the amount of data users may be
facing, which reflects the number of users actively
involved. Attracting users nowadays is difficult. This requires
state-of-the-art and well calibrated UIs. Both are out of the scope of
Seeks right now. That said, this does not prevent us from implementing
a form of AURSR.
> * Individuals interested in active curation currently have very limited
> and unsatisfactory options for active curation. Most current options are
> deficient in numerous ways and many are being or have been disbanded. The
> availability of an open source alternative would provide content curators
> with confidence in the longevity of their work.
Yes, I agree.
> * Results for an actively curated engine can quickly outpace those of a
> machine-only or passive-feedback engine on popular terms as users are able
> to quickly populate the best results.
> Preventing the Inevitable:
> The largest challenge to such an endeavor occurs with success. With
> success comes the enticement for users to abuse the active content
> curation (or passive for that matter) in an attempt to force results in
> which they have a commercial interest to rise to the top. This can be
> controlled by using a meritocracy based system in which users earn
> influence with the demonstration of knowledge, ability, and integrity.
> User's active curation results should always take precedence for that user
> over other results (they may have valid reasons for desiring their results
> to appear at the top for their own queries), but an aggregate of trusted
> curation results in combination with traditional passive user behavior and
> metasearch aggregation and analysis will result in the best results.
I agree here also.
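
To make the precedence rule above concrete, here is a rough sketch in
Python. Everything in it is hypothetical: the function name rank_results,
the +1/-1 vote encoding and the trust weights are assumptions for
illustration, not something Seeks provides. Passive feedback and metasearch
analysis are only represented by the incoming order of the candidates.

```python
from collections import defaultdict

def rank_results(candidates, own_votes, peer_votes, trust):
    """Order candidate URLs for one user and one query.

    candidates: URLs in their original (e.g. metasearch) order.
    own_votes:  {url: +1 or -1} given by the requesting user.
    peer_votes: {user: {url: +1 or -1}} given by other users.
    trust:      {user: weight in [0, 1]}, a hypothetical earned influence.
    """
    # Sum peer votes, each weighted by how much that peer is trusted.
    peer_score = defaultdict(float)
    for user, votes in peer_votes.items():
        weight = trust.get(user, 0.0)  # unknown users carry no influence
        for url, vote in votes.items():
            peer_score[url] += weight * vote

    def sort_key(item):
        index, url = item
        # The user's own vote always dominates, then the trusted aggregate,
        # then the original position as a final tie-breaker.
        return (-own_votes.get(url, 0), -peer_score[url], index)

    return [url for _, url in sorted(enumerate(candidates), key=sort_key)]
```

With such an ordering, a low-trust account promoting its own site can move
a result only slightly, while each user still sees their own choices first.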
> Personal Note:
> I have personally worked with a number of social search engines in
> actively curating content. In every instance I have been disappointed with
> the short-term lifespan of my data due to commercial refocusing. If Seeks
> where to add such active content curation abilities to the software I
> would immediately begin curating content and providing refined results for
> numerous topics.
I do believe that!
So, I would love to have other users take the time to give their
opinion here, before any move in that direction.
Though a first step would be to take advantage of the new forthcoming API
and see what kind of API calls would be needed for even basic AURSR.
For now, the API includes a POST call
that allows recommending a new result for a given query. This result
does not need to come from an external source such as a search engine.
Here, the poster is the source.
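
As a sketch of what such a call could look like from a client, the Python
below builds (without sending) a POST request. The endpoint path
'/recommendation/<query>', the 'url' and 'title' parameter names, and the
node address used in the example are all placeholders assumed for
illustration; the actual API may differ.

```python
import urllib.parse
import urllib.request

def build_recommendation_request(node_url, query, result_url, title=None):
    """Build a POST request recommending result_url for query.

    The path and parameter names are hypothetical, not the documented API.
    The request is only constructed here, not sent.
    """
    params = {"url": result_url}
    if title is not None:
        params["title"] = title
    # Form-encode the parameters for the request body.
    body = urllib.parse.urlencode(params).encode("utf-8")
    # Percent-encode the query so it can safely be used as a path segment.
    endpoint = (node_url.rstrip("/") + "/recommendation/"
                + urllib.parse.quote(query, safe=""))
    return urllib.request.Request(endpoint, data=body, method="POST")

# Example with a placeholder node address:
request = build_recommendation_request(
    "http://localhost:8080", "free software", "http://example.org/x")
```

Passing the returned object to urllib.request.urlopen would send it to the
node; that step is left out here since it needs a running node.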
Result ranking (up/down) could probably be achieved by attaching more
parameters to this call. Deletion is on its way.
Note that this would not affect results obtained from external search
engines and feeds, but only those obtained from the P2P ring.
Finally, behind AURSR lies the emergence of a meritocracy, much like
that of Wikipedia. This would require a new thread of
discussion. Briefly, I'm convinced that there is a need for a social
layer on top of Seeks itself. Such a layer would be a great step
forward. Though I believe the amount of work required is above our
Let me know your thoughts,