| File | Date | Author | Commit |
|---|---|---|---|
| CommonGround.py | 2017-01-27 | petebleackley | [r11] Fixed addressing issues |
| Licence.txt | 2016-12-13 | petebleackley | [r1] Initial release of CommonGround reference imple... |
| README.md | 2017-01-13 | petebleackley | [r5] Switched to NLTK implementation of VADER sentim... |
| requirements.txt | 2017-01-13 | petebleackley | [r5] Switched to NLTK implementation of VADER sentim... |
This project is the reference implementation of the Common Ground algorithm, which is intended to provide a means of bypassing filter bubbles in social media recommendation systems.
Confirmation bias, the tendency of people to seek out opinions they find agreeable and avoid those they find disagreeable, can be trained into a recommendation system, leading to the phenomenon known as a filter bubble, where social media will show people only what they agree with to start with, thus reinforcing their entrenched opinions. To counteract this effect, it is desirable to create a recommendation system that will connect people whose opinions differ.
However, simply exposing people to content from those with opposing opinions will not in itself overcome confirmation bias - indeed, it risks reinforcing it by provoking hostile reactions. Therefore, it is necessary to identify common ground between such people - those topics on which they are likely to agree with each other despite their overall differences.
The algorithm represents a user's opinions in terms of an Attitude Vector.
Consider a set of documents $D_u$ posted by a given user $u$. Each document $d$ may be associated with a topic vector $\mathbf{t}_d$ and a sentiment $s_d$. The means by which these are calculated is specific to the implementation. A sentiment is defined to be positive when the emotions expressed in the document are positive, and negative when they are negative.
The Attitude Vector is then defined as

$$\mathbf{A}_u = \sum_{d \in D_u} s_d \mathbf{t}_d.$$
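The Attitude Vector (the sentiment-weighted sum of a user's topic vectors) can be sketched in plain Python. The function name and list representation are illustrative; the reference implementation stores these in pandas structures.

```python
def attitude_vector(topic_vectors, sentiments):
    """Sum of sentiment-weighted topic vectors for one user's documents."""
    dims = len(topic_vectors[0])
    attitude = [0.0] * dims
    for t_d, s_d in zip(topic_vectors, sentiments):
        # Each document contributes its topic vector scaled by its sentiment
        for i, component in enumerate(t_d):
            attitude[i] += s_d * component
    return attitude

# Two documents: one positive about topic 0, one mildly negative about topic 1
attitude_vector([[1.0, 0.0], [0.0, 1.0]], [0.8, -0.3])  # [0.8, -0.3]
```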
The sympathy between two users $u$ and $v$ is defined as the cosine similarity of their Attitude Vectors, i.e.

$$S(u, v) = \frac{\mathbf{A}_u \cdot \mathbf{A}_v}{\lvert\mathbf{A}_u\rvert\,\lvert\mathbf{A}_v\rvert}.$$

This is positive when the users' opinions generally agree, and negative when they generally disagree.
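The sympathy score is ordinary cosine similarity, which can be sketched with the standard library alone (names here are illustrative):

```python
import math

def sympathy(a_u, a_v):
    """Cosine similarity of two attitude vectors."""
    dot = sum(x * y for x, y in zip(a_u, a_v))
    norm_u = math.sqrt(sum(x * x for x in a_u))
    norm_v = math.sqrt(sum(y * y for y in a_v))
    return dot / (norm_u * norm_v)

# Exactly opposed attitudes give a sympathy close to -1
sympathy([1.0, 2.0], [-1.0, -2.0])
```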
The first stage of finding recommended content for a user is to filter the set of candidate users to those with whom the given user's sympathy score is negative.
For two users $u$ and $v$, the Common Ground Vector is defined as the element-wise product of their Attitude Vectors, i.e. $\mathbf{C}_{uv} = \mathbf{A}_u \circ \mathbf{A}_v$. Positive components of this vector correspond to subjects on which users are likely to find common ground.
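The element-wise product is a one-liner in plain Python (function name illustrative):

```python
def common_ground(a_u, a_v):
    """Element-wise product of two attitude vectors."""
    return [x * y for x, y in zip(a_u, a_v)]

# Both users feel positively about topic 0, so its component is positive;
# they disagree about topic 1, so its component is negative.
common_ground([0.9, 0.5], [0.4, -0.7])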
Given a set of documents posted by users $v$ who have negative sympathy scores with respect to user $u$, the Recommendation Score of a document $d$ posted by user $v$ is defined as

$$R_u(d) = \mathbf{C}_{uv} \cdot \mathbf{t}_d.$$
A high value of this indicates a document written by a person the given user is likely to disagree with overall, but which primarily concerns topics on which that user is likely to agree with them.
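A minimal sketch of this scoring, taking the Recommendation Score as the dot product of the Common Ground Vector with a document's topic vector (a plain-Python sketch; names are illustrative):

```python
def recommendation_score(c_uv, t_d):
    """Score a document's topic vector against a Common Ground Vector."""
    return sum(c * t for c, t in zip(c_uv, t_d))

# A document mostly about the shared-ground topic scores high...
recommendation_score([0.5, -0.4], [1.0, 0.0])   # 0.5
# ...while one about the contested topic scores low.
recommendation_score([0.5, -0.4], [0.0, 1.0])   # -0.4
```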
The reference implementation can be found in the file CommonGround.py. It is written in Python 2.7 and uses the following libraries.
Topic modelling is performed with gensim, using Latent Semantic Indexing with 256 topics. The topic model is initialised with a training corpus.
Documents are tokenized and optionally passed through a preprocessing stage. The tokenized document is then converted to a list of $(w, \mathrm{TF}\delta H)$ values, defined as

$$\mathrm{TF}\delta H(w) = \frac{n_w}{N}\,\delta H_w, \qquad \delta H_w = \sum_{D} P(D \mid w) \log_2 \frac{P(D \mid w)}{P(D)},$$

where

- $n_w$ is the number of occurrences of token $w$ in the document,
- $N$ is the total number of tokens in the document,
- $P(D \mid w)$ is the probability that an instance of $w$ randomly selected from the training corpus is found in document $D$, and
- $P(D)$ is the probability that a token randomly selected from the training corpus is found in document $D$.
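The weighting can be sketched directly from these definitions, under the assumption that $\delta H_w$ is the entropy reduction $\sum_D P(D \mid w) \log_2 \bigl(P(D \mid w)/P(D)\bigr)$; the function names and the list-of-lists corpus representation are illustrative, not the reference implementation's own API:

```python
import math
from collections import Counter

def delta_h(token, corpus):
    """Entropy reduction for a token over a corpus of tokenized documents."""
    total_tokens = sum(len(doc) for doc in corpus)
    occurrences = [doc.count(token) for doc in corpus]
    n_w = sum(occurrences)
    dh = 0.0
    for doc, occ in zip(corpus, occurrences):
        if occ == 0:
            continue  # zero terms contribute nothing
        p_d_given_w = occ / n_w          # P(D|w)
        p_d = len(doc) / total_tokens    # P(D)
        dh += p_d_given_w * math.log(p_d_given_w / p_d, 2)
    return dh

def tf_delta_h(document, corpus):
    """List of (token, TFdH) pairs for one tokenized document."""
    counts = Counter(document)
    n_tokens = len(document)
    return [(w, (c / n_tokens) * delta_h(w, corpus)) for w, c in counts.items()]
```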
Sentiment Analysis is carried out using nltk.sentiment.vader. Each document is split into sentences, and the sum of the compound scores for each sentence in the document is calculated.
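The split-then-sum pattern can be sketched without NLTK; the toy word-polarity lexicon and naive sentence splitter below are stand-ins for VADER's compound scores and NLTK's sentence tokenizer, used only to keep the example self-contained:

```python
import re

# Toy stand-in for the VADER lexicon: word -> polarity
TOY_LEXICON = {"good": 0.5, "great": 0.8, "bad": -0.5, "awful": -0.8}

def compound_score(sentence):
    """Crude per-sentence score; the real implementation uses
    nltk.sentiment.vader's compound score."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return sum(TOY_LEXICON.get(w, 0.0) for w in words)

def document_sentiment(document):
    """Split into sentences, then sum the per-sentence scores."""
    sentences = re.split(r"(?<=[.!?])\s+", document.strip())
    return sum(compound_score(s) for s in sentences)
```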
class CommonGround(object):
- `Topics`: a `pandas.DataFrame` containing the topic vectors for each document.
- `Sentiments`: a `pandas.DataFrame` containing the sentiment score for each document.
- `Attitudes`: a `pandas.DataFrame` containing the Attitude Vector for each user.
- `modulus`: a `pandas.Series` containing the magnitudes of the Attitude Vectors.
- `dH`: a `pandas.Series` containing the $\delta H$ weights used for weighting the topic model.
- `dictionary`: a `gensim.corpora.dictionary.Dictionary` object mapping words to token ids.
- `model`: a `gensim.models.LsiModel` object which performs topic modelling (Latent Semantic Indexing).
- `preprocess`: an optional callable which preprocesses documents prior to topic modelling.
def __init__(self,training_corpus,processing_pipeline=None):
"""sets up data structures
training_corpus is an iterable of documents
processing_pipeline is an optional callable that performs tasks
such as stemming, POS tagging and word sense disambiguation on a
tokenized document"""
training_corpus is used to initialise the topic model. It is an iterable containing documents.
processing_pipeline (optional) is a callable used to perform preprocessing on documents prior to topic modelling. This may involve stemming, tagging, word sense disambiguation, or similar. It should accept a list of strings and return a list of hashable objects.
def __call__(self,user,n=10):
"""Finds content from people that the user normally disagrees with
that reflects his/her common ground with those people"""
user a hashable object identifying a user for whom recommendations are to be found
n (optional, default=10) is the number of results to return
A pandas.Series indexed with (user,uri) for the n documents with the highest Recommendation Score for the given user, and containing their Recommendation Scores.
def add_document(self,user,uri,document):
"""Calculates the sentiment score, and topic vector for the document,
and updates the user's attitude vector"""
user A hashable object identifying the user who posted the document
uri A string representing the uri of the document
document The document itself (a string)
def SetupDH(self,training_corpus):
"""Sets up the deltaH weights used for topic modelling"""
Called by __init__, using the training_corpus.
def tokenize(self,document):
"""For a document, returns a list of tokens.
For a corpus (list of documents), returns a list of tokenized documents.
Performs preprocessing if a pipeline was passed in the constructor."""
document A string representing a document or a list of strings representing a corpus of documents
A list of (optionally preprocessed) words if document was a string, or a nested list if it was a list.
def get_features(self,document,is_corpus=False):
"""Extracts features, weighted by TFdH for a document or corpus"""
document A list of (optionally preprocessed) words representing a document, or a nested list of these representing a corpus
is_corpus (optional,default False) indicates whether document represents a corpus
A list of (token,TFdH) tuples representing the document, or a nested list of these representing the corpus
This software is released under the MIT licence.
Copyright (c) 2016 Dr Peter J Bleackley
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.