The problem
Today, almost all websites have some sort of RSS feed to keep you informed of their latest news.
On the other hand, users may be overloaded by all sorts of irrelevant information that arises from this trend.
Another part of the problem is the magazine website that covers multiple areas of interest.
Such a magazine would report hardware as well as software news, Apple as well as Android information, etc.
But what if we are only interested in part of the information provided by RSS?
The solution
Some of the newest RSS readers have started to address the issue by providing filtering options that can be used to filter out words, authors, categories or tags that do not interest you.
Their biggest disadvantage is their learning curve: you need to mark, in each and every post, the tags, authors, categories and words that make that single post irrelevant to you.
jReader tries to address this problem by:
- employing an intelligent algorithm that automatically scans tags, categories, authors and titles of each post for repeating patterns
- allowing users to mark a full posting as irrelevant (this is where the algorithm kicks in)
- at the same time, allowing users to mark parts of the post that are always irrelevant (like a category, author or a tag)
The technology
jReader's ideology comes from the simple fact that everything has already been invented.
It would be a waste of time to reinvent the wheel when these technologies are ready to help:
- Google Feed API, a service which can:
  - read just about any RSS data and generate valid output from it
  - when possible, serve cached RSS data via a CDN, making feed delivery lightning fast
- Yahoo! Pipes, serving as a solution for:
  - condensing all of a user's feeds into one, while passing this data back to jReader's server for filtering and archival purposes. The server will then expose this data and pass it over to Google Feed API for faster delivery.
  - situations when Google Feed API does not recognize the feed in question (like when the user enters a web URL instead of a feed URL)
- jQuery JavaScript library, providing an excellent base for:
  - displaying feed data from the above-mentioned sources directly on the user's device
  - the jQuery Mobile framework, used to deliver the jReader experience to various mobile platforms
- MySQL database backend, storing:
  - usernames and e-mails of all "registered" users
  - basic information about each user's predefined feeds and all posts
- flat-file storage, used to:
  - store copies of posts, so their snippets can be displayed without putting additional strain on the Google / Yahoo! APIs or the RSS feed itself
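The flat-file storage of post copies could look roughly like the sketch below. It is written in Python purely for illustration, and everything in it is an assumption rather than part of the plan: the function names (`post_path`, `store_post`, `load_snippet`), the one-JSON-file-per-post layout, the hashed file names and the `url` / `content` fields.

```python
import hashlib
import json
from pathlib import Path


def post_path(cache_dir, post_url):
    """Map a post URL to a file name that is safe on any file system."""
    # Hypothetical naming scheme: SHA-256 of the URL, one file per post.
    digest = hashlib.sha256(post_url.encode("utf-8")).hexdigest()
    return Path(cache_dir) / f"{digest}.json"


def store_post(cache_dir, post):
    """Write a copy of the post to its own flat file and return the path."""
    path = post_path(cache_dir, post["url"])
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(post), encoding="utf-8")
    return path


def load_snippet(cache_dir, post_url, length=200):
    """Return the first `length` characters of a stored post, or None."""
    path = post_path(cache_dir, post_url)
    if not path.exists():
        return None
    post = json.loads(path.read_text(encoding="utf-8"))
    return post["content"][:length]
```

With a layout like this, a snippet is served by reading a single local file, so neither the remote APIs nor the original feed are contacted again.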
The concept
Now that we have the full technology tree listed, let's have a look at how my imagination puts jReader's internals together:
(also available as MindMap and Wireframe)
- upon first visit, the user will choose a username and enter a valid e-mail address
- subsequent sessions can be started by entering the username and then a random text that is sent to the user's e-mail once the username is submitted to the server
  - this removes the need to store passwords or to create a "forgotten password" mechanism, and increases overall security
- the user will start adding RSS feeds
- there will be a classic infinite stream, where articles from all feeds load as the user scrolls the page
  - there will also be a hard limit on how many articles are displayed on the page at once, so the browser does not get overloaded by hundreds of articles
    - this limit will be much lower for mobile devices than for desktop ones, due to memory restrictions
    - once the limit is reached, a certain number of articles will be removed from the top, and only then will new articles be added to the bottom of the page
- the retrieval of new posts will work in the following manner:
  - a JSONP call to the Google API will be made to retrieve all feeds, condensed via a Yahoo! Pipe and filtered for irrelevant data
  - once the data is received, it will be compared to the data shown in the browser, which will be updated as necessary
- there will be a link displayed for each post, leading to a list of all archived posts from that post's feed
  - retrieval of archived posts will be handled exclusively by jReader's server
    - the only exception will be when the server cannot be reached, in which case only data available via the Google API will be shown
  - archived posts per feed will also be accessible via a fixed top/bottom toolbar, which will also hold links for adding/managing feeds and the user's account data (e-mail, notifications, fetch frequency)
- for each post, two filtering icons will be available
  - the first icon will mark the full post as irrelevant
    - doing so will call a PHP script that will analyze the post's data as follows:
      - post author, categories and tags will be compared to previously marked posts. The more matches with previous posts are found, the higher the score assigned to these elements.
      - post title will be analyzed for keywords. Each keyword will also be compared with keywords previously marked as irrelevant in post titles from this feed. Keywords matched in this post will have their overall score increased.
    - the post is marked as irrelevant, so it will no longer be displayed (and is hidden from the front-end as well)
  - the second icon will allow for advanced options, such as:
    - marking each of the post's parts (tag, author, categories, text in title) as irrelevant
    - these options will raise the score of such elements, so future posts containing them will no longer be presented to the user
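The scoring behind the first icon could be sketched roughly as follows. This is a Python illustration, not the PHP script the plan calls for, and the details are assumptions: the function names, the stop-word list, the "one point per match" rule and the use of a single `Counter` (one such counter per feed would be needed to keep title keywords feed-specific).

```python
import re
from collections import Counter

# Hypothetical stop-word list; the plan does not say how keywords are chosen.
STOP_WORDS = {"a", "an", "and", "the", "of", "for", "to", "in", "on", "with"}


def title_keywords(title):
    """Split a title into lowercase keywords, dropping stop words."""
    words = re.findall(r"[a-z0-9]+", title.lower())
    return [w for w in words if w not in STOP_WORDS]


def post_elements(post):
    """Return the (kind, value) elements of a post that can carry a score."""
    elements = [("author", post["author"])]
    elements += [("category", c) for c in post.get("categories", [])]
    elements += [("tag", t) for t in post.get("tags", [])]
    elements += [("keyword", k) for k in set(title_keywords(post["title"]))]
    return elements


def mark_irrelevant(scores, post):
    """Raise the score of every element of a post marked as irrelevant."""
    for element in post_elements(post):
        scores[element] += 1


def irrelevance_score(scores, post):
    """Sum the scores of the elements a post shares with marked posts."""
    return sum(scores[element] for element in post_elements(post))


# Example: two marked posts, then scoring a new candidate post.
scores = Counter()
mark_irrelevant(scores, {"author": "Ann", "title": "New Android phone released",
                         "categories": ["Mobile"], "tags": ["android"]})
mark_irrelevant(scores, {"author": "Bob", "title": "Android tablet review",
                         "categories": ["Mobile"], "tags": ["android", "review"]})
candidate = {"author": "Cid", "title": "Android update for the Nexus",
             "categories": ["Mobile"], "tags": ["android"]}
candidate_score = irrelevance_score(scores, candidate)
```

A post would then be hidden once its score passes some threshold; where that threshold sits, and whether the advanced options of the second icon add a larger increment than a full-post mark, is left open here.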