Shahnawaz Ahmed - 2015-03-04

Hi,
I am Shahnawaz Ahmed from BITS Pilani Goa Campus, India. I am interested in the idea "Python tagger for multiword expression lexicon." I have worked on several text processing and classification projects that involved scraping data from web pages and structuring it to extract meaningful information. Two of them are relevant here:

  1. Automated Case-List SMS service:
    This involved tagging cases from a list issued daily (http://goo.gl/v7CLV4) using regular expressions, classifying them by lawyer name, and then sending each lawyer a list of court cases for the next day. It required text splicing, capturing specific strings sandwiched between keywords, and building a database of cases tagged by the names of the lawyers involved in each case.

  2. Data extraction from 99acres.com:
    For an analysis of real estate price fluctuations in a city, I scraped HTML pages from 99acres.com to construct a data matrix covering five years of housing prices. The data was available only as points on a graph and had to be pruned and converted into a NumPy array.
    http://goo.gl/FPPA2Z
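
To give a rough idea of the keyword-sandwiched extraction from the first project, here is a minimal sketch. It is not the actual project code: the sample text, the keywords `CASE`, `ADVOCATE` and `COURT`, and the function name `cases_by_lawyer` are all made up for illustration, since the real list format is not shown here.

```python
import re
from collections import defaultdict

# Hypothetical sample lines; the real daily list format is an assumption.
SAMPLE = """\
CASE 101/2015 ADVOCATE A. Sharma COURT 3
CASE 102/2015 ADVOCATE R. Gupta COURT 1
CASE 103/2015 ADVOCATE A. Sharma COURT 5
"""

# Capture the case number after "CASE" and the lawyer name
# sandwiched between the keywords "ADVOCATE" and "COURT".
PATTERN = re.compile(r"CASE\s+(\S+)\s+ADVOCATE\s+(.+?)\s+COURT")


def cases_by_lawyer(text):
    """Group case numbers by the lawyer name found on the same line."""
    grouped = defaultdict(list)
    for case_no, lawyer in PATTERN.findall(text):
        grouped[lawyer].append(case_no)
    return dict(grouped)


if __name__ == "__main__":
    for lawyer, cases in cases_by_lawyer(SAMPLE).items():
        print(lawyer, cases)
```

The lazy group `(.+?)` stops at the first `COURT` keyword on the line, so names containing spaces or periods are still captured whole.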

I like to work in Python, but I am also proficient in C and Java. I am currently trying out the MBSP library mentioned on the ideas page. I also have experience with machine learning (neural networks, SVMs) and classification algorithms, which might be helpful if you would like to implement a learning algorithm to improve tagging. I would love to discuss this further.

Email: shd339@gmail.com
IRC: sahmed95

 

Last edit: Shahnawaz Ahmed 2015-03-04