The list of experiments

1) NumPy
i) Different ways to create NumPy arrays
ii) Adding, removing, and modifying elements in an array
iii) Arithmetic operations on NumPy arrays
iv) Slicing and iterating over NumPy arrays
v) Matrix operations on NumPy arrays
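
A minimal sketch of the five NumPy items above, assuming only that NumPy is installed. The array values are made up for illustration.

```python
import numpy as np

# i) Different ways to create arrays
a = np.array([1, 2, 3, 4])           # from a Python list
z = np.zeros((2, 3))                 # 2x3 array filled with zeros
r = np.arange(0, 10, 2)              # [0 2 4 6 8]
g = np.linspace(0.0, 1.0, 5)         # 5 evenly spaced values from 0 to 1

# ii) Adding, removing, modifying (append/delete return new arrays)
b = np.append(a, 5)                  # [1 2 3 4 5]
c = np.delete(b, 0)                  # drop index 0 -> [2 3 4 5]
c[0] = 20                            # modify in place -> [20 3 4 5]

# iii) Element-wise arithmetic
s = a + 10                           # [11 12 13 14]
p = a * a                            # [1 4 9 16]

# iv) Slicing and iterating
m = np.arange(6).reshape(2, 3)       # [[0 1 2] [3 4 5]]
first_col = m[:, 0]                  # [0 3]
row_sums = [int(row.sum()) for row in m]   # iterate over rows -> [3, 12]

# v) Matrix operations
x = np.array([[1, 2], [3, 4]])
prod = x @ x                         # matrix product -> [[7 10] [15 22]]
xt = x.T                             # transpose
det = np.linalg.det(x)               # determinant, approximately -2.0
```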

2) Pandas
i) Create a DataFrame manually
ii) Different ways of importing a DataFrame
iii) Adding, deleting, and modifying the rows/columns of a DataFrame
iv) Applying functions to a DataFrame
v) Iterating over a DataFrame
vi) Accessing the elements of a DataFrame
vii) Different ways to deal with missing (NA) values in a DataFrame
viii) Groupby operations on a DataFrame
ix) Merging DataFrames
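
A short sketch covering most of the pandas items above, assuming pandas and NumPy are installed. The names, departments, and scores are invented for illustration, and importing from files (item ii) is left out because it depends on the data at hand.

```python
import numpy as np
import pandas as pd

# i) Create a DataFrame manually (made-up data)
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Cid", "Dee"],
    "dept": ["A", "B", "A", "B"],
    "score": [80.0, np.nan, 90.0, 70.0],
})

# vii) Missing values: fill NA with the mean of the observed scores (80.0)
df["score"] = df["score"].fillna(df["score"].mean())

# iii) Add a column, then delete a row by index label
df["passed"] = df["score"] >= 75
df = df.drop(index=3)                        # removes the row for Dee

# iv) Apply a function to a column
df["score_pct"] = df["score"].apply(lambda v: v / 100)

# v) Iterate over rows as named tuples
names = [row.name for row in df.itertuples()]

# vi) Access a single element by row label and column name
bob_score = df.loc[1, "score"]

# viii) Groupby: mean score per department
dept_mean = df.groupby("dept")["score"].mean()

# ix) Merge with a second (made-up) table on the shared "dept" column
heads = pd.DataFrame({"dept": ["A", "B"], "head": ["Eve", "Raj"]})
merged = df.merge(heads, on="dept", how="left")
```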

3) Data Visualizations:
i) Line Graphs
ii) Scatter Plots
iii) Histograms
iv) Subplots
v) Joint plots
vi) Heatmaps

4) Basic statistics for machine learning:
Consider a dataset. Apply the following statistical operations on it.
i) Central tendency: mean, median, mode
ii) Distribution of data: range, interquartile range, variance, standard deviation, correlation
iii) Draw a box plot to demonstrate the range and interquartile range.
iv) Show the correlation between two variables using a scatter plot.
v) Draw a histogram to show how the given data is distributed.
vi) For the given data, identify which attributes are a) continuous, b) ordered, c) binary.
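
The numeric measures in items i) and ii) can be sketched as follows, assuming NumPy is installed. The two small samples are made up, and the variance and standard deviation use the sample (n - 1) convention, which is one of two common choices.

```python
import statistics
import numpy as np

# Made-up samples for illustration
x = np.array([2, 4, 4, 5, 7, 9, 10, 12])
y = np.array([1, 3, 2, 5, 6, 8, 9, 11])

# Central tendency
mean = x.mean()
median = np.median(x)
mode = statistics.mode(x.tolist())      # most frequent value

# Spread
data_range = x.max() - x.min()
q1, q3 = np.percentile(x, [25, 75])     # first and third quartiles
iqr = q3 - q1                           # interquartile range
variance = x.var(ddof=1)                # sample variance (divides by n - 1)
std = x.std(ddof=1)                     # sample standard deviation

# Pearson correlation between the two variables
corr = np.corrcoef(x, y)[0, 1]
```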

5) Prediction
a. Consider a dataset and perform univariate linear regression to find the coefficients. Show the relation between the independent variable and the dependent variable using a scatter plot. Report the performance of the model using the R-squared score, mean absolute error (MAE), and mean squared error (MSE).
b. Consider a dataset and perform multivariate linear regression to find the coefficients. Which attributes influence the target variable the most? Report the performance of the model using the R-squared score, mean absolute error (MAE), and mean squared error (MSE).
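
A minimal sketch of part a, assuming NumPy only: it fits the line by ordinary least squares with `np.linalg.lstsq` instead of a dedicated machine learning library, and the five data points are made up. Adding more feature columns to the design matrix gives the multivariate case in part b.

```python
import numpy as np

# Made-up data in which y is roughly 2*x + 1
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 8.8, 11.0])

# Design matrix: a column of ones for the intercept, then the feature
X = np.column_stack([np.ones_like(x), x])

# Ordinary least squares fit
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept, slope = coef

# Predictions and performance measures
y_pred = X @ coef
residuals = y - y_pred
mae = np.mean(np.abs(residuals))                     # mean absolute error
mse = np.mean(residuals ** 2)                        # mean squared error
r2 = 1.0 - np.sum(residuals ** 2) / np.sum((y - y.mean()) ** 2)
```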

6) Classification
Consider a dataset and apply the following classifiers to it:
i) KNN classifier
ii) Decision tree
iii) SVM
iv) Logistic regression
Show the confusion matrix for every model.
Find the accuracy, sensitivity, specificity, and F1 score of every model.
Compare the performance of all models.
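
The evaluation measures above can be computed from any model's predictions. The sketch below assumes binary labels coded as 0 and 1, with both classes present in the true and predicted labels, so that no denominator is zero. `binary_metrics` is a hypothetical helper name and the label vectors are made up.

```python
import numpy as np

def binary_metrics(y_true, y_pred):
    """Confusion matrix and derived scores for 0/1 labels (hypothetical helper)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))   # true positives
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))   # true negatives
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))   # false positives
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))   # false negatives
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)                      # recall, true positive rate
    return {
        "confusion": np.array([[tn, fp], [fn, tp]]),  # rows: actual, cols: predicted
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": sensitivity,
        "specificity": tn / (tn + fp),                # true negative rate
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
    }

# Made-up labels standing in for one classifier's output
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
scores = binary_metrics(y_true, y_pred)
```

Calling the same helper on the predictions of each classifier gives a like-for-like comparison.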

7) Clustering and feature reduction
Consider a dataset and apply the following
i) Apply K-means clustering to the data. Use the elbow method to find the optimal value of K.
ii) Apply agglomerative clustering to the data and visualize the result with a dendrogram.
iii) Apply PCA to reduce the number of features in a dataset.
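
For item iii), one way to see what PCA does is to compute it directly from the singular value decomposition of the centered data. This is a sketch assuming NumPy only. The function name `pca` and the synthetic three-feature dataset are illustrative, and a library implementation would normally be used in practice.

```python
import numpy as np

def pca(X, n_components):
    """Project X onto its top principal components (illustrative sketch)."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                            # center each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]                     # principal directions (rows)
    explained_variance = S ** 2 / (len(X) - 1)
    ratio = explained_variance / explained_variance.sum()
    return Xc @ components.T, components, ratio[:n_components]

# Synthetic data: the second feature is nearly a multiple of the first,
# the third is low-variance noise
rng = np.random.default_rng(0)
t = rng.normal(size=100)
X = np.column_stack([
    t,
    2 * t + rng.normal(scale=0.1, size=100),
    rng.normal(scale=0.1, size=100),
])

Z, comps, ratio = pca(X, n_components=1)   # reduce 3 features to 1
```

Here `ratio` reports the share of total variance retained by the kept components, which is a common guide for choosing how many to keep.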

8) Natural language processing
a. Use the NLTK package to perform the following:
i) Tokenization
ii) Stemming
iii) Lemmatization
iv) Bag of words
v) TF-IDF
b. Given a set of documents, use NLTK to classify them.

9) Ensemble methods
Consider a dataset.
i) Use an ensemble method to combine the predictive power of a decision tree and logistic regression using the bagging technique.
ii) Apply AdaBoost to the given dataset and draw the confusion matrix for the strong classifier. Apply a simple decision tree to the same dataset and compare the performance.
iii) Apply Random Forest to the given dataset and draw the confusion matrix for the strong classifier. Apply a simple decision tree to the same dataset and compare the performance.

10) Association analysis
For a given sales dataset, apply the Apriori algorithm to generate association rules that satisfy given minimum support and confidence thresholds.
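
A compact pure-Python sketch of the level-wise Apriori idea, using only the standard library. The function names `apriori` and `rules`, the five toy transactions, and the thresholds are all illustrative. A real sales dataset would be far larger and would usually call for an optimized library implementation.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain the whole itemset
        return sum(itemset <= t for t in transactions) / n

    current = {frozenset([i]) for t in transactions for i in t}
    frequent = {}
    k = 1
    while current:
        level = {c: support(c) for c in current}
        level = {c: s for c, s in level.items() if s >= min_support}
        frequent.update(level)
        k += 1
        # Join: combine frequent (k-1)-itemsets into candidate k-itemsets
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune: keep candidates whose (k-1)-subsets are all frequent
        current = {c for c in candidates
                   if all(frozenset(s) in level for s in combinations(c, k - 1))}
    return frequent

def rules(frequent, min_confidence):
    """Return (antecedent, consequent, support, confidence) tuples."""
    out = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                conf = sup / frequent[antecedent]
                if conf >= min_confidence:
                    out.append((set(antecedent), set(itemset - antecedent), sup, conf))
    return out

# Toy transactions (made up)
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]
frequent = apriori(baskets, min_support=0.4)
found = rules(frequent, min_confidence=0.7)
```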

11) Recommender system
Use a collaborative filtering technique to find movies similar to those watched and rated by a user.
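
One simple form of this is item-based collaborative filtering with cosine similarity between movie rating columns. The sketch below assumes NumPy only. The movie titles and the rating matrix are invented, and treating an unrated movie as a 0 rating is a simplification.

```python
import numpy as np

# Hypothetical titles and a made-up user x movie rating matrix (0 = not rated)
movies = ["Movie A", "Movie B", "Movie C", "Movie D"]
ratings = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
    [0, 1, 4, 5],
], dtype=float)

def item_similarity(R):
    """Cosine similarity between the columns (movies) of R."""
    norms = np.linalg.norm(R, axis=0)
    norms[norms == 0] = 1.0           # avoid dividing by zero for unrated movies
    Rn = R / norms
    return Rn.T @ Rn

def similar_movies(title, top_n=2):
    """Return the top_n movies most similar to `title`, with their scores."""
    sim = item_similarity(ratings)
    idx = movies.index(title)
    order = np.argsort(-sim[idx])     # indices sorted by descending similarity
    ranked = [(movies[j], float(sim[idx, j])) for j in order if j != idx]
    return ranked[:top_n]

closest_to_a = similar_movies("Movie A", top_n=1)
closest_to_c = similar_movies("Movie C", top_n=1)
```

To recommend for a particular user, the same similarity scores can be looked up for each movie that user rated highly.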

Posted by immidi kalipradeep 2020-09-12
