Project Description:

In this project, students will implement a Naive Bayes Classifier (NBC) for sentiment analysis on a dataset containing reviews and their respective star ratings. The datasets, “train.csv” and “test.csv”, will be provided. A review with a 5-star rating will be considered positive, while all other ratings will be considered negative. Do not use any publicly available code; your code will be checked against public implementations and AI-generated code.
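The labeling rule above (5 stars is positive, anything else is negative) can be sketched as follows. This is a minimal sketch, not a reference solution: the column names `text` and `stars` are hypothetical because the assignment does not state the CSV headers, and `parse_reviews` and `load_reviews` are illustrative helper names.

```python
import csv
import io
from typing import List, TextIO, Tuple

# Hypothetical column names: the assignment does not give the CSV headers.
TEXT_COL = "text"
STARS_COL = "stars"


def parse_reviews(handle: TextIO) -> List[Tuple[str, int]]:
    """Return (review_text, label) pairs; label is 1 for 5 stars, else 0."""
    rows = []
    for row in csv.DictReader(handle):
        stars = int(float(row[STARS_COL]))  # tolerate values such as "5.0"
        rows.append((row[TEXT_COL], 1 if stars == 5 else 0))
    return rows


def load_reviews(path: str) -> List[Tuple[str, int]]:
    """Read train.csv or test.csv from disk."""
    with open(path, newline="", encoding="utf-8") as f:
        return parse_reviews(f)


if __name__ == "__main__":
    sample = io.StringIO("text,stars\nGreat food!,5\nIt was okay,3\n")
    print(parse_reviews(sample))  # [('Great food!', 1), ('It was okay', 0)]
```

Keeping the parsing separate from file access makes the labeling rule easy to check on a small in-memory sample before running on the full files.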

The project consists of three tasks:

Task 1: Feature Selection (10 points)

• Students will preprocess “train.csv” and select the top 1000 words (by frequency) as word features for their model. All other words will be ignored.

• Please print out the top 20-50 words from the selected features.

• Preprocessing Guideline:

a. Convert all text to lowercase.

b. Remove special characters.

c. Tokenize the text into words.

d. Remove stop words.
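The Task 1 steps above can be sketched as a preprocessing function plus a frequency count over “train.csv”. This is a minimal sketch under stated assumptions: tokens are split on whitespace after replacing non-alphanumeric characters with spaces, and `STOP_WORDS` is a small illustrative list because the assignment does not name one. `preprocess` and `select_features` are hypothetical names.

```python
import re
from collections import Counter
from typing import Iterable, List

# Illustrative stop-word list (an assumption); a real submission would use a fuller one.
STOP_WORDS = {
    "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "from",
    "had", "has", "have", "he", "her", "his", "i", "in", "is", "it", "its",
    "me", "my", "of", "on", "or", "our", "she", "so", "that", "the", "their",
    "them", "they", "this", "to", "was", "we", "were", "with", "you", "your",
}


def preprocess(text: str) -> List[str]:
    """Apply steps a-d of the preprocessing guideline to one review."""
    text = text.lower()                                  # a. lowercase
    text = re.sub(r"[^a-z0-9\s]", " ", text)             # b. replace special characters with spaces
    tokens = text.split()                                # c. tokenize on whitespace
    return [t for t in tokens if t not in STOP_WORDS]    # d. drop stop words


def select_features(texts: Iterable[str], top_n: int = 1000) -> List[str]:
    """Return the top_n most frequent words; every other word is ignored."""
    counts = Counter()
    for text in texts:
        counts.update(preprocess(text))
    # most_common breaks ties by first occurrence, so the result is deterministic.
    return [word for word, _ in counts.most_common(top_n)]


if __name__ == "__main__":
    docs = ["The food was GREAT!", "Great service, great food.", "Slow service..."]
    print(select_features(docs, top_n=3))  # ['great', 'food', 'service']
```

On the real data, `select_features(texts)[:50]` gives the top 50 words to print for the report.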

Task 2: Model Training and Evaluation (15 points)

• Students will use “train.csv” to train and “test.csv” to evaluate a Naive Bayes Classifier with Laplace smoothing.

o Laplace smoothing: Implement Laplace smoothing in the parameter estimation. For an attribute Xi with k possible values, the Laplace correction adds 1 to the numerator and k to the denominator of the maximum likelihood estimate, i.e. P(Xi = x | C = c) = (N(Xi = x, C = c) + 1) / (N(C = c) + k), where N(·) counts training reviews.

o Evaluation measure: accuracy.

• Please describe your observations and provide an analysis of your model’s performance.
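One way to realize Task 2 is a Bernoulli Naive Bayes model, sketched below under the assumption that each selected word is a binary attribute (present or absent in a review), so k = 2 in the Laplace correction. A multinomial model over word counts is another common reading; the assignment does not fix the event model. `train_nb`, `predict_nb`, and `accuracy` are hypothetical names, smoothing the class prior as well is a design choice of this sketch, and reviews are passed in already preprocessed into token lists.

```python
import math
from typing import Dict, Sequence

CLASSES = (0, 1)  # 0 = negative (1-4 stars), 1 = positive (5 stars)


def train_nb(docs: Sequence[Sequence[str]], labels: Sequence[int],
             features: Sequence[str]) -> Dict:
    """Fit a Bernoulli Naive Bayes model with Laplace smoothing (k = 2 per word)."""
    index = {w: i for i, w in enumerate(features)}
    class_count = {c: 0 for c in CLASSES}
    present = {c: [0] * len(features) for c in CLASSES}  # reviews in class c containing word i

    for tokens, y in zip(docs, labels):
        class_count[y] += 1
        for w in set(tokens):  # presence, not frequency
            i = index.get(w)
            if i is not None:
                present[y][i] += 1

    n_docs = len(labels)
    model = {"index": index, "log_prior": {}, "log_p_present": {},
             "log_p_absent": {}, "absent_total": {}}
    for c in CLASSES:
        # Prior also smoothed (+1 / +number of classes) so an empty class cannot give log(0).
        model["log_prior"][c] = math.log((class_count[c] + 1) / (n_docs + len(CLASSES)))
        # Laplace: +1 in the numerator, +k (2 values: present/absent) in the denominator.
        probs = [(present[c][i] + 1) / (class_count[c] + 2) for i in range(len(features))]
        model["log_p_present"][c] = [math.log(p) for p in probs]
        model["log_p_absent"][c] = [math.log(1.0 - p) for p in probs]
        model["absent_total"][c] = sum(model["log_p_absent"][c])
    return model


def predict_nb(model: Dict, tokens: Sequence[str]) -> int:
    """Return the class with the highest log posterior for one tokenized review."""
    index = model["index"]
    present_ids = {index[w] for w in set(tokens) if w in index}
    best_c, best_score = CLASSES[0], -math.inf
    for c in CLASSES:
        # Start as if every feature word were absent, then swap in the present ones.
        score = model["log_prior"][c] + model["absent_total"][c]
        for i in present_ids:
            score += model["log_p_present"][c][i] - model["log_p_absent"][c][i]
        if score > best_score:
            best_c, best_score = c, score
    return best_c


def accuracy(model: Dict, docs: Sequence[Sequence[str]], labels: Sequence[int]) -> float:
    """Fraction of reviews whose predicted label matches the true label."""
    correct = sum(predict_nb(model, d) == y for d, y in zip(docs, labels))
    return correct / len(labels)


if __name__ == "__main__":
    docs = [["great", "food"], ["great", "service"], ["bad", "food"], ["bad", "slow"]]
    labels = [1, 1, 0, 0]
    features = ["great", "bad", "food", "service", "slow"]
    model = train_nb(docs, labels, features)
    print(predict_nb(model, ["great"]), accuracy(model, docs, labels))  # 1 1.0
```

Scores are summed in log space because multiplying around 1000 probabilities would underflow. The absent-word terms are precomputed per class, so prediction only loops over the feature words that actually occur in a review.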

Task 3: Learning Curve Analysis (5 points)

• Students will plot a learning curve by varying the amount of training data used [10%, 30%, 50%, 70%, 100%]. The testing set will remain unchanged.

• For this plotting task only, students may use external plotting packages such as Matplotlib.

• Students will describe their observations and provide an analysis of the learning curve.
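The learning-curve procedure can be sketched as a small, model-agnostic harness. It assumes you already have a training function `fit(rows)` and a scoring function `score(model, test_rows)` that returns accuracy (both hypothetical parameter names); it trains on growing prefixes of one fixed shuffle of the training rows and always scores on the same test set. Matplotlib is imported only inside the plotting helper, in line with the rule that external packages are limited to plotting.

```python
import random
from typing import Callable, List, Sequence, Tuple

FRACTIONS = (0.1, 0.3, 0.5, 0.7, 1.0)  # 10%, 30%, 50%, 70%, 100%


def learning_curve(train_rows: Sequence, test_rows: Sequence,
                   fit: Callable, score: Callable,
                   fractions: Sequence[float] = FRACTIONS,
                   seed: int = 0) -> List[Tuple[float, float]]:
    """Train on growing fractions of the training data; evaluate on the fixed test set."""
    shuffled = list(train_rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed for reproducibility
    points = []
    for frac in fractions:
        n = max(1, round(frac * len(shuffled)))  # at least one training row
        model = fit(shuffled[:n])  # prefixes of one shuffle give nested subsets
        points.append((frac, score(model, test_rows)))
    return points


def plot_learning_curve(points: Sequence[Tuple[float, float]],
                        path: str = "learning_curve.png") -> None:
    """Save test accuracy vs. percentage of training data as an image."""
    import matplotlib.pyplot as plt  # the only external package, used for plotting only

    xs = [100 * frac for frac, _ in points]
    ys = [acc for _, acc in points]
    plt.figure()
    plt.plot(xs, ys, marker="o")
    plt.xlabel("Training data used (%)")
    plt.ylabel("Test accuracy")
    plt.title("Naive Bayes learning curve")
    plt.grid(True)
    plt.savefig(path)
    plt.close()
```

Using prefixes of a single shuffle makes each smaller subset contained in the larger ones, so differences between points reflect the amount of data rather than which reviews happened to be sampled. The assignment does not say whether the top-1000 word list should be rebuilt from each subset; doing that inside `fit` avoids using vocabulary from data the model was not trained on.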

Deliverables:

1. Python code implementation of the Naive Bayes Classifier.

2. README file for executing your code.

3. PDF report
