Data Analysis Hackathon Project 👩‍💻

Analyzing Misinformation in News using NLP and Data Visualization

This project was created for the CANIS Data Analysis Hackathon. The task was to create compelling data visualizations that would help others understand the survey results in a meaningful way. The dataset used in this project is available on Kaggle.

Project Purpose ⛳

The purpose of this project is to analyze a dataset of news articles and identify the words that are most frequently used in true and fake news. Our goal is to gain insights into the language patterns that are associated with fake news and to develop techniques for detecting and combating misinformation.
To achieve this, we apply data analysis and NLP techniques to the dataset, extract meaningful insights, and create data visualizations that present our findings clearly.
By identifying the key language features of fake news, we hope to contribute to the development of more effective strategies for combating the spread of misinformation and promoting the dissemination of accurate and reliable information.

Project Overview 👓

The project involves data analysis of the misinformation fake news text dataset. The project includes the following steps:

Preprocessing the dataset by cleaning, formatting, and transforming the data into a suitable format for analysis.
Applying Natural Language Processing (NLP) techniques, such as tokenization, stemming, and lemmatization, to extract meaningful information from the text data.
Identifying and removing stop words and other irrelevant information from the dataset.
Visualizing the results of the analysis using various tools, such as charts, graphs, and interactive dashboards, to make the insights more understandable and accessible to users.
Developing a web-based platform to present the findings of the analysis and enable users to interact with the data and explore the results in more detail.
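
The cleaning, tokenization, and stop-word steps above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the project's actual script: the `preprocess` function and the tiny `STOP_WORDS` set are hypothetical, and a real run would more likely use nltk's tokenizer, stop-word list, and a stemmer such as `nltk.stem.PorterStemmer`.

```python
import re

# Tiny illustrative stop-word list (an assumption for this sketch); a fuller
# list such as nltk.corpus.stopwords.words("english") would be used in practice.
STOP_WORDS = {"a", "an", "and", "are", "is", "in", "of", "the", "to", "was"}


def preprocess(text):
    """Lowercase the text, split it into alphabetic tokens, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [token for token in tokens if token not in STOP_WORDS]


if __name__ == "__main__":
    print(preprocess("The senator IS shocked, and the crowd was shocked!"))
```

Stemming and lemmatization are left out here to keep the sketch dependency-free; they would slot in as one more step applied to each token before it is returned.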

Dependencies for Python Program 📦

The code for this project is written in Python and requires the following dependencies:

pandas
nltk
textblob
matplotlib
wordcloud

To install the dependencies, run the following commands:

pip install pandas
pip install nltk
pip install textblob
pip install matplotlib
pip install wordcloud

We then wrote a Python script that simplified the datasets based on word frequency. From there, our team of data analysts used R to analyze the data and identify patterns that distinguish fake news from true news.
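
As a rough idea of what a word-frequency reduction could look like, the sketch below counts words separately for each label and keeps only the most frequent ones. It is an assumption about the general approach, not the project's script: `top_words_by_label`, the `(text, label)` row format, and the sample headlines are all made up for illustration.

```python
import re
from collections import Counter


def top_words_by_label(rows, n=3):
    """Count words per label and return the n most frequent words for each.

    rows is an iterable of (text, label) pairs (a hypothetical format).
    """
    counts = {}
    for text, label in rows:
        words = re.findall(r"[a-z]+", text.lower())
        counts.setdefault(label, Counter()).update(words)
    return {label: counter.most_common(n) for label, counter in counts.items()}


# Invented sample rows, only to show the shape of the input and output.
rows = [
    ("Shocking secret cure revealed", "fake"),
    ("Shocking claim spreads online", "fake"),
    ("Council approves budget", "true"),
    ("Council delays budget vote", "true"),
]
result = top_words_by_label(rows, n=1)
print(result)
```

The per-label tables produced this way are small enough to export as CSV and hand to a separate analysis step, for example in R.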

To create our fake news detector, we collected the sets of words most likely to be associated with fake news and built the detector from them. Now, if a news article contains at least a certain number of these words, our detector flags it as fake news. It's like having a trusty assistant by our side, helping us navigate the murky waters of information overload and stay vigilant against the spread of false information! 🕵️‍♀️📰
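
A keyword-threshold detector of the kind described here might look like the sketch below. The word list, the `min_hits` threshold, and the `looks_fake` name are hypothetical placeholders; the actual words and cutoff used in the project are not shown in this README.

```python
import re

# Hypothetical flagged words; the real list would come from the frequency analysis.
FLAGGED_WORDS = {"shocking", "secret", "miracle", "exposed"}


def looks_fake(text, flagged=FLAGGED_WORDS, min_hits=2):
    """Return True if the text contains at least min_hits distinct flagged words."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return len(words & flagged) >= min_hits


if __name__ == "__main__":
    print(looks_fake("SHOCKING: secret cure exposed"))
    print(looks_fake("Council approves budget"))
```

Counting distinct words rather than total occurrences keeps one repeated word from triggering the flag on its own. A rule like this is only a heuristic, since it will also flag accurate articles that happen to use the same vocabulary.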