AI-THON
Mental Health Of INDIA
During COVID-19
TEAM : THE ELITE
Hiring Partners:
- L&T Infotech
- Lifevitae Singapore
- EKO Informatics
Team : The Elite
Priyank Jha
PGP in Data Science @ Aegis
https://spotle.ai/Priyankjha1
Devleena Banerjee
Business Analytics @ IIM Indore
https://spotle.ai/DevleenaBanerjee
Vidhya Subramaniam
Business Analytics @ IIM Indore
https://spotle.ai/VidhyaSubramaniam
Chiranjeevi Karthik
Student @ Vardhaman College
https://spotle.ai/Karthikchiranjeevi
Problem Statement
- Can we analyze a person's mental health based on their Twitter usage?
- If yes, what factors determine this?
- To what extent has COVID-19 affected people's mental health?
Objective
- To come up with an effective methodology for analyzing mental health based on tweets.
- To understand what determines the emotion conveyed in a tweet.
- To gather insights on how COVID-19 affected mental health, based on tweets.
Observations : Labelling tweets
- Only a small percentage of tweets contain emojis or hashtags that correspond to an emotion.
- Emotions extracted from emojis and hashtags correlate only weakly with the polarity of the tweets.
- For every tweet, we could extract whether a particular emotion is present in the tweet or not.
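A minimal sketch of how the share of tweets carrying an emoji or hashtag emotion signal could be measured. The emoji and hashtag mappings below are hypothetical, hand-made examples; the deck does not list the vocabulary that was actually used, and the function names are illustrative only.

```python
import re

# Hypothetical, hand-made mappings for illustration only; the deck does not
# list the emoji or hashtag vocabulary that was actually used.
EMOJI_EMOTION = {"😢": "sadness", "😱": "fear", "😊": "joy", "😡": "anger"}
HASHTAG_EMOTION = {"#scared": "fear", "#sad": "sadness", "#happy": "joy"}

HASHTAG_RE = re.compile(r"#\w+")


def emotion_signals(tweet):
    """Return the set of emotions signalled by emojis or hashtags in a tweet."""
    found = set()
    # Each emoji in the mapping is a single code point, so a character scan works.
    for char in tweet:
        if char in EMOJI_EMOTION:
            found.add(EMOJI_EMOTION[char])
    # Hashtags are matched case-insensitively by lowercasing the tweet first.
    for tag in HASHTAG_RE.findall(tweet.lower()):
        if tag in HASHTAG_EMOTION:
            found.add(HASHTAG_EMOTION[tag])
    return found


def share_with_signal(tweets):
    """Fraction of tweets carrying at least one emoji or hashtag emotion."""
    if not tweets:
        return 0.0
    return sum(1 for t in tweets if emotion_signals(t)) / len(tweets)


tweets = [
    "Lockdown extended again 😢",
    "Stay home, stay safe",
    "Cases are rising #Scared",
    "Working from home today",
]
print(share_with_signal(tweets))  # 0.5
```

A low share on the real dataset would support the first observation: emojis and hashtags alone cover too few tweets to serve as labels.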
Our Strategy
- The unsupervised approach using emotion lexicons is relatively fast.
- We need only a single scan of the dataset to label each tweet with an emotion.
- Thanks to the emotion lexicons, we could label each tweet with multiple emotions.
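The lexicon-based labelling described above might look roughly like the sketch below. The word-to-emotion lexicon is a toy example invented for illustration (a real lexicon, such as the NRC lexicons listed in the references, is far larger), and the tokenizer and function names are assumptions, not the team's actual code.

```python
import re

# Toy word-to-emotion lexicon, invented for illustration only.
LEXICON = {
    "afraid": {"fear"},
    "panic": {"fear"},
    "lonely": {"sadness"},
    "lost": {"sadness"},
    "death": {"fear", "sadness"},
    "hope": {"joy"},
}

TOKEN_RE = re.compile(r"[a-z']+")


def label_tweet(tweet):
    """Return every emotion whose lexicon words occur in the tweet."""
    emotions = set()
    for token in TOKEN_RE.findall(tweet.lower()):
        emotions |= LEXICON.get(token, set())
    # Tweets with no lexicon word fall back to "neutral".
    return emotions or {"neutral"}


def label_dataset(tweets):
    """Label all tweets in a single pass over the data."""
    return [sorted(label_tweet(t)) for t in tweets]


print(label_dataset([
    "I feel lonely and afraid",
    "Hope we recover soon",
    "Going for a walk",
]))
# [['fear', 'sadness'], ['joy'], ['neutral']]
```

Because each token is looked up once, the labelling cost grows linearly with the size of the dataset, and a tweet naturally receives more than one emotion when its words map to several.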
Modelling
- We built 6 binary classification models, where each model corresponds to a particular emotion.
- To build these models, logistic regression was used over the vector embeddings extracted using the lexicon database.
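A minimal sketch of the one-model-per-emotion setup, written in plain Python so it runs without extra libraries. Several things here are assumptions: the six emotion names (the deck does not name them), the tiny count-based feature vectors standing in for the lexicon embeddings, and the gradient-descent trainer. In practice a library implementation such as scikit-learn's `LogisticRegression` would be the more usual choice.

```python
import math

# Assumed emotion set; the deck says six emotions but does not name them.
EMOTIONS = ["anger", "disgust", "fear", "joy", "sadness", "surprise"]


def sigmoid(z):
    # Clamp z so math.exp cannot overflow.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def train_binary(X, y, lr=0.5, epochs=2000):
    """Fit one logistic regression by batch gradient descent."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def train_all(X, labels):
    """Train one independent binary model per emotion."""
    return {
        emotion: train_binary(X, [1 if emotion in tags else 0 for tags in labels])
        for emotion in EMOTIONS
    }


def predict(models, x, threshold=0.5):
    """Return every emotion whose model scores x at or above the threshold."""
    result = []
    for emotion, (w, b) in models.items():
        if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= threshold:
            result.append(emotion)
    return result


# Toy features: counts of the hypothetical words ["afraid", "lonely", "hope"].
X = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0], [0, 0, 0]]
labels = [{"fear"}, {"sadness"}, {"joy"}, {"fear", "sadness"}, set()]

models = train_all(X, labels)
print(predict(models, [1, 1, 0]))  # ['fear', 'sadness']
```

Training the models independently is what allows a single tweet to be assigned several emotions at once, or none at all.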
Conclusion
- Yes, we can determine the mental health of a person from their Twitter usage.
- The overall emotions in a tweet are decided by the emotions of individual words, not by hashtags or emojis.
- COVID-19 has definitely taken a toll on people's mental health, as fear and sadness seem to dominate their emotional state.
Limitations
- The predictions of our model are accurate only for tweets that use vocabulary similar to that of our training set.
- If none of the words in a tweet are part of our training set vocabulary, then it is implicitly labelled as neutral.
- To overcome the above limitations, we can train on a larger dataset.
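One way to see how exposed a tweet is to these limitations is to measure how much of it the training vocabulary covers. The sketch below is illustrative only: the vocabulary, tokenizer and function name are hypothetical and not taken from the project.

```python
import re

TOKEN_RE = re.compile(r"[a-z']+")


def vocabulary_coverage(tweet, vocabulary):
    """Fraction of a tweet's tokens that appear in the training vocabulary."""
    tokens = TOKEN_RE.findall(tweet.lower())
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t in vocabulary) / len(tokens)


# Hypothetical training vocabulary, for illustration only.
vocabulary = {"lockdown", "afraid", "lonely", "hope", "home"}

print(vocabulary_coverage("Afraid of another lockdown", vocabulary))  # 0.5
print(vocabulary_coverage("Brb gtg ttyl", vocabulary))                # 0.0
```

A coverage of 0.0 corresponds to the case where the tweet would be implicitly labelled as neutral, so low-coverage tweets could be flagged rather than trusted.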
Time of the day the tweets were posted
Emojis wordcloud
Hashtags wordcloud
Most used Hashtags
Number of tweets belonging to each emotion
Polarity
Correlation between emotions extracted from emojis vs. polarity of the tweet
References
1. https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html
2. https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.drop_duplicates.html
3. https://www.geeksforgeeks.org/handling-oserror-exception-in-python/
4. https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.tseries.offsets.DateOffset.html
5. https://stackoverflow.com/questions/43146528/how-to-extract-all-the-emojis-from-text
6. https://emojis.wiki/
7. https://stackoverflow.com/questions/43145199/create-wordcloud-from-dictionary-values
8. https://textblob.readthedocs.io/en/dev/quickstart.html#sentiment-analysis
9. http://sentiment.nrc.ca/lexicons-for-research/
10. https://seaborn.pydata.org/generated/seaborn.pairplot.html
11. https://stackoverflow.com/questions/9897345/pickle-alternatives