WhatsApp group chat analytics

Check out the code on GitHub


This notebook takes your WhatsApp group chat export and parses all the messages to surface interesting insights.

This is just a preview. Check out the full product here

  • Breaks up messages by group member
  • Analyzes the sentiment of each message (and, of course, at an aggregate level too)
  • Provides time-based analysis of conversation volumes and emotions (using Google’s industry-standard Cloud Natural Language API)
  • Separates out the emojis used
  • Separates out individuals tagged in messages
  • Handles emoticons and common contractions
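
As a rough illustration of the per-member and time-based volume breakdowns listed above, the counting step could look something like the sketch below. It assumes messages have already been parsed into (timestamp, sender) pairs. The function names `volume_by_member` and `volume_by_hour` are hypothetical and not taken from the notebook, which may well do this differently (for example with pandas).

```python
from collections import Counter
from datetime import datetime

def volume_by_member(messages):
    """Count messages per sender; `messages` is an iterable of (timestamp, sender) pairs."""
    return Counter(sender for _, sender in messages)

def volume_by_hour(messages):
    """Count messages per hour of day (0-23)."""
    return Counter(ts.hour for ts, _ in messages)

# A few (timestamp, sender) pairs in the dd/mm/yy style of a WhatsApp export.
sample = [
    (datetime(2020, 6, 17, 20, 5), "Sim-rum Melvani"),
    (datetime(2020, 6, 17, 20, 6), "Aditya Daftari Coep"),
    (datetime(2020, 6, 17, 20, 7), "Anoushka Kundu"),
    (datetime(2020, 6, 17, 20, 8), "Aditya Daftari Coep"),
]

print(volume_by_member(sample).most_common())
print(volume_by_hour(sample))
```

The same grouping idea extends to sentiment: replace the count with an average of the per-message scores within each group.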

What you need to run it yourself:

  • A Python or Jupyter installation (run the .py or .ipynb file respectively), or simply an online tool like Colab.
  • A Google Cloud account. If you are new to GCP, use this basic intro of GCP components and then follow these steps (up to ‘use key in your env’).

Input format

Export the chat from the WhatsApp mobile application. This gives you a file with a name like WhatsApp Chat with Sushil Khairnar.txt, with one message per line:

17/06/20, 20:05 - Aditya Daftari Coep:

17/06/20, 20:05 - Sim-rum Melvani: Hahaha feels like ages ago

17/06/20, 20:06 - Aditya Daftari Coep: Yes 😂

17/06/20, 20:06 - Sim-rum Melvani: Kya photo hai😂

17/06/20, 20:07 - Aditya Daftari Coep:

17/06/20, 20:07 - Aditya Daftari Coep: Sponge Bob with the sponge ball😂

17/06/20, 20:07 - Anoushka Kundu: 😂😂😂😂 look at his fists

17/06/20, 20:07 - Aditya Daftari Coep: 😂

17/06/20, 20:07 - Anoushka Kundu: SonjBob

17/06/20, 20:08 - Aditya Daftari Coep: No

17/06/20, 20:08 - Aditya Daftari Coep: Sponge Bob bhi bolte the hum 😂

17/06/20, 20:08 - Anoushka Kundu: But there lies an opportunity to call him sonjbob 😂
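
A minimal sketch of how lines in this format might be parsed is shown below. It assumes the `dd/mm/yy, HH:MM - Sender: message` layout seen in the sample above; the export format varies with phone locale and platform, so the regular expression and date format would need adjusting for other exports. `parse_chat` is a hypothetical helper, not the notebook's own function.

```python
import re
from datetime import datetime

# Matches e.g. "17/06/20, 20:05 - Sim-rum Melvani: Hahaha feels like ages ago".
# The sender is everything between " - " and the first colon after it.
LINE_RE = re.compile(r"^(\d{2}/\d{2}/\d{2}), (\d{2}:\d{2}) - ([^:]+):\s?(.*)$")
# A timestamp prefix without a "Sender:" part is treated as a system message.
PREFIX_RE = re.compile(r"^\d{2}/\d{2}/\d{2}, \d{2}:\d{2} - ")

def parse_chat(text):
    """Return a list of dicts with 'timestamp', 'sender' and 'text' keys."""
    messages = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = LINE_RE.match(line)
        if match:
            date_part, time_part, sender, body = match.groups()
            messages.append({
                "timestamp": datetime.strptime(f"{date_part} {time_part}", "%d/%m/%y %H:%M"),
                "sender": sender.strip(),
                "text": body,
            })
        elif PREFIX_RE.match(line):
            continue  # system message such as "... left the group"
        elif messages:
            # No timestamp prefix: continuation of a multi-line message.
            messages[-1]["text"] += "\n" + line
    return messages

chat = (
    "17/06/20, 20:05 - Sim-rum Melvani: Hahaha feels like ages ago\n"
    "17/06/20, 20:06 - Aditya Daftari Coep: Yes 😂\n"
)
for message in parse_chat(chat):
    print(message["timestamp"], message["sender"], message["text"])
```

Messages with an empty body, like the first line of the sample, are kept with an empty `text`; in the export these typically correspond to media or deleted content.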

Comments

The cleaned message is what gets sent to Google’s Natural Language API for classification, so the sentiment score and magnitude are based on it.

Cleaned message + (Links + Emoticons + Emojis* + Tagged people + Garbage characters) = Original message

*Emojis are converted to their text equivalent in the cleaned message. E.g. 😓 –> downcast_face_with_sweat

  • Removed system messages like ‘.. has left the group’, ‘.. changed the description’, ‘.. changed the group icon’ etc.
  • Ignored <Media omitted> placeholders, which correspond to images and GIFs
  • Extracted URLs, emojis and emoticons from each message
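
The cleaning steps above could be sketched roughly as follows, using only the standard library. Everything here is an assumption for illustration: the emoji ranges are approximate, the system-message markers are an incomplete list, and the emoji names come from `unicodedata`, so they can differ from the names the notebook produces (for instance, the notebook's example shows downcast_face_with_sweat for 😓). Emoticons, contractions, variation selectors and multi-codepoint emoji sequences are not handled.

```python
import re
import unicodedata

URL_RE = re.compile(r"https?://\S+")
# Rough emoji ranges (assumption, not exhaustive).
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")
MEDIA_PLACEHOLDER = "<media omitted>"
# Hypothetical, incomplete list of system-message fragments.
SYSTEM_MARKERS = ("left the group", "changed the description", "changed the group icon")

def is_system_message(line):
    """True if the line contains one of the known system-message fragments."""
    return any(marker in line for marker in SYSTEM_MARKERS)

def _emoji_to_text(match):
    # Use the Unicode character name as the text equivalent, e.g. face_with_tears_of_joy.
    name = unicodedata.name(match.group(0), "")
    return f" {name.lower().replace(' ', '_')} " if name else " "

def clean_message(text):
    """Return cleaned text plus extracted URLs and emojis, or None for media placeholders."""
    if text.strip().lower() == MEDIA_PLACEHOLDER:
        return None
    urls = URL_RE.findall(text)
    emojis = EMOJI_RE.findall(text)
    cleaned = URL_RE.sub(" ", text)               # drop links from the cleaned text
    cleaned = EMOJI_RE.sub(_emoji_to_text, cleaned)  # replace emojis with their names
    cleaned = " ".join(cleaned.split())           # collapse leftover whitespace
    return {"cleaned": cleaned, "urls": urls, "emojis": emojis}

print(clean_message("Sponge Bob with the sponge ball😂"))
```

Keeping the extracted pieces next to the cleaned text preserves the relationship described above: the cleaned message plus the extracted parts accounts for the original message.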

Future scope

  • Tagged members are mentioned by mobile number, e.g. ‘Hey @919167023114, are you going for the party?’. A simple function that maps this number to your phone’s contacts (exported as CSV) could reveal the person’s name and add another interesting dimension to the analysis, such as who tags whom.
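
One way such a mapping function might look is sketched below. It assumes the contacts export is a CSV with `name` and `phone` columns; real exports use different column names, so those would need adapting. `load_contacts` and `resolve_mentions` are hypothetical helpers, and the contact name in the usage example is a placeholder.

```python
import csv
import io
import re

# A tag appears as "@" followed by the number's digits, e.g. "@919167023114".
MENTION_RE = re.compile(r"@(\d{7,15})")

def load_contacts(csv_file):
    """Read a CSV with 'name' and 'phone' columns (assumed layout) into {digits: name}."""
    contacts = {}
    for row in csv.DictReader(csv_file):
        digits = re.sub(r"\D", "", row["phone"])  # strip "+", spaces and dashes
        contacts[digits] = row["name"]
    return contacts

def resolve_mentions(text, contacts):
    """Return the name of each tagged person, or the raw number if it is not in contacts."""
    return [contacts.get(number, number) for number in MENTION_RE.findall(text)]

# Placeholder contact list; in practice this would be open("contacts.csv", newline="").
contacts_csv = io.StringIO("name,phone\nExample Friend,+91 91670 23114\n")
contacts = load_contacts(contacts_csv)
print(resolve_mentions("Hey @919167023114, are you going for the party?", contacts))
```

Pairing each resolved name with the message sender would then give the who-tags-whom counts.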

Limitations

  • WhatsApp does not provide ‘reply context’, i.e. if message B is a response to message A, the exported .txt contains no information about this.
  • Only recent media can be exported, so a straightforward analysis of media is not possible. (It might be necessary to export media directly from WhatsApp’s mobile storage and tie it up with chat.txt.)
  • A significant number of messages are Hindi words written in English (Latin) script. These cannot be analysed for sentiment, although it should become possible if they are converted into Devanagari script.
