2022-08-05 Group Webinar: Martin Canaan Mafunda, A hybrid tweet classification system based on Word2Vec and SVM algorithms

2022-08-05 @ 14:00 - 16:00

Abstract:

This study presents a hybrid tweet classification system built using Word2Vec and Support Vector Machines (SVM). The aim is to solve the natural language processing task of automatically assigning each unseen tweet one of three labels: anti-zuma, neutral or pro-zuma. The model-building process consisted of three phases: human tweet labeling, tweet encoding and SVM-based model training. Human tweet labeling involved selecting 3100 tweets and allocating them to two human coders, who labeled each tweet with one of the three labels named above. Tweet encoding involved using the Word2Vec algorithm to train a word embedding on all the tweets in our dataset. The resulting high-dimensional word vector representations were then used to encode the labeled tweets, and the encoded vectors were used to train the SVM-based tweet classifier. As a final step, we trained an SVM-based tweet classification system and used it to predict classes for all the unseen tweets that made up the bulk of our dataset. Our hybrid tweet classification model achieves an overall F1-score of 77%, which is state-of-the-art considering the diversity and complex nature of the language commonly used in tweets. Qualitative analysis demonstrated the ability of the model to correctly classify tweets written in Japanese.
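The tweet encoding phase can be illustrated with a minimal sketch. The abstract does not say how word vectors are combined into a single tweet vector, so averaging them is an assumption made here for illustration, not the speaker's confirmed method. The names `TOY_EMBEDDING`, `tokenize` and `encode_tweet` are hypothetical, and the three-dimensional toy vectors stand in for an embedding that would in practice come from a Word2Vec model trained on the full tweet corpus.

```python
from typing import Dict, List

# Hypothetical toy embedding (assumption): real vectors would be learned by
# Word2Vec from all tweets in the dataset and would have many more dimensions.
TOY_EMBEDDING: Dict[str, List[float]] = {
    "zuma": [0.2, 0.8, 0.0],
    "must": [0.4, 0.0, 0.4],
    "fall": [0.0, 0.2, 0.8],
}


def tokenize(tweet: str) -> List[str]:
    """Lowercase, split on whitespace, and strip '#', '@' and common punctuation."""
    tokens = []
    for raw in tweet.lower().split():
        token = raw.strip("#@.,!?:;\"'")
        if token:
            tokens.append(token)
    return tokens


def encode_tweet(tweet: str, embedding: Dict[str, List[float]], dim: int) -> List[float]:
    """Encode a tweet as the mean of the vectors of its known words.

    Averaging is an assumed pooling choice. Words missing from the
    embedding are skipped; a tweet with no known words maps to zeros.
    """
    vectors = [embedding[t] for t in tokenize(tweet) if t in embedding]
    if not vectors:
        return [0.0] * dim
    # zip(*vectors) groups the values of each dimension across all words.
    return [sum(column) / len(vectors) for column in zip(*vectors)]


if __name__ == "__main__":
    print(encode_tweet("#Zuma must fall!", TOY_EMBEDDING, dim=3))
```

Fixed-length vectors produced this way for the human-labeled tweets, paired with their anti-zuma, neutral or pro-zuma labels, are the kind of input an SVM classifier would be trained on.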

Keywords: Word2Vec, Support Vector Machines, tweet encoding, tweet classification system

Details

Date:
2022-08-05
Time:
14:00 - 16:00
Website:
quantum.sun.ac.za

Organizer

F. Petruccione
Email:
petruccione@sun.ac.za

Venue

Teams
South Africa