Transformer Models for Classification on Health-Related Imbalanced Twitter Datasets

Published in SMM4H Workshop, NAACL-HLT, 2021

We present a system for the health-related binary classification problems in Tasks 1a, 4, and 8 of the 6th edition of the Social Media Mining for Health Applications (SMM4H) shared tasks. The system is based on RoBERTa (for Tasks 1a and 4) and BioBERT (for Task 8). We also address class imbalance in these datasets and propose undersampling, oversampling, and data augmentation as techniques to counter it.
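To make the resampling idea concrete, here is a minimal sketch of random oversampling and random undersampling for a labelled text dataset. It is an illustration only, not the authors' implementation: the function names `random_oversample` and `random_undersample`, the seed handling, and the toy tweets are all assumptions introduced here, and the abstract does not state which resampling variants were actually used.

```python
import random
from collections import Counter


def _group_by_label(texts, labels):
    """Collect texts into a dict keyed by class label."""
    groups = {}
    for text, label in zip(texts, labels):
        groups.setdefault(label, []).append(text)
    return groups


def random_oversample(texts, labels, seed=0):
    """Duplicate randomly chosen examples of the smaller classes
    until every class is as large as the largest one."""
    rng = random.Random(seed)
    groups = _group_by_label(texts, labels)
    target = max(len(items) for items in groups.values())
    pairs = []
    for label, items in groups.items():
        # Draw with replacement to fill the gap up to the target size.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        pairs.extend((text, label) for text in items + extra)
    rng.shuffle(pairs)
    return [t for t, _ in pairs], [l for _, l in pairs]


def random_undersample(texts, labels, seed=0):
    """Keep a random subset of each class, sized to the smallest class."""
    rng = random.Random(seed)
    groups = _group_by_label(texts, labels)
    target = min(len(items) for items in groups.values())
    pairs = []
    for label, items in groups.items():
        # Sample without replacement so no example is repeated.
        pairs.extend((text, label) for text in rng.sample(items, target))
    rng.shuffle(pairs)
    return [t for t, _ in pairs], [l for _, l in pairs]


if __name__ == "__main__":
    # Hypothetical toy data: five negative tweets, one positive tweet.
    tweets = ["t1", "t2", "t3", "t4", "t5", "t6"]
    y = [0, 0, 0, 0, 0, 1]
    _, y_over = random_oversample(tweets, y)
    _, y_under = random_undersample(tweets, y)
    print(Counter(y_over), Counter(y_under))
```

Resampling of this kind would normally be applied to the training split only, so that validation and test sets keep the original class distribution.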

Shared Tasks / CodaLab Page

Paper / Colab Notebooks / Poster / Presentation / Link

Recommended citation: Varad Pimpalkhute, Prajwal Nakhate and Tausif Diwan, "Transformer Models for Classification on Health-Related Imbalanced Twitter Datasets," in Proceedings of the Sixth Social Media Mining for Health (#SMM4H) Workshop and Shared Task, pp. 118–122, 2021.
