The Power of Transformers in Predicting Twitter Account Identities
Leveraging Large Language Models for Advanced NLP

Introduction
This project aims to build a model capable of predicting the identity of an account from its tweets. I will walk through the steps I took, from data processing, to fine-tuning, to performance evaluation of the models.
Before proceeding, I should caveat that identity here is defined as male, female, or a brand. This in no way reflects my views on gender identity; this is simply a toy project demonstrating the power of Transformers for sequence classification. In some of the code snippets you may notice gender being used where we are referring to identity: that is simply how the data arrived.
Approach
Due to the complex nature of text data and the non-linear relationships being modelled, I ruled out simpler methods and chose to leverage pretrained transformer models for this project.
Transformers are the current state of the art for natural language processing and understanding tasks. The Transformers library from Hugging Face gives you access to thousands of pre-trained models, along with APIs to perform your own fine-tuning. Most of the models have been trained on large text corpora, some across multiple languages. Even without fine-tuning, they have been shown to perform very well on similar text classification tasks, including sentiment analysis, emotion detection, and hate speech recognition.
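As a sketch of what fine-tuning for this task looks like, the snippet below loads a pretrained checkpoint with a fresh three-class classification head and runs a forward pass over a couple of example tweets. The checkpoint name, label set, and example tweets are my own illustrative choices, not necessarily the ones used in this project; before any training, the head's weights are randomly initialised, so the logits are essentially arbitrary until fine-tuning sharpens them.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Illustrative choices: a small pretrained checkpoint and the three
# identity classes described above (not necessarily the project's setup).
MODEL_NAME = "distilbert-base-uncased"
LABELS = ["male", "female", "brand"]

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME,
    num_labels=len(LABELS),
    id2label=dict(enumerate(LABELS)),
)

# Tokenise a small batch of hypothetical tweets and run a forward pass.
batch = tokenizer(
    ["loving the new season of my favourite show!",
     "50% off everything this weekend only"],
    padding=True,
    truncation=True,
    return_tensors="pt",
)
with torch.no_grad():
    logits = model(**batch).logits  # shape: (batch_size, num_labels)
```

From here, fine-tuning is a matter of wrapping a labelled dataset and this model in the library's `Trainer` API (or a plain PyTorch training loop) and training the classification head, typically along with the encoder weights.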
I chose two models to fine-tune, along with a zero-shot model as a baseline for comparison.
Zero-shot learning gives a baseline estimate of how powerful a transformer can be without fine-tuning on your particular classification task.
Notebooks, Models & Repos
Due to computational cost, I can't make the training scripts interactive. However, I have made the performance analysis notebook and the models available to you. You can try the models yourself on live tweets!