Datasets
The dataset folder contains two subfolders.
- claim understanding: This benchmark dataset is annotated based on the taxonomy.
The file 'claims-annotated.csv' contains the following columns:
- tweet: the human or LLM-generated tweet.
- Jurisdiction: indicates if an election jurisdiction is specified.
- Jurisdiction - State: indicates if a state-level jurisdiction is mentioned.
- Jurisdiction - County: indicates if a county-level jurisdiction is mentioned.
- Jurisdiction - federal: indicates if federal elections are mentioned.
- Equipment: indicates if election equipment is mentioned.
- Equipment - Machines: indicates if the tweet mentions voting machines.
- Equipment - Ballots: indicates if the tweet mentions ballots.
- Processes: indicates if the tweet mentions any election process.
- Processes - Vote Counting: indicates if the tweet mentions vote counting process.
- Claim of Fraud: indicates if the tweet has a claim of fraud.
- Claim of Fraud - Corruption: indicates if the claim is related to corruption.
- Claim of Fraud - Illegal Voting: indicates if the claim is related to illegal voting.
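The annotation columns above can be combined to filter tweets by claim type. A minimal sketch, using a toy stand-in for 'claims-annotated.csv' and assuming the annotations are binary 0/1 flags (the exact encoding may differ):

```python
import pandas as pd

# Toy stand-in for claims-annotated.csv; binary flags are an assumption.
df = pd.DataFrame({
    "tweet": ["Machines flipped votes in our county!", "Polls close at 8pm."],
    "Claim of Fraud": [1, 0],
    "Equipment - Machines": [1, 0],
})

# Select tweets that both claim fraud and mention voting machines.
fraud_machine = df[(df["Claim of Fraud"] == 1) & (df["Equipment - Machines"] == 1)]
print(len(fraud_machine))  # → 1
```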
- authorship attribution: This folder contains train (train.csv) and test (test.csv) files for the authorship attribution task.
The files contain the following columns:
- tweet: the human or LLM-generated tweet.
- label: the label denoting the human/LLM used to generate the tweet.
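Once loaded, the label column can be counted to inspect the class balance. A minimal sketch with toy rows (the actual label set comes from train.csv):

```python
import pandas as pd

# Toy stand-in for train.csv; the real labels come from the dataset.
train = pd.DataFrame({
    "tweet": ["a", "b", "c", "d"],
    "label": ["human", "llama", "human", "falcon"],
})

# Count how many tweets each author class contributes.
counts = train["label"].value_counts()
print(counts.to_dict())  # → {'human': 2, 'llama': 1, 'falcon': 1}
```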
Code
The datasets and code can be accessed through the GitHub repository linked below.
All the scripts are implemented in Python.
Installation
Clone the repository from github using the link above as follows:
git clone https://github.com/LanguageTechnologyLab/ElectAI.git
Navigate to the ElectAI directory and install the required packages:
cd ElectAI
pip install -r requirements.txt
Accessing Data
The files are '.csv' files with tab-separated columns. They can be loaded in Python using the pandas library as follows.
import pandas as pd
df = pd.read_csv(filename, sep="\t")  # filename: path to the .csv file
Claim Understanding
To evaluate an LLM for claim understanding, use the following command. You need to specify the name of the model.
python incontext.py MODEL_NAME
MODEL_NAME can be llama, falcon, mistral, flan
Authorship Attribution
To train/fine-tune a model for classification, run the following command. You must specify the name of the model. For the random forest model (rf), also specify the features to be used.
python trainer.py MODEL_NAME FEATURE_NAME
MODEL_NAME can be bert, roberta, rf
FEATURE_NAME can be tfidf, word2vec
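For reference, the rf + tfidf combination corresponds to a standard TF-IDF + random forest pipeline. The sketch below, using scikit-learn and toy data, illustrates the idea; it is an assumption about the setup, not the exact trainer.py implementation:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy training data; trainer.py would read these from train.csv.
tweets = ["the vote count was rigged", "ballots were counted fairly",
          "machines flipped votes", "the election ran smoothly"]
labels = ["llm", "human", "llm", "human"]

# Turn tweets into TF-IDF features and fit a random forest on them.
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(tweets)
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X, labels)

# Predict the author class of an unseen tweet.
pred = clf.predict(vectorizer.transform(["votes were flipped by machines"]))
print(pred[0])
```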