Implementation of Longformers

Let us use the Longformer to classify reviews from the IMDB dataset as positive or negative. This code was run successfully in Google Colab on a T4 GPU. We will train the model on 400 reviews (the first 200 and the last 200 of the train split) and then use it to classify a new review. With this small subset and 2 training epochs we were able to achieve an accuracy of about 90%. Using the entire dataset and training for more epochs would give much better accuracy.

First, we need to install the dependencies:

# Installing dependencies
!pip install transformers
!pip install datasets
!pip install transformers[torch]
!pip install evaluate
!pip install torch
!pip install accelerate
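
Since this walkthrough assumes a GPU runtime (a T4 in Colab), it can help to confirm that PyTorch can actually see the GPU before training. This check is optional and not part of the original walkthrough.

Python3

import torch

# Optional: confirm that a CUDA GPU is visible to PyTorch.
print(torch.cuda.is_available())          # True on a Colab GPU runtime
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # e.g. 'Tesla T4'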

We will use the Hugging Face model ‘allenai/longformer-base-4096’. First, we load its tokenizer.

Python3

from transformers import AutoTokenizer
  
tokenizer = AutoTokenizer.from_pretrained("allenai/longformer-base-4096")
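
As an optional sanity check (not part of the original walkthrough), we can tokenize a deliberately long string to confirm that this tokenizer handles sequences far beyond the usual 512-token limit; with truncation at 4096 tokens, the encoded length is capped at Longformer's maximum input size.

Python3

# Optional sanity check: Longformer accepts inputs up to 4096 tokens.
long_text = "I love Geeks for Geeks. " * 1000      # a deliberately long input
encoded = tokenizer(long_text, truncation=True, max_length=4096)
print(len(encoded["input_ids"]))                   # 4096, far beyond BERT's 512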

                    


Loading the IMDB dataset and applying the tokenizer.

Python3

import datasets
# for the demo, take the first 200 and the last 200 reviews so that both labels are included
train_ds = datasets.load_dataset("imdb", split="train[:200]+train[-200:]")
test_ds = datasets.load_dataset("imdb", split="test[:100]+test[-100:]")
  
  
def preprocess_function(examples):
    return tokenizer(examples["text"], truncation=True)
  
  
train_tokenized_imdb = train_ds.map(preprocess_function, batched=True)
test_tokenized_imdb = test_ds.map(preprocess_function, batched=True)
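
Optionally, we can peek at the tokenized dataset to confirm that map added the input_ids and attention_mask columns alongside the original text and label fields.

Python3

# Optional: inspect the tokenized training set.
print(train_tokenized_imdb)                        # features: text, label, input_ids, attention_mask
print(train_tokenized_imdb[0]["label"])            # 0 = NEGATIVE, 1 = POSITIVE
print(len(train_tokenized_imdb[0]["input_ids"]))   # token count of the first review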

                    


Creating the label-to-id mappings and loading the model.

Python3

from transformers import AutoModelForSequenceClassification, TrainingArguments, Trainer
  
id2label = {0: "NEGATIVE", 1: "POSITIVE"}
label2id = {"NEGATIVE": 0, "POSITIVE": 1}
  
model = AutoModelForSequenceClassification.from_pretrained(
    "allenai/longformer-base-4096",
    num_labels=2,
    id2label=id2label,
    label2id=label2id
)
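
This loads a LongformerForSequenceClassification model, which combines sliding-window (local) attention with global attention; for classification, global attention is placed on the first (CLS) token automatically when no mask is supplied. The optional sketch below shows how a global_attention_mask could be passed explicitly; the example sentence is only for illustration.

Python3

import torch

# Optional: pass an explicit global attention mask (1 = global, 0 = local).
sample = tokenizer("I love Geeks for Geeks", return_tensors="pt")
global_attention_mask = torch.zeros_like(sample["input_ids"])
global_attention_mask[:, 0] = 1    # global attention on the first (CLS) token

with torch.no_grad():
    outputs = model(**sample, global_attention_mask=global_attention_mask)
print(outputs.logits.shape)        # torch.Size([1, 2]) -> one logit per label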

                    


We use DataCollatorWithPadding to create batches of tokenized inputs with dynamic padding at runtime. We also define a compute_metrics function to evaluate the test data during training; it calculates the accuracy of our classification.

Python3

import numpy as np
import evaluate
from transformers import DataCollatorWithPadding
  
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)
  
  
accuracy = evaluate.load("accuracy")
  
  
def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    return accuracy.compute(predictions=predictions, references=labels)
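
To see what compute_metrics returns, we can call it on a couple of hand-made logits; the values below are made up purely for illustration and reuse the numpy import from the cell above.

Python3

# Illustrative check of compute_metrics with dummy logits and labels.
dummy_logits = np.array([[0.1, 0.9], [0.8, 0.2]])     # argmax -> [1, 0]
dummy_labels = np.array([1, 0])
print(compute_metrics((dummy_logits, dummy_labels)))  # {'accuracy': 1.0}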

                    


Training the model

The training_args object contains the hyperparameters; you can modify them as per your requirements.

Python3

from transformers import AutoModelForSequenceClassification, TrainingArguments, Trainer
  
training_args = TrainingArguments(
    output_dir="sequence_classification",
    learning_rate=2e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    num_train_epochs=2,
    weight_decay=0.01,
    evaluation_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    # push_to_hub=True,
)
  
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_tokenized_imdb,
    eval_dataset=test_tokenized_imdb,
    tokenizer=tokenizer,
    data_collator=data_collator,
    compute_metrics=compute_metrics,
)
  
trainer.train()

                    

Output:

The Trainer logs the training loss, validation loss, and accuracy after each epoch; with this small training subset it reaches roughly 90% accuracy.
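After training finishes, you can optionally run a final evaluation and save the fine-tuned model to a directory of your choice; the save path below is just an example, and the Trainer also writes checkpoints to output_dir automatically.

Python3

# Optional: final evaluation on the test split and an explicit save.
metrics = trainer.evaluate()
print(metrics)                     # includes eval_loss and eval_accuracy
trainer.save_model("sequence_classification/final_model")   # example path
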
Once the model is trained and saved, we can use it for inference on new data. Load the saved checkpoint and classify a new review:

Python3

import torch
# give the path of the checkpoint that was saved during training
model = AutoModelForSequenceClassification.from_pretrained(
    "/content/sequence_classification/checkpoint-200")
  
text = "This was an awesome movie."
inputs = tokenizer(text, return_tensors="pt")
  
with torch.no_grad():
    logits = model(**inputs).logits
  
predicted_class_id = logits.argmax().item()
model.config.id2label[predicted_class_id]

                    

Output:

'POSITIVE'
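
Alternatively, the saved checkpoint can be wrapped in a Hugging Face text-classification pipeline, which handles tokenization and label mapping in one step. This is an optional sketch that reuses the same checkpoint path and tokenizer as above.

Python3

from transformers import pipeline

# Optional: wrap the fine-tuned checkpoint in a pipeline for inference.
classifier = pipeline(
    "text-classification",
    model="/content/sequence_classification/checkpoint-200",
    tokenizer=tokenizer,
)
print(classifier("The plot was dull and the acting was worse."))
# e.g. [{'label': 'NEGATIVE', 'score': ...}]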

