Text Summarization

A Simple Explanation - By Varsha Saini

In text summarization, machine learning algorithms reduce a piece of text to a shorter summary while preserving its key semantic information.

Types of Text Summarization

There are two types of text summarization: extractive and abstractive.

Extractive Approaches

The extractive approach is the traditional one: it builds the summary from sentences taken directly from the original text. In its simplest form, the frequency of every word is calculated, the most frequent words are selected, and the sentences that contain those words are retained in the summary.
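The frequency-based idea described above can be sketched in a few lines of plain Python. This is a minimal illustration rather than a production method: the function name `extractive_summary`, its parameters, and the tiny stop-word list are all assumptions made for the example.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real systems use a fuller list).
STOP_WORDS = {"a", "an", "and", "the", "is", "it", "of", "to", "in",
              "for", "on", "with", "that", "this", "are", "as", "us"}


def extractive_summary(text, top_k_words=5, max_sentences=2):
    """Keep the sentences that contain the most of the top-k frequent words."""
    # Split into sentences after '.', '!' or '?' followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]

    def tokenize(sentence):
        # Lowercase words, ignoring stop words.
        return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOP_WORDS]

    # Count word frequencies over the whole text and keep the most frequent words.
    freq = Counter(w for s in sentences for w in tokenize(s))
    top_words = {w for w, _ in freq.most_common(top_k_words)}

    # Score each sentence by how many distinct top words it contains.
    scored = [(len(top_words & set(tokenize(s))), i) for i, s in enumerate(sentences)]
    # Highest score first; ties are broken by original position.
    best = sorted(scored, key=lambda t: (-t[0], t[1]))[:max_sentences]
    # Emit the chosen sentences in their original order, skipping zero-score ones.
    keep = sorted(i for score, i in best if score > 0)
    return " ".join(sentences[i] for i in keep)
```

Calling `extractive_summary(text, top_k_words=5, max_sentences=2)` returns at most two sentences copied verbatim from `text`, which is what distinguishes an extractive summary from an abstractive one.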

Abstractive Approaches

The abstractive approach is more advanced: it generates new, shorter sentences that carry the same semantic meaning as the original text. It uses deep learning NLP models built on the Transformer architecture, such as GPT, BERT and T5.

Below is example code that uses the Hugging Face transformers library and the T5 model to create a text summary:

Step 1: Install Library

pip install transformers sentencepiece torch

Step 2: Import Libraries

from transformers import T5ForConditionalGeneration, T5Tokenizer

Step 3: Input Content for Summarization

content = "Our home planet Earth is a rocky, terrestrial planet. It has a solid and active surface with mountains, valleys, canyons, plains and so much more. Earth is special because it is an ocean planet. Water covers 70% of Earth's surface. Earth's atmosphere is made mostly of nitrogen and has plenty of oxygen for us to breathe. The atmosphere also protects us from incoming meteoroids, most of which break up before they can hit the surface. Visit NASA Space Place for more kid-friendly facts."

text = content.strip().replace("\n"," ")

text = "summarize: "+text

max_len = 512
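T5 is a text-to-text model that selects its task from a prefix on the input, which is why "summarize: " is prepended to the text. The same preprocessing can be wrapped in a small helper; `build_t5_input` is a hypothetical name used only for this sketch, and collapsing all whitespace (not just newlines) is an assumption that goes slightly beyond the step above.

```python
import re


def build_t5_input(content, prefix="summarize: "):
    """Collapse whitespace and prepend the T5 task prefix."""
    # Replace newlines, tabs and repeated spaces with a single space.
    flat = re.sub(r"\s+", " ", content).strip()
    # The prefix tells T5 which task to perform on the text that follows.
    return prefix + flat
```

With this helper, `text = build_t5_input(content)` yields a single-line string that starts with the task prefix.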

Step 4: Create Tokens

summary_tokenizer = T5Tokenizer.from_pretrained('t5-base')

# truncate inputs longer than max_len tokens; padding is not needed for a single text
encoding = summary_tokenizer(text, max_length=max_len, padding=False, truncation=True, return_tensors="pt")

input_ids, attention_mask = encoding["input_ids"], encoding["attention_mask"]

Step 5: Create Model

summary_model = T5ForConditionalGeneration.from_pretrained('t5-base')

# generate the summary token ids (between 75 and 300 tokens long)
outs = summary_model.generate(input_ids=input_ids, attention_mask=attention_mask,
                              min_length=75, max_length=300)

# decode the generated token ids back into text
summary = summary_tokenizer.decode(outs[0], skip_special_tokens=True).strip()

print(summary)


Output:

“Earth is special because it is an ocean planet. water covers 70% of Earth’s surface and protects us from meteoroids and incoming meteorites. visit NASA Space Place for more kid-friendly facts. click here to learn more about our home planet, Earth. Earth has a solid and active surface with mountains, valleys, canyons and so much more.”