In text summarization, machine learning algorithms shorten a piece of text and produce a summary that preserves as much of its meaning as possible.
Types of Text Summarization
There are two types of text summarization: extractive and abstractive.
Extractive Approaches
The extractive approach is the traditional one: it selects sentences from the original text rather than writing new ones. In its simplest form, the frequency of every word is calculated, the most frequent words are identified, and the sentences that contain those words are retained in the summary.
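The frequency-based idea described above can be sketched in a few lines of plain Python. This is a minimal illustration, not a production summarizer: the function name `extractive_summary`, the tiny `STOP_WORDS` set, and the `top_words` and `max_sentences` parameters are all assumptions made for this example.

```python
import re
from collections import Counter

# Hypothetical, deliberately small stop-word list; a real system would use a fuller one.
STOP_WORDS = {"a", "an", "the", "is", "it", "of", "and", "to", "in", "for", "so",
              "has", "with", "us", "from", "which", "they", "can", "because", "our"}

def extractive_summary(text, top_words=5, max_sentences=2):
    # Split into sentences after ., ! or ? followed by whitespace (a rough heuristic).
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

    # Count every non-stop word in the whole text and keep the most frequent ones.
    words = [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]
    top = {w for w, _ in Counter(words).most_common(top_words)}

    # Score a sentence by the number of top-word occurrences it contains.
    def score(sentence):
        return sum(1 for w in re.findall(r"[a-z']+", sentence.lower()) if w in top)

    # Rank sentences by score, keep the best ones that contain at least one top word,
    # then restore their original order so the summary reads naturally.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(i for i in ranked[:max_sentences] if score(sentences[i]) > 0)
    return " ".join(sentences[i] for i in keep)
```

Calling `extractive_summary(content)` on a paragraph returns up to `max_sentences` of its original sentences unchanged, which is the defining property of an extractive summary.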
Abstractive Approaches
The abstractive approach is more advanced: it generates new, shorter sentences that carry the same meaning as the original text. It relies on deep learning NLP models built on the Transformer architecture, such as T5, BART, and GPT.
The code below uses the Hugging Face transformers library and the T5 model to create a text summary:
Step 1: Install Library
pip install transformers sentencepiece torch
Step 2: Import Libraries
from transformers import T5ForConditionalGeneration, T5Tokenizer
Step 3: Input Content for Summarization
content = "Our home planet Earth is a rocky, terrestrial planet. It has a solid and active surface with mountains, valleys, canyons, plains and so much more. Earth is special because it is an ocean planet. Water covers 70% of Earth surface. Earth atmosphere is made mostly of nitrogen and has plenty of oxygen for us to breathe. The atmosphere also protects us from incoming meteoroids, most of which break up before they can hit the surface. Visit NASA Space Place for more kid-friendly facts."
# Remove surrounding whitespace and newlines, then add the task prefix T5 expects
text = content.strip().replace("\n", " ")
text = "summarize: " + text
# Maximum number of input tokens passed to the model
max_len = 512
Step 4: Create Tokens
summary_tokenizer = T5Tokenizer.from_pretrained('t5-base')
# Tokenize the text without padding, truncating anything beyond max_len tokens
encoding = summary_tokenizer(text, max_length=max_len, padding=False, truncation=True, return_tensors="pt")
input_ids, attention_mask = encoding["input_ids"], encoding["attention_mask"]
Step 5: Create Model
summary_model = T5ForConditionalGeneration.from_pretrained('t5-base')
# Generate the summary with beam search
outs = summary_model.generate(
    input_ids=input_ids,
    attention_mask=attention_mask,
    early_stopping=True,
    num_beams=3,
    num_return_sequences=1,
    no_repeat_ngram_size=2,
    min_length=75,
    max_length=300,
)
# Decode the generated token ids back into text
summary = [summary_tokenizer.decode(ids, skip_special_tokens=True) for ids in outs][0].strip()
summary
“Earth is special because it is an ocean planet. water covers 70% of Earth’s surface and protects us from meteoroids and incoming meteorites. visit NASA Space Place for more kid-friendly facts. click here to learn more about our home planet, Earth. Earth has a solid and active surface with mountains, valleys, canyons and so much more.”
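Several sentences in the generated summary above start in lowercase ("water covers", "visit NASA"). As an optional cleanup step that is not part of the original walkthrough, the sketch below capitalizes the first letter of each sentence. The helper name `capitalize_sentences` is hypothetical, and the rule it applies is a simple heuristic that would also capitalize the word after an abbreviation such as "e.g.".

```python
import re

def capitalize_sentences(summary):
    # Uppercase a lowercase letter at the very start of the text,
    # or one that follows ., ! or ? plus whitespace.
    return re.sub(
        r"(^|[.!?]\s+)([a-z])",
        lambda m: m.group(1) + m.group(2).upper(),
        summary,
    )
```

It could be applied as `summary = capitalize_sentences(summary)` after the decoding step.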