CUTTING EDGE
Text Summarization

This page was last updated on July 20, 2004


    PROJECT BRIEF

The Objective

Cutting Edge aims to summarize an arbitrary text by forming an extract of it, that is, a selection of its own sentences. The text is assumed to follow a text progression pattern.

The Algorithm

Cutting Edge is based on the algorithm described in the paper 'Using Lexical Chains for Text Summarization' by Regina Barzilay and Michael Elhadad.

The essence of this approach lies in three steps: forming lexical chains, identifying the strong chains, and finally extracting the sentences that relate to the words in these strong chains.
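
As a rough illustration of the "identifying strong chains" step, the sketch below applies the scoring described in the Barzilay and Elhadad paper: a chain's score is its length multiplied by a homogeneity index (one minus the number of distinct members divided by the length), and a chain counts as strong when its score exceeds the average score by more than two standard deviations. Whether Cutting Edge uses exactly this criterion is not stated on this page, and the function names are hypothetical.

```python
from statistics import mean, pstdev


def chain_score(chain):
    """Score a chain given as a list of word occurrences.

    score = length * homogeneity, with
    homogeneity = 1 - (distinct members / length).
    """
    length = len(chain)
    if length == 0:
        return 0.0
    homogeneity = 1.0 - len(set(chain)) / length
    return length * homogeneity


def strong_chains(chains):
    """Keep chains whose score exceeds mean + 2 * standard deviation."""
    if not chains:
        return []
    scores = [chain_score(c) for c in chains]
    threshold = mean(scores) + 2 * pstdev(scores)
    return [c for c, s in zip(chains, scores) if s > threshold]


if __name__ == "__main__":
    chains = [["a"], ["b"], ["c"], ["d"], ["e"],
              ["machine", "machine", "device", "machine"]]
    print(strong_chains(chains))
```

Note that a chain whose members are all distinct scores zero under this formula, so repetition of the same word is what drives a chain's strength here.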

The Tools

WordNet Thesaurus
fnTBL - POS tagger and Chunker
Segmentation Algorithm

The Design Methodology

Given an input technical report, we first segment the text using the segmentation algorithm of Hearst (1994). Segmentation uses Roget's Thesaurus and the POS tagger to identify tokens, and divides the text into blocks of x tokens each.
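
The block step can be sketched as follows. This is a simplified illustration in the spirit of Hearst's block comparison, not the project's code: it splits a token list into blocks of x tokens and measures the cosine similarity between adjacent blocks, where a low value suggests a candidate segment boundary. It does not consult Roget's Thesaurus or a POS tagger, and all function names are assumptions.

```python
from collections import Counter
from math import sqrt


def split_into_blocks(tokens, x):
    """Split a list of tokens into consecutive blocks of x tokens each."""
    if x <= 0:
        raise ValueError("block size x must be positive")
    return [tokens[i:i + x] for i in range(0, len(tokens), x)]


def cosine(block_a, block_b):
    """Cosine similarity between the term-frequency vectors of two blocks."""
    ca, cb = Counter(block_a), Counter(block_b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = (sqrt(sum(v * v for v in ca.values()))
            * sqrt(sum(v * v for v in cb.values())))
    return dot / norm if norm else 0.0


def gap_similarities(blocks):
    """Similarity at each gap between adjacent blocks."""
    return [cosine(blocks[i], blocks[i + 1]) for i in range(len(blocks) - 1)]


if __name__ == "__main__":
    tokens = "cat dog cat dog sun moon sun moon".split()
    blocks = split_into_blocks(tokens, 2)
    # The middle gap, where the vocabulary changes, gets the lowest value.
    print(gap_similarities(blocks))
```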

Once the tokens have been identified and the segments created, WordNet is used to build lexical chains within each segment. When this has been done for every segment, the chains are inspected for inter-segment connections. Based on the links found, a weight is assigned to each chain as a measure of its strength.
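
A minimal sketch of chain building and inter-segment linking is given below. It makes two loud assumptions: a small hand-written table (TOY_CONCEPTS) stands in for WordNet lookups, and words are attached greedily to the first related chain, which is simpler than the sense disambiguation the Barzilay and Elhadad paper performs. All names are hypothetical.

```python
# Hypothetical stand-in for WordNet: maps a word to a coarse concept label.
TOY_CONCEPTS = {
    "car": "vehicle", "truck": "vehicle", "automobile": "vehicle",
    "bank": "finance", "loan": "finance", "money": "finance",
}


def related(w1, w2, concepts=TOY_CONCEPTS):
    """Two words are related if identical or mapped to the same concept."""
    if w1 == w2:
        return True
    c1, c2 = concepts.get(w1), concepts.get(w2)
    return c1 is not None and c1 == c2


def build_chains(words, concepts=TOY_CONCEPTS):
    """Greedily build lexical chains for the candidate words of one segment."""
    chains = []
    for w in words:
        for chain in chains:
            if any(related(w, member, concepts) for member in chain):
                chain.append(w)
                break
        else:
            chains.append([w])  # no related chain found: start a new one
    return chains


def link_segments(per_segment_chains, concepts=TOY_CONCEPTS):
    """List (seg_i, chain_i, seg_j, chain_j) for related chains in different segments."""
    links = []
    for i, chains_i in enumerate(per_segment_chains):
        for j in range(i + 1, len(per_segment_chains)):
            for a, chain_a in enumerate(chains_i):
                for b, chain_b in enumerate(per_segment_chains[j]):
                    if any(related(u, v, concepts)
                           for u in chain_a for v in chain_b):
                        links.append((i, a, j, b))
    return links


if __name__ == "__main__":
    seg1 = build_chains(["car", "bank", "truck", "loan"])
    seg2 = build_chains(["automobile", "tree"])
    print(seg1, seg2, link_segments([seg1, seg2]))
```

The links returned by link_segments could then feed whatever weighting scheme is used, for example by raising the weight of chains that connect to chains in other segments.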

The top p% of sentences related to the strongest chains make up the extract.
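
The final selection might look like the sketch below. The ranking criterion used here, counting how many words of a sentence belong to a strong chain, is an assumption made for illustration; the page does not say how "related to" is measured. The chosen sentences are returned in their original order so that the extract reads naturally.

```python
from math import ceil


def extract(sentences, strong_chains, p):
    """Return the top p percent of sentences, ranked by strong-chain word hits.

    sentences: list of sentence strings
    strong_chains: list of chains, each a list of words
    p: percentage of sentences to keep (0 to 100)
    """
    chain_words = {w.lower() for chain in strong_chains for w in chain}

    def score(sentence):
        tokens = (tok.strip(".,;:!?") for tok in sentence.lower().split())
        return sum(1 for tok in tokens if tok in chain_words)

    k = ceil(len(sentences) * p / 100)
    # Rank by descending score; ties keep their original order.
    ranked = sorted(range(len(sentences)), key=lambda i: -score(sentences[i]))
    return [sentences[i] for i in sorted(ranked[:k])]


if __name__ == "__main__":
    text = ["The car stopped.", "It rained today.",
            "A truck hit the car.", "Birds sang."]
    print(extract(text, [["car", "truck"]], 50))
```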