Cuttingedge Online
PROJECT BRIEF
Cuttingedge aims to produce an extract of an arbitrary
text that serves as its summary. The input text is assumed to follow a text
progression pattern.
It is based on the algorithm described in the paper 'Using
Lexical Chains for Text Summarization' by Regina Barzilay and Michael
Elhadad.
The essence of the approach lies in three steps: forming lexical
chains, identifying the strong chains, and finally extracting the sentences
that relate to the words in these strong chains.
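As an illustration of the "identifying strong chains" step, here is a minimal sketch. It assumes the scoring from the Barzilay and Elhadad paper: a chain's score is its length times its homogeneity index, and a chain counts as strong when its score exceeds the average score by more than two standard deviations. Whether Cuttingedge uses exactly these formulas is not stated on this page, and the function names `chain_score` and `strong_chains` are hypothetical.

```python
from statistics import mean, pstdev


def chain_score(chain):
    """Score one lexical chain, given as a list of word occurrences.

    Assumed formula: score = length * homogeneity, where
    length is the number of occurrences in the chain and
    homogeneity = 1 - (distinct words / length).
    """
    length = len(chain)
    if length == 0:
        return 0.0
    homogeneity = 1 - len(set(chain)) / length
    return length * homogeneity


def strong_chains(chains):
    """Return the chains whose score exceeds mean + 2 * standard deviation."""
    if not chains:
        return []
    scores = [chain_score(c) for c in chains]
    threshold = mean(scores) + 2 * pstdev(scores)
    return [c for c, s in zip(chains, scores) if s > threshold]
```

A chain in which a few words recur often scores high, while a chain of words that each occur once scores zero, so it can never be selected as strong.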
WordNet Thesaurus
fnTBL - POS tagger and Chunker
Segmentation Algorithm
Given an input technical report, we first segment
the text using the segmentation algorithm (Hearst, 1994). Segmentation
uses Roget's Thesaurus and the POS tagger to identify
tokens and carve the text into blocks of x tokens.
Once the tokens have been identified and the segments created, WordNet is
used to build lexical chains within each segment. When this has been done for
every segment, the chains are inspected for inter-segment connections.
Based on the links found, weights are assigned to the individual chains to
measure their strength.
The top p% of sentences related to the strongest chains make up the extract.
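The final extraction step can be sketched as follows. This page does not say how a sentence's relation to a chain is measured, so the sketch assumes a simple rule: a sentence's score is the sum, over all strong chains, of the chain weight times the number of chain words the sentence contains. The function name `extract_summary` and the `(weight, words)` chain representation are hypothetical.

```python
import math
import re


def extract_summary(sentences, chains, p):
    """Pick the top p% of sentences by overlap with weighted chains.

    sentences: list of sentence strings, in document order.
    chains:    list of (weight, set_of_lowercase_words) pairs.
    p:         percentage of sentences to keep (0-100).
    """
    scored = []
    for idx, sentence in enumerate(sentences):
        tokens = set(re.findall(r"[a-z]+", sentence.lower()))
        # Assumed relatedness measure: weighted count of shared words.
        score = sum(weight * len(tokens & words) for weight, words in chains)
        scored.append((score, idx))

    # Keep at least one sentence, rounding the quota up.
    keep = max(1, math.ceil(len(sentences) * p / 100))
    # Highest score first; ties go to the earlier sentence.
    top = sorted(scored, key=lambda t: (-t[0], t[1]))[:keep]
    # Restore document order so the extract reads naturally.
    return [sentences[i] for _, i in sorted(top, key=lambda t: t[1])]
```

For example, with p = 50 and four input sentences, the two sentences sharing the most words with the strong chains are returned in their original order.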