Creating short summaries of documents with respect to a query has applications in, for example, search engines, where it can help inform users about the most relevant results. Constructing such a summary automatically, with the expressiveness of a human-written summary, is a difficult problem that has yet to be fully solved.
Difference between extractive and abstractive summarization:
Extractive summarization:
Extractive methods work by selecting a subset of existing words, phrases, or sentences in the original text to form the summary.
There is no natural language generation involved.
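As a toy illustration, an extractive summarizer can simply rank sentences by the frequency of the words they contain and copy the top-scoring ones verbatim. The sketch below is a minimal frequency heuristic of our own, not the method used in this project:

```python
# Minimal extractive summarization sketch: score each sentence by the
# frequency of its words in the document and copy the top sentences verbatim.
# The scoring heuristic here is an illustrative assumption.
import re
from collections import Counter

def extractive_summary(text, num_sentences=2):
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    freq = Counter(re.findall(r'\w+', text.lower()))

    def score(sentence):
        # Total document frequency of the words the sentence contains.
        return sum(freq[w] for w in re.findall(r'\w+', sentence.lower()))

    top = set(sorted(sentences, key=score, reverse=True)[:num_sentences])
    # Emit the selected sentences in their original order.
    return ' '.join(s for s in sentences if s in top)
```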
Abstractive summarization:
In query-based abstractive text summarization, the aim is to generate the summary of a document in the context of a query. In general, abstractive summarization aims to cover all the salient points of a document in a compact and coherent fashion.
Query-focused summarization highlights those points that are relevant in the context of the query.
Each summary is abstractive rather than extractive, in the sense that it does not necessarily consist of sentences copied verbatim from the original document.
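For intuition, a query-aware attention step in an encoder-decoder model can weight the encoded document states using both the current decoder state and an encoding of the query. The PyTorch sketch below is an illustrative assumption about how such a step might look; it shows plain query-conditioned attention only, not the diversity mechanism this project's repository refers to, and all names and sizes are hypothetical:

```python
import torch
import torch.nn as nn

class QueryAttentionStep(nn.Module):
    """One query-aware attention step over encoded document states."""

    def __init__(self, hidden_size):
        super().__init__()
        # Scores each document state jointly with the decoder and query states.
        self.attn = nn.Linear(3 * hidden_size, 1)

    def forward(self, doc_states, decoder_state, query_state):
        # doc_states: (seq_len, hidden); decoder_state, query_state: (hidden,)
        seq_len = doc_states.size(0)
        joint = torch.cat([doc_states,
                           decoder_state.expand(seq_len, -1),
                           query_state.expand(seq_len, -1)], dim=1)
        weights = torch.softmax(self.attn(joint).squeeze(1), dim=0)
        # Query-aware context vector: weighted sum of document states.
        return weights @ doc_states

# Example with random tensors:
# step = QueryAttentionStep(hidden_size=128)
# context = step(torch.randn(40, 128), torch.randn(128), torch.randn(128))
```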
It can be used in search engines: there are many such business scenarios where large documents are fed in and concise, query-relevant results are returned.
It is also useful for Q&A platforms, where the system can answer queries based on relevant tags/sources.
The dataset is from Debatepedia, an encyclopedia of pro and con arguments and quotes on critical debate topics. There are 663 debates in the corpus (only those debates are considered that have at least one query with one document). These 663 debates belong to 53 overlapping categories such as Politics, Law, Crime, Environment, Health, Morality, Religion, etc. A given topic can belong to more than one category; for example, the topic "Eye for an Eye philosophy" belongs to both Law and Morality.
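Assuming the triples are stored as parallel plain-text files with one example per line (a hypothetical layout; see the linked repository for the actual one), loading them could look like this:

```python
# Load (query, document, summary) triples from parallel files, one example
# per line. The file names in the usage comment are illustrative assumptions.
def load_triples(content_path, query_path, summary_path):
    with open(content_path) as c, open(query_path) as q, open(summary_path) as s:
        for content, query, summary in zip(c, q, s):
            yield query.strip(), content.strip(), summary.strip()

# triples = list(load_triples('train.content', 'train.query', 'train.summary'))
```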
Results
=======

ROUGE-1: 28.074
ROUGE-2: 2.183
ROUGE-L: 21.681
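Scores like these can be computed, for example, with Google's rouge-score package (an assumption; the project may have used a different evaluation script):

```python
# Compute ROUGE-1/2/L F-measures for one (reference, prediction) pair
# using the rouge-score package (pip install rouge-score).
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(['rouge1', 'rouge2', 'rougeL'], use_stemmer=True)
scores = scorer.score(
    target='capital punishment is a deterrent to crime',  # reference summary
    prediction='capital punishment deters crime')         # generated summary
for name, score in scores.items():
    print(name, round(score.fmeasure * 100, 3))
```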
Challenges and limitations
==========================
Getting comfortable with deep learning techniques.
Understanding the intuition behind how the different models work.
Generating a large dataset, including ground-truth summaries for the queries.
The task above is limited to single-document summarization.
Links and references
====================

Code and Dataset: https://github.com/group23IRE/diversity_based_attention
Project Webpage: https://group23ire.github.io
Encoder-Decoder Model: https://cs224d.stanford.edu/reports/urvashik.pdf
A Neural Attention Model for Abstractive Sentence Summarization: https://arxiv.org/pdf/1509.00685.pdf
TensorFlow RNN Tutorial: https://medium.com/@erikhallstrm/hello-world-rnn-83cd7105b767
PyTorch Seq2Seq Translation Tutorial: http://pytorch.org/tutorials/intermediate/seq2seq_translation_tutorial.html
Chalmers publication: http://publications.lib.chalmers.se/records/fulltext/249908/249908.pdf