- Information Extraction systems try to automatically extract structured information from unstructured or semi-structured machine-readable documents.
- In most cases, this means processing human-language text using natural language processing (NLP) techniques.
- Information Extraction differs from traditional Information Retrieval in that it does not retrieve, from a collection of documents, a subset of documents that are likely to be relevant to a query.
- Instead, the goal of Information Extraction is to extract from the documents (which may be in a variety of languages) salient facts about prespecified types of events, entities or relationships.
- These facts are then usually entered automatically into a database or an XML file, which can be used to analyse the data for trends, to generate a natural language summary, or simply to provide online access.
General Architecture for Text Engineering. GATE Information Extraction.
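
As a rough illustration of the points above, the following Python sketch extracts one prespecified type of event from a few short texts and writes the resulting facts to XML. It is a toy example under stated assumptions, not how GATE or any production system works: the regular expression, the function names `extract_acquisitions` and `facts_to_xml`, the element names, and the sample sentences are all hypothetical, and real systems typically rely on much richer linguistic analysis than a single pattern.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical pattern for one prespecified event type:
# "<Buyer> acquired <Target> in <year>". Real systems use far richer analysis.
ACQUISITION = re.compile(
    r"(?P<buyer>[A-Z][A-Za-z]+) acquired (?P<target>[A-Z][A-Za-z]+) in (?P<year>\d{4})"
)


def extract_acquisitions(documents):
    """Return one dict of string fields per acquisition mention found."""
    facts = []
    for doc_id, text in enumerate(documents):
        for match in ACQUISITION.finditer(text):
            fact = match.groupdict()
            fact["doc_id"] = str(doc_id)  # remember which document the fact came from
            facts.append(fact)
    return facts


def facts_to_xml(facts):
    """Serialise the extracted facts as a small XML document (as a string)."""
    root = ET.Element("acquisitions")
    for fact in facts:
        ET.SubElement(root, "acquisition", attrib=fact)
    return ET.tostring(root, encoding="unicode")


# Made-up sample documents; the second contains no matching event.
documents = [
    "Analysts noted that Acme acquired Globex in 2019 after long talks.",
    "The weather was mild. Nothing else happened.",
    "Later, Initech acquired Hooli in 2021.",
]

facts = extract_acquisitions(documents)
xml_output = facts_to_xml(facts)
print(xml_output)
```

Note the contrast with retrieval: the output is not a ranked list of documents but a set of structured records, one per event found, which could just as well be inserted into a database table as written to XML.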