ExoPatent is an experiment in applying various retrieval techniques to patent information.
Facts: 40 000 patents, 23 000 patented drugs from the FDA Orange Book, and 370 000 terms from the Unified Medical Language System (UMLS).
Process:
- Patents were exported from the Alexandria patent repository;
- Drug product data were obtained from the FDA Orange Book;
- A naive ontology of FDA products was created and aligned with PROTON, a basic upper-level ontology;
- References to FDA products and their characteristics in the patent texts were semantically annotated, i.e. each reference in the content is linked to the corresponding entity in the knowledge base;
- Named entities of other classes are also extracted from the patents. ExoPatent recognizes
a wide range of diseases and other medical terms defined
in the industry-standard Unified Medical Language System (UMLS).
UMLS contains rich semantic information that expresses the links
between medical concepts. These links allow complex structured queries for entities or patents;
- Finally, ExoPatent recognizes different types of measurements in the patent text.
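The annotation steps above can be illustrated with a minimal sketch, assuming a plain dictionary (gazetteer) lookup of single words plus a regular expression for measurements. KIM and GATE use much richer pipelines; the entries in `GAZETTEER`, the `kb:` URIs, the class names and the `MEASUREMENT` pattern are all hypothetical.

```python
import re
from dataclasses import dataclass

# Hypothetical knowledge-base fragment: surface form -> (entity URI, class).
# In ExoPatent this role is played by the FDA product ontology and UMLS.
GAZETTEER = {
    "atorvastatin": ("kb:Atorvastatin", "DrugProduct"),
    "hypercholesterolemia": ("kb:Hypercholesterolemia", "Disease"),
    "tablet": ("kb:Tablet", "DosageForm"),
}

# Hypothetical pattern for simple measurements such as "10 mg" or "2.5 ml".
MEASUREMENT = re.compile(r"\b(\d+(?:\.\d+)?)\s*(mg|g|ml|%)(?![A-Za-z])", re.IGNORECASE)
TOKEN = re.compile(r"[A-Za-z]+")


@dataclass
class Annotation:
    start: int
    end: int
    text: str
    kind: str  # entity class, or "Measurement"
    ref: str   # knowledge-base URI, or the normalized "value unit"


def annotate(text):
    """Link known words to KB entities and tag measurements, ordered by position."""
    anns = []
    # Dictionary lookup (single words only, a simplification).
    for m in TOKEN.finditer(text):
        hit = GAZETTEER.get(m.group().lower())
        if hit:
            uri, cls = hit
            anns.append(Annotation(m.start(), m.end(), m.group(), cls, uri))
    # Measurements are annotated with a normalized "value unit" string.
    for m in MEASUREMENT.finditer(text):
        ref = f"{m.group(1)} {m.group(2).lower()}"
        anns.append(Annotation(m.start(), m.end(), m.group(), "Measurement", ref))
    return sorted(anns, key=lambda a: a.start)


if __name__ == "__main__":
    text = "A tablet comprising 10 mg atorvastatin for treating hypercholesterolemia."
    for a in annotate(text):
        print(a.kind, a.ref, repr(a.text))
```

Each annotation points from a span of the text to an entity, which is what makes the later entity-based queries possible.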
Retrieval:
- Structure: provides a simplified way of creating queries that link patents with structured information.
- Predefined semantic search patterns: useful for frequent queries that differ only in their restrictions.
- Facets: a faceted search paradigm based on the co-occurrence of entities and terms in the same context.
- Hybrid: a cross between full-text and entity search; lets you put both entities and keywords in a single search box, auto-suggests entity names and shows facets.
- Results can be either entities or documents.
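The hybrid and faceted modes can be sketched as follows, assuming a "context" is a whole document represented by its set of entity URIs and its set of keywords, and that a hybrid query matches documents containing all requested entities and keywords. The index `DOCS`, the document ids and the URIs are hypothetical, not ExoPatent's actual index.

```python
from collections import Counter

# Hypothetical index: document id -> (entity URIs, lowercased keywords) in that context.
DOCS = {
    "US-1": ({"kb:Atorvastatin", "kb:Hypercholesterolemia"}, {"tablet", "dose"}),
    "US-2": ({"kb:Atorvastatin", "kb:Tablet"}, {"coating"}),
    "US-3": ({"kb:Ibuprofen", "kb:Pain"}, {"tablet"}),
}


def hybrid_search(entities, keywords):
    """Return ids of documents containing every requested entity and keyword."""
    kw = {k.lower() for k in keywords}
    return sorted(
        doc_id
        for doc_id, (ents, words) in DOCS.items()
        if set(entities) <= ents and kw <= words
    )


def facets(doc_ids, exclude=()):
    """Count the entities co-occurring in the matched documents (the facet values)."""
    counts = Counter()
    for doc_id in doc_ids:
        counts.update(DOCS[doc_id][0] - set(exclude))
    return counts.most_common()


if __name__ == "__main__":
    hits = hybrid_search(["kb:Atorvastatin"], [])
    print(hits)
    print(facets(hits, exclude=["kb:Atorvastatin"]))
```

`hybrid_search` returns documents, while `facets` returns entities ranked by co-occurrence; selecting a facet value would simply add it to the entity list of the next query.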
Analysis:
- Timelines and Trend Analysis: how the popularity of entities changes over time and which trends emerge.
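One simple way to compute such a timeline is sketched below, assuming popularity is the fraction of each year's patents that mention an entity and that a trend is the least-squares slope of that series. Both measures, the `PATENTS` data and the function names are hypothetical choices for illustration.

```python
from collections import Counter

# Hypothetical input: (publication year, entity URIs annotated in the patent).
PATENTS = [
    (2001, {"kb:Atorvastatin"}),
    (2001, {"kb:Ibuprofen"}),
    (2002, {"kb:Atorvastatin", "kb:Tablet"}),
    (2003, {"kb:Atorvastatin"}),
    (2003, {"kb:Atorvastatin", "kb:Ibuprofen"}),
]


def timeline(entity):
    """Fraction of each year's patents that mention `entity`."""
    total = Counter(year for year, _ in PATENTS)
    hits = Counter(year for year, ents in PATENTS if entity in ents)
    return {year: hits[year] / total[year] for year in sorted(total)}


def trend_slope(series):
    """Least-squares slope of the yearly values; > 0 suggests a rising trend."""
    xs, ys = list(series), list(series.values())
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    var = sum((x - mx) ** 2 for x in xs)
    if not var:
        return 0.0  # a single year carries no trend information
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var


if __name__ == "__main__":
    series = timeline("kb:Atorvastatin")
    print(series, trend_slope(series))
```

Normalizing by the number of patents per year keeps the timeline from simply tracking overall filing volume.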
Technology:
- KIM - Semantic Annotation and Search Platform by Ontotext
- GATE - text engineering platform by University of Sheffield
- OWLIM - semantic database by Ontotext
Acknowledgements: