Klaus U. Schulz

Automated Metadata Extraction: Finding Topic in Texts

Speaker: Prof. Klaus U. Schulz, University of Munich and TopicZoom GmbH


We present a web service for the automated extraction of metadata from electronic text documents in libraries and archives. Given any text in electronic form, the service automatically computes a set of topics that together form a thematic profile of the input text. For texts containing geographic or temporal references, the relevant geographic regions and time periods are also extracted. Each topic carries a weight that measures its relevance to the given document. Topics are derived from a very large taxonomy/ontology via fast linguistic analysis of the input text. We describe the taxonomy/ontology and the main principles of the text analysis. The computed topic tags can be used to offer thematic access to text collections, to link texts with other data and knowledge resources, and to support intelligent data mining and data analysis.
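
To make the idea of a weighted thematic profile concrete, the following is a minimal sketch and not the service's actual method. It assumes a tiny hand-made taxonomy and lexicon (TAXONOMY, LEXICON), a hypothetical function name (extract_topics), and a hypothetical decay parameter that passes part of each match's credit up to parent topics. The real taxonomy/ontology and linguistic analysis are far larger and more sophisticated.

```python
import re
from collections import Counter

# Hypothetical toy taxonomy: topic -> parent topic (None marks a root).
TAXONOMY = {
    "Science": None,
    "Astronomy": "Science",
    "Geography": None,
    "Bavaria": "Geography",
}

# Hypothetical lexicon: surface term -> the topic it signals.
LEXICON = {
    "telescope": "Astronomy",
    "galaxy": "Astronomy",
    "munich": "Bavaria",
    "alps": "Bavaria",
}


def extract_topics(text, decay=0.5):
    """Return a dict mapping topics to relevance weights that sum to 1.

    Each lexicon match gives full credit to its topic and a decayed
    share to every ancestor in the taxonomy (an assumption made for
    illustration only).
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    scores = Counter()
    for token in tokens:
        topic = LEXICON.get(token)
        credit = 1.0
        # Walk up the taxonomy, reducing the credit at each level.
        while topic is not None:
            scores[topic] += credit
            topic = TAXONOMY[topic]
            credit *= decay
    total = sum(scores.values())
    if total == 0:
        return {}  # no known terms, so no thematic profile
    return {topic: score / total for topic, score in scores.items()}


if __name__ == "__main__":
    sample = "The telescope in Munich observed a distant galaxy."
    profile = extract_topics(sample)
    # Print topics from most to least relevant.
    for topic, weight in sorted(profile.items(), key=lambda kv: -kv[1]):
        print(f"{topic}: {weight:.2f}")
```

In this sketch the weights are normalized so that they can be compared across documents of different lengths; whether the actual service normalizes in this way is not stated in the abstract.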