How to comply with OpenMinTeD interoperability specifications
OpenMinTeD has defined a set of technical and legal specifications intended to facilitate the secure and robust deployment of TDM applications running on scholarly content. The specifications for content focus on how the content interacts with the applications and/or components.
At present, the OpenMinTeD platform supports the following:
- Data formats: The preferred formats for delivering textual material are plain text, PDF (non-proprietary, and not consisting of scanned images), and XML, which can be read by one of the readers offered in OpenMinTeD.
- Character encoding: The preferred character encoding is UTF-8.
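As an illustration of the encoding preference above, the following minimal sketch checks whether text and XML files in a folder decode cleanly as UTF-8 before they are delivered. It is not part of the OpenMinTeD platform: the function names, the folder layout and the choice of file suffixes are assumptions made for this example.

```python
from pathlib import Path
from typing import List, Tuple


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def find_non_utf8_files(
    folder: str, suffixes: Tuple[str, ...] = (".txt", ".xml")
) -> List[Path]:
    """List files under `folder` (hypothetical layout) that are not valid UTF-8.

    Only files whose suffix is in `suffixes` are inspected; PDF files are
    skipped because they are binary containers, not plain encoded text.
    """
    bad_files = []
    for path in sorted(Path(folder).rglob("*")):
        if path.is_file() and path.suffix.lower() in suffixes:
            if not is_valid_utf8(path.read_bytes()):
                bad_files.append(path)
    return bad_files
```

Calling `find_non_utf8_files("my_corpus")` on a local copy of the material returns the files that would need re-encoding; an empty list means every inspected file is valid UTF-8.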
To be fully compatible with OpenMinTeD, you must follow these rules:
- You must ensure that the publications are distributed under Open Access conditions.
- You must include in the metadata record of each publication a link to the licence document that describes the terms and conditions under which it is provided, and attach the licence document to the publication.
- Unique and persistent identifiers:
If you want your material to be easily processable and interoperable with TDM tools and services, you should adopt the following recommendations. Please note that they are not absolute: material that does not comply with them may still be processable, but adopting them makes it better equipped for TDM and NLP processing.
- Domain classification: Use standard classification vocabularies, such as MeSH, DDC, LCSH etc., to add classification tags to your material, and specify the vocabulary you use in the metadata record; provide at least one broad category for your material (e.g. life sciences, computing etc.).
- Linking through authority lists: Wherever the metadata records link to other resources or entities (e.g. persons, projects etc.), please use, to the extent possible, unique and persistent identifiers from authority lists and sources (e.g. ORCID for persons, ISNI or FundRef for organizations), and document the authority and/or scheme each identifier adheres to.
- Annotation formats: If you want to provide annotated publications, please note that OpenMinTeD has endorsed the XML Metadata Interchange (XMI) format, specifically the XMI representation of a UIMA CAS, to encode annotations on text, in particular when exchanging data between components within a workflow.
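For the recommendation on linking through authority lists, it can help to check identifiers before they go into a metadata record. The sketch below validates the form and check character of an ORCID iD, using the ISO 7064 MOD 11-2 rule that ORCID documents for its identifiers. OpenMinTeD does not prescribe this check; the function names are illustrative only, and the sketch does not confirm that the iD is actually registered.

```python
import re

# Four hyphen-separated blocks of four characters; the last may be "X".
ORCID_PATTERN = re.compile(r"[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]")


def orcid_check_character(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for 15 base digits."""
    total = 0
    for digit in base_digits:
        total = (total + int(digit)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Return True if `orcid` is well formed and its check character matches."""
    if not ORCID_PATTERN.fullmatch(orcid):
        return False
    compact = orcid.replace("-", "")
    return orcid_check_character(compact[:15]) == compact[15]
```

For example, `is_valid_orcid("0000-0002-1825-0097")` accepts the sample iD used in ORCID's own documentation, while changing its last digit makes the check fail.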