annotation (text/corpus annotation)
A note by way of explanation or comment added to a text or diagram [Oxford English Dictionary, https://en.oxforddictionaries.com/definition/annotation]. In OpenMinTeD, the term refers mainly to text or corpus annotation, which is the practice of adding interpretative linguistic information, grounded in a knowledge resource, to a text or corpus, respectively. For example, one common type of annotation is the addition of tags, or labels, indicating the word class to which lexical units in a text belong; these tags come from a predefined set (e.g. Noun, Verb, Preposition, etc.). Semantic labeling with terms and concepts from an ontology is another common example of annotation. Relationships that link entities of the text, such as syntactic dependencies or semantic relations, are also annotations.
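As a rough illustration of the word-class example above, annotations can be sketched as labelled character spans kept apart from the text they describe. The `Annotation` class, its field names, the `TAGSET` set and the sample sentence are all assumptions made for this sketch; they are not an OpenMinTeD data format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """Hypothetical stand-off annotation: a labelled span of characters."""
    begin: int   # offset of the first character of the span
    end: int     # offset just past the last character of the span
    layer: str   # kind of information, e.g. "pos" for word class
    label: str   # value taken from a predefined set


# Illustrative predefined tagset (assumption, taken from the example above).
TAGSET = {"Noun", "Verb", "Preposition"}

text = "Mice eat cheese"

# Word-class annotations that point into the text without modifying it.
annotations = [
    Annotation(0, 4, "pos", "Noun"),
    Annotation(5, 8, "pos", "Verb"),
    Annotation(9, 15, "pos", "Noun"),
]


def covered_text(source: str, annotation: Annotation) -> str:
    """Return the piece of text that an annotation refers to."""
    return source[annotation.begin:annotation.end]


# Pair each annotated span with its label.
tagged = [(covered_text(text, a), a.label) for a in annotations]
```

Keeping the labels separate from the text is only one possible design; it leaves the original text untouched and lets several annotation layers coexist.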
Any resource that can be used for annotating a text, including part-of-speech tagsets, annotation schemes, domain-specific ontologies, etc.
A set of elements and values designed to annotate data. An annotation scheme usually aims to represent a specific level of information, such as morphological features of words, syntactic dependency relations between phrases, discourse level information, etc. It can consist of a flat structure of elements and values (e.g. part-of-speech tags) or it can be more complex with interrelated elements (e.g. specific morphological features to be used for each part-of-speech).
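The difference between a flat scheme and one with interrelated elements can be sketched as follows. The scheme contents (`FLAT_SCHEME`, `MORPH_SCHEME`) and the `is_valid` helper are hypothetical and chosen only to mirror the examples in the definition above.

```python
# Flat scheme: a single element ("pos") with a closed set of values.
FLAT_SCHEME = {"pos": {"Noun", "Verb", "Preposition"}}

# More complex scheme: the morphological features that may be used
# depend on the part of speech (illustrative values only).
MORPH_SCHEME = {
    "Noun": {"number": {"Sing", "Plur"}},
    "Verb": {"number": {"Sing", "Plur"}, "tense": {"Past", "Pres"}},
    "Preposition": {},
}


def is_valid(pos: str, features: dict) -> bool:
    """Check a part-of-speech tag and its features against both schemes."""
    if pos not in FLAT_SCHEME["pos"]:
        return False
    allowed = MORPH_SCHEME[pos]
    # Every feature must be allowed for this tag and carry a permitted value.
    return all(
        name in allowed and value in allowed[name]
        for name, value in features.items()
    )
```

Under these assumptions, `is_valid("Verb", {"tense": "Past"})` holds, while `is_valid("Noun", {"tense": "Past"})` does not, because tense is not defined for nouns in this toy scheme.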
Any software program (or group of programs seen as a whole) intended for the end-user and addressing one or multiple related user needs.
component (software component)
An algorithm wrapped in a standard way so that it can be integrated as a reusable tool or service within a particular component-oriented framework such as UIMA, GATE, etc.
A structured collection of pieces of data (textual, audio, video, multimodal/multimedia, etc.) typically of considerable size and selected according to criteria external to these data (e.g. size, type of language, type of producers or expected audience, etc.) to represent as comprehensively as possible the object of study.
A data model is an abstract model that organizes elements of data and standardizes how they relate to one another and to the properties of real-world entities. [Wikipedia, https://en.wikipedia.org/wiki/Data_model]
A piece of written, printed, or electronic matter that is primarily intended for reading.
Interoperability describes the extent to which systems and devices can work together, exchange data, and interpret that shared data. For two systems to be interoperable, they must be able to exchange data and subsequently present that data such that it can be understood by a user. [Research Data Alliance, http://smw-rda.esc.rzg.mpg.de/index.php/Interoperability]
A permission, or written evidence of a permission, that confers on the licensee the right to do something that would otherwise be prohibited by law.
The condition or state in which two or more licences can co-exist or be combined without conflicting with each other. In OpenMinTeD, licence compatibility and licence interoperability are used as synonyms.
A resource (data and/or tool) containing, producing or representing knowledge; knowledge is specific information that is relevant for the linguistic and conceptual interpretation of data. For OpenMinTeD purposes, this information is exploited or produced by TDM modules and tools, or exchanged between them.
The resource describes a language or some aspect(s) of a language via a systematic documentation of linguistic structures. [Open Language Archives Community, http://www.language-archives.org/REC/type.html#language_description] Examples include sketch grammars and computational grammars.
Language Resources (LRs) encompass (a) data sets (textual, multimodal/multimedia and lexical data, grammars, language models, etc.) in machine readable form, used to assist and augment language processing applications, but also, in a broader sense, in language and language-mediated research studies and applications, and (b) tools/technologies/services used for their processing.
A resource organised on the basis of lexical or conceptual entries (lexical items, terms, concepts, etc.) with their supplementary information (e.g. grammatical, semantic, statistical information, etc.). In OpenMinTeD, such resources can be used for annotation purposes.
machine learning (ML) model
The process of training an ML model involves providing an ML algorithm (that is, the learning algorithm) with training data to learn from. The term ML model refers to the model artifact that is created by the training process. [http://docs.aws.amazon.com/machine-learning/latest/dg/training-ml-models.html]
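The distinction between the learning algorithm, the training data and the resulting model artifact can be sketched with a deliberately trivial learner. The most-frequent-tag baseline below, the function names `train` and `predict`, and the sample data are assumptions made for illustration; they do not come from the cited source.

```python
import json
from collections import Counter, defaultdict


def train(training_data):
    """Toy learning algorithm: remember the most frequent tag for each word."""
    counts = defaultdict(Counter)
    for word, tag in training_data:
        counts[word.lower()][tag] += 1
    # The returned dictionary is the model artifact produced by training.
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}


def predict(model, word, default="Noun"):
    """Apply the trained model; unseen words fall back to a default tag."""
    return model.get(word.lower(), default)


# Illustrative training data: (word, tag) pairs.
training_data = [
    ("mice", "Noun"),
    ("eat", "Verb"),
    ("cheese", "Noun"),
    ("eat", "Verb"),
]

model = train(training_data)

# The artifact can be stored and reloaded independently of the algorithm.
artifact = json.dumps(model, sort_keys=True)
restored = json.loads(artifact)
```

The point of the sketch is that `model` is a separate object from `train`: once serialised, it can be shared and reused without rerunning the learning algorithm.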
Metadata is structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage an information resource. Metadata is often called data about data or information about information. [National Information Standards Organization, Understanding metadata, https://groups.niso.org/apps/group_public/download.php/17446/Understanding%20Metadata.pdf]
The free and online availability of literature, which allows users to read, download, copy, distribute, print, search, or link to the full text, crawl articles for indexing, pass them as data to software, or use them for any other useful purpose. This availability is granted without financial, legal, or technical barriers other than those inseparable from gaining access to the internet itself, and those related to giving authors control over the integrity of their work and the right to be properly acknowledged and cited. [Budapest OA Initiative 2002; Bethesda Statement on OA Publishing 2003; Berlin Declaration on OA Knowledge in Science and Humanities 2003]
An infrastructure refers to the basic structures and facilities required for the operation of a system. The OpenMinTeD infrastructure consists of different layers of resources: content resources that can be mined, ancillary knowledge resources, tools and web services. Any resource that can be registered in the OpenMinTeD registry is part of the underlying infrastructure.
The OpenMinTeD platform brings together all the services that facilitate the interoperability aspects of the underlying infrastructure (e.g. registration, search and browsing, creation of workflows, processing, annotation, etc.) and, thus, becomes an infrastructural service of the wider research ecosystem.
A book, article, etc., that has been made available to the public either via a formal publication service or over the internet and is stored at an archive or repository. For OpenMinTeD purposes, this mainly covers scholarly publications.
Something that you can use to help you to achieve something, especially in your work or study. [Macmillan Dictionary, http://www.macmillandictionary.com/dictionary/british/resource_1]
A formal or official statement asserting the copyright status and/or the licensing conditions for a given resource. It can be issued by an authoritative body (e.g. http://rightsstatements.org/). For OpenMinTeD purposes, it can be deemed similar to a "licence category", grouping licences that share similar features.
Text and Data Mining
Text and Data Mining (TDM) was initially defined as “the discovery by computer of new, previously unknown information, by automatically extracting and relating information from different (…) resources, to reveal otherwise hidden meanings” (Hearst, 1999), in other words, “an exploratory data analysis that leads to the discovery of heretofore unknown information, or to answers for questions for which the answer is not currently known” (Hearst, 1999). [FutureTDM, http://www.futuretdm.eu/news/tdm-definition/]
service / web service
A piece of software accessible through remote invocation, typically via a REST-style API or the SOAP protocol.
A piece of (standalone) software, typically serving a very limited technical purpose, such as a particular implementation of a part-of-speech tagger (e.g. TreeTagger), a tree parsing program (e.g. mstparser), etc. Preferred terms in OpenMinTeD include 'component' and 'workflow'.
A series of software components assembled together in order to perform a specific task.
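A minimal sketch of this idea, assuming that every component is a function that takes a document and returns an enriched copy of it. The component names (`tokenizer`, `lexicon_tagger`), the dictionary-based document and `run_workflow` are hypothetical; they do not reproduce the UIMA or GATE APIs.

```python
def tokenizer(doc):
    """Component 1: add token offsets for whitespace-separated words."""
    tokens = []
    offset = 0
    for word in doc["text"].split():
        begin = doc["text"].index(word, offset)
        end = begin + len(word)
        tokens.append({"begin": begin, "end": end})
        offset = end
    return {**doc, "tokens": tokens}


def lexicon_tagger(doc):
    """Component 2: tag each token using a tiny illustrative lexicon."""
    lexicon = {"mice": "Noun", "eat": "Verb", "cheese": "Noun"}
    tags = [
        lexicon.get(doc["text"][t["begin"]:t["end"]].lower(), "Unknown")
        for t in doc["tokens"]
    ]
    return {**doc, "pos": tags}


def run_workflow(components, doc):
    """Run the components in order; each one receives the previous output."""
    for component in components:
        doc = component(doc)
    return doc


result = run_workflow([tokenizer, lexicon_tagger], {"text": "Mice eat cheese"})
```

Because both components share the same interface, they can be reordered, replaced or reused in other workflows, which is the motivation for wrapping algorithms in a standard way.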