Schema documentation for component ms:dataFormat

enumeration

http://w3id.org/meta-share/omtd-share/AudioFormat

Any format used for audio files

enumeration

http://w3id.org/meta-share/omtd-share/basic

enumeration

http://w3id.org/meta-share/omtd-share/mpg

enumeration

http://w3id.org/meta-share/omtd-share/aif

enumeration

http://w3id.org/meta-share/omtd-share/wav

enumeration

http://w3id.org/meta-share/omtd-share/mp3

enumeration

http://w3id.org/meta-share/omtd-share/Xml

Superclass for grouping together XML formats

enumeration

http://w3id.org/meta-share/omtd-share/Tuepp

Format of the Tübingen Partially Parsed Corpus of Written German (TüPP-D/Z) XML files; TüPP-D/Z (http://www.sfs.uni-tuebingen.de/de/ascl/ressourcen/corpora/tuepp-dz.html) is a collection of articles from the German newspaper taz (die tageszeitung), annotated and encoded in an XML format.

enumeration

http://w3id.org/meta-share/omtd-share/Folia

FoLiA is an XML-based annotation format, suitable for the representation of linguistically annotated language resources

enumeration

http://w3id.org/meta-share/omtd-share/Tmx

The purpose of the TMX format is to provide a standard method to describe translation memory data that is being exchanged among tools and/or translation vendors, while introducing little or no loss of critical data during the process.

enumeration

http://w3id.org/meta-share/omtd-share/BncFormat

Data format for the XML version of the British National Corpus (http://www.natcorp.ox.ac.uk/)

enumeration

http://w3id.org/meta-share/omtd-share/Xmi

Data format for the XML Metadata Interchange (XMI), which is an Object Management Group (OMG) standard for exchanging metadata information via Extensible Markup Language (XML)

enumeration

http://w3id.org/meta-share/omtd-share/Rdf_xml

Data format for RDF (Resource Description Framework) XML format; RDF/XML is a serialisation for RDF

enumeration

http://w3id.org/meta-share/omtd-share/Tcf

An XML data exchange format developed within the WebLicht architecture to facilitate efficient interoperability between the tools; it allows the various linguistic annotations produced by the tools within WebLicht to be stored in one document; it supports incremental enrichment of linguistic annotations at various levels of analysis in a stand-off XML‐based format

enumeration

http://w3id.org/meta-share/omtd-share/OpenOfficeDocument

enumeration

http://w3id.org/meta-share/omtd-share/Xhtml

Data format for XHTML (Extensible HyperText Markup Language)

enumeration

http://w3id.org/meta-share/omtd-share/InlineXml

Inline XML file format

enumeration

http://w3id.org/meta-share/omtd-share/Alto

enumeration

http://w3id.org/meta-share/omtd-share/XmlBioc

BioC is a simple format to share text data and annotations.

enumeration

http://w3id.org/meta-share/omtd-share/OpenOfficeSpreadsheet

enumeration

http://w3id.org/meta-share/omtd-share/Xces

Data format for documents and corpora using the XCES standard (Corpus Encoding Standard for XML), cf. http://www.xces.org/

enumeration

http://w3id.org/meta-share/omtd-share/XcesIlspVariant

A variant of XCES implemented for documents

enumeration

http://w3id.org/meta-share/omtd-share/AlvisEnrichedDocumentFormat

Format for linguistic annotations of documents used for the ALVIS framework

enumeration

http://w3id.org/meta-share/omtd-share/GateXml

XML-based format for GATE components

enumeration

http://w3id.org/meta-share/omtd-share/Xpath

XPath is a language for addressing parts of an XML document, designed to be used by both XSLT and XPointer.

enumeration

http://w3id.org/meta-share/omtd-share/TigerXml

The TIGER XML format was created for encoding syntactic constituency structures in the German TIGER corpus. It has since been used for many other corpora as well. TIGERSearch is a linguistic search engine specifically targeting this format. The format was later extended to also support semantic frame annotations.

enumeration

http://w3id.org/meta-share/omtd-share/Pml

Format according to the Prague Markup Language (http://ufal.mff.cuni.cz/jazz/PML/index_en.html); PML is a generic data format based on XML intended for storing linguistically annotated data, such as the Prague Dependency Treebank, also annotation lexicons, etc.

enumeration

http://w3id.org/meta-share/omtd-share/Emma

Data format according to the EMMA (Extensible MultiModal Annotation markup language) specifications, cf. https://www.w3.org/TR/2007/CR-emma-20071211/

enumeration

http://w3id.org/meta-share/omtd-share/Owl_xml

XML format for OWL ontologies

enumeration

http://w3id.org/meta-share/omtd-share/Pls

Data format according to the Pronunciation Lexicon Specification (PLS)

enumeration

http://w3id.org/meta-share/omtd-share/OpenOfficePresentation

enumeration

http://w3id.org/meta-share/omtd-share/Tei

Data format for TEI-encoded (Text Encoding Initiative) texts

enumeration

http://w3id.org/meta-share/omtd-share/RdfFormat

Formats for RDF (Resource Description Framework) resources

enumeration

http://w3id.org/meta-share/omtd-share/Rdf_xml

Data format for RDF (Resource Description Framework) XML format; RDF/XML is a serialisation for RDF

enumeration

http://w3id.org/meta-share/omtd-share/Turtle

Textual syntax for RDF that allows an RDF graph to be completely written in a compact and natural text form, with abbreviations for common usage patterns and datatypes.

enumeration

http://w3id.org/meta-share/omtd-share/Obo

Serialization format for ontologies according to the Open Biomedical Ontologies model.

enumeration

http://w3id.org/meta-share/omtd-share/Nif

The NLP Interchange Format (NIF) is an RDF/OWL-based format that aims to achieve interoperability between Natural Language Processing (NLP) tools, language resources and annotations; it consists of specifications, ontologies and software (overview), which are combined under the version identifier "NIF 2.0", but are versioned individually

enumeration

http://w3id.org/meta-share/omtd-share/Owl

Superclass for formats used for OWL

enumeration

http://w3id.org/meta-share/omtd-share/Owl_xml

XML format for OWL ontologies

enumeration

http://w3id.org/meta-share/omtd-share/UimaCasFormat

Formats used for the UIMA CAS (Common Analysis System) objects

enumeration

http://w3id.org/meta-share/omtd-share/SerializedCas

The CAS is the native data model used by UIMA; there are various ways of saving CAS data, using XMI, XCAS, or binary formats; this is for the serialized format

enumeration

http://w3id.org/meta-share/omtd-share/Uima_json

UIMA serialisation in JSON

enumeration

http://w3id.org/meta-share/omtd-share/BinaryCas

Binary format used for CAS data

enumeration

http://w3id.org/meta-share/omtd-share/DocumentFormat

Any format used for documents (textual resources)

enumeration

http://w3id.org/meta-share/omtd-share/Sgml

SGML format

enumeration

http://w3id.org/meta-share/omtd-share/Pls

Data format according to the Pronunciation Lexicon Specification (PLS)

enumeration

http://w3id.org/meta-share/omtd-share/Latex

Data format for documents using LaTeX (a high-quality typesetting system very popular for scientific documents)

enumeration

http://w3id.org/meta-share/omtd-share/Tika

enumeration

http://w3id.org/meta-share/omtd-share/OasisSpreadsheet

enumeration

http://w3id.org/meta-share/omtd-share/Rtf

Rich Text Format; proprietary data format of Microsoft

enumeration

http://w3id.org/meta-share/omtd-share/Html

HTML format

enumeration

http://w3id.org/meta-share/omtd-share/Html5Microdata

Format according to the specifications of HTML5 Microdata

enumeration

http://w3id.org/meta-share/omtd-share/OpenOfficeSpreadsheet

enumeration

http://w3id.org/meta-share/omtd-share/MsExcel

Data format for Microsoft Excel documents

enumeration

http://w3id.org/meta-share/omtd-share/MsWord

Data format for Microsoft Word documents

enumeration

http://w3id.org/meta-share/omtd-share/Pubmed

Textual format used for PubMed articles

enumeration

http://w3id.org/meta-share/omtd-share/BionlpFormats

Formats used for BioNLP shared tasks

enumeration

http://w3id.org/meta-share/omtd-share/BionlpSt2013A1_a2

Format used in BioNLP Shared Task 2013

enumeration

http://w3id.org/meta-share/omtd-share/Json_genia

JSON format of the Genia dataset

enumeration

http://w3id.org/meta-share/omtd-share/Bionlp

File format used for the BioNLP Shared Task

enumeration

http://w3id.org/meta-share/omtd-share/Cochrane

Format used in Cochrane texts

enumeration

http://w3id.org/meta-share/omtd-share/BionlpFormat

Formats used for BioNLP shared tasks

enumeration

http://w3id.org/meta-share/omtd-share/Xmi

Data format for the XML Metadata Interchange (XMI), which is an Object Management Group (OMG) standard for exchanging metadata information via Extensible Markup Language (XML)

enumeration

http://w3id.org/meta-share/omtd-share/Tex

Data format for documents using TeX (a typesetting system)

enumeration

http://w3id.org/meta-share/omtd-share/Pdf

Data format for PDF files (Portable Document Format)

enumeration

http://w3id.org/meta-share/omtd-share/Xhtml

Data format for XHTML (Extensible HyperText Markup Language)

enumeration

http://w3id.org/meta-share/omtd-share/LD_json

Data format encoding Linked Data using JSON

enumeration

http://w3id.org/meta-share/omtd-share/OasisPresentation

enumeration

http://w3id.org/meta-share/omtd-share/OasisText

enumeration

http://w3id.org/meta-share/omtd-share/OpenOfficeDocument

enumeration

http://w3id.org/meta-share/omtd-share/OpenOfficePresentation

enumeration

http://w3id.org/meta-share/omtd-share/Postscript

Data format for PostScript files

enumeration

http://w3id.org/meta-share/omtd-share/MsPowerpoint

Data format for Microsoft PowerPoint files

enumeration

http://w3id.org/meta-share/omtd-share/Text

Default value for the format of textual files; a textual file should be human-readable and must not contain binary data

enumeration

http://w3id.org/meta-share/omtd-share/WikiFormat

Superclass for wiki formats

enumeration

http://w3id.org/meta-share/omtd-share/MediaWikiMarkup

Wiki markup for formatting

enumeration

http://w3id.org/meta-share/omtd-share/CorpusFormat

A format used by a specific type of corpus (collection of texts)

enumeration

http://w3id.org/meta-share/omtd-share/KeaCorpus

KEA-style (Keyphrase Extraction Algorithm) corpus

enumeration

http://w3id.org/meta-share/omtd-share/Tuepp

Format of the Tübingen Partially Parsed Corpus of Written German (TüPP-D/Z) XML files; TüPP-D/Z (http://www.sfs.uni-tuebingen.de/de/ascl/ressourcen/corpora/tuepp-dz.html) is a collection of articles from the German newspaper taz (die tageszeitung), annotated and encoded in an XML format.

enumeration

http://w3id.org/meta-share/omtd-share/Web1t

File format used by the Web1T n-gram corpus, a huge collection of n-grams collected from the internet.

enumeration

http://w3id.org/meta-share/omtd-share/Imscwb

A tab-separated format with limited markup (e.g. for sentences, documents, but not recursive structures like parse-trees) used by the IMS Open Corpus Workbench.

enumeration

http://w3id.org/meta-share/omtd-share/BncFormat

Data format for the XML version of the British National Corpus (http://www.natcorp.ox.ac.uk/)

enumeration

http://w3id.org/meta-share/omtd-share/AclAnthologyCorpusFormat

Data format specific to the ACL Anthology Reference Corpus (http://acl-arc.comp.nus.edu.sg/), most probably version 20080325

enumeration

http://w3id.org/meta-share/omtd-share/Nif

The NLP Interchange Format (NIF) is an RDF/OWL-based format that aims to achieve interoperability between Natural Language Processing (NLP) tools, language resources and annotations; it consists of specifications, ontologies and software (overview), which are combined under the version identifier "NIF 2.0", but are versioned individually

enumeration

http://w3id.org/meta-share/omtd-share/Reuters21578Txt

Reuters-21578 corpus transformed into text format using ExtractReuters in the lucene-benchmarks project

enumeration

http://w3id.org/meta-share/omtd-share/TigerXml

The TIGER XML format was created for encoding syntactic constituency structures in the German TIGER corpus. It has since been used for many other corpora as well. TIGERSearch is a linguistic search engine specifically targeting this format. The format was later extended to also support semantic frame annotations.

enumeration

http://w3id.org/meta-share/omtd-share/AimedCorpusFormat

Format of the Aimed corpus (225 abstracts from MEDLINE) with the gold standard sentence, protein, protein-protein interaction annotations.

enumeration

http://w3id.org/meta-share/omtd-share/Tcf

An XML data exchange format developed within the WebLicht architecture to facilitate efficient interoperability between the tools; it allows the various linguistic annotations produced by the tools within WebLicht to be stored in one document; it supports incremental enrichment of linguistic annotations at various levels of analysis in a stand-off XML‐based format

enumeration

http://w3id.org/meta-share/omtd-share/Reuters21578Sgml

Reuters-21578 corpus in SGML format

enumeration

http://w3id.org/meta-share/omtd-share/WikiFormats

Superclass for wiki formats

enumeration

http://w3id.org/meta-share/omtd-share/MediaWikiMarkup

Wiki markup for formatting

enumeration

http://w3id.org/meta-share/omtd-share/WikipediaFormat

Formats used for Wikipedia

enumeration

http://w3id.org/meta-share/omtd-share/WikipediaPage

Format of Wikipedia pages in the database (articles, discussions, etc.)

enumeration

http://w3id.org/meta-share/omtd-share/Blikiwikipedia

The Java Wikipedia API (Bliki engine) is a parser library for converting Wikipedia wikitext notation to HTML.

enumeration

http://w3id.org/meta-share/omtd-share/WikipediaTemplateFilteredArticle

Format for Wikipedia pages that contain or do not contain the templates specified in the template whitelist and template blacklist

enumeration

http://w3id.org/meta-share/omtd-share/WikipediaLink

Format for Wikipedia links

enumeration

http://w3id.org/meta-share/omtd-share/WikipediaRevision

Format for Wikipedia revision pages

enumeration

http://w3id.org/meta-share/omtd-share/WikipediaRevisionPair

Pairs of adjacent revisions of all articles

enumeration

http://w3id.org/meta-share/omtd-share/WikipediaArticle

Format for Wikipedia articles

enumeration

http://w3id.org/meta-share/omtd-share/WikipediaArticleInfo

Format of general article information

enumeration

http://w3id.org/meta-share/omtd-share/WikipediaDiscussion

Format for Wikipedia discussion pages

enumeration

http://w3id.org/meta-share/omtd-share/WikipediaQuery

Reads all article pages that match a query created by the numerous parameters of this class.

enumeration

http://w3id.org/meta-share/omtd-share/OpenFormat

enumeration

http://w3id.org/meta-share/omtd-share/OpenOfficeDocument

enumeration

http://w3id.org/meta-share/omtd-share/OpenOfficePresentation

enumeration

http://w3id.org/meta-share/omtd-share/OasisPresentation

enumeration

http://w3id.org/meta-share/omtd-share/OasisText

enumeration

http://w3id.org/meta-share/omtd-share/OasisSpreadsheet

enumeration

http://w3id.org/meta-share/omtd-share/OpenOfficeSpreadsheet

enumeration

http://w3id.org/meta-share/omtd-share/Json

Superclass of JSON formats

enumeration

http://w3id.org/meta-share/omtd-share/Avro_json

enumeration

http://w3id.org/meta-share/omtd-share/LD_json

Data format encoding Linked Data using JSON

enumeration

http://w3id.org/meta-share/omtd-share/Kaf

KAF (also known as Knowledge Annotation Format) is a language-neutral annotation format representing both morpho-syntactic and semantic annotation of documents through a stand-off multilayered structure

enumeration

http://w3id.org/meta-share/omtd-share/WebAnnotationFormat

A structured model and format to enable annotations to be shared and reused across different hardware and software platforms.

enumeration

http://w3id.org/meta-share/omtd-share/Gate_twitter_json

A Twitter-style JSON format used for GATE documents

enumeration

http://w3id.org/meta-share/omtd-share/Datasift_json

Common format for social media data from http://datasift.com

enumeration

http://w3id.org/meta-share/omtd-share/Json_genia

JSON format of the Genia dataset

enumeration

http://w3id.org/meta-share/omtd-share/Uima_json

UIMA serialisation in JSON

enumeration

http://w3id.org/meta-share/omtd-share/Cadixe_json

AlvisAE protocol format

enumeration

http://w3id.org/meta-share/omtd-share/DatabaseFormat

Formats used for databases

enumeration

http://w3id.org/meta-share/omtd-share/Jdbc

For JDBC databases

enumeration

http://w3id.org/meta-share/omtd-share/MsAccessDatabase

Data format for Microsoft Access database

enumeration

http://w3id.org/meta-share/omtd-share/BinaryFormat

Any format of a computer file in which information is stored in the form of ones and zeros, or in some other binary (two-state) sequence; used mainly for executable files or files that need to be interpreted by a computer program

enumeration

http://w3id.org/meta-share/omtd-share/Pdf

Data format for PDF files (Portable Document Format)

enumeration

http://w3id.org/meta-share/omtd-share/FastInfoset

A compressed binary encoding of GATE XML

enumeration

http://w3id.org/meta-share/omtd-share/Solr

Solr format

enumeration

http://w3id.org/meta-share/omtd-share/GateFormat

Formats used for the GATE framework

enumeration

http://w3id.org/meta-share/omtd-share/GateXml

XML-based format for GATE components

enumeration

http://w3id.org/meta-share/omtd-share/Datasift_json

Common format for social media data from http://datasift.com

enumeration

http://w3id.org/meta-share/omtd-share/Gate_twitter_json

A Twitter-style JSON format used for GATE documents

enumeration

http://w3id.org/meta-share/omtd-share/FastInfoset

A compressed binary encoding of GATE XML

enumeration

http://w3id.org/meta-share/omtd-share/AnnotationFormat

Any format used for annotated files

enumeration

http://w3id.org/meta-share/omtd-share/Nif

The NLP Interchange Format (NIF) is an RDF/OWL-based format that aims to achieve interoperability between Natural Language Processing (NLP) tools, language resources and annotations; it consists of specifications, ontologies and software (overview), which are combined under the version identifier "NIF 2.0", but are versioned individually

enumeration

http://w3id.org/meta-share/omtd-share/Anafora

enumeration

http://w3id.org/meta-share/omtd-share/Tuepp

Format of the Tübingen Partially Parsed Corpus of Written German (TüPP-D/Z) XML files; TüPP-D/Z (http://www.sfs.uni-tuebingen.de/de/ascl/ressourcen/corpora/tuepp-dz.html) is a collection of articles from the German newspaper taz (die tageszeitung), annotated and encoded in an XML format.

enumeration

http://w3id.org/meta-share/omtd-share/DkproTokenized

DKPro format for tokenized files containing one sentence per line, with tokens separated by whitespace.
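As an illustration of the layout described for this value (one sentence per line, tokens separated by whitespace), the following is a minimal Python sketch of a reader. The function name `read_tokenized` and the sample text are hypothetical and not part of the schema or of DKPro; the sketch assumes empty lines carry no sentence.

```python
def read_tokenized(text):
    """Return a list of sentences, each a list of tokens."""
    sentences = []
    for line in text.splitlines():
        tokens = line.split()  # split on any run of whitespace
        if tokens:             # skip empty lines (assumption)
            sentences.append(tokens)
    return sentences


sample = "The cat sat .\nIt purred .\n"  # hypothetical sample input
sentences = read_tokenized(sample)
```

Here `sentences` holds one token list per input line, so sentence boundaries are recovered from line breaks alone.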

enumeration

http://w3id.org/meta-share/omtd-share/Html5Microdata

Format according to the specifications of HTML5 Microdata

enumeration

http://w3id.org/meta-share/omtd-share/MsExcel

Data format for Microsoft Excel documents

enumeration

http://w3id.org/meta-share/omtd-share/Brat

BRAT stand-off format for annotations (BRAT is an online environment for collaborative text annotation, cf. http://brat.nlplab.org/)

enumeration

http://w3id.org/meta-share/omtd-share/InlineXml

Inline XML file format

enumeration

http://w3id.org/meta-share/omtd-share/Lll

Format of the LLL challenge

enumeration

http://w3id.org/meta-share/omtd-share/Cadixe_json

AlvisAE protocol format

enumeration

http://w3id.org/meta-share/omtd-share/Diaml

Format following Dialogue Act Markup Language (DiAML) which is defined within the ISO standard 24617-2

enumeration

http://w3id.org/meta-share/omtd-share/I2b2

Format of the I2B2 challenge

enumeration

http://w3id.org/meta-share/omtd-share/Naf

NAF is a linguistic annotation format designed for complex NLP pipelines. It combines strengths of the Linguistic Annotation Framework (LAF) as described in Ide et al. (2003) and the NLP Interchange Format (NIF; Hellmann et al. 2013).

enumeration

http://w3id.org/meta-share/omtd-share/Pml

Format according to the Prague Markup Language (http://ufal.mff.cuni.cz/jazz/PML/index_en.html); PML is a generic data format based on XML intended for storing linguistically annotated data, such as the Prague Dependency Treebank, also annotation lexicons, etc.

enumeration

http://w3id.org/meta-share/omtd-share/Ptb

Penn Treebank formats

enumeration

http://w3id.org/meta-share/omtd-share/PtbChunked

Penn Treebank chunked format

enumeration

http://w3id.org/meta-share/omtd-share/PtbCombined

Penn Treebank combined format

enumeration

http://w3id.org/meta-share/omtd-share/TigerXml

The TIGER XML format was created for encoding syntactic constituency structures in the German TIGER corpus. It has since been used for many other corpora as well. TIGERSearch is a linguistic search engine specifically targeting this format. The format was later extended to also support semantic frame annotations.

enumeration

http://w3id.org/meta-share/omtd-share/Tmx

The purpose of the TMX format is to provide a standard method to describe translation memory data that is being exchanged among tools and/or translation vendors, while introducing little or no loss of critical data during the process.

enumeration

http://w3id.org/meta-share/omtd-share/MalletLdaTopicProportionsSorted

Topic proportions in the shape [\t]\t\t... sorted

enumeration

http://w3id.org/meta-share/omtd-share/WebAnnotationFormat

A structured model and format to enable annotations to be shared and reused across different hardware and software platforms.

enumeration

http://w3id.org/meta-share/omtd-share/NegraExport

Export format for annotated corpora in the NeGra project

enumeration

http://w3id.org/meta-share/omtd-share/FactoredTagLemFormat

Factored tag lemma format

enumeration

http://w3id.org/meta-share/omtd-share/Folia

FoLiA is an XML-based annotation format, suitable for the representation of linguistically annotated language resources

enumeration

http://w3id.org/meta-share/omtd-share/AlvisEnrichedDocumentFormat

Format for linguistic annotations of documents used for the ALVIS framework

enumeration

http://w3id.org/meta-share/omtd-share/ConllFormat

Formats used in the CoNLL Shared Tasks

enumeration

http://w3id.org/meta-share/omtd-share/ConllU

The CoNLL-U format, as used by the Universal Dependencies project.

enumeration

http://w3id.org/meta-share/omtd-share/Conll2012

The CoNLL 2012 format targets semantic role labeling and coreference. Columns are tab-separated. Sentences are separated by a blank new line.

enumeration

http://w3id.org/meta-share/omtd-share/Conll2003

The CoNLL 2003 format encodes named entity spans and chunk spans. Fields are separated by a single space. Sentences are separated by a blank new line. Named entities and chunks are encoded in the IOB1 format, i.e. a B prefix is used only when a span immediately follows another span of the same category.
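To make the IOB1 encoding concrete, here is a minimal Python sketch that decodes a tag sequence into spans. It assumes the convention as commonly defined: tags are `O`, `I-<category>`, or `B-<category>`, and `B-` appears only where a span directly follows another span of the same category. The function name `iob1_spans` and the sample tags are hypothetical and not taken from the schema or the shared task data.

```python
def iob1_spans(tags):
    """Convert IOB1 tags into (start, end, category) spans; `end` is exclusive."""
    spans = []
    start = None      # index where the currently open span began
    category = None   # category of the currently open span
    for i, tag in enumerate(tags):
        prefix, _, cat = tag.partition("-")
        # An open span closes on "O", on an explicit "B-", or on a category change.
        if start is not None and (tag == "O" or prefix == "B" or cat != category):
            spans.append((start, i, category))
            start, category = None, None
        # Any non-"O" tag opens a span if none is open.
        if tag != "O" and start is None:
            start, category = i, cat
    if start is not None:  # close a span that runs to the end of the sentence
        spans.append((start, len(tags), category))
    return spans


# Hypothetical sample: two adjacent PER spans, then a LOC span next to an ORG span.
sample_tags = ["I-PER", "I-PER", "B-PER", "O", "I-LOC", "I-ORG"]
spans = iob1_spans(sample_tags)
```

In the sample, the `B-PER` tag is what separates the two adjacent PER spans, while the LOC and ORG spans need no `B-` because the category change already marks the boundary.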

enumeration

http://w3id.org/meta-share/omtd-share/Conll2002

The CoNLL 2002 format encodes named entity spans. Fields are separated by a single space. Sentences are separated by a blank new line.

enumeration

http://w3id.org/meta-share/omtd-share/Conll2008

The CoNLL 2008 format targets syntactic and semantic dependencies. Columns are tab-separated. Sentences are separated by a blank new line.

enumeration

http://w3id.org/meta-share/omtd-share/Conll2000

The CoNLL 2000 format represents POS and Chunk tags. Fields in a line are separated by spaces. Sentences are separated by a blank new line.

enumeration

http://w3id.org/meta-share/omtd-share/Conll2009

The CoNLL 2009 format targets semantic role labeling. Columns are tab-separated. Sentences are separated by a blank new line.

enumeration

http://w3id.org/meta-share/omtd-share/Conll2006

The CoNLL 2006 (aka CoNLL-X) format targets dependency parsing. Columns are tab-separated. Sentences are separated by a blank new line.
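Several of the CoNLL descriptions above share the same outer layout: one token per line, columns separated by a tab (or a space in the older variants), and a blank line between sentences. The following Python sketch reads that outer layout only. The function name `read_conll_sentences` and the three-column sample are hypothetical; the sample does not reproduce the actual column inventory of any CoNLL shared task, and comment lines are not handled.

```python
def read_conll_sentences(text, sep="\t"):
    """Split CoNLL-style text into sentences; each token is a list of column values."""
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():        # a blank line ends the current sentence
            if current:
                sentences.append(current)
                current = []
            continue
        current.append(line.split(sep))
    if current:                     # final sentence without a trailing blank line
        sentences.append(current)
    return sentences


# Hypothetical sample with illustrative columns (index, form, tag).
sample = "1\tThe\tDET\n2\tcat\tNOUN\n\n1\tHi\tINTJ\n"
sentences = read_conll_sentences(sample)
```

For the space-separated variants, the same function can be called with `sep=" "`.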

enumeration

http://w3id.org/meta-share/omtd-share/Xces

Data format for documents and corpora using the XCES standard (Corpus Encoding Standard for XML), cf. http://www.xces.org/

enumeration

http://w3id.org/meta-share/omtd-share/XcesIlspVariant

A variant of XCES implemented for documents

enumeration

http://w3id.org/meta-share/omtd-share/Tei

Data format for TEI-encoded (Text Encoding Initiative) texts

enumeration

http://w3id.org/meta-share/omtd-share/Tcf

An XML data exchange format developed within the WebLicht architecture to facilitate efficient interoperability between the tools; it allows the various linguistic annotations produced by the tools within WebLicht to be stored in one document; it supports incremental enrichment of linguistic annotations at various levels of analysis in a stand-off XML‐based format

enumeration

http://w3id.org/meta-share/omtd-share/MalletLdaTopicProportions

Topic proportions in the shape [\t]\t\t...

enumeration

http://w3id.org/meta-share/omtd-share/Kaf

KAF (also known as Knowledge Annotation Format) is a language-neutral annotation format representing both morpho-syntactic and semantic annotation of documents through a stand-off multilayered structure

enumeration

http://w3id.org/meta-share/omtd-share/Tgrep2

Format for TGrep2 (search engine for searching syntactic parse trees represented as bracketed structures)

enumeration

http://w3id.org/meta-share/omtd-share/Graf

GrAF (Graph Annotation Format) is an extension of the Linguistic Annotation Framework (LAF)

enumeration

http://w3id.org/meta-share/omtd-share/Emma

Data format according to the EMMA (Extensible MultiModal Annotation markup language) specifications, cf. https://www.w3.org/TR/2007/CR-emma-20071211/

enumeration

http://w3id.org/meta-share/omtd-share/Chat

CHAT (Codes for the Human Analysis of Transcripts) transcription format; used by CHILDES corpora

enumeration

http://w3id.org/meta-share/omtd-share/RdfFormats

Formats for RDF (Resource Description Framework) resources

enumeration

http://w3id.org/meta-share/omtd-share/Rdf_xml

Data format for RDF (Resource Description Framework) XML format; RDF/XML is a serialisation for RDF

enumeration

http://w3id.org/meta-share/omtd-share/Turtle

Textual syntax for RDF that allows an RDF graph to be completely written in a compact and natural text form, with abbreviations for common usage patterns and datatypes.

enumeration

http://w3id.org/meta-share/omtd-share/Nif

The NLP Interchange Format (NIF) is an RDF/OWL-based format that aims to achieve interoperability between Natural Language Processing (NLP) tools, language resources and annotations; it consists of specifications, ontologies and software (overview), which are combined under the version identifier "NIF 2.0", but are versioned individually

enumeration

http://w3id.org/meta-share/omtd-share/Owl

Superclass for formats used for OWL

enumeration

http://w3id.org/meta-share/omtd-share/Owl_xml

XML format for OWL ontologies

enumeration

http://w3id.org/meta-share/omtd-share/Obo

Serialization format for ontologies according to the Open Biomedical Ontologies model.

enumeration

http://w3id.org/meta-share/omtd-share/TabularFormat

Any format based on columns

enumeration

http://w3id.org/meta-share/omtd-share/Csv

Data format with comma-separated values

enumeration

http://w3id.org/meta-share/omtd-share/Imscwb

A tab-separated format with limited markup (e.g. for sentences, documents, but not recursive structures like parse-trees) used by the IMS Open Corpus Workbench.

enumeration

http://w3id.org/meta-share/omtd-share/OasisSpreadsheet

enumeration

http://w3id.org/meta-share/omtd-share/ConllFormat

Formats used in the CoNLL Shared Tasks

enumeration

http://w3id.org/meta-share/omtd-share/ConllU

The CoNLL-U format, as used by the Universal Dependencies project.

enumeration

http://w3id.org/meta-share/omtd-share/Conll2012

The CoNLL 2012 format targets semantic role labeling and coreference. Columns are tab-separated. Sentences are separated by a blank new line.

enumeration

http://w3id.org/meta-share/omtd-share/Conll2003

The CoNLL 2003 format encodes named entity spans and chunk spans. Fields are separated by a single space. Sentences are separated by a blank new line. Named entities and chunks are encoded in the IOB1 format, i.e. a B prefix is used only when a span immediately follows another span of the same category.

enumeration

http://w3id.org/meta-share/omtd-share/Conll2002

The CoNLL 2002 format encodes named entity spans. Fields are separated by a single space. Sentences are separated by a blank new line.

enumeration

http://w3id.org/meta-share/omtd-share/Conll2008

The CoNLL 2008 format targets syntactic and semantic dependencies. Columns are tab-separated. Sentences are separated by a blank new line.

enumeration

http://w3id.org/meta-share/omtd-share/Conll2000

The CoNLL 2000 format represents POS and Chunk tags. Fields in a line are separated by spaces. Sentences are separated by a blank new line.

enumeration

http://w3id.org/meta-share/omtd-share/Conll2009

The CoNLL 2009 format targets semantic role labeling. Columns are tab-separated. Sentences are separated by a blank new line.

enumeration

http://w3id.org/meta-share/omtd-share/Conll2006

The CoNLL 2006 (aka CoNLL-X) format targets dependency parsing. Columns are tab-separated. Sentences are separated by a blank new line.

enumeration

http://w3id.org/meta-share/omtd-share/MsExcel

Data format for Microsoft Excel documents

enumeration

http://w3id.org/meta-share/omtd-share/OpenOfficeSpreadsheet

enumeration

http://w3id.org/meta-share/omtd-share/Tsv

Format for files with tab-separated values

enumeration

http://w3id.org/meta-share/omtd-share/LinkedDataFormat

Formats used for linked data

enumeration

http://w3id.org/meta-share/omtd-share/LD_json

Data format encoding Linked Data using JSON
