enumeration |
http://w3id.org/meta-share/omtd-share/AudioFormat |
Any format used for audio files |
|
enumeration |
http://w3id.org/meta-share/omtd-share/basic |
|
enumeration |
http://w3id.org/meta-share/omtd-share/mpg |
|
enumeration |
http://w3id.org/meta-share/omtd-share/aif |
|
enumeration |
http://w3id.org/meta-share/omtd-share/wav |
|
enumeration |
http://w3id.org/meta-share/omtd-share/mp3 |
|
enumeration |
http://w3id.org/meta-share/omtd-share/Xml |
Superclass for grouping together XML formats |
|
enumeration |
http://w3id.org/meta-share/omtd-share/Tuepp |
Format of the Tübingen Partially Parsed Corpus of Written German (TüPP-D/Z) XML files; TüPP-D/Z (http://www.sfs.uni-tuebingen.de/de/ascl/ressourcen/corpora/tuepp-dz.html) is a collection of articles from the German newspaper taz (die tageszeitung), annotated and encoded in an XML format. |
|
enumeration |
http://w3id.org/meta-share/omtd-share/Folia |
FoLiA is an XML-based annotation format, suitable for the representation of linguistically annotated language resources |
|
enumeration |
http://w3id.org/meta-share/omtd-share/Tmx |
The purpose of the TMX format is to provide a standard method to describe translation memory data that is being exchanged among tools and/or translation vendors, while introducing little or no loss of critical data during the process. |
|
enumeration |
http://w3id.org/meta-share/omtd-share/BncFormat |
Data format for the XML version of the British National Corpus (http://www.natcorp.ox.ac.uk/) |
|
enumeration |
http://w3id.org/meta-share/omtd-share/Xmi |
Data format for the XML Metadata Interchange (XMI), which is an Object Management Group (OMG) standard for exchanging metadata information via Extensible Markup Language (XML) |
|
enumeration |
http://w3id.org/meta-share/omtd-share/Rdf_xml |
Data format for RDF/XML, the XML serialisation of RDF (Resource Description Framework) |
|
enumeration |
http://w3id.org/meta-share/omtd-share/Tcf |
An XML data exchange format developed within the WebLicht architecture to facilitate efficient interoperability between its tools; it allows the various linguistic annotations produced by WebLicht tools to be stored in one document and supports incremental enrichment of linguistic annotations at various levels of analysis in a stand-off, XML-based format |
|
enumeration |
http://w3id.org/meta-share/omtd-share/OpenOfficeDocument |
|
enumeration |
http://w3id.org/meta-share/omtd-share/Xhtml |
Data format for XHTML (Extensible HyperText Markup Language) |
|
enumeration |
http://w3id.org/meta-share/omtd-share/InlineXml |
|
enumeration |
http://w3id.org/meta-share/omtd-share/Alto |
|
enumeration |
http://w3id.org/meta-share/omtd-share/XmlBioc |
BioC is a simple format to share text data and annotations. |
|
enumeration |
http://w3id.org/meta-share/omtd-share/OpenOfficeSpreadsheet |
|
enumeration |
http://w3id.org/meta-share/omtd-share/Xces |
Data format for documents and corpora using the XCES standard (Corpus Encoding Standard for XML), cf. http://www.xces.org/ |
|
enumeration |
http://w3id.org/meta-share/omtd-share/XcesIlspVariant |
A variant of XCES implemented for documents |
|
enumeration |
http://w3id.org/meta-share/omtd-share/AlvisEnrichedDocumentFormat |
Format for linguistic annotations of documents used for the ALVIS framework |
|
enumeration |
http://w3id.org/meta-share/omtd-share/GateXml |
XML-based format for GATE components |
|
enumeration |
http://w3id.org/meta-share/omtd-share/Xpath |
XPath is a language for addressing parts of an XML document, designed to be used by both XSLT and XPointer. |
|
enumeration |
http://w3id.org/meta-share/omtd-share/TigerXml |
The TIGER XML format was created for encoding syntactic constituency structures in the German TIGER corpus. It has since been used for many other corpora as well. TIGERSearch is a linguistic search engine specifically targeting this format. The format was later extended to support semantic frame annotations. |
|
enumeration |
http://w3id.org/meta-share/omtd-share/Pml |
Format according to the Prague Markup Language (http://ufal.mff.cuni.cz/jazz/PML/index_en.html); PML is a generic XML-based data format intended for storing linguistically annotated data, such as the Prague Dependency Treebank, as well as annotation lexicons and similar resources. |
|
enumeration |
http://w3id.org/meta-share/omtd-share/Emma |
Data format according to the EMMA (Extensible MultiModal Annotation markup language) specifications, cf. https://www.w3.org/TR/2007/CR-emma-20071211/ |
|
enumeration |
http://w3id.org/meta-share/omtd-share/Owl_xml |
XML format for OWL ontologies |
|
enumeration |
http://w3id.org/meta-share/omtd-share/Pls |
Data format according to the Pronunciation Lexicon Specification (PLS) |
|
enumeration |
http://w3id.org/meta-share/omtd-share/OpenOfficePresentation |
|
enumeration |
http://w3id.org/meta-share/omtd-share/Tei |
Data format for TEI-encoded (Text Encoding Initiative) texts |
|
enumeration |
http://w3id.org/meta-share/omtd-share/RdfFormat |
Formats for RDF (Resource Description Framework) resources |
|
enumeration |
http://w3id.org/meta-share/omtd-share/Rdf_xml |
Data format for RDF/XML, the XML serialisation of RDF (Resource Description Framework) |
|
enumeration |
http://w3id.org/meta-share/omtd-share/Turtle |
Textual syntax for RDF that allows an RDF graph to be completely written in a compact and natural text form, with abbreviations for common usage patterns and datatypes. |
|
enumeration |
http://w3id.org/meta-share/omtd-share/Obo |
Serialization format for ontologies according to the Open Biomedical Ontologies model. |
|
enumeration |
http://w3id.org/meta-share/omtd-share/Nif |
The NLP Interchange Format (NIF) is an RDF/OWL-based format that aims to achieve interoperability between Natural Language Processing (NLP) tools, language resources and annotations; it consists of specifications, ontologies and software, which are combined under the version identifier "NIF 2.0" but are versioned individually |
|
enumeration |
http://w3id.org/meta-share/omtd-share/Owl |
Superclass for formats used for OWL |
|
enumeration |
http://w3id.org/meta-share/omtd-share/Owl_xml |
XML format for OWL ontologies |
|
enumeration |
http://w3id.org/meta-share/omtd-share/UimaCasFormat |
Formats used for the UIMA CAS (Common Analysis System) objects |
|
enumeration |
http://w3id.org/meta-share/omtd-share/SerializedCas |
The CAS is the native data model used by UIMA; there are various ways of saving CAS data, using XMI, XCAS, or binary formats; this is for the serialized format |
|
enumeration |
http://w3id.org/meta-share/omtd-share/Uima_json |
UIMA serialisation in JSON |
|
enumeration |
http://w3id.org/meta-share/omtd-share/BinaryCas |
Binary format used for CAS data |
|
enumeration |
http://w3id.org/meta-share/omtd-share/DocumentFormat |
Any format used for documents (textual resources) |
|
enumeration |
http://w3id.org/meta-share/omtd-share/Sgml |
|
enumeration |
http://w3id.org/meta-share/omtd-share/Pls |
Data format according to the Pronunciation Lexicon Specification (PLS) |
|
enumeration |
http://w3id.org/meta-share/omtd-share/Latex |
Data format for documents using LaTeX (a high-quality typesetting system widely used for scientific documents) |
|
enumeration |
http://w3id.org/meta-share/omtd-share/Tika |
|
enumeration |
http://w3id.org/meta-share/omtd-share/OasisSpreadsheet |
|
enumeration |
http://w3id.org/meta-share/omtd-share/Rtf |
Rich Text Format (RTF), a proprietary document format developed by Microsoft |
|
enumeration |
http://w3id.org/meta-share/omtd-share/Html |
|
enumeration |
http://w3id.org/meta-share/omtd-share/Html5Microdata |
Format according to the specifications of HTML5 Microdata |
|
enumeration |
http://w3id.org/meta-share/omtd-share/OpenOfficeSpreadsheet |
|
enumeration |
http://w3id.org/meta-share/omtd-share/MsExcel |
Data format for Microsoft Excel documents |
|
enumeration |
http://w3id.org/meta-share/omtd-share/MsWord |
Data format for Microsoft Word documents |
|
enumeration |
http://w3id.org/meta-share/omtd-share/Pubmed |
Textual format used for PubMed articles |
|
enumeration |
http://w3id.org/meta-share/omtd-share/BionlpFormats |
Formats used for BioNLP shared tasks |
|
enumeration |
http://w3id.org/meta-share/omtd-share/BionlpSt2013A1_a2 |
Format used in BioNLP Shared Task 2013 |
|
enumeration |
http://w3id.org/meta-share/omtd-share/Json_genia |
JSON format of the Genia dataset |
|
enumeration |
http://w3id.org/meta-share/omtd-share/Bionlp |
File format used in the BioNLP Shared Task |
|
enumeration |
http://w3id.org/meta-share/omtd-share/Cochrane |
Format used in Cochrane texts |
|
enumeration |
http://w3id.org/meta-share/omtd-share/BionlpFormat |
Formats used for BioNLP shared tasks |
|
enumeration |
http://w3id.org/meta-share/omtd-share/Xmi |
Data format for the XML Metadata Interchange (XMI), which is an Object Management Group (OMG) standard for exchanging metadata information via Extensible Markup Language (XML) |
|
enumeration |
http://w3id.org/meta-share/omtd-share/Tex |
Data format for documents using TeX (a typesetting system) |
|
enumeration |
http://w3id.org/meta-share/omtd-share/Pdf |
Data format for PDF files (Portable Document Format) |
|
enumeration |
http://w3id.org/meta-share/omtd-share/Xhtml |
Data format for XHTML (Extensible HyperText Markup Language) |
|
enumeration |
http://w3id.org/meta-share/omtd-share/LD_json |
Data format encoding Linked Data using JSON |
|
enumeration |
http://w3id.org/meta-share/omtd-share/OasisPresentation |
|
enumeration |
http://w3id.org/meta-share/omtd-share/OasisText |
|
enumeration |
http://w3id.org/meta-share/omtd-share/OpenOfficeDocument |
|
enumeration |
http://w3id.org/meta-share/omtd-share/OpenOfficePresentation |
|
enumeration |
http://w3id.org/meta-share/omtd-share/Postscript |
Data format for PostScript files |
|
enumeration |
http://w3id.org/meta-share/omtd-share/MsPowerpoint |
Data format for Microsoft PowerPoint files |
|
enumeration |
http://w3id.org/meta-share/omtd-share/Text |
Default value for the format of textual files; a textual file should be human-readable and must not contain binary data |
|
enumeration |
http://w3id.org/meta-share/omtd-share/WikiFormat |
Superclass for wiki formats |
|
enumeration |
http://w3id.org/meta-share/omtd-share/MediaWikiMarkup |
MediaWiki markup (wikitext) used for formatting wiki pages |
|
enumeration |
http://w3id.org/meta-share/omtd-share/CorpusFormat |
A format used by a specific type of corpus (collection of texts) |
|
enumeration |
http://w3id.org/meta-share/omtd-share/KeaCorpus |
KEA-style (Keyphrase Extraction Algorithm) corpus |
|
enumeration |
http://w3id.org/meta-share/omtd-share/Tuepp |
Format of the Tübingen Partially Parsed Corpus of Written German (TüPP-D/Z) XML files; TüPP-D/Z (http://www.sfs.uni-tuebingen.de/de/ascl/ressourcen/corpora/tuepp-dz.html) is a collection of articles from the German newspaper taz (die tageszeitung), annotated and encoded in an XML format. |
|
enumeration |
http://w3id.org/meta-share/omtd-share/Web1t |
File format used by the Web1T n-gram corpus, a huge collection of n-grams collected from the internet. |
|
enumeration |
http://w3id.org/meta-share/omtd-share/Imscwb |
A tab-separated format with limited markup (e.g. for sentences and documents, but not for recursive structures such as parse trees) used by the IMS Open Corpus Workbench. |
|
enumeration |
http://w3id.org/meta-share/omtd-share/BncFormat |
Data format for the XML version of the British National Corpus (http://www.natcorp.ox.ac.uk/) |
|
enumeration |
http://w3id.org/meta-share/omtd-share/AclAnthologyCorpusFormat |
Data format specific to the ACL Anthology Reference Corpus (http://acl-arc.comp.nus.edu.sg/), most probably version 20080325 |
|
enumeration |
http://w3id.org/meta-share/omtd-share/Nif |
The NLP Interchange Format (NIF) is an RDF/OWL-based format that aims to achieve interoperability between Natural Language Processing (NLP) tools, language resources and annotations; it consists of specifications, ontologies and software, which are combined under the version identifier "NIF 2.0" but are versioned individually |
|
enumeration |
http://w3id.org/meta-share/omtd-share/Reuters21578Txt |
Reuters-21578 corpus transformed into text format using ExtractReuters in the lucene-benchmarks project |
|
enumeration |
http://w3id.org/meta-share/omtd-share/TigerXml |
The TIGER XML format was created for encoding syntactic constituency structures in the German TIGER corpus. It has since been used for many other corpora as well. TIGERSearch is a linguistic search engine specifically targeting this format. The format was later extended to support semantic frame annotations. |
|
enumeration |
http://w3id.org/meta-share/omtd-share/AimedCorpusFormat |
Format of the AIMed corpus (225 MEDLINE abstracts) with gold-standard sentence, protein and protein-protein interaction annotations. |
|
enumeration |
http://w3id.org/meta-share/omtd-share/Tcf |
An XML data exchange format developed within the WebLicht architecture to facilitate efficient interoperability between its tools; it allows the various linguistic annotations produced by WebLicht tools to be stored in one document and supports incremental enrichment of linguistic annotations at various levels of analysis in a stand-off, XML-based format |
|
enumeration |
http://w3id.org/meta-share/omtd-share/Reuters21578Sgml |
Reuters-21578 corpus in SGML format |
|
enumeration |
http://w3id.org/meta-share/omtd-share/WikiFormats |
Superclass for wiki formats |
|
enumeration |
http://w3id.org/meta-share/omtd-share/MediaWikiMarkup |
MediaWiki markup (wikitext) used for formatting wiki pages |
|
enumeration |
http://w3id.org/meta-share/omtd-share/WikipediaFormat |
Formats used for Wikipedia |
|
enumeration |
http://w3id.org/meta-share/omtd-share/WikipediaPage |
Format of Wikipedia pages in the database (articles, discussions, etc.) |
|
enumeration |
http://w3id.org/meta-share/omtd-share/Blikiwikipedia |
The Java Wikipedia API (Bliki engine) is a parser library for converting Wikipedia wikitext notation to HTML. |
|
enumeration |
http://w3id.org/meta-share/omtd-share/WikipediaTemplateFilteredArticle |
Format for Wikipedia pages filtered by template, i.e. pages that contain the templates specified in a template whitelist and do not contain those specified in a template blacklist |
|
enumeration |
http://w3id.org/meta-share/omtd-share/WikipediaLink |
Format for Wikipedia links |
|
enumeration |
http://w3id.org/meta-share/omtd-share/WikipediaRevision |
Format for Wikipedia revision pages |
|
enumeration |
http://w3id.org/meta-share/omtd-share/WikipediaRevisionPair |
Format for pairs of adjacent revisions of all Wikipedia articles |
|
enumeration |
http://w3id.org/meta-share/omtd-share/WikipediaArticle |
Format for Wikipedia articles |
|
enumeration |
http://w3id.org/meta-share/omtd-share/WikipediaArticleInfo |
Format for general information about Wikipedia articles |
|
enumeration |
http://w3id.org/meta-share/omtd-share/WikipediaDiscussion |
Format for Wikipedia discussion pages |
|
enumeration |
http://w3id.org/meta-share/omtd-share/WikipediaQuery |
Format for all Wikipedia article pages that match a query built from a set of filter parameters |
|
enumeration |
http://w3id.org/meta-share/omtd-share/OpenFormat |
|
enumeration |
http://w3id.org/meta-share/omtd-share/OpenOfficeDocument |
|
enumeration |
http://w3id.org/meta-share/omtd-share/OpenOfficePresentation |
|
enumeration |
http://w3id.org/meta-share/omtd-share/OasisPresentation |
|
enumeration |
http://w3id.org/meta-share/omtd-share/OasisText |
|
enumeration |
http://w3id.org/meta-share/omtd-share/OasisSpreadsheet |
|
enumeration |
http://w3id.org/meta-share/omtd-share/OpenOfficeSpreadsheet |
|
enumeration |
http://w3id.org/meta-share/omtd-share/Json |
Superclass of JSON formats |
|
enumeration |
http://w3id.org/meta-share/omtd-share/Avro_json |
|
enumeration |
http://w3id.org/meta-share/omtd-share/LD_json |
Data format encoding Linked Data using JSON |
|
enumeration |
http://w3id.org/meta-share/omtd-share/Kaf |
KAF (also known as Knowledge Annotation Format) is a language-neutral annotation format representing both morpho-syntactic and semantic annotations of documents through a stand-off, multi-layered structure |
|
enumeration |
http://w3id.org/meta-share/omtd-share/WebAnnotationFormat |
A structured model and format to enable annotations to be shared and reused across different hardware and software platforms. |
|
enumeration |
http://w3id.org/meta-share/omtd-share/Gate_twitter_json |
A Twitter-style JSON format used for GATE documents |
|
enumeration |
http://w3id.org/meta-share/omtd-share/Datasift_json |
Common format for social media data from http://datasift.com |
|
enumeration |
http://w3id.org/meta-share/omtd-share/Json_genia |
JSON format of the Genia dataset |
|
enumeration |
http://w3id.org/meta-share/omtd-share/Uima_json |
UIMA serialisation in JSON |
|
enumeration |
http://w3id.org/meta-share/omtd-share/Cadixe_json |
|
enumeration |
http://w3id.org/meta-share/omtd-share/DatabaseFormat |
Formats used for databases |
|
enumeration |
http://w3id.org/meta-share/omtd-share/Jdbc |
|
enumeration |
http://w3id.org/meta-share/omtd-share/MsAccessDatabase |
Data format for Microsoft Access databases |
|
enumeration |
http://w3id.org/meta-share/omtd-share/BinaryFormat |
Any format of a computer file in which information is stored in the form of ones and zeros, or in some other binary (two-state) sequence; used mainly for executable files or files that need to be interpreted by a computer program |
|
enumeration |
http://w3id.org/meta-share/omtd-share/Pdf |
Data format for PDF files (Portable Document Format) |
|
enumeration |
http://w3id.org/meta-share/omtd-share/FastInfoset |
A compressed binary encoding of GATE XML |
|
enumeration |
http://w3id.org/meta-share/omtd-share/Solr |
|
enumeration |
http://w3id.org/meta-share/omtd-share/GateFormat |
Formats used for the GATE framework |
|
enumeration |
http://w3id.org/meta-share/omtd-share/GateXml |
XML-based format for GATE components |
|
enumeration |
http://w3id.org/meta-share/omtd-share/Datasift_json |
Common format for social media data from http://datasift.com |
|
enumeration |
http://w3id.org/meta-share/omtd-share/Gate_twitter_json |
A Twitter-style JSON format used for GATE documents |
|
enumeration |
http://w3id.org/meta-share/omtd-share/FastInfoset |
A compressed binary encoding of GATE XML |
|
enumeration |
http://w3id.org/meta-share/omtd-share/AnnotationFormat |
Any format used for annotated files |
|
enumeration |
http://w3id.org/meta-share/omtd-share/Nif |
The NLP Interchange Format (NIF) is an RDF/OWL-based format that aims to achieve interoperability between Natural Language Processing (NLP) tools, language resources and annotations; it consists of specifications, ontologies and software, which are combined under the version identifier "NIF 2.0" but are versioned individually |
|
enumeration |
http://w3id.org/meta-share/omtd-share/Anafora |
|
enumeration |
http://w3id.org/meta-share/omtd-share/Tuepp |
Format of the Tübingen Partially Parsed Corpus of Written German (TüPP-D/Z) XML files; TüPP-D/Z (http://www.sfs.uni-tuebingen.de/de/ascl/ressourcen/corpora/tuepp-dz.html) is a collection of articles from the German newspaper taz (die tageszeitung), annotated and encoded in an XML format. |
|
enumeration |
http://w3id.org/meta-share/omtd-share/DkproTokenized |
DKPro format for tokenized files: one sentence per line, with tokens separated by whitespace. |
|
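The DkproTokenized layout (one sentence per line, whitespace-separated tokens) can be read with a few lines of Python. This is a minimal sketch, not DKPro's own reader: the function name `read_dkpro_tokenized` is hypothetical, and it assumes that empty lines carry no sentence and can be skipped.

```python
def read_dkpro_tokenized(text: str) -> list[list[str]]:
    """Parse one-sentence-per-line, whitespace-tokenized text.

    Hypothetical helper; assumes empty lines are not sentences and skips them.
    """
    sentences = []
    for line in text.splitlines():
        tokens = line.split()  # split on any run of whitespace
        if tokens:             # ignore empty or whitespace-only lines
            sentences.append(tokens)
    return sentences


print(read_dkpro_tokenized("The cat sat .\nIt purred .\n"))
# [['The', 'cat', 'sat', '.'], ['It', 'purred', '.']]
```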
enumeration |
http://w3id.org/meta-share/omtd-share/Html5Microdata |
Format according to the specifications of HTML5 Microdata |
|
enumeration |
http://w3id.org/meta-share/omtd-share/MsExcel |
Data format for Microsoft Excel documents |
|
enumeration |
http://w3id.org/meta-share/omtd-share/Brat |
BRAT stand-off format for annotations (BRAT is an online environment for collaborative text annotation, cf. http://brat.nlplab.org/) |
|
enumeration |
http://w3id.org/meta-share/omtd-share/InlineXml |
|
enumeration |
http://w3id.org/meta-share/omtd-share/Lll |
Format of the LLL challenge |
|
enumeration |
http://w3id.org/meta-share/omtd-share/Cadixe_json |
|
enumeration |
http://w3id.org/meta-share/omtd-share/Diaml |
Format following the Dialogue Act Markup Language (DiAML), which is defined in the ISO standard 24617-2 |
|
enumeration |
http://w3id.org/meta-share/omtd-share/I2b2 |
Format of the I2B2 challenge |
|
enumeration |
http://w3id.org/meta-share/omtd-share/Naf |
NAF is a linguistic annotation format designed for complex NLP pipelines. It combines the strengths of the Linguistic Annotation Framework (LAF), as described in Ide et al. (2003), and the NLP Interchange Format (NIF; Hellmann et al. 2013). |
|
enumeration |
http://w3id.org/meta-share/omtd-share/Pml |
Format according to the Prague Markup Language (http://ufal.mff.cuni.cz/jazz/PML/index_en.html); PML is a generic XML-based data format intended for storing linguistically annotated data, such as the Prague Dependency Treebank, as well as annotation lexicons and similar resources. |
|
enumeration |
http://w3id.org/meta-share/omtd-share/Ptb |
|
enumeration |
http://w3id.org/meta-share/omtd-share/PtbChunked |
Penn Treebank chunked format |
|
enumeration |
http://w3id.org/meta-share/omtd-share/PtbCombined |
Penn Treebank combined format |
|
enumeration |
http://w3id.org/meta-share/omtd-share/TigerXml |
The TIGER XML format was created for encoding syntactic constituency structures in the German TIGER corpus. It has since been used for many other corpora as well. TIGERSearch is a linguistic search engine specifically targeting this format. The format was later extended to support semantic frame annotations. |
|
enumeration |
http://w3id.org/meta-share/omtd-share/Tmx |
The purpose of the TMX format is to provide a standard method to describe translation memory data that is being exchanged among tools and/or translation vendors, while introducing little or no loss of critical data during the process. |
|
enumeration |
http://w3id.org/meta-share/omtd-share/MalletLdaTopicProportionsSorted |
Topic proportions in the shape [\t]\t\t... sorted |
|
enumeration |
http://w3id.org/meta-share/omtd-share/WebAnnotationFormat |
A structured model and format to enable annotations to be shared and reused across different hardware and software platforms. |
|
enumeration |
http://w3id.org/meta-share/omtd-share/NegraExport |
Export format for annotated corpora in the NeGra project |
|
enumeration |
http://w3id.org/meta-share/omtd-share/FactoredTagLemFormat |
Factored tag lemma format |
|
enumeration |
http://w3id.org/meta-share/omtd-share/Folia |
FoLiA is an XML-based annotation format, suitable for the representation of linguistically annotated language resources |
|
enumeration |
http://w3id.org/meta-share/omtd-share/AlvisEnrichedDocumentFormat |
Format for linguistic annotations of documents used for the ALVIS framework |
|
enumeration |
http://w3id.org/meta-share/omtd-share/ConllFormat |
Formats used in the CoNLL Shared Tasks |
|
enumeration |
http://w3id.org/meta-share/omtd-share/ConllU |
|
enumeration |
http://w3id.org/meta-share/omtd-share/Conll2012 |
The CoNLL 2012 format targets semantic role labeling and coreference. Columns are tab-separated. Sentences are separated by a blank line. |
|
enumeration |
http://w3id.org/meta-share/omtd-share/Conll2003 |
The CoNLL 2003 format encodes named entity spans and chunk spans. Fields are separated by a single space. Sentences are separated by a blank line. Named entities and chunks are encoded in the IOB1 format, i.e. tokens inside a span carry an I prefix, and a B prefix is used only for the first token of a span that immediately follows another span of the same category. |
|
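To make the IOB1 convention concrete: tokens inside a span are tagged `I-X`, and `B-X` marks only the first token of a span that directly follows another span of the same category `X`. The minimal Python sketch below decodes a column of such tags into spans. The function name `iob1_to_spans` is hypothetical, it is not taken from any CoNLL or DKPro tooling, and it assumes tags of the form `I-PER`, `B-PER` or `O`.

```python
def iob1_to_spans(tags: list[str]) -> list[tuple[str, int, int]]:
    """Decode IOB1 tags into (category, start, end) spans; end is exclusive."""
    spans = []
    label, start = None, None  # category and start index of the open span
    for i, tag in enumerate(tags):
        if tag == "O":
            if label is not None:
                spans.append((label, start, i))
            label, start = None, None
            continue
        prefix, _, category = tag.partition("-")
        # A new span starts on a B- tag, or on an I- tag whose category
        # differs from the currently open span (or when no span is open).
        if prefix == "B" or category != label:
            if label is not None:
                spans.append((label, start, i))
            label, start = category, i
    if label is not None:  # close a span that runs to the end of the sentence
        spans.append((label, start, len(tags)))
    return spans


print(iob1_to_spans(["I-PER", "I-PER", "B-PER", "O", "I-LOC"]))
# [('PER', 0, 2), ('PER', 2, 3), ('LOC', 4, 5)]
```

The `B-PER` on the third token is what separates two adjacent person names; without it, IOB1 would read the three tokens as a single span.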
enumeration |
http://w3id.org/meta-share/omtd-share/Conll2002 |
The CoNLL 2002 format encodes named entity spans. Fields are separated by a single space. Sentences are separated by a blank line. |
|
enumeration |
http://w3id.org/meta-share/omtd-share/Conll2008 |
The CoNLL 2008 format targets syntactic and semantic dependencies. Columns are tab-separated. Sentences are separated by a blank line. |
|
enumeration |
http://w3id.org/meta-share/omtd-share/Conll2000 |
The CoNLL 2000 format represents POS and chunk tags. Fields in a line are separated by spaces. Sentences are separated by a blank line. |
|
enumeration |
http://w3id.org/meta-share/omtd-share/Conll2009 |
The CoNLL 2009 format targets semantic role labeling. Columns are tab-separated. Sentences are separated by a blank line. |
|
enumeration |
http://w3id.org/meta-share/omtd-share/Conll2006 |
The CoNLL 2006 (also known as CoNLL-X) format targets dependency parsing. Columns are tab-separated. Sentences are separated by a blank line. |
|
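The CoNLL entries above share a line-oriented layout: one token per line, columns separated by tabs or spaces, and a blank line between sentences. A minimal Python sketch of a generic reader for that shared layout follows. The name `read_conll` is hypothetical; it returns raw string columns without interpreting them, and it assumes that whitespace-only lines count as sentence boundaries.

```python
def read_conll(text: str, sep: str = "\t") -> list[list[list[str]]]:
    """Group lines into sentences (blank line = boundary) and split columns.

    Use sep="\t" for the tab-separated variants (2006, 2008, 2009, 2012)
    and sep=" " for the space-separated ones (2000, 2002, 2003).
    """
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():  # blank line closes the current sentence
            if current:
                sentences.append(current)
                current = []
            continue
        current.append(line.split(sep))  # one token per line, split into columns
    if current:  # the last sentence may lack a trailing blank line
        sentences.append(current)
    return sentences


sample = "EU NNP I-NP I-ORG\nrejects VBZ I-VP O\n\nPeter NNP I-NP I-PER\n"
print(read_conll(sample, sep=" "))
```

Column meanings differ per shared task, so interpreting the fields is left to the caller; document markers such as the `-DOCSTART-` lines found in CoNLL 2003 data are not treated specially here.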
enumeration |
http://w3id.org/meta-share/omtd-share/Xces |
Data format for documents and corpora using the XCES standard (Corpus Encoding Standard for XML), cf. http://www.xces.org/ |
|
enumeration |
http://w3id.org/meta-share/omtd-share/XcesIlspVariant |
A variant of XCES implemented for documents |
|
enumeration |
http://w3id.org/meta-share/omtd-share/Tei |
Data format for TEI-encoded (Text Encoding Initiative) texts |
|
enumeration |
http://w3id.org/meta-share/omtd-share/Tcf |
An XML data exchange format developed within the WebLicht architecture to facilitate efficient interoperability between its tools; it allows the various linguistic annotations produced by WebLicht tools to be stored in one document and supports incremental enrichment of linguistic annotations at various levels of analysis in a stand-off, XML-based format |
|
enumeration |
http://w3id.org/meta-share/omtd-share/MalletLdaTopicProportions |
Topic proportions in the shape [\t]\t\t... |
|
enumeration |
http://w3id.org/meta-share/omtd-share/Kaf |
KAF (also known as Knowledge Annotation Format) is a language-neutral annotation format representing both morpho-syntactic and semantic annotations of documents through a stand-off, multi-layered structure |
|
enumeration |
http://w3id.org/meta-share/omtd-share/Tgrep2 |
Format for TGrep2 (a tool for searching syntactic parse trees represented as bracketed structures) |
|
enumeration |
http://w3id.org/meta-share/omtd-share/Graf |
GrAF (Graph Annotation Format) is an extension of the Linguistic Annotation Framework (LAF) |
|
enumeration |
http://w3id.org/meta-share/omtd-share/Emma |
Data format according to the EMMA (Extensible MultiModal Annotation markup language) specifications, cf. https://www.w3.org/TR/2007/CR-emma-20071211/ |
|
enumeration |
http://w3id.org/meta-share/omtd-share/Chat |
CHAT (Codes for the Human Analysis of Transcripts) transcription format; used by CHILDES corpora |
|
enumeration |
http://w3id.org/meta-share/omtd-share/RdfFormats |
Formats for RDF (Resource Description Framework) resources |
|
enumeration |
http://w3id.org/meta-share/omtd-share/Rdf_xml |
Data format for RDF/XML, the XML serialisation of RDF (Resource Description Framework) |
|
enumeration |
http://w3id.org/meta-share/omtd-share/Turtle |
Textual syntax for RDF that allows an RDF graph to be completely written in a compact and natural text form, with abbreviations for common usage patterns and datatypes. |
|
enumeration |
http://w3id.org/meta-share/omtd-share/Nif |
The NLP Interchange Format (NIF) is an RDF/OWL-based format that aims to achieve interoperability between Natural Language Processing (NLP) tools, language resources and annotations; it consists of specifications, ontologies and software, which are combined under the version identifier "NIF 2.0" but are versioned individually |
|
enumeration |
http://w3id.org/meta-share/omtd-share/Owl |
Superclass for formats used for OWL |
|
enumeration |
http://w3id.org/meta-share/omtd-share/Owl_xml |
XML format for OWL ontologies |
|
enumeration |
http://w3id.org/meta-share/omtd-share/Obo |
Serialization format for ontologies according to the Open Biomedical Ontologies model. |
|
enumeration |
http://w3id.org/meta-share/omtd-share/TabularFormat |
Any format based on columns |
|
enumeration |
http://w3id.org/meta-share/omtd-share/Csv |
Data format with comma-separated values |
|
enumeration |
http://w3id.org/meta-share/omtd-share/Imscwb |
A tab-separated format with limited markup (e.g. for sentences and documents, but not for recursive structures such as parse trees) used by the IMS Open Corpus Workbench. |
|
enumeration |
http://w3id.org/meta-share/omtd-share/OasisSpreadsheet |
|
enumeration |
http://w3id.org/meta-share/omtd-share/ConllFormat |
Formats used in the CoNLL Shared Tasks |
|
enumeration |
http://w3id.org/meta-share/omtd-share/ConllU |
|
enumeration |
http://w3id.org/meta-share/omtd-share/Conll2012 |
The CoNLL 2012 format targets semantic role labeling and coreference. Columns are tab-separated. Sentences are separated by a blank line. |
|
enumeration |
http://w3id.org/meta-share/omtd-share/Conll2003 |
The CoNLL 2003 format encodes named entity spans and chunk spans. Fields are separated by a single space. Sentences are separated by a blank line. Named entities and chunks are encoded in the IOB1 format, i.e. tokens inside a span carry an I prefix, and a B prefix is used only for the first token of a span that immediately follows another span of the same category. |
|
enumeration |
http://w3id.org/meta-share/omtd-share/Conll2002 |
The CoNLL 2002 format encodes named entity spans. Fields are separated by a single space. Sentences are separated by a blank line. |
|
enumeration |
http://w3id.org/meta-share/omtd-share/Conll2008 |
The CoNLL 2008 format targets syntactic and semantic dependencies. Columns are tab-separated. Sentences are separated by a blank line. |
|
enumeration |
http://w3id.org/meta-share/omtd-share/Conll2000 |
The CoNLL 2000 format represents POS and chunk tags. Fields in a line are separated by spaces. Sentences are separated by a blank line. |
|
enumeration |
http://w3id.org/meta-share/omtd-share/Conll2009 |
The CoNLL 2009 format targets semantic role labeling. Columns are tab-separated. Sentences are separated by a blank line. |
|
enumeration |
http://w3id.org/meta-share/omtd-share/Conll2006 |
The CoNLL 2006 (also known as CoNLL-X) format targets dependency parsing. Columns are tab-separated. Sentences are separated by a blank line. |
|
enumeration |
http://w3id.org/meta-share/omtd-share/MsExcel |
Data format for Microsoft Excel documents |
|
enumeration |
http://w3id.org/meta-share/omtd-share/OpenOfficeSpreadsheet |
|
enumeration |
http://w3id.org/meta-share/omtd-share/Tsv |
Format for files with tab-separated values |
|
enumeration |
http://w3id.org/meta-share/omtd-share/LinkedDataFormat |
Formats used for linked data |
|
enumeration |
http://w3id.org/meta-share/omtd-share/LD_json |
Data format encoding Linked Data using JSON |
|