DocWire DocToText - Powered by Silvercoders 5.0.5
A multifaceted data extraction software development toolkit that converts all sorts of files to plain text and HTML. Written in C++, this data extraction tool has a parser able to convert PST and OST files, along with a brand new API for better file processing. DocToText can be integrated with other data mining and data analytics applications. It comes equipped with a high-grade, scriptable and trainable OCR whose character recognition is based on LSTM neural networks. The document parser can extract metadata along with annotations and supports the following formats: DOC, XLS, XLSB, PPT, RTF, ODF (ODT, ODS, ODP), OOXML (DOCX, XLSX, PPTX), iWork (PAGES, NUMBERS, KEYNOTE), ODFXML (FODP, FODS, FODT), PDF, EML, HTML, Outlook (PST, OST), Image (JPG, JPEG, JFIF, BMP, PNM, PNG, TIFF, WEBP) and DICOM (DCM).
doctotext::StandardTag Class Reference

Contains a set of basic tags used in parsers.

#include <parser.h>

Static Public Attributes

static const std::string TAG_P = "p"
 Tag for paragraph.

static const std::string TAG_CLOSE_P = "/p"
 Tag for closing paragraph.

static const std::string TAG_BR = "br"
 Tag for line break.

static const std::string TAG_B = "b"
 Tag for bold.

static const std::string TAG_CLOSE_B = "/b"
 Tag for closing bold.

static const std::string TAG_I = "i"
 Tag for italic.

static const std::string TAG_CLOSE_I = "/i"
 Tag for closing italic.

static const std::string TAG_U = "u"
 Tag for underline.

static const std::string TAG_CLOSE_U = "/u"
 Tag for closing underline.

static const std::string TAG_TABLE = "table"
 Tag for table.

static const std::string TAG_CLOSE_TABLE = "/table"
 Tag for closing table.

static const std::string TAG_TR = "tr"
 Tag for table row.

static const std::string TAG_CLOSE_TR = "/tr"
 Tag for closing table row.

static const std::string TAG_TD = "td"
 Tag for table cell.

static const std::string TAG_CLOSE_TD = "/td"
 Tag for closing table cell.

static const std::string TAG_TEXT = "#text"
 Tag for text.

static const std::string TAG_LINK = "a"
 Tag for link. Attributes: "url": std::string.

static const std::string TAG_CLOSE_LINK = "/a"
 Tag for closing link.

static const std::string TAG_STYLE = "style"
 Tag for style.

static const std::string TAG_CLOSE_STYLE = "/style"
 Tag for closing style.

static const std::string TAG_LIST = "list"
 Tag for list. Attributes: "is_ordered": bool (def. is false), "list_style_prefix": std::string.

static const std::string TAG_CLOSE_LIST = "/list"
 Tag for closing list.

static const std::string TAG_LIST_ITEM = "list-item"
 Tag for list item.

static const std::string TAG_CLOSE_LIST_ITEM = "/list-item"
 Tag for closing list item.

static const std::string TAG_MAIL = "mail"
 Tag for mail. Attributes: "subject": std::string, "date": uint (unix timestamp).

static const std::string TAG_CLOSE_MAIL = "/mail"
 Tag for closing mail.

static const std::string TAG_MAIL_BODY = "mail-body"
 Tag for mail body.

static const std::string TAG_CLOSE_MAIL_BODY = "/mail-body"
 Tag for closing mail body.

static const std::string TAG_ATTACHMENT = "attachment"
 Tag for attachment. If you set skip in this tag, then the attachment won't be parsed. Attributes: "name": std::string, "size": uint, "extension": std::string.

static const std::string TAG_CLOSE_ATTACHMENT = "/attachment"
 Tag for closing attachment.

static const std::string TAG_FOLDER = "folder"
 Tag for folder. If you set skip in this tag, then the folder won't be parsed. Attributes: "name": std::string.

static const std::string TAG_CLOSE_FOLDER = "/folder"
 Tag for closing folder.

static const std::string TAG_METADATA = "metadata"
 Tag for metadata.

static const std::string TAG_COMMENT = "comment"
 Tag for comments. Attributes: "author": std::string, "time": std::string (format:(yyyy-mm-ddThh:mm:ss)), "comment": std::string.

static const std::string TAG_PAGE = "new-page"
 Tag for page. This tag is sent before parsing the page, so if you set skip in this tag, then the page won't be parsed.

static const std::string TAG_CLOSE_PAGE = "/new-page"
 Tag for closing page.
 

Detailed Description

Contains a set of basic tags used in parsers.

Definition at line 53 of file parser.h.
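
The constants follow a simple convention: a closing tag is the opening tag's name prefixed with "/", and four tags (`br`, `#text`, `metadata`, `comment`) are documented without a closing counterpart. The following is a minimal sketch of how a consumer might check that a sequence of tag names is balanced. It assumes the tag names are available as plain strings; the helpers `is_closing_tag`, `is_standalone_tag` and `is_balanced` are hypothetical and not part of DocToText.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical helpers; the string values mirror the constants documented here.
inline bool is_closing_tag(const std::string& name)
{
    return !name.empty() && name[0] == '/';
}

// Tags that are documented without a closing counterpart.
inline bool is_standalone_tag(const std::string& name)
{
    static const std::set<std::string> standalone = {"br", "#text", "metadata", "comment"};
    return standalone.count(name) > 0;
}

// Returns true when every opening tag is closed in reverse order of opening.
inline bool is_balanced(const std::vector<std::string>& tags)
{
    std::vector<std::string> open;
    for (const std::string& name : tags)
    {
        if (is_standalone_tag(name))
            continue;
        if (is_closing_tag(name))
        {
            // The closing name without its "/" must match the innermost open tag.
            if (open.empty() || open.back() != name.substr(1))
                return false;
            open.pop_back();
        }
        else
        {
            open.push_back(name);
        }
    }
    return open.empty();
}
```

Whether the parsers always emit balanced sequences is not stated in this reference, so a check like this is best treated as a debugging aid.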

Member Data Documentation

◆ TAG_ATTACHMENT

const std::string doctotext::StandardTag::TAG_ATTACHMENT = "attachment"
inline static

Tag for attachment. If you set skip in this tag, then the attachment won't be parsed. Attributes: "name": std::string, "size": uint, "extension": std::string.

Definition at line 86 of file parser.h.
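
A skip decision for an attachment could be based on the three documented attributes. The sketch below only illustrates such a policy. `AttachmentInfo` and `should_skip_attachment` are hypothetical names, the way the skip flag is actually set on the tag is not described in this reference, and it is an assumption that "extension" may arrive with or without a leading dot.

```cpp
#include <string>

// Hypothetical container for the three documented attributes.
struct AttachmentInfo
{
    std::string name;
    unsigned int size;
    std::string extension;
};

// Example policy: skip ZIP archives and anything larger than max_size.
inline bool should_skip_attachment(const AttachmentInfo& info, unsigned int max_size)
{
    // Assumption: the extension may or may not carry a leading dot.
    if (info.extension == "zip" || info.extension == ".zip")
        return true;
    return info.size > max_size;
}
```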

◆ TAG_B

const std::string doctotext::StandardTag::TAG_B = "b"
inline static

Tag for bold.

Definition at line 59 of file parser.h.

◆ TAG_BR

const std::string doctotext::StandardTag::TAG_BR = "br"
inline static

Tag for line break.

Definition at line 58 of file parser.h.

◆ TAG_CLOSE_ATTACHMENT

const std::string doctotext::StandardTag::TAG_CLOSE_ATTACHMENT = "/attachment"
inline static

Tag for closing attachment.

Definition at line 87 of file parser.h.

◆ TAG_CLOSE_B

const std::string doctotext::StandardTag::TAG_CLOSE_B = "/b"
inline static

Tag for closing bold.

Definition at line 60 of file parser.h.

◆ TAG_CLOSE_FOLDER

const std::string doctotext::StandardTag::TAG_CLOSE_FOLDER = "/folder"
inline static

Tag for closing folder.

Definition at line 89 of file parser.h.

◆ TAG_CLOSE_I

const std::string doctotext::StandardTag::TAG_CLOSE_I = "/i"
inline static

Tag for closing italic.

Definition at line 62 of file parser.h.

◆ TAG_CLOSE_LINK

const std::string doctotext::StandardTag::TAG_CLOSE_LINK = "/a"
inline static

Tag for closing link.

Definition at line 73 of file parser.h.

◆ TAG_CLOSE_LIST

const std::string doctotext::StandardTag::TAG_CLOSE_LIST = "/list"
inline static

Tag for closing list.

Definition at line 78 of file parser.h.

◆ TAG_CLOSE_LIST_ITEM

const std::string doctotext::StandardTag::TAG_CLOSE_LIST_ITEM = "/list-item"
inline static

Tag for closing list item.

Definition at line 80 of file parser.h.

◆ TAG_CLOSE_MAIL

const std::string doctotext::StandardTag::TAG_CLOSE_MAIL = "/mail"
inline static

Tag for closing mail.

Definition at line 83 of file parser.h.

◆ TAG_CLOSE_MAIL_BODY

const std::string doctotext::StandardTag::TAG_CLOSE_MAIL_BODY = "/mail-body"
inline static

Tag for closing mail body.

Definition at line 85 of file parser.h.

◆ TAG_CLOSE_P

const std::string doctotext::StandardTag::TAG_CLOSE_P = "/p"
inline static

Tag for closing paragraph.

Definition at line 57 of file parser.h.

◆ TAG_CLOSE_PAGE

const std::string doctotext::StandardTag::TAG_CLOSE_PAGE = "/new-page"
inline static

Tag for closing page.

Definition at line 95 of file parser.h.

◆ TAG_CLOSE_STYLE

const std::string doctotext::StandardTag::TAG_CLOSE_STYLE = "/style"
inline static

Tag for closing style.

Definition at line 75 of file parser.h.

◆ TAG_CLOSE_TABLE

const std::string doctotext::StandardTag::TAG_CLOSE_TABLE = "/table"
inline static

Tag for closing table.

Definition at line 66 of file parser.h.

◆ TAG_CLOSE_TD

const std::string doctotext::StandardTag::TAG_CLOSE_TD = "/td"
inline static

Tag for closing table cell.

Definition at line 70 of file parser.h.

◆ TAG_CLOSE_TR

const std::string doctotext::StandardTag::TAG_CLOSE_TR = "/tr"
inline static

Tag for closing table row.

Definition at line 68 of file parser.h.

◆ TAG_CLOSE_U

const std::string doctotext::StandardTag::TAG_CLOSE_U = "/u"
inline static

Tag for closing underline.

Definition at line 64 of file parser.h.

◆ TAG_COMMENT

const std::string doctotext::StandardTag::TAG_COMMENT = "comment"
inline static

Tag for comments. Attributes: "author": std::string, "time": std::string (format:(yyyy-mm-ddThh:mm:ss)), "comment": std::string.

Definition at line 92 of file parser.h.
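
The "time" attribute is documented as a string in the form yyyy-mm-ddThh:mm:ss. A minimal sketch of turning such a string into a `std::tm` with the standard library follows. `parse_comment_time` is a hypothetical helper, not a DocToText function, and the sketch ignores time zones because the documented format does not carry one.

```cpp
#include <ctime>
#include <iomanip>
#include <sstream>
#include <string>

// Parses "yyyy-mm-ddThh:mm:ss" into out; returns false when the text does not match.
inline bool parse_comment_time(const std::string& text, std::tm& out)
{
    std::tm parsed = {};
    std::istringstream stream(text);
    stream >> std::get_time(&parsed, "%Y-%m-%dT%H:%M:%S");
    if (stream.fail())
        return false;
    out = parsed;
    return true;
}
```

Note that `std::tm` stores the year as an offset from 1900 and the month as a zero-based index.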

◆ TAG_FOLDER

const std::string doctotext::StandardTag::TAG_FOLDER = "folder"
inline static

Tag for folder. If you set skip in this tag, then the folder won't be parsed. Attributes: "name": std::string.

Definition at line 88 of file parser.h.

◆ TAG_I

const std::string doctotext::StandardTag::TAG_I = "i"
inline static

Tag for italic.

Definition at line 61 of file parser.h.

◆ TAG_LINK

const std::string doctotext::StandardTag::TAG_LINK = "a"
inline static

Tag for link. Attributes: "url": std::string.

Definition at line 72 of file parser.h.
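
When the "url" attribute is written into HTML output, it should be escaped before being placed in an `href`. The sketch below shows one way to do that. `escape_html_attribute` and `open_link_html` are hypothetical helpers and do not describe how DocToText itself renders links.

```cpp
#include <string>

// Escapes the characters that are unsafe inside a double-quoted HTML attribute.
inline std::string escape_html_attribute(const std::string& value)
{
    std::string escaped;
    escaped.reserve(value.size());
    for (char c : value)
    {
        switch (c)
        {
            case '&': escaped += "&amp;"; break;
            case '"': escaped += "&quot;"; break;
            case '<': escaped += "&lt;"; break;
            case '>': escaped += "&gt;"; break;
            default: escaped += c; break;
        }
    }
    return escaped;
}

// Builds the opening anchor for a link tag from its "url" attribute.
inline std::string open_link_html(const std::string& url)
{
    return "<a href=\"" + escape_html_attribute(url) + "\">";
}
```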

◆ TAG_LIST

const std::string doctotext::StandardTag::TAG_LIST = "list"
inline static

Tag for list. Attributes: "is_ordered": bool (def. is false), "list_style_prefix": std::string.

Definition at line 77 of file parser.h.

◆ TAG_LIST_ITEM

const std::string doctotext::StandardTag::TAG_LIST_ITEM = "list-item"
inline static

Tag for list item.

Definition at line 79 of file parser.h.

◆ TAG_MAIL

const std::string doctotext::StandardTag::TAG_MAIL = "mail"
inline static

Tag for mail. Attributes: "subject": std::string, "date": uint (unix timestamp).

Examples
example_3.cpp, example_4.cpp, example_6.cpp, and example_8.cpp.

Definition at line 82 of file parser.h.
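
The "date" attribute is a Unix timestamp, so it usually needs converting before display. The following is a minimal sketch that formats it as a UTC string with the standard library. `format_mail_date` is a hypothetical helper, and the output format is chosen here only to match the one documented for comment times.

```cpp
#include <ctime>
#include <string>

// Formats a Unix timestamp as "yyyy-mm-ddThh:mm:ss" in UTC.
inline std::string format_mail_date(unsigned int unix_timestamp)
{
    const std::time_t seconds = static_cast<std::time_t>(unix_timestamp);
    // std::gmtime uses a shared internal buffer, so this sketch is not thread-safe.
    const std::tm* utc = std::gmtime(&seconds);
    if (utc == nullptr)
        return std::string();
    char buffer[32];
    const std::size_t written = std::strftime(buffer, sizeof(buffer), "%Y-%m-%dT%H:%M:%S", utc);
    return std::string(buffer, written);
}
```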

◆ TAG_MAIL_BODY

const std::string doctotext::StandardTag::TAG_MAIL_BODY = "mail-body"
inline static

Tag for mail body.

Definition at line 84 of file parser.h.

◆ TAG_METADATA

const std::string doctotext::StandardTag::TAG_METADATA = "metadata"
inline static

Tag for metadata.

Definition at line 91 of file parser.h.

◆ TAG_P

const std::string doctotext::StandardTag::TAG_P = "p"
inline static

Tag for paragraph.

Definition at line 56 of file parser.h.

◆ TAG_PAGE

const std::string doctotext::StandardTag::TAG_PAGE = "new-page"
inline static

Tag for page. This tag is sent before parsing the page, so if you set skip in this tag, then the page won't be parsed.

Definition at line 94 of file parser.h.
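
Because the tag arrives before its page is parsed, a consumer can use it to limit how many pages are processed. The sketch below counts `new-page` tags and reports when the limit has been exceeded. `PageLimiter` is a hypothetical class; how its result is passed back to the parser as the skip flag is not covered by this reference.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper that allows the first max_pages pages and skips the rest.
class PageLimiter
{
public:
    explicit PageLimiter(std::size_t max_pages) : m_max_pages(max_pages) {}

    // Call for every tag name; returns true when the page about to start should be skipped.
    bool should_skip(const std::string& tag_name)
    {
        if (tag_name != "new-page")
            return false;
        ++m_pages_seen;
        return m_pages_seen > m_max_pages;
    }

private:
    std::size_t m_max_pages;
    std::size_t m_pages_seen = 0;
};
```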

◆ TAG_STYLE

const std::string doctotext::StandardTag::TAG_STYLE = "style"
inline static

Tag for style.

Definition at line 74 of file parser.h.

◆ TAG_TABLE

const std::string doctotext::StandardTag::TAG_TABLE = "table"
inline static

Tag for table.

Definition at line 65 of file parser.h.
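
The table, row and cell tags together describe a grid. The following is a minimal sketch of collecting cell text into rows from a flat sequence of tags. `TagEvent` and `collect_table_rows` are hypothetical, and it is an assumption that cell text arrives as `#text` tags between `td` and `/td`.

```cpp
#include <string>
#include <vector>

// Hypothetical flattened view of one tag: its name and, for "#text", its content.
struct TagEvent
{
    std::string name;
    std::string text;
};

// Collects the text of each cell, grouped by row.
inline std::vector<std::vector<std::string>> collect_table_rows(const std::vector<TagEvent>& events)
{
    std::vector<std::vector<std::string>> rows;
    bool in_cell = false;
    for (const TagEvent& event : events)
    {
        if (event.name == "tr")
        {
            rows.emplace_back();
        }
        else if (event.name == "td" && !rows.empty())
        {
            rows.back().emplace_back();
            in_cell = true;
        }
        else if (event.name == "/td")
        {
            in_cell = false;
        }
        else if (event.name == "#text" && in_cell)
        {
            // in_cell is only true after a cell was added to the current row.
            rows.back().back() += event.text;
        }
    }
    return rows;
}
```

Nested tables are not handled here; a cell that contains another table would need a stack of row collections instead of a single one.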

◆ TAG_TD

const std::string doctotext::StandardTag::TAG_TD = "td"
inline static

Tag for table cell.

Definition at line 69 of file parser.h.

◆ TAG_TEXT

const std::string doctotext::StandardTag::TAG_TEXT = "#text"
inline static

Tag for text.

Definition at line 71 of file parser.h.

◆ TAG_TR

const std::string doctotext::StandardTag::TAG_TR = "tr"
inline static

Tag for table row.

Definition at line 67 of file parser.h.

◆ TAG_U

const std::string doctotext::StandardTag::TAG_U = "u"
inline static

Tag for underline.

Definition at line 63 of file parser.h.


The documentation for this class was generated from the following file: