DocWire DocToText - Powered by Silvercoders 5.0.5
A data extraction software development toolkit that converts many kinds of files to plain text and HTML. Written in C++, it includes a parser for PST and OST files and a new API for file processing. DocToText can be integrated with other data mining and data analytics applications. It ships with a scriptable, trainable OCR engine whose character recognition is based on LSTM neural networks. The document parser extracts metadata and annotations, and supports the following formats: DOC, XLS, XLSB, PPT, RTF, ODF (ODT, ODS, ODP), OOXML (DOCX, XLSX, PPTX), iWork (PAGES, NUMBERS, KEYNOTE), ODFXML (FODP, FODS, FODT), PDF, EML, HTML, Outlook (PST, OST), Image (JPG, JPEG, JFIF, BMP, PNM, PNG, TIFF, WEBP) and DICOM (DCM).
Contains a set of basic tags used in parsers.
#include <parser.h>
Static Public Attributes | |
| static const std::string | TAG_P = "p" |
| Tag for paragraph. | |
| static const std::string | TAG_CLOSE_P = "/p" |
| Tag for closing paragraph. | |
| static const std::string | TAG_BR = "br" |
| Tag for line break. | |
| static const std::string | TAG_B = "b" |
| Tag for bold. | |
| static const std::string | TAG_CLOSE_B = "/b" |
| Tag for closing bold. | |
| static const std::string | TAG_I = "i" |
| Tag for italic. | |
| static const std::string | TAG_CLOSE_I = "/i" |
| Tag for closing italic. | |
| static const std::string | TAG_U = "u" |
| Tag for underline. | |
| static const std::string | TAG_CLOSE_U = "/u" |
| Tag for closing underline. | |
| static const std::string | TAG_TABLE = "table" |
| Tag for table. | |
| static const std::string | TAG_CLOSE_TABLE = "/table" |
| Tag for closing table. | |
| static const std::string | TAG_TR = "tr" |
| Tag for table row. | |
| static const std::string | TAG_CLOSE_TR = "/tr" |
| Tag for closing table row. | |
| static const std::string | TAG_TD = "td" |
| Tag for table cell. | |
| static const std::string | TAG_CLOSE_TD = "/td" |
| Tag for closing table cell. | |
| static const std::string | TAG_TEXT = "#text" |
| Tag for text. | |
| static const std::string | TAG_LINK = "a" |
| Tag for link. Attributes: "url": std::string. | |
| static const std::string | TAG_CLOSE_LINK = "/a" |
| Tag for closing link. | |
| static const std::string | TAG_STYLE = "style" |
| Tag for style. | |
| static const std::string | TAG_CLOSE_STYLE = "/style" |
| Tag for closing style. | |
| static const std::string | TAG_LIST = "list" |
| Tag for list. Attributes: "is_ordered": bool (default: false), "list_style_prefix": std::string. | |
| static const std::string | TAG_CLOSE_LIST = "/list" |
| Tag for closing list. | |
| static const std::string | TAG_LIST_ITEM = "list-item" |
| Tag for list item. | |
| static const std::string | TAG_CLOSE_LIST_ITEM = "/list-item" |
| Tag for closing list item. | |
| static const std::string | TAG_MAIL = "mail" |
| Tag for mail. Attributes: "subject": std::string, "date": uint (unix timestamp). | |
| static const std::string | TAG_CLOSE_MAIL = "/mail" |
| Tag for closing mail. | |
| static const std::string | TAG_MAIL_BODY = "mail-body" |
| Tag for mail body. | |
| static const std::string | TAG_CLOSE_MAIL_BODY = "/mail-body" |
| Tag for closing mail body. | |
| static const std::string | TAG_ATTACHMENT = "attachment" |
| Tag for attachment. If you set skip in this tag, then the attachment won't be parsed. Attributes: "name": std::string, "size": uint, "extension": std::string. | |
| static const std::string | TAG_CLOSE_ATTACHMENT = "/attachment" |
| Tag for closing attachment. | |
| static const std::string | TAG_FOLDER = "folder" |
| Tag for folder. If you set skip in this tag, then the folder won't be parsed. Attributes: "name": std::string. | |
| static const std::string | TAG_CLOSE_FOLDER = "/folder" |
| Tag for closing folder. | |
| static const std::string | TAG_METADATA = "metadata" |
| Tag for metadata. | |
| static const std::string | TAG_COMMENT = "comment" |
| Tag for comments. Attributes: "author": std::string, "time": std::string (format: yyyy-mm-ddThh:mm:ss), "comment": std::string. | |
| static const std::string | TAG_PAGE = "new-page" |
| Tag for page. This tag is sent before the page is parsed, so if you set skip in this tag, then the page won't be parsed. | |
| static const std::string | TAG_CLOSE_PAGE = "/new-page" |
| Tag for closing page. | |
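The table suggests a naming convention: where a closing tag exists, its value is the opening tag's value prefixed with "/". A minimal sketch of how a consumer might use that when flattening a sequence of tags to plain text is shown below. Everything in the `sketch` namespace is hypothetical: the constants are redefined locally with the values from the table rather than taken from `<parser.h>`, and the `Event` pair (tag name plus text payload) is an assumed shape, not the library's own type.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

namespace sketch {

// Local copies of three values from the table (assumption: real code would
// use the constants declared in <parser.h> instead).
const std::string TAG_P = "p";
const std::string TAG_BR = "br";
const std::string TAG_TEXT = "#text";

// Builds the closing tag name by prefixing "/", matching the pattern seen in
// the table. Not every tag has a closing counterpart (for example "br",
// "#text", "metadata" and "comment" do not).
std::string closing_tag(const std::string& open_tag) {
    assert(!open_tag.empty());
    return "/" + open_tag;
}

// Hypothetical event: first = tag name, second = text payload (may be empty).
using Event = std::pair<std::string, std::string>;

// Flattens tag events to plain text: text payloads are copied, and line
// breaks and paragraph ends become newlines. All other tags are ignored.
std::string to_plain_text(const std::vector<Event>& events) {
    std::string out;
    for (const Event& ev : events) {
        if (ev.first == TAG_TEXT) {
            out += ev.second;
        } else if (ev.first == TAG_BR || ev.first == closing_tag(TAG_P)) {
            out += '\n';
        }
    }
    return out;
}

}  // namespace sketch
```

Ignoring unknown tags keeps the sketch tolerant of tags it does not handle, such as the table and list tags.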
|
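The attachment, folder and page tags can be marked to skip parsing of the item they introduce. How skip is set is not described here, so the following is only a sketch under stated assumptions: `AttachmentEvent`, its `skip` flag and `skip_large_attachments` are hypothetical names, and only the tag value "attachment" and the attribute names "name", "size" and "extension" come from the tag list.

```cpp
#include <cassert>
#include <string>

// Hypothetical event type: the tag name, the attachment attributes named in
// the tag list, and a flag a handler can set to request skipping.
struct AttachmentEvent {
    std::string tag;        // tag name, e.g. "attachment"
    std::string name;       // "name" attribute
    unsigned int size = 0;  // "size" attribute
    std::string extension;  // "extension" attribute
    bool skip = false;      // assumed representation of "set skip in this tag"
};

// Requests skipping for attachments whose size exceeds max_size.
// Events for any other tag are left unchanged.
void skip_large_attachments(AttachmentEvent& ev, unsigned int max_size) {
    assert(!ev.tag.empty());  // every event is expected to carry a tag name
    if (ev.tag == "attachment" && ev.size > max_size) {
        ev.skip = true;
    }
}
```

The same pattern would apply to folders (filtering on "name") or pages, since the decision has to be made when the opening tag arrives, before the item is parsed.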