DocWire DocToText - Powered by Silvercoders 5.0.5
A multifaceted data extraction software development toolkit, written in C++, that converts many kinds of files to plain text and HTML. It includes a parser for PST and OST files and a new API for more flexible file processing, and it can be integrated with other data mining and data analytics applications. It comes with a high-grade, scriptable and trainable OCR engine that uses LSTM neural-network-based character recognition. The parser extracts metadata and annotations and supports the following formats: DOC, XLS, XLSB, PPT, RTF, ODF (ODT, ODS, ODP), OOXML (DOCX, XLSX, PPTX), iWork (PAGES, NUMBERS, KEYNOTE), ODFXML (FODP, FODS, FODT), PDF, EML, HTML, Outlook (PST, OST), Image (JPG, JPEG, JFIF, BMP, PNM, PNG, TIFF, WEBP) and DICOM (DCM).
Sets of standard filters to use in parsers.
#include <standard_filter.h>
Static Public Member Functions
static doctotext::NewNodeCallback filterByFolderName (const std::vector<std::string> &names)
    Filters folders by name. Keeps only folders whose names appear in the given list.
static doctotext::NewNodeCallback filterByAttachmentType (const std::vector<std::string> &types)
    Filters attachments by type. Keeps only attachments whose type appears in the given list.
static doctotext::NewNodeCallback filterByMailMinCreationTime (unsigned int min_time)
    Filters mail by creation date. Keeps only mail messages created after the given date.
static doctotext::NewNodeCallback filterByMailMaxCreationTime (unsigned int max_time)
    Filters mail by creation date. Keeps only mail messages created before the given date.
static doctotext::NewNodeCallback filterByMaxNodeNumber (unsigned int max_nodes)
Definition at line 52 of file standard_filter.h.
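The exact signature of `doctotext::NewNodeCallback` is not shown on this page. To illustrate the general pattern these factory functions follow (a static function returns a callback, the parser invokes every registered callback for each new node, and a callback can reject that node), here is a minimal self-contained sketch. `NodeInfo`, its `kind`/`name`/`skip` fields, the local `NewNodeCallback` alias and `ToyParser` are all hypothetical stand-ins, not the DocToText API.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// HYPOTHETICAL node description handed to each callback; the real
// DocToText callback argument is not documented on this page.
struct NodeInfo {
    std::string kind;   // e.g. "folder", "mail", "attachment"
    std::string name;   // folder name, attachment type, ...
    bool skip = false;  // set by a filter to drop this node
};

// HYPOTHETICAL stand-in for doctotext::NewNodeCallback.
using NewNodeCallback = std::function<void(NodeInfo&)>;

// Toy parser: runs the registered callbacks on every node it meets and
// keeps only the nodes that no callback marked as skipped.
class ToyParser {
public:
    void addCallback(NewNodeCallback cb) { callbacks_.push_back(std::move(cb)); }

    std::vector<NodeInfo> parse(std::vector<NodeInfo> nodes) const {
        std::vector<NodeInfo> kept;
        for (NodeInfo& node : nodes) {
            for (const NewNodeCallback& cb : callbacks_) {
                cb(node);
                if (node.skip) break;  // already rejected; later filters are not consulted
            }
            if (!node.skip) kept.push_back(node);
        }
        return kept;
    }

private:
    std::vector<NewNodeCallback> callbacks_;
};
```

For example, registering a callback that sets `skip` on `"attachment"` nodes and parsing two mail nodes plus one attachment node leaves only the two mail nodes. Stopping at the first callback that rejects a node is a design choice of this sketch, not documented library behaviour.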

static doctotext::NewNodeCallback filterByAttachmentType (const std::vector<std::string> &types)
Filters attachments by type. Keeps only attachments whose type appears in the given list.
Parameters:
  types  list of types to keep

static doctotext::NewNodeCallback filterByFolderName (const std::vector<std::string> &names)
Filters folders by name. Keeps only folders whose names appear in the given list.
Parameters:
  names  list of names to keep
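`filterByAttachmentType` and `filterByFolderName` both keep only nodes whose type or name appears in a list. Assuming that nodes of any other kind pass through untouched and that matching is exact and case-sensitive (neither is stated above), both can be sketched on top of one hypothetical helper, `keepOnlyListed`. `NodeInfo` and its fields are again illustrative stand-ins rather than DocToText types.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// HYPOTHETICAL node description (not the DocToText type).
struct NodeInfo {
    std::string kind;   // "folder", "attachment", "mail", ...
    std::string name;   // folder name or attachment type
    bool skip = false;  // set to drop the node
};
using NewNodeCallback = std::function<void(NodeInfo&)>;

// HYPOTHETICAL helper: affects only nodes of `kind`, and skips those whose
// name is not in `allowed` (exact, case-sensitive comparison).
NewNodeCallback keepOnlyListed(std::string kind, std::vector<std::string> allowed) {
    return [kind = std::move(kind), allowed = std::move(allowed)](NodeInfo& info) {
        if (info.kind != kind) return;  // other node kinds are left alone
        if (std::find(allowed.begin(), allowed.end(), info.name) == allowed.end())
            info.skip = true;
    };
}

// Sketches of the two list-based filters, sharing the helper above.
NewNodeCallback filterByFolderName(const std::vector<std::string>& names) {
    return keepOnlyListed("folder", names);
}

NewNodeCallback filterByAttachmentType(const std::vector<std::string>& types) {
    return keepOnlyListed("attachment", types);
}
```

The lists are captured by value, so the returned callback stays valid after the caller's vector goes out of scope.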

static doctotext::NewNodeCallback filterByMailMaxCreationTime (unsigned int max_time)
Filters mail by creation date. Keeps only mail messages created before the given date.
Parameters:
  max_time  maximum creation time to keep

static doctotext::NewNodeCallback filterByMailMinCreationTime (unsigned int min_time)
Filters mail by creation date. Keeps only mail messages created after the given date.
Parameters:
  min_time  minimum creation time to keep
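Registered together, the two creation-time filters keep a time window of mail. The sketch below assumes that a mail node carries an `unsigned int` creation time directly comparable with `min_time` and `max_time`, that both bounds are exclusive (reading "after" and "before" literally), and that non-mail nodes are unaffected. `NodeInfo` and its `creation_time` field are hypothetical.

```cpp
#include <cassert>
#include <functional>
#include <string>

// HYPOTHETICAL node description (not the DocToText type).
struct NodeInfo {
    std::string kind;               // "mail", "folder", ...
    unsigned int creation_time = 0; // assumed to share units with min_time/max_time
    bool skip = false;              // set to drop the node
};
using NewNodeCallback = std::function<void(NodeInfo&)>;

// Sketch: skip mail created at or before min_time (exclusive bound assumed).
NewNodeCallback filterByMailMinCreationTime(unsigned int min_time) {
    return [min_time](NodeInfo& info) {
        if (info.kind == "mail" && info.creation_time <= min_time)
            info.skip = true;
    };
}

// Sketch: skip mail created at or after max_time (exclusive bound assumed).
NewNodeCallback filterByMailMaxCreationTime(unsigned int max_time) {
    return [max_time](NodeInfo& info) {
        if (info.kind == "mail" && info.creation_time >= max_time)
            info.skip = true;
    };
}
```

With `min_time = 1000` and `max_time = 2000`, only mail created strictly between the two values survives both callbacks.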

static doctotext::NewNodeCallback filterByMaxNodeNumber (unsigned int max_nodes)
Parameters:
  max_nodes
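`filterByMaxNodeNumber` has no description on this page. Judging only from its name and its `max_nodes` parameter, it presumably caps the number of nodes processed. The sketch below assumes it lets the first `max_nodes` nodes through and then asks the parser to stop by setting a hypothetical `cancel` flag; `NodeInfo`, `cancel` and the `collect` driver are illustrative only.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <vector>

// HYPOTHETICAL node description (not the DocToText type).
struct NodeInfo {
    std::string name;
    bool skip = false;    // drop this node
    bool cancel = false;  // stop parsing altogether (assumed mechanism)
};
using NewNodeCallback = std::function<void(NodeInfo&)>;

// ASSUMED behaviour: pass the first max_nodes nodes, then request cancellation.
NewNodeCallback filterByMaxNodeNumber(unsigned int max_nodes) {
    // shared_ptr so that every copy of the std::function shares one counter
    auto seen = std::make_shared<unsigned int>(0);
    return [max_nodes, seen](NodeInfo& info) {
        if (*seen >= max_nodes) {
            info.cancel = true;
            return;
        }
        ++*seen;
    };
}

// HYPOTHETICAL driver: stops at the first node whose callback requested cancellation.
std::vector<std::string> collect(const std::vector<std::string>& names, const NewNodeCallback& cb) {
    std::vector<std::string> out;
    for (const std::string& name : names) {
        NodeInfo info{name};
        cb(info);
        if (info.cancel) break;
        if (!info.skip) out.push_back(name);
    }
    return out;
}
```

The counter lives in a `std::shared_ptr` because `std::function` copies its target; with a plain `mutable` counter, each copy of the callback would count separately.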