DocWire DocToText - Powered by Silvercoders 5.0.5
DocToText is a data extraction software development toolkit that converts many file formats to plain text and HTML. Written in C++, it includes a parser for PST and OST files and a new API for file processing. DocToText can be integrated with other data mining and data analytics applications. It comes with a scriptable, trainable OCR engine that uses LSTM neural networks for character recognition. The parser extracts metadata and annotations and supports the following formats: DOC, XLS, XLSB, PPT, RTF, ODF (ODT, ODS, ODP), OOXML (DOCX, XLSX, PPTX), iWork (PAGES, NUMBERS, KEYNOTE), ODFXML (FODP, FODS, FODT), PDF, EML, HTML, Outlook (PST, OST), Image (JPG, JPEG, JFIF, BMP, PNM, PNG, TIFF, WEBP) and DICOM (DCM).
doctotext::StandardFilter Class Reference

A set of standard filters for use in parsers.

#include <standard_filter.h>

Static Public Member Functions

static doctotext::NewNodeCallback filterByFolderName (const std::vector< std::string > &names)
 Filters folders by name. Keeps only folders whose names appear in the given list.

static doctotext::NewNodeCallback filterByAttachmentType (const std::vector< std::string > &types)
 Filters attachments by type. Keeps only attachments whose type appears in the given list.

static doctotext::NewNodeCallback filterByMailMinCreationTime (unsigned int min_time)
 Filters mail by creation time. Keeps only mail created after the given time.

static doctotext::NewNodeCallback filterByMailMaxCreationTime (unsigned int max_time)
 Filters mail by creation time. Keeps only mail created before the given time.
 
static doctotext::NewNodeCallback filterByMaxNodeNumber (unsigned int max_nodes)
 

Detailed Description

A set of standard filters for use in parsers. Example of use:

// Keep only the "Inbox" and "Sent" folders and only JPG and PNG attachments.
PSTParser pst_parser("test.pst");
pst_parser.onNewNode(StandardFilter::filterByFolderName({"Inbox", "Sent"}))
    .onNewNode(StandardFilter::filterByAttachmentType({"jpg", "png"}))
    .parse();
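To illustrate how chained filters of this kind can combine, here is a minimal, self-contained sketch. It does not use the DocWire headers: `Node`, `NodeKind`, `NodeCallback`, `keepLabels` and `FilterChain` are hypothetical stand-ins invented for this illustration, and the real `doctotext::NewNodeCallback` signature may differ. The sketch assumes that each filter only judges nodes of its own kind and lets every other node pass.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration; these are NOT the DocWire types.
enum class NodeKind { Folder, Mail, Attachment };

struct Node {
    NodeKind kind;
    std::string label;  // folder name or attachment type, depending on kind
};

// A callback returns true to keep a node and false to skip it.
using NodeCallback = std::function<bool(const Node&)>;

// Builds a callback that keeps a node of the given kind only when its label
// is in the allow-list. Nodes of any other kind pass through untouched.
NodeCallback keepLabels(NodeKind kind, std::vector<std::string> allowed) {
    return [kind, allowed = std::move(allowed)](const Node& node) {
        if (node.kind != kind) return true;
        return std::find(allowed.begin(), allowed.end(), node.label) != allowed.end();
    };
}

// Collects callbacks through a chainable onNewNode().
class FilterChain {
public:
    FilterChain& onNewNode(NodeCallback callback) {
        callbacks_.push_back(std::move(callback));
        return *this;  // returning *this is what makes the calls chainable
    }

    // A node is kept only if every registered callback accepts it.
    bool accepts(const Node& node) const {
        return std::all_of(callbacks_.begin(), callbacks_.end(),
                           [&node](const NodeCallback& cb) { return cb(node); });
    }

private:
    std::vector<NodeCallback> callbacks_;
};
```

Under these assumptions, a chain built with `keepLabels(NodeKind::Folder, {"Inbox", "Sent"})` and `keepLabels(NodeKind::Attachment, {"jpg", "png"})` accepts an "Inbox" folder and a "png" attachment, rejects a "Drafts" folder and a "pdf" attachment, and leaves mail nodes alone.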

Definition at line 52 of file standard_filter.h.

Member Function Documentation

◆ filterByAttachmentType()

static doctotext::NewNodeCallback doctotext::StandardFilter::filterByAttachmentType ( const std::vector< std::string > & types)

Filters attachments by type. Keeps only attachments whose type appears in the given list.

Parameters
    types    list of types to keep

◆ filterByFolderName()

static doctotext::NewNodeCallback doctotext::StandardFilter::filterByFolderName ( const std::vector< std::string > & names)

Filters folders by name. Keeps only folders whose names appear in the given list.

Parameters
    names    list of names to keep

◆ filterByMailMaxCreationTime()

static doctotext::NewNodeCallback doctotext::StandardFilter::filterByMailMaxCreationTime ( unsigned int max_time)

Filters mail by creation time. Keeps only mail created before the given time.

Parameters
    max_time    maximum creation time to keep

◆ filterByMailMinCreationTime()

static doctotext::NewNodeCallback doctotext::StandardFilter::filterByMailMinCreationTime ( unsigned int min_time)

Filters mail by creation time. Keeps only mail created after the given time.

Parameters
    min_time    minimum creation time to keep
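The minimum and maximum filters can be combined into a time window. The sketch below is self-contained and does not call DocWire. It assumes that the `unsigned int` arguments are seconds since the Unix epoch (UTC) and that both bounds are exclusive; neither point is stated in this reference. `daysFromCivil`, `unixSeconds`, `keepCreatedAfter` and `keepCreatedBefore` are hypothetical helper names introduced for illustration only.

```cpp
#include <cassert>
#include <functional>

// Hypothetical callback type: receives a creation time, returns true to keep.
using TimeCallback = std::function<bool(unsigned int)>;

// Days between 1970-01-01 and the given proleptic Gregorian date
// (the well-known "days from civil" calculation).
long long daysFromCivil(int y, unsigned m, unsigned d) {
    y -= m <= 2;  // treat January and February as part of the previous year
    const long long era = (y >= 0 ? y : y - 399) / 400;
    const unsigned yoe = static_cast<unsigned>(y - era * 400);            // [0, 399]
    const unsigned doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1; // [0, 365]
    const unsigned doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;           // [0, 146096]
    return era * 146097 + static_cast<long long>(doe) - 719468;
}

// Midnight UTC of the given date as seconds since the Unix epoch.
// ASSUMPTION: this is the unit the filters expect. Only meaningful for
// dates from 1970 up to the range of a 32-bit unsigned int.
unsigned int unixSeconds(int year, unsigned month, unsigned day) {
    return static_cast<unsigned int>(daysFromCivil(year, month, day) * 86400LL);
}

// Keeps items created strictly after min_time (exclusive bound is an assumption).
TimeCallback keepCreatedAfter(unsigned int min_time) {
    return [min_time](unsigned int created) { return created > min_time; };
}

// Keeps items created strictly before max_time (exclusive bound is an assumption).
TimeCallback keepCreatedBefore(unsigned int max_time) {
    return [max_time](unsigned int created) { return created < max_time; };
}
```

With these helpers, `keepCreatedAfter(unixSeconds(2020, 1, 1))` together with `keepCreatedBefore(unixSeconds(2021, 1, 1))` would keep only items created during 2020.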

◆ filterByMaxNodeNumber()

static doctotext::NewNodeCallback doctotext::StandardFilter::filterByMaxNodeNumber ( unsigned int max_nodes)

Parameters
    max_nodes
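This reference gives no description for filterByMaxNodeNumber. Judging by the name alone, it presumably stops accepting nodes once `max_nodes` have been seen; that reading is an assumption. A counting callback of that kind could be sketched as follows, where `CountingCallback` and `keepFirstNodes` are hypothetical names and none of this is DocWire code.

```cpp
#include <cassert>
#include <functional>
#include <memory>

// Hypothetical callback type: returns true to keep the next node.
using CountingCallback = std::function<bool()>;

// Builds a callback that accepts the first max_nodes calls and rejects the rest.
// The counter lives in a shared_ptr so that copies of the callback share one
// count instead of each starting again from zero.
CountingCallback keepFirstNodes(unsigned int max_nodes) {
    auto seen = std::make_shared<unsigned int>(0);
    return [seen, max_nodes]() {
        if (*seen >= max_nodes) return false;
        ++*seen;
        return true;
    };
}
```

Sharing the counter matters because `std::function` copies its target; with a plain by-value counter, every copy of the callback would enforce its own separate limit.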

The documentation for this class was generated from the following file: