DocWire DocToText - Powered by Silvercoders 5.0.5
A multifaceted data extraction software development toolkit that converts all sorts of files to plain text and HTML. Written in C++, it includes a parser able to convert PST and OST files, along with a new API for better file processing. DocToText can be integrated with other data mining and data analytics applications. It comes equipped with a high-grade, scriptable and trainable OCR engine that uses LSTM neural network based character recognition. The document parser can extract metadata and annotations, and supports the following formats: DOC, XLS, XLSB, PPT, RTF, ODF (ODT, ODS, ODP), OOXML (DOCX, XLSX, PPTX), iWork (PAGES, NUMBERS, KEYNOTE), ODFXML (FODP, FODS, FODT), PDF, EML, HTML, Outlook (PST, OST), Image (JPG, JPEG, JFIF, BMP, PNM, PNG, TIFF, WEBP) and DICOM (DCM).
doctotext::Parser Class Reference (abstract)

Abstract class for all parsers.

#include <parser.h>

Public Member Functions

 Parser (const std::shared_ptr< doctotext::ParserManager > &inParserManager=nullptr)
 
virtual void parse () const =0
 Executes text parsing.
 
virtual Parser & addOnNewNodeCallback (NewNodeCallback callback)
 Adds a function to execute when a new node is created. A node is a part of the parsed text; what it represents depends on the kind of parser, for example an email from a PST file or a page from a PDF file.
 
virtual Parser & withParameters (const ParserParameters &parameters)
 

Protected Member Functions

FormattingStyle getFormattingStyle () const
 Loads FormattingStyle from ParserParameters.
 
std::ostream & getLogOutStream () const
 
bool isVerboseLogging () const
 
Info sendTag (const std::string &tag_name, const std::string &text="", const std::map< std::string, std::any > &attributes={}) const
 
Info sendTag (const Info &info) const
 

Protected Attributes

std::shared_ptr< doctotext::ParserManager > m_parser_manager
 
ParserParameters m_parameters
 

Detailed Description

Abstract class for all parsers.

Examples
example_9.cpp.

Definition at line 129 of file parser.h.

Constructor & Destructor Documentation

◆ Parser()

doctotext::Parser::Parser ( const std::shared_ptr< doctotext::ParserManager > &  inParserManager = nullptr)
explicit
Parameters
inParserManager  parser manager containing all available parsers, which can be used recursively

Member Function Documentation

◆ addOnNewNodeCallback()

virtual Parser & doctotext::Parser::addOnNewNodeCallback ( NewNodeCallback  callback)
virtual

Adds a function to execute when a new node is created. A node is a part of the parsed text; what it represents depends on the kind of parser, for example an email from a PST file or a page from a PDF file.

Parameters
callback  function to execute
Returns
reference to self
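
Because the function returns a reference to self, registrations can be chained. The following is a minimal, self-contained sketch of that pattern and does not use the DocToText headers. SketchParser, NodeInfo, NodeCallback, emitNode and collectNodes are hypothetical stand-ins introduced for illustration only; the real NewNodeCallback and Info types are declared by the library and may differ.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins; these are NOT the doctotext types.
struct NodeInfo {
    std::string tag_name;
    std::string text;
};
using NodeCallback = std::function<void(NodeInfo&)>;

class SketchParser {
public:
    virtual ~SketchParser() = default;

    // Stores the callback and returns *this so that calls can be chained.
    virtual SketchParser& addOnNewNodeCallback(NodeCallback callback) {
        m_callbacks.push_back(std::move(callback));
        return *this;
    }

    // Passes one node to every registered callback, in registration order.
    void emitNode(const std::string& tag_name, const std::string& text) const {
        NodeInfo info{tag_name, text};
        for (const auto& callback : m_callbacks) {
            callback(info);
        }
    }

private:
    std::vector<NodeCallback> m_callbacks;
};

// Registers two chained callbacks, emits two nodes, and returns the
// collected node texts followed by the number of nodes counted.
inline std::vector<std::string> collectNodes() {
    std::vector<std::string> seen;
    int count = 0;
    SketchParser parser;
    parser.addOnNewNodeCallback([&seen](NodeInfo& info) { seen.push_back(info.text); })
          .addOnNewNodeCallback([&count](NodeInfo&) { ++count; });
    parser.emitNode("mail", "first email");
    parser.emitNode("mail", "second email");
    seen.push_back(std::to_string(count));
    return seen;
}
```

In this sketch each emitted node reaches both callbacks, so one callback can collect text while another keeps statistics.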

◆ getFormattingStyle()

FormattingStyle doctotext::Parser::getFormattingStyle ( ) const
protected

Loads FormattingStyle from ParserParameters.

Returns
The loaded FormattingStyle if one exists, otherwise the default FormattingStyle.
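
The "load if present, otherwise default" behaviour can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: SketchStyle, SketchParameters, loadStyle and the "formatting_style" key are hypothetical, and the parameters are modelled here as a plain map of std::any values.

```cpp
#include <any>
#include <map>
#include <string>

// Hypothetical stand-ins for FormattingStyle and ParserParameters.
struct SketchStyle {
    std::string list_prefix = " * ";
};
using SketchParameters = std::map<std::string, std::any>;

// Returns the stored style when the key exists and holds a SketchStyle,
// otherwise returns a default-constructed style.
inline SketchStyle loadStyle(const SketchParameters& parameters) {
    auto it = parameters.find("formatting_style");  // assumed key name
    if (it != parameters.end()) {
        if (const auto* style = std::any_cast<SketchStyle>(&it->second)) {
            return *style;
        }
    }
    return SketchStyle{};
}
```

The pointer form of std::any_cast returns nullptr on a type mismatch, so a wrongly typed parameter also falls back to the default instead of throwing.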

◆ parse()

virtual void doctotext::Parser::parse ( ) const
pure virtual

Executes text parsing.

Implemented in doctotext::ParserWrapper< ParserType >.
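
A pure virtual parse() implemented by a class template suggests an adapter: the template derives from the abstract base and forwards parse() to a wrapped concrete type. The sketch below shows that general pattern only. AbstractParser, Wrapper, UpperCaseParser, run and parseViaBase are hypothetical names, and the actual ParserWrapper may work differently.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical abstract base mirroring only the pure virtual parse().
class AbstractParser {
public:
    virtual ~AbstractParser() = default;
    virtual void parse() const = 0;
};

// Hypothetical concrete type with its own non-virtual entry point.
class UpperCaseParser {
public:
    UpperCaseParser(std::string input, std::string& output)
        : m_input(std::move(input)), m_output(output) {}

    // Writes an ASCII upper-cased copy of the input to the output string.
    void run() const {
        m_output.clear();
        for (char c : m_input) {
            m_output += (c >= 'a' && c <= 'z') ? static_cast<char>(c - 'a' + 'A') : c;
        }
    }

private:
    std::string m_input;
    std::string& m_output;
};

// Adapter template: implements parse() by delegating to the wrapped type.
template <typename ParserType>
class Wrapper : public AbstractParser {
public:
    explicit Wrapper(ParserType parser) : m_parser(std::move(parser)) {}
    void parse() const override { m_parser.run(); }

private:
    ParserType m_parser;
};

// Calls parse() through a pointer to the abstract base.
inline std::string parseViaBase(const std::string& input) {
    std::string output;
    std::unique_ptr<AbstractParser> parser =
        std::make_unique<Wrapper<UpperCaseParser>>(UpperCaseParser(input, output));
    parser->parse();
    return output;
}
```

Callers hold only the abstract interface, so concrete parser types can be added without changing the calling code.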

Member Data Documentation

◆ m_parameters

ParserParameters doctotext::Parser::m_parameters
protected

Definition at line 169 of file parser.h.

◆ m_parser_manager

std::shared_ptr<doctotext::ParserManager> doctotext::Parser::m_parser_manager
protected

Definition at line 168 of file parser.h.


The documentation for this class was generated from the following file: