DocWire DocToText - Powered by Silvercoders 5.0.5
A multifaceted data extraction software development toolkit that converts a wide range of file formats to plain text and HTML. Written in C++, it includes a parser for PST and OST files and a new API for file processing. DocToText can be integrated with other data mining and data analytics applications. It ships with a scriptable, trainable OCR engine that uses LSTM neural networks for character recognition. The document parser extracts metadata and annotations and supports the following formats: DOC, XLS, XLSB, PPT, RTF, ODF (ODT, ODS, ODP), OOXML (DOCX, XLSX, PPTX), iWork (PAGES, NUMBERS, KEYNOTE), ODFXML (FODP, FODS, FODT), PDF, EML, HTML, Outlook (PST, OST), Image (JPG, JPEG, JFIF, BMP, PNM, PNG, TIFF, WEBP) and DICOM (DCM).
Provides the basic mechanism to build any parser.
#include <parser_wrapper.h>


Public Member Functions | |
| std::unique_ptr< doctotext::Parser > | build (const std::string &inFileName) const override |
| Builds new parser object. | |
| std::unique_ptr< doctotext::Parser > | build (const char *buffer, size_t size) const override |
| Builds new parser object. | |
| doctotext::ParserBuilder & | withLogStream (std::ostream *log_stream) override |
| Sets log stream for parser. | |
| doctotext::ParserBuilder & | withVerboseLogging (bool verbose) override |
| Turns on/off verbose logging. | |
| doctotext::ParserBuilder & | withOnNewNodeCallbacks (const std::vector< doctotext::NewNodeCallback > &callbacks) override |
| Adds callback function. | |
| doctotext::ParserBuilder & | withParserManager (const std::shared_ptr< doctotext::ParserManager > &inParserManager) override |
| Sets parser manager. | |
| doctotext::ParserBuilder & | withParameters (const ParserParameters &inParameter) override |
| Sets parser parameters. | |
Public Member Functions inherited from doctotext::ParserBuilder | |
| virtual std::unique_ptr< Parser > | build (const std::string &inFileName) const =0 |
| Builds new parser object. | |
| virtual std::unique_ptr< Parser > | build (const char *buffer, size_t size) const =0 |
| Builds new parser object. | |
| virtual ParserBuilder & | withLogStream (std::ostream *log_stream)=0 |
| Sets log stream for parser. | |
| virtual ParserBuilder & | withVerboseLogging (bool verbose)=0 |
| Turns on/off verbose logging. | |
| virtual ParserBuilder & | withOnNewNodeCallbacks (const std::vector< NewNodeCallback > &callbacks)=0 |
| Adds callback function. | |
| virtual ParserBuilder & | withParserManager (const std::shared_ptr< ParserManager > &inParserManager)=0 |
| Sets parser manager. | |
| virtual ParserBuilder & | withParameters (const ParserParameters &inParameters)=0 |
| Sets parser parameters. | |
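`withOnNewNodeCallbacks` takes a vector of callbacks, but this page does not show the definition of `NewNodeCallback`. The sketch below illustrates the general idea of registering several callbacks and invoking each one as nodes are produced. `Node`, `NodeCallback`, `Emitter`, `withCallbacks`, and `emit` are hypothetical names, the callback signature is an assumption, and appending (rather than replacing) the callbacks is also an assumption based on the "Adds callback function" wording.

```cpp
#include <functional>
#include <string>
#include <vector>

namespace sketch {  // hypothetical stand-ins, not doctotext types

// Assumed shape of a parsed document node.
struct Node {
    std::string tag;
    std::string text;
};

// Assumed callback signature; the real NewNodeCallback may differ.
using NodeCallback = std::function<void(const Node&)>;

class Emitter {
public:
    // Appends the given callbacks; returns *this so calls can be chained.
    Emitter& withCallbacks(const std::vector<NodeCallback>& callbacks) {
        m_callbacks.insert(m_callbacks.end(), callbacks.begin(), callbacks.end());
        return *this;
    }

    // Invokes every registered callback, in registration order, for one node.
    void emit(const Node& node) const {
        for (const NodeCallback& cb : m_callbacks) {
            cb(node);
        }
    }

private:
    std::vector<NodeCallback> m_callbacks;
};

}  // namespace sketch
```

Passing the callbacks as a vector lets a caller attach independent consumers, for example one that collects text and one that counts nodes, without the producer knowing about either.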
Provides the basic mechanism to build any parser.
| ParserCreator | type of parser to build |
Definition at line 123 of file parser_wrapper.h.
|
inline |
Definition at line 126 of file parser_wrapper.h.
| std::unique_ptr< doctotext::Parser > build (const char *buffer, size_t size) const | inline override virtual |
Builds new parser object.
| buffer | raw data of file to be parsed |
| size | file size |
Implements doctotext::ParserBuilder.
Definition at line 147 of file parser_wrapper.h.
| std::unique_ptr< doctotext::Parser > build (const std::string &inFileName) const | inline override virtual |
Builds new parser object.
| inFileName | path to file |
Implements doctotext::ParserBuilder.
Definition at line 135 of file parser_wrapper.h.
| doctotext::ParserBuilder & withLogStream (std::ostream *log_stream) | inline override virtual |
Sets log stream for parser.
| log_stream |
Implements doctotext::ParserBuilder.
Definition at line 160 of file parser_wrapper.h.
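| doctotext::ParserBuilder & withOnNewNodeCallbacks (const std::vector< doctotext::NewNodeCallback > &callbacks) | inline override |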
Definition at line 174 of file parser_wrapper.h.
| doctotext::ParserBuilder & withParameters (const ParserParameters &inParameter) | inline override virtual |
Sets parser parameters.
| inParameters |
Implements doctotext::ParserBuilder.
Definition at line 188 of file parser_wrapper.h.
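| doctotext::ParserBuilder & withParserManager (const std::shared_ptr< doctotext::ParserManager > &inParserManager) | inline override |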
Definition at line 181 of file parser_wrapper.h.
| doctotext::ParserBuilder & withVerboseLogging (bool verbose) | inline override virtual |
Turns on/off verbose logging.
| verbose |
Implements doctotext::ParserBuilder.
Definition at line 167 of file parser_wrapper.h.