DocWire DocToText - Powered by Silvercoders 5.0.5
A multifaceted data extraction software development toolkit that converts all sorts of files to plain text and HTML. Written in C++, it has a parser able to convert PST and OST files, along with a brand new API for better file processing. DocToText can be integrated with other data mining and data analytics applications. It comes equipped with a high-grade, scriptable and trainable OCR with LSTM neural network based character recognition. The document parser extracts metadata along with annotations and supports the following formats: DOC, XLS, XLSB, PPT, RTF, ODF (ODT, ODS, ODP), OOXML (DOCX, XLSX, PPTX), iWork (PAGES, NUMBERS, KEYNOTE), ODFXML (FODP, FODS, FODT), PDF, EML, HTML, Outlook (PST, OST), Image (JPG, JPEG, JFIF, BMP, PNM, PNG, TIFF, WEBP) and DICOM (DCM).
importer.h
/***************************************************************************************************************************************************/
/* DocToText - A multifaceted, data extraction software development toolkit that converts all sorts of files to plain text and html. */
/* Written in C++, this data extraction tool has a parser able to convert PST & OST files along with a brand new API for better file processing. */
/* To enhance its utility, DocToText, as a data extraction tool, can be integrated with other data mining and data analytics applications. */
/* It comes equipped with a high grade, scriptable and trainable OCR that has LSTM neural networks based character recognition. */
/* */
/* This document parser is able to extract metadata along with annotations and supports a list of formats that include: */
/* DOC, XLS, XLSB, PPT, RTF, ODF (ODT, ODS, ODP), OOXML (DOCX, XLSX, PPTX), iWork (PAGES, NUMBERS, KEYNOTE), ODFXML (FODP, FODS, FODT), */
/* PDF, EML, HTML, Outlook (PST, OST), Image (JPG, JPEG, JFIF, BMP, PNM, PNG, TIFF, WEBP) and DICOM (DCM) */
/* */
/* Copyright (c) SILVERCODERS Ltd */
/* http://silvercoders.com */
/* */
/* Project homepage: */
/* http://silvercoders.com/en/products/doctotext */
/* https://www.docwire.io/ */
/* */
/* The GNU General Public License version 2 as published by the Free Software Foundation and found in the file COPYING.GPL permits */
/* the distribution and/or modification of this application. */
/* */
/* Please keep in mind that any attempt to circumvent the terms of the GNU General Public License by employing wrappers, pipelines, */
/* client/server protocols, etc. is illegal. You must purchase a commercial license if your program, which is distributed under a license */
/* other than the GNU General Public License version 2, directly or indirectly calls any portion of this code. */
/* Simply stop using the product if you disagree with this viewpoint. */
/* */
/* According to the terms of the license provided by SILVERCODERS and included in the file COPYING.COM, licensees in possession of */
/* a current commercial license for this product may use this file. */
/* */
/* This program is provided WITHOUT ANY WARRANTY, not even the implicit warranty of merchantability or fitness for a particular purpose. */
/* It is supplied in the hope that it will be useful. */
/***************************************************************************************************************************************************/

#ifndef IMPORTER_H
#define IMPORTER_H

#include <algorithm>
#include <istream>
#include <memory>
#include <string>

#include "parser.h"
#include "parser_builder.h"
#include "parser_manager.h"
#include "parser_parameters.h"
#include "defines.h"

namespace doctotext
{

// The Importer class. Imports a file and parses it using the available parsers.
class DllExport Importer
{
public:
    // Creates an Importer without input data.
    explicit Importer(const ParserParameters &parameters = ParserParameters(),
                      const std::shared_ptr<ParserManager> &parser_manager = std::make_shared<ParserManager>());

    // Creates an Importer for the file named file_name.
    Importer(const std::string &file_name,
             const ParserParameters &parameters = ParserParameters(),
             const std::shared_ptr<ParserManager> &parser_manager = std::make_shared<ParserManager>());

    // Creates an Importer for the given input stream.
    Importer(std::istream &input_stream,
             const ParserParameters &parameters = ParserParameters(),
             const std::shared_ptr<ParserManager> &parser_manager = std::make_shared<ParserManager>());

    Importer(const Importer &other);

    Importer& operator=(const Importer &other);

    virtual ~Importer();

    // Sets a new input stream to parse.
    void set_input_stream(std::istream &input_stream);

    // Checks whether the Importer contains valid input data (path to file or stream).
    bool is_valid() const;

    // Adds a callback. Callbacks execute when a parser returns a new node.
    void add_callback(const NewNodeCallback &callback);

    // Adds parser parameters.
    void add_parameters(const ParserParameters &parameters);

    // Starts the parsing process.
    void process() const;

private:
    class Implementation;
    std::unique_ptr<Implementation> impl;
};


} // namespace doctotext

#endif //IMPORTER_H
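The private section of the header forward-declares Implementation and holds it through a std::unique_ptr, which is the usual pimpl layout and is probably why the copy constructor, copy assignment operator and destructor are declared explicitly. The following is a minimal sketch of that layout, not the DocWire implementation: the class Job, its name() and rename() members, and the deep-copy behaviour are all assumptions made for illustration, because importer.cpp is not shown here.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for a pimpl class; this is NOT doctotext::Importer.
class Job
{
public:
    explicit Job(std::string name = "");
    Job(const Job &other);              // must be user-written: unique_ptr is not copyable
    Job &operator=(const Job &other);
    ~Job();                             // defined where Implementation is a complete type

    const std::string &name() const;
    void rename(const std::string &name);

private:
    class Implementation;               // only forward-declared in the "header" part
    std::unique_ptr<Implementation> impl;
};

// --- everything below would normally live in the .cpp file ---

class Job::Implementation
{
public:
    std::string name;
};

Job::Job(std::string name) : impl(new Implementation())
{
    impl->name = std::move(name);
}

// Deep copy (an assumption of this sketch): each Job owns its own Implementation.
Job::Job(const Job &other) : impl(new Implementation(*other.impl)) {}

Job &Job::operator=(const Job &other)
{
    if (this != &other)
        *impl = *other.impl;
    return *this;
}

// Defaulted here, after Implementation is complete, so unique_ptr can delete it.
Job::~Job() = default;

const std::string &Job::name() const { return impl->name; }

void Job::rename(const std::string &name) { impl->name = name; }
```

With this layout, copying a Job and renaming the copy leaves the original untouched, and client code that includes only the class declaration never sees the members of Implementation.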
The Importer class. This class is used to import a file and parse it using the available parsers.
Definition: importer.h

Importer(const ParserParameters &parameters = ParserParameters(), const std::shared_ptr<ParserManager> &parser_manager = std::make_shared<ParserManager>())
Importer(const std::string &file_name, const ParserParameters &parameters = ParserParameters(), const std::shared_ptr<ParserManager> &parser_manager = std::make_shared<ParserManager>())
Importer(std::istream &input_stream, const ParserParameters &parameters = ParserParameters(), const std::shared_ptr<ParserManager> &parser_manager = std::make_shared<ParserManager>())

void set_input_stream(std::istream &input_stream)
Sets a new input stream to parse.

bool is_valid() const
Checks whether the Importer contains valid input data (path to file or stream).

void add_callback(const NewNodeCallback &callback)
Adds a callback. Callbacks execute when a parser returns a new node.

void add_parameters(const ParserParameters &parameters)
Adds parser parameters.

void process() const
Starts the parsing process.

void disconnect_all()
Disconnects all listeners.

ParserParameters
Stores a list of parser parameters. Every parser can query ParserParameters for a specific parameter....
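The member descriptions suggest a simple call order: provide input, check is_valid(), register callbacks, then call process() so that each new node reaches the callbacks. The sketch below mirrors only that call order with stand-in types. Node, NodeCallback, LineImporter and collect_lines are hypothetical names, and treating each text line as one node is an assumption of the example; the real NewNodeCallback signature and node type are defined elsewhere in the library and are not shown in this header.

```cpp
#include <cassert>
#include <functional>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

namespace sketch
{

// Hypothetical node type; the real callback argument is not visible in importer.h.
struct Node
{
    std::string text;
};

using NodeCallback = std::function<void(const Node &)>;

// Hypothetical stand-in that copies the shape of Importer's public interface.
class LineImporter
{
public:
    LineImporter() : input(nullptr) {}
    explicit LineImporter(std::istream &input_stream) : input(&input_stream) {}

    void set_input_stream(std::istream &input_stream) { input = &input_stream; }

    // Valid only once an input stream has been supplied.
    bool is_valid() const { return input != nullptr; }

    void add_callback(const NodeCallback &callback) { callbacks.push_back(callback); }

    // Treats every line as one node and passes it to each registered callback.
    void process() const
    {
        if (!is_valid())
            return;
        std::string line;
        while (std::getline(*input, line))
        {
            Node node{line};
            for (const NodeCallback &callback : callbacks)
                callback(node);
        }
    }

private:
    std::istream *input;
    std::vector<NodeCallback> callbacks;
};

// Collects the text of every node produced from the given stream.
inline std::vector<std::string> collect_lines(std::istream &input_stream)
{
    std::vector<std::string> lines;
    LineImporter importer;                    // starts without input
    importer.set_input_stream(input_stream);  // now is_valid() returns true
    importer.add_callback([&lines](const Node &node) { lines.push_back(node.text); });
    importer.process();
    return lines;
}

} // namespace sketch
```

Registering callbacks before calling process() keeps the consumer decoupled from the parser: the same stand-in importer can feed a plain-text collector, a counter, or both, without changing the parsing loop.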