DocWire DocToText - Powered by Silvercoders 5.0.5
DocToText is a data extraction software development toolkit, written in C++, that converts many kinds of files to plain text and HTML. It includes a parser for PST and OST files and a new API for file processing, and it can be integrated with other data mining and data analytics applications. It comes with a scriptable, trainable OCR engine whose character recognition is based on LSTM neural networks. The document parser extracts metadata and annotations and supports the following formats: DOC, XLS, XLSB, PPT, RTF, ODF (ODT, ODS, ODP), OOXML (DOCX, XLSX, PPTX), iWork (PAGES, NUMBERS, KEYNOTE), ODFXML (FODP, FODS, FODT), PDF, EML, HTML, Outlook (PST, OST), Image (JPG, JPEG, JFIF, BMP, PNM, PNG, TIFF, WEBP) and DICOM (DCM).
example_5.cpp


#include <iostream>
#include <memory>
#include <string>
#include "parser.h"
#include "parser_builder.h"
#include "plain_text_writer.h"

int main(int argc, char* argv[])
{
    if (argc < 2) // the input file path is required as the first argument
    {
        std::cerr << "Usage: example_5 <file>" << std::endl;
        return 1;
    }
    doctotext::ParserManager parser_manager; // create the parser manager (loads the available parsers)
    std::string path = argv[1];
    auto parser_builder = parser_manager.findParserByExtension(path); // look up the parser builder by file extension
    auto plain_text_writer = std::make_shared<doctotext::PlainTextWriter>(); // create a plain text writer
    plain_text_writer->write_header(std::cout); // write the header to the output stream
    if (parser_builder) // a parser builder was found for this extension
    {
        (*parser_builder)->build(path) // build the parser for the file
            ->addOnNewNodeCallback([&plain_text_writer](doctotext::Info &info) // register a callback invoked for each new node
                {
                    plain_text_writer->write_to(info, std::cout); // write the node to the output stream
                })
            .parse(); // start the parsing process
    }
    plain_text_writer->write_footer(std::cout); // write the footer to the output stream
    return 0;
}
doctotext::ParserManager
Parser manager class. Loads all available parsers and provides access to them.
std::optional< ParserBuilder * > findParserByExtension(const std::string &file_name) const
Returns the parser builder for the extension of the given file name, or std::nullopt if no parser is found.
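
The optional-returning lookup described above can be illustrated without the library itself. The following is a minimal sketch, not the DocToText implementation: StubBuilder, StubRegistry, findByExtension and describe are hypothetical stand-ins, and the case-insensitive extension matching is an assumption made for the example. It only shows how a caller can test the returned std::optional before dereferencing it.

```cpp
#include <algorithm>
#include <cctype>
#include <map>
#include <optional>
#include <string>

// Hypothetical stand-in for a parser builder; not the DocToText type.
struct StubBuilder
{
    std::string format_name;
};

// Hypothetical registry that mimics the shape of findParserByExtension.
class StubRegistry
{
public:
    // Registers a builder under a lower-case extension such as "pdf".
    void add(const std::string &extension, StubBuilder *builder)
    {
        builders_[extension] = builder;
    }

    // Returns the builder registered for the file's extension, or std::nullopt.
    std::optional<StubBuilder *> findByExtension(const std::string &file_name) const
    {
        const auto dot = file_name.find_last_of('.');
        if (dot == std::string::npos)
            return std::nullopt; // no extension at all
        std::string ext = file_name.substr(dot + 1);
        // Assumption for this sketch: extensions are matched case-insensitively.
        std::transform(ext.begin(), ext.end(), ext.begin(),
                       [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
        const auto it = builders_.find(ext);
        if (it == builders_.end())
            return std::nullopt; // extension not registered
        return it->second;
    }

private:
    std::map<std::string, StubBuilder *> builders_;
};

// Checks the optional before dereferencing, as the example above does
// with `if (parser_builder)`.
std::string describe(const StubRegistry &registry, const std::string &file_name)
{
    if (auto builder = registry.findByExtension(file_name))
        return (*builder)->format_name;
    return "unsupported";
}
```

Returning std::optional makes the "no parser found" case explicit in the type, so the caller has to decide what to do with an unsupported file instead of dereferencing a null pointer by accident.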