DocWire DocToText - Powered by Silvercoders 5.0.5
A multifaceted data extraction software development toolkit that converts many kinds of files to plain text and HTML. Written in C++, it includes a parser able to convert PST and OST files, along with a brand new API for better file processing. DocToText can be integrated with other data mining and data analytics applications. It comes equipped with a high-grade, scriptable and trainable OCR that uses LSTM neural network-based character recognition. The document parser can extract metadata along with annotations and supports the following formats: DOC, XLS, XLSB, PPT, RTF, ODF (ODT, ODS, ODP), OOXML (DOCX, XLSX, PPTX), iWork (PAGES, NUMBERS, KEYNOTE), ODFXML (FODP, FODS, FODT), PDF, EML, HTML, Outlook (PST, OST), Image (JPG, JPEG, JFIF, BMP, PNM, PNG, TIFF, WEBP) and DICOM (DCM).
|
| ▼ std::exception | |
| ▼ doctotext::Exception | |
| doctotext::EncryptedFileException | |
| ▼ doctotext::Exporter | The Exporter class is responsible for exporting parsed data from an importer or transformer to an output stream |
| doctotext::HtmlExporter | Exporter class for HTML output |
| doctotext::MetaDataExporter | Exporter class for metadata. Important: exports only metadata, as plain text |
| doctotext::PlainTextExporter | Exporter class for plain text output |
| doctotext::FormattingStyle | |
| doctotext::Importer | The Importer class. This class is used to import a file and parse it using the available parsers |
| doctotext::Info | |
| doctotext::ListStyle | |
| doctotext::Metadata | |
| ▼ doctotext::Parser | Abstract class for all parsers |
| CustomParser | |
| doctotext::ParserWrapper< ParserType > | |
| doctotext::parser_creator< ParserType > | |
| ▼ doctotext::ParserBuilder | |
| CustomParserBuilder | |
| doctotext::ParserBuilderWrapper< ParserCreator > | Provides the basic mechanism to build any parser |
| doctotext::ParserManager | Parser manager class. Loads all available parsers and provides access to them |
| doctotext::ParserParameters | Stores a list of parser parameters. Every parser can query ParserParameters for a specific parameter; for example, OCRParser queries ParserParameters for a language. Every parser contains a ParserParameters object and recursively passes it on to other parsers |
| ▼ doctotext::ParserProvider | The ParserProvider class |
| CustomParserProvider | |
| doctotext::ParsingChain | The ParsingChain class is a wrapper for all defined steps of the parsing process |
| doctotext::SimpleExtractor | Basic functionality for extracting text from a document |
| doctotext::StandardFilter | Set of standard filters to use in parsers |
| doctotext::StandardTag | Contains a set of basic tags used in parsers |
| ▼ doctotext::Transformer | The Transformer transforms data from an Importer or from another Transformer |
| doctotext::TransformerFunc | Wraps a single function (doctotext::NewNodeCallback) in a Transformer object |
| doctotext::wrapper_parser_creator< ParserType > | |
| ▼ Writer | |
| doctotext::HtmlWriter | The HtmlWriter class |
| doctotext::MetaDataWriter | Writes the metadata of the document as plain text to an output stream |
| doctotext::PlainTextWriter | |
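
The descriptions above suggest a three-stage flow: an importer parses a file, zero or more transformers modify the parsed data, and an exporter writes the result to an output stream. The following is a minimal, self-contained sketch of that idea only. It does not use the DocToText headers, and every name in it (the `sketch` namespace, `Node`, `import_lines`, `apply_transformer`, `export_plain_text`, `export_html`, `to_upper`, `demo`) is a hypothetical stand-in invented for illustration, not part of the library's API.

```cpp
#include <cassert>
#include <cctype>
#include <functional>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Toy stand-ins only: none of these types are the real doctotext classes.
namespace sketch {

// One parsed unit of a document (hypothetical shape).
struct Node {
    std::string tag;   // e.g. "p" for a paragraph
    std::string text;  // text content of the unit
};

// "Importer" role: turns a source into nodes. Here each input line
// becomes one paragraph node; a real importer would select a parser.
inline std::vector<Node> import_lines(const std::string& source) {
    std::vector<Node> nodes;
    std::istringstream in(source);
    std::string line;
    while (std::getline(in, line)) {
        nodes.push_back(Node{"p", line});
    }
    return nodes;
}

// "Transformer" role: a single function applied to every node, in the
// spirit of wrapping one callback into a transformer object.
using Transformer = std::function<void(Node&)>;

inline std::vector<Node> apply_transformer(std::vector<Node> nodes,
                                           const Transformer& fn) {
    for (auto& node : nodes) {
        fn(node);
    }
    return nodes;
}

// "Exporter" role, plain text flavour: one line of text per node.
inline void export_plain_text(const std::vector<Node>& nodes, std::ostream& out) {
    for (const auto& node : nodes) {
        out << node.text << '\n';
    }
}

// "Exporter" role, HTML flavour. No escaping is done; this is a sketch.
inline void export_html(const std::vector<Node>& nodes, std::ostream& out) {
    for (const auto& node : nodes) {
        out << '<' << node.tag << '>' << node.text << "</" << node.tag << ">\n";
    }
}

// Example transformer: upper-cases the text of a node in place.
inline void to_upper(Node& node) {
    for (char& c : node.text) {
        c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    }
}

// Chains the three roles and checks the result.
inline void demo() {
    std::ostringstream out;
    export_html(apply_transformer(import_lines("a\nb"), to_upper), out);
    assert(out.str() == "<p>A</p>\n<p>B</p>\n");
}

}  // namespace sketch
```

Calling `sketch::demo()` from a `main` function exercises the whole chain. Swapping `export_html` for `export_plain_text` changes only the last stage, which mirrors why the hierarchy lists several exporters under one common base class.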