DocWire DocToText - Powered by Silvercoders 5.0.5
A data extraction software development toolkit that converts many kinds of files to plain text and HTML. Written in C++, it includes a parser for PST and OST files and a new API for file processing. DocToText can be integrated with other data mining and data analytics applications. It ships with a scriptable, trainable OCR engine that uses LSTM neural networks for character recognition. The parser extracts metadata and annotations and supports the following formats: DOC, XLS, XLSB, PPT, RTF, ODF (ODT, ODS, ODP), OOXML (DOCX, XLSX, PPTX), iWork (PAGES, NUMBERS, KEYNOTE), ODFXML (FODP, FODS, FODT), PDF, EML, HTML, Outlook (PST, OST), Image (JPG, JPEG, JFIF, BMP, PNM, PNG, TIFF, WEBP) and DICOM (DCM).
Public Types | |
| enum | DataType { NONE, EXTRACTED } |
Public Member Functions | |
| | Metadata (const Metadata &r) |
| Metadata & | operator= (const Metadata &r) |
| DataType | authorType () |
| void | setAuthorType (DataType type) |
| const char * | author () |
| void | setAuthor (const std::string &author) |
| DataType | creationDateType () |
| void | setCreationDateType (DataType type) |
| const tm & | creationDate () |
| void | setCreationDate (const tm &creation_date) |
| DataType | lastModifiedByType () |
| void | setLastModifiedByType (DataType type) |
| const char * | lastModifiedBy () |
| void | setLastModifiedBy (const std::string &last_modified_by) |
| DataType | lastModificationDateType () |
| void | setLastModificationDateType (DataType type) |
| const tm & | lastModificationDate () |
| void | setLastModificationDate (const tm &last_modification_date) |
| DataType | pageCountType () |
| void | setPageCountType (DataType type) |
| int | pageCount () |
| void | setPageCount (int page_count) |
| DataType | wordCountType () |
| void | setWordCountType (DataType type) |
| int | wordCount () |
| void | setWordCount (int word_count) |
| void | addField (const std::string &field_name, const Variant &field_value) |
| bool | hasField (const std::string &field_name) const |
| const Variant & | getField (const std::string &field_name) const |
| const std::map< std::string, Variant > & | getFields () const |
| const std::map< std::string, std::any > | getFieldsAsAny () const |
Definition at line 44 of file metadata.h.
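The field-related members listed above (addField, hasField, getField, getFields) suggest a name-to-value map for metadata that has no dedicated accessor. The following is a minimal sketch of that access pattern, not the library's implementation: `FieldStore`, `FieldValue` and `fieldAsText` are hypothetical stand-ins, `FieldValue` only approximates the library's `Variant` type, and the behaviour of the real `getField` for a missing name is not documented here, so the sketch checks `hasField` first.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <variant>

// Hypothetical stand-in for the library's Variant type (assumption:
// a value that can hold either text or a number).
using FieldValue = std::variant<std::string, long>;

// Hypothetical stand-in for the field-related part of Metadata.
// The member names mirror the listing; the bodies are illustrative only.
class FieldStore {
public:
    void addField(const std::string& field_name, const FieldValue& field_value) {
        fields_[field_name] = field_value;  // replaces any earlier value for this name
    }
    bool hasField(const std::string& field_name) const {
        return fields_.find(field_name) != fields_.end();
    }
    const FieldValue& getField(const std::string& field_name) const {
        return fields_.at(field_name);  // in this sketch: throws std::out_of_range if absent
    }
    const std::map<std::string, FieldValue>& getFields() const { return fields_; }

private:
    std::map<std::string, FieldValue> fields_;
};

// Returns a field rendered as text, or the fallback when the field is absent.
std::string fieldAsText(const FieldStore& store, const std::string& name,
                        const std::string& fallback) {
    if (!store.hasField(name)) return fallback;
    const FieldValue& value = store.getField(name);
    if (const std::string* text = std::get_if<std::string>(&value)) return *text;
    return std::to_string(std::get<long>(value));
}
```

Guarding every lookup with `hasField` keeps the caller independent of how a missing name is reported, which matters here because that detail is an assumption in the sketch.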
| enum doctotext::Metadata::DataType |
Definition at line 51 of file metadata.h.
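Each typed property in the listing is paired with a DataType accessor (for example author() with authorType()). A plausible reading, stated here as an assumption, is that NONE means no value is available and EXTRACTED means the value was read from the document. The sketch below illustrates checking the type before trusting the value. `AuthorInfo` and `describeAuthor` are hypothetical stand-ins rather than library code, the getters are marked const for convenience although the listing does not show that, and whether the real setAuthor also updates the type is not stated, so the sketch sets it explicitly.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in that mirrors four of the listed members.
class AuthorInfo {
public:
    enum DataType { NONE, EXTRACTED };

    DataType authorType() const { return author_type_; }
    void setAuthorType(DataType type) { author_type_ = type; }
    const char* author() const { return author_.c_str(); }
    void setAuthor(const std::string& author) { author_ = author; }

private:
    DataType author_type_ = NONE;  // assumption: a fresh object carries no author
    std::string author_;
};

// Formats the author only when the type says a value was extracted.
std::string describeAuthor(const AuthorInfo& info) {
    if (info.authorType() == AuthorInfo::EXTRACTED) {
        return std::string("author: ") + info.author();
    }
    return "author: (not available)";
}
```

The same check would apply to the other pairs in the listing, such as creationDate() with creationDateType() or pageCount() with pageCountType().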