Dedoc
Dedoc is an open-source library/service that extracts texts, tables, attached files and document structure (e.g., titles, list items, etc.) from files of various formats.
Dedoc
supports DOCX
, XLSX
, PPTX
, EML
, HTML
, PDF
, images and more.
Full list of supported formats can be found here.