Dedoc
Dedoc is an open-source library and service that extracts text, tables, attached files, and document structure (e.g., titles and list items) from files in various formats.
Dedoc supports DOCX, XLSX, PPTX, EML, HTML, PDF, images and more.
The full list of supported formats can be found here.
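
As a rough illustration of the format list above, the sketch below checks a file name against the formats named in this section before the file is handed to Dedoc. It uses only the Python standard library and does not call Dedoc itself. The helper name `is_named_format` and the image extensions are assumptions made for illustration, not part of Dedoc's API, and the sets are deliberately incomplete.

```python
from pathlib import Path

# Extensions for the formats named explicitly in this section.
NAMED_EXTENSIONS = {".docx", ".xlsx", ".pptx", ".eml", ".html", ".pdf"}

# Assumption: the text only says "images", so these extensions are a guess
# at common image types, not a list taken from Dedoc.
ASSUMED_IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg"}


def is_named_format(path: str) -> bool:
    """Return True if the file extension matches a format named above.

    Hypothetical helper: it inspects the file name only and never opens
    the file, so it says nothing about whether the content is valid.
    """
    suffix = Path(path).suffix.lower()  # case-insensitive comparison
    return suffix in NAMED_EXTENSIONS or suffix in ASSUMED_IMAGE_EXTENSIONS


if __name__ == "__main__":
    for name in ["report.DOCX", "scan.jpeg", "archive.xyz", "README"]:
        print(name, is_named_format(name))
```

A check like this is only a cheap pre-filter. Dedoc supports more formats than the ones listed here, so the full list remains the authority on what can be parsed.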