docconv
Converts PDF, DOC, DOCX, XML, HTML, RTF, etc. to plain text.
...You can now add -tags ocr to any go command when building, fetching, or testing docconv to include support for processing images.

Documents can be sent as a multipart POST request; the plain text (body) and meta information are then returned as a JSON object.
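
The request/response flow described above can be sketched in Go roughly as follows. This is a minimal illustration, not docconv's own client code: the form field name "input", the JSON keys "body" and "meta", and the helper names buildUpload, parseResponse, and convert are all assumptions made for the example. A local stand-in server replaces the real conversion service so the sketch runs on its own.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"mime/multipart"
	"net/http"
	"net/http/httptest"
)

// Response mirrors the JSON object described above.
// The keys "body" and "meta" are assumptions for this sketch.
type Response struct {
	Body string            `json:"body"`
	Meta map[string]string `json:"meta"`
}

// buildUpload encodes content as a single file part of a multipart form
// and returns the encoded payload plus the matching Content-Type header.
func buildUpload(field, filename string, content []byte) ([]byte, string, error) {
	var buf bytes.Buffer
	w := multipart.NewWriter(&buf)
	part, err := w.CreateFormFile(field, filename)
	if err != nil {
		return nil, "", err
	}
	if _, err := part.Write(content); err != nil {
		return nil, "", err
	}
	// Close writes the trailing boundary; the payload is invalid without it.
	if err := w.Close(); err != nil {
		return nil, "", err
	}
	return buf.Bytes(), w.FormDataContentType(), nil
}

// parseResponse decodes the JSON reply into a Response.
func parseResponse(data []byte) (Response, error) {
	var r Response
	err := json.Unmarshal(data, &r)
	return r, err
}

// convert posts a document to url and returns the decoded reply.
// The field name "input" is an assumption, not a confirmed API detail.
func convert(url, filename string, content []byte) (Response, error) {
	payload, ctype, err := buildUpload("input", filename, content)
	if err != nil {
		return Response{}, err
	}
	resp, err := http.Post(url, ctype, bytes.NewReader(payload))
	if err != nil {
		return Response{}, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return Response{}, fmt.Errorf("unexpected status: %s", resp.Status)
	}
	data, err := io.ReadAll(resp.Body)
	if err != nil {
		return Response{}, err
	}
	return parseResponse(data)
}

func main() {
	// Stand-in server: echoes the uploaded file back as "body" so the
	// example runs without a real conversion service.
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		f, hdr, err := r.FormFile("input")
		if err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		defer f.Close()
		text, _ := io.ReadAll(f)
		json.NewEncoder(w).Encode(Response{
			Body: string(text),
			Meta: map[string]string{"filename": hdr.Filename},
		})
	}))
	defer srv.Close()

	res, err := convert(srv.URL, "note.txt", []byte("hello"))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(res.Body, res.Meta)
}
```

To try this against a real service, replace srv.URL with the service's conversion endpoint and adjust the field name and JSON keys to whatever that service actually uses.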