Calibre is far from ideal, so I wonder whether there is a better way to convert a PDF into an EPUB. Maybe a new AI tool exists for that purpose? What do you use?

Cc @nostupidquestions@lemmy.world

  • Em Adespoton@lemmy.ca
    1 year ago

    You’ve identified the main issue: PDF extraction. A PDF can lay out pages in an infinite number of ways.

    My personal workflow is to take a PDF, run it through ClearType OCR, and save it as a web-friendly, accessibility-standard-compliant PDF. That step extracts all the text and re-lays it out so a screen reader can read it in the correct order.

    After that, it’s a matter of exporting the PDF to HTML, chunking it, zipping the results with a CSS file and a manifest, and you’ve got an ePub.
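    For anyone curious what that last step looks like, here is a rough sketch using only the Python standard library. It assumes the exported HTML body is already well-formed XHTML and that chapters start at <h1> tags, which real PDF exports often won't satisfy without cleanup. The names split_chapters and build_epub, the OEBPS/ folder, and the chapNNN.xhtml file names are my own picks for illustration, not anything a particular tool produces.

```python
import re
import uuid
import zipfile
from datetime import datetime, timezone
from html import escape

# Fixed pointer file that tells the reader where the manifest (OPF) lives.
CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""

XHTML_PAGE = """<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
<head><title>{title}</title>
<link rel="stylesheet" type="text/css" href="style.css"/></head>
<body>{body}</body>
</html>
"""


def split_chapters(html_body):
    """Chunk the exported HTML: one piece per <h1> (assumed chapter marker)."""
    parts = re.split(r"(?=<h1[\s>])", html_body)
    return [p.strip() for p in parts if p.strip()]


def build_epub(path, title, html_body, css="body { line-height: 1.4; }"):
    """Zip the chunks, a CSS file, and a manifest into a minimal EPUB 3."""
    chapters = split_chapters(html_body)
    names = [f"chap{i:03d}.xhtml" for i in range(1, len(chapters) + 1)]

    items = "\n    ".join(
        f'<item id="c{i}" href="{n}" media-type="application/xhtml+xml"/>'
        for i, n in enumerate(names, 1)
    )
    spine = "".join(f'<itemref idref="c{i}"/>' for i in range(1, len(names) + 1))
    modified = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

    opf = f"""<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="bookid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="bookid">urn:uuid:{uuid.uuid4()}</dc:identifier>
    <dc:title>{escape(title)}</dc:title>
    <dc:language>en</dc:language>
    <meta property="dcterms:modified">{modified}</meta>
  </metadata>
  <manifest>
    <item id="nav" href="nav.xhtml" media-type="application/xhtml+xml" properties="nav"/>
    <item id="css" href="style.css" media-type="text/css"/>
    {items}
  </manifest>
  <spine>{spine}</spine>
</package>
"""
    # EPUB 3 expects a navigation document; chapter labels here are placeholders.
    links = "".join(
        f'<li><a href="{n}">Chapter {i}</a></li>' for i, n in enumerate(names, 1)
    )
    nav = XHTML_PAGE.format(
        title=escape(title), body=f'<nav epub:type="toc"><ol>{links}</ol></nav>'
    )

    with zipfile.ZipFile(path, "w", zipfile.ZIP_DEFLATED) as zf:
        # "mimetype" has to be the first entry and stored uncompressed.
        zf.writestr("mimetype", "application/epub+zip",
                    compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/container.xml", CONTAINER_XML)
        zf.writestr("OEBPS/content.opf", opf)
        zf.writestr("OEBPS/nav.xhtml", nav)
        zf.writestr("OEBPS/style.css", css)
        for name, body in zip(names, chapters):
            page = XHTML_PAGE.format(title=escape(title), body=body)
            zf.writestr(f"OEBPS/{name}", page)
    return names
```

    Calling build_epub("book.epub", "My Book", exported_body) writes the file and returns the chapter file names. The one detail that trips people up is the mimetype entry: it has to come first in the zip and be uncompressed, otherwise some readers reject the file.
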

    And of course, there are Python libraries to do a lot of the conversion as well.