Getting a good text-to-speech experience from PDF files is hard, and it gets twice as hard once you add the mobile form factor to the equation.
The technical roots of the PDF format are what make this difficult. Its primary purpose was to ensure that printed material is reproduced accurately, whether printed or displayed on screen, on every device. That should come as no surprise, as the format was invented by Adobe, a company with deep roots in printing. In many ways, PDF is an evolution of Adobe's PostScript printing technology.
It does this job very well: you always get the same page layout no matter what device you use, and whether you print the page or display it on screen.
But when it comes to the text flow, it becomes clear that this was not the format's primary purpose. Some of the most notable problems are:
- A PDF may contain little more than a jumble of characters, each with coordinates specifying exactly where to print it. As a result, even the simplest things, like words, may not be defined in the file, and extracting them may require fairly sophisticated artificial intelligence.
- Because text is laid out by coordinates, its order inside the PDF file may have nothing to do with its logical reading order. Again, getting this right takes quite a bit of artificial intelligence.
- All textual content is treated the same. From a purely digital perspective, headers, footers, and footnotes are no different from regular text. In general, there is also no logical connection between consecutive pages, and it is up to the text-to-speech app to figure out whether the next page continues the current sentence, paragraph, or chapter, or starts a completely new section. Again, only advanced artificial intelligence can detect these cases.
- Because the iPhone screen is fairly small, an additional problem is that the PDF format defines only fixed pages, which are generally too big to display on it. The alternative is to display the text extracted from the document, but that makes the quality of the extraction even more important.
- The text may be just an image of the page. The app can use AI to recognize the text inside the page image. However, such files may also contain an invisible textual layer, which is helpful if it is accurate, but in many files it is not. Unfortunately, there is no reasonable way to estimate its accuracy, so the “garbage” text encoded in the file ends up being imported.
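To make the first two problems more concrete, here is a minimal Python sketch of the kind of coordinate-based heuristic an app might start from. It assumes the app already has a list of characters with their page positions; the `Glyph` type and the `line_tolerance` and `gap_factor` thresholds are hypothetical and not taken from any particular app or PDF library. It ignores the order in which characters are stored, groups them into lines by baseline, and splits words wherever the horizontal gap is wider than a fraction of the character width.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    """A single character as a PDF might position it (hypothetical type)."""
    char: str
    x: float      # left edge of the character
    y: float      # baseline; in PDF user space, larger y is higher on the page
    width: float


def extract_lines(glyphs, line_tolerance=2.0, gap_factor=0.3):
    """Rebuild lines of words from positioned glyphs.

    line_tolerance: max baseline difference for glyphs on the same line.
    gap_factor: a gap wider than gap_factor * glyph width starts a new word.
    """
    # Sort top-to-bottom, then left-to-right, ignoring the order in the file.
    ordered = sorted(glyphs, key=lambda g: (-g.y, g.x))
    lines = []
    for g in ordered:
        if lines and abs(lines[-1][0].y - g.y) <= line_tolerance:
            lines[-1].append(g)
        else:
            lines.append([g])

    result = []
    for line in lines:
        line.sort(key=lambda g: g.x)  # baselines may jitter, so re-sort by x
        words, current = [], line[0].char
        for prev, g in zip(line, line[1:]):
            gap = g.x - (prev.x + prev.width)
            if gap > gap_factor * prev.width:
                words.append(current)  # large gap: the previous word ended
                current = g.char
            else:
                current += g.char
        words.append(current)
        result.append(" ".join(words))
    return result


def word_glyphs(text, x, y, w=5.0):
    """Helper that lays out a word as fixed-width glyphs (for the example)."""
    return [Glyph(c, x + i * w, y, w) for i, c in enumerate(text)]


# Glyphs stored out of reading order, as they might be in a content stream.
glyphs = (word_glyphs("world", 40, 700)
          + word_glyphs("Hello", 5, 700)
          + word_glyphs("Bye", 5, 680))
print(extract_lines(glyphs))  # ['Hello world', 'Bye']
```

Real documents (multiple columns, rotated text, tables, uneven kerning) quickly defeat fixed thresholds like these, which is why the apps turn to AI rather than hand-tuned rules.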
As text-to-speech depends on the text flow, all of this creates a big barrier to supporting the format. There is no perfect solution: by definition, artificial intelligence can make wrong decisions, so at least in theory some content may not be detected properly. However, good AI will always bring more benefits than problems. In fact, as the points above show, it is very hard, or even impossible, to use text-to-speech to read PDFs unless the app is AI-enabled.
When it comes to these AI functions, some are well supported by all popular iPhone text-to-speech apps, while others are not, or require excessive payments:
- Basic text extraction (like recognizing words) – you can expect fairly good results from all apps, partly because Apple already builds tools into the device that deliver excellent results.
- Recognizing headers and footers – less popular apps may struggle with this, but all popular apps should be able to handle it. However, some of them make it a feature of a yearly subscription (as NaturalReader does), in which case you need to pay at least $50/year just for this.
- OCR – if a PDF file contains only images of pages, it has no textual information, and AI tools are needed to extract the text.
- Skipping footnotes – currently, the only app able to do this is Speech Central.
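One common starting point for header and footer recognition is a repetition heuristic: a line that shows up at the top or bottom of many pages, once page numbers are ignored, is probably a header or footer. The Python sketch below illustrates that idea under stated assumptions; it is not how any of the apps mentioned here actually work, and the `edge_lines` and `min_fraction` parameters are hypothetical. Once the repeated lines are gone, the text of consecutive pages can be joined so a sentence that crosses a page break is read as one sentence.

```python
import re
from collections import Counter


def normalize(line):
    """Replace digit runs so 'Page 3' and 'Page 4' compare as equal."""
    return re.sub(r"\d+", "#", line.strip())


def strip_headers_footers(pages, edge_lines=2, min_fraction=0.5):
    """Remove lines that repeat near the top or bottom of many pages.

    pages: list of pages, each a list of text lines in reading order.
    edge_lines: how many lines at each edge of a page are candidates.
    min_fraction: a candidate must appear on at least this share of pages.
    """
    counts = Counter()
    for lines in pages:
        edges = lines[:edge_lines] + lines[-edge_lines:]
        # Count each normalized candidate at most once per page.
        counts.update({normalize(line) for line in edges})
    threshold = max(2, min_fraction * len(pages))
    repeated = {text for text, n in counts.items() if n >= threshold}

    cleaned = []
    for lines in pages:
        n = len(lines)
        cleaned.append([
            line for i, line in enumerate(lines)
            # Drop a line only if it sits at an edge AND repeats across pages.
            if not ((i < edge_lines or i >= n - edge_lines)
                    and normalize(line) in repeated)
        ])
    return cleaned


pages = [
    ["A Study of Speech", "Chapter 1 begins here.", "It continues on", "Page 1"],
    ["A Study of Speech", "the next page.", "Page 2"],
    ["A Study of Speech", "Chapter 2 starts.", "Page 3"],
]
cleaned = strip_headers_footers(pages)
# Join pages so the sentence split by the page break reads as one.
text = " ".join(line for page in cleaned for line in page)
print(text)
```

A rule like this misfires on short documents or on body text that happens to repeat, which is one reason less popular apps may struggle with headers and footers.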
Another feature you may get from premium apps, which can be important when reading some PDF books (though it isn’t limited to the PDF format), is intelligent skipping of “junk” content, such as citations in scientific works. The situation here is similar to recognizing headers and footers: all popular apps support it. However, every app except Speech Central puts it behind a paywall, and that paywall may be significant (Speechify $140/year, NaturalReader $50/year, Voice Dream Reader $60/year).
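As a rough illustration of what skipping citations involves, the Python sketch below removes two common inline citation styles with regular expressions before the text reaches a speech engine. The patterns are assumptions chosen for this example; they are far less “intelligent” than what the paragraph above describes and are not the method used by any of the apps named here. Ordinary parentheses and brackets without digits or author names are left alone.

```python
import re

# Hypothetical patterns for two common citation styles:
#   numeric:      [12], [3, 5], [1-4]
#   author-year:  (Smith, 2020), (Smith et al., 2019; Lee, 2021)
NUMERIC = r"\[\d+(?:\s*[-–,]\s*\d+)*\]"
AUTHOR_YEAR = r"\((?:[A-Z][A-Za-z'-]+(?: et al\.)?,? \d{4}[a-z]?(?:;\s*)?)+\)"
# Include leading whitespace so "word [3]." becomes "word." rather than "word ."
CITATION = re.compile(r"\s*(?:" + NUMERIC + "|" + AUTHOR_YEAR + ")")


def skip_citations(text):
    """Remove inline citations so a TTS engine does not read them aloud."""
    return CITATION.sub("", text)


sample = ("Speech synthesis has improved [3, 5]. "
          "Neural models dominate (Smith et al., 2019; Lee, 2021).")
print(skip_citations(sample))
```

Citation styles vary widely between publishers, so a fixed list of patterns like this will both miss citations and occasionally remove real content; that trade-off is where smarter detection earns its keep.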
With this in mind, it should come as no surprise that Speech Central advertises itself as “The King of PDF”. While no app handles every single PDF file perfectly, Speech Central provides functions that put it at the very top in this regard. Even more tempting, these features are available in its free edition.