PDF indexing can be expensive; limits are required
Indexing a directory containing many PDFs can lead to sustained high CPU usage.
The fundamental issue is that the PDF format is designed for printing, not for storing text. In some cases the text is stored as plain strings inside the file, which is quick to extract. In other cases, individual glyphs are positioned on the page and the Poppler library has to group them back into words using an expensive algorithm.
Examples of difficult PDFs:
- https://github.com/GerHobbelt/pdfminer.six/blob/5114acdda61205009221ce4ebf2c68c144fc4ee5/samples/nonfree/i1040nr.pdf - 0.6 seconds to decode
- http://ca.mouser.com/catalog/catalogcad/646/dload/pdf/MOUSER.pdf (linked from evince#190) - 3.6 seconds to decode
This was tested with Poppler 22.08.
Many example PDFs are available at https://github.com/pdf-association/pdf-corpora
Part of #108