SearchWP can extract plain text from PDF files uploaded to your WordPress website. Out of the box, SearchWP attempts this using only PHP, but because the PDF format is complex and varies widely, content is sometimes not extracted accurately.

Enter Xpdf. Xpdf is a command line utility, and it must be installed on your server for this Extension to work. Installation is simple, and instructions are included. With the Xpdf Integration Extension, PHP no longer has to process your PDF files: that work is offloaded to Xpdf, which is fast and accurate at extracting PDF content.

After activating the Extension, follow the installation instructions. Once Xpdf is installed, SearchWP will hand PDF content extraction off to it.

Installing Xpdf

IMPORTANT: Xpdf is not provided in this download. You must download Xpdf yourself and upload it to a non-public location (outside your Web root). Xpdf offers binary distributions for both Windows and Linux on its download page (Xpdf: Download).

Installation

Once downloaded:
- Extract xpdfbin-linux-3.03.tar.gz (the version number may be different)
- Upload the pdftotext binary (found in either the bin32 or bin64 directory after extracting) to a non-public location, outside your Web root
- Ensure the file has the proper permissions, so that the user your web server runs as can execute it
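Before telling SearchWP where the binary lives, it can help to confirm that the path points at the pdftotext file itself (not its folder) and that the file is executable. The following is a minimal Python sketch for running on the server, assuming Python 3 is available there; the function name `check_pdftotext` and the default path are hypothetical placeholders, and SearchWP itself does not use this script. Note that `os.access` checks permissions for the user running the script, which may differ from the user your web server runs as.

```python
import os
import sys


def check_pdftotext(path):
    """Return a list of problems found with the pdftotext binary at `path`.

    An empty list means the path exists, is not a directory, and is
    executable by the user running this script.
    """
    problems = []
    if not os.path.exists(path):
        problems.append(f"{path} does not exist")
        return problems
    if os.path.isdir(path):
        # SearchWP needs the path to the binary, not the folder containing it.
        problems.append(f"{path} is a directory; point to the pdftotext binary itself")
        return problems
    if not os.access(path, os.X_OK):
        problems.append(f"{path} is not executable (for example, try: chmod 755 {path})")
    return problems


if __name__ == "__main__":
    # Hypothetical default; pass the real location as the first argument.
    binary = sys.argv[1] if len(sys.argv) > 1 else "/path/to/pdftotext"
    issues = check_pdftotext(binary)
    for issue in issues:
        print(issue)
    if not issues:
        print("pdftotext looks ready to use")
```

For example, saving this as `check_pdftotext.py` and running `python3 check_pdftotext.py /home/example/bin/pdftotext` (a hypothetical path) prints either the problems found or a confirmation that the binary looks usable.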
The last step is to tell SearchWP Xpdf Integration where you installed Xpdf. Add the following to your theme’s functions.php, replacing /path/to/pdftotext with the actual path to the pdftotext binary (not the folder) on your server.