PDF data extraction using Windows PowerShell and the pdftotext command-line utility from the Xpdf tools

In this tutorial we will extract some data from a batch of invoices and save it to a text file.

The approach:
1) Get a list of invoices as PDF files.
2) Convert the first PDF to a plain text file using the pdftotext utility.
3) Find the required information and clean it.
4) Save the data to a flat file.
5) Repeat steps 2 to 4 for the remaining files.

Let's start.

Open the Run dialog box by pressing Windows Key + R.

Open PowerShell ISE by typing, well, powershell_ise.exe in the Run dialog as shown.

Tip: The PowerShell Integrated Scripting Environment (ISE) is an IDE for developing and testing PowerShell scripts. You can also use any other text editor (Notepad, Notepad++) to write PowerShell scripts.

The IDE looks like this.

Now let's dive into the script.

A) Set the variables

#### Variable setting start #####
# Set the pdftotext.exe location...
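
As a rough orientation, the five-step approach listed at the start of this tutorial could be sketched as below. This is only a minimal sketch and not the tutorial's own script: the function names Get-InvoiceNumber and Export-InvoiceData, the parameter names, and the "Invoice No: ..." label it searches for are all assumptions made for illustration. The -layout option asks pdftotext to keep the original page layout, which tends to keep a label and its value on the same line.

```powershell
# Sketch only: both function names, the parameter names, and the
# "Invoice No" label are hypothetical and not taken from the tutorial's script.

# Step 3: find the required value in the extracted text and clean it.
function Get-InvoiceNumber {
    param([string[]]$Lines)
    foreach ($line in $Lines) {
        # Assumes the invoice contains a line such as "Invoice No: 12345".
        if ($line -match 'Invoice No\.?\s*:\s*(\S+)') {
            return $Matches[1].Trim()
        }
    }
    return $null
}

function Export-InvoiceData {
    param(
        [string]$PdfToTextPath,   # full path to pdftotext.exe
        [string]$InvoiceFolder,   # folder that holds the invoice PDFs
        [string]$OutputFile       # flat file that collects the results
    )

    # Step 1: get the list of invoices as PDF files.
    $pdfFiles = Get-ChildItem -Path $InvoiceFolder -Filter '*.pdf'

    # Step 5: repeat steps 2 to 4 for every file.
    foreach ($pdf in $pdfFiles) {
        # Step 2: convert the PDF to a plain text file next to it.
        $txtPath = [System.IO.Path]::ChangeExtension($pdf.FullName, '.txt')
        & $PdfToTextPath -layout $pdf.FullName $txtPath

        # Step 3: find and clean the value.
        $invoiceNo = Get-InvoiceNumber -Lines (Get-Content -Path $txtPath)

        # Step 4: save one tab-separated line per invoice to the flat file.
        if ($invoiceNo) {
            Add-Content -Path $OutputFile -Value "$($pdf.Name)`t$invoiceNo"
        }
    }
}
```

With placeholder paths, a call might look like Export-InvoiceData -PdfToTextPath 'C:\tools\xpdf\pdftotext.exe' -InvoiceFolder 'C:\invoices' -OutputFile 'C:\invoices\invoice_data.txt'. Keeping the search-and-clean step in its own function makes it easy to try the pattern on a few sample lines before running it against the whole batch.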