

What I'm looking for is a C# solution for importing data from PDF documents into our database, in a commercial application. Our customers will be looking to import arbitrary documents. Ordinarily I'd write this off as a complete impossibility, but the documents each customer imports will follow that customer's own fixed layout.

My plan is to render the PDFs to static images, then let users set up their own templates, which essentially pull out text at predefined pixel offsets in the page using OCR. For tables, a template defines the table's location plus a set of further values for the column and row sizes. We can then apply the template to any document of that type.

So what I'm really looking for is two libraries: one to convert PDFs to images, and another to OCR those images. Each library must:

- Be pure C#, or have a supported C# wrapper around a native DLL.
- Not fork out processes. Wrappers that essentially just build command-line parameters and launch an external executable aren't allowed in this case.
- In the case of FOSS, allow us to exempt ourselves from the normal FOSS license requirements (i.e. publishing our source code) by paying a license fee.

We certainly don't mind paying for a commercial solution, but we'd rather not get stuck paying a fee per individual distribution of the software.

I know this is quite a specific set of requirements, perhaps enough for some people to deem this question too localised, but I'm hoping someone can suggest an approach and some libraries that will be helpful to me, as well as to others in the future.

What I've looked at so far:

- iTextSharp - The documentation is a book you have to buy, which is not a good start. There doesn't seem to be much useful publicly available documentation on turning PDFs into images. The licensing is opaque; it looks like we'd have to pay per client we distribute to.
- pdftohtml - Again, doesn't produce images.
- PdfFileParser - Still not what we need.
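The template idea in the question (text pulled from fixed pixel regions, tables described by a location plus column and row sizes) can be sketched without committing to any particular PDF or OCR library. This is a minimal sketch under stated assumptions: `Region`, `FieldTemplate`, `TableTemplate`, `IRegionOcr`, `TemplateApplier` and `FakeOcr` are all hypothetical names, not APIs from any real library, and coordinates assume the template was drawn at the same resolution the page is rendered at. The real OCR engine would sit behind `IRegionOcr`; `FakeOcr` only exists so the demo runs.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical template model (not from any library): a rectangle in page-image pixels.
public readonly record struct Region(int X, int Y, int Width, int Height);

// A single field: OCR is run only on this crop of the rendered page.
public sealed record FieldTemplate(string Name, Region Area);

// A table: its top-left corner plus per-column widths and per-row heights, in pixels.
public sealed record TableTemplate(int OriginX, int OriginY, int[] ColumnWidths, int[] RowHeights)
{
    // Expand the table definition into one rectangle per cell, indexed [row, column].
    public Region[,] CellRegions()
    {
        var cells = new Region[RowHeights.Length, ColumnWidths.Length];
        int y = OriginY;
        for (int r = 0; r < RowHeights.Length; r++)
        {
            int x = OriginX;
            for (int c = 0; c < ColumnWidths.Length; c++)
            {
                cells[r, c] = new Region(x, y, ColumnWidths[c], RowHeights[r]);
                x += ColumnWidths[c]; // next column starts where this one ends
            }
            y += RowHeights[r]; // next row starts below this one
        }
        return cells;
    }
}

// Hypothetical seam for whichever OCR library is chosen: it receives a page image
// already produced by the PDF-to-image library, plus the region to read.
public interface IRegionOcr
{
    string Recognize(byte[] pageImage, Region area);
}

public static class TemplateApplier
{
    // Apply every field template to one rendered page and collect the text by field name.
    public static Dictionary<string, string> ExtractFields(
        byte[] pageImage, IEnumerable<FieldTemplate> fields, IRegionOcr ocr)
    {
        var values = new Dictionary<string, string>();
        foreach (var field in fields)
            values[field.Name] = ocr.Recognize(pageImage, field.Area).Trim();
        return values;
    }
}

// Stand-in OCR engine for the demo: reports which region it was asked to read.
public sealed class FakeOcr : IRegionOcr
{
    public string Recognize(byte[] pageImage, Region area) => $" text@{area.X},{area.Y} ";
}

public static class Demo
{
    public static void Main()
    {
        // Three columns (300, 80, 120 px wide) and two rows (30 px high) starting at (100, 400).
        var table = new TableTemplate(100, 400, new[] { 300, 80, 120 }, new[] { 30, 30 });
        Console.WriteLine(table.CellRegions()[1, 2]);
        // Region { X = 480, Y = 430, Width = 120, Height = 30 }

        var fields = new[] { new FieldTemplate("InvoiceNumber", new Region(50, 60, 200, 25)) };
        var values = TemplateApplier.ExtractFields(Array.Empty<byte>(), fields, new FakeOcr());
        Console.WriteLine(values["InvoiceNumber"]); // text@50,60
    }
}
```

Keeping the template purely geometric means the PDF renderer and the OCR engine can be swapped independently, which fits the plan of choosing two separate libraries. If pages are rendered at a different DPI from the one the template was drawn at, the regions would need scaling first.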
