Parsing PDF files [closed]


3

10

I need to split a large PDF document into smaller files based on the content of the file. We use BCL easyPDF to manipulate PDF files. easyPDF can split a PDF document at a given page number, but it cannot split the document based on its content. As far as I can tell, it also has no search function for determining where a piece of content is located (if I am wrong, please let me know).

Can someone tell me how I can find the location of text in a PDF file using .NET?

Thanks

Sweetie answered 3/5, 2012 at 18:19 Comment(4)

Yes, but this should be (and is) a community where we can help people who may still be learning the ins and outs of a language or protocol. We can try to point them in the right direction. – Jackqueline 3/5, 2012 at 18:24

Isn't PDF a sort of binary file? You cannot just parse it as text; a library is required. – Pooka 18/1, 2017 at 16:51

I start out my year with my usual complaint. Why is this off topic? (I know the rules say it is.) It's very useful, and many of the preserved 'best' questions (which, I see, you can no longer find) are of this nature. They represent the accumulated advice (aka wisdom) of many experienced devs. – Piedadpiedmont 4/1, 2019 at 0:36

SKDocument.CreatePdf – Northward 27/5 at 7:23


3

You might try the Docotic.Pdf library for your task.

The library can extract text from PDFs (with or without formatting).

Alternatively, you can retrieve a collection of words with their bounding rectangles from a PDF. This should help you find the location of the text in a file.
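Once you have words with their bounding rectangles, locating a phrase is mostly a matter of scanning the word list in reading order. Here is a minimal sketch of that step. Note that `Word` and `TextLocator` are hypothetical names made up for illustration, not part of any library's API; you would map whatever word type your PDF library returns onto something like this. It assumes C# 9 or later (for `record`) and that the words arrive in reading order.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical shape for one extracted word (not a real library type):
// the page it is on, its text, and its bounding rectangle.
public record Word(int Page, string Text, double X, double Y, double Width, double Height);

public static class TextLocator
{
    // Returns the first word of every occurrence of `phrase`.
    // Matching is word by word, case-insensitive, and never spans two pages.
    public static List<Word> Find(IReadOnlyList<Word> words, string phrase)
    {
        var parts = phrase.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        var hits = new List<Word>();
        if (parts.Length == 0) return hits;

        for (int i = 0; i + parts.Length <= words.Count; i++)
        {
            bool match = true;
            for (int j = 0; j < parts.Length && match; j++)
            {
                match = words[i + j].Page == words[i].Page
                     && string.Equals(words[i + j].Text, parts[j],
                                      StringComparison.OrdinalIgnoreCase);
            }
            if (match) hits.Add(words[i]);
        }
        return hits;
    }
}
```

Each hit gives you the page number and the rectangle of the first matched word, which is usually enough to decide where a split should happen.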

Disclaimer: I work for the vendor of the library.

Genuflect answered 4/5, 2012 at 15:45 Comment(1)

NOTE: As Bobrovsky mentions, this is a commercial product. Its price is non-trivial (though appropriate for what it does). – Knowledgeable 4/1, 2019 at 0:23


2

You need a PDF library in .NET such as iText.Net.
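To make that a bit more concrete, here is a rough sketch of how the content-based split could be planned. It assumes iTextSharp 5.x (the .NET port of iText) and uses its `PdfReader` and `PdfTextExtractor.GetTextFromPage`; `ContentSplitter`, `ReadPageTexts`, `SplitRanges`, and the marker text are hypothetical names for illustration. The idea is to read the text of each page, start a new output file at every page that contains the marker, and end up with plain page ranges.

```csharp
using System;
using System.Collections.Generic;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

public static class ContentSplitter
{
    // Extracts the plain text of every page (iTextSharp page numbers are 1-based).
    public static List<string> ReadPageTexts(string path)
    {
        var texts = new List<string>();
        var reader = new PdfReader(path);
        try
        {
            for (int page = 1; page <= reader.NumberOfPages; page++)
                texts.Add(PdfTextExtractor.GetTextFromPage(reader, page));
        }
        finally
        {
            reader.Close();
        }
        return texts;
    }

    // Returns 1-based inclusive (first, last) page ranges. A new range starts
    // at each page whose text contains `marker` (case-insensitive); pages
    // before the first marker form their own leading range.
    public static List<Tuple<int, int>> SplitRanges(IReadOnlyList<string> pageTexts, string marker)
    {
        var ranges = new List<Tuple<int, int>>();
        if (pageTexts.Count == 0) return ranges;

        int start = 1;
        for (int page = 2; page <= pageTexts.Count; page++)
        {
            if (pageTexts[page - 1].IndexOf(marker, StringComparison.OrdinalIgnoreCase) >= 0)
            {
                ranges.Add(Tuple.Create(start, page - 1));
                start = page;
            }
        }
        ranges.Add(Tuple.Create(start, pageTexts.Count));
        return ranges;
    }
}
```

The resulting ranges are just page numbers, so they can be handed to any splitter that works by page number, which is what the question says easyPDF already does.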

Procaine answered 3/5, 2012 at 18:23 Comment(0)


1

Take a look at this question. It has links to some libraries that may satisfy your requirements:

How to programatically search a PDF document in c#

Jackqueline answered 3/5, 2012 at 18:22 Comment(0)
