How to get the number of pages of a .PDF uploaded by a user?

I have a file input, and before uploading I need to calculate the number of pages of the selected .pdf in JavaScript (e.g. with jQuery...).

Teishateixeira asked 20/4, 2012 at 21:18
Check this out: github.com/mozilla/pdf.js - Nadbus
Also, you can limit the size of the file that can be uploaded to your site, if you're worried about excessive page counts. - Flavin

If you use pdf.js, you can refer to an example on GitHub ('.../examples/node/getinfo.js') with the following code, which prints the number of pages in a PDF file:

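const pdfjsLib = require('pdfjs-dist');
...
// Older pdf.js API; newer releases need getDocument(pdfPath).promise.then(...)
pdfjsLib.getDocument(pdfPath).then(function (doc) {
    var numPages = doc.numPages; // numPages is a property of the loaded document
    console.log('# Document Loaded');
    console.log('Number of Pages: ' + numPages);
});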
Reynold answered 30/5, 2014 at 18:08
One closing ) is missing as the last character. - Buerger

And a pure JavaScript solution:

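var input = document.getElementById("files");
var reader = new FileReader();
// Register the handler before starting the read
reader.onloadend = function () {
    // Heuristic: count "/Type /Page" entries, skipping "/Type /Pages"
    var matches = reader.result.match(/\/Type\s*\/Page[^s]/g);
    var count = matches ? matches.length : 0; // match() returns null if nothing is found
    console.log('Number of Pages:', count);
};
reader.readAsBinaryString(input.files[0]);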
Lancastrian answered 30/8, 2016 at 8:41
That regular expression works for documents that fulfil a number of assumptions, and in particular it is likely to fail for documents with multiple revisions or heavy use of object streams. - Pah
I tested it on many PDF documents and it works. Do you have any samples? - Lancastrian
I could create any number of samples: as you surely are aware, the PDF format at byte level allows adding comments; thus, I could simply add any number of comments containing "/Type /Page" to an existing document and make the regular expression return too high a result. But you probably don't mean constructed examples but real-world ones. For that you might want to look at questions like this one etc. - Pah
I am getting this message: Property 'match' does not exist on type 'string | ArrayBuffer'. Property 'match' does not exist on type 'ArrayBuffer'. ts(2339) - Isley
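
A slightly hardened sketch of the same regex heuristic, offered as an illustration rather than a reliable page counter. It guards against match() returning null, accepts either the string from readAsBinaryString or an ArrayBuffer/Uint8Array (the typeof check is also what narrows 'string | ArrayBuffer' in TypeScript), and replaces [^s] with a lookahead so that names such as /Pages and /PageLabel are not counted. The helper names toBinaryString and countPagesHeuristic are hypothetical, and it still miscounts in the cases raised in the comments (PDF comments, multiple revisions, compressed object streams).

```javascript
// Hypothetical helpers illustrating the "/Type /Page" regex heuristic.
// This is NOT a PDF parser: compressed object streams, incremental
// updates and PDF comments can all make the count wrong.

// Convert a binary string, ArrayBuffer or Uint8Array into a string
// with one character per byte (what readAsBinaryString produces).
function toBinaryString(data) {
  if (typeof data === 'string') {
    return data; // already a binary string
  }
  const bytes = data instanceof Uint8Array ? data : new Uint8Array(data);
  let result = '';
  for (let i = 0; i < bytes.length; i++) {
    result += String.fromCharCode(bytes[i]);
  }
  return result;
}

// Count "/Type /Page" entries. The lookahead requires the name to end at
// whitespace, a PDF delimiter or the end of the input, so "/Pages" and
// "/PageLabel" are skipped (the original [^s] only excluded "/Pages").
function countPagesHeuristic(data) {
  const text = toBinaryString(data);
  const matches = text.match(/\/Type\s*\/Page(?=[\s\0\/<>\[\]{}()%]|$)/g);
  return matches ? matches.length : 0; // match() returns null on no match
}
```

Inside the FileReader answer's onloadend handler, reader.result could be passed straight to countPagesHeuristic, whether the file was read with readAsBinaryString or readAsArrayBuffer.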

As the other answers state, something like pdf.js is what you are looking for. I've taken a look at the API, and it does provide a numPages property that returns the total number of pages. It also seems to count pages correctly for me when viewing Mozilla's demo page.

It depends on whether you can use modern browsers and experimental technology in your solution. pdf.js is very impressive, but according to its GitHub page it is still experimental.

If you can count the pages on the server after uploading, look at pdftools or a similar tool.

Something like pdftools --countpages is what you are looking for.

Buzzard answered 20/4, 2012 at 21:34

You could also use pdf-lib.

You need to read the file from the input field and then use pdf-lib to get the number of pages. The code would look like this:

import { PDFDocument } from 'pdf-lib';

...

// Read a File (e.g. from an <input type="file">) into an ArrayBuffer
const readFile = (file) => {
  return new Promise((resolve, reject) => {
    const reader = new FileReader();
    reader.onload = () => resolve(reader.result);
    reader.onerror = () => reject(reader.error);
    reader.readAsArrayBuffer(file);
  });
};

// Parse the PDF with pdf-lib and return its page count
const getPageCount = async (file) => {
  const arrayBuffer = await readFile(file);
  const pdf = await PDFDocument.load(arrayBuffer);
  return pdf.getPageCount();
};

And then just get the number of pages of the attached file with:

const numPages = await getPageCount(input.files[0]);

where input is the variable holding a reference to the file input's DOM element (the await must run inside an async function or an ES module).
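
On current browsers (and Node.js 18+), File inherits an arrayBuffer() method from Blob, so the FileReader wrapper in this answer can probably be replaced by a one-line async helper. This is a sketch under that assumption; the name readFileAsArrayBuffer is hypothetical.

```javascript
// Hypothetical replacement for the FileReader-based readFile() above.
// Assumes Blob.prototype.arrayBuffer() is available (current browsers,
// Node.js 18+); it resolves to an ArrayBuffer with the file's raw bytes.
async function readFileAsArrayBuffer(file) {
  return file.arrayBuffer();
}
```

Inside getPageCount, the load step would then read: const pdf = await PDFDocument.load(await readFileAsArrayBuffer(file));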

Stradivari answered 23/9, 2019 at 21:21
I think the correct function to only get the page count is getPageCount. getPages will return an array of PDFPage objects: pdf-lib.js.org/docs/api/classes/pdfdocument#getpagecount - Puerile

In a TypeScript class using pdf-lib, I use the following:

// Requires: import { PDFDocument } from 'pdf-lib';
// Get the page count of the PDF at formUrl
  async getPageCount(formUrl: string): Promise<number> {
    // Download the PDF bytes and let pdf-lib parse them
    const formPdfBytes = await fetch(formUrl).then((res) => res.arrayBuffer());
    const pdfDoc = await PDFDocument.load(formPdfBytes);
    return pdfDoc.getPageCount();
  }

Call it as a promise (with await or .then()).

Suppliant answered 29/12, 2020 at 13:25

I think the API has changed a little since Tracker1 posted an answer. I tried Tracker1's code and saw this error:

Uncaught TypeError: pdfjsLib.getDocument(...).then is not a function

A small change fixes it:

const pdfjsLib = require('pdfjs-dist');
...
// getDocument() returns a loading task; its .promise resolves to the document
pdfjsLib.getDocument(pdfPath).promise.then(function (doc) {
    var numPages = doc.numPages;
    console.log('# Document Loaded');
    console.log('Number of Pages: ' + numPages);
});
Wessels answered 7/10, 2020 at 4:26
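
If the same code has to run against both older and newer pdf.js releases, a small wrapper can pick whichever form getDocument() returns. This is a sketch, not part of pdf.js: documentPromise and getNumPages are hypothetical names, and pdfjsLib is passed in as a parameter (for example the result of require('pdfjs-dist')).

```javascript
// Hypothetical helper: turn the return value of pdfjsLib.getDocument() into
// a promise for the loaded document. Newer pdf.js releases return a loading
// task whose `promise` property resolves to the document; older releases
// could be chained with .then() directly.
function documentPromise(loadingTask) {
  return loadingTask && loadingTask.promise
    ? loadingTask.promise
    : Promise.resolve(loadingTask);
}

// `src` is whatever getDocument() accepts, such as the pdfPath used above.
async function getNumPages(pdfjsLib, src) {
  const doc = await documentPromise(pdfjsLib.getDocument(src));
  return doc.numPages; // numPages is a property, not a method
}
```

Usage: getNumPages(pdfjsLib, pdfPath).then(function (n) { console.log('Number of Pages: ' + n); });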
