Counting pages in multiple PDFs from the command line, using pdfinfo or pdftk

Say you have a bunch of PDFs and you want to know how many pages each of them has. You could of course use some graphical software to display every single PDF and check the page count (xpdf, evince, whatever), but that gets tedious very fast.

So here's (mostly as a reminder for myself) one way to count pages of many PDFs (in the current directory) using pdfinfo from the xpdf-utils package:

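$ cat countpdfpages
#!/bin/sh
# Print "<file>: <page count>" for every PDF in the current directory.
for f in *.pdf; do
        printf '%s: ' "$f"    # printf instead of "echo -n", which is not portable in /bin/sh
        pdfinfo "$f" 2>/dev/null | grep '^Pages:' | cut -d ":" -f 2
done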

A sample run:

$ ./countpdfpages
S71147.pdf:           25
S71226.pdf:           38
S71242-01.pdf:        25
S71258.pdf:           26
S71315.pdf:           35
S72045.pdf:           2
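A possible variant, sketched under a few assumptions: it takes the PDFs as arguments instead of hard-coding `*.pdf`, so it also works on files in other directories, and it uses awk to pick out the number, which drops the padding that `cut` leaves in front of it. It assumes pdfinfo prints a line of the form `Pages:          25`; the function names `pdfinfo_pages` and `countpdfpages_args` are made up for this example.

```sh
#!/bin/sh
# Sketch: a variant of countpdfpages that takes the PDFs as arguments.
# Assumes pdfinfo prints a line like "Pages:          25".

# Hypothetical helper: read pdfinfo output on stdin, print the page count.
pdfinfo_pages() {
        awk '/^Pages:/ { print $2 }'
}

# Hypothetical helper: print "<file>: <pages>" for each argument.
countpdfpages_args() {
        for f in "$@"; do
                [ -e "$f" ] || continue   # skip an unmatched pattern such as "*.pdf"
                n=$(pdfinfo "$f" 2>/dev/null | pdfinfo_pages)
                printf '%s: %s\n' "$f" "${n:-?}"   # "?" if pdfinfo failed
        done
}

countpdfpages_args "$@"
```

Saved as an executable script, it would be called with a file list, e.g. `./countpdfpages2 ~/papers/*.pdf` (the script name is arbitrary).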

I'm sure there are many other ways to do this, e.g. using `pdftk foo.pdf dump_data` (from the pdftk package). Please leave a comment if you know a simpler way to achieve this, especially one that doesn't need an extra script (i.e. a PDF command-line tool that can already count the pages of multiple PDFs). Thanks!
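For comparison, a minimal sketch of the pdftk route mentioned above. It assumes that `pdftk foo.pdf dump_data` prints a line of the form `NumberOfPages: 25` among its other metadata (worth checking against your pdftk version); `dump_data_pages` and `countpdfpages_pdftk` are hypothetical names chosen for this example.

```sh
#!/bin/sh
# Sketch: count pages with pdftk instead of pdfinfo.
# Assumes "pdftk FILE dump_data" prints a "NumberOfPages: N" line.

# Hypothetical helper: read dump_data output on stdin, print N.
dump_data_pages() {
        awk -F': ' '$1 == "NumberOfPages" { print $2 }'
}

# Hypothetical helper: print "<file>: <pages>" for each PDF argument.
countpdfpages_pdftk() {
        for f in "$@"; do
                [ -e "$f" ] || continue   # skip an unmatched pattern such as "*.pdf"
                n=$(pdftk "$f" dump_data 2>/dev/null | dump_data_pages)
                printf '%s: %s\n' "$f" "${n:-?}"   # "?" if pdftk failed
        done
}

countpdfpages_pdftk "$@"
```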

Comments

huge thanks

This does make things a lot easier... The PC I'm using at the office (quite an old rig) has tons of programs installed, and it has become a little hard for it to handle the load, which makes it slower than normal. It would take forever to display each PDF file, so the approach here should make counting PDF pages a tad faster...

Nice counting method you

Nice counting method you show us. Normally I use pdfinfo, but I can't use all of its features. With this you can save time and headaches. Thanks!!

The counting method does

The counting method does appear to work wonders. Top notch stuff. Keep it up!

sum with awk

Nice idea! With awk you can sum it up (using $NF, the last field, so file names containing spaces still add up correctly):
$ for i in *.pdf; do echo -n "$i: "; pdfinfo "$i" | awk '/^Pages/ { print $2 }'; done | awk '{ s += $NF; print } END { print "Σ " s }'
freiesMagazin-2009-02.pdf: 31
freiesMagazin-2009-06.pdf: 66
miro.pdf: 6
thinkpython.pdf: 236
yalm.2009.05.pdf: 29
yalm.2009.06.pdf: 31
Σ 399

/thorsten

Counting with PoDoFo

Hi,

Your post inspired me to write a small tool using the PoDoFo PDF library. You can find the tool in SVN at:
https://podofo.svn.sourceforge.net/svnroot/podofo/podofo/trunk.

I think it does exactly what you want, with no shell scripting and only 4 lines of C++ code for the counting (plus some more for command-line argument handling and error handling):

$ ./podofocountpages -t /home/dominik/Desktop/Documents/master/ModellierungSpezifikationVerifikation/Folien/*.pdf
/home/dominik/Desktop/Documents/master/ModellierungSpezifikationVerifikation/Folien/msv-090420.pdf: 32
/home/dominik/Desktop/Documents/master/ModellierungSpezifikationVerifikation/Folien/msv-090427.pdf: 43
/home/dominik/Desktop/Documents/master/ModellierungSpezifikationVerifikation/Folien/msv-090504.pdf: 33
/home/dominik/Desktop/Documents/master/ModellierungSpezifikationVerifikation/Folien/msv-090511.pdf: 36
/home/dominik/Desktop/Documents/master/ModellierungSpezifikationVerifikation/Folien/msv-090518.pdf: 41
/home/dominik/Desktop/Documents/master/ModellierungSpezifikationVerifikation/Folien/msv-090525.pdf: 42
/home/dominik/Desktop/Documents/master/ModellierungSpezifikationVerifikation/Folien/msv-090608.pdf: 32
/home/dominik/Desktop/Documents/master/ModellierungSpezifikationVerifikation/Folien/msv-090615.pdf: 16
Total: 275

Have Fun!
Dom

podofocountpages

Nice, thanks!

You don't need `ls *.pdf`

You don't need `ls *.pdf` (in fact, it could cause a security risk with certain filenames). You can simply use:

for f in *.pdf
do
        echo -n "$f: "
        pdfinfo $f 2>/dev/null | grep Pages | cut -d ":" -f 2
done
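To see the problem with `ls` concretely, here is a small sketch that prints, one per line, the names each style of loop would hand to pdfinfo (the helper names `names_via_ls` and `names_via_glob` are made up for this demo, and `mktemp -d` is assumed to be available). With a file called `my report.pdf`, the `$(ls *.pdf)` version yields two bogus names, `my` and `report.pdf`, while the glob yields the single real name.

```sh
#!/bin/sh
# Sketch: compare what "$(ls *.pdf)" and a plain glob give a for loop.
# Both helpers take a directory and print one loop value per line.

# Hypothetical helper: loop over the output of ls (word-split by the shell).
names_via_ls() {
        (cd "$1" && for f in $(ls *.pdf); do printf '%s\n' "$f"; done)
}

# Hypothetical helper: loop over the glob directly (names stay intact).
names_via_glob() {
        (cd "$1" && for f in *.pdf; do printf '%s\n' "$f"; done)
}

demo=$(mktemp -d)
touch "$demo/my report.pdf"
echo "ls:";   names_via_ls "$demo"     # prints "my" and "report.pdf"
echo "glob:"; names_via_glob "$demo"   # prints "my report.pdf"
rm -rf "$demo"
```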

Rely on globbing, but quote filename arguments

Indeed, getting a list of files for a command with a subshell to ‘ls’ will fail if the filenames contain word-break characters (such as white space), so directly using a glob in the ‘for’ statement is correct.

Just as important, though, is properly quoting the filename in the ‘pdfinfo’ command line:

for f in *.pdf
do
        echo -n "$f: "
        pdfinfo "$f" 2>/dev/null | grep Pages | cut -d ":" -f 2
done
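The same word splitting happens to an unquoted `$f` on the pdfinfo command line. A tiny sketch makes it visible, using a stand-in function (`count_args`, hypothetical) in place of pdfinfo that just reports how many arguments it received:

```sh
#!/bin/sh
# Sketch: how many arguments does a command get from $f vs "$f"?

# Hypothetical stand-in for pdfinfo: print the number of arguments received.
count_args() {
        printf '%s\n' "$#"
}

f="my report.pdf"
count_args $f     # prints 2: "my" and "report.pdf"
count_args "$f"   # prints 1: "my report.pdf"
```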

glob etc.

Thanks, fixed.