pdftk

Counting pages in multiple PDFs from the command line, using pdfinfo or pdftk

Say you have a bunch of PDFs and you want to know how many pages each of them has. You could of course use some graphical software to display every single PDF and check the page count (xpdf, evince, whatever), but that gets tedious very fast.

So here's (mostly as a reminder for myself) one way to count the pages of all PDFs in the current directory, using pdfinfo from the xpdf-utils package:

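$ cat countpdfpages
#!/bin/sh
# Print "<file>: <page count>" for every PDF in the current directory.
for f in *.pdf; do
        [ -e "$f" ] || continue   # skip the literal "*.pdf" if nothing matches
        printf '%s: ' "$f"        # portable; "echo -n" is not guaranteed in POSIX sh
        # pdfinfo prints a "Pages:  N" line; keep only the part after the colon
        pdfinfo "$f" 2>/dev/null | grep '^Pages:' | cut -d ":" -f 2
done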

A sample run:

$ ./countpdfpages
S71147.pdf:           25
S71226.pdf:           38
S71242-01.pdf:        25
S71258.pdf:           26
S71315.pdf:           35
S72045.pdf:           2

I'm sure there are many other ways to do this, e.g. using pdftk foo.pdf dump_data (from the pdftk package). Please leave a comment if you know a simpler way to achieve this, especially one that doesn't need an extra script (i.e. a PDF command line tool which can already count pages in multiple PDFs). Thanks!
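For comparison, here is a sketch of the same loop built on pdftk's dump_data report instead of pdfinfo. It assumes the report contains a line of the form "NumberOfPages: N" (which is what pdftk prints); count_pages_pdftk is a hypothetical helper name, not part of pdftk.

```shell
#!/bin/sh
# Sketch: count pages with pdftk instead of pdfinfo.
# Assumes "pdftk FILE dump_data" prints a line like "NumberOfPages: 25".
# count_pages_pdftk is a hypothetical helper name.
count_pages_pdftk() {
        for f in "$@"; do
                [ -e "$f" ] || continue   # skip unmatched globs such as "*.pdf"
                printf '%s: ' "$f"
                # keep only the number after "NumberOfPages:"
                pdftk "$f" dump_data 2>/dev/null | sed -n 's/^NumberOfPages: *//p'
        done
}

count_pages_pdftk *.pdf
```

Wrapping the loop in a function means you can also pass an explicit list of files, e.g. count_pages_pdftk S71147.pdf S72045.pdf.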

Manipulating PDFs from the command line - joining, merging, rotating [Update]

One of the most useful packages for working with PDFs on Linux is pdfjam.

From the website:

  • pdfnup, which allows PDF files to be "n-upped" in roughly the way that psnup does for PostScript files.
  • pdfjoin, which concatenates the pages of multiple PDF files together into a single file.
  • pdf90, which rotates the pages of one or more PDF files through 90 degrees (anti-clockwise).

Installation is easy as always (on Debian-based systems): apt-get install pdfjam

PDF is not exactly the most easily editable format out there, but these tools can save you lots of time and trouble. Just recently I needed to merge two PDFs into one (and I didn't have the source files they were generated from). A simple pdfjoin foo1.pdf foo2.pdf --outfile bar.pdf does the job in a few seconds.

Equally useful when you need to print huge documents is pdfnup --nup 2x2 foo.pdf, which puts four PDF pages onto one, cutting the number of pages you have to print to roughly a quarter.
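How many pages you end up with is a simple ceiling division: each output page holds the product of the two --nup factors, and the last output page may be only partly filled. A small sketch (nup_pages is a hypothetical helper, plain shell arithmetic, no pdfjam involved):

```shell
#!/bin/sh
# nup_pages PAGES A B: number of output pages when a PAGES-page document
# is n-upped with --nup AxB (hypothetical helper).
nup_pages() {
        per_sheet=$(( $2 * $3 ))   # input pages per output page, e.g. 2x2 = 4
        # ceiling division: a partly filled last page still counts
        echo $(( ($1 + per_sheet - 1) / per_sheet ))
}

# Example: the 25-page S71147.pdf from the sample run above, n-upped 2x2
nup_pages 25 2 2   # prints 7
```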

Update 2006-09-20: As several people noted, pdftk is very cool, too. It can also split PDFs, encrypt and decrypt them, manipulate their metadata, and more.
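For reference, the operations mentioned in the update look roughly like this with pdftk. This is a sketch based on pdftk's documented command-line keywords (burst, cat, owner_pw/user_pw, input_pw, dump_data, update_info); the pdf_* wrapper names are hypothetical, only the pdftk invocations inside them matter.

```shell
#!/bin/sh
# Hypothetical wrapper names; each body is a plain pdftk invocation.

# Split: write every page of IN to its own file (pg_0001.pdf, pg_0002.pdf, ...).
# Usage: pdf_split IN
pdf_split() {
        pdftk "$1" burst output 'pg_%04d.pdf'
}

# Extract a page range. Usage: pdf_extract IN RANGE OUT (e.g. RANGE=3-7)
pdf_extract() {
        pdftk "$1" cat "$2" output "$3"
}

# Encrypt with an owner password and a user (open) password.
# Usage: pdf_encrypt IN OUT OWNER_PW USER_PW
pdf_encrypt() {
        pdftk "$1" output "$2" owner_pw "$3" user_pw "$4"
}

# Decrypt a password-protected PDF. Usage: pdf_decrypt IN OUT PASSWORD
pdf_decrypt() {
        pdftk "$1" input_pw "$3" output "$2"
}

# Metadata: print the info dictionary, bookmarks, page count, ... to stdout.
# Usage: pdf_metadata IN
pdf_metadata() {
        pdftk "$1" dump_data
}

# Write new metadata from a file in dump_data format.
# Usage: pdf_set_metadata IN INFO_FILE OUT
pdf_set_metadata() {
        pdftk "$1" update_info "$2" output "$3"
}
```

Note that passwords given as arguments show up in ps output; pdftk also accepts the keyword PROMPT instead of a password and asks for it interactively.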
