command line

Counting pages in multiple PDFs from the command line, using pdfinfo or pdftk

Say you have a bunch of PDFs and you want to know how many pages each of them has. You could of course use some graphical software to display every single PDF and check the page count (xpdf, evince, whatever), but that gets tedious very fast.

So here's (mostly as a reminder for myself) one way to count pages of many PDFs (in the current directory) using pdfinfo from the xpdf-utils package:

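$ cat countpdfpages
#!/bin/sh
# Print "file: pages" for every PDF in the current directory.
for f in *.pdf; do
        [ -f "$f" ] || continue   # the glob matched nothing
        printf '%s: ' "$f"        # echo -n is not portable under /bin/sh
        pdfinfo "$f" 2>/dev/null | grep '^Pages:' | cut -d ":" -f 2
done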

A sample run:

$ ./countpdfpages
S71147.pdf:           25
S71226.pdf:           38
S71242-01.pdf:        25
S71258.pdf:           26
S71315.pdf:           35
S72045.pdf:           2

I'm sure there are many other ways to do this, e.g. using pdftk foo.pdf dump_data (from the pdftk package). Please leave a comment if you know a simpler way to achieve this, especially one that doesn't need an extra script (i.e. if there's a PDF command line tool which can already count pages in multiple PDFs). Thanks!
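For comparison, here's a rough sketch of the pdftk variant. It assumes that pdftk's dump_data output contains a line of the form "NumberOfPages: N"; the function names pages_from_dump and count_pdf_pages are made up for this example and are not part of pdftk.

```shell
#!/bin/sh
# Read `pdftk FILE dump_data` output on stdin and print the page count.
# Assumption: the output contains a line like "NumberOfPages: 25".
pages_from_dump() {
    awk '$1 == "NumberOfPages:" { print $2; exit }'
}

# Hypothetical helper: print "file: pages" for every PDF given as argument.
count_pdf_pages() {
    for f in "$@"; do
        [ -f "$f" ] || continue   # skip an unmatched *.pdf glob
        printf '%s: %s\n' "$f" "$(pdftk "$f" dump_data 2>/dev/null | pages_from_dump)"
    done
}
```

After sourcing this, count_pdf_pages *.pdf should give a listing similar to the sample run above, without the padding that cut leaves in.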

Strange command line of the day

Stuff I didn't expect I'd have to type today:

  $ dpkg-repack dpkg-repack

Seriously.

jhead - List and modify EXIF fields in JPEG photos

jhead is a very nice and very powerful command line utility to mess with JPEG headers (esp. EXIF fields).

  $ apt-get install jhead

It can display/extract a large number of metadata fields from JPEG files and also extract the thumbnails stored in JPEG files (if any). The following will list all known metadata fields from a sample photo:

  $ wget http://farm4.static.flickr.com/3173/3061542361_60acb0904b_o.jpg
  $ jhead *.jpg
  File name    : 3061542361_60acb0904b_o.jpg
  File size    : 1074172 bytes
  File date    : 2008:11:26 23:38:04
  Camera make  : Panasonic
  Camera model : DMC-FZ18
  Date/Time    : 2008:03:05 15:45:52
  Resolution   : 3264 x 2448
  Flash used   : No
  Focal length : 4.6mm  (35mm equivalent: 28mm)
  Exposure time: 0.0100 s  (1/100)
  Aperture     : f/3.6
  ISO equiv.   : 100
  Whitebalance : Auto
  Metering Mode: matrix
  Exposure     : program (auto)
  GPS Latitude : N %:.7fd %;.8fm %;.8fs
  GPS Longitude: E %;.8fd %:.7fm %;.8fs
  GPS Altitude : 174.00m
  Comment      : Aufgenommen auf dem <a href="http://www.froutes.de/TT00000014_Ars_Natura">Kunstweg Ars Natura</a>.
  ======= IPTC data: =======
  Record vers.  : 4
  Headline      : Felsburg auf dem Felsberg
  (C)Notice     : www.froutes.de
  Caption       : Aufgenommen auf dem <a href="http://www.froutes.de/TT00000014_Ars_Natura">Kunstweg Ars Natura</a>.

As you can see, there's a huge amount of potentially privacy-sensitive metadata in a typical JPEG as generated by your camera (including camera type, settings, date/time, maybe even the GPS coordinates of your location, etc.).

You can extract the thumbnail stored in all JPEGs in the current directory with:

  $ jhead -st "&i_t.jpg" *.jpg
  Created: '3061542361_60acb0904b_o.jpg_t.jpg'

Random flickr image and its differing thumbnail

Note that the JPEG thumbnail does not necessarily show the same picture as the JPEG itself. Depending on the image manipulation software that was used to create the edited/fixed/cropped JPEG, the thumbnail may still reflect the original JPEG contents (see sample image on the right-hand side). This is a huge potential privacy issue. There were a number of articles about this some years ago, in case you missed them.

Thus, an important jhead command line to know is the following, which removes all metadata (including any thumbnails) from all JPEG images in the current directory:

  $ jhead -purejpg *.jpg
  Modified: 3061542361_60acb0904b_o.jpg

As you can see the result is that only very basic information can be gathered from the file afterwards:

  $ jhead *.jpg
  File name    : 3061542361_60acb0904b_o.jpg
  File size    : 1052506 bytes
  File date    : 2008:11:26 23:38:04
  Resolution   : 3264 x 2448
  $ jhead -st "&i_t.jpg" *.jpg
  Image contains no thumbnail

I recommend doing this for most photos you make publicly available on sites like flickr etc. (unless you have a good reason not to). Finally, see the jhead(1) manpage for lots more options that the tool supports.
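If you want to double-check a batch of photos before uploading, something along these lines might help. It is only a sketch: it looks for the field labels that appear in the sample jhead output above (camera, date/time, GPS, comment, IPTC headline/caption), so other fields will slip through, and the names has_sensitive_fields and check_jpegs are invented for this example.

```shell
#!/bin/sh
# Succeed (exit 0) if the jhead output on stdin still lists one of the
# fields seen in the sample output: camera, EXIF date/time, GPS,
# comment, or the IPTC headline/caption.
has_sensitive_fields() {
    grep -Eq '^ *(Camera (make|model)|Date/Time|GPS [A-Za-z]+|Comment|Headline|Caption) *:'
}

# Hypothetical helper: name every JPEG given that still carries such fields.
check_jpegs() {
    for f in "$@"; do
        [ -f "$f" ] || continue      # skip an unmatched *.jpg glob
        if jhead "$f" 2>/dev/null | has_sensitive_fields; then
            printf 'still has metadata: %s\n' "$f"
        fi
    done
}
```

Running check_jpegs *.jpg after jhead -purejpg should then print nothing for the cleaned files.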

Manipulating PDFs from the command line - joining, merging, rotating [Update]

One of the most useful packages for working with PDFs on Linux is pdfjam.

From the website:

  • pdfnup, which allows PDF files to be "n-upped" in roughly the way that psnup does for PostScript files.
  • pdfjoin, which concatenates the pages of multiple PDF files together into a single file.
  • pdf90, which rotates the pages of one or more PDF files through 90 degrees (anti-clockwise).

The installation is easy as always: apt-get install pdfjam

PDF is not exactly the most easily editable format out there, but these tools can save you lots of time and trouble. Just recently I needed to merge two PDFs into one (and I didn't have any source format of the files). A simple pdfjoin foo1.pdf foo2.pdf --outfile bar.pdf does the job in a few seconds.

Equally useful when you need to print huge documents is pdfnup --nup 2x2 foo.pdf, which puts four PDF pages onto one (thus drastically reducing the number of pages you have to print).
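The saving is easy to estimate: a layout of COLS x ROWS fits COLS*ROWS input pages onto each output page, so the output page count is the input page count divided by that, rounded up. A small sketch of the arithmetic (nup_pages is a made-up helper name, not something pdfjam provides):

```shell
#!/bin/sh
# nup_pages PAGES COLS ROWS
# Print how many output pages a COLS x ROWS n-up layout needs.
nup_pages() {
    per_page=$(( $2 * $3 ))
    # Integer ceiling division: a partly filled last page still counts.
    echo $(( ($1 + per_page - 1) / per_page ))
}

nup_pages 25 2 2   # a 25-page PDF at --nup 2x2
```

For the 25-page example this prints 7, i.e. seven pages to print instead of 25.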

Update 2006-09-20: As was noted by several people, pdftk is very cool, too. It can do some other things such as split PDFs, encrypt/decrypt them, manipulate metadata and more...

Command-line OpenGL rendering using PyOpenGL and/or OpenGLContext

I noticed that there have been quite a few Python related posts on Planet Debian lately. Here's mine.

I'm having a really hard time trying to find a good solution for rendering an OpenGL scene (e.g. from a VRML file) using PyOpenGL and/or OpenGLContext from the command line. All solutions I've tried so far pop up an X11 window, which is not what I'm looking for. Instead, I want to render the scene and dump it to PNG/JPG without requiring an X11 server.

Can this be done with PyOpenGL/OpenGLContext? I could probably try to use xvfb, but that's really an ugly hack. Besides, the images I get using xvfb are somewhat broken, not sure why (does xvfb support OpenGL?).

Any hints are appreciated.
