
Converting Mailman "Gzip'd Text" archive files to proper mbox files

Mailman archives are often only available in the pretty useless "Gzip'd Text" format, which you cannot easily download and view locally (and threaded) in a MUA such as mutt. But that is exactly what I want to do from time to time (e.g. because I want to read the discussions of the past weeks on mailing lists where I'm newly subscribed).

After some searching I found one way to do it which I stripped down to my needs:

 $ cat mailman2mbox
 #!/usr/bin/perl
 # Turn Mailman's "Gzip'd Text" archive into something mutt accepts as an mbox.
 while (<STDIN>) {
   # De-obfuscate addresses: "user at example.com" -> "user@example.com"
   # ("at" may be localized in non-English archives, e.g. "en").
   s/^(From:? .*) (at|en) /$1\@/;
   # Rewrite the date into an RFC-2822-style date; timezone info is lost, hence +0000.
   s/^Date: ([A-Z][a-z][a-z]) +([A-Z][a-z][a-z]) +([0-9]+) +([0-9:]+) +([0-9]+)/Date: $1, $3 $2 $5 $4 +0000/;
   print;
 }

Example run on some random mail archive:

 $ wget http://participatoryculture.org/pipermail/develop/2009-August.txt.gz
 $ gunzip 2009-August.txt.gz
 $ ./mailman2mbox < 2009-August.txt > 2009-August.mbox

You can then view the mbox as usual in mutt:

 $ mutt -f 2009-August.mbox
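
If you find yourself doing this regularly, the steps above are easy to wrap into a tiny script (just a sketch built from the commands shown here; fetchmbox is an arbitrary name, and the script takes the archive URL as its only argument):

 $ cat fetchmbox
 #!/bin/sh
 # Usage: ./fetchmbox http://host/pipermail/list/2009-August.txt.gz
 url="$1"
 base="$(basename "$url" .txt.gz)"
 wget -q "$url" &&
 gunzip -f "$base.txt.gz" &&
 ./mailman2mbox < "$base.txt" > "$base.mbox" &&
 mutt -f "$base.mbox"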

Suggestions for a simpler method to do this are highly welcome. Maybe some mbox-related Debian package already ships with a script to do this?

How to run 65535 web servers on a single laptop

OK, so here's what crazy computer geeks come up with when they're bored of just sitting in the subway and staring out of the window.

Web servers usually run on port 80. TCP/UDP ports range from 1 to 65535 (port 0 is reserved). You can run multiple web servers on different ports at the same time... Are you thinking what I'm thinking?

Well, first you need a web server (duh). I decided to use lighttpd, as it's said to be small and memory-efficient (which sounded pretty important given what I was trying to do). Apache would probably not be a good choice here. Mind you, I haven't done any benchmarks at all; I'm just guessing...

  $ apt-get install lighttpd

Then I wrote a little shell script with a loop that invokes lighttpd on ports 1, 2, 3, 4, ..., 65535. That's it ;)

  #!/bin/bash
  TMPDIR=/tmp
  CONFFILE="server.document-root       = \"/var/www/\"
           index-file.names           = ( \"index.html\" )"
  # Ports below 1024 need root; if a port is already taken, that instance simply fails to start.
  for ((i = 1; i <= 65535; i = i + 1)); do
    echo "+++ Starting web server on port $i"
    echo "$CONFFILE" > $TMPDIR/lighttpd.conf    # quoting preserves the newline in the config
    echo "server.port = $i" >> $TMPDIR/lighttpd.conf
    /usr/sbin/lighttpd -f $TMPDIR/lighttpd.conf
    rm -f $TMPDIR/lighttpd.conf
  done

I'm sure this can be optimized a lot, but it's sufficient for now, and it works.

You can test any of the web servers by running "wget http://localhost:3556/" (for example). You can kill them all with killall lighttpd. If you already run some other daemons on some ports, you cannot start a lighttpd on the same port, so you'll end up with fewer than 65535 servers.
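
To get a rough idea of how many instances actually came up, you can simply count the lighttpd processes (assuming pgrep is available):

  $ pgrep lighttpd | wc -l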

In case you try this on your hardware, make sure you have lots of RAM and lots of swap. Don't run this on production hardware. Feel free to post your experiences in the comments.

Counting pages in multiple PDFs from the command line, using pdfinfo or pdftk

Say you have a bunch of PDFs and you want to know how many pages each of them has. You could of course use some graphical software to display every single PDF and check the page count (xpdf, evince, whatever), but that gets tedious very fast.

So here's (mostly as a reminder for myself) one way to count pages of many PDFs (in the current directory) using pdfinfo from the xpdf-utils package:

$ cat countpdfpages
#!/bin/sh
# Print the page count of every PDF in the current directory.
for f in *.pdf; do
        echo -n "$f: "
        pdfinfo "$f" 2>/dev/null | grep "^Pages:" | cut -d ":" -f 2
done

A sample run:

$ ./countpdfpages
S71147.pdf:           25
S71226.pdf:           38
S71242-01.pdf:        25
S71258.pdf:           26
S71315.pdf:           35
S72045.pdf:           2

I'm sure there are many other ways to do this, e.g. using pdftk foo.pdf dump_data (from the pdftk package). Please leave a comment if you know a simpler way to achieve this, especially if it doesn't need an extra script (i.e. if there's a PDF command line tool which can already count pages in multiple PDFs). Thanks!
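
For the record, a pdftk-based variant might look roughly like this (just a sketch; pdftk's dump_data output contains a NumberOfPages line, which awk picks out here):

$ for f in *.pdf; do echo -n "$f: "; pdftk "$f" dump_data 2>/dev/null | awk '/^NumberOfPages/ {print $2}'; done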

Updated DIY Dynamic DNS solution HOWTO

I've just updated my DIY secure pseudo-DDNS setup using ssh article/HOWTO a bit, in order to make it simpler to set up (no more extra scripts required) and a bit more secure (by using command= and no-port-forwarding,no-X11-forwarding,no-agent-forwarding in the /home/user/.ssh/authorized_keys file, for instance).

If you've considered employing such a solution, please revisit the article for updated instructions.

DIY secure pseudo-DDNS setup using ssh

Here's a quick HOWTO for setting up your own secure pseudo-dynamic DNS (DDNS) server.

It's not a "real" DDNS service, i.e. you won't be able to use standard DNS tools or protocols to talk to the server, but it covers 98% of all functionality I expect from a service such as DynDNS or similar ones: It tells me the IP address of a certain box which doesn't have a static IP address (e.g. my home-server).

Requirements

You'll need:

  • A Linux box with a dynamic IP address (dial-up modem/DSL); I'll call it homeserver from now on. This is the box whose public IP address I want to be able to find out.
  • A public Linux box with static IP address (or known DNS name) where you have a user account and ssh access. I'll call this box publicserver.

Setup

On the homeserver:

  • Add a non-root user account (e.g. user) just for the purpose of this mechanism: adduser user. The user doesn't need any special permissions.
  • Create an ssh key with an empty passphrase for the user: ssh-keygen -t rsa -b 4096. This is required as you'll want to run ssh commands via cronjob later.
  • Add a cronjob which regularly (e.g. every 10 minutes) runs an arbitrary command such as ls as user. The entry below uses the system crontab format (/etc/crontab or a file in /etc/cron.d/), hence the extra user field:

    5,15,25,35,45,55 * * * * user ssh -x user@publicserver ls

    The command to run (e.g. ls) doesn't really matter at all; more on that later. A condensed example of these homeserver steps follows right after this list.
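
Condensed into commands, the homeserver side looks roughly like this (a sketch, run as root; the /etc/cron.d/homeserverip filename is simply my choice):

    adduser user
    # Password-less key; needed because cron runs non-interactively:
    su - user -c 'mkdir -p ~/.ssh && chmod 700 ~/.ssh && ssh-keygen -t rsa -b 4096 -N "" -f ~/.ssh/id_rsa'
    # The system crontab line from above (note the user field) goes into /etc/cron.d/:
    echo '5,15,25,35,45,55 * * * * user ssh -x user@publicserver ls' > /etc/cron.d/homeserverip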

On the publicserver:

  • Add a non-root user account (e.g. also named user) just for the purpose of this mechanism: adduser user. The user doesn't need any special permissions.
  • Add the public ssh key (/home/user/.ssh/id_rsa.pub) of user@homeserver to the publicserver's /home/user/.ssh/authorized_keys, so that the homeserver user can log in to the remote publicserver without a password (i.e. non-interactively). We'll also limit which ssh commands this user can run using the command keyword in the /home/user/.ssh/authorized_keys file:

    command="echo $SSH_CLIENT | cut -d \" \" -f 1 > /home/user/homeserverip.txt && chmod 644 /home/user/homeserverip.txt",no-port-forwarding,no-X11-forwarding,no-agent-forwarding ssh-rsa AAAAAAAAAA...AAAAAAA user@homeserver

    In the above example, AAA...AAA is the public key, command specifies which command is run whenever this user "logs in" via ssh, and the additional options no-port-forwarding,no-X11-forwarding,no-agent-forwarding minimize what this user can do via ssh. A condensed example of the publicserver steps follows below.
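
The publicserver side, condensed (again just a sketch, run as root; adjust paths and ownership to taste):

    adduser user
    mkdir -p /home/user/.ssh
    # Paste the homeserver's id_rsa.pub here, prefixed with the command=/options string shown above, then press Ctrl-D:
    cat >> /home/user/.ssh/authorized_keys
    chown -R user:user /home/user/.ssh
    chmod 700 /home/user/.ssh
    chmod 600 /home/user/.ssh/authorized_keys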

So to summarize: the homeserver's user regularly runs a dummy command on the remote publicserver, which in turn (ab)uses the $SSH_CLIENT environment variable; its first field contains the public IP address the ssh connection came from, which is exactly what we're looking for. We store that IP in the homeserverip.txt file, which thus always contains the latest known IP address of the homeserver (thanks to the cronjob).
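
A quick manual test of the mechanism (the IP address shown is just a placeholder):

  user@homeserver:~$ ssh -x user@publicserver ls   # whatever we ask for, the forced command runs
  user@publicserver:~$ cat /home/user/homeserverip.txt
  198.51.100.23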

Getting the current homeserver IP address

You can now retrieve the current IP address of your homeserver easily from anywhere (e.g. from your laptop when you're in another, possibly hostile network) in order to connect to your homeserver:

  $ ssh -x otheruser@publicserver cat /home/user/homeserverip.txt

To make this a bit more convenient you can add a shell alias (e.g. into ~/.bashrc):

  alias homeserverip='ssh -x otheruser@publicserver cat /home/user/homeserverip.txt'

Or, to conveniently login to your homeserver as johndoe:

  alias homeserverlogin='ssh -x johndoe@`ssh -x otheruser@publicserver cat /home/user/homeserverip.txt`'

Conclusion, advantages

This may not be the most elegant solution, and it has a number of drawbacks when compared to services such as DynDNS, but it's sufficient for me and it also has some advantages:

  • You're not dependent on the DDNS service provider. For instance DynDNS recently changed their policy to only allow one update per 28 days, which totally sucks. They then disabled the service completely until I updated my ddclient config and contacted them, i.e. I wasn't able to connect to my homeserver for quite a while, which also sucks.
  • The ssh-based solution is secure and encrypted, in contrast to some other DDNS services, which only allow unencrypted HTTP-based connections (yes, some do allow https/SSL connections).
  • This solution doesn't require in-depth DNS server config knowledge, nor does it require a DNS server you control. You only need a (non-root) ssh account on a public server (or virtual server).

Personally I'm currently using this mechanism for two things, more might follow:

  • Connect to my homeserver via ssh.
  • Get the homeserver's IP address so I can update my OpenVPN client config file on my laptop (I use my homeserver as OpenVPN server).

So far it works pretty nicely.

Update 2008-06-24: Various fixes and simplifications. SSH key must be password-less. Don't run cronjob once per minute, that's overkill.
Update 2008-07-02: Simplify setup by removing the need for extra scripts. Limit the commands the user can perform via ssh in the authorized_keys file. Make the RSA keys 4096 bits strong.
