...making Linux just a little more fun!


XMLTV

By Bill Lovett

Where do you go to find out what's on TV? The usual suspects might include a newspaper, a recent issue of TV Guide magazine, a favorite Web site, or your nearest TiVo, ReplayTV, or other PVR. But don't forget to add Linux to the top of that list. You can let the machine do the dirty work and bring the listings to you. XMLTV, a short bash script, and a cron job are all you need to get started.

Installation

First things first: getting the program installed. XMLTV is a suite of Perl scripts and can be downloaded from membled.com/work/apps/xmltv. There are releases for Unix-like and Windows environments, but, for obvious reasons, we'll focus on the former. If you're installing from source, it's the usual routine:

% perl Makefile.PL
% make
% make test
% make install 

If you're on Debian, it's all just an apt-get away (apt-cache search xmltv). Links to packages for OS X, Red Hat 8, and Red Hat 9 are available from the project's homepage.

Configuration

Before XMLTV can be useful, it needs to know where in the world you are. XMLTV is international: it can fetch TV listings for Canada and the United States, the United Kingdom, Austria and Germany, New Zealand, Finland, Italy, Spain, the Netherlands, Denmark, and Hungary. (Belgium and France are in the works.) The scripts that collect listings for a particular country are referred to as grabbers, and you'll find them on the command line under tv_grab_*. We'll use the North American grabber, tv_grab_na, which covers the U.S. and Canada.

When you first run the grabber, do so with the --configure option. This starts a question-and-answer session in which the grabber asks for your ZIP code, your TV service provider, and the channels you want to ignore. The answers are written to ~/.xmltv/tv_grab_na.conf, which can easily be edited by hand.

At this point, XMLTV is ready to do your bidding. Do a man tv_grab_na to learn about all the available options. For now, just two will suffice:

% tv_grab_na --days 1 --output /tmp/tv.xml

This tells the grabber to get one day's worth of listings and save them to /tmp/tv.xml.
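
Before going further, it can be worth checking that the grabber actually returned something. The sketch below counts the shows in an XMLTV file by counting opening <programme> tags, the element the format uses for each show. The function name count_programmes is made up for this example, and it assumes a grep that supports -o (GNU and BSD grep both do).

```shell
#!/bin/sh
# count_programmes FILE
# Count the <programme> elements in an XMLTV file.
# (Hypothetical helper for illustration; not part of the XMLTV suite.)
count_programmes() {
    # grep -o prints every match on its own line, so wc -l counts matches
    # even if several elements happen to share one line.
    grep -o '<programme[ >]' "$1" | wc -l | tr -d ' '
}

# Usage, once the grabber has run:
#   count_programmes /tmp/tv.xml
```

A count of 0 usually means the grab failed or the configuration excludes every channel, so there is no point in running the rest of the pipeline.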

XMLTV's file format doesn't quite make for friendly reading, unless you enjoy reading raw markup. A few more scripts from the suite can fix that. tv_sort sorts the contents of an XMLTV file by date. tv_grep lets you weed out some of the obvious cruft in the listings. Here's how I run them:

% tv_sort --output /tmp/tv_sorted.xml /tmp/tv.xml
% tv_grep --output /tmp/tv_grepped.xml --ignore-case --not --category Children \
          --not --category Sports --not --title "Paid Programming" \
          --not --title "Local Origination" \
          --on-after now /tmp/tv_sorted.xml

The commands above sort the original file and then discard anything categorized as "Children" or "Sports", and anything with "Paid Programming" (infomercials) or "Local Origination" (public access) in the title. Also, we're discarding everything that aired before the script ran.

At this point, we've still got an XML file. Converters to the rescue! tv_to_text is one of the tools that can help us go from XML to something else. (Other possibilities include LaTeX, HTML and PDF. Check the readme to see what's currently available.) After running something like this:

% tv_to_text --output /tmp/tv.txt /tmp/tv_grepped.xml

We get output like this:

21:00--21:30    Spy School      38
21:00--21:30    Designing for the Sexes // European Kitchen     64
21:00--21:30    Chappelle's Show        67
21:00--21:30    The Real World // Las Vegas     71
21:00--22:00    Law & Order: Special Victims Unit // Guilt      44
21:00--22:00    Wild Card // Auntie Venom       45
21:00--22:00    Cold Case Files // The Accidental Killer; Little Sister Lost    57
21:00--22:00    America's Most Wanted: America Fights Back // Top Ten Most Wanted Fugitives     5
21:00--22:00    The FBI Files // The Price of Greed     60
21:00--22:00    Trading Spaces // Nashville: Murphywood Crossing        61
21:00--22:00    Great Chowder Cook-Off  63
21:00--22:00    Ends of the Earth // Secrets of the Holy Land   65
21:00--22:00    The E! True Hollywood Story // The Hilton Sisters       68
...

Simple and no frills. Just what we need for the final step: e-mail delivery.
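
Because the text output holds one show per line, ordinary text tools can slice it further. The sketch below pulls out the lines for a single channel. It assumes the three columns (time, title, channel) are separated by tabs, which is what the sample above suggests but is worth checking against your own output. The function name shows_on_channel is invented for this example.

```shell
#!/bin/sh
# shows_on_channel CHANNEL FILE
# Print the lines of a tv_to_text listing whose last column is CHANNEL.
# (Hypothetical helper; assumes tab-separated columns.)
shows_on_channel() {
    # -F '\t' splits each line on tabs; $NF is the last field (the channel).
    awk -F '\t' -v ch="$1" '$NF == ch' "$2"
}

# Usage:
#   shows_on_channel 44 /tmp/tv.txt
```

Lines without a tab, such as any date headings, are treated as a single field and simply fail to match.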

Delivery

If we stopped at this point we'd have used several of XMLTV's abilities but hardly anything else. We'd also be running low on convenience and automation. Fortunately, we can wrap all the commands we've seen so far into a shell script, and have it e-mail us the final results. mail can take care of, well, the mailing:

% mail -s "Today's TV listings from XMLTV" user@localhost < /tmp/tv.txt

Here's what the full script looks like:

#!/bin/sh

# Grab today's listings:
tv_grab_na --days 1 --output /tmp/tv.xml

# Sort
tv_sort --output /tmp/tv_sorted.xml /tmp/tv.xml

# Grep
tv_grep --output /tmp/tv_grepped.xml --ignore-case --not --category Children \
--not --category Sports --not --title "Paid Programming" \
--not --title "Local Origination" \
--on-after now /tmp/tv_sorted.xml

# Convert To Text
tv_to_text --output /tmp/tv.txt /tmp/tv_grepped.xml

# Email
mail -s "Today's TV listings from XMLTV" user@localhost < /tmp/tv.txt

Put that in a cron job that runs once per day, and you've got TV listings with no outside advertising, and no channels or shows you know you aren't interested in.
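
As a rough sketch of that cron job: a crontab entry consists of five time fields (minute, hour, day of month, month, day of week) followed by the command to run. The helper below only prints such a line for a daily run at a given hour. The function name cron_entry and the script path are placeholders, and the script is assumed to be saved somewhere and made executable with chmod +x.

```shell
#!/bin/sh
# cron_entry HOUR SCRIPT
# Print a crontab line that runs SCRIPT once a day at HOUR:00.
# (Hypothetical helper for illustration.)
cron_entry() {
    printf '0 %d * * * %s\n' "$1" "$2"
}

# To append the entry to the current user's crontab (not run here;
# the path is a placeholder for wherever you saved the script):
#   ( crontab -l 2>/dev/null; cron_entry 6 "$HOME/bin/tv_listings.sh" ) | crontab -
```

Running crontab -e and typing the line by hand works just as well; the point is only the field layout.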

More importantly, you've got a foundation to build on. What we've covered is just the beginning. Beyond the command-line scripts, a GUI client is also available. Of course, there are plenty more things you could do from the command line, such as:

  1. Pull in data from imdb.com via tv_imdb.
  2. Split the listings into separate files for each day and channel via tv_split.
  3. Transform the XML with your own XSLT stylesheet.
  4. Only send e-mail if certain keywords are found.

It all depends on how you want to consume the information, and how cleverly you can chain all the scripts together.
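
The last idea in the list, mailing only when certain keywords turn up, might look something like the sketch below. It searches the text listing for any of a set of keywords, ignoring case and treating them as plain strings rather than regular expressions. The function name match_keywords is made up for this example, and it assumes a grep that accepts "-f -" to read patterns from standard input (GNU grep does).

```shell
#!/bin/sh
# match_keywords FILE KEYWORD...
# Print the lines of FILE that contain any KEYWORD (case-insensitive,
# fixed strings). Exit status is non-zero when nothing matches.
# (Hypothetical helper; not part of the XMLTV suite.)
match_keywords() {
    file=$1
    shift
    # With no keywords, an empty pattern would match every line, so bail out.
    [ "$#" -gt 0 ] || return 1
    # One keyword per line on stdin; grep -f - reads them as the pattern list.
    printf '%s\n' "$@" | grep -i -F -f - "$file"
}

# Mail only when something matched (not run here; the address is the
# same placeholder used earlier):
#   matches=$(match_keywords /tmp/tv.txt "Law & Order" "Trading Spaces")
#   [ -n "$matches" ] && printf '%s\n' "$matches" | mail -s "TV picks" user@localhost
```

Dropped into the end of the daily script in place of the plain mail command, this keeps the inbox quiet on days when nothing of interest is on.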

 


Bill Lovett is a Web developer in New York City. He's one of those PHP/MySQL types. And he has this weird thing about running Linux on old machines that by all rights should have been trashed years ago. Read more about Bill and his Open Source projects at www.ilovett.com.


Copyright © 2004, Bill Lovett. Released under the Open Publication license unless otherwise noted in the body of the article. Linux Gazette is not produced, sponsored, or endorsed by its prior host, SSC, Inc.

Published in Issue 99 of Linux Gazette, February 2004
