The data is acquired and translated from the sources listed here by a series of Perl scripts. Because there are three different types of content to harvest, there are three primary scripts: an XML parser, a weather parser (HTML parsing), and a Yahoo stock parser. The XML parser uses the Perl XML::Parser module to do its dirty work. The initial data acquisition for the XML is actually performed by lynx in a shell-script wrapper (I never bothered to switch to LWP). The HTML weather parser is implemented by hand instead of using one of the various HTML modules; it does, however, use LWP to fetch the HTML documents. I should probably switch it to my HTTP::Lite sometime. And finally, the stock parser is just a simple text parser that uses Geturl and split().
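To give a flavour of the split()-based approach, here is a minimal sketch of a quote parser. The field layout (quoted symbol, price, change) and the parse_quote name are my assumptions for illustration; the real script fetched the raw line with Geturl, which isn't shown here.

```perl
#!/usr/bin/perl -w
use strict;

# Parse one comma-separated quote line into a hashref.
# Assumed layout: "SYMBOL",price,change
sub parse_quote {
    my ($line) = @_;
    chomp $line;
    my ($symbol, $price, $change) = split /,/, $line;
    $symbol =~ s/"//g;    # strip the quotes around the ticker symbol
    return { symbol => $symbol, price => $price, change => $change };
}

my $q = parse_quote(qq{"YHOO",31.25,+0.50\n});
print "$q->{symbol} is at $q->{price} ($q->{change})\n";
```

Nothing fancy: no CSV module, no HTML parsing, just split() on commas, which is all a one-line quote feed needs.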
These three scripts are launched periodically from cron to fetch and process the data. On a daily basis, the database is flushed of old headlines and quotes and optimized for better performance (MySQL's OPTIMIZE TABLE). The optimization keeps the tables from becoming heavily fragmented, which would slow down loading the data and generating the web page.
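The daily maintenance amounts to something like the following SQL fragment. The table and column names are assumptions (the actual schema isn't described on this page); the OPTIMIZE TABLE statement is what reclaims the space left behind by the deletes.

```sql
-- Sketch of the daily cleanup job (table/column names assumed)
DELETE FROM headlines WHERE added < NOW() - INTERVAL 1 DAY;
DELETE FROM quotes    WHERE added < NOW() - INTERVAL 1 DAY;
OPTIMIZE TABLE headlines, quotes;
```

With MyISAM tables (the MySQL default back then), frequent DELETEs leave holes in the data file, so a nightly OPTIMIZE TABLE keeps reads fast.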
There are many improvements I could make, including:
This project started as a rainy-day hobby while I was idle one day in 2000.
2002 update: Environment Canada changed their web-site, and the new code I wrote to handle it was lost in a hard drive crash. Weather is no longer displayed on the site.