Ivor O’Connor

February 7, 2010

Poor Man’s Web Link Checker

Filed under: Uncategorized — ioconnor @ 7:50 pm

There are link checkers for websites that work reasonably well, such as http://validator.w3.org/checklink. However, it does not work on pages that are generated dynamically with lots of JavaScript. For instance, if your table of contents is generated with JavaScript, the W3C link checker will miss every link in the TOC!

Getting around this on a *nix machine is fairly easy, though. You need just two lines: one checks the external links and the other checks the internal links. Both assume you are in the root directory of the website and that every internally linked file is present there as a local file:

  1. External links (-f makes curl treat HTTP errors such as 404 as failures):
    curl -s -S -f $(grep -ir href= *.* | sed 's/.*href="//' | sed 's/".*//' | sort -u | grep '^http') > /tmp/blahbla
  2. Internal links:
    for x in $(grep -ir href= *.* | sed 's/.*href="//' | sed 's/".*//' | sort -u | grep -v '^http' | grep -v '^#'); do if [ ! -f "$x" ]; then echo "File \"$x\" does not exist"; fi; done

The first command finds the external links, fetches them into a temporary file, and prints any errors curl runs into along the way to the console.
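
If you would rather see exactly which URL is broken, the same idea can be spelled out as a small loop. This is only a sketch: the function names list_external_links and check_external_links are made up for this example, the 20 second timeout is an arbitrary choice, and it assumes a grep that supports -o and -h (GNU and BSD grep both do). Treat it as a starting point, not a finished tool.

```shell
#!/bin/sh
# list_external_links (hypothetical helper): print every unique http(s) href
# found in the given files. grep -o emits each match on its own line, so
# several links on one source line are all picked up.
list_external_links() {
    grep -ioh 'href="http[^"]*"' "$@" \
        | sed 's/^[^"]*"//; s/"$//' \
        | sort -u
}

# check_external_links (hypothetical helper): request each URL and print the
# ones that do not answer with a 2xx or 3xx status. curl prints 000 as the
# status when the connection itself fails.
check_external_links() {
    list_external_links "$@" | while IFS= read -r url; do
        # --max-time 20 is an assumption; tune it for slow sites.
        code=$(curl -s -o /dev/null --max-time 20 -w '%{http_code}' "$url")
        case "$code" in
            2*|3*) ;;                               # reachable, say nothing
            *) echo "BROKEN ($code): $url" ;;
        esac
    done
}
```

Source the file (or paste the functions into your shell) and run something like check_external_links *.html from the root of the site. Only the failing links are printed, each with its status code.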

The second command finds all links to internal files and verifies that each file exists on disk.
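
The internal check can be made a little more forgiving in the same way. The sketch below is my own variation, not a drop-in replacement: check_internal_links is a made-up name, it strips #fragment and ?query parts before looking for the file, it skips anything with a URL scheme (http:, mailto:, javascript: and so on), and it uses -e instead of -f so that links to directories count as present. Like the one-liner, it assumes you run it from the root of the site and that links are relative to that directory.

```shell
#!/bin/sh
# check_internal_links (hypothetical helper): report local hrefs whose target
# is missing on disk.
check_internal_links() {
    # 1. pull out every href="..." (several per line are fine)
    # 2. keep only the value, then drop any #fragment or ?query suffix
    # 3. skip empty results (pure "#anchor" links) and anything with a scheme
    grep -ioh 'href="[^"]*"' "$@" \
        | sed 's/^[^"]*"//; s/"$//; s/[#?].*$//' \
        | grep -v -e '^$' -e '^[a-zA-Z][a-zA-Z0-9+.-]*:' \
        | sort -u \
        | while IFS= read -r path; do
              [ -e "$path" ] || echo "File \"$path\" does not exist"
          done
}
```

For example, check_internal_links *.html prints one line per missing target and nothing at all when every local link resolves.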

No need for expensive tools that may not even work on your website!
