Why I'm creating my own URL shortening service

I’ve long been concerned about the proliferation of “short URLs”, whose use has gathered great momentum, especially in the light of microblogging services like Twitter.


Short URLs, such as those generated by TinyURL, are convenient, especially when you only have 140 characters to get your message across. You can turn a huge URL, many hundreds of characters long, into just 25 characters or even fewer. Great!

Besides TinyURL, many other URL shortening services are available: bit.ly, tr.im, ow.ly and is.gd, to name but a few. And short URLs themselves are gaining use outside of microblogging services. You will see them in blog posts, emails (to get around the line-wrap-broken-link problem) and even on the printed page (see British Archaeology magazine).

But what happens if a short URL service disappears? The company or individual that runs it pulls the plug, and suddenly the web is littered with thousands or even millions of dead links. That would be bad. And it will happen.

I see the state of short URLs as a delicate balance. On one side, we have the originating (possibly long) URL. On the opposite side, we have the short URL. Hopefully, the original URL will work for many years. When I migrated the Wessex Archaeology website to a new CMS last year, I didn’t break any links. Some of those links have worked for more than 7 years, and I hope that they will still work in another 7. WA can make sure that they stay the same (and they will). But what happens to any shortened links that point to those pages? We can’t guarantee the same longevity for them.


What happens to the TinyURL links in the printed magazine British Archaeology if TinyURL goes bust? They’ll break. But BA is available in many libraries and people do look at back issues. It would be nice if they could see the web pages mentioned in the articles, but there’s no guarantee that the links will work, because there are two parts of the equation that could go wrong. One is that TinyURL disappears; the other is that the originating page is deleted or changes its URL without redirecting.

For short URLs that I create, I would like to control at least part of that equation myself.

I’ve often heard the argument that short URL services are only meant to be temporary, for links that are “here and now”. But how often have you come across something old, but still relevant, when doing a web search? For me, that’s a fairly frequent occurrence. Who’s to say that what is quick and temporary today won’t turn out to be relevant and useful in the future?

By running my own URL shortening service, I won’t change what is being used elsewhere, but at least people looking at my Twitter stream, or wherever those tweets are syndicated to (this blog, for example), have a better chance of seeing what I’m linking to in a few years’ time. Especially if I plan to run my personal URL shortening system for as long as I’m alive and capable.

I suppose that one of the driving forces behind this is my training as an archaeologist (we don’t like throwing things away, generally, and that includes data). I can’t archive the pages I link to, but at least I can give folks in the future a better chance of finding what I’m linking to.

I have a nice short URL thanks to the .eu top level domain, so I will experiment with some different systems to see which works out – the simpler and easier to maintain the better. It’s got to last a long time…

[Edit] When I say “creating my own URL shortening service” I should clarify that I’m not programming one from scratch, but taking an existing GPL/Open Source URL shortener and modifying it for my needs (if it needs modifying)! I will probably have a public and private version, with varying functionality. Some good ideas are already flowing in through Twitter about identifying canonical URLs, which is great :-)

[Update] My URL shortener is alive: http://qurl.eu/ (think “curlew”, like the bird). It is based upon TightURL, and I chose it because of its ability to use various blacklists to reduce misuse. I will run qurl.eu for as long as I can – i.e. for as long as it is technically feasible.
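For anyone curious about what a shortener does under the bonnet, here is a minimal Python sketch. It is not TightURL's code and not how qurl.eu is actually implemented: the `Shortener` class, the `encode_id` helper and the `blocked_hosts` parameter are all names I've made up for illustration, and a plain set of hostnames stands in for the real blacklists. It simply numbers each new URL, writes that number in base 36 to get a short code, and refuses anything on the blocklist.

```python
import string
from urllib.parse import urlsplit

ALPHABET = string.digits + string.ascii_lowercase  # 36 symbols: 0-9, a-z


def encode_id(n: int) -> str:
    """Write a non-negative integer in base 36 to get a short code."""
    if n < 0:
        raise ValueError("id must be non-negative")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, rem = divmod(n, len(ALPHABET))
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))


class Shortener:
    """Hypothetical in-memory shortener; a real one would use a database."""

    def __init__(self, base_url: str, blocked_hosts=()):
        self.base_url = base_url.rstrip("/")
        # Assumption: a simple set of hostnames stands in for real blacklists.
        self.blocked_hosts = {h.lower() for h in blocked_hosts}
        self._by_code = {}
        self._by_url = {}

    def shorten(self, long_url: str) -> str:
        host = (urlsplit(long_url).hostname or "").lower()
        if not host:
            raise ValueError("not an absolute URL")
        if host in self.blocked_hosts:
            raise ValueError("host is blocklisted")
        # Reuse the existing code so one long URL maps to one short URL.
        code = self._by_url.get(long_url)
        if code is None:
            code = encode_id(len(self._by_code) + 1)
            self._by_code[code] = long_url
            self._by_url[long_url] = code
        return f"{self.base_url}/{code}"

    def resolve(self, code: str):
        """Return the long URL for a code, or None if it is unknown."""
        return self._by_code.get(code)


if __name__ == "__main__":
    s = Shortener("http://qurl.eu/", blocked_hosts=["spam.example"])
    short = s.shorten("http://example.org/a/very/long/path?with=parameters")
    print(short, "->", s.resolve(short.rsplit("/", 1)[1]))
```

The important part for longevity is the stored mapping from code to long URL: as long as that table survives and the domain keeps resolving, the short links keep working.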

Scribd – YouTube for documents

I’ve been looking at Scribd recently as a way of distributing documents online. Think of it as a kind of YouTube for documents – upload a document (Word, PDF, OpenDoc, RTF etc), tag it, choose a Creative Commons license if you so desire, and it gets converted into FlashPaper and is viewable online. You then get a snippet of code that lets you embed documents in your own site, and the original file remains untouched and available for download.

As an example, I took a PDF that was languishing on a server, and had been unread for a couple of years. Within an hour of being on Scribd, it had been indexed by Google and looked at by 12 people. Not bad.

I know this sounds like an advert, but I’m really rather impressed by it!

Cleaning up Word HTML


Today, whilst building a new data downloads section for the Archaeology at Heathrow T5 website, I had to convert a load of Word documents full of tables and subheadings into beautiful XHTML Strict for pages in a WordPress environment.

Normally, I’d open the files in Word 2004 (on a Mac), save them as HTML, then use Dreamweaver 8 to open each file, clean up the HTML via the “Clean Up Word HTML” command, then perhaps do a bit of cleaning by hand (i.e. removing the inline CSS).

But faced with 8 fairly complex documents, I decided that there must be a more efficient way of doing this. A quick Google (“clean word html osx”) revealed a remarkably simple process.

I’ll repeat it here, just for my own notes.

Open the Word documents in TextEdit (I’m a Mac user, remember!). In TextEdit go to Preferences, then go to the “Opening and Saving” tab. In the HTML saving options select “XHTML 1.0 Strict” and “No CSS”. You can also tick “Ignore rich text commands in HTML files” if you like.

Then saving your Word documents as HTML using TextEdit gives you beautifully clean code to work with.
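If you don’t have TextEdit to hand, the hand-cleaning step I mentioned earlier (removing the inline CSS) could be roughly approximated with a short Python script. This is only a sketch using the standard library’s `html.parser`; the `WordHtmlCleaner` class, the `clean_word_html` function and the list of attributes to drop are my own assumptions, not anything Word, Dreamweaver or TextEdit provides. It works on HTML fragments and silently discards comments and doctype declarations.

```python
from html import escape
from html.parser import HTMLParser

# Assumption: these attributes and elements are the styling clutter to drop.
DROP_ATTRS = {"style", "class", "lang"}
DROP_ELEMENTS = {"style"}


class WordHtmlCleaner(HTMLParser):
    """Re-emit HTML without inline styling or <style> blocks."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self._skip_depth = 0  # greater than zero while inside a dropped element

    def _attrs(self, attrs):
        parts = []
        for name, value in attrs:
            if name in DROP_ATTRS:
                continue
            safe = escape(value or "", quote=True)
            parts.append(f' {name}="{safe}"')
        return "".join(parts)

    def handle_starttag(self, tag, attrs):
        if tag in DROP_ELEMENTS:
            self._skip_depth += 1
        elif not self._skip_depth:
            self.out.append(f"<{tag}{self._attrs(attrs)}>")

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are written out XHTML-style.
        if tag not in DROP_ELEMENTS and not self._skip_depth:
            self.out.append(f"<{tag}{self._attrs(attrs)} />")

    def handle_endtag(self, tag):
        if tag in DROP_ELEMENTS:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif not self._skip_depth:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self._skip_depth:
            # Text arrives unescaped, so escape it again on the way out.
            self.out.append(escape(data, quote=False))


def clean_word_html(html: str) -> str:
    parser = WordHtmlCleaner()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


if __name__ == "__main__":
    messy = '<p class="MsoNormal" style="margin:0">Finds &amp; features</p>'
    print(clean_word_html(messy))
```

It won’t fix badly nested tags or rebuild tables, so I’d still treat TextEdit’s export as the easier route and keep something like this for leftovers.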

TextEdit’s HTML export options

Zooomr's photos are back online


Blogged photos hosted on Zooomr are now back online. Here’s a quick test:


Silbury Hill, Wiltshire
Hosted on Zooomr

It seems as if Zooomr is back on track with new servers, thanks to a big community effort, and support from some big names like Robert Scoble, Zoho, and Sun Microsystems. And of course the determination and pride of Kristopher Tate.

Good luck Kris!

(keep up with the gossip and news at http://ricin.us/zooomrlive/)