Potater Details

January 25th, 2010

Okay, so, last night, I launched Potater.

Now it’s time to talk about the completely ridiculous technical decisions I made when I was developing Potater.

There’s No Database

The entire site is stored as a ball of JSON files. Heck, you can see them here if you like.

Here’s a tweeest, though: all of the text is stored as reStructuredText. That’s because I edit the JSON files by hand (vim + screen, the man’s man’s editor), and I want them to remain human-readable and human-writable.

Here are some benefits of the flat-file JSON approach:

It’s self-describing.

These files contain enough data for you to easily tell what goes where. Each file corresponds to a Top 10 page, and the contents should be so obvious that I don’t even need to tell you how it works.

It’s an API

These files also constitute a workable API for toodling around with the site. All of the data on the site is in there: you can download the JSON files with wget and fiddle with them on your own. Want a listing of the number one entries for the last three weeks? Why not! JSON parsers are now built into, or available as libraries for, just about every important language, so converting one of these files into an object is trivially easy.
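For example, here’s a minimal sketch of that “last three weeks of number ones” query in Python. The file layout is a guess: it assumes a hypothetical schema where each file looks like {"date": "2010-01-24", "entries": [{"rank": 1, "title": "..."}, ...]}, and the real files may use different keys.

```python
import json
from pathlib import Path


def load_pages(directory):
    """Parse every *.json file in `directory` into a Python dict."""
    pages = []
    for path in sorted(Path(directory).glob("*.json")):
        with path.open(encoding="utf-8") as f:
            pages.append(json.load(f))
    return pages


def number_ones(pages, weeks=3):
    """Return (date, title) of the #1 entry for the `weeks` most recent pages.

    HYPOTHETICAL schema: {"date": "2010-01-24",
                          "entries": [{"rank": 1, "title": "..."}, ...]}
    """
    # ISO 8601 dates (YYYY-MM-DD) sort correctly as plain strings.
    newest_first = sorted(pages, key=lambda page: page["date"], reverse=True)
    results = []
    for page in newest_first[:weeks]:
        top = min(page["entries"], key=lambda entry: entry["rank"])
        results.append((page["date"], top["title"]))
    return results
```

After grabbing the files into a local folder (say, a hypothetical potater-data directory), number_ones(load_pages("potater-data")) gives you the list.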

Backup is trivial.

Okay, so, backing up a database isn’t that hard. Backing up THIS database, though, is so easy it’s almost free. Just… get the files. In fact, that plays into my next feature…

Entire DB Under Version Control

The entire database is under version control, right next to the code. Rolling back changes to the data is no problem, captain! Backup is as easy as checking the whole thing out from a different computer.

Of course, there are a few features of a database that I don’t get, but let’s consider those:

Security

My entire database only has file-level security. Well, all of the files are downloadable anyway. I don’t care if people can get at my data. It’s not a problem.

SQL

For any query I want to make, I have to write a function that processes all of the JSON files I want to look at. That’s pretty trivial for a small, well-defined set of data like this one, but I wouldn’t want to give up SQL for anything that might require several tables or joins. Of course, I could also look at CouchDB, which yangman, the fucking prolific hacker, has recommended.
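To give a feel for what “write a function instead of a query” means, here’s a sketch of a hand-rolled stand-in for something like SELECT title, COUNT(*) … GROUP BY title HAVING COUNT(*) >= 2 across every weekly file. It reuses the same hypothetical schema as above (a "date" plus a list of "entries" with "title" keys), so treat the field names as assumptions.

```python
from collections import Counter


def appearances(pages):
    """Count how many weekly pages each title appears on.

    HYPOTHETICAL schema: {"date": ..., "entries": [{"rank": ..., "title": ...}]}
    """
    counts = Counter()
    for page in pages:
        # A set, so a title listed twice on one page is only counted once.
        counts.update({entry["title"] for entry in page["entries"]})
    return counts


def repeat_offenders(pages, min_weeks=2):
    """Titles that made the list in at least `min_weeks` different pages."""
    return sorted(title for title, n in appearances(pages).items() if n >= min_weeks)
```

Every “query” is a full scan over every file, which is fine for a few dozen Top 10 lists and exactly why this wouldn’t scale to anything join-heavy.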

Data Integrity/Transactions

The data is read-only, my friends. I don’t need to worry about data integrity or transactions.

It’s Entirely Static

Yes, the entire website is Python-generated HTML. The Python reads the JSON and spits out all of the required HTML and the Atom feed. It’s pretty flexible about what can be _IN_ the JSON, so long as there’s something in there (the only trouble is that the ‘default’ values look pretty, well, default on a website). When you hit the front page, it’s just… well, Apache serving up HTML. I hear it does that pretty well.
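A stripped-down sketch of that JSON-to-HTML step might look like the following. The template, the DEFAULTS values and the "blurb" field are all hypothetical stand-ins (the real templates aren’t shown here), and the reStructuredText-to-HTML conversion is left out: text is simply escaped.

```python
import html
from string import Template

# HYPOTHETICAL page template; not the real Potater markup.
PAGE = Template("""<!DOCTYPE html>
<html>
<head><meta charset="utf-8"><title>$title</title></head>
<body>
<h1>$title</h1>
<ol>
$items
</ol>
</body>
</html>
""")

# Fallbacks used when a field is missing from the JSON (hypothetical values).
DEFAULTS = {"title": "Untitled Top 10", "entries": []}
ENTRY_DEFAULTS = {"title": "Untitled entry", "blurb": ""}


def render_page(data):
    """Turn one parsed JSON page into an HTML string, filling in defaults."""
    page = {**DEFAULTS, **data}
    items = []
    for raw in page["entries"]:
        entry = {**ENTRY_DEFAULTS, **raw}
        # Escape text so characters like '&' and '<' can't break the markup.
        items.append("<li><strong>{}</strong> {}</li>".format(
            html.escape(entry["title"]), html.escape(entry["blurb"])))
    return PAGE.substitute(title=html.escape(page["title"]), items="\n".join(items))
```

Merging each dict over its defaults is what makes the generator forgiving about missing fields, and also why a half-filled page ends up looking so obviously “default”.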

There are benefits to this:

Static pages are lightning fast.

There’s no code-interpretation phase. The server just fetches the file and spits it out at you, as fast as it can.

Caching is trivial.

The whole site is cached HTML, so… well, there’s not a lot to worry about, cache-wise. A few HTTP cache directives (I haven’t even installed mod_expires yet; that’s a next step) should be enough to keep the site peppy under just about any sort of load. Throw in some gzipping and I’ve pretty much done everything I need to keep the site humming along.

To pull off this sort of performance with a dynamic site, some serious, moustache-heavy caching has to be done.

Deployment is trivial.

I could take the ball of generated .html and plunk it down anywhere on the internet. All but a few of the links (in the Atom feed) are relative, so it should work anywhere.

No Broken Site.

In the current configuration of the site, if anything goes wrong in the HTML-generation phase, it keeps everything that has been generated so far in a ‘temp’ directory and leaves the ‘release’ version of the generated site alone. So errors should rarely propagate all the way to the front page, although the odd rendering error has been known to make it through. (I don’t always look at the ‘temp’ directory before I push it to release, because I’m a lazy schmoe.)
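One way to automate that temp-then-release dance is sketched below. The build callable and the directory names are hypothetical, and the swap is a pair of renames rather than a truly atomic operation, so there’s a brief moment where release/ doesn’t exist.

```python
import shutil
from pathlib import Path


def build_and_release(build, temp_dir="temp", release_dir="release"):
    """Generate into `temp_dir`; only replace `release_dir` if `build` succeeds.

    `build` is a HYPOTHETICAL callable that writes the whole site into the
    directory it is given and raises an exception on failure.
    """
    temp, release = Path(temp_dir), Path(release_dir)
    if temp.exists():
        shutil.rmtree(temp)  # start each build from a clean slate
    temp.mkdir(parents=True)

    build(temp)  # if this raises, release/ is never touched and temp/ is kept

    # Move the old release aside, promote temp/, then delete the old copy.
    old = release.with_name(release.name + ".old")
    if old.exists():
        shutil.rmtree(old)
    if release.exists():
        release.rename(old)
    temp.rename(release)
    if old.exists():
        shutil.rmtree(old)
```

A failed build leaves its partial output sitting in temp/ for inspection, which is the same safety net described above, minus the manual push.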

Small Source

The technologies I used lend themselves to rapid development and a small code footprint: the whole codebase, stylesheet included, weighs less than 19 KB.

There’s also a distinct penalty:

No dynamic features.

My site has no dynamic features whatsoever. Not a single one. Nothing server-side happens at all when you roam Potater. So let’s look at the one dynamic feature that every blog, article, Tom, Dick and Harry has had embedded into their site since 1999: Comments.

Oh, wait, I can do that with JavaScript. The Disqus comment system, by focussing entirely on comments and nothing else, has quickly become one of the slickest comment systems on the internet. I don’t like storing my comments in the cloud, but unless an open-source competitor to Disqus pops up, I barely have a choice. On top of that, it means that all of the dynamic ugliness happens on Disqus’s side of the equation.

Let’s look at a few more thingamapoopers:

There are only two images on the page.

Everything else is CSS. And that bulky 120 KB background image is going right in the cache (as soon as I set that up).

The page has an Atom feed.

Atom works nicely with Python, because Python works nicely with ISO 8601 date formatting.
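Concretely, Atom (RFC 4287) wants RFC 3339 timestamps, a profile of ISO 8601 that requires an explicit timezone. A small helper along these lines, with a hypothetical name, keeps the feed honest:

```python
from datetime import datetime, timezone


def atom_timestamp(dt):
    """Format an aware datetime for an Atom <updated> or <published> element.

    Naive datetimes are rejected, because RFC 3339 requires a UTC offset.
    """
    if dt.tzinfo is None:
        raise ValueError("Atom timestamps need a timezone; got a naive datetime")
    # Normalise to UTC and use the 'Z' suffix; strftime drops microseconds.
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
```

For example, atom_timestamp(datetime(2010, 1, 25, 9, 30, tzinfo=timezone.utc)) returns "2010-01-25T09:30:00Z".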

More Ideas

So, that’s about it for Potater’s features. Now let’s talk about plans for the near future!

Creative Commons License – Done

The whole site, code, database, the works, is going up under a Creative Commons Attribution-ShareAlike license. Just have to add that as a dongle near the bottom. Heck, I could add a step in the ‘release’ script that just balls the whole thing up from socks-to-tits as a .tar.gz file and includes it next to the Creative Commons link. (A good hook for autobackups, too.)
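That ‘ball it all up’ release step could be as small as the sketch below, using Python’s standard tarfile module. The function name, the list of paths and the output file name are hypothetical; point them at whatever the real release script ships.

```python
import tarfile
from pathlib import Path


def make_source_tarball(paths, out_file="potater-source.tar.gz"):
    """Bundle code, data and generated site into one .tar.gz for download.

    `paths` and the default file name are HYPOTHETICAL placeholders.
    """
    out = Path(out_file)
    with tarfile.open(str(out), "w:gz") as tar:
        for p in paths:
            # arcname stores each tree under its own name, not its absolute path.
            tar.add(p, arcname=Path(p).name)
    return out
```

Writing the tarball somewhere outside the directories being archived keeps things tidy, and the same file doubles as a dated backup.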

Edit: The site is now under a proper GPL license.

Categories

There’s category metadata in the .json files, but the site doesn’t do anything with that. Category-specific pages shouldn’t be too difficult to spatula together.
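The grouping half of that job might look like this sketch. Where the category metadata lives is an assumption: it supposes each page dict may carry a hypothetical "categories" list, and anything without one lands under "uncategorized".

```python
from collections import defaultdict


def pages_by_category(pages):
    """Group parsed pages by category, ready for one index page per category.

    HYPOTHETICAL: assumes an optional "categories" list on each page dict
    and a "date" string in YYYY-MM-DD form.
    """
    groups = defaultdict(list)
    for page in pages:
        for category in page.get("categories") or ["uncategorized"]:
            groups[category].append(page)
    # Categories alphabetically; pages newest first (ISO dates sort as strings).
    return {name: sorted(group, key=lambda p: p["date"], reverse=True)
            for name, group in sorted(groups.items())}
```

Each resulting list could then be fed through the same static HTML generator to produce the category pages.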

Widget

Thanks to the freely-available JSON data, a PotaterWidget should not be too hard to generate.

Validation

While an empty Lorem Ipsum version of the site passes W3C validation as HTML5, some invalid stuff can come through the JSON pipe. I’m looking at you, ‘&’ symbol. Fixing this, and making the site pass an automatic W3C validation step before going live, are both on the idea list.
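Short of running the real W3C validator, a crude pre-release check could at least catch the ampersand problem. This sketch (function names hypothetical) flags any ‘&’ that doesn’t begin a character reference like &amp;, &#38; or &#x26;. It’s stricter than HTML5 strictly requires, and it doesn’t check that a named reference actually exists.

```python
import re
from pathlib import Path

# An '&' that does NOT begin a character reference such as &amp;, &#38; or &#x26;.
STRAY_AMP = re.compile(r"&(?![A-Za-z][A-Za-z0-9]*;|#[0-9]+;|#[xX][0-9A-Fa-f]+;)")


def stray_ampersands(text):
    """Return 1-based line numbers that contain an unescaped '&'."""
    return [n for n, line in enumerate(text.splitlines(), start=1)
            if STRAY_AMP.search(line)]


def check_site(directory):
    """Map each generated .html file to its offending line numbers, if any."""
    problems = {}
    for path in sorted(Path(directory).rglob("*.html")):
        lines = stray_ampersands(path.read_text(encoding="utf-8"))
        if lines:
            problems[str(path)] = lines
    return problems
```

Run against the ‘temp’ directory, an empty result from check_site could gate the push to ‘release’.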

Cache/Gzip – Done

There’s no caching or gzipping yet! Baffling! With content so expressly optimized for such a situation, that’s just a travesty. I’m also going to look at lighter-weight HTML-serving options, like nginx or lighttpd, especially because all my web server has to do is serve up vanilla HTML.

Edit: As it turns out, Apache makes this sort of thing pretty darned foolproof.

The Flood

A tool to parse my Google Reader OPML (XML containing all of my feeds) into a working web feed to share with others. It’s on the buildy-buildy list.
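The parsing half of that tool is pleasantly small with the standard library’s ElementTree. In OPML, folders and feeds are both outline elements, and feeds are the ones carrying an xmlUrl attribute. The function name is hypothetical, and turning the result into a shareable feed is left for later.

```python
import xml.etree.ElementTree as ET


def feeds_from_opml(opml_text):
    """Return (title, xmlUrl, htmlUrl) for every feed outline in an OPML document.

    Folders are <outline> elements without an xmlUrl; iter() walks through
    them, and only outlines that point at a feed are kept.
    """
    root = ET.fromstring(opml_text)
    feeds = []
    for outline in root.iter("outline"):
        xml_url = outline.get("xmlUrl")
        if not xml_url:
            continue  # a folder/category, not a feed
        title = outline.get("title") or outline.get("text") or xml_url
        feeds.append((title, xml_url, outline.get("htmlUrl", "")))
    return feeds
```

Passing the raw bytes of the exported file (rather than a decoded string) lets the parser honour whatever encoding the XML declaration specifies.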

Getting The Word Out There

Also, I have to make Potater THE MOST POPULAR WEBSITE ON THE INTERNET. So, you know, I should… tell some people about it or something?

One Response to “Potater Details”


  1. lassam says:

    Also, EQJ says I have to put a bee in my background image. She is correct. Bzz-bzz-bzz.
