Live’r Than You’ll Ever Be
January 29th, 2010
This Kleptones CD is fantabulous, if a bit old. Seriously, download it. DO IT NOW.
I was out drinking with my friend Allen a few nights ago. He was asking people at the table to talk about technologies that they used every day that were perhaps less fulfilling than they could be.
I hemmed and hawed. I’ve put so much time into my day-to-day technology that every step of the day is smooth and slick, aided by solid technology along the way.
There are a few arenas where the technology just doesn’t stack up – the ACM Digital Library is some sort of ColdFusion monstrosity, held together by hope and duct tape and dreams. Whenever I try to find good teaching materials, I’m consistently stumped by a half-assed collection of blogs, teacher forums, and what-have-you. And when it comes to webcomic output, none of the self-publishing solutions are any good at all.
But most of these markets are aggressively niche. I’m not a teacher, I’m not allowed to disseminate ACM materials (dammit), and while I aspire to be a webcomic author someday, I just don’t have the attention span required to write something hilarious every day – like, say, Zach Weiner. Okay, I still might end up building a better comic publishing engine. It’s totally on my list.
But then I thought of the biggest fish in the pond of websites falling out of my radar.
Reddit.
Oh, yeah, it was good. It was really good in its heyday. When it first started up, its community was small and technical and smart and funny, and all of the submissions were small and technical and smart and funny, but it’s become bloated with old articles and karma whores and pictures upon pictures of cats.
That’s good for reddit. For them it means mainstream success. reddit’s also starting to hemorrhage users because nothing on their front page is new anymore. They’ve managed to compile a looping “greatest hits” of the internet, but that’s it, the end of the game, nothing new under the sun.
So instead of hitting reddit for the new and amusing, I started hitting RSS feeds instead. Oh, RSS. I love RSS with a passion. It is a fine technology. Every day my inbox is flooded with a hundred articles, many of them fresh and relevant to my interests.
So there are blogs, which are original sources of technical information, and magazines, most of them staffed by the technically savvy. The magazine-blogs read everything and post it immediately, vast filters that produce reams of information relevant to your interests. BoingBoing is the big fish in this pond – every day it’s a flood of the best of the internet, annotated with clever writing. What you’re reading is maybe 10 people reading everything on the internet every day and posting their favourite bits. Most of Wired’s blogs and feeds feel the same way.
These feeds are not successful because they do the reporting themselves. These feeds are successful because they stay on top of everything on the internet and only post the best links that they can find – hot and fresh and delivered right to your browser. It’s like a reddit, but instead of a mob of cat-hungry jackasses, it’s a paid staff, and they find a lot of really good stuff.
In fact, that’s what my new project, Potater, is all about. It’s just me reading as many blogs as I can handle, filtering out the good bits, and serving them up at the end of the day like a hot platter of steaming news.
I like it, I’m having a lot of fun with the Potater project. If you’ll excuse my expression, though, it’s … well, small potaters. It’s sort of a just-me-havin’-fun project.
And for most people who understand RSS feeds, they aren’t going to just let me do the filtering for them. Instead, they are going to add me to their burgeoning list of RSS feeds. The trick is just that I might be subscribed to something that they are not. I have even more interesting links for people who have interests along my lines.
In my opinion, the new lifeblood of the internet is not in publishing. Publishing is cheap-as-free. Blogging, images, comics, videos, whatever a user has to offer, there’s a way to get it on the internet for free or almost free. Because of it, the entire publishing industry is starting to fall on its ass. When anybody can publish something for ten bucks a month and push it to ten thousand users directly, what’s the point of going through a distribution company? They just need some sort of monetization model and they’re good to go.
No, traditional publishing is dead, or at least it will be soon. Newspapers, scientific publications, music distributors, even the big movie and television distributors are going to lose out eventually.
But this creates new problems. People aren’t getting paid for the content that they create. There’s no quality control on the internet.
These are the two big problems, now that distribution is out of the way – the people who used to be in charge of these things were the editors. These are people who used to be paid hundreds of thousands of dollars a year to sit in a room and read everything that came their way. They’d short list the good stuff, arrange it all in a neat and tidy little package, hire people to take pictures, have it typeset and nicely designed and copy-edited, and then sell that as a book or a magazine or a compact disc.
And these middle men were undeniably getting fat off of the contributions of the people who were actually producing the good stuff.
But the gravy train is starting to come to an end. Publication on the internet is free. There’s no need to work with these massive arbiters of the public taste.
And lo and behold, with the internet comes a flowering of niches that were underserved by these publications. Fanfic, hot-off-the-press technical news, cartoonists – oh BOY were cartoonists fucked over by the original system – all of these were people who just didn’t get a piece of the pie the ‘old way’ – it’s no wonder that they’re leading the charge in the free publishing world.
Here I am, sitting, looking at a copy of Communications of the ACM that I’ve just received in the mail. It is stunning. Full colour, beautifully typeset, full of imagery, free of ads, and totally relevant to my interests. It has articles about search engines and MapReduce and triple-parity RAID and streaming SQL technology and a computing museum; articles about how biology lacks proper standards, business of software articles, and about recent research in x86 sandboxing. No ads, even.
I’m not saying there’s not a market for this kind of thing. This magazine is magnificent. It’s also expensive, and monthly. There’s a new space out there, a space where news is instant and unfiltered and raw, and the new champions of this space are the BoingBoings, the TechCrunches, the reddits – the sites which act as the new magazines, deciders of taste, compilers of internet goodness.
This is the unexplored territory, a rich territory, the battlefield upon which the new publication battles of the 21st century will be fought.
And RSS is where it all happens.
———————
So, we return to my original point. Reddit is great, because it’s community-filtered, but it’s terrible, because the news is stale and the user base is starting to get tragedy-of-the-commons stupid.
On the other hand, e-Magazines like BoingBoing and Wired are great, because they’re hand-filtered, fresh, and relevant, but because of the manpower required, they don’t serve niche markets effectively and they bleed money. On top of that, they run out of news really quickly, and they rarely filter through the reams upon reams of excellent self-published material out there. (BoingBoing is pretty great, mind you, but there’s only so much BoingBoing to go around, and it’s quite the info-flood, still.)
Now, gentlemen, perhaps you are starting to get the gist of what I am proposing. Community-powered RSS-filtering. Each ‘channel’ is a package of RSS feeds, managed by a single person, or a small group, or publicly. My personal channel might be all of the feeds that go into the making of Potater. Our group channel might be an amalgam of all of the coolest tech and Vancouver news that we can find. A public channel might cover a broader topic, like ‘technology’ or ‘Ruby blogs’.
As these channels flow, flush to bursting with all of the data that they contain, individual users flag individual posts with ‘notability’ markers. “I really like this.” “I like this.” “I hate this.” (etc.) The notability markers affect the real-time view of the data – heavily liked items appear larger, they glow and pulsate with the energy of attention. Heavily disliked items fade into obscurity, both literally and figuratively.
When items reach certain notability thresholds – the feed-consumer, the end-user, might decide the flow (‘N links/day’ or ‘N links/hour’), which would translate into a notability requirement that the user wouldn’t see (‘5 upvotes’) – when the items reach these thresholds, then they get plunked into the “output” feeds, which can then be ‘input’ feeds for other groups or users (or just end up directly in somebody’s feed-reader.) Either that or the feed publishers can just set their own notability thresholds.
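Since this is all still fuzzy, here is one way the flow-to-threshold translation could work, as a minimal Python sketch. Everything in it is an assumption for illustration: the function names, the `score` field (say, likes minus dislikes), and the idea of deriving the cutoff from a recent window of scores. Ties at the cutoff can let a few extra items through.

```python
def notability_cutoff(scores, links_per_day, days_observed=1.0):
    """Translate a desired flow ('N links/day') into a hidden score cutoff.

    `scores` are the notability scores seen over the last `days_observed`
    days (hypothetical input; the post does not define how scores are kept).
    """
    budget = int(links_per_day * days_observed)
    ranked = sorted(scores, reverse=True)
    if budget <= 0:
        return float("inf")   # the user asked for nothing, so nothing passes
    if budget >= len(ranked):
        return float("-inf")  # fewer items than the budget, so everything passes
    return ranked[budget - 1]  # score of the last item that still fits


def output_feed(items, cutoff):
    """Keep the items that reached the cutoff, in their original order."""
    return [item for item in items if item["score"] >= cutoff]
```

A feed publisher who would rather set a fixed threshold (‘5 upvotes’) could skip `notability_cutoff` and call `output_feed(items, 5)` directly.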
These links can come out raw, or annotated – so a small, clever group, a BoingBoing, they could gather, annotate, and re-publish a ball of feeds every day. For public feeds, the annotations would be more along the lines of a community discussion. Perhaps the discussion itself would constitute a feed of its own.
This is still in a fuzzy sort of design phase, but this is the direction that I want to take Potater.
I even assembled an imaginary A-Team of people who I’d want to work on Potater with, and then mentally assigned them powers from the Planeteers. Because I can.
Allen, with the power of Fire, a tech-blogger, a feed-reader, an Apple engineer, a tireless advocate of the entrepreneurial spirit, a battle-hardened coder, a front-end man.
Yangman, with the power of Water, a fucking-prolific-hacker, a pythonista, an experimenter with everything that nobody’s ever heard of, and a sexy-beast, to boot.
Demwell, with the power of Wind, with business chops and technical prowess in equal measure, with natural language processing and distributed systems experience, and a beard that just won’t quit.
and
ME, with the power of Earth, because I take forever to do anything and weigh a fucking ton. Also I can code and design some. Write a bit. Forge tools from stone to harvest simple grains. That sort of thing. I could build this sort of thing on my own, but I have the awful tendency to get super-excited about a project, do a tonne of planning and preliminary code, and then lose interest and fall asleep in some bushes. This is why I need other people to help push things forward!
I hesitate to hand out the power of Heart, because it’s a super-lame power. I’d have nominated Danly for that specific honor, but apparently he’s under a no-compete/NDA so restrictive as to render him useless for side-projects. Curse you, Danly’s source of funding!
Don’t feel bad if you’re not on the list. I still think that you are awesome, and I would have included you if I didn’t run out of Captain Planet themed powers. I mean, I could hand out the bad-guy names, but who wants to be Looten Plunder?
Wait, can I change my power to “Being Looten Plunder”?
So, tell me what you think of this ill-defined masterpiece. Throw ideas. Throw poop if necessary. Not too much poop. Keep poop-throwing to a manageable level. Please.
Okay, so, last night, I launched Potater.
Now it’s time to talk about the completely ridiculous technical decisions I made when I was developing Potater.
The entire site is stored as a ball of JSON files. Heck, you can see them here if you like.
Here’s a tweeest, though – all of the text is stored as ‘reStructuredText’ – this is because I edit the JSON files by hand (vim + screen, the man’s man’s editor) and I want them to remain human-readable and human-writable.
Here are some benefits of the flat-file JSON approach:
These files contain enough data for you to easily tell what goes where. Each file corresponds with a Top 10 page, and the contents should be so obvious that I don’t even need to tell you how it works.
These files also constitute a workable API for toodling around with the site. Look: all of the data on the site is right there, so you can download the JSON files with wget and fiddle with them on your own. Want a listing of the number one entries for the last three weeks? Why not! JSON parsers are built into, or available as libraries for, just about every important language now, so converting one of these files to an object is trivially easy.
Okay, so, backing up a database isn’t that hard. Backing up THIS database, though, is almost free it’s so easy. Just … get the files. In fact, that plays into my next feature…
The entire database is under version control, right next to the code. Rolling back changes to the data is no problem, captain! Backup is as easy as checking the whole thing out from a different computer.
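To make the "number one entries for the last three weeks" idea concrete, here is a rough Python sketch of that kind of query over a folder of downloaded files. The file layout is a guess: it assumes each file holds a `date` string in ISO format and an `entries` list whose items carry a `rank` and a `title`. Those field names are hypothetical, so adjust them to whatever the real files contain.

```python
import json
from datetime import date, timedelta
from pathlib import Path


def number_one_entries(data_dir, today, days=21):
    """Return (date, title) for the top-ranked entry of each recent page.

    Assumed (hypothetical) schema per file:
    {"date": "2010-01-28", "entries": [{"rank": 1, "title": "..."}, ...]}
    """
    oldest = today - timedelta(days=days)
    results = []
    for path in sorted(Path(data_dir).glob("*.json")):
        with open(path, encoding="utf-8") as handle:
            page = json.load(handle)
        page_date = date.fromisoformat(page["date"])
        if oldest <= page_date <= today:
            # The entry with the lowest rank number is the number one entry.
            top = min(page["entries"], key=lambda entry: entry["rank"])
            results.append((page_date, top["title"]))
    return sorted(results)
```

Every query against a flat-file store ends up looking like this: loop over the files, filter, collect. That is fine at this size and gets old quickly once joins are involved.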
Of course, there are a few features of a database that I don’t get, but let’s consider those:
My entire database only has file-level security. Well, all of the files are downloadable anyways. I don’t care if people can get at my data. It’s not a problem.
For any query I want to make, I have to write a function that processes all of the JSON files that I want to look at. This is pretty trivial given a small, well-defined set of data (like this), but I wouldn’t want to give up SQL queries for anything that might require several tables or joins – of course, I could also look at CouchDB, which yangman the fucking prolific hacker has recommended.
The data is read-only, my friends. Don’t need to worry about data integrity or transactions.
Yes, the entire website is Python-generated HTML. The Python reads the JSON and spits out all of the required HTML and the Atom feed. It’s pretty flexible about what can be _IN_ the JSON, so long as there’s something in there (only trouble is, the ‘default’ values look pretty default on a website.) When you hit the front-page, it’s just… well, Apache serving up HTML. I hear it does that pretty well.
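For a feel of what the JSON-to-HTML step might look like, here is a stripped-down Python sketch. It is not the actual Potater generator: the `title`, `entries`, and `url` field names and the fallback values are assumptions, and it treats the text as plain text rather than running it through a reStructuredText renderer.

```python
import html
import json

DEFAULT_TITLE = "Untitled"  # hypothetical fallback; shows up when the JSON omits a title
DEFAULT_URL = "#"


def render_page(raw_json):
    """Turn one JSON document into a small HTML fragment.

    Missing fields fall back to defaults, so almost any JSON object renders.
    """
    page = json.loads(raw_json)
    title = page.get("title") or DEFAULT_TITLE
    rows = []
    for entry in page.get("entries") or []:
        url = html.escape(entry.get("url") or DEFAULT_URL, quote=True)
        text = html.escape(entry.get("title") or DEFAULT_TITLE)
        rows.append('  <li><a href="{}">{}</a></li>'.format(url, text))
    # html.escape keeps stray '&', '<' and '>' from producing invalid markup.
    return "<h1>{}</h1>\n<ol>\n{}\n</ol>\n".format(html.escape(title), "\n".join(rows))
```

Writing the returned string to a `.html` file is all the "serving" logic there is; the web server takes it from there.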
There are benefits to this:
There’s no code-interpretation phase. The server just fetches the page and spits it out at you, as fast as it can.
The whole site is cached HTML, so …well, there’s not a lot to worry about, cache-wise. A few HTTP cache directives (I haven’t even installed mod_expires yet, that’s a next step) should be enough to keep the site peppy under just about any sort of load. Throw in some gzipping and I’ve pretty much done everything I need to keep the site humming along.
In order to pull this sort of caching off with a dynamic site, some serious moustache-heavy caching has to be done.
I could take the ball of generated .html and plunk it down anywhere on the internet. All but a few of the links (in the Atom feed) are relative, so it should work anywhere.
In the current configuration of the site, if anything goes wrong in the html-generation phase, it keeps everything that has been generated so-far in a ‘temp’ directory and leaves the ‘release’ version of the generated site alone. So errors should rarely propagate all the way to the front page – although the odd rendering error has been known to make it through. (I don’t always look at the ‘temp’ directory before I push it to release, because I’m a lazy schmoe.)
The technologies I used lend themselves to rapid dev and small code signatures – the whole codebase, stylesheet included, weighs less than 19kb.
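The ‘temp’-then-‘release’ arrangement mentioned above can be sketched in a few lines of Python. This is an illustration, not the site's real build script: `generate`, `promote`, and the `render` callback are made-up names, and the directory names simply follow the description.

```python
import shutil
from pathlib import Path


def generate(render, temp_dir):
    """Build the site into a fresh temp directory.

    `render` is any callable that writes files into the directory it is given
    (hypothetical). If it raises, whatever it wrote so far stays in temp_dir
    for inspection and the release directory is never touched.
    """
    temp_dir = Path(temp_dir)
    if temp_dir.exists():
        shutil.rmtree(temp_dir)
    temp_dir.mkdir(parents=True)
    render(temp_dir)


def promote(temp_dir, release_dir):
    """Replace the live site with the last successful build."""
    release_dir = Path(release_dir)
    if release_dir.exists():
        shutil.rmtree(release_dir)
    shutil.copytree(Path(temp_dir), release_dir)
```

Keeping `promote` as a separate step is what leaves room for eyeballing the temp directory first, on the days one is not being a lazy schmoe.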
There’s also a distinct penalty:
My site has no dynamic features whatsoever. Not a single one. Nothing server-side happens at all when you roam potater. So let’s look at the one dynamic feature that every blog, article, Tom, Dick and Harry has had embedded into their site since 1999: Comments.
Oh, wait, I can do that with JavaScript. The Disqus comment system, by focussing entirely on comments and nothing else, has quickly become one of the slickest comment systems on the internet. I don’t like storing my comments in the cloud, but unless an open-source competitor to Disqus pops up, I barely have a choice. On top of that, it means that all of the dynamic ugliness happens on Disqus’s side of the equation.
Let’s look at a few more thingamapoopers:
Everything else is CSS. And that bulky 120kb background image is going right in the cache (as soon as I set that up.)
Atom works nicely with Python, because Python works nicely with ISO 8601 date formatting.
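As a small illustration of the date point: Atom timestamps follow RFC 3339, which is a profile of ISO 8601, and Python's `datetime.isoformat()` produces that shape directly. The helper name below is made up for this sketch.

```python
from datetime import datetime, timezone


def atom_timestamp(moment):
    """Format an aware datetime the way an Atom <updated> element expects."""
    if moment.tzinfo is None:
        # A naive datetime has no offset, which an Atom date needs.
        raise ValueError("Atom timestamps need an explicit UTC offset")
    return moment.astimezone(timezone.utc).isoformat(timespec="seconds")


def updated_element(moment):
    """Wrap the timestamp in the Atom element that carries it."""
    return "<updated>{}</updated>".format(atom_timestamp(moment))
```

For example, `atom_timestamp(datetime(2010, 1, 29, 18, 30, tzinfo=timezone.utc))` gives `2010-01-29T18:30:00+00:00`.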
So, that’s about it for Potater’s features. Now let’s talk about plans for the near future!
The whole site, code, database, the works, is going up under a Creative Commons sharealike attribution license. Just have to add that as a dongle near the bottom. Heck, I could add a step in the ‘release’ script that just balls the whole thing up from socks-to-tits as a .tar.gz file and includes it next to the Creative Commons link. (A good hook for autobackups, too.)
Edit: The site is now under a proper GPL license.
There’s category metadata in the .json files, but the site doesn’t do anything with that. Category-specific pages shouldn’t be too difficult to spatula together.
Thanks to the freely-available JSON data, a PotaterWidget should not be too hard to generate.
While an empty Lorem Ipsum site validates as HTML5 with the W3C, some invalid stuff can come through the JSON pipe. I’m looking at you, ‘&’ symbol. Fixing this, and making the site pass through an automatic W3C validation step before going live, are both ideas.
There’s no caching or gzipping yet! Baffling! With content so expressly optimized for such a situation, that’s just a travesty. I’m also going to look at lighter-weight HTML-serving options, like nginx or lighttpd – especially because all my web server has to do is serve up vanilla HTML.
Edit: As it turns out, Apache makes this sort of thing pretty darned foolproof.
A tool to parse my Google Reader OPML (xml containing all of my feeds) into a working web feed to share with others. It’s on the buildy-buildy list.
Also, I have to make Potater THE MOST POPULAR WEBSITE ON THE INTERNET. So, you know, I should .. tell some people about it or something?
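The Google Reader OPML tool from the list above would probably start with something like the following Python sketch, which only covers the first step: pulling the feed titles and URLs out of the OPML. It assumes the usual OPML shape, where each subscription is an `outline` element with an `xmlUrl` attribute, possibly nested inside folder outlines. Turning that list into a shareable feed is left out.

```python
import xml.etree.ElementTree as ET


def feeds_from_opml(opml_text):
    """Return a list of (title, feed_url) pairs from an OPML document."""
    root = ET.fromstring(opml_text)
    feeds = []
    # iter() walks nested outlines too, so feeds filed under folders are found.
    for outline in root.iter("outline"):
        url = outline.get("xmlUrl")
        if url:  # folder outlines have no xmlUrl and are skipped
            title = outline.get("title") or outline.get("text") or url
            feeds.append((title, url))
    return feeds
```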
Okay! You have to check this shit out! It is shit that is crazy! Crazy shit!
This Daily Top 10 that I have been writing has been so fun that I have launched a new website for it. This website resides at potater.com. There’s also a weekend roundup there, although I composed it really quickly so it might be a little bit exactly-the-same-as-usual.
It’s kinda late, and I’m kinda tired, so I’ll leave off talking about the totally ridiculous way I’ve built potater.com until tomorrow. What I will say now is that it supports Atom (not RSS, fuck RSS, doesn’t support ISO dates, haet.)
Also? Still lots of polish needs to be done. Like ‘category’ support, and ‘crazy speed optimizations’, and ‘creative commons license’, and ‘picture of Gary Busey’ and ‘api documentation’. More tomorrow. Now sleep.
What am I going to be doing for the weekend? Oh, if I can get it done, finishing up the new home for these Top 10 Lists. You can see an early non-functional design preview, here, if you’d like.
After the whole Ok Go buzz caused by their Open Letter two days ago, Wired interviewed them. The interview is solid and they include two new Ok Go music videos, which are clever if not overly catchy.
A quick look at fish farming. Can it be made sustainable? A few billion more hungry mouths on the planet are hoping so.
This is from a series of articles on how to eat food from all around the world, without leaving Vancouver. Of course, the “Moderne Burger” is just food from … well, here, so it’s not really that ethnic. Nevertheless, the description and image left my mouth watering, and it’s definitely on my list of places to hunt down.
This has been covered up and down and all-across-the-internet, last few days, but Wired has the best pictures. Some scientists discovered that slime mold can form fairly optimal paths between ‘food nodes’, and that if they lay out a system that looks like Tokyo, they get optimal paths that look like the Tokyo rail system.
Before I read this article, I did not even know what a “Shannon Limit” was. (It’s still a little fuzzy, I’ll admit.)
There’s no better way to improve your feel for good designs than to look at them. Here are a whole bunch.
60 years ago today, the Tucker Car Corporation was acquitted of fraud. The story behind their failed car is a good, solid read. So go solid-read it!
The Engineer’s Guide To Drinks – which has nothing to do with TF2 – is merely a handy chart showing blueprints for a wide variety of alcoholic beverages.
It’s a long name, but the content is worth it. This is the showcase from which Portal’s predecessor, “Narbacular Drop” – and other, similarly unusual, clever, or just strange games – have been born. (“Bontago”, for example.) Most of the games can even be played, right now. So get to it! It’s the weekend!
The All-American Basketball Alliance, a basketball league that only allows white American-born players, will start in June. The commissioner, Don Lewis, is reported as saying something along the lines of “I’m not racist, but I’m tired of the way that those fucking niggers, spics, and towelheads play basketball. What if they pull out a gun and rape your daughter in the middle of the game? That would be terrible.” His opinions as to cotton-fields are as-yet unvoiced.
Flickr’s new faces API allows you to see who’s in a picture, and see pictures of a specific person. Very simple, too.
Apparently, people are getting sick of having three thousand plastic music gubbins in their living rooms, and the Rhythm Music genre is not selling as well as it used to. To anybody who did not see this coming thousands of miles away, I award you a condescending pat to the head.
In the world of wearable tech, a prototype corset gets tighter in response to air-quality, paradoxically making it even harder to breathe.
So, I found this comic about freelancing. Today. It’s been going on for a while. It’s not hilarious, but it is … recognizable, I guess? So just read the whole archive, have a few chuckles, and get on with your day.
I always wonder how a comic called “Wasted Talent” is about a talented and consistently improving cartoonist who is also a field engineer and married to a software developer, and they’ve purchased an apartment in the downtown core? I mean, that seems like a pretty efficient use of talent.
Okay, so, I’m not going to lie, I was just pulled in by the name of this article by Tim Bray. Seriously, isn’t that a pretty good pull? With a title like that, you’ll read anything that he wants to write. You have no choice. Apparently it’s a book review? I might have to buy this book now.
Okay, so, yesterday, I cut an article from the Top 10, one where the New York Times was going to put up a “paywall” to try to make a little bit of money off of the internet – because ‘new distribution models’ (i.e. free everything) are cutting traditional newspapers’ KNEES off. It didn’t make the cut, but it’s relevant to today’s link – the author of Saturn’s Children (the book about which the above article ‘Robot Sex Slave Blues’ was written) wrote a long article about how he don’t make no monies from the intarwob.
Which is interesting, but people have been writing the same article for years now. Now that everything on the internet is free, how the heck are we going to make money off of it?
Okay, okay, by now, any web professional knows everything there is to know about Web Security. It doesn’t hurt to read it one more time, just to keep things straight in your head. It was even recommended by Bruce Schneier!
Making fun of food advertising? I love it.
Windows Live (the current generation of Outlook Express => Windows Mail => Whatever) offers the following curious pre-checked box during installation:
“Set Bing as the default search provider in your browser, and prevent programs from interfering with this choice.”
“Microsoft Choice Guard is a small piece of software that’s downloaded during the Windows Live installation process to carry out any changes that you choose to make to your browser settings. If you choose to change your default search engine, Choice Guard looks for any programs on your computer that might interfere with this process and works around them to let you change your default search engine. Choice Guard does this to help ensure that choosing your default search engine is as easy as possible.”
Right off the bat, a piece of software called “Choice Guard” has me wary. On one hand, preventing ‘malicious programs’ from setting your homepage to TelusPortal.crap or Toshiba.Aol.blug is probably not a bad thing… on the other hand… well… did you read that?