Figure 10: Not News

This tool allows you to explore the post history of (not) news aggregator Fark. Since Fark assigns short tags to its stories, it's possible to look not only at the sources linked, but also at how they are perceived. After you enjoy some selected trends and comparisons, feel free to investigate other domains. Domains with at least 3 posts are available, with suggestions sorted by total posts.

The line graph at the top shows the fraction of the stories linking to a given domain. The pie charts below it show the distribution of tags assigned to that domain (mouseover for details) with the total number of posts indicated. The color key is hidden by default because there are so many tags, but most tags are assigned their standard color. Click 'Show Key' to view it.
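The two quantities plotted can be sketched in a few lines of Python. This is only an illustration, not the code behind the figure: the function names, the `(month, domain)` count layout, and the choice to compute the fraction per month are all assumptions.

```python
from collections import Counter


def domain_fraction(monthly_domain_counts, domain):
    """Fraction of each month's posts that link to `domain`.

    `monthly_domain_counts` maps (month, domain) -> post count.
    This layout is an assumption made for the sketch.
    """
    totals = Counter()
    for (month, _dom), n in monthly_domain_counts.items():
        totals[month] += n
    return {
        month: monthly_domain_counts.get((month, domain), 0) / totals[month]
        for month in sorted(totals)
    }


def tag_distribution(posts, domain):
    """Share of each tag among the posts linking to `domain`.

    `posts` is an iterable of (domain, tag) pairs (hypothetical layout).
    """
    tags = Counter(tag for dom, tag in posts if dom == domain)
    total = sum(tags.values())
    return {tag: n / total for tag, n in tags.items()} if total else {}
```

The first function gives the line graph's y-values for one domain; the second gives the slice sizes of that domain's pie chart.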

While this figure was designed to explore how various sources are treated, the history of tag usage is also interesting, and it is shown in two bonus non-interactive plots: by daily count or normalized. Other than the notable introduction of 'Fail', tag usage hasn't changed much since 2006.

Fark's post history reveals not only its own culture and politics, but the history of the greater web. In its early days, Fark relied on far fewer sources, many of them aggregators themselves, some of which have since virtually disappeared from the web as a whole. Excite, for instance, was a common source in the late 90s. CNN has always been more popular with the site than FoxNews (by a huge margin initially), while MSNBC has all but disappeared as a source. Yahoo is the most common source of all time, largely because it hosts stories from the major news wires. Google, in contrast, seems to be used primarily as an image host for the site's 'Photoshop' contests. In that space, PhotoBucket gave way to Imgur around 2010; Imgur also sees more linking for reasons other than 'Photoshop' and 'Caption' contests. Current readers will not be surprised to hear that the Daily Mail and Huffington Post have skyrocketed in popularity as of late (the former with more than its share of 'Strange' stories). Finally, fads like Homestar Runner are evident and Fark's spat with Ananova is obvious in its sudden plunge.

Data were collected by scraping the Fark archive pages (with permission) from September 1999 (when tags were introduced) through July 2014. In this time 306,385 posts were published (plus a few in the first few months with either no link or no tag; these were dropped). Subdomains were merged into their parent domains and posts were binned monthly.
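A rough sketch of the merging and binning steps is below. It is not the original processing code: the helper names, the `(date, url, tag)` record layout, and the tiny hand-written suffix list are assumptions made for illustration. A real pipeline would want a complete public-suffix list so that hosts under suffixes like `.co.uk` are not over-merged.

```python
from collections import Counter
from datetime import date
from urllib.parse import urlparse

# Assumption: a tiny hand-written set of two-label suffixes, not a full list.
TWO_LABEL_SUFFIXES = {"co.uk", "com.au"}


def merge_subdomain(url: str) -> str:
    """Collapse a URL's host to its parent domain (www.cnn.com -> cnn.com)."""
    host = (urlparse(url).hostname or "").lower()
    labels = host.split(".")
    # Keep three labels when the last two form a known two-label suffix.
    keep = 3 if ".".join(labels[-2:]) in TWO_LABEL_SUFFIXES else 2
    return ".".join(labels[-keep:])


def month_bin(day: date) -> str:
    """Label a date with its calendar month, e.g. 2014-07."""
    return f"{day.year:04d}-{day.month:02d}"


def monthly_counts(posts):
    """Count posts per (month, merged domain).

    `posts` is an iterable of (date, url, tag) records (hypothetical layout).
    """
    counts = Counter()
    for day, url, _tag in posts:
        counts[(month_bin(day), merge_subdomain(url))] += 1
    return counts
```

With this layout, two posts from `edition.cnn.com` and `www.cnn.com` in the same month land in a single `cnn.com` bin.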

Visualizing this dataset with three interesting keys (domain, tag, and date) proved tricky. There are way too many domains and tags to use a Sankey diagram. Stacked area plots by tag can be annoying to read with so many layers (as evidenced by the bonus plots), and several of them can't be placed on the same axis or easily compared. Though I'm not thrilled with the dual line/pie configuration, I do think it answers the two questions I was interested in exploring (how site popularity has varied and how sites are perceived, as seen through tag distributions). Originally, I planned a scrubber that would highlight only part of the line graph, with the pie charts responding accordingly, but this was scrapped because the time dependence of tag distributions is usually relatively weak and the scrubber would have added complexity to an already somewhat busy interface.


  1. Archives (With permission. Thanks Drew!)