Crunching the numbers on the millions upon millions of Reddit comments just to crack the code on upvotes would be difficult work. Thankfully, two software engineers, a Google big data project, and some careful analysis have already done the heavy lifting, revealing the simple trick to getting the top comment in most threads: get there first.
Jason Michael Baumgartner of Pushshift.io has been sporadically releasing databases of Reddit’s trove of comments, and last November Max Woolf ran that mass of data through Google’s BigQuery to answer a question that had been posed by another Redditor: What percentage of top comments are the first comment? The answer BigQuery spat back out was a sizable 17.24 percent. Extend that range to the first five comments and it covers a whopping 56 percent.
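For readers who want to see the shape of that calculation, here is a minimal Python sketch. It is not Woolf's BigQuery query, and the thread data in it is invented purely for illustration. It assumes each thread is a list of (timestamp, score) pairs for comments replying directly to the post; the function name `top_comment_rank_shares` and the `min_comments` parameter are hypothetical.

```python
from collections import Counter


def top_comment_rank_shares(threads, min_comments=1):
    """Return {rank: share of threads whose top-scoring comment was posted rank-th}.

    threads: iterable of lists of (created_utc, score) tuples, one list per
    thread. This input format is an assumption made for this sketch.
    min_comments: hypothetical cutoff; threads with fewer comments are skipped.
    """
    rank_counts = Counter()
    total = 0
    for comments in threads:
        if len(comments) < min_comments:
            continue
        # Order comments by posting time so list position equals arrival rank.
        ordered = sorted(comments, key=lambda c: c[0])
        # Position of the highest score; max() keeps the earliest one on ties.
        best_index = max(range(len(ordered)), key=lambda i: ordered[i][1])
        rank_counts[best_index + 1] += 1
        total += 1
    if total == 0:
        return {}
    return {rank: count / total for rank, count in sorted(rank_counts.items())}


# Invented toy data: four threads, three comments each, as (timestamp, score).
threads = [
    [(100, 50), (110, 20), (120, 5)],  # first comment scored highest
    [(200, 3), (205, 40), (230, 8)],   # second comment scored highest
    [(300, 12), (301, 9), (302, 1)],   # first comment scored highest
    [(400, 2), (410, 4), (420, 30)],   # third comment scored highest
]

shares = top_comment_rank_shares(threads)
first_two = sum(share for rank, share in shares.items() if rank <= 2)
print(shares)
print(first_two)
```

On this toy input the first comment is the top comment in half of the threads, and the first two comments together account for three quarters. Summing the shares up to rank five on the real dataset is the same kind of step that would yield a cumulative figure like the 56 percent quoted above.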
It’s perfectly intuitive: The first comments get a head start on gathering upvotes, and upvotes signal quality to people who don’t have the time or patience to scroll farther into what are perceived as less valuable responses.
Image: llewellynjean from r/dataisbeautiful, based on work by Max Woolf
“Because first comments carry so much weight, it provides a bias toward power users who are on Reddit all the time,” Woolf told Gizmodo over email. A recent update to his work by redditor llewellynjean “is the same and has the same percentage breakpoints,” according to Woolf. Partly that’s because a few more months of comments are a drop in the bucket for a dataset of this size, though Reddit also hasn’t done any major overhaul of its comment sorting algorithm that we’ve been privy to.
While Reddit prides itself on being highly democratic and community-driven, the numbers indicate that the success of comments isn’t driven by their conversational value so much as by who had the time to camp out on a subreddit’s new submissions and plant their flag first.
“Dramatically increase the weighting of new comments such that they are more likely to be seen, even in megacrowded threads,” is the fix Woolf suspects might stop comment rankings from favoring speed over quality, though he seemed skeptical that such a change would ever be implemented. “From Reddit’s perspective, it’s possible they may not want newer comments to have a chance to be seen,” he wrote, “because proven-comments are safer and may drive more engagement.”
Woolf and llewellynjean’s data pulls only from threads with at least 30 parent-level comments (those replying directly to the post), and doesn’t factor in child-level comments (those replying to other commenters). Many subreddits also feature automated responses that post instantly and aren’t heavily voted on, which might skew the results somewhat. As a sober, non-exhaustive analysis, though, it provides some insight into yet another way to game the Front Page of the Internet for all the comment karma it’s worth. [Max Woolf blog, r/dataisbeautiful]