16.10.10

What - Ocelot: The story of a tracker



What.CD is a private tracker. Thus, the entire site, staff, and community all revolve around a common piece of software - the tracker backend. Complementing the site frontend, which you're looking at now, the tracker itself coordinates the peers - when your client announces, it's the tracker that tells it which other peers to connect to.

With over five million peers, our tracker receives an average of 3,500 hits per second, although after a period of tracker downtime, load can spike past 12,000 hits per second. At that peak rate, when your client announces, the tracker has roughly 80 microseconds to search through its database of over 900,000 torrents and 5,000,000 peers, compute a response, and send it back to you. That's a lot of stress on a piece of software!
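
That 80-microsecond figure is just the peak rate turned upside down (1 second / 12,000 announces is roughly 83 microseconds each), and the only realistic way to answer in that kind of time is to keep the entire swarm in memory. Here's a minimal, purely illustrative C++ sketch - our own made-up names, not Ocelot's actual code or layout - of the kind of in-memory structure that makes such a budget plausible:

    // Illustrative sketch only -- not Ocelot's actual code or layout.
    // The whole swarm lives in RAM, keyed by each torrent's 20-byte
    // info_hash, so serving an announce starts with a single O(1)
    // hash-map lookup instead of a database query.
    #include <cstdint>
    #include <string>
    #include <unordered_map>

    struct Peer {
        std::uint32_t ip;    // IPv4 address
        std::uint16_t port;  // listening port
        std::int64_t  left;  // bytes left to download; 0 means seeding
    };

    struct Torrent {
        std::unordered_map<std::string, Peer> peers;  // peer_id -> peer
        std::uint64_t snatches = 0;                   // completed downloads
    };

    // info_hash (20 raw bytes) -> torrent
    static std::unordered_map<std::string, Torrent> torrents;

    Torrent* find_torrent(const std::string& info_hash) {
        auto it = torrents.find(info_hash);
        return it != torrents.end() ? &it->second : nullptr;
    }

    int main() {
        // Register one peer on one made-up torrent, then look the torrent up.
        torrents["(fake 20-byte infohash)"].peers["(fake peer id)"] = {0, 0, 0};
        return find_torrent("(fake 20-byte infohash)") ? 0 : 1;
    }

It also hints at why the real thing wants gigabytes of RAM: five million in-memory peer records, plus hash-table overhead, add up quickly.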

We anticipated this problem, of course, back before the site even started. That's why we elected to use what was then the fastest private tracker backend in the world - XBTT.

Lauded for its speed, XBTT handled the peers very well for the first few months of the site's existence. We brought on a developer - asm - whose job was to tune it and modify it as needed, and he was able to do that just fine - for a few months. However, asm was reluctant to make any major changes. When we asked why, his response was that XBTT's code was too weird, and that he was afraid he'd break something.

A bit surprised, we, the lead site developers, peered into the bowels of XBTT for the first time and found that he was correct. XBTT's internal code worked fine in practice, but strange, outdated design decisions and thousands of lines of unnecessary code left us doubting how well it would scale to a swarm of the size we had planned - and whether we'd be able to keep modifying it to suit our needs.

So a plan was formed. We would create a tracker of our own.

Late winter 2007

It made perfect sense. We were already replacing the outdated TBDev source with our own new Gazelle source, so why not replace XBTT with another piece of software as well? Make it fast, make the code pretty, give it a cool-sounding exotic animal name, and we'd be set. It couldn't possibly take very long - trackers are very simple pieces of software, after all. The only problem was that XBTT had scared asm into hiding, the other developers were all php developers (php is a language that is fast to write and slow to run), and we wanted the tracker coded in C++ (slow to write, fast to run). The solution was thus to outsource.

January 2008

Our first choice was a young developer called rootkit. Immensely intelligent, but perhaps not the greatest people person in the world, rootkit decided that he wanted to write the tracker in haskell instead. We weren't too excited to have the tracker written in a weird language that no one understood, but he promised it'd be fast, so we let him go at it. We don't think he ever wrote more than a hundred lines of it before he gave up and stepped down.

While we searched for a new developer, WhatMan decided to try an experiment - to see if a php tracker could outperform XBTT. He hacked away for a weekend and created Lioness - a beautiful little tracker, no doubt one of the fastest php trackers ever made. Unfortunately, it wasn't quite fast enough for our needs - upon testing, the swarm crushed our poor webserver, and we were forced to go back to XBTT.

By this time, XBTT was barely able to keep up with the load. The timeouts had already started, and we did whatever we could, but in the end, the only thing that really helped was moving to our new - and, at the time, ridiculously oversized - server in Canada.

March - May 2008

Another developer had been found! The guy was smart, mature, well educated, fluent in C++, and seemingly very able. We told him what we needed, and he started coding. A month later, the new dev - lenrek - had created the first tracker to call itself Ocelot.

lenrek's ocelot looked promising. It was new, shiny, and multithreaded. We figured our problems were solved, but when we tried it out, it exploded. It's still unclear exactly why - only that it did. That ocelot was tweaked and more tests were run, but we eventually gave up. lenrek's ocelot was shelved, and attention turned, for the next year, back to making XBTT handle its load properly.

Fortunately for us, lenrek stayed on as a developer - although his ocelot didn't succeed, he's in large part responsible for making the site work as well as it does today.

June 2009 - February 2010

In the next year of stagnation, ocelot was never quite forgotten, but working on it was never very motivating - especially with only one tracker dev. So we raised the XBTT announce interval from 30 minutes to 35, then to 40, then to 45. In the meantime, the idea of ocelot waited until we found someone to revitalize it. In June 2009, FZeroX found such a person - rconan.

rconan was incredibly intelligent, and came up with a plan for what everyone was pretty sure was going to be the most awesome tracker ever. High performance event queues, hashmaps, all that cool stuff. We outsourced the project to him, he started coding, and initial progress was very rapid.

Two hundred changes and additions to rconan's new ocelot were made between the months of August and October. Before we knew it, the new ocelot was all but finished - 4,000 lines of divine C++ code, with just "a few" bugs and features left to code. And then, rconan's real life started to get busier.

A couple of changes were made in November, a couple in December, one in January, and a final flurry of activity took place in February. When we asked for progress updates, ocelot was still a few bugfixes and features away from being ready for production, but no changes were ever made after February. As none of our in-house developers had been closely following the development of the new ocelot, we were unable to take over, and simply hoped that rconan's real-life obligations would clear up and he'd have the time to finish it.

In the meantime, we had raised XBTT's announce interval to the highest point we could justify - 47 minutes - and it was still timing out so often it became a joke. In April 2010, we gave it its own server and started load balancing multiple instances of it - starting out with 2 XBTTs, and then 3, and then 4. This gave us some breathing room, but not for long.

April - May 2010

At one point, A9 and oorza were arguing about java performance. A9 had the brilliant idea of daring oorza to write a high performance tracker in java, and work began on shadowolf. oorza proclaimed shadowolf "almost completely done" on May 12th, save a few outstanding bugs. We checked in on his progress at the end of August, and he was rewriting the entire plugin architecture, and considering using hadoop to store peers. We're unsure about shadowolf's current status.

August - September 2010

No updates had been made to ocelot in eight months, and rconan was nowhere to be found. The future of shadowolf was unclear. When a thread about ocelot came up in the forums, the staff were forced to admit that development on it had ceased and that no update was likely in the near future. It was a hard post to write, considering the timeouts had become so bad that the joke wasn't funny anymore. Users would sometimes have to wait hours for the tracker to let them download things, stats were being lost left and right, and we were out of hardware to throw at the problem. Something had to be done.

Enter WhatMan. Having previously stayed out of the C++ tracker development arena due to a lack of confidence in his high-performance C++ skills, WhatMan was confused as to why everyone was creating 4,000+ line behemoths when trackers are, in reality, extremely simple pieces of software. So he lifted some key design choices from rconan's ocelot, created the rest of the design himself, and spent the last week of August hacking away at a brand new ocelot.
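
He had a point: strip away the glue and an announce handler only has to do a couple of things. As a purely illustrative sketch (again in C++, again our own made-up names rather than anything out of Ocelot's or rconan's code), here is essentially all of the core logic - record the announcing peer, then hand back a bencoded response carrying a "compact" peer list of six bytes per peer:

    // Illustrative sketch, not Ocelot's code: the core of serving an announce.
    // Once the request is parsed, the tracker (1) records the announcing peer
    // and (2) returns a bencoded dictionary whose "peers" value is the compact
    // form: 4 bytes of IPv4 address + 2 bytes of port per peer, network order.
    #include <arpa/inet.h>  // htonl, htons

    #include <cstdint>
    #include <cstdio>
    #include <string>
    #include <unordered_map>

    struct Peer {
        std::uint32_t ip;    // IPv4, stored in network byte order
        std::uint16_t port;  // stored in network byte order
    };

    struct Torrent {
        std::unordered_map<std::string, Peer> peers;  // peer_id -> peer
    };

    // (1) Upsert the announcing peer into the torrent's swarm.
    void record_announce(Torrent& t, const std::string& peer_id, Peer p) {
        t.peers[peer_id] = p;
    }

    // (2) Build the response: d8:intervali<secs>e5:peers<len>:<raw bytes>e
    std::string announce_response(const Torrent& t, int interval_secs,
                                  std::size_t max_peers = 50) {
        std::string compact;
        for (const auto& entry : t.peers) {
            if (compact.size() / 6 >= max_peers) break;
            const Peer& p = entry.second;
            compact.append(reinterpret_cast<const char*>(&p.ip), 4);
            compact.append(reinterpret_cast<const char*>(&p.port), 2);
        }
        std::string out = "d8:intervali" + std::to_string(interval_secs) + "e";
        out += "5:peers" + std::to_string(compact.size()) + ":" + compact + "e";
        return out;
    }

    int main() {
        Torrent t;
        record_announce(t, "(fake peer id)", Peer{htonl(0x7f000001), htons(8080)});
        std::printf("%zu-byte bencoded response\n", announce_response(t, 1800).size());
    }

Add HTTP parsing, peer timeouts, and whatever stats bookkeeping the site needs on top of that, and you're still only in the low thousands of lines - which is about where the final ocelot landed.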

On September 1st, ocelot was ready for performance testing. We replaced one XBTT instance with it, and it scaled. So we replaced two, and it scaled. We tweaked it a bit, then replaced the third and fourth instances, tweaked it a bit more, and replaced the load balancer. What four XBTT instances and a load balancer had been failing to handle was now being handled by a single, single-threaded instance of the latest ocelot.

Then we pushed it harder - we lowered the announce interval to 40 minutes, and then to 30, and it scaled. Then we lowered it to 20 minutes, and linux broke before ocelot did. It was beautiful.

The dev team rejoiced, and banded together to add the remaining features and fix the remaining bugs. By September 3rd, ocelot was considered feature complete, and we let it run the entire swarm - one tracker for five million peers, at a 30 minute announce interval.

September 2010 - Now

Since then, ocelot's been purring along. It uses 20-30% of one CPU core and 3GB of RAM - for comparison, our four XBTT instances used the same amount of RAM combined, and 50-100% of a core each. It's 1,547 lines of code in total, all of which will be open-sourced at some point. The dev team has added the occasional bugfix, and there may be some bugs yet to be discovered, but our tracker is now more stable than it's been since we started. After over two and a half years, ocelot's journey to creation is finally finished.
