Very interesting blog post by Todd Hoff at highscalability.com presenting
“7 Years of YouTube Scalability Lessons in 30 min”, based on a presentation
from Mike Solomon, one of the original engineers at YouTube:
… The key takeaway of the talk for me was doing a lot with really simple tools. While
many teams are moving on to more complex ecosystems, YouTube really does keep
it simple. They program primarily in Python, use MySQL as their database,
they’ve stuck with Apache, and even new features for such a massive site start
as a very simple Python program.
That doesn’t mean YouTube doesn’t do cool stuff, they do, but what makes
everything work together is more a philosophy or a way of doing things than
technological hocus pocus. What made YouTube into one of the world’s largest
websites? Read on and see...
- 4 billion views a day
- 60 hours of video are uploaded every minute
million devices are YouTube enabled
double in 2010
- The number of videos has gone up 9 orders of magnitude and the number of
developers has only gone up two orders of magnitude.
- 1 million lines of Python code
- Python - most of the lines of code for YouTube are still
in Python. Every time you watch a YouTube video you are executing a bunch of
Python code.
- Apache - when you think you need to get rid of it, you
don’t. Apache is a real rockstar technology at YouTube because they keep it simple.
Every request goes through Apache.
- Linux - the benefit of Linux is there’s always a way to
get in and see how your system is behaving. No matter how badly your app is
behaving, you can take a look at it with Linux tools like strace and tcpdump.
- MySQL - is used a lot. When you watch a video you are
getting data from MySQL. Sometimes it’s used as a relational database, sometimes
as a blob store. It’s about tuning and making choices about how you organize
your data.
- Vitess - a new project released by YouTube, written in Go. It’s a
frontend to MySQL that acts as a proxy, rewriting and optimizing queries on
the fly. Currently it serves every YouTube database request. It’s RPC based.
- Zookeeper - a distributed lock server. It’s used for
configuration. Really interesting piece of technology. Hard to use correctly, so
read the manual.
- Wiseguy - a CGI servlet container.
- Spitfire - a templating system. It has an abstract syntax
tree that lets them do transformations to make things go faster.
- Serialization formats - no matter which one you use, they are all
expensive. Measure. Don’t use pickle. Not a good choice. Found protocol buffers
slow. They wrote their own BSON implementation, which is 10-15 times faster
than the one you can download.
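
The "Measure" advice on serialization is easy to try at home. Below is a minimal sketch, not anything from the talk: it times encode/decode round trips for two formats that ship with Python (json and pickle) using only the standard library. The SAMPLE record and the function names are made up for illustration, and YouTube's own BSON implementation is not part of it. Results will vary with payload shape, Python version and hardware, so treat the numbers as a prompt to measure your own data rather than a ranking.

```python
import json
import pickle
import timeit

# Hypothetical payload, standing in for a small record a web tier might cache.
SAMPLE = {
    "video_id": "abc123",
    "views": 4000000,
    "tags": ["python", "mysql"],
    "ratio": 0.75,
}


def roundtrip_json(obj):
    """Encode to JSON text and decode it back."""
    return json.loads(json.dumps(obj))


def roundtrip_pickle(obj):
    """Encode with pickle (highest protocol) and decode it back."""
    return pickle.loads(pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL))


def measure(obj, number=10000):
    """Return seconds spent on `number` round trips, keyed by format name."""
    candidates = {"json": roundtrip_json, "pickle": roundtrip_pickle}
    timings = {}
    for name, fn in candidates.items():
        # Check correctness first: a fast codec that loses data is useless.
        assert fn(obj) == obj
        timings[name] = timeit.timeit(lambda: fn(obj), number=number)
    return timings


if __name__ == "__main__":
    # Print the formats from fastest to slowest for this payload.
    for name, seconds in sorted(measure(SAMPLE).items(), key=lambda kv: kv[1]):
        print(f"{name:8s} {seconds:.4f}s")
```

To compare another codec, add its round-trip function to the candidates dict and rerun with a payload that looks like your real traffic.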