
YouTube Scalability Lessons

Thanks to Oracle's MySQL Blog for this story

Very interesting blog post by Todd Hoff at highscalability.com presenting “7 Years of YouTube
Scalability Lessons in 30 min” based on a presentation from Mike Solomon, one
of the original engineers at YouTube:

… The key takeaway of the talk for me was doing a lot with really simple tools. While
many teams are moving on to more complex ecosystems, YouTube really does keep
it simple. They program primarily in Python, use MySQL as their database,
they’ve stuck with Apache, and even new features for such a massive site start
as a very simple Python program.


That doesn’t mean YouTube doesn’t do cool stuff, they do, but what makes
everything work together is more a philosophy or a way of doing things than
technological hocus pocus. What made YouTube into one of the world’s largest
websites? Read on and see...


Stats

  • 4 billion views a day
  • 60 hours of video is uploaded every minute
  • 350+ million devices are YouTube enabled
  • Revenue doubled in 2010
  • The number of videos has gone up 9 orders of magnitude and the number of
    developers has only gone up two orders of magnitude.
  • 1 million lines of Python code

Stack

  • Python - most of the lines of code for YouTube are still
    in Python. Every time you watch a YouTube video you are executing a bunch of
    Python code.
  • Apache - when you think you need to get rid of it, you
    don’t. Apache is a real rockstar technology at YouTube because they keep it simple.
    Every request goes through Apache.
  • Linux - the benefit of Linux is there’s always a way to
    get in and see how your system is behaving. No matter how bad your app is
    behaving, you can take a look at it with Linux tools like strace and tcpdump.
  • MySQL - is used a lot. When you watch a video you are
    getting data from MySQL. Sometimes it’s used as a relational database and
    sometimes as a blob store. It’s about tuning and making choices about how
    you organize your data.
  • Vitess - a new project released by YouTube, written in Go, it’s a
    frontend to MySQL. It does a lot of optimization on the fly, it rewrites
    queries and acts as a proxy. Currently it serves every YouTube database
    request. It’s RPC based.
  • Zookeeper - a distributed lock server. It’s used for
    configuration. Really interesting piece of technology. Hard to use correctly,
    so read the manual.
  • Wiseguy - a CGI servlet container.
  • Spitfire - a templating system. It has an abstract syntax
    tree that lets them do transformations to make things go faster.
  • Serialization formats - no matter which one you use, they are all
    expensive. Measure. Don’t use pickle. Not a good choice. Found protocol
    buffers slow. They wrote their own BSON implementation, which is 10-15
    times faster than the one you can download.
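
The advice to "measure" serialization cost can be tried out with a small script. The sketch below is not from the talk: the sample payload, the `CODECS` table and the `measure` helper are all hypothetical names chosen for illustration, and it only covers formats in the Python standard library (json, pickle, marshal). Protocol buffers and BSON need third-party packages and are left out. pickle is included purely as a point of comparison, not as a recommendation.

```python
import json
import marshal
import pickle
import timeit

# Hypothetical payload standing in for a cached record; not taken from the talk.
SAMPLE = {
    "video_id": "abc123",
    "title": "Example video",
    "views": 4_000_000,
    "tags": ["music", "live", "hd"],
    "ratings": [4.5, 3.0, 5.0],
}

# Each codec is a (dumps, loads) pair that turns an object into bytes and back.
CODECS = {
    "json": (
        lambda obj: json.dumps(obj).encode("utf-8"),
        lambda data: json.loads(data.decode("utf-8")),
    ),
    "pickle": (pickle.dumps, pickle.loads),
    "marshal": (marshal.dumps, marshal.loads),
}


def measure(payload, rounds=2000):
    """Return {codec: (seconds for `rounds` round trips, encoded size in bytes)}."""
    results = {}
    for name, (dumps, loads) in CODECS.items():
        encoded = dumps(payload)
        # Make sure the codec round-trips this payload before timing it.
        assert loads(encoded) == payload
        elapsed = timeit.timeit(lambda: loads(dumps(payload)), number=rounds)
        results[name] = (elapsed, len(encoded))
    return results


if __name__ == "__main__":
    # Print the codecs from fastest to slowest for this particular payload.
    ranked = sorted(measure(SAMPLE).items(), key=lambda item: item[1][0])
    for name, (elapsed, size) in ranked:
        print(f"{name:8s} {elapsed * 1e3:8.2f} ms  {size:5d} bytes")
```

The numbers depend heavily on the payload shape and the Python version, so the ranking from a toy record like this one says little about production data. The point is only that the comparison is cheap to run against your own records.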

...Continues.

Read the blog

Watch the video

Read the entire article at its source
