=========
 PyWebSF
=========

----------------------
Python Web Performance
----------------------

:Author: Alec Flett
:Date: 7/28/2009

What I will talk about
======================

* httpd containers (Apache, nginx, lighttpd)
* Programming models (threaded vs. asynchronous vs. hybrid)
* Scaling challenges
* Deployment challenges (memory constraints, CPU utilization)

What I won't cover
==================

.. class:: incremental

* Just making Python faster
* SQL and databases: queries, ORMs, etc.
* Yahoo! performance recommendations: follow them!
* HTTP caching: do it!
* Windows: you're on your own

What is performance?
====================

Basic measurements:

.. class:: incremental

* Throughput: requests served per second

  * Measured in requests/second
  * 17 requests/second = 58 ms per request? No! That only holds when
    exactly one request is in flight at a time.

* Latency: average time from the start to the end of a request

Performance: latency vs throughput
==================================

Degenerate cases (Little's law: requests in flight = throughput × latency):

* 1000 requests/second, average 2000 ms per request

  * 2000 requests in flight at any given time! Probably lots of resource
    contention

* 100 requests/second, average 5 ms per request

  * Less than one request in flight at any given time (0.5 on average).
    Machine at 50% capacity: inefficient use of resources

Performance: costs
==================

* Cost: measured in CPU time and memory time
* Cost is a measure of efficiency
* Perfectly efficient: CPUs all at exactly 100%, network saturated, low load

.. class:: incremental

* But plan for spikes!

Python: Challenges
==================

* The GIL means threads can't run Python code in parallel, so threading
  doesn't help CPU-bound work
* Python's memory model is heap-intensive, so forking is expensive in
  memory: forked processes end up sharing very little
* Python processes are memory hungry
* WSGI implies a call stack per request

Programming models: Threads
===========================

* Threaded application servers: ``paster``, CherryPy
* Very WSGI-friendly
* Good multi-core support
* Except, Python threading sucks!
* No, really, it sucks
* I/O-bound applications benefit from threading, but only barely
* BTW: Monkey patch httplib!
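
A minimal sketch of why threads barely help, assuming Python 3 on a standard CPython build with a GIL (this is illustrative, not code from the talk; the function names are hypothetical). ``time.sleep()`` stands in for a blocking socket read, since both release the GIL while a pure-Python loop does not.

```python
import threading
import time


def cpu_work(n):
    """Pure-Python loop: holds the GIL for the whole computation."""
    total = 0
    for i in range(n):
        total += i
    return total


def io_work(seconds):
    """time.sleep() releases the GIL, like a blocking socket read does."""
    time.sleep(seconds)


def run_in_threads(target, arg, num_threads):
    """Run target(arg) in num_threads threads; return elapsed wall-clock seconds."""
    threads = [threading.Thread(target=target, args=(arg,))
               for _ in range(num_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    n = 1_000_000
    # Same total CPU work, done by 1 thread vs. split across 4 threads.
    one = run_in_threads(cpu_work, 4 * n, 1)
    four = run_in_threads(cpu_work, n, 4)
    print(f"CPU-bound: 1 thread {one:.2f}s, 4 threads {four:.2f}s")
    # Four 0.2 s waits overlap, so this takes about 0.2 s, not 0.8 s.
    io = run_in_threads(io_work, 0.2, 4)
    print(f"I/O-bound: 4 threads {io:.2f}s")
```

On a GIL build, expect the two CPU-bound timings to come out roughly equal (the threaded run is often slightly slower from lock contention), while the I/O-bound run finishes in about one sleep interval.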

Programming models: Threads (2)
===============================

* Typically a single-process model
* Very memory efficient
* Can be stateful without extra moving parts
* Other examples: Tomcat, Apache (worker MPM)

Programming models: Forked
==========================

* Single-threaded processes
* Apache prefork: mod_python, mod_wsgi "embedded" mode
* Many independent Python interpreters
* Stateless model of development
* Maximum potential performance under load

Programming models: Daemons
===========================

.. class:: incremental

* The front-end web server's own model is not important
* Socket-based communication to child processes: small I/O overhead!
* lighttpd, nginx, Apache: mod_proxy, FastCGI, SCGI. Annoying to
  maintain; the jury is still out on performance

Programming models: Daemons (2)
===============================

* Apache mod_wsgi in daemon mode: good results so far
* Child processes CAN be multi-threaded; FastCGI/SCGI implementations
  don't do this so well
* Try just a few threads per daemon: 1-4
* How many daemons? Depends HIGHLY on your application

  * Too many: memory hungry, CPU thrashing, high latency, low throughput
  * Too few: CPUs starved of work, requests queue up, low throughput

Programming models: mod_wsgi
============================

* My personal favorite
* It "just works"
* Written by mod_python developers: very smart, very responsive, very
  active development
* Daemon mode is extremely easy to manage and more efficient than
  FastCGI or SCGI
* Apache can queue requests

Other Programming models: Async
===============================

* Twisted: pure Python, long-running
* lighttpd with FastCGI + daemons
* Not WSGI-friendly... at all.

.. class:: incremental

* Except Stackless, but nobody does that yet

Other Programming models: Async (2)
===================================

* Extremely efficient use of CPU, I/O, memory
* No call stack per request => no WSGI!
* Without WSGI, you're writing a lot of custom code
* Still only a single CPU per process, but very good for I/O-bound
  applications

Programming models: summary
===========================

* Apache with mod_wsgi in daemon mode provides a good balance of
  performance and maintainability
* Figure out whether you're I/O bound or CPU bound
* Experiment!

Tools & techniques
==================

* Develop with ``paster``, deploy with Apache
* ``repoze.profile``: invaluable for request measurement
* ``ps``,

Scaling
=======

* Vertical scaling: your application gets faster with a faster CPU,
  more network bandwidth, etc. (better latency)
* Horizontal scaling: your application serves more requests as you add
  more machines (better throughput)

Horizontal scaling
==================

* Be stateless: the forked/daemon models support this
* But use memcache for state if you need it
* Use a database for state you really need to keep

Architecture for Horizontal Scaling
===================================

* Think in terms of a vertical stack of layers
* Each layer gets wider/more distributed

  * Shard database servers
  * Shard memcache
  * Application servers are stateless: use memcache/database for state

* Need to load-balance across application servers

Deployment
==========

* Classic stack:

  * HTTP cache
  * Web servers
  * Memcache
  * Database

* Find the right balance of app servers, database servers, memcache, etc.
* Whatever resource gets maxed out, add more
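
The talk leaves daemon count as "depends HIGHLY on your application". One possible starting point, offered here as an assumption rather than a rule from the talk, is Little's law (requests in flight = throughput × latency). In the sketch below the target load, the 1.5× spike headroom and the ``myapp`` process-group name are all hypothetical; only the ``processes``/``threads`` options of mod_wsgi's ``WSGIDaemonProcess`` directive are real. Measure under load and add whichever resource maxes out first.

```python
import math


def in_flight(requests_per_second, avg_latency_s):
    """Little's law: average requests in flight = throughput * latency."""
    return requests_per_second * avg_latency_s


def daemon_processes(requests_per_second, avg_latency_s, threads_per_process,
                     headroom=1.5):
    """Daemon processes needed to hold the in-flight requests.

    headroom is a hypothetical safety factor for traffic spikes.
    """
    slots = in_flight(requests_per_second, avg_latency_s) * headroom
    return max(1, math.ceil(slots / threads_per_process))


def wsgi_daemon_directive(group, processes, threads):
    """Render an Apache mod_wsgi WSGIDaemonProcess line."""
    return f"WSGIDaemonProcess {group} processes={processes} threads={threads}"


if __name__ == "__main__":
    # Hypothetical target: 200 requests/second at 50 ms average latency,
    # with 2 threads per daemon (inside the 1-4 range from the Daemons slide).
    rps, latency, threads = 200, 0.050, 2
    procs = daemon_processes(rps, latency, threads)
    print(f"~{in_flight(rps, latency):.0f} requests in flight")
    print(wsgi_daemon_directive("myapp", procs, threads))
```

For these hypothetical numbers the sketch suggests 8 daemons with 2 threads each (10 requests in flight × 1.5 headroom ÷ 2 threads, rounded up).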