Joomla!-开源天空

2009-01-10
您所在的位置: 首页 > Web开发技术 > PHP技术 > drupal性能优化第二部分

drupal性能优化第二部分

Joomla! 开源天空  作者:管理员  2008-11-26 22:02
  • 摘要:本文阐述了进行drupal性能调试和监控的工具。

drupal性能测试工具 

Introduction

  This paper will outline my test environment, software used and procedures for benchmarking and profiling. This process can apply to both scaling up and scaling out architectures. Scaling out will require a slightly different approach due to the fact that the entire webserver suite and/or MySQL will be split across multiple servers. 

Test System

  My test system is an AMD 2100 with 2GB RAM and single PATA harddrive. (Stop laughing.) It has XAMPP 1.6.6 installed which contains Apache 2.2.8, PHP 5.2.5 and MySQL 5.0.51a with eAccelerator turned off. Drupal 5.7 and the devel module have been installed. The test Drupal site contains 5000 nodes, with 50,000 comments, 5000 vocabulary and terms with a max length of 12. These first attempts at improving Drupal performance will be done with Drupal core only. The only nonstandard module that has been installed is the devel module.

Devel Module

  The devel module was used to generate all the random test data for the site. After working with it, I noticed it causes a large amount of overhead to the site. During initial familiarization with the module, it created all the test data just fine, but using the module to analyze queries and performance was another matter. Usually the first top 3 longest running queries were from the devel module itself. I'm going to develop a better way to collect this data without the high impact of using the devel module.

  Replacing the devel modules ability to capture the queries can be done within MySQL itself by adding the following to my.cnf file and restarting mysqld:

[mysqld]
 
log = /opt/lampp/var/mysql/drupal.log
log-queries-not-using-indexes
long_query_time = 1
log-slow-queries = /var/log/mysql/drupal-slow.log


  With these mysqld settings, the devel module can be turned off. These settings will also provide more information like logging queries that don't use indexes, something the devel module can't do, plus a greater depth of analysis.

Preparation

  Before we start baselining, we need to make sure MySQL caching is turned off:

mysql» SHOW VARIABLES LIKE 'query_cache%';
+------------------------------+---------+
| Variable_name | Value |
+------------------------------+---------+
| query_cache_limit            | 1048576 | 
| query_cache_min_res_unit     |    4096 | 
| query_cache_size             |       0 | «- query cache is turned off. 
| query_cache_type             |      ON | 
| query_cache_wlock_invalidate |     OFF | 
+------------------------------+---------+
5 rows in set (0.02 sec)

mysql»


  We don't want MySQL caching any of our queries. If it does, then it'll store the results of a query in the cache. The results of subsequent queries will be in the cache and returned much faster than reading tables or indexes. We don't want this right now, but I'll turn it back on later to see how it improves performance.

  On the PHP side, there shouldn't be any caching, nor any other web caching being used.

http_load Baseline

  We're going to start off with baselining the opening homepage. http_load can have multiple URLs in a file to benchmark more than one URL on a given site. I created a directory to hold all the project data.

$ echo 'http://localhost/drupal/' »» urls

  Running http_load with a single connection:

$ http_load -parallel 1 -seconds 20 urls
53 fetches, 1 max parallel, 1.05491e+06 bytes, in 20 seconds
19904 mean bytes/connection
2.65 fetches/sec, 52745.6 bytes/sec
msecs/connect: 0.158604 mean, 0.186 max, 0.148 min
msecs/first-response: 355.007 mean, 520.553 max, 332.182 min
HTTP response codes:
  code 200 -- 53

  This first baseline was taken with the devel module enabled. Turning it off produced this http_load results:

$ http_load -parallel 1 -seconds 20 urls
60 fetches, 1 max parallel, 1.1748e+06 bytes, in 20.0011 seconds
19580 mean bytes/connection
2.99984 fetches/sec, 58736.9 bytes/sec
msecs/connect: 0.163783 mean, 0.241 max, 0.152 min
msecs/first-response: 317.243 mean, 453.476 max, 283.361 min
HTTP response codes:
  code 200 -- 60


  Turning off the devel module resulted in a 12% increase in fetches and a first-response improvement of 11%. These two numbers are pretty much the most important numbers to consider. Probably not a good idea to use the devel module on a production system.

  By increasing the -parallel option, we can increase the concurrent requests to the Drupal system:

$ http_load -parallel 10 -seconds 20 urls
60 fetches, 10 max parallel, 1.1748e+06 bytes, in 20 seconds
19580 mean bytes/connection
3 fetches/sec, 58740 bytes/sec
msecs/connect: 0.1933 mean, 0.549 max, 0.07 min
msecs/first-response: 2867.04 mean, 3269.07 max, 2664.81 min
HTTP response codes:
  code 200 -- 60


  As you can see, my little workstation pretty much detonates with 10 concurrent requests with a first response time of nearly 3 seconds!

MySQL Configuration

  We need to log all the queries that are used when the homepage is requested.

[mysqld]

log = /opt/lampp/var/mysql/drupal.log
log-queries-not-using-indexes
long_query_time = 1
log-slow-queries = /var/log/mysql/drupal-slow.log


  All queries from Drupal will get logged in order in the drupal.log file. The partial contents look like this:


080706 23:56:33 8 Connect root@localhost on 
  8 Init DB Drupal
  8 Query SET NAMES "utf8"
  8 Query SELECT CASE WHEN status=1 THEN 0 ELSE 1 END FROM access WHERE type = 'host' AND LOWER('127.0.0.1') LIKE LOWER(mask) ORDER BY status DESC LIMIT 0, 1
  8 Query SELECT u.*, s.* FROM users u INNER JOIN sessions s ON u.uid = s.uid WHERE s.sid = 'feceaae02b66389ab7d19505b9a9f0a4'
  8 Query SELECT r.rid, r.name FROM role r INNER JOIN users_roles ur ON ur.rid = r.rid WHERE ur.uid = 1
  8 Query SELECT data, created, headers, expire FROM cache WHERE cid = 'variables'
  8 Query SELECT COUNT(pid) FROM url_alias
  8 Query SELECT src FROM url_alias WHERE dst = 'node'
  8 Query SELECT name, filename, throttle FROM system WHERE type = 'module' AND status = 1 ORDER BY weight ASC, filename ASC
  8 Query SELECT data, created, headers, expire FROM cache_menu WHERE cid = '1:en'
  8 Query SELECT COUNT(*) FROM node_access WHERE nid = 0 AND ((gid = 0 AND realm = 'all')) AND grant_view »= 1
  8 Query SELECT COUNT(*) FROM node n WHERE n.promote = 1 AND n.status = 1
  8 Query SELECT n.nid, n.sticky, n.created FROM node n WHERE n.promote = 1 AND n.status = 1 ORDER BY n.sticky DESC, n.created DESC LIMIT 0, 10
  8 Query SELECT dst FROM url_alias WHERE src = 'rss.xml'
  8 Query SELECT * FROM system WHERE type = 'theme'
  8 Query SELECT dst FROM url_alias WHERE src = 'misc/feed.png'


  These queries were generated by the homepage URL: http://localhost/drupal/

  The drupal-slow.log will contain any queries that take 1 second or longer to run. It's partial content for the same URL:

/opt/lampp/sbin/mysqld, Version: 5.0.51a-log (Source distribution). started with:
Tcp port: 3306 Unix socket: /opt/lampp/var/mysql/mysql.sock
Time Id Command Argument
# Time: 080706 23:50:04
# User@Host: root[root] @ localhost []
# Query_time: 0 Lock_time: 0 Rows_sent: 11 Rows_examined: 46
use Drupal;
SELECT name, filename, throttle FROM system WHERE type = 'module' AND status = 1 ORDER BY weight ASC, filename ASC;
# User@Host: root[root] @ localhost []
# Query_time: 0 Lock_time: 0 Rows_sent: 1 Rows_examined: 35
SELECT * FROM system WHERE type = 'theme';


  Important lines are in bold. For the first SELECT statement it had a query time of 0, lock time of 0, the rows returned were 11, and the number of rows that were searched was 46 out of the system table. You'll notice that the query run time is 0, but we've configured MySQL to log all queries that take 1 second or longer to run, and log them in the slow query log. Why are 0 time queries in the slow query log? Remember we've also configured MySQL to log all queries not using an index. The slow query log will contain both, queries that took 1 second to run and queries that didn't use indexes. A slow running query looks like this:

# Time: 080709 17:04:02
# User@Host: root[root] @ localhost []
# Query_time: 1 Lock_time: 0 Rows_sent: 2 Rows_examined: 11
SELECT DISTINCT b.* FROM blocks b LEFT JOIN blocks_roles r ON b.module = r.module AND b.delta = r.delta WHERE b.theme = 'garland' AND b.status = 1 AND (r.rid IN (1) OR r.rid IS NULL) ORDER BY b.region, b.weight, b.module;


  It's important to note that queries listed in drupal-slow.log may not be the only queries that need optimization. It's entirely possible that a query that runs for less than 1 second, but are called several hundred times could also be a candidate for optimization. Drupal-slow.log is just one place to start in the optimization process. Both log files will get rather large, the drupal.log file contains 172 queries just to render the homepage! There are several tools available to help analyze all this data. We'll need to determine which queries on a running system consume the most resources. We'll combine the logging and http_load to generate some http traffic to fill the logs with results.

Logfile Analysis

  There are a few tools available to analyze the logfiles. We'll be using the mysql statement log analyzer to analyze both log files. Using http_load to create some http traffic:

$ http_load -parallel 10 -seconds 120 urls
389 fetches, 10 max parallel, 7.61662e+06 bytes, in 120.002 seconds
19580 mean bytes/connection
3.24162 fetches/sec, 63471 bytes/sec
msecs/connect: 0.181763 mean, 0.511 max, 0.046 min
msecs/first-response: 2855.27 mean, 3271.22 max, 2710.4 min
HTTP response codes:
  code 200 -- 389


The command generated a general logfile almost 6MB, and slow query logfile of 602K. This is too much information to manually process. An easy to use log analyzer is mysqlsla, and can be downloaded from http://hackmysql.com. This is a Perl script that requires the Perl DBI and DBD::mysql installed. It can be used to process both the general log and the slow query log. The command:

$ mysqlsla --explain --general drupal.log --flat --sort ce --socket mysql.sock -u root

The output from mysqlsla is rather lengthy, even simply trying to analyze the homepage. Here's the number one query:

$ mysqlsla --explain --general drupal.log --flat --sort ce --socket mysql.sock -u root
Reading general log 'drupal.log'.
38519 total queries, 33 unique.
Sorting by 'ce'.

__ 001 _______________________________________________________________________

Count : 11693 (30%)
Database : Drupal
Execution time: 1.000 ms
Count * Exec : 11.693 s
Rows (EXPLAIN): 1 produced, 1 read
EXPLAIN : 
  id: 1
  select_type: SIMPLE
  table: url_alias
  type: ref
  possible_keys: src
  key: src
  key_len: 386
  ref: const
  rows: 1
  Extra: Using where


select dst from url_alias where src = 'S' 


Going over the output:

Count: How many times the query was run.
Database: Self explanitory
Execution Time: How long the query took to run
Count * Exec: Total running time for this query, even with different WHERE clause.
Rows: From EXPLAIN
EXPLAIN: Output from EXPLAIN command.
Note: The WHERE clause can be confusing. The 'S' means the src is a string, 'N' would be integer/number.

EXPLAIN output:

id: identifies the SELECT statement
select_type: simple, subquery, derived, union or result
table: table or tables used in SELECT
type: join type
possible_keys: indexes which COULD be used
key_length: number of bytes used in index
ref: columns/constants being used from table(s)
rows: number of estimated scanned table rows
extra: additional info like using where, using filesort, using temporary

This is one query that doesn't show up in the slow query log. Looking at the number of times the query is run should be a red flag to investigate ways to make it faster. Mysqlsla is a great tool to help analyze queries. It combines EXPLAIN with other metrics to help organize large logs.

Benchmarking MySQL

The easiest way to benchmark the performance of a single query is using the BENCHMARK() function. This function can be used to test execution speed of a query. It's really great if you have a query that normally runs too fast to determine runtime. This query runs in 0.00 seconds:

mysql» SELECT dst FROM url_alias WHERE src = 'node/3705';
+-----------------+
| dst             |
+-----------------+
| node-3705-story | 
+-----------------+
1 row in set (0.00 sec)

mysql» 


Running it 10000000 times gives us:

mysql» set @sql := "SELECT dst FROM url_alias WHERE src = 'node/3705';";
Query OK, 0 rows affected (0.00 sec)

mysql» SELECT @sql;
+----------------------------------------------------+
| @sql |
+----------------------------------------------------+
| SELECT dst FROM url_alias WHERE src = 'node/3705'; | 
+----------------------------------------------------+
1 row in set (0.00 sec)

mysql» SELECT BENCHMARK(10000000, @sql);
+---------------------------+
| BENCHMARK(10000000, @sql) |
+---------------------------+
| 0                         | 
+---------------------------+
1 row in set (3.98 sec)

mysql»


It's important to be sure query caching is turned off or the results will be incorrect. BENCHMARK runs the query one after another, not in parallel. 

MySQL also includes an overall benchmarking suite. This suite doesn't target Drupal directly, but it's a great way to compare server systems, or general configuration changes to mysqld.  

Profiling MySQL

MySQL profiling is a technique used to see what's happening internally within MySQL. Having the ability to see what's happening within mysqld and the storage engines can help with understanding how queries impact the server. We'll analyze this query from the drupal.log:

mysql» SELECT t.* FROM term_node r INNER JOIN term_data t ON r.tid = t.tid INNER JOIN vocabulary v ON t.vid = v.vid WHERE r.nid = 3705 ORDER BY v.weight, t.weight, t.name;
+-----+-----+------------+---------------------------+--------+
| tid | vid | name | description | weight |
+-----+-----+------------+---------------------------+--------+
| 5 | 2 | dastaruhit | description of dastaruhit | 0 | 
| 24 | 2 | rora | description of rora | 0 | 
| 49 | 2 | vab | description of vab | 0 | 
| 20 | 2 | thavu | description of thavu | 1 | 
| 7 | 2 | mo | description of mo | 2 | 
+-----+-----+------------+---------------------------+--------+
5 rows in set (0.00 sec)

mysql» EXPLAIN SELECT t.* FROM term_node r INNER JOIN term_data t ON r.tid = t.tid INNER JOIN vocabulary v ON t.vid = v.vid WHERE r.nid = 3705 ORDER BY v.weight, t.weight, t.name\G

*************************** 1. row ***************************
  id: 1
  select_type: SIMPLE
  table: r
  type: ref
possible_keys: PRIMARY,nid,tid
  key: nid
  key_len: 4
  ref: const
  rows: 12
  Extra: Using temporary; Using filesort
*************************** 2. row ***************************
  id: 1
  select_type: SIMPLE
  table: t
  type: eq_ref
possible_keys: PRIMARY,vid
  key: PRIMARY
  key_len: 4
  ref: Drupal.r.tid
  rows: 1
  Extra: 
*************************** 3. row ***************************
  id: 1
  select_type: SIMPLE
  table: v
  type: eq_ref
possible_keys: PRIMARY
  key: PRIMARY
  key_len: 4
  ref: Drupal.t.vid
  rows: 1
  Extra: 
3 rows in set (0.00 sec)


The query used a temporary table and a filesort to produce the results. What kind of table? Was it in memory or on disk? What about sorting and indexes? Profiling will help answer questions like these.

To get started, flush all the STATUS variables:

mysql» FLUSH STATUS;
Query OK, 0 rows affected (0.00 sec)

mysql»


Run the query:

mysql» SELECT t.* FROM term_node r INNER JOIN term_data t ON r.tid = t.tid INNER JOIN vocabulary v ON t.vid = v.vid WHERE r.nid = 3705 ORDER BY v.weight, t.weight, t.name;

Determine what kind of temporary tables were created:

mysql» SHOW GLOBAL STATUS LIKE 'Created%';

+-------------------------+-------+

| Variable_name           | Value |

+-------------------------+-------+
| Created_tmp_disk_tables |     1 | 
| Created_tmp_files       |     0 | 
| Created_tmp_tables      |     4 | 
+-------------------------+-------+
3 rows in set (0.00 sec)

mysql»


This query created 1 temporary on disk table and 2 temporary in memory tables. How about sorting?

mysql» SHOW GLOBAL STATUS LIKE 'Sort%';

+-------------------+-------+

| Variable_name | Value |

+-------------------+-------+
| Sort_merge_passes |     0 | 
| Sort_range        |     0 | 
| Sort_rows         |     5 | 
| Sort_scan         |     1 | 
+-------------------+-------+
4 rows in set (0.00 sec)

mysql»


MySQL scanned a single table with 5 rows to produce the results. What about indexes?

mysql» SHOW GLOBAL STATUS LIKE 'Key%';
+------------------------+-------+
| Variable_name          | Value |
+------------------------+-------+
| Key_blocks_not_flushed |     0 | 
| Key_blocks_unused      | 13820 | 
| Key_blocks_used        |   678 | 
| Key_read_requests      |    21 | 
| Key_reads              |     0 | 
| Key_write_requests     |     0 | 
| Key_writes             |     0 | 
+------------------------+-------+
7 rows in set (0.01 sec)

mysql»


This query made 21 Key_read_requests but didn't use any indexes from the key cache, so it had to read the .MYI file to search through the index.

The are also Handler* status variables that show what the storage engine is doing too. This can become very involved, especially with the InnoDB storage engine.


Conclusion

The techniques of benchmarking and profiling MySQL helps in figuring out the bottlenecks in a system. The process is to create traffic, log queries, then analyze the logfiles to determine which queries need attention. Take those queries one by one, use EXPLAIN and "replay" them into the system and look at the various STATUS variables to see how they are impacting the server. In Part III, we'll go over some of the queries and tune the queries and mysqld. One thing that also needs to be kept in mind is Drupal profiling. There's the distinct possibility that we'll have some good queries, but Drupals interaction with MySQL is where the improvement may need to happen. Any server optimization will be left out.

  发表您的文章评论

您的姓名 (昵称)
标题:
评分: 很差一般较好很好
评论:
验证码:
请输入验证码