Introduction
Using the tools and techniques from Part II, we'll look at the areas of Drupal that can be improved. We'll go over queries, indexes and mysql configuration. This will be based upon loading the home page only.
Initial LoadLooking at a single loading of the home page produces a total of 122 SELECT queries, 3 SET commands, and 2 UPDATES. Only 30 SELECTS are unique, so there's plenty of duplicated queries. The slow query log contains 5 queries when the system is loaded with 10 concurrent connections for a 30 second period. The queries in the slow query log could also be queries that are not using indexes. We'll start with these queries first.
Slow Query Logmysql» SELECT name, filename, throttle FROM system WHERE type = 'module' AND status = 1 ORDER BY weight ASC, filename ASC;
+----------------+-----------------------------------------------+----------+
| name | filename | throttle |
+----------------+-----------------------------------------------+----------+
| block | modules/block/block.module | 0 |
| color | modules/color/color.module | 0 |
| comment | modules/comment/comment.module | 0 |
| filter | modules/filter/filter.module | 0 |
| help | modules/help/help.module | 0 |
| menu | modules/menu/menu.module | 0 |
| node | modules/node/node.module | 0 |
| system | modules/system/system.module | 0 |
| taxonomy | modules/taxonomy/taxonomy.module | 0 |
| user | modules/user/user.module | 0 |
| watchdog | modules/watchdog/watchdog.module | 0 |
| devel_generate | sites/all/modules/devel/devel_generate.module | 0 |
+----------------+-----------------------------------------------+----------+
12 rows in set (0.00 sec)
mysql» EXPLAIN SELECT name, filename, throttle FROM system WHERE type = 'module' AND status = 1 ORDER BY weight ASC, filename ASC\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: system
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 35
Extra: Using where; Using filesort
1 row in set (0.00 sec)
mysql» SHOW CREATE TABLE system;
CREATE TABLE `system` (
`filename` varchar(255) NOT NULL default '',
`name` varchar(255) NOT NULL default '',
`type` varchar(255) NOT NULL default '',
`description` varchar(255) NOT NULL default '',
`status` int(11) NOT NULL default '0',
`throttle` tinyint(4) NOT NULL default '0',
`bootstrap` int(11) NOT NULL default '0',
`schema_version` smallint(6) NOT NULL default '-1',
`weight` int(11) NOT NULL default '0',
PRIMARY KEY (`filename`),
KEY `weight` (`weight`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8
mysql»
We can see that this query isn't using any indexes. The EXPLAIN output shows NULL for possible_keys, key, key_len and ref. This table contains 35 rows, so it's doing a full table scan to return the result. Benchmarking this query produces:
mysql» SET @sql := "SELECT name, filename, throttle FROM system WHERE type = 'module' AND status = 1 ORDER BY weight ASC, filename ASC;" ;
Query OK, 0 rows affected (0.01 sec)
mysql» SELECT BENCHMARK(1000000, @sql);
+--------------------------+
| BENCHMARK(1000000, @sql) |
+--------------------------+
| 0 |
+--------------------------+
1 row in set (0.76 sec)
mysql»
mysql» CREATE INDEX idx_type_status on system (type, status);
Query OK, 35 rows affected (0.01 sec)
Records: 35 Duplicates: 0 Warnings: 0
mysql» SELECT BENCHMARK(1000000, @sql);
+--------------------------+
| BENCHMARK(1000000, @sql) |
+--------------------------+
| 0 |
+--------------------------+
1 row in set (0.66 sec)
Creating an index on the type and status column helped by 13% . Instead of a full table scan, EXPLAIN reports only 12 rows accessed:
mysql» EXPLAIN SELECT name, filename, throttle FROM system WHERE type = 'module' AND status = 1 ORDER BY weight ASC, filename ASC\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: system
type: ref
possible_keys: idx_type_status
key: idx_type_status
key_len: 771
ref: const,const
rows: 12
Extra: Using where; Using filesort
1 row in set (0.00 sec)
mysql»
Another speedup can be realized by removing the ORDER BY so MySQL doesn't have to create a temporary table, then having PHP sort the data. There would be a trade off with this approach that overall benchmarking would need to be used.
mysql» set @sql1 = "SELECT name, filename, throttle FROM system WHERE type = 'module' AND status = 1;";
Query OK, 0 rows affected (0.00 sec)
mysql» SELECT BENCHMARK(1000000, @sql1);
+---------------------------+
| BENCHMARK(1000000, @sql1) |
+---------------------------+
| 0 |
+---------------------------+
1 row in set (0.52 sec)
mysql»
Covering indexes can't be created due to MySQL's index limit of 1000 bytes. In looking at the table, there could be ways to change the table schema to fit all the columns needed for a covering index.
The next query:
mysql» SELECT * FROM system WHERE type = 'theme'\G
*************************** 1. row ***************************
filename: themes/garland/page.tpl.php
name: garland
type: theme
description: themes/engines/phptemplate/phptemplate.engine
status: 1
throttle: 0
bootstrap: 0
schema_version: 0
weight: 0
1 row in set (0.00 sec)
mysql» EXPLAIN SELECT * FROM system WHERE type = 'theme'\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: system
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 35
Extra: Using where
1 row in set (0.00 sec)
mysql» set @sql := "EXPLAIN SELECT * FROM system WHERE type = 'theme';";
Query OK, 0 rows affected (0.00 sec)
mysql» SELECT BENCHMARK(1000000, @sql);
+--------------------------+
| BENCHMARK(1000000, @sql) |
+--------------------------+
| 0 |
+--------------------------+
1 row in set (0.65 sec)
Since this query has the type column listed first, it can use the exact same index that was created for the first query:
mysql» EXPLAIN SELECT * FROM system WHERE type = 'theme'\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: system
type: ref
possible_keys: idx_type_status
key: idx_type_status
key_len: 767
ref: const
rows: 1
Extra: Using where
1 row in set (0.00 sec)
mysql»
There's also another query that's very close to the two above:
mysql» SELECT * FROM system WHERE type = 'theme' AND name = 'garland'\G
*************************** 1. row ***************************
filename: themes/garland/page.tpl.php
name: garland
type: theme
description: themes/engines/phptemplate/phptemplate.engine
status: 1
throttle: 0
bootstrap: 0
schema_version: 0
weight: 0
1 row in set (0.00 sec)
mysql»
mysql» EXPLAIN SELECT * FROM system WHERE type = 'theme' AND name = 'garland'\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: system
type: ref
possible_keys: idx_type_status
key: idx_type_status
key_len: 767
ref: const
rows: 1
Extra: Using where
1 row in set (0.00 sec)
mysql»
By careful query design, it's possible to have a single index support more than one query type. MySQL supports more than one index on a table, but only one index can be used at a time. Having many indexes on a table can impact UPDATES and INSERTS, so minimizing the number of indexes per table will help the performance of writing to tables.
Another approach to these queries would be to use a single query that returns the most data. All the queries above have the exact same result set returned showing that the other two queries are duplicated and may be eliminated.
Next query:
mysql» SELECT nt.type, nt.* FROM node_type nt ORDER BY nt.type ASC\G
*************************** 1. row ***************************
type: page
type: page
name: Page
module: node
description: If you want to add a static page, like a contact page or an about page, use a page.
help:
has_title: 1
title_label: Title
has_body: 1
body_label: Body
min_word_count: 0
custom: 1
modified: 1
locked: 0
orig_type: page
*************************** 2. row ***************************
type: story
type: story
name: Story
module: node
description: Stories are articles in their simplest form: they have a title, a teaser and a body, but can be extended by other modules. The teaser is part of the body too. Stories may be used as a personal blog or for news articles.
help:
has_title: 1
title_label: Title
has_body: 1
body_label: Body
min_word_count: 0
custom: 1
modified: 1
locked: 0
orig_type: story
2 rows in set (0.00 sec)
This particular query will always do a full table scan. It's returning the same column twice, type. That could also be removed to minimize the amount of data that needs to get sent back to the client. The nt.* already has the type column in it. In general, when I see any SELECT * type queries, I always wonder if all columns are really needed. If not, columns should be specified so the correct indexes can be created for that table.
Final query in the slow query log:
mysql» SELECT DISTINCT b.* FROM blocks b LEFT JOIN blocks_roles r ON b.module = r.module AND b.delta = r.delta WHERE b.theme = 'garland' AND b.status = 1 AND (r.rid IN (2) OR r.rid IS NULL) ORDER BY b.region, b.weight, b.module\G
*************************** 1. row ***************************
module: user
delta: 0
theme: garland
status: 1
weight: 0
region: left
custom: 0
throttle: 0
visibility: 0
pages:
title:
*************************** 2. row ***************************
module: user
delta: 1
theme: garland
status: 1
weight: 0
region: left
custom: 0
throttle: 0
visibility: 0
pages:
title:
2 rows in set (0.01 sec)
mysql» EXPLAIN SELECT DISTINCT b.* FROM blocks b LEFT JOIN blocks_roles r ON b.module = r.module AND b.delta = r.delta WHERE b.theme = 'garland' AND b.status = 1 AND (r.rid IN (2) OR r.rid IS NULL) ORDER BY b.region, b.weight, b.module\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: r
type: system
possible_keys: PRIMARY
key: NULL
key_len: NULL
ref: NULL
rows: 0
Extra: const row not found
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: b
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 7
Extra: Using where; Using temporary; Using filesort
2 rows in set (0.00 sec)
mysql» set @sql := "SELECT DISTINCT b.* FROM blocks b LEFT JOIN blocks_roles r ON b.module = r.module
» AND b.delta = r.delta WHERE b.theme = 'garland' AND b.status = 1
» AND (r.rid IN (2) OR r.rid IS NULL) ORDER BY b.region, b.weight, b.module;";
mysql» SELECT BENCHMARK(1000000, @sql);
+--------------------------+
| BENCHMARK(1000000, @sql) |
+--------------------------+
| 0 |
+--------------------------+
1 row in set (0.72 sec)
Putting an index on blocks.theme and blocks.status gives a slight improvement:
mysql» CREATE INDEX idx on blocks (theme, status);
Query OK, 7 rows affected (0.00 sec)
Records: 7 Duplicates: 0 Warnings: 0
mysql» SELECT BENCHMARK(1000000, @sql);
+--------------------------+
| BENCHMARK(1000000, @sql) |
+--------------------------+
| 0 |
+--------------------------+
1 row in set (0.69 sec)
mysql» EXPLAIN SELECT DISTINCT b.* FROM blocks b LEFT JOIN blocks_roles r ON b.module = r.module AND b.delta = r.delta WHERE b.theme = 'garland' AND b.status = 1 AND (r.rid IN (2) OR r.rid IS NULL) ORDER BY b.region, b.weight, b.module\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: r
type: system
possible_keys: PRIMARY
key: NULL
key_len: NULL
ref: NULL
rows: 0
Extra: const row not found
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: b
type: ref
possible_keys: idx
key: idx
key_len: 768
ref: const,const
rows: 2
Extra: Using where; Using temporary; Using filesort
2 rows in set (0.00 sec)
mysql»
Eliminating the ORDER BY and having PHP do the sorting could also make an improvement. This query is rather interesting because it's joining two tables together but it's not using any columns from the second table. This query returns the same result set:
mysql» SELECT DISTINCT b.* FROM blocks b WHERE b.theme = 'garland' AND b.status = 1 ORDER BY b.region, b.weight, b.module\G
*************************** 1. row ***************************
module: user
delta: 0
theme: garland
status: 1
weight: 0
region: left
custom: 0
throttle: 0
visibility: 0
pages:
title:
*************************** 2. row ***************************
module: user
delta: 1
theme: garland
status: 1
weight: 0
region: left
custom: 0
throttle: 0
visibility: 0
pages:
title:
2 rows in set (0.01 sec)
By eliminating the JOIN to the blocks_role table a 14% increase in performance is seen:
mysql» SELECT BENCHMARK(1000000, @sql);
+--------------------------+
| BENCHMARK(1000000, @sql) |
+--------------------------+
| 0 |
+--------------------------+
1 row in set (0.59 sec)
That's the end for the slow query log.
General LogStarting with the General log, the query that's run the most is:
select dst from url_alias where src = 'S', where S is a string. This query is run 42 times in a single connection, and for some reason Drupal is creating queries for data that doesn't exist. This could be due to my test data being incomplete.
Benchmarking:
mysql» set @sql := "SELECT dst FROM url_alias WHERE src = 'node/3705';";
Query OK, 0 rows affected (0.00 sec)
mysql» SELECT BENCHMARK(1000000, @sql);
+--------------------------+
| BENCHMARK(1000000, @sql) |
+--------------------------+
| 0 |
+--------------------------+
1 row in set (0.74 sec)
What's going on:
mysql» EXPLAIN SELECT dst FROM url_alias WHERE src = 'node/3705'\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: url_alias
type: ref
possible_keys: src
key: src
key_len: 386
ref: const
rows: 1
Extra: Using where
1 row in set (0.00 sec)
Using a covering index is possible on this table:
mysql» CREATE INDEX idx on url_alias (src,dst);
Query OK, 3706 rows affected (0.11 sec)
Records: 3706 Duplicates: 0 Warnings: 0
mysql» EXPLAIN SELECT dst FROM url_alias WHERE src = 'node/3705'\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: url_alias
type: ref
possible_keys: idx
key: idx
key_len: 386
ref: const
rows: 1
Extra: Using where; Using index
1 row in set (0.00 sec)
mysql»
The Using index in the Extra column means the dataset can be returned from the index alone, eliminating the need to use a table. It benchmarks:
mysql» SELECT BENCHMARK(1000000, @sql);
+--------------------------+
| BENCHMARK(1000000, @sql) |
+--------------------------+
| 0 |
+--------------------------+
1 row in set (0.63 sec)
mysql»
Showing almost 15% improvement. The biggest improvement would be to consolidate into a single query using IN. As an example:
mysql» SELECT dst FROM url_alias WHERE src IN ('node/3705','node/3706','node/3673','node/3672');
It Benchmarks:
mysql» SELECT BENCHMARK(1000000, @sql);
+--------------------------+
| BENCHMARK(1000000, @sql) |
+--------------------------+
| 0 |
+--------------------------+
1 row in set (0.66 sec)
This is a little slower than just the WHERE version, but if you multiply the 4 seperate queries using WHERE:
0.63x4 = 2.52 seconds
The improvement would be about 71% increase in performance in this example.
The next frequently called query:
select data, created, headers, expire from cache_filter where cid = 'S' .
mysql» EXPLAIN SELECT data, created, headers, expire FROM cache_filter WHERE cid='1:16a45438cd9716a31fa9fbf24badbb8a'\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: cache_filter
type: const
possible_keys: PRIMARY
key: PRIMARY
key_len: 767
ref: const
rows: 1
Extra:
1 row in set (0.00 sec)
mysql» show create table cache_filter\G
*************************** 1. row ***************************
Table: cache_filter
Create Table: CREATE TABLE `cache_filter` (
`cid` varchar(255) NOT NULL default '',
`data` longblob,
`expire` int(11) NOT NULL default '0',
`created` int(11) NOT NULL default '0',
`headers` text,
PRIMARY KEY (`cid`),
KEY `expire` (`expire`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8
1 row in set (0.00 sec)
mysql»
On the surface, this query looks pretty good. Upon inspecting the table schema, the only improvement I can see is to reduce the size of the cid field. This field is a MD5 checksum, with some additional data prepended consuming 35 CHARS. The cid field is VARCHAR(255), but should be CHAR(36) instead. With the longblobs and text fields, covering indexes can't be created. Since this query doesn't require these fields in the WHERE clause, changing the table to move those fields to a separate table might improve performance, but at the cost of rewriting the SQL statement and PHP code to reflect those changes.
ConclusionI've picked the top queries, and have shown a few schema optimizations to improve Drupal performance. The same process is used for the remainder of the queries, and since this paper is already about 13 pages long I thought this was enough to demonstrate what can be done. I'd say a conservative performance increase of around 10% can be realized for the opening anonymous homepage.