Maintenance
There are a few housekeeping tasks that should be done on a regular base:
- To update index key distributions: run
ANALYZE TABLE
once a day (e.g. right after daily backup-dump) on all InnoDB and MyISAM tables
- Tables will be locked during optimization (MyISAM: read-lock, InnoDB: write-lock)!
- Optimization will be replicated to slaves (good) unless told otherwise
- To defrag data, recover space and update index statistics: run
OPTIMIZE TABLE
once a month (e.g. right after backup-dump) on all InnoDB and MyISAM tables
- Tables will be locked during optimization!
- Optimization will be replicated to slaves (good) unless told otherwise
Index / Query structure
Looking at the index structure of the most important tables, three different classes of problems could be determined:
Missing indexes:
Analyzing the long running queries (i.e. using
innotop
) we found a number of them that could benefit from extra (or even dedicated) indexes. Of course one has to take into account the overhead to maintain these indexs during DML-statements like
INSERT
,
UPDATE
,
DELETE
. However, a BOINC project, in particular when using locality scheduling, is more often reading (
SELECT
) than writing, so the extra overhead is usually compensated by the improved read performance. One also has to consider that many columns without indexes only contain state information, hence can be indexed rather cheaply (index with small cardinality).
Index redundancy:
There were a couple of redundant indexes which means they could be dropped (or changed, see below) without affecting the optimizer negatively. Redundant indexes simply add index maintenance overhead without contributing any advantage. Consider the following example:
index1(a)
and
index2(a,b)
. In this case the
index1
can be dropped because
index2
already covers column
a
.
Important: for this to work, the respective column(s) has(have) to be the leftmost column(s)! An additional advantage could be that
index2
could serve as a so called
covering index when used in statement like this:
SELECT a, b FROM x WHERE a = m AND b = n
. In this case no data would have to be fetched from table
x
because
index2
already containes it - it covers the queried columns.
Index column order:
There were a number of multi-column indexes that shared the very same first column. Consider the following scenario of existing indexes:
index1(a,b)
,
index2(a,c)
,
index3(a,d)
. These three indexes most likely have been created because column
a
will be used in conjunction with either column
b
,
c
or
d
in different queries. However, one needs to know that the optimizer doesn't care about the column order in this case. That means that the optimizer will use
index1
for a query like
SELECT x FROM y WHERE a = m AND b = n
, regardless of the column order (
index1(a,b)
or
index1(b,a)
). Therefore it's better to reorder the remaining indexes
index2
and
index3
, hence avoiding index redundancy on column
a
and providing indexes of columns
c
and
d
without additional overhead. Please note that it can sometimes even be better to swap all three indexes and create a dedicated index for column
a
, but that obviously depends on the actual queries being issued (which columns are used in which contexts) and the actual cardinalities of the columns concerned.
Table: result
Original index structure
+--------+------------+-------------------------+--------------+-------------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+--------+------------+-------------------------+--------------+-------------------+-----------+-------------+----------+--------+------+------------+---------+
| result | 0 | PRIMARY | 1 | id | A | 1454559 | NULL | NULL | | BTREE | |
| result | 0 | name | 1 | name | A | 1454559 | NULL | NULL | | BTREE | |
| result | 1 | res_wuid | 1 | workunitid | A | 1454559 | NULL | NULL | | BTREE | |
| result | 1 | res_app_state | 1 | appid | A | 9 | NULL | NULL | | BTREE | |
| result | 1 | res_app_state | 2 | server_state | A | 9 | NULL | NULL | | BTREE | |
| result | 1 | res_filedel | 1 | file_delete_state | A | 9 | NULL | NULL | | BTREE | |
| result | 1 | res_userid_id | 1 | userid | A | 69264 | NULL | NULL | | BTREE | |
| result | 1 | res_userid_id | 2 | id | A | 1454559 | NULL | NULL | | BTREE | |
| result | 1 | res_userid_val | 1 | userid | A | 63451 | NULL | NULL | | BTREE | |
| result | 1 | res_userid_val | 2 | validate_state | A | 132232 | NULL | NULL | | BTREE | |
| result | 1 | res_hostid_id | 1 | hostid | A | 161617 | NULL | NULL | | BTREE | |
| result | 1 | res_hostid_id | 2 | id | A | 1454559 | NULL | NULL | | BTREE | |
| result | 1 | res_wu_user | 1 | workunitid | A | 727279 | NULL | NULL | | BTREE | |
| result | 1 | res_wu_user | 2 | userid | A | 1454559 | NULL | NULL | | BTREE | |
| result | 1 | ind_res_st | 1 | server_state | A | 9 | NULL | NULL | | BTREE | |
| result | 1 | ind_res_st | 2 | priority | A | 9 | NULL | NULL | | BTREE | |
| result | 1 | res_st_create | 1 | server_state | A | 9 | NULL | NULL | | BTREE | |
| result | 1 | res_st_create | 2 | create_time | A | 727279 | NULL | NULL | | BTREE | |
| result | 1 | res_st_random | 1 | server_state | A | 9 | NULL | NULL | | BTREE | |
| result | 1 | res_st_random | 2 | random | A | 1454559 | NULL | NULL | | BTREE | |
| result | 1 | res_host_state | 1 | hostid | A | 96970 | NULL | NULL | | BTREE | |
| result | 1 | res_host_state | 2 | server_state | A | 161617 | NULL | NULL | | BTREE | |
+--------+------------+-------------------------+--------------+-------------------+-----------+-------------+----------+--------+------+------------+---------+
Optimized index structure
Added three indexes:
-
outcome
(small cardinality)
-
client_state
(small cardinality)
-
received_time
(overhead compensated by drop of res_wuid
, see below)
Dropped redundant indexes:
-
res_wuid
(covered by res_wu_user
)
Changed index column order:
-
res_userid_val(userid,validate_state)
res_val_userid(validate_state,userid)
(userid
redundant and covered by res_userid_id
, important "free" additional index for validate_state
)
-
res_st_create(server_state,create_time)
res_create_server_state(create_time, server_state)
(server_state
redundant and covered by ind_res_st
, important "free" additional index for create_time
)
Possible changes (being conservative, not yet implemented):
- add
exit_status
- swap column order of
-
res_st_random
(server_state
redundant and covered by ind_res_st
)
Note: why index a column with random content, just adds overhead without benefit?! maybe it could even be dropped!?
-
ind_res_st
(this would require to also swap res_host_state
to have one index still covering server_state
! That'd be ok as hostid
is redundant anyway and covered by res_hostid_id
)
Important: ind_res_st
is used in an optimizer hint (FORCE INDEX
) in boinc_db.cpp
, is this really necessary (bad design, error-prone)?
+--------+------------+-------------------------+--------------+-------------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+--------+------------+-------------------------+--------------+-------------------+-----------+-------------+----------+--------+------+------------+---------+
| result | 0 | PRIMARY | 1 | id | A | 1454559 | NULL | NULL | | BTREE | |
| result | 0 | name | 1 | name | A | 1454559 | NULL | NULL | | BTREE | |
| result | 1 | res_app_state | 1 | appid | A | 9 | NULL | NULL | | BTREE | |
| result | 1 | res_app_state | 2 | server_state | A | 9 | NULL | NULL | | BTREE | |
| result | 1 | res_filedel | 1 | file_delete_state | A | 9 | NULL | NULL | | BTREE | |
| result | 1 | res_userid_id | 1 | userid | A | 69264 | NULL | NULL | | BTREE | |
| result | 1 | res_userid_id | 2 | id | A | 1454559 | NULL | NULL | | BTREE | |
| result | 1 | res_val_userid | 1 | validate_state | A | 9 | NULL | NULL | | BTREE | |
| result | 1 | res_val_userid | 2 | userid | A | 132232 | NULL | NULL | | BTREE | |
| result | 1 | res_hostid_id | 1 | hostid | A | 161617 | NULL | NULL | | BTREE | |
| result | 1 | res_hostid_id | 2 | id | A | 1454559 | NULL | NULL | | BTREE | |
| result | 1 | res_wu_user | 1 | workunitid | A | 727279 | NULL | NULL | | BTREE | |
| result | 1 | res_wu_user | 2 | userid | A | 1454559 | NULL | NULL | | BTREE | |
| result | 1 | ind_res_st | 1 | server_state | A | 9 | NULL | NULL | | BTREE | |
| result | 1 | ind_res_st | 2 | priority | A | 9 | NULL | NULL | | BTREE | |
| result | 1 | res_create_server_state | 1 | create_time | A | 727279 | NULL | NULL | | BTREE | |
| result | 1 | res_create_server_state | 2 | server_state | A | 727279 | NULL | NULL | | BTREE | |
| result | 1 | res_st_random | 1 | server_state | A | 9 | NULL | NULL | | BTREE | |
| result | 1 | res_st_random | 2 | random | A | 1454559 | NULL | NULL | | BTREE | |
| result | 1 | res_host_state | 1 | hostid | A | 96970 | NULL | NULL | | BTREE | |
| result | 1 | res_host_state | 2 | server_state | A | 161617 | NULL | NULL | | BTREE | |
| result | 1 | res_outcome | 1 | outcome | A | 9 | NULL | NULL | | BTREE | |
| result | 1 | res_client_state | 1 | client_state | A | 9 | NULL | NULL | | BTREE | |
| result | 1 | res_recv_time | 1 | received_time | A | 727279 | NULL | NULL | | BTREE | |
+--------+------------+-------------------------+--------------+-------------------+-----------+-------------+----------+--------+------+------------+---------+
Table: workunit
Original index structure
+----------+------------+---------------+--------------+--------------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+----------+------------+---------------+--------------+--------------------+-----------+-------------+----------+--------+------+------------+---------+
| workunit | 0 | PRIMARY | 1 | id | A | 706229 | NULL | NULL | | BTREE | |
| workunit | 0 | name | 1 | name | A | 706229 | NULL | NULL | | BTREE | |
| workunit | 1 | wu_val | 1 | appid | A | 19 | NULL | NULL | | BTREE | |
| workunit | 1 | wu_val | 2 | need_validate | A | 19 | NULL | NULL | | BTREE | |
| workunit | 1 | wu_timeout | 1 | transition_time | A | 353114 | NULL | NULL | | BTREE | |
| workunit | 1 | wu_assim | 1 | appid | A | 19 | NULL | NULL | | BTREE | |
| workunit | 1 | wu_assim | 2 | assimilate_state | A | 19 | NULL | NULL | | BTREE | |
| workunit | 1 | wu_filedel | 1 | file_delete_state | A | 19 | NULL | NULL | | BTREE | |
+----------+------------+---------------+--------------+--------------------+-----------+-------------+----------+--------+------+------------+---------+
Optimized index structure
Added three indexes:
-
canonical_resultid
-
mod_time
-
app_id
(required for following changes of index orders, small cardinality)
Changed index column order (
app_id
redundant, now covered by
wu_appid
):
-
wu_val(appid,need_validate)
wu_val(need_validate,appid)
("free" additional index for need_validate
)
-
wu_assim(appid,assimilate_state)
wu_assim(assimilate_state, appid)
("free" additional index for assimilate_state
)
+----------+------------+---------------+--------------+--------------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+----------+------------+---------------+--------------+--------------------+-----------+-------------+----------+--------+------+------------+---------+
| workunit | 0 | PRIMARY | 1 | id | A | 706229 | NULL | NULL | | BTREE | |
| workunit | 0 | name | 1 | name | A | 706229 | NULL | NULL | | BTREE | |
| workunit | 1 | wu_val | 1 | need_validate | A | 19 | NULL | NULL | | BTREE | |
| workunit | 1 | wu_val | 2 | appid | A | 19 | NULL | NULL | | BTREE | |
| workunit | 1 | wu_timeout | 1 | transition_time | A | 353114 | NULL | NULL | | BTREE | |
| workunit | 1 | wu_assim | 1 | assimilate_state | A | 19 | NULL | NULL | | BTREE | |
| workunit | 1 | wu_assim | 2 | appid | A | 19 | NULL | NULL | | BTREE | |
| workunit | 1 | wu_filedel | 1 | file_delete_state | A | 19 | NULL | NULL | | BTREE | |
| workunit | 1 | wu_cano_resid | 1 | canonical_resultid | A | 706229 | NULL | NULL | | BTREE | |
| workunit | 1 | wu_modtime | 1 | mod_time | A | 706229 | NULL | NULL | | BTREE | |
| workunit | 1 | wu_appid | 1 | appid | A | 16 | NULL | NULL | | BTREE | |
+----------+------------+---------------+--------------+--------------------+-----------+-------------+----------+--------+------+------------+---------+
Table: host
Original index structure
+-------+------------+------------------+--------------+---------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-------+------------+------------------+--------------+---------------+-----------+-------------+----------+--------+------+------------+---------+
| host | 0 | PRIMARY | 1 | id | A | 1739946 | NULL | NULL | | BTREE | |
| host | 1 | host_user | 1 | userid | A | 869973 | NULL | NULL | | BTREE | |
| host | 1 | host_avg | 1 | expavg_credit | A | 1739946 | NULL | NULL | | BTREE | |
| host | 1 | host_tot | 1 | total_credit | A | 1739946 | NULL | NULL | | BTREE | |
+-------+------------+------------------+--------------+---------------+-----------+-------------+----------+--------+------+------------+---------+
Optimized index structure
Added three indexes:
-
rpc_time
-
create_time
-
p_fpops
Modified index (added second column, helps scheduler):
-
host_user(userid)
host_user_cpid(userid, host_cpid)
+-------+------------+------------------+--------------+---------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-------+------------+------------------+--------------+---------------+-----------+-------------+----------+--------+------+------------+---------+
| host | 0 | PRIMARY | 1 | id | A | 2263839 | NULL | NULL | | BTREE | |
| host | 1 | host_avg | 1 | expavg_credit | A | 1131919 | NULL | NULL | | BTREE | |
| host | 1 | host_tot | 1 | total_credit | A | 2263839 | NULL | NULL | | BTREE | |
| host | 1 | host_rpc_time | 1 | rpc_time | A | 2263839 | NULL | NULL | | BTREE | |
| host | 1 | host_create_time | 1 | create_time | A | 2263839 | NULL | NULL | | BTREE | |
| host | 1 | host_fpops | 1 | p_fpops | A | 2263839 | NULL | NULL | | BTREE | |
| host | 1 | host_user_cpid | 1 | userid | A | 2263839 | NULL | NULL | | BTREE | |
| host | 1 | host_user_cpid | 2 | host_cpid | A | 2263839 | NULL | NULL | YES | BTREE | |
+-------+------------+------------------+--------------+---------------+-----------+-------------+----------+--------+------+------------+---------+
Table: user
Original index structure
+-------+------------+---------------+--------------+---------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-------+------------+---------------+--------------+---------------+-----------+-------------+----------+--------+------+------------+---------+
| user | 0 | PRIMARY | 1 | id | A | 423498 | NULL | NULL | | BTREE | |
| user | 0 | email_addr | 1 | email_addr | A | 423498 | NULL | NULL | | BTREE | |
| user | 0 | authenticator | 1 | authenticator | A | 423498 | NULL | NULL | YES | BTREE | |
| user | 1 | ind_tid | 1 | teamid | A | 138 | NULL | NULL | | BTREE | |
| user | 1 | user_name | 1 | name | A | 423498 | NULL | NULL | YES | BTREE | |
| user | 1 | user_tot | 1 | total_credit | A | 423498 | NULL | NULL | | BTREE | |
| user | 1 | user_avg | 1 | expavg_credit | A | 423498 | NULL | NULL | | BTREE | |
+-------+------------+---------------+--------------+---------------+-----------+-------------+----------+--------+------+------------+---------+
Optimized index structure
No changes so far...