Query becomes slow when two conditions are applied at the same time

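I have a users table with a bio field, and it has an n:n relationship with itself through a followers table, so every user U can follow many other users. (In the schema below these are twitter_user.description and twitter_user_follower.) My user search queries are very slow.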

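All queries fetch the first 20 search results (limit 20).
Searching for users with "founder" in their bio takes 0.3 seconds.
Searching for users who follow X takes 0.03 seconds.
Searching for users who have "founder" in their bio and follow X takes 118 seconds.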

The query with both filters:

select distinct `twitter_user`.`id`
from `twitter_user`
         join `twitter_user_follower`
              on (
                          `twitter_user_follower`.`follower_twitter_user_id` =
                          `twitter_user`.`id`
                      and `twitter_user_follower`.`twitter_user_id` = 4899565692
                      and `twitter_user_follower`.`follower_download_id` = 7064
                  )
where MATCH(twitter_user.description) AGAINST('founder')
limit 20 offset 0

Table definitions:

CREATE TABLE `twitter_user` (
  `id` bigint NOT NULL,
  `name` varchar(128) NOT NULL,
  `email` varchar(128) DEFAULT NULL,
  `screen_name` varchar(128) DEFAULT NULL,
  `location` varchar(256) DEFAULT NULL,
  `description` varchar(512) DEFAULT NULL,
  `url` varchar(256) DEFAULT NULL,
  `is_protected` bit(1) DEFAULT NULL,
  `followers_count` int DEFAULT NULL,
  `is_verified` bit(1) DEFAULT NULL,
  `friends_count` int DEFAULT NULL,
  `created_at` bigint DEFAULT NULL,
  `favourites_count` int DEFAULT NULL,
  `utc_offset` int DEFAULT NULL,
  `time_zone` varchar(128) DEFAULT NULL,
  `statuses_count` int DEFAULT NULL,
  `profile_image_url` varchar(512) DEFAULT NULL,
  `internal_json` json DEFAULT NULL,
  `row_timestamp` timestamp NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `updated_at` timestamp NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`),
  KEY `twitter_user_username_index` (`screen_name`),
  KEY `twitter_user_ts` (`row_timestamp`),
  FULLTEXT KEY `twitter_user_description_ft_index` (`description`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;

CREATE TABLE `twitter_user_follower` (
  `id` bigint NOT NULL AUTO_INCREMENT,
  `twitter_user_id` bigint NOT NULL,
  `follower_twitter_user_id` bigint NOT NULL,
  `created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `follower_download_id` bigint DEFAULT NULL,
  `updated_at` timestamp NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`),
  KEY `twitter_user_follower_twitter_user_id_index` (`twitter_user_id`),
  KEY `twitter_user_follower_follower_download_id_index` (`follower_download_id`),
  KEY `tuf_twitter_user_follower_download_key` (`twitter_user_id`,`follower_download_id`,`follower_twitter_user_id`)
) ENGINE=InnoDB AUTO_INCREMENT=68494675 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;

EXPLAIN output:

id	select_type	table	partitions	type	possible_keys	key	key_len	ref	rows	filtered	Extra
1	SIMPLE	twitter_user	NULL	fulltext	PRIMARY,twitter_user_username_index,twitter_user_ts,twitter_user_description_ft_index	twitter_user_description_ft_index	0	const	1	100.00	Using where; Ft_hints: no_ranking; Using temporary
1	SIMPLE	twitter_user_follower	NULL	ref	twitter_user_follower_twitter_user_id_index,twitter_user_follower_follower_download_id_index,tuf_twitter_user_follower_download_key	tuf_twitter_user_follower_download_key	25	const,const,si_data_db.twitter_user.id	1	100.00	Using index; Distinct

Tree output:

-> Limit: 20 row(s)  (cost=4.77..4.77 rows=1)
-> Table scan on <temporary>  (cost=2.51..2.51 rows=1)
    -> Temporary table with deduplication  (cost=4.77..4.77 rows=1)
        -> Limit table size: 20 unique row(s)
            -> Nested loop inner join  (cost=2.16 rows=1)
                -> Filter: (match twitter_user.`description` against (''founder''))  (cost=1.06 rows=1)
                    -> Full-text index search on twitter_user using twitter_user_description_ft_index (description=''founder'')  (cost=1.06 rows=1)
                -> Limit: 1 row(s)  (cost=1.10 rows=1)
                    -> Covering index lookup on twitter_user_follower using tuf_twitter_user_follower_download_key (twitter_user_id=4899565692, follower_download_id=7064, follower_twitter_user_id=twitter_user.id)  (cost=1.10 rows=1)

This query is still slow:

SELECT `follower`.`follower_twitter_user_id`
FROM (
         SELECT `follower_twitter_user_id`
         FROM `twitter_user_follower`
         WHERE `twitter_user_id` = 4899565692
           AND `follower_download_id` = 7440
     ) AS follower
         JOIN `twitter_user` ON `follower`.`follower_twitter_user_id` =  `twitter_user`.`id`
WHERE MATCH(twitter_user.description) AGAINST(' founder' IN BOOLEAN MODE)
limit 20 offset 0;

EXPLAIN output:

id	select_type	table	partitions	type	possible_keys	key	key_len	ref	rows	filtered	Extra
1	SIMPLE	twitter_user	NULL	fulltext	PRIMARY,twitter_user_description_ft_index	twitter_user_description_ft_index	0	const	1	100.00	Using where; Ft_hints: no_ranking
1	SIMPLE	twitter_user_follower	NULL	ref	twitter_user_follower_twitter_user_id_index,twitter_user_follower_follower_download_id_index,tuf_twitter_user_follower_download_key	tuf_twitter_user_follower_download_key	25	const,const,si_data_db.twitter_user.id	1	100.00	Using index

EXPLAIN ANALYZE output:

-> Limit: 20 row(s)  (cost=2.16 rows=1) (actual time=3779.933..91032.297 rows=20 loops=1)
    -> Nested loop inner join  (cost=2.16 rows=1) (actual time=3779.932..91032.285 rows=20 loops=1)
        -> Filter: (match twitter_user.`description` against (' founder' in boolean mode))  (cost=1.06 rows=1) (actual time=94.166..90001.280 rows=198818 loops=1)
            -> Full-text index search on twitter_user using twitter_user_description_ft_index (description=' founder')  (cost=1.06 rows=1) (actual time=94.163..89909.371 rows=198818 loops=1)
        -> Covering index lookup on twitter_user_follower using tuf_twitter_user_follower_download_key (twitter_user_id=4899565692, follower_download_id=7440, follower_twitter_user_id=twitter_user.id)  (cost=1.10 rows=1) (actual time=0.005..0.005 rows=0 loops=198818)

The users table is 125 GB and the followers table is 5 GB on disk. If I rewrite the query as two joined subselects, it runs in 45 seconds:

select t1.id from
(select follower_twitter_user_id as id from `twitter_user_follower`
 where (
                   `twitter_user_follower`.`twitter_user_id` = 4899565692
               and `twitter_user_follower`.`follower_download_id` = 8039
           )) t1
inner join
(
    select `twitter_user`.`id`
    from `twitter_user` where MATCH(twitter_user.description) AGAINST(' create' IN BOOLEAN MODE)
) t2 on t1.id = t2.id
limit 20 offset 0

EXPLAIN output:

 -> Limit: 20 row(s)  (cost=2.18 rows=1)
    -> Nested loop inner join  (cost=2.18 rows=1)
        -> Filter: (match twitter_user.`description` against (' create' in boolean mode))  (cost=1.08 rows=1)
            -> Full-text index search on twitter_user using twitter_user_description_ft_index (description=' create')  (cost=1.08 rows=1)
        -> Covering index lookup on twitter_user_follower using tuf_twitter_user_follower_download_key (twitter_user_id=4899565692, follower_download_id=8039, follower_twitter_user_id=twitter_user.id)  (cost=1.10 rows=1)

Why does it take 45 seconds to run?

A uj5u.com user replied:

Try the following. Change

MATCH(twitter_user.description) AGAINST('founder')

to

MATCH(twitter_user.description) AGAINST(' founder' IN BOOLEAN MODE)

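Also, the DISTINCT may not be necessary.

IODKU?

Buried in the comments, I see a DELETE followed by an INSERT, which causes a lot of churn in part of the table.

InnoDB's FULLTEXT may not be efficient in this situation.
If most rows are unchanged, then delete-plus-insert is inefficient and causes more churn than necessary.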

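See whether INSERT ... ON DUPLICATE KEY UPDATE ... can be used in place of the delete-plus-insert. If most rows are unchanged, this will probably be faster and may have less impact on things like the FULLTEXT index.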
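As a rough sketch of that suggestion, an upsert against the twitter_user table from the question could look like the following. The literal values are made up for illustration and only a few of the table's columns are shown. The row alias (AS new) assumes MySQL 8.0.19 or later; on older versions VALUES(col) plays the same role.

```sql
-- Hypothetical values; only a subset of twitter_user's columns is shown.
INSERT INTO `twitter_user`
       (`id`, `name`, `screen_name`, `description`, `followers_count`)
VALUES (12345, 'Example User', 'example_user', 'founder of something', 42) AS new
ON DUPLICATE KEY UPDATE
    -- Runs only when a row with this PRIMARY KEY already exists.
    `name`            = new.`name`,
    `screen_name`     = new.`screen_name`,
    `description`     = new.`description`,
    `followers_count` = new.`followers_count`;
```

Whether this actually reduces FULLTEXT maintenance for unchanged rows is worth measuring rather than assuming.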

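If that DELETE really does remove some rows, then IODKU (upsert) alone is not sufficient. A second pass using something like INSERT ... SELECT ... LEFT JOIN may be the way to insert the "new" rows. (I have discussed this elsewhere in a different context, normalization; see SQL #1 there.)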
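A minimal sketch of such a second pass, assuming the freshly downloaded rows are first loaded into a staging table. The name twitter_user_follower_staging and its columns are hypothetical; nothing in the question says such a table exists.

```sql
-- twitter_user_follower_staging is a hypothetical table holding the latest download.
INSERT INTO `twitter_user_follower`
       (`twitter_user_id`, `follower_twitter_user_id`, `follower_download_id`)
SELECT s.`twitter_user_id`, s.`follower_twitter_user_id`, s.`follower_download_id`
FROM `twitter_user_follower_staging` AS s
LEFT JOIN `twitter_user_follower` AS f
       ON  f.`twitter_user_id`          = s.`twitter_user_id`
       AND f.`follower_twitter_user_id` = s.`follower_twitter_user_id`
WHERE f.`id` IS NULL;  -- keep only staging rows with no match in the target
```

The LEFT JOIN ... IS NULL pattern is an anti-join: rows already present in twitter_user_follower are skipped, so only genuinely new follower pairs are written.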

Run OPTIMIZE TABLE periodically (weekly?). But do keep some timings to see whether this step really helps.
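For illustration, the maintenance step might be run in either of the two forms below. The second form relies on the innodb_optimize_fulltext_only server variable, which makes OPTIMIZE TABLE process only the FULLTEXT index data instead of rebuilding the whole table. Whether that is enough for this workload is an assumption to verify with timings.

```sql
-- Full rebuild of the table and its indexes (can take a long time on a 125 GB table).
OPTIMIZE TABLE `twitter_user`;

-- Alternative: only process pending FULLTEXT index changes, without a full rebuild.
-- Changing this global variable needs administrative privileges.
SET GLOBAL innodb_optimize_fulltext_only = ON;
OPTIMIZE TABLE `twitter_user`;
SET GLOBAL innodb_optimize_fulltext_only = OFF;
```

In the FULLTEXT-only mode each run handles a bounded number of words (see innodb_ft_num_word_optimize), so several runs may be needed before the index is fully cleaned up.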

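2 steps

First, I am still unclear about the data you receive every hour. Is it information about just one user? Does it include rows to be deleted, with some indication that they are to be deleted rather than updated? Etc.

If it is a single user...

DELETE only the rows that need deleting. This involves a multi-table DELETE with a LEFT JOIN to see what is missing.
INSERT ... SELECT ... LEFT JOIN ... to insert new rows or update existing ones.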
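The first of these two steps could be sketched as follows for a single user. It assumes the latest download has been loaded into a staging table; the name twitter_user_follower_staging is hypothetical, and the user id is simply the one used in the question.

```sql
-- Remove follower rows for one user that are absent from the latest download.
-- twitter_user_follower_staging is a hypothetical staging table.
DELETE f
FROM `twitter_user_follower` AS f
LEFT JOIN `twitter_user_follower_staging` AS s
       ON  s.`twitter_user_id`          = f.`twitter_user_id`
       AND s.`follower_twitter_user_id` = f.`follower_twitter_user_id`
WHERE f.`twitter_user_id` = 4899565692
  AND s.`twitter_user_id` IS NULL;  -- no matching staging row: the follow is gone
```

Rows that still appear in the download are left untouched, which is the point of the two-step approach: only real changes reach the table and its indexes.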

A uj5u.com user replied:

Can you try this and post the EXPLAIN output for us?

SELECT `follower`.`follower_twitter_user_id`
FROM (
  SELECT `follower_twitter_user_id`
  FROM `twitter_user_follower`
  WHERE `twitter_user_id` = 4899565692
    AND `follower_download_id` = 7064
) AS follower
JOIN `twitter_user` ON `follower`.`follower_twitter_user_id` =  `twitter_user`.`id`
WHERE MATCH(twitter_user.description) AGAINST(' founder' IN BOOLEAN MODE)
limit 20 offset 0;

A uj5u.com user replied:

An option to try, to minimize overhead and maximize any short-circuiting:

SELECT
   `twitter_user`.`id`
FROM
  `twitter_user`
WHERE
  MATCH(twitter_user.description) AGAINST (' founder' IN BOOLEAN MODE)
  AND
  EXISTS (
    SELECT
      *
    FROM
      `twitter_user_follower`
    WHERE
          `follower_twitter_user_id` = `twitter_user`.`id`
      AND `twitter_user_id` = 4899565692
      AND `follower_download_id` = 7064
  )
LIMIT 20
OFFSET 0

Please credit the source when reposting. Link to this article: https://www.uj5u.com/shujuku/514358.html

Tags: mysql, sql, performance, query-optimization
