使用SQL高效查詢節點和邊-有解無憂

我有 2 個存盤網路資訊的 SQL Server 表，EF 架構是：

public partial class edge
{
    public long edge_id { get; set; }
    public string source { get; set; }
    public string target { get; set; }
    public Nullable<System.DateTime> edgedate { get; set; }
}

public partial class node
{
    public string node_id { get; set; }
    public string name { get; set; }
    public string address { get; set; }
}

我正在從 UI 傳遞特定于邊緣和節點的過濾器以構建到 SQL 查詢中，如下所示：

select * 
from [dbo].[Nodes] 
where name = 'John Doe' 
   or address = '123 Fake Street'  

select * 
from [dbo].[Edges] 
where edgedate >= '01/12/2020 00:00:00' 
  and edgedate <= '01/12/2021 23:59:59

然而，這些查詢必須考慮整個網路，即必須將節點過濾器應用于邊緣，反之亦然 -

-- nodes example with edge filters applied
select * 
from [dbo].[Nodes] 
where name = 'John Doe' 
   or address = '123 Fake Street'  
   and node_id in (select source 
                   from EDGESTEMP 
                   where edgedate >= '01/12/2020 00:00:00' 
                     and edgedate <= '01/12/2021 23:59:59'
                   union 
                   select target 
                   from EDGESTEMP 
                   where edgedate >= '01/12/2020 00:00:00' 
                     and edgedate <= '01/12/2021 23:59:59')

這在小規模網路上運行良好，但是如果我處理的網路有 100 萬條邊和 50 萬個節點，則運行這些查詢的性能會受到影響，因為檢查每個實體中的另一個表時的in陳述句。

我已經在查詢的所有附屬列上添加了索引，但是需要知道是否有更有效的方法來做到這一點？

附加資訊

查詢計劃 -這里

Clustered indexes are set on each primary key i.e. node_id and edge_id and non clustered set on the rest e.g. -

CREATE NONCLUSTERED INDEX [NonClusteredIndex-20211017-194859] ON [dbo].[NODES]
(
    [name] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
GO

uj5u.com熱心網友回復：

您可以使用exists來進行更有效的查詢，而不是使用union。

此外，您不應該使用不明確的日期文字。我不確定這些日期是 1 月 12 日還是 12 月 1 日。另外，對于日期時間范圍查詢，您不應該使用 >= 和 <=，而是使用 >= 和 <。您可以在代碼中看到這些調整：

select * 
from [dbo].[Nodes] n
where (name = 'John Doe' 
   or address = '123 Fake Street')  
   and exists (select * 
               from EDGESTEMP e
                   where (n.node_id = e.source or n.node_id = e.target)
                         and e.edgedate >= '20200112' 
                         and e.edgedate < '20210113');

順便說一句，我假設您已經在源、目標和邊緣日期上建立了索引。如果沒有創建它們。

uj5u.com熱心網友回復：

首先，您的查詢似乎有一個邏輯錯誤：or.

其次，UNION ALL通常比好UNION，盡管在半連接中通常無關緊要，例如IN或EXISTS

select n.* 
from [dbo].[Nodes] n
where (n.name = 'John Doe' 
       or n.address = '123 Fake Street')
 and node_id in (
    select source
    from EDGES 
    where edgedate >= '01/12/2020 00:00:00' 
      and edgedate <= '01/12/2021 23:59:59'
    union all
    select target 
    from EDGES
    where edgedate >= '01/12/2020 00:00:00' 
      and edgedate <= '01/12/2021 23:59:59'
);

最后，對于這個查詢，你可能應該有以下索引

NODES (name) INCLUDE (node_id)

NODES (address) INCLUDE (node_id)

EDGES (edgedate) INCLUDE (source, target)

這種or情況可能仍會導致問題，因為您可能仍會在上進行索引掃描NODES。如果是這樣，您可能需要重寫查詢以強制使用索引聯合。

select n.* 
from (
    select *
      from [dbo].[Nodes] n
      where n.name = 'John Doe'
    union
    select *
      from [dbo].[Nodes] n
      where n.address = '123 Fake Street'
) n
where node_id in (
    select source
    from EDGES 
    where edgedate >= '01/12/2020 00:00:00' 
      and edgedate <= '01/12/2021 23:59:59'
    union all
    select target 
    from EDGES
    where edgedate >= '01/12/2020 00:00:00' 
      and edgedate <= '01/12/2021 23:59:59'
);

轉載請註明出處，本文鏈接：https://www.uj5u.com/qiye/328794.html

標籤：sql-server nodes edges

上一篇：如何分離SQL中WHERE子句的重疊結果？

下一篇：SQLCount條目，每個日期都有值