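The Linux Kernel Interrupt Subsystem
Table of Contents
- The Linux Kernel Interrupt Subsystem
- 1. Introduction to GICv3
- 1.1 The four interrupt types defined by GICv3
- 1.2 Interrupt IDs
- 1.3 The GICv3 programming model
- 1.4 GICv3 interrupt affinity routing
- 2. Device tree description of the GIC
- 3. ARM GICv3 code analysis
- 3.1 Declaring the irq chip driver
- 3.2 The gic_of_init flow
- 3.3 The gic_init_bases flow
- 4. irq_domain and interrupt mapping
- 4.1 irq_domain
- 4.2 The complete mapping process
- 5. ARM64 interrupt handling
- 5.1 Low-level ARM64 interrupt handling (GICv3 and assembly)
- 5.1.1 The exception vector table
- 5.1.2 Low-level interrupt handling in the Linux kernel (assembly)
- 5.1.2.1 Saving the interrupt context
- 5.1.2.2 Restoring the interrupt context
- 5.2 High-level ARM64 interrupt handling (C)
- 5.2.1 el1_irq
- 5.2.2 gic_handle_irq
- 6. Registering interrupt handlers
- 7. Top half and bottom half (TODO)
- 8. ITS overview and code analysis
- 8.1 ITS overview
- 8.2 The ITS table
- 8.3 The ITS Command
- 8.4 ITS code analysis
- 8.4.1 ITS data structures
- 8.4.2 ITS initialization
- 8.4.2.1 The ITS initialization function its_init
- 8.4.2.2 its_cpu_init
- 8.4.3 Reporting ITS interrupts
- References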
Kernel version: 5.10
Interrupt controller: GICv3
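1. Introduction to GICv3
1.1 The four interrupt types defined by GICv3:
SPI (Shared Peripheral Interrupt): generated by a peripheral. It can be routed to any CPU core.
PPI (Private Peripheral Interrupt): generated by a peripheral. It is private to one specific core.
SGI (Software Generated Interrupt): generated by software. One core typically uses it to signal other cores.
LPI (Locality-specific Peripheral Interrupt): an interrupt type introduced in GICv3. It is message-based: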
In particular, LPIs are always message-based interrupts, and their configuration is held in tables in memory rather than registers.
NOTE: LPIs are only supported when GICD_CTLR.ARE_NS==1.
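The sketch below classifies an INTID by its range, which makes the four types concrete. The ranges come from the GICv3/v4 architecture: SGI 0-15, PPI 16-31, SPI 32-1019, special 1020-1023, extended PPI 1056-1119, extended SPI 4096-5119, and LPI from 8192 upward. Only GICv3.1 and later implement the extended ranges. The enum and function names are hypothetical and exist only for this illustration.

```c
#include <stdint.h>

/* INTID classes defined by the GICv3/v4 architecture (hypothetical enum). */
enum gic_intid_class {
	GIC_CLASS_SGI,      /* 0-15: software generated, per PE */
	GIC_CLASS_PPI,      /* 16-31: private peripheral, per PE */
	GIC_CLASS_SPI,      /* 32-1019: shared peripheral */
	GIC_CLASS_SPECIAL,  /* 1020-1023: special INTIDs, 1023 = spurious */
	GIC_CLASS_EPPI,     /* 1056-1119: extended PPIs (GICv3.1+) */
	GIC_CLASS_ESPI,     /* 4096-5119: extended SPIs (GICv3.1+) */
	GIC_CLASS_LPI,      /* 8192 and above: message-based LPIs */
	GIC_CLASS_RESERVED, /* any other value */
};

static enum gic_intid_class gic_classify_intid(uint32_t intid)
{
	if (intid < 16)
		return GIC_CLASS_SGI;
	if (intid < 32)
		return GIC_CLASS_PPI;
	if (intid < 1020)
		return GIC_CLASS_SPI;
	if (intid < 1024)
		return GIC_CLASS_SPECIAL;
	if (intid >= 1056 && intid < 1120)
		return GIC_CLASS_EPPI;
	if (intid >= 4096 && intid < 5120)
		return GIC_CLASS_ESPI;
	if (intid >= 8192)
		return GIC_CLASS_LPI;
	return GIC_CLASS_RESERVED;
}
```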
1.2 Interrupt IDs

1.3 The GICv3 programming model

The register programming interfaces are described below (from GICv3_Software_Overview_Official_Release_B):
Distributor (GICD_*)
The Distributor registers are memory-mapped, and contain global settings that affect all PEs
connected to the interrupt controller. The Distributor provides a programming interface for:
- Interrupt prioritization and distribution of SPIs.
- Enabling and disabling SPIs.
- Setting the priority level of each SPI.
- Routing information for each SPI.
- Setting each SPI to be level-sensitive or edge-triggered.
- Generating message-based SPIs.
- Controlling the active and pending state of SPIs.
- Controls to determine the programmers' model that is used in each Security state (affinity routing or legacy).
Redistributors (GICR_*)
For each connected PE there is a Redistributor. The Redistributors provide a programming
interface for:
- Enabling and disabling SGIs and PPIs.
- Setting the priority level of SGIs and PPIs.
- Setting each PPI to be level-sensitive or edge-triggered.
- Assigning each SGI and PPI to an interrupt group.
- Controlling the state of SGIs and PPIs.
- Base address control for the data structures in memory that support the associated interrupt properties and pending state for LPIs.
- Power management support for the connected PE.
CPU interfaces (ICC_*_ELn), not memory-mapped
Each Redistributor is connected to a CPU interface. The CPU interface provides a programming
interface for:
- General control and configuration to enable interrupt handling.
- Acknowledging an interrupt.
- Performing a priority drop and deactivation of interrupts.
- Setting an interrupt priority mask for the PE.
- Defining the preemption policy for the PE.
- Determining the highest priority pending interrupt for the PE.
In GICv3 the CPU Interface registers are accessed as System registers (ICC_*_ELn).
Software must enable the System register interface before using these registers. This is controlled by the SRE bit in the ICC_SRE_ELn registers, where "n" specifies the Exception level (EL1-EL3). (In GICv3, therefore, software accesses the CPU interface registers as system registers inside the CPU, not through memory-mapped I/O.)
NOTE: In GICv1 and GICv2 the CPU Interface registers were memory mapped (GICC_*).
NOTE: Software can check for GIC System register support by reading ID_AA64PFR0_EL1 for the PE, see Arm® Architecture Reference Manual, ARMv8, for ARMv8-A architecture profile for details.
1.4 GICv3 interrupt affinity routing

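GICv3 identifies a specific CPU core (PE) hierarchically, in a way similar to an IPv4 address. The sequence <affinity level 3>.<affinity level 2>.<affinity level 1>.<affinity level 0> forms the routing address of a PE.
Software can read the affinity value of each core from the MPIDR_EL1 register. Each affinity level is 8 bits wide.
Writing a core's affinity value into an SPI's routing register (GICD_IROUTER<n>) routes that interrupt to that core.
The SoC defines the meaning of the four affinity levels.
For example, affinity 3 might be the socket ID, affinity 2 the cluster ID, affinity 1 the core ID, and affinity 0 the thread ID.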
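Below is a minimal sketch of the affinity packing. It assumes the architectural MPIDR_EL1 layout, with Aff0-Aff2 in bits [23:0] and Aff3 in bits [39:32]. The logic is modeled on gic_mpidr_to_affinity() in irq-gic-v3.c. The helper names are hypothetical, and the code runs in user space on plain integers.

```c
#include <assert.h>
#include <stdint.h>

/* Extract affinity level 0..3 from an MPIDR_EL1 value.
 * Aff0 [7:0], Aff1 [15:8], Aff2 [23:16], Aff3 [39:32]; bits 31 (RES1),
 * 30 (U) and 24 (MT) are not affinity fields and are never returned. */
static unsigned int mpidr_aff(uint64_t mpidr, unsigned int level)
{
	assert(level <= 3);
	unsigned int shift = (level == 3) ? 32 : 8 * level;
	return (unsigned int)((mpidr >> shift) & 0xff);
}

/* Pack the affinities into the GICD_IROUTER<n> layout (Aff3 also at
 * [39:32]) with bit 31 (IRM) left clear, so the SPI targets exactly
 * this PE. */
static uint64_t mpidr_to_irouter(uint64_t mpidr)
{
	return ((uint64_t)mpidr_aff(mpidr, 3) << 32) |
	       ((uint64_t)mpidr_aff(mpidr, 2) << 16) |
	       ((uint64_t)mpidr_aff(mpidr, 1) << 8) |
	       (uint64_t)mpidr_aff(mpidr, 0);
}
```

For example, MPIDR_EL1 = 0x80000100 describes core 1 of cluster 0 and packs to the routing value 0x100. Bit 31 (RES1) is dropped because it is not an affinity field.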
2. Device tree description of the GIC
(1) The interrupt-controller binding in the RK3399 SDK:
rk3399_kernel/kernel/Documentation/devicetree/bindings/interrupt-controller/arm,gic-v3.txt
(2) The interrupt-controller binding in the mainline kernel:
kernel/Documentation/devicetree/bindings/interrupt-controller/arm,gic-v3.yaml
An example GICv3 node:
gic: interrupt-controller@2c010000 {
	compatible = "arm,gic-v3";
	#interrupt-cells = <4>;
	#address-cells = <2>;
	#size-cells = <2>;
	ranges;
	interrupt-controller;
	redistributor-stride = <0x0 0x40000>;	// 256kB stride
	#redistributor-regions = <2>;
	reg = <0x0 0x2c010000 0 0x10000>,	// GICD
	      <0x0 0x2d000000 0 0x800000>,	// GICR 1: CPUs 0-31
	      <0x0 0x2e000000 0 0x800000>,	// GICR 2: CPUs 32-63
	      <0x0 0x2c040000 0 0x2000>,	// GICC
	      <0x0 0x2c060000 0 0x2000>,	// GICH
	      <0x0 0x2c080000 0 0x2000>;	// GICV
	interrupts = <1 9 4 0>;

	gic-its@2c200000 {
		compatible = "arm,gic-v3-its";
		msi-controller;
		#msi-cells = <1>;
		reg = <0x0 0x2c200000 0 0x20000>;
	};

	gic-its@2c400000 {
		compatible = "arm,gic-v3-its";
		msi-controller;
		#msi-cells = <1>;
		reg = <0x0 0x2c400000 0 0x20000>;
	};
};
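- compatible: matches the node to the GICv3 driver.
- #interrupt-cells: a property of interrupt-controller nodes. It declares how many cells make up an interrupt specifier (the interrupts property of devices that use this controller).
- #address-cells, #size-cells, ranges: used for addressing. #address-cells is the number of cells in the address part of each reg entry, and #size-cells is the number of cells in the length part.
- interrupt-controller: marks the node as an interrupt controller.
- redistributor-stride: the size of one GICR, i.e. the distance between consecutive redistributors in a region.
- #redistributor-regions: the number of GICR regions.
- **reg:** the physical base addresses of the GIC, in the order GICD, GICR regions, GICC, GICH, GICV.
- interrupts: the cells are, in order, the interrupt type, the interrupt number, the trigger flags, and the PPI affinity (a partition phandle, 0 if unused). Any further cells are reserved.
The first cell is 0 for an SPI and 1 for a PPI. The second cell is the interrupt number (mind the valid SPI/PPI ranges). The third cell is 1 for an edge-triggered interrupt (rising edge) and 4 for a level-triggered interrupt (active high).
- msi-controller: marks the node as an MSI controller.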
3. ARM GICv3 code analysis
Based on kernel version 5.10.
Source path:
kernel/drivers/irqchip/irq-gic-v3.c
GICv3 initialization flow:
3.1 Declaring the irq chip driver
IRQCHIP_DECLARE(gic_v3, "arm,gic-v3", gic_of_init);
IRQCHIP_DECLARE stores the corresponding entry in the __irqchip_of_table section:
#define IRQCHIP_DECLARE(name, compat, fn) OF_DECLARE_2(irqchip, name, compat, fn)
#define OF_DECLARE_2(table, name, compat, fn) \
_OF_DECLARE(table, name, compat, fn, of_init_fn_2)
#define _OF_DECLARE(table, name, compat, fn, fn_type) \
static const struct of_device_id __of_table_##name \
__used __section(__##table##_of_table) \
= { .compatible = compat, \
.data = (fn == (fn_type)NULL) ? fn : fn }
The macro expands as follows:
--IRQCHIP_DECLARE(gic_v3, "arm,gic-v3", gic_of_init);
--OF_DECLARE_2(irqchip, gic_v3, "arm,gic-v3", gic_of_init);
--_OF_DECLARE(irqchip, gic_v3, "arm,gic-v3", gic_of_init, of_init_fn_2);
--
static const struct of_device_id __of_table_gic_v3 \
__used __section(__irqchip_of_table) \
= { .compatible = "arm,gic-v3", \
.data = gic_of_init }
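After expansion, the __irqchip_of_table section of vmlinux contains an of_device_id structure. Its compatible field is "arm,gic-v3" and its data pointer points to gic_of_init.
In the vmlinux.lds linker script, __irqchip_of_table is placed between __irqchip_begin and __irqchip_of_end: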
#ifdef CONFIG_IRQCHIP
#define IRQCHIP_OF_MATCH_TABLE() \
. = ALIGN(8); \
VMLINUX_SYMBOL(__irqchip_begin) = .; \
*(__irqchip_of_table) \
*(__irqchip_of_end)
#endif
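During interrupt initialization at boot, of_irq_init() searches the device tree for matching nodes. Its argument is the __irqchip_of_table section. IRQCHIP_DECLARE has already filled in the entry, so of_irq_init() looks for device nodes compatible with "arm,gic-v3" and reads their information. The node must match "arm,gic-v3" and also carry the interrupt-controller property. In that case of_irq_init() eventually calls the callback registered by IRQCHIP_DECLARE: gic_of_init, which the data field points to. This function is the entry point of GIC driver initialization. of_irq_init() is implemented in drivers/of/irq.c.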
--drivers/irqchip/irqchip.c
void __init irqchip_init(void)
{
of_irq_init(__irqchip_of_table);
acpi_probe_device_table(irqchip);
}
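User-space code can imitate the IRQCHIP_DECLARE mechanism. The sketch below assumes a GNU toolchain on ELF (GCC or Clang with GNU ld or lld). Such a toolchain automatically defines __start_<section> and __stop_<section> symbols for sections whose names are valid C identifiers. The kernel works differently: it brackets the table in its linker script and walks it until an empty sentinel entry. This is therefore an analogue of the kernel code, not a copy, and all toy_* names are hypothetical.

```c
#include <string.h>

/* Toy stand-in for struct of_device_id. Kept at two pointers (16 bytes on
 * LP64) so entries placed in the section sit back to back without padding. */
struct toy_of_device_id {
	const char *compatible;
	int (*init)(void);
};

/* Rough analogue of IRQCHIP_DECLARE(): put one entry in a named section. */
#define TOY_IRQCHIP_DECLARE(name, compat, fn)				\
	static const struct toy_of_device_id toy_of_table_##name	\
	__attribute__((used, section("toy_irqchip_of_table"))) =	\
		{ .compatible = compat, .init = fn }

/* Defined by the linker for the section above. */
extern const struct toy_of_device_id __start_toy_irqchip_of_table[];
extern const struct toy_of_device_id __stop_toy_irqchip_of_table[];

static int toy_gic_v3_init(void) { return 3; }
static int toy_gic_v2_init(void) { return 2; }

TOY_IRQCHIP_DECLARE(gic_v3, "arm,gic-v3", toy_gic_v3_init);
TOY_IRQCHIP_DECLARE(gic_400, "arm,gic-400", toy_gic_v2_init);

/* Analogue of the matching step of of_irq_init(): walk the section and
 * call the init function whose compatible string matches the DT node. */
static int toy_of_irq_init(const char *dt_compatible)
{
	const struct toy_of_device_id *id;

	for (id = __start_toy_irqchip_of_table;
	     id < __stop_toy_irqchip_of_table; id++)
		if (strcmp(id->compatible, dt_compatible) == 0)
			return id->init();
	return -1; /* no matching driver */
}
```

The drivers never register themselves at run time. Each entry exists purely because the linker gathered it into one section, which is also why the kernel can probe the interrupt controller this early in boot.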
On arm64, init_IRQ() in arch/arm64/kernel/irq.c calls irqchip_init():
void __init init_IRQ(void)
{
init_irq_stacks();
irqchip_init();
if (!handle_arch_irq)
panic("No interrupt controller found.");
if (system_uses_irq_prio_masking()) {
/*
* Now that we have a stack for our IRQ handler, set
* the PMR/PSR pair to a consistent state.
*/
WARN_ON(read_sysreg(daif) & PSR_A_BIT);
local_daif_restore(DAIF_PROCCTX_NOIRQ);
}
}
The kernel calls init_IRQ() during boot, from start_kernel() in init/main.c:
/* init some links before init_ISA_irqs() */
early_irq_init();
init_IRQ();
tick_init();
rcu_init_nohz();
init_timers();
hrtimers_init();
softirq_init();
timekeeping_init();
3.2 The gic_of_init flow
static int __init gic_of_init(struct device_node *node, struct device_node *parent)
{
void __iomem *dist_base;
struct redist_region *rdist_regs;
u64 redist_stride;
u32 nr_redist_regions;
int err, i;
dist_base = of_iomap(node, 0);//index 0: map the GICD (GIC Distributor) register space ------------- (1)
if (!dist_base) {
pr_err("%pOF: unable to map gic dist registers\n", node);
return -ENXIO;
}
err = gic_validate_dist_version(dist_base);//check that the GIC is v3 or v4 by reading GICD_PIDR2 --------------- (2)
if (err) {
pr_err("%pOF: no distributor detected, giving up\n", node);
goto out_unmap_dist;
}
if (of_property_read_u32(node, "#redistributor-regions", &nr_redist_regions))
//read #redistributor-regions (the number of contiguous GICR regions) from the DT; default to 1 if absent --------------- (3)
nr_redist_regions = 1;
rdist_regs = kcalloc(nr_redist_regions, sizeof(*rdist_regs),
GFP_KERNEL);
if (!rdist_regs) {
err = -ENOMEM;
goto out_unmap_dist;
}
for (i = 0; i < nr_redist_regions; i++) {
struct resource res;
int ret;
ret = of_address_to_resource(node, 1 + i, &res);
rdist_regs[i].redist_base = of_iomap(node, 1 + i);
if (ret || !rdist_regs[i].redist_base) {
pr_err("%pOF: couldn't map region %d\n", node, i);
err = -ENODEV;
goto out_unmap_rdist;
}
rdist_regs[i].phys_base = res.start;
}//map the base address of each GICR region --------- (4)
if (of_property_read_u64(node, "redistributor-stride", &redist_stride))
redist_stride = 0;//read redistributor-stride from the DT: the size of each GICR in a region (a multiple of 64KB). Normally there is one GICR per CPU core, so (cores in the region) x redistributor-stride = region size ----------- (5)
gic_enable_of_quirks(node, gic_quirks, &gic_data);
err = gic_init_bases(dist_base, rdist_regs, nr_redist_regions,
redist_stride, &node->fwnode);//main GIC initialization ------------- (6)
if (err)
goto out_unmap_rdist;
gic_populate_ppi_partitions(node);// ------------- (7)
if (static_branch_likely(&supports_deactivate_key))
gic_of_setup_kvm_info(node);//virtualization (KVM) related setup
return 0;
out_unmap_rdist:
for (i = 0; i < nr_redist_regions; i++)
if (rdist_regs[i].redist_base)
iounmap(rdist_regs[i].redist_base);
kfree(rdist_regs);
out_unmap_dist:
iounmap(dist_base);
return err;
}
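(1) Map the GICD register space. of_iomap() calls ioremap() on a memory region that the device node describes directly. index selects which entry of the node's reg property to map; with only one entry, index is 0. With Device Tree, many drivers map their registers with of_iomap() instead of calling ioremap() directly.
(2) Verify that the GICD is GICv3 or GICv4, mainly by reading GICD_PIDR2 bits[7:4]: 0x1 means GICv1, 0x2 means GICv2, and so on.
(3) Read #redistributor-regions from the DTS. It gives the number of independent GICR regions, and each region is address-contiguous.
For example, take a 64-core arm64 server with #redistributor-regions = 2. Two contiguous GICR regions can then describe the redistributors of all 64 cores.
(4) Map the base address of each GICR region and record its physical address.
(5) Read redistributor-stride from the DTS. redistributor-stride is the size of each GICR in a region. Normally each CPU has one GICR, and redistributor-stride must be a multiple of 64KB.
(6) The main initialization flow, described below.
(7) gic_populate_ppi_partitions() sets up PPI partitions, i.e. the affinity of a PPI across a group of CPUs. (TODO: analyze how PPI affinity is configured)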
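Steps (2) and (5) boil down to simple arithmetic. The sketch below uses hypothetical helper names. First, it extracts ArchRev from a GICD_PIDR2 value (bits [7:4]) and accepts only v3 or v4, the way gic_validate_dist_version() does. Second, it computes how many redistributors fit in a region for a given redistributor-stride. The driver handles an absent stride (0) differently: it walks the region frame by frame and uses GICR_TYPER to find each frame's size and the last redistributor. This sketch does not model that path.

```c
#include <stdint.h>

#define TOY_PIDR2_ARCH_MASK 0xf0u /* GICD_PIDR2.ArchRev, bits [7:4] */

/* GIC architecture revision (1..4) encoded in a GICD_PIDR2 value. */
static unsigned int gic_pidr2_arch_rev(uint32_t pidr2)
{
	return (pidr2 & TOY_PIDR2_ARCH_MASK) >> 4;
}

/* Acceptance rule of gic_validate_dist_version(): only v3 or v4. */
static int gic_dist_version_ok(uint32_t pidr2)
{
	unsigned int rev = gic_pidr2_arch_rev(pidr2);

	return rev == 3 || rev == 4;
}

/* With an explicit stride, a region holds region_size / stride GICRs
 * (one per PE). Returns 0 when the stride is unknown (property absent). */
static uint64_t gicr_per_region(uint64_t region_size, uint64_t stride)
{
	return stride ? region_size / stride : 0;
}
```

In the section 2 device-tree example, a 0x800000-byte region with a 0x40000 stride holds 32 redistributors. This matches the "CPUs 0-31" comment.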
3.3 The gic_init_bases flow
static int __init gic_init_bases(void __iomem *dist_base,
struct redist_region *rdist_regs,
u32 nr_redist_regions,
u64 redist_stride,
struct fwnode_handle *handle)
{
u32 typer;
int err;
if (!is_hyp_mode_available())
static_branch_disable(&supports_deactivate_key);
if (static_branch_likely(&supports_deactivate_key))
pr_info("GIC: Using split EOI/Deactivate mode\n");
gic_data.fwnode = handle;
gic_data.dist_base = dist_base;
gic_data.redist_regions = rdist_regs;
gic_data.nr_redist_regions = nr_redist_regions;
gic_data.redist_stride = redist_stride; //fill in the global structure: static struct gic_chip_data gic_data
/*
* Find out how many interrupts are supported.
*/
typer = readl_relaxed(gic_data.dist_base + GICD_TYPER);//read GICD_TYPER; the maximum supported SPI INTID is derived from it later ------------- (1)
gic_data.rdists.gicd_typer = typer;
gic_enable_quirks(readl_relaxed(gic_data.dist_base + GICD_IIDR),
gic_quirks, &gic_data);
pr_info("%d SPIs implemented\n", GIC_LINE_NR - 32);
//GIC_LINE_NR is derived from GICD_TYPER bits[4:0]: if the field is N, the maximum SPI INTID is 32(N + 1) - 1
pr_info("%d Extended SPIs implemented\n", GIC_ESPI_NR);
/*
* ThunderX1 explodes on reading GICD_TYPER2, in violation of the
* architecture spec (which says that reserved registers are RES0).
*/
if (!(gic_data.flags & FLAGS_WORKAROUND_CAVIUM_ERRATUM_38539))
gic_data.rdists.gicd_typer2 = readl_relaxed(gic_data.dist_base + GICD_TYPER2);
gic_data.domain = irq_domain_create_tree(handle, &gic_irq_domain_ops,
&gic_data);//register an irq domain with the system ------------- (2)
gic_data.rdists.rdist = alloc_percpu(typeof(*gic_data.rdists.rdist));
gic_data.rdists.has_rvpeid = true;
gic_data.rdists.has_vlpis = true;
gic_data.rdists.has_direct_lpi = true;
gic_data.rdists.has_vpend_valid_dirty = true;
if (WARN_ON(!gic_data.domain) || WARN_ON(!gic_data.rdists.rdist)) {
err = -ENOMEM;
goto out_free;
}
irq_domain_update_bus_token(gic_data.domain, DOMAIN_BUS_WIRED);// ------------- (3)
gic_data.has_rss = !!(typer & GICD_TYPER_RSS);// ------------- (4)
pr_info("Distributor has %sRange Selector support\n",
gic_data.has_rss ? "" : "no ");
if (typer & GICD_TYPER_MBIS) {
err = mbi_init(handle, gic_data.domain);// ------------- (5)
if (err)
pr_err("Failed to initialize MBIs\n");
}
set_handle_irq(gic_handle_irq);// set the arch-level irq handler; gic_handle_irq is the entry point of GIC interrupt handling in the kernel ------------- (6)
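gic_update_rdist_properties();// ------------- (7)
gic_dist_init();// ------------- (10)
gic_cpu_init();// ------------- (11)
gic_smp_init();// ------------- (9)
gic_cpu_pm_init();// ------------- (12)
if (gic_dist_supports_lpis()) {
its_init(handle, &gic_data.rdists, gic_data.domain);// ------------- (8)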
its_cpu_init();
} else {
if (IS_ENABLED(CONFIG_ARM_GIC_V2M))
gicv2m_init(handle, gic_data.domain);
}
gic_enable_nmi_support();
return 0;
out_free:
if (gic_data.domain)
irq_domain_remove(gic_data.domain);
free_percpu(gic_data.rdists.rdist);
return err;
}
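(1) Determine the largest supported SPI INTID. GICv3 supports at most 1020 INTIDs for SGIs, PPIs and SPIs (INTIDs 0-1019). Take the GICD_TYPER field bits[4:0]: if its value is N, the maximum SPI INTID is 32(N + 1) - 1. For example, 0b00011 gives a maximum SPI INTID of 127.
(2) Register an irq domain with the system. The irq_domain maps hardware interrupt numbers (hwirq) to Linux IRQ numbers; see section 4, irq_domain and interrupt mapping.
(3) irq_find_host() uses this to find the corresponding irq_domain. DOMAIN_BUS_WIRED distinguishes this domain from other domains, such as MSI domains.
(4) Check whether the GICD supports RSS (Range Selector Support, GICD_TYPER bit[26]), which sets the range of SGI target affinities. If the bit is 0, the IRI supports targeted SGIs with affinity level 0 values 0-15. If it is 1, the range is 0-255.
(5) Check whether writes to GICD registers can generate message-based SPIs (GICD_TYPER bit[16], MBIS).
(6) Set the arch-level irq handler. gic_handle_irq is the entry point of GIC interrupt handling in the kernel.
(7) gic_update_rdist_properties(): update the vLPI-related redistributor properties (GIC virtualization).
(8) its_init(): initialize the ITS (Interrupt Translation Service), which translates MSIs into LPIs. LPI support is optional in the architecture, so the driver checks gic_dist_supports_lpis() first. Without LPIs it falls back to GICv2m when CONFIG_ARM_GIC_V2M is enabled.
(9) gic_smp_init(): this does two things. 1. It sets up inter-processor interrupts, which software on one CPU uses to signal other CPUs (for example, when a process on one CPU makes a reboot system call). In 5.10 the driver allocates the SGIs as ordinary IPIs from the GIC irq_domain, and sends them through gic_ipi_send_mask (the ipi_send_mask callback of gic_chip). Older kernels instead registered gic_raise_softirq as the cross-call callback. 2. It registers the GIC-related steps of the CPU hotplug state machine.
(10) gic_dist_init(): initialize the GICD.
(11) gic_cpu_init(): initialize the redistributor and CPU interface of the current CPU.
(12) gic_cpu_pm_init(): initialize GIC power management.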
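The sketch below decodes the GICD_TYPER fields that (1), (4), (5) and (8) rely on. The bit positions follow the GICv3 specification and match the GICD_TYPER_* definitions in include/linux/irqchip/arm-gic-v3.h. The TYPER_* macros and the function names are hypothetical stand-ins. gic_line_nr() mirrors the kernel's GIC_LINE_NR, which caps the value at 1020.

```c
#include <stdbool.h>
#include <stdint.h>

#define TYPER_ITLINES(t) ((t) & 0x1fu)               /* bits [4:0] */
#define TYPER_MBIS       (1u << 16)                  /* message-based SPIs */
#define TYPER_LPIS       (1u << 17)                  /* LPIs supported */
#define TYPER_IDBITS(t)  ((((t) >> 19) & 0x1fu) + 1) /* INTID width in bits */
#define TYPER_RSS        (1u << 26)                  /* Range Selector Support */

/* Like GIC_LINE_NR: INTIDs 0..lines-1 cover SGIs, PPIs and SPIs, capped
 * at 1020 because INTIDs 1020-1023 are special. */
static uint32_t gic_line_nr(uint32_t typer)
{
	uint32_t lines = (TYPER_ITLINES(typer) + 1) * 32;

	return lines < 1020 ? lines : 1020;
}

static uint32_t gic_max_spi_intid(uint32_t typer) { return gic_line_nr(typer) - 1; }
static uint32_t gic_nr_spis(uint32_t typer)       { return gic_line_nr(typer) - 32; }
static bool gic_has_rss(uint32_t typer)           { return typer & TYPER_RSS; }
static bool gic_has_lpis(uint32_t typer)          { return typer & TYPER_LPIS; }
static bool gic_has_mbis(uint32_t typer)          { return typer & TYPER_MBIS; }
```

gic_nr_spis() is the value the driver prints in "%d SPIs implemented" (GIC_LINE_NR - 32).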
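4. irq_domain and interrupt mapping
GIC interrupt handling starts by acknowledging a hardware interrupt. The GIC driver then uses the interrupt mapping to find the corresponding virtual interrupt number (Linux IRQ number) and continues with the rest of the processing.
Why have virtual interrupt numbers at all?
A modern SoC usually contains several interrupt controllers (for example a GIC and a GPIO interrupt controller), and each controller has its own set of interrupt numbers. Hardware interrupt numbers repeat across controllers, so a hardware interrupt number alone no longer uniquely identifies a peripheral interrupt. Driver writers should not need to know which controller an interrupt belongs to or which number it has there. The Linux kernel therefore introduces virtual interrupt numbers.
4.1 irq_domain
The Linux kernel provides the irq_domain framework to map hwirqs to virtual interrupt numbers. Every interrupt controller registers its own irq_domain.
The irq_domain data structure: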
/**
* struct irq_domain - Hardware interrupt number translation object
* @link: Element in global irq_domain list.
* @name: Name of interrupt domain
* @ops: pointer to irq_domain methods
* @host_data: private data pointer for use by owner. Not touched by irq_domain
* core code.
* @flags: host per irq_domain flags
* @mapcount: The number of mapped interrupts
*
* Optional elements
* @fwnode: Pointer to firmware node associated with the irq_domain. Pretty easy
* to swap it for the of_node via the irq_domain_get_of_node accessor
* @gc: Pointer to a list of generic chips. There is a helper function for
* setting up one or more generic chips for interrupt controllers
* drivers using the generic chip library which uses this pointer.
* @parent: Pointer to parent irq_domain to support hierarchy irq_domains
* @debugfs_file: dentry for the domain debugfs file
*
* Revmap data, used internally by irq_domain
* @revmap_direct_max_irq: The largest hwirq that can be set for controllers that
* support direct mapping
* @revmap_size: Size of the linear map table @linear_revmap[]
* @revmap_tree: Radix map tree for hwirqs that don't fit in the linear map
* @linear_revmap: Linear table of hwirq->virq reverse mappings
*/
struct irq_domain {
struct list_head link;
const char *name;
const struct irq_domain_ops *ops;
void *host_data;
unsigned int flags;
unsigned int mapcount;
/* Optional data */
struct fwnode_handle *fwnode;
enum irq_domain_bus_token bus_token;
struct irq_domain_chip_generic *gc;
#ifdef CONFIG_IRQ_DOMAIN_HIERARCHY
struct irq_domain *parent;
#endif
#ifdef CONFIG_GENERIC_IRQ_DEBUGFS
struct dentry *debugfs_file;
#endif
/* reverse map data. The linear map gets appended to the irq_domain */
irq_hw_number_t hwirq_max;
unsigned int revmap_direct_max_irq;
unsigned int revmap_size;
struct radix_tree_root revmap_tree;
struct mutex revmap_tree_mutex;
unsigned int linear_revmap[];
};
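link: links the irq domain into the global list irq_domain_list;
name: the name of the irq domain;
ops: the set of callbacks used for mapping in this irq domain;
mapcount: the number of interrupts currently mapped;
fwnode: the firmware node (e.g. the device tree node) of the interrupt controller;
parent: pointer to the parent irq_domain, used to support hierarchical irq_domains;
hwirq_max: the maximum number of interrupts supported by this irq domain;
revmap_tree: the root of the radix tree used for mapping;
linear_revmap: the linear table of hwirq->virq reverse mappings;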
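This structure shows that irq_domain supports several types of mapping.
irq_domain mapping types:
(1) Linear map
The linear map keeps a fixed table indexed by hwirq number. When a hwirq is mapped, the kernel allocates an irq_desc and stores its IRQ number in the table. The linear map suits a fixed, small set of hwirqs (fewer than about 256). Lookup time is constant, and irq_descs exist only for IRQs in use. The drawback is that the table must be as large as the largest possible hwirq number.
irq_domain_add_linear
(2) Tree map
The tree map keeps the mapping in a radix tree keyed by hwirq number. It suits controllers with very large hwirq numbers, because it needs no table as large as the largest hwirq number.
Its drawback is that lookup cost depends on how many entries the tree holds.
irq_domain_add_tree
(3) No map
Some hardware lets software program the hwirq number. The IRQ number is written into a hardware register, so no mapping is needed. irq_create_direct_mapping() handles this case.
irq_domain_add_nomap()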
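The lookup order over the three kinds of map can be sketched as follows. This is a simplified user-space model that loosely follows irq_find_mapping() in 5.10: direct map first, then the linear table, then the tree. The toy_* names are hypothetical. A singly linked list stands in for the kernel's radix tree, so its lookup is a linear scan. The flexible array at the end of struct toy_domain is sized the same way __irq_domain_add() sizes linear_revmap[].

```c
#include <stdlib.h>

struct toy_tree_node {          /* stand-in for a radix tree entry */
	unsigned long hwirq;
	unsigned int virq;
	struct toy_tree_node *next;
};

struct toy_domain {
	unsigned int revmap_direct_max_irq; /* no map: virq == hwirq below this */
	struct toy_tree_node *tree;         /* stand-in for revmap_tree */
	unsigned int revmap_size;           /* length of linear_revmap[] */
	unsigned int linear_revmap[];       /* flexible array, as in irq_domain */
};

/* Allocated like __irq_domain_add(): header plus size unsigned ints. */
static struct toy_domain *toy_domain_add(unsigned int size, unsigned int direct_max)
{
	struct toy_domain *d = calloc(1, sizeof(*d) + sizeof(unsigned int) * size);

	if (d) {
		d->revmap_size = size;
		d->revmap_direct_max_irq = direct_max;
	}
	return d;
}

/* Record hwirq -> virq in the linear table if it fits, else in the "tree". */
static int toy_set_mapping(struct toy_domain *d, unsigned long hwirq, unsigned int virq)
{
	if (hwirq < d->revmap_size) {
		d->linear_revmap[hwirq] = virq;
		return 0;
	}
	struct toy_tree_node *n = malloc(sizeof(*n));
	if (!n)
		return -1;
	n->hwirq = hwirq;
	n->virq = virq;
	n->next = d->tree;
	d->tree = n;
	return 0;
}

/* Direct map, then linear table, then tree; 0 means "not mapped". */
static unsigned int toy_find_mapping(const struct toy_domain *d, unsigned long hwirq)
{
	if (hwirq < d->revmap_direct_max_irq)
		return (unsigned int)hwirq;
	if (hwirq < d->revmap_size)
		return d->linear_revmap[hwirq];
	for (const struct toy_tree_node *n = d->tree; n; n = n->next)
		if (n->hwirq == hwirq)
			return n->virq;
	return 0;
}

static void toy_domain_remove(struct toy_domain *d)
{
	while (d->tree) {
		struct toy_tree_node *n = d->tree;

		d->tree = n->next;
		free(n);
	}
	free(d);
}
```

A domain created like irq_domain_create_tree() (size 0, direct_max 0) always falls through to the tree, so a large INTID such as an LPI (8192 or above) costs no table space.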
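4.2 The complete mapping process
(1) Registering the irq domain during interrupt controller initialization
In gic_of_init, described earlier, the GIC registers an irq_domain. It allocates an irq_domain structure and adds it to the global list irq_domain_list: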
-->gic_of_init
-->gic_init_bases
-->irq_domain_create_tree
-->__irq_domain_add
__irq_domain_add(fwnode, 0, ~0, 0, ops, host_data);//size (2nd argument) = 0 and direct_max (4th argument) = 0: the GICv3 irq_domain uses radix tree mapping
__irq_domain_add is implemented in kernel/irq/irqdomain.c:
/**
* __irq_domain_add() - Allocate a new irq_domain data structure
* @fwnode: firmware node for the interrupt controller
* @size: Size of linear map; 0 for radix mapping only
* @hwirq_max: Maximum number of interrupts supported by controller
* @direct_max: Maximum value of direct maps; Use ~0 for no limit; 0 for no
* direct mapping
* @ops: domain callbacks
* @host_data: Controller private data pointer
*
* Allocates and initializes an irq_domain structure.
* Returns pointer to IRQ domain, or NULL on failure.
*/
struct irq_domain *__irq_domain_add(struct fwnode_handle *fwnode, int size,
irq_hw_number_t hwirq_max, int direct_max,
const struct irq_domain_ops *ops,
void *host_data)
{
struct irqchip_fwid *fwid;
struct irq_domain *domain;
static atomic_t unknown_domains;
domain = kzalloc_node(sizeof(*domain) + (sizeof(unsigned int) * size),
GFP_KERNEL, of_node_to_nid(to_of_node(fwnode)));
if (!domain)
return NULL;
if (is_fwnode_irqchip(fwnode)) {
fwid = container_of(fwnode, struct irqchip_fwid, fwnode);
switch (fwid->type) {
case IRQCHIP_FWNODE_NAMED:
case IRQCHIP_FWNODE_NAMED_ID:
domain->fwnode = fwnode;
domain->name = kstrdup(fwid->name, GFP_KERNEL);
if (!domain->name) {
kfree(domain);
return NULL;
}
domain->flags |= IRQ_DOMAIN_NAME_ALLOCATED;
break;
default:
domain->fwnode = fwnode;
domain->name = fwid->name;
break;
}
} else if (is_of_node(fwnode) || is_acpi_device_node(fwnode) ||
is_software_node(fwnode)) {
char *name;
/*
* fwnode paths contain '/', which debugfs is legitimately
* unhappy about. Replace them with ':', which does
* the trick and is not as offensive as '\'...
*/
name = kasprintf(GFP_KERNEL, "%pfw", fwnode);
if (!name) {
kfree(domain);
return NULL;
}
strreplace(name, '/', ':');
domain->name = name;
domain->fwnode = fwnode;
domain->flags |= IRQ_DOMAIN_NAME_ALLOCATED;
}
if (!domain->name) {
if (fwnode)
pr_err("Invalid fwnode type for irqdomain\n");
domain->name = kasprintf(GFP_KERNEL, "unknown-%d",
atomic_inc_return(&unknown_domains));
if (!domain->name) {
kfree(domain);
return NULL;
}
domain->flags |= IRQ_DOMAIN_NAME_ALLOCATED;
}
fwnode_handle_get(fwnode);
/* Fill structure */
INIT_RADIX_TREE(&domain->revmap_tree, GFP_KERNEL);
mutex_init(&domain->revmap_tree_mutex);
domain->ops = ops;
domain->host_data = host_data;
domain->hwirq_max = hwirq_max;
domain->revmap_size = size;
domain->revmap_direct_max_irq = direct_max;
irq_domain_check_hierarchy(domain);
mutex_lock(&irq_domain_mutex);
debugfs_add_domain_dir(domain);
list_add(&domain->link, &irq_domain_list);//add the irq_domain to the global list irq_domain_list
mutex_unlock(&irq_domain_mutex);
pr_debug("Added domain %s\n", domain->name);
return domain;
}
EXPORT_SYMBOL_GPL(__irq_domain_add);
As its comment says, __irq_domain_add allocates and initializes an irq_domain structure.
The allocation size is sizeof(*domain) + (sizeof(unsigned int) * size). The extra (sizeof(unsigned int) * size) bytes hold the linear_revmap[] member. Finally, the function adds the irq_domain to the global list irq_domain_list.
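(2) The kernel creates the mapping between hardware and virtual interrupt numbers during device initialization (while parsing the device tree and creating devices)
During system initialization at boot, do_initcalls() invokes every registered initcall callback. of_platform_default_populate_init() is registered as an arch_initcall_sync initcall.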
--drivers/of/platform.c
static int __init of_platform_default_populate_init(void)
{
struct device_node *node;
device_links_supplier_sync_state_pause();
if (!of_have_populated_dt())
return -ENODEV;
/*
* Handle certain compatibles explicitly, since we don't want to create
* platform_devices for every node in /reserved-memory with a
* "compatible",
*/
for_each_matching_node(node, reserved_mem_matches)
of_platform_device_create(node, NULL, NULL);
node = of_find_node_by_path("/firmware");
if (node) {
of_platform_populate(node, NULL, NULL, NULL);
of_node_put(node);
}
/* Populate everything else. */
fw_devlink_pause();
of_platform_default_populate(NULL, NULL, NULL);
fw_devlink_resume();
return 0;
}
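of_platform_default_populate() enumerates and initializes the devices on the buses. For example, the soc node in a device tree usually has compatible = "simple-bus". When the function matches that value, it enumerates every device under soc. It parses the device tree information of each device node on the bus, fills in the device structure, and registers the device. This section focuses on how each device's interrupts get mapped during that process: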
-->do_one_initcall()
-->of_platform_default_populate(NULL, NULL, NULL);
-->of_platform_populate(root, of_default_bus_match_table, lookup, parent);
-->of_platform_bus_create(child, matches, lookup, parent, true);
-->of_platform_device_create_pdata(bus, bus_id, platform_data, parent);
-->of_device_alloc(np, bus_id, parent);
-->of_irq_to_resource_table(np, res, num_irq)
-->of_irq_to_resource(dev, i, res)
-->of_irq_get(dev, index);
//of_irq_get() parses the interrupt-related content of the device node and establishes the mapping
/**
* of_irq_get - Decode a node's IRQ and return it as a Linux IRQ number
* @dev: pointer to device tree node
* @index: zero-based index of the IRQ
*
* Returns Linux IRQ number on success, or 0 on the IRQ mapping failure, or
* -EPROBE_DEFER if the IRQ domain is not yet created, or error code in case
* of any other failure.
*/
int of_irq_get(struct device_node *dev, int index)
{
int rc;
struct of_phandle_args oirq;
struct irq_domain *domain;
rc = of_irq_parse_one(dev, index, &oirq);//parse the interrupt-related information of the device node
if (rc)
return rc;
domain = irq_find_host(oirq.np);//oirq.np points to the interrupt controller; this returns the irq_domain registered earlier during GIC initialization
if (!domain)
return -EPROBE_DEFER;
return irq_create_of_mapping(&oirq);
}
EXPORT_SYMBOL_GPL(of_irq_get);
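of_irq_parse_one() parses the interrupt-related properties of a device node in the dts file, such as the interrupts property. irq_create_of_mapping() performs the actual mapping from hardware interrupt number to virtual interrupt number: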
--> irq_create_of_mapping
-->irq_create_fwspec_mapping
--> irq_find_matching_fwspec // find the irq_domain for the device node; each irq_domain defines a set of mapping callbacks
--> irq_domain_translate //parse the interrupt information, such as the hwirq and the trigger type
--> domain->ops->translate (gic_irq_domain_translate)//call the translate callback of the irq_domain ops, defined in the GICv3 driver
--> irq_domain_alloc_irqs // allocate a virtual interrupt number and map the hwirq to it
-->__irq_domain_alloc_irqs
--> irq_domain_alloc_descs//allocate a virq: the first free bit in the allocated_irqs bitmap
-->irq_domain_alloc_irqs_hierarchy
--> domain->ops->alloc (gic_irq_domain_alloc)
--> gic_irq_domain_map //the GIC creates the hwirq-to-virq mapping and sets struct irq_desc->handle_irq according to the interrupt type
-->irq_domain_set_info
-->irq_domain_set_hwirq_and_chip
-->irq_domain_insert_irq(virq + i);//insert the hwirq -> irq_data mapping into the irq_domain's radix tree
-->irq_domain_set_mapping(domain, data->hwirq, data);
-->radix_tree_insert(&domain->revmap_tree, hwirq, irq_data);
/**
* irq_domain_set_hwirq_and_chip - Set hwirq and irqchip of @virq at @domain
* @domain: Interrupt domain to match
* @virq: IRQ number
* @hwirq: The hwirq number
* @chip: The associated interrupt chip
* @chip_data: The associated chip data
*/
int irq_domain_set_hwirq_and_chip(struct irq_domain *domain, unsigned int virq,
irq_hw_number_t hwirq, struct irq_chip *chip,
void *chip_data)
{
struct irq_data *irq_data = irq_domain_get_irq_data(domain, virq);
if (!irq_data)
return -ENOENT;
irq_data->hwirq = hwirq;
irq_data->chip = chip ? chip : &no_irq_chip;
irq_data->chip_data = chip_data;
return 0;
}
EXPORT_SYMBOL_GPL(irq_domain_set_hwirq_and_chip);
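irq_domain_set_hwirq_and_chip() looks up the irq_data structure for virq and stores hwirq in irq_data->hwirq. This completes the mapping from hardware interrupt number to virtual interrupt number. The chip argument is the set of low-level GICv3 operations defined in the GICv3 driver: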
static struct irq_chip gic_chip = {
.name = "GICv3",
.irq_mask = gic_mask_irq,
.irq_unmask = gic_unmask_irq,
.irq_eoi = gic_eoi_irq,
.irq_set_type = gic_set_type,
.irq_set_affinity = gic_set_affinity,
.irq_retrigger = gic_retrigger,
.irq_get_irqchip_state = gic_irq_get_irqchip_state,
.irq_set_irqchip_state = gic_irq_set_irqchip_state,
.irq_nmi_setup = gic_irq_nmi_setup,
.irq_nmi_teardown = gic_irq_nmi_teardown,
.ipi_send_mask = gic_ipi_send_mask,
.flags = IRQCHIP_SET_TYPE_MASKED |
IRQCHIP_SKIP_SET_WAKE |
IRQCHIP_MASK_ON_SUSPEND,
};
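A few cut-down structures show how the core code uses the chip pointer. In the sketch below, all toy_* names are hypothetical, and a fake distributor is just an array of enable flags. The real gic_mask_irq()/gic_unmask_irq() instead write the clear-enable/set-enable registers in the GICD or GICR. Generic code holds only the irq_data of a virq and calls through irq_data->chip, without ever touching GIC registers.

```c
#include <stddef.h>

struct toy_irq_data;

/* Cut-down stand-in for struct irq_chip. */
struct toy_irq_chip {
	const char *name;
	void (*irq_mask)(struct toy_irq_data *d);
	void (*irq_unmask)(struct toy_irq_data *d);
};

/* Cut-down stand-in for struct irq_data. */
struct toy_irq_data {
	unsigned int irq;                /* virq */
	unsigned long hwirq;             /* INTID as seen by the GIC */
	const struct toy_irq_chip *chip;
	void *chip_data;
};

/* Fake distributor state: one enable flag per INTID (0..63 only). */
static int toy_enabled[64];

static void toy_gic_mask(struct toy_irq_data *d)   { toy_enabled[d->hwirq] = 0; }
static void toy_gic_unmask(struct toy_irq_data *d) { toy_enabled[d->hwirq] = 1; }

static const struct toy_irq_chip toy_gic_chip = {
	.name = "toyGICv3",
	.irq_mask = toy_gic_mask,
	.irq_unmask = toy_gic_unmask,
};

/* Same idea as irq_domain_set_hwirq_and_chip(): record hwirq and chip. */
static void toy_set_hwirq_and_chip(struct toy_irq_data *d, unsigned long hwirq,
				   const struct toy_irq_chip *chip, void *chip_data)
{
	d->hwirq = hwirq;
	d->chip = chip;
	d->chip_data = chip_data;
}

/* "Generic" code: only the irq_data is known, the chip does the work. */
static void toy_enable_irq(struct toy_irq_data *d)  { d->chip->irq_unmask(d); }
static void toy_disable_irq(struct toy_irq_data *d) { d->chip->irq_mask(d); }
```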
The relationships between the data structures involved in the mapping are shown below:

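Note how gic_irq_domain_translate computes the hardware interrupt number from the device tree cells. The number in the device tree plus an offset gives the real hardware interrupt number: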
switch (fwspec->param[0]) {
case 0: /* SPI */
*hwirq = fwspec->param[1] + 32;
break;
case 1: /* PPI */
*hwirq = fwspec->param[1] + 16;
break;
case 2: /* ESPI */
*hwirq = fwspec->param[1] + ESPI_BASE_INTID;
break;
case 3: /* EPPI */
*hwirq = fwspec->param[1] + EPPI_BASE_INTID;
break;
case GIC_IRQ_TYPE_LPI: /* LPI */
*hwirq = fwspec->param[1];
break;
case GIC_IRQ_TYPE_PARTITION:
*hwirq = fwspec->param[1];
if (fwspec->param[1] >= 16)
*hwirq += EPPI_BASE_INTID - 16;
else
*hwirq += 16;
break;
default:
return -EINVAL;
}
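Each of the SPI/PPI/ESPI/EPPI cases above adds a base offset to the DT number. Below is a minimal stand-alone sketch with hypothetical names. It writes out the base INTIDs 4096 (ESPI_BASE_INTID) and 1056 (EPPI_BASE_INTID) and leaves out the LPI and partition cases.

```c
#include <stdint.h>

#define TOY_ESPI_BASE_INTID 4096u /* value of ESPI_BASE_INTID */
#define TOY_EPPI_BASE_INTID 1056u /* value of EPPI_BASE_INTID */

/* Convert the first two cells of a GICv3 interrupt specifier (type, number)
 * into a hardware INTID. Returns 0 on success, -1 for an unsupported type. */
static int toy_gic_dt_to_hwirq(uint32_t type, uint32_t num, uint32_t *hwirq)
{
	switch (type) {
	case 0: /* SPI: DT numbering starts at INTID 32 */
		*hwirq = num + 32;
		break;
	case 1: /* PPI: DT numbering starts at INTID 16 */
		*hwirq = num + 16;
		break;
	case 2: /* extended SPI */
		*hwirq = num + TOY_ESPI_BASE_INTID;
		break;
	case 3: /* extended PPI */
		*hwirq = num + TOY_EPPI_BASE_INTID;
		break;
	default:
		return -1;
	}
	return 0;
}
```

For instance, the GIC node's own interrupts property in the section 2 example (type 1, number 9) is PPI 9, i.e. INTID 25. A device node that writes <0 100 4> gets INTID 132.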
The complete mapping process is summarized below:
1. GIC initialization allocates and registers an irq_domain. Every interrupt controller has its own irq_domain.
--gic_of_init
--gic_init_bases
--irq_domain_create_tree
2. During boot, the kernel parses and walks the device tree. For each device node on a bus, it parses the device tree information, fills in the device structure, and registers the device. Along the way it establishes the mapping between hardware and virtual interrupt numbers.
----of_device_alloc(np, bus_id, parent);
----of_irq_to_resource_table(np, res, num_irq)// kernel/drivers/of/irq.c
----of_irq_to_resource(dev, i, res)
----of_irq_get(dev, index)
----irq_create_of_mapping // kernel/irq/irqdomain.c, performs the actual mapping
----irq_create_fwspec_mapping(&fwspec); //struct irq_fwspec fwspec is obtained by parsing the device tree
----domain = irq_find_matching_fwspec(fwspec, DOMAIN_BUS_WIRED);// (1) find the irq domain of the interrupt controller that this device interrupt belongs to
----irq_domain_translate(domain, fwspec, &hwirq, &type)// (2) call the translate callback of the irq_domain ops (defined in the GICv3 driver) to parse the interrupt information, such as hwirq and trigger type
----virq = irq_domain_alloc_irqs(domain, 1, NUMA_NO_NODE, fwspec);// (3) allocate a virq (the first free bit in the allocated_irqs bitmap), allocate an irq_desc, and record the virq -> irq_desc mapping in a radix tree
----virq = irq_domain_alloc_descs(irq_base, nr_irqs, 0, node, affinity);
----virq = __irq_alloc_descs(-1, hint, cnt, node, THIS_MODULE, affinity);// kernel/kernel/irq/irqdesc.c; irq_base is -1 and hwirq is passed as 0, so hint is 1
----start = bitmap_find_next_zero_area(allocated_irqs, IRQ_BITMAP_BITS,from, cnt, 0);
----ret = alloc_descs(start, cnt, node, affinity, owner);
----desc = alloc_desc(start + i, node, flags, mask, owner);// allocate memory for an irq_desc structure
----irq_insert_desc(start + i, desc); //insert the virq and irq_desc into a radix tree, recording their mapping
----irq_domain_alloc_irq_data(domain, virq, nr_irqs)// (4) virq and its irq_desc now exist; the outermost irq_data is embedded in the irq_desc, and this call records the domain in it (allocating extra irq_data only for parent domains in a hierarchy)
----ret = irq_domain_alloc_irqs_hierarchy(domain, virq, nr_irqs, arg); // (5) call the alloc callback of the irq_domain ops to fill in irq_data: irq_data->hwirq = hwirq; irq_data->chip = chip; chip points to the set of low-level GICv3 operations defined in the GICv3 driver
----gic_irq_domain_alloc
----gic_irq_domain_map(domain, virq + i, hwirq + i);
----irq_domain_set_info
----irq_domain_set_hwirq_and_chip(domain, virq, hwirq, chip, chip_data);
----irq_domain_insert_irq(virq + i);// (6) insert hwirq and irq_data into the radix tree pointed to by the irq_domain's revmap_tree, recording the hwirq -> irq_data mapping
----irq_domain_set_mapping(domain, data->hwirq, data);
----radix_tree_insert(&domain->revmap_tree, hwirq, irq_data);
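1. While enumerating the devices in the device tree, the kernel parses each device's interrupt information. It first follows the interrupt topology in the device tree to find the irq_domain of the interrupt controller that the device interrupt belongs to.
2. It calls the translate callback of the irq_domain ops to decode the device interrupt. This yields the hardware interrupt number (hwirq), the trigger type and other information.
3. It takes the first free bit in the allocated_irqs bitmap (searching from 1) as the virtual interrupt number (virq). It then allocates an irq_desc structure and records the virq -> irq_desc mapping in a radix tree.
4. It sets up the irq_data of the new virq. The outermost irq_data is embedded in the irq_desc allocated above; irq_domain_alloc_irq_data() records the domain in it and allocates extra irq_data only for parent domains in a hierarchy.
5. It calls the alloc callback of the irq_domain ops to fill in irq_data: irq_data->hwirq = hwirq; irq_data->chip = chip. Here chip points to the set of low-level GICv3 operations defined in the GICv3 driver.
6. It inserts hwirq and irq_data into the radix tree pointed to by the irq_domain's revmap_tree, recording the hwirq -> irq_data mapping.
The mapping process thus uses one bitmap and two radix trees. The allocated_irqs bitmap hands out free virtual interrupt numbers. One radix tree (with CONFIG_SPARSE_IRQ) stores the virq -> irq_desc mapping. The other (domain->revmap_tree) stores the hwirq -> irq_data mapping.
At this point, each device node's links to its irq_domain, virq, hwirq, irq_desc, irq_data and chip are all established.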
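The virq allocation in step 3 is a first-fit search in a bitmap. The sketch below uses a hypothetical 64-entry bitmap and toy_* helpers. It mimics what __irq_alloc_descs() does with bitmap_find_next_zero_area(): find cnt consecutive zero bits at or after from, then mark them as used. Unlike the kernel's bitmap library, which works a word at a time, it tests one bit at a time.

```c
#include <limits.h>
#include <stdbool.h>

#define TOY_NR_IRQS 64u /* hypothetical size, cf. IRQ_BITMAP_BITS */
#define TOY_BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

static unsigned long toy_allocated_irqs[(TOY_NR_IRQS + TOY_BITS_PER_WORD - 1) /
					TOY_BITS_PER_WORD];

static bool toy_test_bit(unsigned int nr)
{
	return toy_allocated_irqs[nr / TOY_BITS_PER_WORD] &
	       (1UL << (nr % TOY_BITS_PER_WORD));
}

static void toy_set_bit(unsigned int nr)
{
	toy_allocated_irqs[nr / TOY_BITS_PER_WORD] |= 1UL << (nr % TOY_BITS_PER_WORD);
}

static void toy_free_virq(unsigned int nr)
{
	toy_allocated_irqs[nr / TOY_BITS_PER_WORD] &= ~(1UL << (nr % TOY_BITS_PER_WORD));
}

/* First fit: find cnt consecutive free bits starting at from, mark them
 * used and return the first virq, or -1 if no such run exists. */
static int toy_alloc_virqs(unsigned int from, unsigned int cnt)
{
	for (unsigned int start = from; start + cnt <= TOY_NR_IRQS; start++) {
		unsigned int i;

		for (i = 0; i < cnt && !toy_test_bit(start + i); i++)
			;
		if (i == cnt) {
			for (i = 0; i < cnt; i++)
				toy_set_bit(start + i);
			return (int)start;
		}
	}
	return -1;
}
```

The search starts from 1, as irq_domain_alloc_descs() does when called with hwirq 0. Successive allocations therefore return 1, 2, 3 and so on, and the next search reuses a freed virq.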
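5. ARM64 interrupt handling
5.1 Low-level ARM64 interrupt handling (GICv3 and assembly)
When the GIC receives an interrupt signal, the flow is as follows:
1. The interrupt becomes pending. The GIC forwards it to a PE once it passes these checks (see section 5, Handling Interrupts, of GICv3_Software_Overview_Official_Release_B):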
- Group enables
- Interrupt enables
- Routing controls
- Interrupt priority & priority mask
- Running priority
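2. When the CPU takes the interrupt (see section D1.10 of Arm_Architecture_Reference_Manual_Armv8_for Armv8-A_architecture_profile):
- The processor state is saved in SPSR_ELx of the target exception level
- The return address is saved in ELR_ELx of the target exception level
- The DAIF bits in PSTATE are all set to 1. This masks debug exceptions, system errors (SError), IRQ and FIQ
- The stack pointer is set to the stack of the target exception level (SP_ELx)
- The processor switches to the target exception level and jumps to the exception vector table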
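5.1.1 The exception vector table
The Vector Base Address Register (VBAR) holds the base address of the exception vector table. For a kernel running at EL1 this register is VBAR_EL1.
The ARMv8 architecture defines four groups of vector table entries for different execution states, as follows: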



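Here "current exception level" means the PE takes the exception to the exception level it is already running at. Assume the system runs only the Linux kernel, without virtualization or security extensions. The kernel then runs at EL1 and user programs run at the lower level EL0. Under this assumption, the four groups in the table above read as follows:
- **Current EL with SP0:** an exception taken at EL1 (kernel mode) while SP_EL0 is the stack pointer. Linux never runs the kernel this way, so it treats these entries as errors (the *_invalid handlers below).
- **Current EL with SPx:** an exception taken at EL1 while SP_EL1 is in use, i.e. an exception in kernel mode. This is a very common case.
- **Lower EL using AArch64:** an exception taken while an AArch64 (64-bit) program was running at EL0 (user mode).
- **Lower EL using AArch32:** an exception taken while an AArch32 (32-bit) program was running at EL0.
The Linux kernel defines the corresponding exception vector table in arch/arm64/kernel/entry.S: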
/*
* Exception vectors.
*/
.pushsection ".entry.text", "ax"
.align 11
SYM_CODE_START(vectors)
kernel_ventry 1, sync_invalid // Synchronous EL1t
kernel_ventry 1, irq_invalid // IRQ EL1t
kernel_ventry 1, fiq_invalid // FIQ EL1t
kernel_ventry 1, error_invalid // Error EL1t
kernel_ventry 1, sync // Synchronous EL1h
kernel_ventry 1, irq // IRQ EL1h
kernel_ventry 1, fiq_invalid // FIQ EL1h
kernel_ventry 1, error // Error EL1h
kernel_ventry 0, sync // Synchronous 64-bit EL0
kernel_ventry 0, irq // IRQ 64-bit EL0
kernel_ventry 0, fiq_invalid // FIQ 64-bit EL0
kernel_ventry 0, error // Error 64-bit EL0
#ifdef CONFIG_COMPAT
kernel_ventry 0, sync_compat, 32 // Synchronous 32-bit EL0
kernel_ventry 0, irq_compat, 32 // IRQ 32-bit EL0
kernel_ventry 0, fiq_invalid_compat, 32 // FIQ 32-bit EL0
kernel_ventry 0, error_compat, 32 // Error 32-bit EL0
#else
kernel_ventry 0, sync_invalid, 32 // Synchronous 32-bit EL0
kernel_ventry 0, irq_invalid, 32 // IRQ 32-bit EL0
kernel_ventry 0, fiq_invalid, 32 // FIQ 32-bit EL0
kernel_ventry 0, error_invalid, 32 // Error 32-bit EL0
#endif
SYM_CODE_END(vectors)
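The table layout above amounts to simple address arithmetic. Each kernel_ventry entry is 128 bytes (.align 7), each group of four entries is 0x200 bytes, and the table itself is 2 KB aligned (.align 11). The CPU jumps to VBAR_EL1 plus the offset computed below. The names are hypothetical; the offsets are the architectural ones.

```c
#include <stdint.h>

/* The four groups of the AArch64 vector table, in table order. */
enum vec_group {
	VEC_CUR_EL_SP0 = 0,   /* current EL, SP_EL0 (EL1t in entry.S) */
	VEC_CUR_EL_SPX = 1,   /* current EL, SP_ELx (EL1h) */
	VEC_LOWER_EL_A64 = 2, /* lower EL running AArch64 (64-bit EL0) */
	VEC_LOWER_EL_A32 = 3, /* lower EL running AArch32 (32-bit EL0) */
};

/* Entry order inside each group. */
enum vec_type { VEC_SYNC = 0, VEC_IRQ = 1, VEC_FIQ = 2, VEC_SERROR = 3 };

/* Address of the vector entry: 0x200 bytes per group, 0x80 per entry. */
static uint64_t vector_entry(uint64_t vbar, enum vec_group g, enum vec_type t)
{
	return vbar + (uint64_t)g * 0x200 + (uint64_t)t * 0x80;
}
```

So an IRQ taken in kernel mode (EL1h) enters at VBAR_EL1 + 0x280, and an IRQ taken from 64-bit user space enters at VBAR_EL1 + 0x480.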
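5.1.2 Low-level interrupt handling in the Linux kernel (assembly)
Assume the IRQ arrives in kernel mode, i.e. a peripheral interrupt occurs while the CPU is executing kernel code at EL1. The following walks through the assembly part of the Linux kernel's interrupt handling.
kernel_ventry is a macro. Its full expansion is not shown here; a simplified excerpt is: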
.macro kernel_ventry, el, label, regsize = 64 //el and label are macro parameters
.align 7 //.align is an assembler directive; .align 7 aligns to 2^7 = 128 bytes
sub sp, sp, #S_FRAME_SIZE
b el\()\el\()_\label
/*
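S_FRAME_SIZE is the size of the stack frame, sizeof(struct pt_regs);
sub sp, sp, #S_FRAME_SIZE moves the stack pointer sp down by S_FRAME_SIZE;
b el\()\el\()_\label works as follows:
the first "el" is the literal characters el, "\()" marks the end of a macro parameter name, the second "\el" is the macro parameter el, and "\label" is the parameter label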
*/
When an IRQ occurs in kernel mode, the CPU jumps to the corresponding vector table entry, namely
kernel_ventry 1, irq // IRQ EL1h
which expands
b el\()\el\()_\label /* \el=1 \label=irq */
---->
b el1_irq
so execution eventually branches to the el1_irq label.
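The assembler's "\()" separator plays the same role as token pasting in the C preprocessor. As an analogy only, the C macro below builds the handler name from a level and a label. The stub handlers and the enum are hypothetical stand-ins for the real assembly labels.

```c
/* C analogue of "b el\()\el\()_\label": paste the literal prefix "el",
 * the exception level and the label into one identifier. */
#define VENTRY_TARGET(level, label) el##level##_##label

/* Stand-ins for the assembly handlers; each returns a tag naming itself. */
enum handler_tag { TAG_EL1_IRQ, TAG_EL0_IRQ, TAG_EL1_SYNC };

static enum handler_tag el1_irq(void)  { return TAG_EL1_IRQ; }
static enum handler_tag el0_irq(void)  { return TAG_EL0_IRQ; }
static enum handler_tag el1_sync(void) { return TAG_EL1_SYNC; }

/* VENTRY_TARGET(1, irq) becomes el1_irq, VENTRY_TARGET(0, irq) el0_irq, ... */
static enum handler_tag dispatch_el1_irq(void)  { return VENTRY_TARGET(1, irq)(); }
static enum handler_tag dispatch_el0_irq(void)  { return VENTRY_TARGET(0, irq)(); }
static enum handler_tag dispatch_el1_sync(void) { return VENTRY_TARGET(1, sync)(); }
```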
/*
* EL1 mode handlers.
*/
.align 6
SYM_CODE_START_LOCAL_NOALIGN(el1_irq)
kernel_entry 1
el1_interrupt_handler handle_arch_irq
kernel_exit 1
SYM_CODE_END(el1_irq)
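el1_irq is the core of interrupt handling:
- kernel_entry is a macro that saves the interrupt context.
- el1_interrupt_handler is also a macro. It handles the interrupt and ultimately calls handle_arch_irq, which enters the C part of interrupt handling.
- kernel_exit is paired with kernel_entry and restores the interrupt context.
5.1.2.1 Saving the interrupt context
1. The stack frame
The Linux kernel defines the pt_regs data structure to describe the layout of the registers saved on the kernel stack:
arch/arm64/include/asm/ptrace.h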
/*
* This struct defines the way the registers are stored on the stack during an
* exception. Note that sizeof(struct pt_regs) has to be a multiple of 16 (for
* stack alignment). struct user_pt_regs must form a prefix of struct pt_regs.
*/
struct pt_regs {
union {
struct user_pt_regs user_regs;
struct {
u64 regs[31];
u64 sp;
u64 pc;
u64 pstate;
};
};
u64 orig_x0;
#ifdef __AARCH64EB__
u32 unused2;
s32 syscallno;
#else
s32 syscallno;
u32 unused2;
#endif
u64 orig_addr_limit;
/* Only valid when ARM64_HAS_IRQ_PRIO_MASKING is enabled. */
u64 pmr_save;
u64 stackframe[2];
/* Only valid for some EL1 exceptions. */
u64 lockdep_hardirqs;
u64 exit_rcu;
};
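pt_regs holds 34 registers: x0~x30, SP, PC and PSTATE. It also holds extra fields such as stackframe.
The kernel defines many macros for accessing the pt_regs stack frame, such as S_FRAME_SIZE used above, and a lot of assembly code uses these macros directly. They are generated in: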
arch/arm64/kernel/asm-offsets.c
DEFINE(S_FRAME_SIZE, sizeof(struct pt_regs));
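The following sketch shows how these asm-offsets constants line up with the stp/ldp instructions in kernel_entry and kernel_exit. It is a user-space mirror of the pt_regs layout printed above, for 5.10 on little-endian. It is only an illustration: the real values are generated by the compiler from the kernel's own definition and can change between versions and configurations. The M_-prefixed names are hypothetical stand-ins for S_FRAME_SIZE, S_LR, S_SP, S_PC and S_STACKFRAME:

```c
#include <stddef.h>
#include <stdint.h>

/* User-space mirror of the 5.10 arm64 struct pt_regs above (little-endian). */
struct pt_regs_mirror {
	union {
		struct {
			uint64_t regs[31];  /* x0..x30 */
			uint64_t sp;
			uint64_t pc;
			uint64_t pstate;
		};
		uint64_t user_regs[34];     /* stands in for struct user_pt_regs */
	};
	uint64_t orig_x0;
	int32_t  syscallno;
	uint32_t unused2;
	uint64_t orig_addr_limit;
	uint64_t pmr_save;
	uint64_t stackframe[2];
	uint64_t lockdep_hardirqs;
	uint64_t exit_rcu;
};

/* Hypothetical mirrors of the asm-offsets.c constants used by kernel_entry/exit. */
#define M_S_FRAME_SIZE  sizeof(struct pt_regs_mirror)
#define M_S_LR          (offsetof(struct pt_regs_mirror, regs) + 30 * sizeof(uint64_t))
#define M_S_SP          offsetof(struct pt_regs_mirror, sp)
#define M_S_PC          offsetof(struct pt_regs_mirror, pc)
#define M_S_STACKFRAME  offsetof(struct pt_regs_mirror, stackframe)

/* The comment on pt_regs requires a size that is a multiple of 16. */
_Static_assert(sizeof(struct pt_regs_mirror) % 16 == 0,
	       "pt_regs size must keep the stack 16-byte aligned");
```

With this layout, `stp lr, x21, [sp, #S_LR]` fills regs[30] and sp, and `stp x22, x23, [sp, #S_PC]` fills pc and pstate, because each pair of fields is adjacent.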
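2、Saving the interrupt context
The kernel_entry macro saves the interrupted context. Its parameter el selects the case: with el=1 it saves the context of an exception taken from EL1, and with el=0 the context of one taken from EL0.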
.macro kernel_entry, el, regsize = 64
.if \regsize == 32
mov w0, w0 // zero upper 32 bits of x0
.endif
stp x0, x1, [sp, #16 * 0]
stp x2, x3, [sp, #16 * 1]
stp x4, x5, [sp, #16 * 2]
stp x6, x7, [sp, #16 * 3]
stp x8, x9, [sp, #16 * 4]
stp x10, x11, [sp, #16 * 5]
stp x12, x13, [sp, #16 * 6]
stp x14, x15, [sp, #16 * 7]
stp x16, x17, [sp, #16 * 8]
stp x18, x19, [sp, #16 * 9]
stp x20, x21, [sp, #16 * 10]
stp x22, x23, [sp, #16 * 11]
stp x24, x25, [sp, #16 * 12]
stp x26, x27, [sp, #16 * 13]
stp x28, x29, [sp, #16 * 14]
.if \el == 0
clear_gp_regs
mrs x21, sp_el0
ldr_this_cpu tsk, __entry_task, x20
msr sp_el0, tsk
/*
* Ensure MDSCR_EL1.SS is clear, since we can unmask debug exceptions
* when scheduling.
*/
ldr x19, [tsk, #TSK_TI_FLAGS]
disable_step_tsk x19, x20
/* Check for asynchronous tag check faults in user space */
check_mte_async_tcf x22, x23
apply_ssbd 1, x22, x23
ptrauth_keys_install_kernel tsk, x20, x22, x23
scs_load tsk, x20
.else
add x21, sp, #S_FRAME_SIZE
get_current_task tsk
/* Save the task's original addr_limit and set USER_DS */
ldr x20, [tsk, #TSK_TI_ADDR_LIMIT]
str x20, [sp, #S_ORIG_ADDR_LIMIT]
mov x20, #USER_DS
str x20, [tsk, #TSK_TI_ADDR_LIMIT]
/* No need to reset PSTATE.UAO, hardware's already set it to 0 for us */
.endif /* \el == 0 */
mrs x22, elr_el1
mrs x23, spsr_el1
stp lr, x21, [sp, #S_LR]
/*
* In order to be able to dump the contents of struct pt_regs at the
* time the exception was taken (in case we attempt to walk the call
* stack later), chain it together with the stack frames.
*/
.if \el == 0
stp xzr, xzr, [sp, #S_STACKFRAME]
.else
stp x29, x22, [sp, #S_STACKFRAME]
.endif
add x29, sp, #S_STACKFRAME
#ifdef CONFIG_ARM64_SW_TTBR0_PAN
alternative_if_not ARM64_HAS_PAN
bl __swpan_entry_el\el
alternative_else_nop_endif
#endif
stp x22, x23, [sp, #S_PC]
/* Not in a syscall by default (el0_svc overwrites for real syscall) */
.if \el == 0
mov w21, #NO_SYSCALL
str w21, [sp, #S_SYSCALLNO]
.endif
/* Save pmr */
alternative_if ARM64_HAS_IRQ_PRIO_MASKING
mrs_s x20, SYS_ICC_PMR_EL1
str x20, [sp, #S_PMR_SAVE]
mov x20, #GIC_PRIO_IRQON | GIC_PRIO_PSR_I_SET
msr_s SYS_ICC_PMR_EL1, x20
alternative_else_nop_endif
/* Re-enable tag checking (TCO set on exception entry) */
#ifdef CONFIG_ARM64_MTE
alternative_if ARM64_MTE
SET_PSTATE_TCO(0)
alternative_else_nop_endif
#endif
/*
* Registers that may be useful after this macro is invoked:
*
* x20 - ICC_PMR_EL1
* x21 - aborted SP
* x22 - aborted PC
* x23 - aborted PSTATE
*/
.endm
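- First, x0~x29 are saved on the stack. The exception vector entry shown earlier has already moved SP down to the bottom of the stack frame:
sub sp, sp, #S_FRAME_SIZE
So x0 is saved at the bottom of the frame, x1 above it, and so on up to x29. stp (store pair) stores two 64-bit registers with a single instruction.
- Then the code handles the EL0 and EL1 cases separately.
When the exception was taken from EL0:
(1) the clear_gp_regs macro clears x0~x29;
(2) sp_el0 (the user stack pointer) is saved in x21;
(3) ldr_this_cpu is a macro with 3 parameters: the destination register (tsk), the per-CPU variable to read (__entry_task, the task currently running on this CPU), and a scratch register (x20). The loaded task_struct pointer is then written to SP_EL0 (msr sp_el0, tsk), which the kernel uses to find current while running at EL1;
(4) thread_info.flags is loaded into x19; TSK_TI_FLAGS is the offset of thread_info.flags within task_struct;
(5) disable_step_tsk is a macro: if the task has single-stepping enabled, it turns off software single-step in MDSCR_EL1.
When the exception was taken from EL1:
(1) x21 is set to sp + S_FRAME_SIZE, the value SP had before the exception (the top of the frame);
(2) get_current_task is a macro that reads the current task_struct pointer from the sp_el0 register;
(3) thread_info.addr_limit is read and saved in the orig_addr_limit slot of the stack frame;
(4) USER_DS is written to thread_info.addr_limit of the task_struct.
The following steps apply to both EL0 and EL1:
ELR_EL1 is copied into x22;
SPSR_EL1 is copied into x23;
LR and x21 are stored as a pair at S_LR: LR goes into regs[30] and x21 (the aborted SP) goes into the sp field.
Next, for an exception from EL0 the stackframe[] field is zeroed (xzr is the 64-bit zero register). For an exception from EL1, x29 and x22 (the interrupted frame pointer and PC) are stored into stackframe[].
Then x29 is pointed at the stackframe field, so the saved pt_regs becomes part of the frame-pointer chain used when walking the call stack.
Next, ELR_EL1 and SPSR_EL1 (already copied into x22 and x23) are stored into the pc and pstate fields of the frame.
For an exception from EL0, the current task_struct pointer has by this point been stored in SP_EL0 (step (3) of the EL0 case).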
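The bookkeeping above can be summarized as a C model of the fields kernel_entry writes into the frame. This is a hypothetical sketch to help read the assembly, not kernel code. It ignores addr_limit, PMR, MTE and the other per-feature steps:

```c
#include <stdint.h>

/* Simplified frame: only the fields kernel_entry fills for every exception. */
struct frame_model {
	uint64_t regs[31];      /* x0..x29, then regs[30] = LR */
	uint64_t sp;            /* aborted SP (x21 in kernel_entry) */
	uint64_t pc;            /* ELR_EL1 (x22) */
	uint64_t pstate;        /* SPSR_EL1 (x23) */
	uint64_t stackframe[2];
};

/*
 * Hypothetical model of kernel_entry. gpr[0..30] are the interrupted x0..x30;
 * aborted_sp is sp_el0 for EL0, or the SP before "sub sp, sp, #S_FRAME_SIZE"
 * for EL1.
 */
static void model_kernel_entry(struct frame_model *f, int from_el,
			       const uint64_t gpr[31], uint64_t aborted_sp,
			       uint64_t elr, uint64_t spsr)
{
	/* stp x0, x1 ... stp x28, x29: one pair every 16 bytes */
	for (int i = 0; i < 30; i += 2) {
		f->regs[i] = gpr[i];
		f->regs[i + 1] = gpr[i + 1];
	}
	f->regs[30] = gpr[30];          /* stp lr, x21, [sp, #S_LR] ... */
	f->sp = aborted_sp;             /* ... second half of that pair */
	if (from_el == 0) {             /* EL0: terminate the frame-pointer chain */
		f->stackframe[0] = 0;
		f->stackframe[1] = 0;
	} else {                        /* EL1: link to the interrupted kernel frame */
		f->stackframe[0] = gpr[29]; /* x29, the frame pointer */
		f->stackframe[1] = elr;     /* x22, the interrupted PC */
	}
	f->pc = elr;                    /* stp x22, x23, [sp, #S_PC] */
	f->pstate = spsr;
}
```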
5.1.2.2 Restoring the interrupt context
The kernel_exit macro restores the interrupted context:
.macro kernel_exit, el
.if \el != 0
disable_daif
/* Restore the task's original addr_limit. */
ldr x20, [sp, #S_ORIG_ADDR_LIMIT]
str x20, [tsk, #TSK_TI_ADDR_LIMIT]
/* No need to restore UAO, it will be restored from SPSR_EL1 */
.endif
/* Restore pmr */
alternative_if ARM64_HAS_IRQ_PRIO_MASKING
ldr x20, [sp, #S_PMR_SAVE]
msr_s SYS_ICC_PMR_EL1, x20
mrs_s x21, SYS_ICC_CTLR_EL1
tbz x21, #6, .L__skip_pmr_sync\@ // Check for ICC_CTLR_EL1.PMHE
dsb sy // Ensure priority change is seen by redistributor
.L__skip_pmr_sync\@:
alternative_else_nop_endif
ldp x21, x22, [sp, #S_PC] // load ELR, SPSR
#ifdef CONFIG_ARM64_SW_TTBR0_PAN
alternative_if_not ARM64_HAS_PAN
bl __swpan_exit_el\el
alternative_else_nop_endif
#endif
.if \el == 0
ldr x23, [sp, #S_SP] // load return stack pointer
msr sp_el0, x23
tst x22, #PSR_MODE32_BIT // native task?
b.eq 3f
#ifdef CONFIG_ARM64_ERRATUM_845719
alternative_if ARM64_WORKAROUND_845719
#ifdef CONFIG_PID_IN_CONTEXTIDR
mrs x29, contextidr_el1
msr contextidr_el1, x29
#else
msr contextidr_el1, xzr
#endif
alternative_else_nop_endif
#endif
3:
scs_save tsk, x0
/* No kernel C function calls after this as user keys are set. */
ptrauth_keys_install_user tsk, x0, x1, x2
apply_ssbd 0, x0, x1
.endif
msr elr_el1, x21 // set up the return data
msr spsr_el1, x22
ldp x0, x1, [sp, #16 * 0]
ldp x2, x3, [sp, #16 * 1]
ldp x4, x5, [sp, #16 * 2]
ldp x6, x7, [sp, #16 * 3]
ldp x8, x9, [sp, #16 * 4]
ldp x10, x11, [sp, #16 * 5]
ldp x12, x13, [sp, #16 * 6]
ldp x14, x15, [sp, #16 * 7]
ldp x16, x17, [sp, #16 * 8]
ldp x18, x19, [sp, #16 * 9]
ldp x20, x21, [sp, #16 * 10]
ldp x22, x23, [sp, #16 * 11]
ldp x24, x25, [sp, #16 * 12]
ldp x26, x27, [sp, #16 * 13]
ldp x28, x29, [sp, #16 * 14]
ldr lr, [sp, #S_LR]
add sp, sp, #S_FRAME_SIZE // restore sp
.if \el == 0
alternative_insn eret, nop, ARM64_UNMAP_KERNEL_AT_EL0
#ifdef CONFIG_UNMAP_KERNEL_AT_EL0
bne 4f
msr far_el1, x30
tramp_alias x30, tramp_exit_native
br x30
4:
tramp_alias x30, tramp_exit_compat
br x30
#endif
.else
/* Ensure any device/NC reads complete */
alternative_insn nop, "dmb sy", ARM64_WORKAROUND_1508412
eret
.endif
sb
.endm
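Its main steps are:
For an exception taken from EL1, DAIF is masked and the task's original thread_info.addr_limit is restored from the frame. In both cases, ELR and SPSR are then loaded from the S_PC slot of the stack frame into x21 and x22.
For an exception taken from EL0:
(1) the saved user stack pointer is loaded from the S_SP slot into x23 and written back to SP_EL0;
(2) the case of a 32-bit (compat) task is handled.
Next, the ELR value read from the frame is written back to ELR_EL1,
and the SPSR value is written back to SPSR_EL1.
Then x0~x29 are restored from the frame in order,
LR is restored,
and SP is moved back up to where it was before the exception (add sp, sp, #S_FRAME_SIZE).
Finally, ERET returns from the exception, using ELR_ELx and SPSR_ELx to restore the PC and PSTATE.
Note that nothing in the IRQ path explicitly disables IRQs. When an interrupt is taken, ARM64 hardware automatically saves PSTATE into SPSR_ELx and sets all four DAIF bits in PSTATE. This masks debug exceptions, SError, IRQ and FIQ.
When handling is finished, ERET restores PSTATE from the saved SPSR_ELx, which unmasks IRQs again.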
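The DAIF behavior described above can be modeled with the PSTATE mask bits D = bit 9, A = bit 8, I = bit 7 and F = bit 6. It has three parts: hardware masking on entry, the enable_da_f macro used by el1_interrupt_handler, and ERET restoring PSTATE. The struct and functions are hypothetical; only the bit positions come from the architecture:

```c
#include <stdint.h>

/* PSTATE/SPSR mask bits (PSR_D_BIT..PSR_F_BIT in the arm64 headers). */
#define PSR_D (1u << 9) /* debug exceptions */
#define PSR_A (1u << 8) /* SError */
#define PSR_I (1u << 7) /* IRQ */
#define PSR_F (1u << 6) /* FIQ */
#define PSR_DAIF (PSR_D | PSR_A | PSR_I | PSR_F)

struct cpu_model {
	uint32_t pstate;
	uint32_t spsr_el1;
};

/* Exception entry: hardware saves PSTATE to SPSR_EL1 and masks D, A, I, F. */
static void take_irq(struct cpu_model *c)
{
	c->spsr_el1 = c->pstate;
	c->pstate |= PSR_DAIF;
}

/* enable_da_f: unmask D, A and F, leaving IRQ masked. */
static void enable_da_f(struct cpu_model *c)
{
	c->pstate &= ~(PSR_D | PSR_A | PSR_F);
}

/* ERET: PSTATE is restored from SPSR_EL1, so IRQs are unmasked again. */
static void eret(struct cpu_model *c)
{
	c->pstate = c->spsr_el1;
}
```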
5.2 ARM64 high-level interrupt handling (C part)
5.2.1 el1_irq
Back to the assembly of el1_irq:
SYM_CODE_START_LOCAL_NOALIGN(el1_irq)
kernel_entry 1
el1_interrupt_handler handle_arch_irq
kernel_exit 1
SYM_CODE_END(el1_irq)
el1_interrupt_handler is a macro. Its definition, together with el0_interrupt_handler, is:
.macro el1_interrupt_handler, handler:req
enable_da_f
mov x0, sp
bl enter_el1_irq_or_nmi
irq_handler \handler
#ifdef CONFIG_PREEMPTION
ldr x24, [tsk, #TSK_TI_PREEMPT] // get preempt count
alternative_if ARM64_HAS_IRQ_PRIO_MASKING
/*
* DA_F were cleared at start of handling. If anything is set in DAIF,
* we come back from an NMI, so skip preemption
*/
mrs x0, daif
orr x24, x24, x0
alternative_else_nop_endif
cbnz x24, 1f // preempt count != 0 || NMI return path
bl arm64_preempt_schedule_irq // irq en/disable is done inside
1:
#endif
mov x0, sp
bl exit_el1_irq_or_nmi
.endm
.macro el0_interrupt_handler, handler:req
user_exit_irqoff
enable_da_f
tbz x22, #55, 1f
bl do_el0_irq_bp_hardening
1:
irq_handler \handler
.endm
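- enable_da_f is a macro that uses msr to clear the D (debug), A (SError) and F (FIQ) mask bits in PSTATE, unmasking those three. IRQs stay masked because an IRQ is being handled right now. Unmasking them would bring in complicated nested-interrupt problems, and the current Linux kernel does not support nested interrupts.
- irq_handler is also a macro; it dispatches the interrupt and is analyzed in detail below.
- With CONFIG_PREEMPTION, for an interrupt taken in kernel space, the code checks before returning whether the interrupted task may be preempted. It does this by looking at the preempt_count field in the task's thread_info. A preempt_count of 0 means the task can be preempted safely, and arm64_preempt_schedule_irq is called to perform a preemptive reschedule. (For an interrupt taken in user space, the return path only checks whether the interrupted task needs rescheduling, not whether preemption is allowed, because user mode never disables preemption.)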
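One detail makes the single `cbnz x24` check work, based on arch/arm64/include/asm/preempt.h in 5.10. `ldr x24, [tsk, #TSK_TI_PREEMPT]` loads a 64-bit word whose low 32 bits are the preempt count. The upper 32 bits hold an inverted need_resched flag: non-zero means no reschedule is pending. A zero word therefore means both "preemption allowed" and "reschedule pending". The helper names below are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 32 is SET while NO reschedule is pending (inverted need_resched). */
#define MODEL_PREEMPT_NEED_RESCHED (1ull << 32)

/* Build the 64-bit word that "ldr x24, [tsk, #TSK_TI_PREEMPT]" would load. */
static uint64_t make_preempt_word(uint32_t count, bool need_resched)
{
	uint64_t w = count;                  /* low 32 bits: preempt count */
	if (!need_resched)
		w |= MODEL_PREEMPT_NEED_RESCHED; /* upper half non-zero: nothing to do */
	return w;
}

/*
 * Mirrors "orr x24, x24, x0; cbnz x24, 1f". Pass daif = 0 when priority
 * masking is not in use; with it, a non-zero DAIF marks a pseudo-NMI return.
 */
static bool should_preempt_on_irq_exit(uint64_t preempt_word, uint64_t daif)
{
	return (preempt_word | daif) == 0;
}
```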
Next, expand the irq_handler macro:
/*
* Interrupt handling.
*/
.macro irq_handler, handler:req
ldr_l x1, \handler // the \handler argument is handle_arch_irq
mov x0, sp
irq_stack_entry
blr x1
irq_stack_exit
.endm
As covered in the GICv3 initialization section, the driver installs the architecture-level IRQ handler:
--gic_of_init
--gic_init_bases
--set_handle_irq(gic_handle_irq);
#ifdef CONFIG_GENERIC_IRQ_MULTI_HANDLER
int __init set_handle_irq(void (*handle_irq)(struct pt_regs *))
{
if (handle_arch_irq)
return -EBUSY;
handle_arch_irq = handle_irq;
return 0;
}
#endif
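Here is a user-space sketch of the same registration pattern. The first root irqchip claims the global pointer, and irq_handler later loads it with `ldr_l x1, handle_arch_irq` and calls it with `blr x1`. The model_ names are hypothetical:

```c
#include <errno.h>
#include <stddef.h>

struct pt_regs; /* opaque in this sketch */

/* Hypothetical stand-in for the global handle_arch_irq pointer. */
static void (*model_handle_arch_irq)(struct pt_regs *);

static int model_set_handle_irq(void (*handle_irq)(struct pt_regs *))
{
	if (model_handle_arch_irq)
		return -EBUSY;          /* only the first root irqchip may install it */
	model_handle_arch_irq = handle_irq;
	return 0;
}

static int model_calls;
static void model_gic_handle_irq(struct pt_regs *regs) { (void)regs; model_calls++; }
static void model_other_handler(struct pt_regs *regs) { (void)regs; }
```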
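handle_arch_irq is a global function pointer. The GICv3 driver points it at gic_handle_irq(), the entry point of GICv3's C-level interrupt handling. The irq_handler assembly does the following:
(1) loads the global function pointer handle_arch_irq into x1;
(2) copies sp (the pt_regs frame) into x0 as the first argument;
(3) irq_stack_entry is a macro that switches from the task's kernel stack to the IRQ stack. IRQ stacks are per-CPU and are set up in init_IRQ -> init_irq_stacks. The macro also saves the task's kernel stack pointer in x19 so it can be restored after handling;
(4) branches to the address in x1, i.e. gic_handle_irq();
(5) irq_stack_exit switches back from the IRQ stack to the task's kernel stack, using the pointer saved in x19.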
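In 5.10, irq_stack_entry only switches to the per-CPU IRQ stack when SP is currently on the task's kernel stack. It tests this by XOR-ing SP with the task's stack base (TSK_STACK) and masking off the low THREAD_SIZE bits. The sketch below models that check, assuming 16 KB, THREAD_SIZE-aligned task stacks; the function names are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_THREAD_SIZE    (16u * 1024)      /* assumed: 16 KB task stacks */
#define MODEL_IRQ_STACK_SIZE MODEL_THREAD_SIZE /* IRQ_STACK_SIZE == THREAD_SIZE */

/* "eor x25, x25, x19; and x25, x25, #~(THREAD_SIZE - 1); cbnz x25, 9998f" */
static bool on_task_stack(uint64_t sp, uint64_t task_stack_base)
{
	return ((sp ^ task_stack_base) & ~(uint64_t)(MODEL_THREAD_SIZE - 1)) == 0;
}

/* Returns the SP the handler runs on; *saved_sp plays the role of x19. */
static uint64_t model_irq_stack_entry(uint64_t sp, uint64_t task_stack_base,
				      uint64_t irq_stack_ptr, uint64_t *saved_sp)
{
	*saved_sp = sp;                          /* mov x19, sp */
	if (!on_task_stack(sp, task_stack_base))
		return sp;                       /* not on the task stack: keep SP */
	return irq_stack_ptr + MODEL_IRQ_STACK_SIZE; /* top of the per-CPU IRQ stack */
}
```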
5.2.2 gic_handle_irq
static asmlinkage void __exception_irq_entry gic_handle_irq(struct pt_regs *regs)
{
u32 irqnr;
irqnr = do_read_iar(regs); // the CPU reads ICC_IAR1_EL1 in the GIC CPU interface to acknowledge the interrupt and obtain its hardware INTID ----(1)
/* Check for special IDs first */
if ((irqnr >= 1020 && irqnr <= 1023))
return;
if (gic_supports_nmi() &&
unlikely(gic_read_rpr() == GICD_INT_NMI_PRI)) {
gic_handle_nmi(irqnr, regs);
return;
}
if (gic_prio_masking_enabled()) {
gic_pmr_mask_irqs();
gic_arch_enable_irqs();
}
if (static_branch_likely(&supports_deactivate_key))
gic_write_eoir(irqnr); // write the INTID to ICC_EOIR1_EL1 (end of interrupt) ----(2)
else
isb();
if (handle_domain_irq(gic_data.domain, irqnr, regs)) { // main interrupt handling ----(3)
WARN_ONCE(true, "Unexpected interrupt received!\n");
gic_deactivate_unhandled(irqnr); // write the INTID to ICC_DIR_EL1 to deactivate the interrupt ----(4)
}
}
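About (2) and (4): why does the code write EOI as soon as the interrupt arrives, before the handler has even run?
The GICv3 architecture requires software to tell the interrupt controller when it has finished with an interrupt, so that the state machine can move to its next state.
GICv3 splits interrupt completion into two stages:
Priority Drop: lowers the running priority back to the value it had before the interrupt was taken.
Deactivation: updates the state machine of the interrupt being handled, moving it from active to inactive.
The two stages can be done together or separately, depending on EOImode:
If EOImode = 0, a write to ICC_EOIR1_EL1 performs both stages (priority drop and deactivation).
If EOImode = 1, a write to ICC_EOIR1_EL1 only performs the priority drop. To signal that the interrupt is fully handled, software must also write ICC_DIR_EL1.
So the answer to the question: when supports_deactivate_key is enabled (the default when the kernel is entered at EL2), the Linux GICv3 driver uses gic_eoimode1_chip, the EOImode=1 irq_chip. The write to ICC_EOIR1_EL1 in (2) therefore only drops the priority and does not end the interrupt. The interrupt stays active until the chip's irq_eoi callback writes ICC_DIR_EL1 after the handler has run, or until (4) if no handler claimed it.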
static int gic_irq_domain_map(struct irq_domain *d, unsigned int irq,
irq_hw_number_t hw)
{
struct irq_chip *chip = &gic_chip;
/* ... */
if (static_branch_likely(&supports_deactivate_key))
chip = &gic_eoimode1_chip;
/* ... */
}
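The difference between the two EOI modes can be captured in a small state model of one CPU interface handling one interrupt. With EOImode=1, writing EOIR only drops the priority; the interrupt stays active, so it cannot be taken again until DIR is written. The struct and functions are hypothetical and model only these two flags:

```c
#include <stdbool.h>

/* Hypothetical model of one CPU interface handling one interrupt. */
struct cpuif_model {
	bool eoimode1;      /* ICC_CTLR_EL1.EOImode */
	bool prio_dropped;  /* running priority restored */
	bool active;        /* interrupt still active */
};

static void ack(struct cpuif_model *c)        /* read ICC_IAR1_EL1 */
{
	c->prio_dropped = false;
	c->active = true;
}

static void write_eoir(struct cpuif_model *c) /* write ICC_EOIR1_EL1 */
{
	c->prio_dropped = true;
	if (!c->eoimode1)
		c->active = false;  /* EOImode=0: drop and deactivate together */
}

static void write_dir(struct cpuif_model *c)  /* write ICC_DIR_EL1 */
{
	/* Software should only use DIR with EOImode=1; the model ignores other writes. */
	if (c->eoimode1)
		c->active = false;
}
```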
Next, the main body of the interrupt controller's handling, handle_domain_irq:
--handle_domain_irq(gic_data.domain, irqnr, regs)
--__handle_domain_irq(domain, hwirq, true, regs);
/**
* __handle_domain_irq - Invoke the handler for a HW irq belonging to a domain
* @domain: The domain where to perform the lookup
* @hwirq: The HW irq number to convert to a logical one
* @lookup: Whether to perform the domain lookup or not
* @regs: Register file coming from the low-level handling code
*
* Returns: 0 on success, or -EINVAL if conversion has failed
*/
int __handle_domain_irq(struct irq_domain *domain, unsigned int hwirq,
bool lookup, struct pt_regs *regs)
{
struct pt_regs *old_regs = set_irq_regs(regs);
unsigned int irq = hwirq;
int ret = 0;
irq_enter(); // explicitly tell the kernel that we are entering interrupt context
#ifdef CONFIG_IRQ_DOMAIN
if (lookup)
irq = irq_find_mapping(domain, hwirq); // map the hardware INTID hwirq to the virtual IRQ number virq ----(1)
#endif
/*
* Some hardware gives randomly wrong interrupts. Rather
* than crashing, do something sensible.
*/
if (unlikely(!irq || irq >= nr_irqs)) {
ack_bad_irq(irq);
ret = -EINVAL;
} else {
generic_handle_irq(irq); // handle the interrupt ----(2)
}
irq_exit(); // counterpart of irq_enter: interrupt handling is done
set_irq_regs(old_regs);
return ret;
}
(1) irq_find_mapping() uses hwirq as the key to look up the radix tree rooted at domain->revmap_tree. It finds the matching irq_data and returns the virtual IRQ number virq stored in it.
(2) generic_handle_irq(irq);
/**
* generic_handle_irq - Invoke the handler for a particular irq
* @irq: The irq number to handle
*
*/
int generic_handle_irq(unsigned int irq)
{
struct irq_desc *desc = irq_to_desc(irq); // use virq as the index to find the irq_desc ----(1)
struct irq_data *data;
if (!desc)
return -EINVAL;
data = irq_desc_get_irq_data(desc);
if (WARN_ON_ONCE(!in_irq() && handle_enforce_irqctx(data)))
return -EPERM;
generic_handle_irq_desc(desc);// ----(2)
return 0;
}
/*
* Architectures call this to let the generic IRQ layer
* handle an interrupt.
*/
static inline void generic_handle_irq_desc(struct irq_desc *desc)
{
desc->handle_irq(desc); // call the flow handler that desc->handle_irq points to
}
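(1) As covered in the irq_domain section, the kernel keeps the mapping between virtual IRQ numbers and irq_desc structures in a radix tree, irq_desc_tree. Here virq is used as the index to find the irq_desc.
(2) generic_handle_irq_desc(desc) calls the flow handler pointed to by desc->handle_irq, which performs the actual handling.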
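The two lookups (hwirq -> virq in the domain, then virq -> irq_desc) and the final indirect call can be sketched in user-space C. Flat arrays stand in for the kernel's radix trees, and every name here is hypothetical:

```c
#include <stddef.h>

#define MODEL_NR_IRQS  64
#define MODEL_NR_HWIRQ 1024

struct model_desc;
typedef void (*model_flow_handler)(struct model_desc *desc);

/* Hypothetical, much simplified stand-in for struct irq_desc. */
struct model_desc {
	unsigned int virq;
	unsigned long hwirq;
	model_flow_handler handle_irq;  /* e.g. handle_fasteoi_irq */
	unsigned int count;
};

/* Flat arrays stand in for domain->revmap_tree and irq_desc_tree. */
static unsigned int hwirq_to_virq[MODEL_NR_HWIRQ];     /* 0 = no mapping */
static struct model_desc *virq_to_desc[MODEL_NR_IRQS];

static void model_map(unsigned long hwirq, struct model_desc *desc)
{
	hwirq_to_virq[hwirq] = desc->virq;
	virq_to_desc[desc->virq] = desc;
}

/* Like __handle_domain_irq(): hwirq -> virq -> desc -> desc->handle_irq(desc). */
static int model_handle_domain_irq(unsigned long hwirq)
{
	unsigned int virq = hwirq < MODEL_NR_HWIRQ ? hwirq_to_virq[hwirq] : 0;
	if (!virq || virq >= MODEL_NR_IRQS || !virq_to_desc[virq])
		return -1;              /* the kernel calls ack_bad_irq() and returns -EINVAL */
	struct model_desc *desc = virq_to_desc[virq];
	desc->handle_irq(desc);
	return 0;
}

static void model_fasteoi(struct model_desc *desc) { desc->count++; }
```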
--generic_handle_irq_desc(desc);
--desc->handle_irq(desc);
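Where does the callback behind desc->handle_irq come from?
Recall how a hardware INTID is mapped to a virtual IRQ number. While enumerating each device node under a bus in the device tree, the kernel sets up that device's interrupt mapping: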
----of_device_alloc(np, bus_id, parent);
----of_irq_to_resource_table(np, res, num_irq)// kernel/drivers/of/irq.c
----of_irq_to_resource(dev, i, res)
----of_irq_get(dev, index)
----irq_create_of_mapping // kernel/irq/irqdomain.c, where the actual mapping is done
----irq_create_fwspec_mapping(&fwspec); // struct irq_fwspec fwspec is obtained by parsing the device tree
----virq = irq_domain_alloc_irqs(domain, 1, NUMA_NO_NODE, fwspec);// (3) allocate a virtual IRQ number: the first free bit in the allocated_irqs bitmap becomes virq; an irq_desc is allocated and the virq-to-irq_desc mapping is recorded in the radix tree
----ret = irq_domain_alloc_irqs_hierarchy(domain, virq, nr_irqs, arg); // (5) call the alloc method of the irq_domain ops, which fills in irq_data: irq_data->hwirq = hwirq; irq_data->chip = chip; where chip points to the set of low-level GICv3 operations defined in the GICv3 driver
----gic_irq_domain_alloc
----gic_irq_domain_map
----irq_domain_set_info(d, irq, hw, chip, d->host_data,handle_fasteoi_irq, NULL, NULL);
For SPI and LPI interrupts, irq_domain_set_info() sets desc->handle_irq to handle_fasteoi_irq(). PPIs get handle_percpu_devid_irq() instead.
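The INTID ranges behind this choice are fixed by the GICv3 architecture. The classifier below ignores the GICv3.1 extended PPI/SPI ranges. The handler names it reports follow the text above, and SGIs are deliberately left out of the sketch. All names are hypothetical:

```c
#include <stdbool.h>
#include <string.h>

enum intid_class { INTID_SGI, INTID_PPI, INTID_SPI, INTID_SPECIAL, INTID_RESERVED, INTID_LPI };

/* GICv3 INTID ranges (extended PPI/SPI ranges of GICv3.1 not modeled). */
static enum intid_class classify_intid(unsigned int intid)
{
	if (intid < 16)
		return INTID_SGI;
	if (intid < 32)
		return INTID_PPI;
	if (intid < 1020)
		return INTID_SPI;
	if (intid < 1024)
		return INTID_SPECIAL;   /* 1020-1023: gic_handle_irq() returns early */
	if (intid < 8192)
		return INTID_RESERVED;
	return INTID_LPI;
}

/* Flow handler installed for each class, as described in the text. */
static const char *flow_handler_name(unsigned int intid)
{
	switch (classify_intid(intid)) {
	case INTID_PPI:
		return "handle_percpu_devid_irq";
	case INTID_SPI:
	case INTID_LPI:
		return "handle_fasteoi_irq";
	default:
		return "not covered by this sketch"; /* SGIs, special and reserved IDs */
	}
}

static bool uses_fasteoi(unsigned int intid)
{
	return strcmp(flow_handler_name(intid), "handle_fasteoi_irq") == 0;
}
```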
kernel/irq/chip.c
/**
* handle_fasteoi_irq - irq handler for transparent controllers
* @desc: the interrupt description structure for this irq
*
* Only a single callback will be issued to the chip: an ->eoi()
* call when the interrupt has been serviced. This enables support
* for modern forms of interrupt handlers, which handle the flow
* details in hardware, transparently.
*/
void handle_fasteoi_irq(struct irq_desc *desc)
{
struct irq_chip *chip = desc->irq_data.chip;
raw_spin_lock(&desc->lock);
if (!irq_may_run(desc))
goto out;
desc->istate &= ~(IRQS_REPLAY | IRQS_WAITING);
/*
* If its disabled or no action available
* then mask it and get out of here:
*/
if (unlikely(!desc->action || irqd_irq_disabled(&desc->irq_data))) { // ----(1)
desc->istate |= IRQS_PENDING;
mask_irq(desc);
goto out;
}
kstat_incr_irqs_this_cpu(desc); // the per-CPU counts shown by cat /proc/interrupts are incremented here
if (desc->istate & IRQS_ONESHOT) // for a oneshot interrupt, mask the source with mask_irq()
mask_irq(desc);
handle_irq_event(desc);// ----(2)
cond_unmask_eoi_irq(desc, chip);// ----(3)
raw_spin_unlock(&desc->lock);
return;
out:
if (!(chip->flags & IRQCHIP_EOI_IF_HANDLED))
chip->irq_eoi(&desc->irq_data);
raw_spin_unlock(&desc->lock);
}
EXPORT_SYMBOL_GPL(handle_fasteoi_irq);
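(1) If the interrupt has no action registered, or it has been disabled (IRQD_IRQ_DISABLED), its state is set to IRQS_PENDING and mask_irq() masks it.
(2) handle_irq_event() is the core of interrupt handling.
(3) When handling completes, the irq_eoi callback in the interrupt controller's irq_chip sends an EOI, telling the interrupt controller that the interrupt has been handled.
The call chain is handle_irq_event() -> handle_irq_event_percpu() -> __handle_irq_event_percpu():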
irqreturn_t __handle_irq_event_percpu(struct irq_desc *desc, unsigned int *flags)
{
irqreturn_t retval = IRQ_NONE;
unsigned int irq = desc->irq_data.irq;
struct irqaction *action;
record_irq_time(desc);
for_each_action_of_desc(desc, action) { // walk the action list of the irq_desc; each request_threaded_irq() registration adds its irqaction to this list
irqreturn_t res;
/*
* If this IRQ would be threaded under force_irqthreads, mark it so.
*/
if (irq_settings_can_thread(desc) &&
!(action->flags & (IRQF_NO_THREAD | IRQF_PERCPU | IRQF_ONESHOT)))
lockdep_hardirq_threaded();
trace_irq_handler_entry(irq, action);
res = action->handler(irq, action->dev_id); // call each action->handler in turn
trace_irq_handler_exit(irq, action, res);
if (WARN_ONCE(!irqs_disabled(),"irq %u handler %pS enabled interrupts\n",
irq, action->handler))
local_irq_disable();
switch (res) {
case IRQ_WAKE_THREAD:
/*
* Catch drivers which return WAKE_THREAD but
* did not set up a thread function
*/
if (unlikely(!action->thread_fn)) {
warn_no_thread(irq, action);
break;
}
__irq_wake_thread(desc, action); // IRQ_WAKE_THREAD means a threaded handler exists; wake the corresponding IRQ thread here
fallthrough; /* to add to randomness */
case IRQ_HANDLED:
*flags |= action->flags;
break;
default:
break;
}
retval |= res;
}
return retval;
}
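The model below is a simplified user-space version of this loop on a shared line. One device has a plain handler. The other has a threaded handler whose primary handler returns IRQ_WAKE_THREAD. The m_ types are hypothetical; only the three return values mirror include/linux/irqreturn.h:

```c
#include <stddef.h>

/* Values mirror enum irqreturn in include/linux/irqreturn.h. */
typedef enum { M_IRQ_NONE = 0, M_IRQ_HANDLED = 1, M_IRQ_WAKE_THREAD = 2 } m_irqreturn_t;

struct m_action {
	m_irqreturn_t (*handler)(int irq, void *dev_id);
	void *dev_id;
	int has_thread_fn;
	int thread_woken;       /* stands in for __irq_wake_thread() */
	struct m_action *next;
};

/* Simplified __handle_irq_event_percpu(): call every action, OR the results. */
static unsigned int m_handle_event(int irq, struct m_action *head)
{
	unsigned int retval = M_IRQ_NONE;
	for (struct m_action *a = head; a; a = a->next) {
		m_irqreturn_t res = a->handler(irq, a->dev_id);
		if (res == M_IRQ_WAKE_THREAD && a->has_thread_fn)
			a->thread_woken = 1;
		retval |= res;
	}
	return retval;
}

/* Hypothetical device on a shared line: each handler checks its own device. */
struct m_dev { int pending; };

static m_irqreturn_t m_quick_handler(int irq, void *dev_id)
{
	struct m_dev *d = dev_id;
	(void)irq;
	if (!d->pending)
		return M_IRQ_NONE;      /* not our device */
	d->pending = 0;
	return M_IRQ_HANDLED;
}

static m_irqreturn_t m_threaded_primary(int irq, void *dev_id)
{
	struct m_dev *d = dev_id;
	(void)irq;
	return d->pending ? M_IRQ_WAKE_THREAD : M_IRQ_NONE;
}
```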
The flowchart of ARM64 high-level interrupt handling:

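6、Registering interrupt handlers
Peripheral drivers usually need to register an interrupt handler. Since Linux 2.6.30, the kernel has provided request_threaded_irq() for registering threaded interrupt handlers. Its purpose is to reduce the impact of interrupt handling on system real-time latency.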
kernel/kernel/irq/manage.c
/**
* request_threaded_irq - allocate an interrupt line
* @irq: Interrupt line to allocate
* @handler: Function to be called when the IRQ occurs.
* Primary handler for threaded interrupts
* If NULL and thread_fn != NULL the default
* primary handler is installed
* @thread_fn: Function called from the irq handler thread
* If NULL, no irq thread is created
* @irqflags: Interrupt type flags
* @devname: An ascii name for the claiming device
* @dev_id: A cookie passed back to the handler function
*
* This call allocates interrupt resources and enables the
* interrupt line and IRQ handling. From the point this
* call is made your handler function may be invoked. Since
* your handler function must clear any interrupt the board
* raises, you must take care both to initialise your hardware
* and to set up the interrupt handler in the right order.
*
* If you want to set up a threaded irq handler for your device
* then you need to supply @handler and @thread_fn. @handler is
* still called in hard interrupt context and has to check
* whether the interrupt originates from the device. If yes it
* needs to disable the interrupt on the device and return
* IRQ_WAKE_THREAD which will wake up the handler thread and run
* @thread_fn. This split handler design is necessary to support
* shared interrupts.
*
* Dev_id must be globally unique. Normally the address of the
* device data structure is used as the cookie. Since the handler
* receives this value it makes sense to use it.
*
* If your interrupt is shared you must pass a non NULL dev_id
* as this is required when freeing the interrupt.
*
* Flags:
*
* IRQF_SHARED Interrupt is shared
* IRQF_TRIGGER_* Specify active edge(s) or level
*
*/
int request_threaded_irq(unsigned int irq, irq_handler_t handler,
irq_handler_t thread_fn, unsigned long irqflags,
const char *devname, void *dev_id)
{
struct irqaction *action;
struct irq_desc *desc;
int retval;
if (irq == IRQ_NOTCONNECTED)
return -ENOTCONN;
/*
* Sanity-check: shared interrupts must pass in a real dev-ID,
* otherwise we'll have trouble later trying to figure out
* which interrupt is which (messes up the interrupt freeing
* logic etc).
*
* Also IRQF_COND_SUSPEND only makes sense for shared interrupts and
* it cannot be set along with IRQF_NO_SUSPEND.
*/
if (((irqflags & IRQF_SHARED) && !dev_id) ||
(!(irqflags & IRQF_SHARED) && (irqflags & IRQF_COND_SUSPEND)) ||
((irqflags & IRQF_NO_SUSPEND) && (irqflags & IRQF_COND_SUSPEND)))
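return -EINVAL; // shared interrupts must pass a dev_id: without it, the handler cannot tell which device raised the interrupt; usually the handler uses dev_id to read the device's registers and check whether it is the source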
desc = irq_to_desc(irq); // get the irq_desc for this IRQ number
if (!desc)
return -EINVAL;
if (!irq_settings_can_request(desc) ||
WARN_ON(irq_settings_is_per_cpu_devid(desc)))
return -EINVAL;
if (!handler) {
if (!thread_fn)
return -EINVAL;
handler = irq_default_primary_handler; // handler and thread_fn cannot both be NULL; with only thread_fn, install the default primary handler
}
action = kzalloc(sizeof(struct irqaction), GFP_KERNEL);
if (!action)
return -ENOMEM;
action->handler = handler;
action->thread_fn = thread_fn;
action->flags = irqflags;
action->name = devname;
action->dev_id = dev_id; // allocate an irqaction and fill in its fields
retval = irq_chip_pm_get(&desc->irq_data);
if (retval < 0) {
kfree(action);
return retval;
}
retval = __setup_irq(irq, desc, action);// ----(1)
if (retval) {
irq_chip_pm_put(&desc->irq_data);
kfree(action->secondary);
kfree(action);
}
#ifdef CONFIG_DEBUG_SHIRQ_FIXME
if (!retval && (irqflags & IRQF_SHARED)) {
/*
* It's a shared IRQ -- the driver ought to be prepared for it
* to happen immediately, so let's make sure....
* We disable the irq to make sure that a 'real' IRQ doesn't
* run in parallel with our fake.
*/
unsigned long flags;
disable_irq(irq);
local_irq_save(flags);
handler(irq, dev_id);
local_irq_restore(flags);
enable_irq(irq);
}
#endif
return retval;
}
EXPORT_SYMBOL(request_threaded_irq);
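(1) request_threaded_irq() then calls __setup_irq() to complete the registration. It adds the initialized irqaction to the action list of the irq_desc and performs the setup needed for threaded interrupts. From then on, each time the interrupt fires, the kernel finds the irq_desc from virq and calls the handlers of the irqactions on that list.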
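The argument checks at the top of request_threaded_irq() can be modeled in user space. This includes the fallback to a default primary handler when only thread_fn is given. The flag values here are chosen for the model and are not the kernel's, and all names are hypothetical:

```c
#include <errno.h>
#include <stddef.h>

/* Flag bits chosen for this model only; real values are in include/linux/interrupt.h. */
#define M_IRQF_SHARED       0x1u
#define M_IRQF_NO_SUSPEND   0x2u
#define M_IRQF_COND_SUSPEND 0x4u

#define M_IRQ_WAKE_THREAD 2

typedef int (*m_handler_t)(int irq, void *dev_id);

/* Like irq_default_primary_handler(): just ask for the IRQ thread to run. */
static int m_default_primary(int irq, void *dev_id)
{
	(void)irq;
	(void)dev_id;
	return M_IRQ_WAKE_THREAD;
}

/* Checks mirrored from request_threaded_irq(); *out_handler gets the primary handler. */
static int m_check_request(m_handler_t handler, m_handler_t thread_fn,
			   unsigned int flags, void *dev_id, m_handler_t *out_handler)
{
	if (((flags & M_IRQF_SHARED) && !dev_id) ||
	    (!(flags & M_IRQF_SHARED) && (flags & M_IRQF_COND_SUSPEND)) ||
	    ((flags & M_IRQF_NO_SUSPEND) && (flags & M_IRQF_COND_SUSPEND)))
		return -EINVAL;
	if (!handler) {
		if (!thread_fn)
			return -EINVAL;  /* handler and thread_fn both NULL */
		handler = m_default_primary;
	}
	*out_handler = handler;
	return 0;
}

static int m_thread_fn(int irq, void *dev_id) { (void)irq; (void)dev_id; return 1; }
```

In a real driver, passing NULL as handler together with a thread_fn usually also requires IRQF_ONESHOT, so the line stays masked until the thread has run. That check lives in __setup_irq() and is not modeled here.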
7、Top half and bottom half of interrupt handling (TODO)
8、ITS introduction and code analysis
8.1 ITS overview
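As mentioned in the GICv3 introduction, GICv3 defines a new interrupt type, the LPI (Locality-specific Peripheral Interrupt). An LPI is a message-based interrupt: it is no longer signaled over an interrupt line.
GICv3 defines two ways to generate LPIs:
- Forwarding: a peripheral writes the Redistributor register GICR_SETLPIR to send an LPI directly.
- Using an ITS: the ITS (Interrupt Translation Service) is optional in GICv3. It receives interrupts from peripherals, translates them into LPI INTIDs and sends them to the appropriate Redistributor.
In general, using an ITS to implement LPIs is recommended. The ITS provides many features and is more efficient when there are many interrupt sources.
A peripheral raises an LPI by writing to the GITS_TRANSLATER register. The ITS then obtains 2 pieces of information:
EventID: the value written to GITS_TRANSLATER; it identifies the event the peripheral is signaling.
DeviceID: identifies which peripheral raised the LPI.
Using the DeviceID and EventID, the ITS walks a series of tables to obtain the LPI INTID. It then uses further table lookups to find the target CPU of the interrupt.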
8.2 The ITS table
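The ITS uses three types of tables to translate and route LPIs:
device table: maps a DeviceID to an interrupt translation table
interrupt translation table: maps an EventID to an INTID and to the collection that INTID belongs to
collection table: maps a collection to a Redistributor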

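So the complete ITS flow is as follows.
After a peripheral writes to GITS_TRANSLATER (providing the DeviceID and EventID), the ITS:
1、uses the DeviceID to select the entry at index DeviceID in the device table, which gives the location of the interrupt translation table;
2、uses the EventID to select the entry at index EventID in the interrupt translation table, which gives the INTID and the collection ID;
3、uses the collection ID to select the entry at that index in the collection table, which gives the Redistributor mapping;
4、forwards the interrupt to the Redistributor given by that collection entry.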
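The four steps can be modeled as three chained table lookups. This is a toy sketch: the table sizes are arbitrary, real ITS tables are sized and located through GITS_TYPER and GITS_BASER<n>, and every name is hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy sizes for the model only. */
#define M_NR_DEVICES     4
#define M_NR_EVENTS      8
#define M_NR_COLLECTIONS 4

struct m_itt_entry        { bool valid; uint32_t intid; uint16_t icid; };            /* EventID -> INTID, collection */
struct m_device_entry     { bool valid; struct m_itt_entry *itt; uint32_t nr_events; }; /* DeviceID -> ITT */
struct m_collection_entry { bool valid; uint64_t rdbase; };                          /* ICID -> Redistributor */

struct m_its {
	struct m_device_entry devices[M_NR_DEVICES];
	struct m_collection_entry collections[M_NR_COLLECTIONS];
};

/* Steps 1-4: returns false if any lookup misses (translation fails). */
static bool m_its_translate(const struct m_its *its, uint32_t device_id, uint32_t event_id,
			    uint32_t *intid, uint64_t *rdbase)
{
	if (device_id >= M_NR_DEVICES || !its->devices[device_id].valid)
		return false;
	const struct m_device_entry *dte = &its->devices[device_id];   /* step 1 */
	if (event_id >= dte->nr_events || !dte->itt[event_id].valid)
		return false;
	const struct m_itt_entry *ite = &dte->itt[event_id];            /* step 2 */
	if (ite->icid >= M_NR_COLLECTIONS || !its->collections[ite->icid].valid)
		return false;
	*intid = ite->intid;
	*rdbase = its->collections[ite->icid].rdbase;                   /* step 3 */
	return true;                                                    /* step 4: forward to *rdbase */
}
```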

8.3 The ITS Command
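The ITS is controlled through ITS commands. The command queue is a circular buffer defined by three registers:
GITS_CBASER: specifies the base address and size of the command queue. The queue must be 64KB aligned and its size must be a multiple of 4KB; each command entry in the queue is 32 bytes. The register also specifies the cacheability and shareability the ITS uses when accessing the queue.
GITS_CREADR: points to the next command the ITS will process.
GITS_CWRITER: points to the entry in the queue where the next new command should be written.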

ITS commands are sent during ITS initialization, LPI reporting and similar flows. For the details of each ITS command, see GICv3_Software_Overview_Official_Release_B.
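The producer/consumer relationship between CWRITER (advanced by software) and CREADR (advanced by the ITS) can be modeled as a ring buffer. The sketch assumes Linux's 64 KB queue (ITS_CMD_QUEUE_SZ). Like the driver's its_queue_full(), it keeps one slot free so that "full" and "empty" can be told apart. The names are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

#define M_CMD_SIZE   32u                     /* each ITS command is 32 bytes */
#define M_QUEUE_SIZE (64u * 1024)            /* assumed 64 KB queue, as in Linux */
#define M_NR_ENTRIES (M_QUEUE_SIZE / M_CMD_SIZE)

/* Byte offsets into the queue, as held in GITS_CREADR / GITS_CWRITER. */
struct m_cmdq { uint32_t creadr; uint32_t cwriter; };

/* Full when the next write slot would reach the read slot (one slot kept free). */
static bool m_queue_full(const struct m_cmdq *q)
{
	uint32_t widx = q->cwriter / M_CMD_SIZE;
	uint32_t ridx = q->creadr / M_CMD_SIZE;
	return (widx + 1) % M_NR_ENTRIES == ridx;
}

static bool m_queue_empty(const struct m_cmdq *q) { return q->creadr == q->cwriter; }

/* Software posts one command by advancing CWRITER, wrapping at the end. */
static bool m_post_command(struct m_cmdq *q)
{
	if (m_queue_full(q))
		return false;
	q->cwriter = (q->cwriter + M_CMD_SIZE) % M_QUEUE_SIZE;
	return true;
}

/* The ITS consumes one command by advancing CREADR. */
static void m_its_consume(struct m_cmdq *q)
{
	if (!m_queue_empty(q))
		q->creadr = (q->creadr + M_CMD_SIZE) % M_QUEUE_SIZE;
}
```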
8.4 ITS code analysis
With the role and flow of the ITS understood, the following analyzes the ITS code in the Linux kernel.
The ITS code lives in drivers/irqchip/irq-gic-v3-its.c.
8.4.1 ITS data structures
/*
* The ITS structure - contains most of the infrastructure, with the
* top-level MSI domain, the command queue, the collections, and the
* list of devices writing to it.
*
* dev_alloc_lock has to be taken for device allocations, while the
* spinlock must be taken to parse data structures such as the device
* list.
*/
struct its_node {
raw_spinlock_t lock;
struct mutex dev_alloc_lock;
struct list_head entry;
void __iomem *base;
void __iomem *sgir_base;
phys_addr_t phys_base;
struct its_cmd_block *cmd_base;
struct its_cmd_block *cmd_write;
struct its_baser tables[GITS_BASER_NR_REGS];
struct its_collection *collections;
struct fwnode_handle *fwnode_handle;
u64 (*get_msi_base)(struct its_device *its_dev);
u64 typer;
u64 cbaser_save;
u32 ctlr_save;
u32 mpidr;
struct list_head its_device_list;
u64 flags;
unsigned long list_nr;
int numa_node;
unsigned int msi_domain_flags;
u32 pre_its_base; /* for Socionext Synquacer */
int vlpi_redist_offset;
};
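base: virtual address of the ITS node's registers
phys_base: physical address of the ITS node
cmd_base: base address of the command queue
cmd_write: address of the next command slot in the queue
tables[]: descriptions of the ITS tables, such as the device table or the vPE table
collections: points to the its_collection array, which mainly records the Redistributor (GICR) address each collection maps to
cbaser_save: saved GITS_CBASER value
ctlr_save: saved GITS_CTLR value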
8.4.2 ITS initialization
ITS initialization happens during GIC initialization.
It mainly allocates memory for the ITS device table and collection table, and enables the ITS.
static int __init gic_init_bases(void __iomem *dist_base,
struct redist_region *rdist_regs,
u32 nr_redist_regions,
u64 redist_stride,
struct fwnode_handle *handle)
{
/* ... */
if (gic_dist_supports_lpis()) { // ----(1)
its_init(handle, &gic_data.rdists, gic_data.domain); // ----(2)
its_cpu_init(); // ----(3)
}
/* ... */
}
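(1) The ITS requires the kernel config option CONFIG_ARM_GIC_V3_ITS. If the hardware supports LPIs, the ITS is initialized. LPI support is indicated by bit 17 of the GICD_TYPER (Interrupt Controller Type Register) register.
(2) its_init is the entry point of ITS initialization. Note its third argument: it makes the GIC domain the parent domain of the ITS domain.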

(3) After its_init completes, its_cpu_init performs additional ITS setup, such as enabling LPIs and binding ITS collections to their target Redistributors.
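The test in (1) amounts to a few bit checks. Going by the 5.10 driver, gic_dist_supports_lpis() also honors the irqchip.gicv3_nolpi command-line option. In the sketch below, plain parameters replace the kernel's config and register reads, and the names are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

#define M_GICD_TYPER_LPIS (1u << 17)   /* GICD_TYPER.LPIS */

/*
 * Sketch of the decision in gic_dist_supports_lpis(): ITS support must be
 * built in, the Distributor must report LPIS, and LPIs must not be disabled.
 */
static bool m_dist_supports_lpis(bool its_configured, uint32_t gicd_typer, bool nolpi)
{
	return its_configured && (gicd_typer & M_GICD_TYPER_LPIS) && !nolpi;
}
```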
8.4.2.1 The ITS initialization function its_init
int __init its_init(struct fwnode_handle *handle, struct rdists *rdists,
struct irq_domain *parent_domain)
{
struct device_node *of_node;
struct its_node *its;
bool has_v4 = false;
bool has_v4_1 = false;
int err;
gic_rdists = rdists;
its_parent = parent_domain;
of_node = to_of_node(handle);
if (of_node)
its_of_probe(of_node); // ----(1)
else
its_acpi_probe();
if (list_empty(&its_nodes)) {
pr_warn("ITS: No ITS available, not enabling LPIs\n");
return -ENXIO;
}
err = allocate_lpi_tables(); // ----(2)
if (err)
return err;
list_for_each_entry(its, &its_nodes, entry) {
has_v4 |= is_v4(its);
has_v4_1 |= is_v4_1(its);
}
/* Don't bother with inconsistent systems */
if (WARN_ON(!has_v4_1 && rdists->has_rvpeid))
rdists->has_rvpeid = false;
if (has_v4 & rdists->has_vlpis) {
const struct irq_domain_ops *sgi_ops;
if (has_v4_1)
sgi_ops = &its_sgi_domain_ops;
else
sgi_ops = NULL;
if (its_init_vpe_domain() ||
its_init_v4(parent_domain, &its_vpe_domain_ops, sgi_ops)) {
rdists->has_vlpis = false;
pr_err("ITS: Disabling GICv4 support\n");
}
}
register_syscore_ops(&its_syscore_ops);
return 0;
}
(1)its_of_probe
--its_of_probe
--its_probe_one
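--its_force_quiescent // put the ITS into the quiescent state; changing the ITS configuration while it is not quiescent is unsafe
--kzalloc its_node and init // allocate an its_node and initialize it
--its_alloc_tables // allocate memory for the device table and the vPE table
--its_alloc_collections // allocate the array that records the GICR address each collection maps to; each ITS has a collection table, which can be held in ITS registers (GITS_TYPER.HCC reports how many) or in memory (described by a GITS_BASER register)
--its_init_domain // initialize the ITS domain and register its operations
The ITS probe mainly initializes the its_node structure, allocates memory for the ITS tables, initializes the ITS domain and registers the ITS domain operations.
During ITS domain initialization, the host_data of the ITS irq_domain is set to a msi_domain_info. The ITS device is created in the info->ops.prepare step, and that is when memory for the interrupt translation table is allocated.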
(2)allocate_lpi_tables
--allocate_lpi_tables
--its_setup_lpi_prop_table
--its_allocate_prop_table
--its_lpi_init
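The ITS serves LPIs, so ITS initialization also sets up the two tables LPIs need
(the LPI configuration table and the LPI pending tables), and then initializes LPI allocation (its_lpi_init).
These two tables are what set LPIs apart from other interrupt types: an LPI's configuration and state are kept in tables in memory, not in GIC registers.
LPI configuration table:
Its base address is set by the GICR_PROPBASER register.
Each LPI takes one byte (bits[7:0]) in the configuration table, which holds the enable bit (bit 0) and the priority (bits[7:2]).
When an LPI is sent to a Redistributor, the Redistributor must first read the LPI's configuration from memory. To speed this up, a Redistributor may cache LPI configuration.
With such a cache there are two copies of an LPI's configuration, one in memory and one in the Redistributor's cache. If software changes the configuration in memory, the cached copy in the Redistributor must be invalidated, for example with the ITS INV or INVALL command.
When the table is not cache-coherent with the GIC, the driver also cleans the CPU data cache to the point of coherency after updating it, through this interface:
gic_flush_dcache_to_poc()
LPI pending table:
Its base address is set by the GICR_PENDBASER register, which also sets memory attributes of the pending table such as shareability and cacheability.
The pending table records whether each LPI is pending.
It is written by the Redistributor, and each LPI takes one bit:
0: the LPI is not pending
1: the LPI is pending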
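The two table formats can be sketched as byte and bit arithmetic, assuming the GICv3 layouts described above. A configuration byte holds priority bits[7:2] and enable bit 0; bit 1 is RES1 in the spec, and Linux names it LPI_PROP_GROUP1. The configuration table is indexed from the first LPI (INTID 8192). The pending table has one bit per INTID counted from 0, so its first 1 KB covers INTIDs 0-8191 and is not used for LPIs. All names are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

#define M_LPI_BASE         8192u
#define M_LPI_PROP_ENABLED (1u << 0)   /* bit 0: enable */
#define M_LPI_PROP_RES1    (1u << 1)   /* bit 1: RES1 (Linux: LPI_PROP_GROUP1) */

/* Configuration byte for one LPI: priority in bits[7:2], enable in bit 0. */
static uint8_t m_lpi_prop(uint8_t prio, bool enabled)
{
	return (uint8_t)((prio & 0xfc) | M_LPI_PROP_RES1 | (enabled ? M_LPI_PROP_ENABLED : 0));
}

/* Byte index of an LPI in the configuration table. */
static uint32_t m_prop_index(uint32_t intid) { return intid - M_LPI_BASE; }

/* Pending table: bit (intid % 8) of byte (intid / 8). */
static void m_pend_set(uint8_t *pend, uint32_t intid)
{
	pend[intid / 8] |= (uint8_t)(1u << (intid % 8));
}

static bool m_pend_test(const uint8_t *pend, uint32_t intid)
{
	return pend[intid / 8] & (1u << (intid % 8));
}
```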
8.4.2.2 its_cpu_init
--its_cpu_init
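--its_cpu_init_lpis // set up the LPI configuration and pending tables and enable LPIs
--its_cpu_init_collections // bind each collection to its target Redistributor
--its_cpu_init_collection
--its_send_mapc // send the ITS MAPC command, which maps a collection to its target Redistributor
--its_send_invall // send the ITS INVALL command so that the cached LPI configuration matches the configuration in memory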
8.4.3 ITS interrupt reporting
As with the GIC, for a device attached to the ITS the kernel calls the ITS domain's operations:
static const struct irq_domain_ops its_domain_ops = {
.alloc = its_irq_domain_alloc,
.free = its_irq_domain_free,
.activate = its_irq_domain_activate,
.deactivate = its_irq_domain_deactivate,
};
References
1、GICv3_Software_Overview_Official_Release_B
2、corelink_gic600_generic_interrupt_controller_technical_reference_manual_100336_0106_00_en
3、IHI0069D_gic_architecture_specification
4、ARM GICv3 interrupt controller
https://blog.csdn.net/yhb1047818384/article/details/86708769#comments_17151284
5、ARM GICv3 GIC code analysis
https://blog.csdn.net/yhb1047818384/article/details/87561438
6、Arm_Architecture_Reference_Manual_Armv8_for Armv8-A_architecture_profile
7、奔跑吧Linux內核
Please credit the source when reposting. Link to this article: https://www.uj5u.com/qita/385596.html
