
01. PAXI Protocol Overview and Positioning

What Is PAXI

PAXI (Protocol of Accelerated eXchange Interconnect) is a chip-to-chip interconnect IP core developed by Shanghai UniVista Industrial Software Group (UniVista).

[DOC] Definition of the PAXI SUE2.0 Core:

"The PAXI (Protocol of Accelerated eXchange Interconnect) core is designed to attach a set of Ethernet cores to the advanced microcontroller bus architecture (AMBA) advanced extensible interface (AXI) bus."

"Transactions initiated by an AXI Master are handled by the PAXI core AXI Slave interface and generate requests on the RC Link interface. Requests received on the Ethernet Link are handled by the PAXI core AXI Master interface and generate AXI transactions."

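In short, PAXI transparently maps on-chip AXI4 bus transactions onto an Ethernet physical link, providing high-speed chip-to-chip (C2C) interconnect. From the viewpoint of upper-layer software or the NoC, the remote chip's memory space appears as part of the locally addressable AXI space.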
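
To make the "remote memory looks local" idea concrete, the following minimal sketch decodes a local AXI address into a (remote chip, offset) pair. The window base, window size, chip count, and all names are hypothetical values chosen for illustration; the source does not describe PAXI's actual address map.

```python
from typing import NamedTuple, Optional

# Hypothetical address map: these constants are illustrative assumptions,
# not values taken from the PAXI documentation.
REMOTE_BASE = 0x80_0000_0000   # start of the region mapped to remote chips
WINDOW_SIZE = 0x10_0000_0000   # bytes of address space per remote chip
NUM_REMOTE_CHIPS = 4


class RemoteTarget(NamedTuple):
    chip_index: int  # which remote chip the access would be forwarded to
    offset: int      # byte offset inside that chip's window


def decode(addr: int) -> Optional[RemoteTarget]:
    """Return the remote target for addr, or None if addr is outside the remote windows."""
    if addr < REMOTE_BASE:
        return None  # below the remote region: treated as a local access
    chip, offset = divmod(addr - REMOTE_BASE, WINDOW_SIZE)
    if chip >= NUM_REMOTE_CHIPS:
        return None  # above the last window
    return RemoteTarget(chip, offset)
```

In this model, a master that issues an access to `REMOTE_BASE + 2 * WINDOW_SIZE + 8` is addressing byte 8 of the third remote chip without knowing that an Ethernet link sits in between.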

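[Inferred] Core changes in SUE2.0 relative to the previous version (Point-to-point AXI Wrapper):

  • The name changed from "Point-to-point AXI Wrapper" to "Protocol of Accelerated eXchange Interconnect", reflecting its evolution from a simple wrapper into a complete interconnect protocol
  • PAXI no longer connects to the MAC directly; it connects through the new RC Link transport layer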

Position in the Technology Stack

PAXI is the **Transaction Layer** component of UniVista's complete intelligent-computing interconnect solution (SUE Protocol):

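Full technology stack (SUE2.0):
Application / NoC -- AXI4/APB3 interface
|
PAXI Core -- Transaction layer: AXI <-> Flit encode/decode
|
RC Link -- Transport layer: reliable delivery / Go-Back-N / rate control / CBFC
|
CESOC (MAC/PCS/FEC) -- Data link layer: Ethernet controller
|
SerDes (112G PAM4) -- Physical layer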
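
The RC Link row above lists Go-Back-N among the transport-layer functions. As a reminder of how that scheme behaves, here is a minimal textbook Go-Back-N sender. It is a generic sketch, not RC Link's implementation: the class name, the window size, and the cumulative-ACK convention are all assumptions made for illustration.

```python
from collections import deque


class GoBackNSender:
    """Textbook Go-Back-N sender; the default window size is an assumption."""

    def __init__(self, window: int = 4) -> None:
        self.window = window
        self.base = 0           # oldest unacknowledged sequence number
        self.next_seq = 0       # next sequence number to assign
        self.unacked = deque()  # (seq, payload) pairs awaiting acknowledgement

    def can_send(self) -> bool:
        return self.next_seq - self.base < self.window

    def send(self, payload: bytes) -> int:
        if not self.can_send():
            raise RuntimeError("window full")
        seq = self.next_seq
        self.unacked.append((seq, payload))
        self.next_seq += 1
        return seq

    def on_ack(self, ack_seq: int) -> None:
        """Cumulative ACK: frames up to and including ack_seq were delivered."""
        while self.unacked and self.unacked[0][0] <= ack_seq:
            self.unacked.popleft()
        # Never move the base past frames that have not been sent yet.
        self.base = min(max(self.base, ack_seq + 1), self.next_seq)

    def on_timeout(self) -> list:
        """Return every unacknowledged frame, in order, for retransmission."""
        return list(self.unacked)
```

The defining property is in `on_timeout`: the sender goes back to the oldest unacknowledged frame and resends everything after it, which keeps the receiver simple at the cost of retransmitting frames that may have arrived intact.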

[Inferred] Comparison with the previous architecture:

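| Layer | Previous version (v2R0p6) | SUE2.0 |
| --- | --- | --- |
| Transaction layer | PAXI Core (protocol conversion + flow control) | PAXI Core (protocol conversion only) |
| Transport layer | RDMA Engine (RoCEv2, optional) | RC Link (mandatory, built into the protocol stack) |
| Data link layer | CESOC (MAC/PCS/FEC) | CESOC (MAC/PCS/FEC) |
| Physical layer | SerDes (112G PAM4) | SerDes (112G PAM4) |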

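Key change: RC Link takes over most of the functions of the previous RDMA Engine and becomes a mandatory component of the protocol stack. PAXI no longer manages per-DA credit flow control directly; that responsibility moves down to RC Link's CBFC mechanism.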

Core Feature List

The following features are quoted from the PAXI SUE2.0 UserGuide V2R0P5 (1.2.1 General Features):

Interfaces and Protocols

  • [DOC] Support AXI4 interface for NOC, support all 5 channels
  • [DOC] Support APB3 interface for NOC
  • [DOC] Chip-to-Chip AXI interface connection over Ethernet link
  • [DOC] Compatible with Layer 2 switches
  • [DOC] Configurable MAC DA and VC through AXI user bus

Performance

  • [DOC] Ultra low latency down to 150ns(400G) between AXI-to-AXI
  • [DOC] Support 200G/400G Ethernet MAC
  • [DOC] Support overrate mode(220G/440G MAC)
  • [DOC] Configurable Memory ready latency (1~3)

Flow Control and Virtual Channels

  • [DOC] Base-on credit flow control with RC Link
  • [DOC] Support 8 CBFC and PFC VC
  • [DOC] Configurable RX AXI Master OST number
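
As an illustration of credit-based flow control across virtual channels, the sketch below models a sender that may transmit on a VC only while it holds credits for that VC. Only the channel count (8) comes from the feature list; the class, the credit unit, and the initial credit value are hypothetical, and the real CBFC behavior of RC Link is not described in this section.

```python
NUM_VCS = 8  # the feature list cites 8 CBFC/PFC virtual channels


class CreditSender:
    """Per-VC credit counter; one credit per unit sent is an assumed granularity."""

    def __init__(self, initial_credits: int) -> None:
        # Each VC starts with the same (assumed) number of credits.
        self.credits = [initial_credits] * NUM_VCS

    def try_send(self, vc: int, cost: int = 1) -> bool:
        """Consume credits and return True, or return False if the VC is blocked."""
        if self.credits[vc] < cost:
            return False
        self.credits[vc] -= cost
        return True

    def on_credit_return(self, vc: int, amount: int) -> None:
        """The receiver freed buffer space on this VC and handed credits back."""
        self.credits[vc] += amount
```

Because each VC has its own counter, a blocked channel does not stall traffic on the others, which is the usual motivation for pairing credits with virtual channels.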

Multicast

  • [DOC] Support Multicast Frame
  • [DOC] Support maximum 8 multicast group
  • [DOC] Each multi-cast group support maximum 16 devices
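
The two multicast limits above (at most 8 groups, at most 16 devices per group) can be captured in a small membership table. This is a sketch of how configuration software might validate those limits; the class, the zero-based group numbering, and the integer device identifiers are assumptions, not part of the PAXI register interface.

```python
from typing import Dict, List, Set

MAX_GROUPS = 8              # from the feature list
MAX_DEVICES_PER_GROUP = 16  # from the feature list


class MulticastTable:
    """Hypothetical software-side view of multicast group membership."""

    def __init__(self) -> None:
        self.groups: Dict[int, Set[int]] = {}

    def add_member(self, group_id: int, device_id: int) -> None:
        # Group IDs 0..7 are an assumed numbering.
        if not 0 <= group_id < MAX_GROUPS:
            raise ValueError(f"group {group_id} out of range")
        members = self.groups.setdefault(group_id, set())
        if device_id not in members and len(members) >= MAX_DEVICES_PER_GROUP:
            raise ValueError(f"group {group_id} is full")
        members.add(device_id)

    def members(self, group_id: int) -> List[int]:
        return sorted(self.groups.get(group_id, ()))
```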

Reliability

  • [DOC] Support error free transmission with RC LINK E2E retry enabled

Debug and Measurement

  • [DOC] Remote chip register access
  • [DOC] Near-end loopbacks at RC LINK interface
  • [DOC] Latency measurement(Round-Trip Time include noc)
  • [DOC] Support Internal pattern generator
  • [DOC] Various status indicators for debug
  • [DOC] DFX pin out

Process and Integration

  • [DOC] Commercial simulator and synthesizer supported
  • [DOC] Synthesizable at 3nm/4nm/5nm/7nm/8nm/12nm/14nm/16nm
  • [DOC] Configurable synchronizer stage

Key Performance Metrics

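| Metric | Value | Source |
| --- | --- | --- |
| AXI-to-AXI end-to-end latency | As low as 150 ns @ 400G (Mode 0: P2P direct connection, no NoC, remote PAXI loopback) | [DOC] PAXI Features |
| Supported MAC bandwidth | 200G / 400G | [DOC] PAXI Features |
| Overrate mode bandwidth | 220G / 440G | [DOC] PAXI Features |
| CBFC/PFC virtual channels | 8 | [DOC] PAXI Features |
| Multicast groups | Up to 8 | [DOC] PAXI Features |
| Devices per multicast group | Up to 16 | [DOC] PAXI Features |
| RX AXI Master OST | Configurable | [DOC] PAXI Features |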
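
Two pieces of arithmetic help put these figures in context: how much data a 400G link can carry during a 150 ns interval, and how much headroom overrate mode adds. The helper names below are illustrative, and the calculation uses raw line rate, ignoring framing and FEC overhead, so the first result is an upper bound rather than a measured payload figure.

```python
def bits_in_flight(rate_gbps: int, interval_ns: int) -> int:
    """Bits carried at line rate during the interval (1 Gb/s * 1 ns = 1 bit)."""
    return rate_gbps * interval_ns


def overrate_margin_percent(nominal_gbps: int, overrate_gbps: int) -> float:
    """Extra bandwidth of overrate mode relative to the nominal rate."""
    return (overrate_gbps - nominal_gbps) * 100 / nominal_gbps


bits = bits_in_flight(400, 150)
print(bits, bits // 8)                    # 60000 bits, i.e. 7500 bytes
print(overrate_margin_percent(200, 220))  # 10.0
print(overrate_margin_percent(400, 440))  # 10.0
```

At 400G, roughly 7.5 KB can be on the wire during one 150 ns interval, and both overrate settings correspond to a 10% increase over the nominal MAC rate.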

[Inferred] Comparison with the previous version's metrics:

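| Metric | Previous version | SUE2.0 | Change |
| --- | --- | --- | --- |
| Virtual channels | 3 (Mgmt/Ctrl/Data) | 8 CBFC/PFC VCs | Expanded; switched to standard CBFC |
| OST | 128 or 256 (pick one) | Configurable | More flexible |
| DA management | 128 DAs, per-DA credit | Managed via Multi DA Enable | Simplified |
| Flow control mechanism | Credit inside PAXI | RC Link CBFC | Moved down to the transport layer |
| Retransmission mechanism | MAC L1/L2 Retry | RC Link E2E Retry | End-to-end |
| Multicast | Not supported | 8 groups x 16 devices | New |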

C2C Mode vs. Switch Mode

[DOC] PAXI supports two networking modes:

C2C Direct-Connect Mode

  Chip A                          Chip B
+---------+  SerDes <-> SerDes  +---------+
|  PAXI   |                     |  PAXI   |
| RC Link |---------------------| RC Link |
|  CESOC  |                     |  CESOC  |
+---------+                     +---------+
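  • The two chips are connected directly through their Ethernet PHYs
  • Lowest latency: 150 ns @ 400G (measurement condition: Mode 0, P2P without NOC, remote PAXI loopback)
  • [DOC] Latency is measured in two modes: Mode 0 (P2P direct connection, no NoC, remote PAXI loopback) and Mode 1 (P2P with NOC). The 150 ns figure is the Mode 0 result; latency is higher in real deployments that include NoC routing
  • Typical use: interconnecting chips on the same board or within the same module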

Switch Mode

  Chip A                         Chip B
+---------+    +----------+    +---------+
|  PAXI   |----|    L2    |----|  PAXI   |
| RC Link |    | Ethernet |    | RC Link |
+---------+    |  Switch  |    +---------+
               +----------+
  Chip C            |
+---------+         |
|  PAXI   |---------+
| RC Link |
+---------+
  • [DOC] Compatible with Layer 2 switches
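  • Chips are networked through standard Layer 2 Ethernet switches
  • Latency is higher because of switch forwarding delay
  • Typical use: multi-chip clusters that need flexible scale-out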

[Inferred] SUE2.0 supports different switch-compatibility modes through the multiple packet formats of the RC Link layer (Standard/AFH_GEN1/AFH_GEN2_16b/AFH_Lite).

Ordering Guarantees

[DOC] From 2.1 PAXI Partitions:

"Note PAXI not guarantee the order for same address access of read and write transfer."

"AXI transaction which send to different DA will not guarantee the order PAXI received on AXI interface."

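This means:

  • No ordering is guaranteed between read and write transactions to the same address
  • No ordering is guaranteed between AXI transactions sent to different DAs
  • Upper-layer software or the NoC must manage ordering dependencies itself
  • [Inferred] This is a design choice that maximizes throughput by avoiding pipeline stalls across DAs and between reads and writes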
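
One way upper-layer software can manage a read-after-write dependency itself is to hold back a read until every earlier write to the same address has received its write response. The sketch below tracks that condition. It is a simplified model under stated assumptions: the class and method names are hypothetical, addresses are matched exactly rather than by overlapping range, and it assumes the write response seen by the issuing master indicates that the remote write has completed.

```python
from typing import Dict


class OrderingGuard:
    """Tracks writes that have been issued but not yet acknowledged."""

    def __init__(self) -> None:
        # address -> number of writes still waiting for a response
        self.pending_writes: Dict[int, int] = {}

    def issue_write(self, addr: int) -> None:
        self.pending_writes[addr] = self.pending_writes.get(addr, 0) + 1

    def write_response(self, addr: int) -> None:
        """Record the response for one earlier write to addr."""
        remaining = self.pending_writes[addr] - 1
        if remaining:
            self.pending_writes[addr] = remaining
        else:
            del self.pending_writes[addr]

    def read_allowed(self, addr: int) -> bool:
        """A dependent read may be issued only when no write to addr is pending."""
        return addr not in self.pending_writes
```

Reads to unrelated addresses are never blocked by this guard, so the throughput cost is paid only where a real dependency exists.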

References

[DOC] SUE2.0 References:

  • MAC 200G Specification (Version 0.3)
  • AMBA AXI and ACE Protocol Specification
  • AMBA 3 APB Protocol Version 1.0 Specification