
L0 Entry -- Entry & Orchestration Layer

Feature Overview

L0 is the system entry layer, responsible for:

  • API endpoints: REST endpoints + WebSocket real-time push
  • Config conversion: the single dict -> EvalConfig conversion point
  • Pipeline orchestration: routes to the Math or G5 pipeline by mode and coordinates L1-L5 through the full evaluation flow
  • Task management: async queue, progress callbacks, experiment management
  • Data storage: SQLite persistence (Experiment / EvaluationResult)

Out of scope: none of the actual partitioning / evaluation / reporting computation happens here.

Module Inventory

| Module | Responsibility |
| --- | --- |
| api/ | FastAPI route definitions (10 sub-modules: preset CRUD, simulation, tasks, experiments, etc.) |
| engine.py | run_evaluation() / run_evaluation_from_request() |
| eval_config.py | EvalConfig dataclass + build_eval_config() |
| config_loader.py | YAML preset loading (load_chip, load_model, load_topology, load_benchmark) |
| config_schema.py | Pydantic request models + config validation |
| types.py | Core enums such as DataType / ParallelMode |
| tasks.py | TaskManager task queue (ThreadPoolExecutor) |
| websocket.py | WebSocket management (/ws/tasks) |
| compat.py | Frontend compatibility layer (Gantt/Stats format conversion) |
| topology_format.py | Topology format conversion (grouped_pods format parsing) |
| storage/database.py | SQLAlchemy ORM (Experiment, EvaluationResult) |

EvalConfig Detailed Design

Sub-config Structure

```python
from dataclasses import dataclass

@dataclass
class MLAConfig:
    q_lora_rank: int        # Q low-rank dim (1536)
    kv_lora_rank: int       # KV low-rank dim (512)
    qk_nope_head_dim: int   # non-RoPE head dim (128)
    qk_rope_head_dim: int   # RoPE head dim (64)
    v_head_dim: int         # V head dim (128)
    mla_mode: str           # "standard" | "absorb" | "auto"

@dataclass
class MoEConfig:
    num_routed_experts: int     # total routed experts (256)
    num_shared_experts: int     # shared experts (1)
    num_activated_experts: int  # activated experts (8)
    intermediate_size: int      # MoE FFN intermediate size (2048)

@dataclass
class ModelConfig:
    name: str
    hidden_size: int; num_layers: int; num_attention_heads: int
    vocab_size: int; intermediate_size: int
    num_dense_layers: int; num_moe_layers: int
    mla: MLAConfig; moe: MoEConfig
    # runtime parameters (injected from DeploymentConfig)
    weight_dtype: str; activation_dtype: str
    batch: int; is_prefill: bool
    q_seq_len: int; kv_seq_len: int  # passed in from DeploymentConfig

@dataclass
class TopologyOverrides:
    c2c_bandwidth_gbps: float; c2c_latency_us: float
    b2b_bandwidth_gbps: float; b2b_latency_us: float
    r2r_bandwidth_gbps: float; r2r_latency_us: float
    p2p_bandwidth_gbps: float; p2p_latency_us: float
    switch_latency_us: float; cable_latency_us: float
    memory_read_latency_us: float; memory_write_latency_us: float
    noc_latency_us: float; die_to_die_latency_us: float

@dataclass
class CommOverrides:
    bw_utilization: float  # bandwidth utilization (0-1)
    sync_lat_us: float     # synchronization latency (us)

@dataclass
class DeploymentConfig:
    tp: int; pp: int; dp: int; ep: int; moe_tp: int
    seq_len: int; batch_size: int
    q_seq_len: int; kv_seq_len: int  # derived from inference_config
    enable_tp_sp: bool; enable_ring_attention: bool
    enable_zigzag: bool  # Zigzag pipeline optimization
    enable_tbo: bool     # MoE compute/communication Tile-Block Overlap
    embed_tp: int; lmhead_tp: int; comm_protocol: int
    kv_cache_rate: float; is_prefill: bool

@dataclass
class BoardConfig:
    num_chips: int; chip_memory_gb: int; inter_chip_bw_gbps: float

@dataclass
class InferenceConfig:
    batch_size: int; input_seq_length: int; output_seq_length: int
    weight_dtype: str; activation_dtype: str
```
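To make the nesting concrete, here is a minimal sketch of how the sub-configs compose. Only a subset of the fields above is reproduced, and the values are illustrative, taken from the DeepSeek-V3 defaults noted in the field comments; `hidden_size=7168` is an assumption, not stated in this document.

```python
from dataclasses import dataclass

# Minimal subset of the sub-configs above, just to show the nesting.
@dataclass
class MLAConfig:
    q_lora_rank: int
    kv_lora_rank: int
    mla_mode: str

@dataclass
class MoEConfig:
    num_routed_experts: int
    num_activated_experts: int

@dataclass
class ModelConfig:
    name: str
    hidden_size: int
    mla: MLAConfig   # attention sub-config nests inside the model config
    moe: MoEConfig   # MoE sub-config likewise

model = ModelConfig(
    name="DeepSeek-V3",
    hidden_size=7168,  # illustrative value (assumption)
    mla=MLAConfig(q_lora_rank=1536, kv_lora_rank=512, mla_mode="auto"),
    moe=MoEConfig(num_routed_experts=256, num_activated_experts=8),
)
```

Keeping the sub-configs as separate dataclasses lets build_eval_config() validate each group of fields independently before assembling the full EvalConfig.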

build_eval_config() Conversion Logic

Inputs:
chip_config: dict <- the first chip in topology_config.chips
model_config: dict <- benchmark_config.model (nested YAML format)
topology_config: dict <- the full topology (including interconnect.links + comm_params)
manual_parallelism: dict <- frontend parallelism config (including enable_zigzag, enable_tbo, etc.)
inference_config: dict <- benchmark_config.inference

Conversion:
1. topology_config.interconnect.links -> TopologyOverrides (c2c/b2b/r2r/p2p bw+lat)
2. topology_config.interconnect.comm_params -> TopologyOverrides (switch/cable/memory/noc/d2d lat)
3. topology_config.interconnect.comm_params -> CommOverrides (bw_utilization, sync_lat)
4. manual_parallelism + inference_config -> DeploymentConfig
- input_seq_length -> q_seq_len (prefill) / 1 (decode)
- input_seq_length -> kv_seq_len
- enable_tbo, enable_zigzag passed through as-is
5. model_config (nested) + DeploymentConfig runtime params -> ModelConfig
6. topology_config.pods structure -> BoardConfig (num_chips, chip_memory)
7. inference_config -> InferenceConfig
8. raise ValueError whenever any required field is missing

Output:
EvalConfig (the single source of truth for the whole pipeline)
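Steps 4 and 8 above can be sketched as follows. This is a hypothetical, simplified stand-in (the function name `build_eval_config_sketch`, the exact required-field set, and the dict-shaped return are all assumptions); it only demonstrates the q_seq_len/kv_seq_len derivation and the fail-fast ValueError behavior.

```python
def build_eval_config_sketch(manual_parallelism: dict, inference_config: dict) -> dict:
    """Sketch of step 4 (DeploymentConfig assembly) and step 8 (fail fast)."""
    required = ("tp", "pp", "dp", "ep")
    for key in required:
        if key not in manual_parallelism:
            # Step 8: missing fields raise instead of falling back to defaults.
            raise ValueError(f"missing required parallelism field: {key}")

    is_prefill = inference_config.get("is_prefill", True)
    input_len = inference_config["input_seq_length"]
    return {
        **{k: manual_parallelism[k] for k in required},
        # Step 4: prefill processes the whole input at once (q_seq_len =
        # input length); decode emits one token per step (q_seq_len = 1).
        "q_seq_len": input_len if is_prefill else 1,
        # The KV cache always covers the full input.
        "kv_seq_len": input_len,
        # enable_tbo / enable_zigzag are passed through unchanged.
        "enable_tbo": manual_parallelism.get("enable_tbo", False),
        "enable_zigzag": manual_parallelism.get("enable_zigzag", False),
    }
```

Failing fast on missing fields keeps EvalConfig trustworthy as the single source of truth: a half-populated config never reaches the downstream pipeline.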

run_evaluation() Orchestration Flow

```python
def run_evaluation(eval_config: EvalConfig, progress_callback=None) -> dict:
    # L1: build the WorkloadIR
    model = DeepSeekV3Model.from_model_config(eval_config.model)
    ir = model.to_ir()

    # L2: load the ChipSpec
    chip = ChipSpecImpl.from_config(name, eval_config.chip_config)

    # L3 (shared): parallelism planning
    deployment = DeploymentSpec(...)  # from eval_config.deployment
    board = BoardSpec(...)            # from eval_config.board
    dist_model = ParallelismPlanner(deployment, board).plan(ir)

    # Route by mode
    if eval_config.mode == "math":
        # L3 Math: tiling + scheduling
        tile_plan = TilingPlanner(chip, l4_evaluator).plan(dist_model)
        exec_plan = Scheduler().plan(dist_model, tile_plan)
        # L4 Math: evaluation
        hardware = _build_hardware_spec(chip, eval_config)
        engine_result = EvaluationEngine().evaluate(
            exec_plan, dist_model, hardware,
            prefill_ops=prefill_op_ids,
            is_prefill=True,
            deployment_config=deployment_dict,
        )
    else:  # mode == "g5"
        # L3 G5: instruction emission
        core_program = G5InstructionEmitter().emit(dist_model)
        # L4 G5: simulation
        engine_result = G5Pipeline().run(core_program, chip)

    # L5: reporting
    report = ReportingEngine().run(engine_result, config=run_config)
    return report
```
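The progress_callback hook can be exercised with a trivial stand-in. The `(stage, percent)` callback signature below is an assumption (the document does not show it), and `run_evaluation_stub` is a toy replacement for the real orchestrator, kept only to illustrate the calling convention.

```python
# Hypothetical caller sketch for the progress_callback hook.
def print_progress(stage: str, pct: float) -> None:
    print(f"[{pct:5.1f}%] {stage}")

def run_evaluation_stub(eval_config: dict, progress_callback=None) -> dict:
    """Toy stand-in for run_evaluation(): report a few milestones."""
    milestones = (
        (20.0, "L1 WorkloadIR"),
        (60.0, "L3 planning"),
        (100.0, "L5 report"),
    )
    for pct, stage in milestones:
        if progress_callback:
            progress_callback(stage, pct)
    return {"status": "ok"}

result = run_evaluation_stub({}, progress_callback=print_progress)
```

In the real system this callback is what feeds the TaskManager queue, which in turn pushes task status to the frontend over /ws/tasks.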

_build_hardware_spec() Core Logic

eval_config.topology injects all communication parameters; the compute parameters come from chip_config:

```python
topology_spec = TopologySpec(
    c2c_bandwidth_gbps=topo.c2c_bandwidth_gbps,
    b2b_bandwidth_gbps=topo.b2b_bandwidth_gbps,
    ...  # all 14 fields; no defaults are used
)
comm_spec = CommProtocolSpec(
    bw_utilization=eval_config.comm.bw_utilization,
    sync_lat_us=eval_config.comm.sync_lat_us,
)
# chip_config additionally injects two efficiency parameters
hardware["compute_efficiency"] = chip_config["compute_efficiency"]
hardware["compute_dma_overlap_rate"] = chip_config["compute_dma_overlap_rate"]
hardware = merge_specs(hardware_spec, topology_spec, comm_spec)
```
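The merge step can be sketched as flattening the spec objects into one dict, with later specs overriding earlier keys. The real merge_specs() implementation is not shown in this document, so everything below (the last-writer-wins policy, the dict result, the illustrative values) is an assumption.

```python
from dataclasses import dataclass, asdict

# Hypothetical two-field stand-ins for the real spec classes.
@dataclass
class TopologySpec:
    c2c_bandwidth_gbps: float
    b2b_bandwidth_gbps: float

@dataclass
class CommProtocolSpec:
    bw_utilization: float
    sync_lat_us: float

def merge_specs_sketch(*specs) -> dict:
    """Flatten dicts and dataclass specs into one dict; later keys win."""
    merged: dict = {}
    for spec in specs:
        merged.update(spec if isinstance(spec, dict) else asdict(spec))
    return merged

hardware = merge_specs_sketch(
    {"compute_efficiency": 0.85,          # from chip_config (illustrative)
     "compute_dma_overlap_rate": 0.8},
    TopologySpec(c2c_bandwidth_gbps=448.0, b2b_bandwidth_gbps=100.0),
    CommProtocolSpec(bw_utilization=0.8, sync_lat_us=2.0),
)
```

Whatever the real merge policy is, the key property stated above holds: every one of the 14 topology fields must be present, so the merged hardware spec never silently falls back to defaults.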

API Endpoint Overview

| Method | Path | Description |
| --- | --- | --- |
| GET | /api/health | Health check |
| GET | /api/presets/chips | List chip presets |
| GET | /api/presets/models | List model presets |
| GET | /api/topologies | List topology configs |
| GET | /api/benchmarks | List benchmarks |
| POST | /api/simulate | Synchronous simulation (EvaluationRequest) |
| POST | /api/validate | Config validation |
| POST | /api/evaluation/submit | Submit async evaluation task |
| GET | /api/evaluation/tasks | Query task status |
| GET | /api/evaluation/experiments | Query experiment list |
| POST | /api/evaluation/experiments/export | Export experiments |
| POST | /api/evaluation/experiments/check-import | Pre-import check |
| POST | /api/evaluation/experiments/execute-import | Execute import |
| WS | /ws/tasks | Real-time task status push |
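A stdlib-only client sketch for the async-task endpoints above. The base URL, port, and request payload shape are assumptions not specified in this document; only the paths come from the table.

```python
import json
import urllib.request

BASE = "http://localhost:8000"  # assumed host/port

def build_submit_request(payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST /api/evaluation/submit request."""
    return urllib.request.Request(
        f"{BASE}/api/evaluation/submit",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# urllib.request.urlopen(build_submit_request({...})) would submit the task;
# the client then polls GET /api/evaluation/tasks, or subscribes to
# WS /ws/tasks for pushed status updates instead of polling.
```

Splitting request construction from sending keeps the sketch testable without a running server.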

Data Persistence

```
localStorage (frontend)    -- ephemeral UI state cache
|
in-memory TaskQueue        -- session-level task queue (ThreadPoolExecutor)
|
SQLite (SQLAlchemy)        -- permanent storage
+-- Experiment             -- experiment metadata (name, description, timestamps)
+-- EvaluationResult       -- task status + result data (tps, mfu, full_result JSON)
|
JSON Export                -- offline snapshot (import/export)
```

The database schema has 2 levels (Experiment -> EvaluationResult); there is no longer an intermediate EvaluationTask table.
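The two-level schema can be sketched with stdlib sqlite3 (the project itself uses the SQLAlchemy ORM). Column names beyond those mentioned above (name, description, status, tps, mfu, full_result) are assumptions, as are all inserted values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE experiments (              -- Experiment: metadata only
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    description TEXT,
    created_at TEXT
);
CREATE TABLE evaluation_results (       -- EvaluationResult: status + results
    id INTEGER PRIMARY KEY,
    experiment_id INTEGER REFERENCES experiments(id),
    status TEXT,
    tps REAL,
    mfu REAL,
    full_result TEXT                    -- full result stored as a JSON blob
);
""")
conn.execute("INSERT INTO experiments (id, name) VALUES (1, 'demo')")
conn.execute(
    "INSERT INTO evaluation_results (experiment_id, status, tps, mfu) "
    "VALUES (1, 'done', 1234.5, 0.42)"
)
row = conn.execute(
    "SELECT e.name, r.tps FROM experiments e "
    "JOIN evaluation_results r ON r.experiment_id = e.id"
).fetchone()
```

With the intermediate EvaluationTask table gone, task state lives directly on EvaluationResult, so one join recovers everything the frontend needs.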

Configuration Preset System

```
backend/perf_model/configs/
+-- chips/
|   +-- SG2262.yaml      # SG2262 (64 cores, ~786 TFLOPS FP8)
|   +-- _template.yaml   # chip config template (with field comments)
+-- models/
|   +-- DeepSeek-V3-671B-A37B.yaml
|   +-- DeepSeek-V3.2-671B-A37B.yaml
|   +-- Qwen3-235b-a22b.yaml / Qwen3-32b.yaml / LLaMA-7b.yaml
|   +-- _template.yaml
+-- topologies/
|   +-- P1-R1-B1-C8.yaml    # 1 pod, 1 rack, 1 board, 8 chips
|   +-- P1-R1-B4-C32.yaml
|   +-- _template.yaml
+-- benchmarks/
    +-- DeepSeek-V3-671B-A37B-S4K-O1-W8A8-B2048.yaml
    +-- _template.yaml
```
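Preset discovery over this layout can be sketched as follows. The function name `list_presets` and its signature are assumptions; the real loaders in config_loader.py are load_chip / load_model / load_topology / load_benchmark, and YAML parsing is elided here. The one detail taken from the tree above is that `_template.yaml` files are scaffolding, not selectable presets.

```python
from pathlib import Path

def list_presets(root: Path, kind: str) -> list[str]:
    """List preset names under one of chips/, models/, topologies/,
    benchmarks/, skipping the _template.yaml scaffolding files."""
    return sorted(
        p.stem
        for p in (root / kind).glob("*.yaml")
        if not p.name.startswith("_")
    )

# e.g. list_presets(Path("backend/perf_model/configs"), "chips")
```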

See 07-configs.md for the detailed config formats.