# L0 Entry -- Entry and Orchestration Layer
## Overview

L0 is the system entry layer, responsible for:

- API endpoints: REST endpoints plus WebSocket real-time push
- Config conversion: the single dict -> EvalConfig conversion point
- Pipeline orchestration: routes to the Math or G5 pipeline by mode and coordinates L1-L5 through the full evaluation flow
- Task management: async queue, progress callbacks, experiment management
- Data storage: SQLite persistence (Experiment/EvaluationResult)

Out of scope: L0 performs none of the actual tiling, evaluation, or reporting computation.
## Module Inventory

| Module | Responsibility |
|---|---|
| api/ | FastAPI route definitions (10 submodules: preset CRUD, simulation, tasks, experiments, etc.) |
| engine.py | run_evaluation() / run_evaluation_from_request() |
| eval_config.py | EvalConfig dataclass + build_eval_config() |
| config_loader.py | YAML preset loading (load_chip, load_model, load_topology, load_benchmark) |
| config_schema.py | Pydantic request models + config validation |
| types.py | Core enums such as DataType / ParallelMode |
| tasks.py | TaskManager task queue (ThreadPoolExecutor) |
| websocket.py | WebSocket management (/ws/tasks) |
| compat.py | Frontend compatibility layer (Gantt/Stats format conversion) |
| topology_format.py | Topology format conversion (grouped_pods format parsing) |
| storage/database.py | SQLAlchemy ORM (Experiment, EvaluationResult) |
## EvalConfig Detailed Design

### Sub-config structure

```python
@dataclass
class MLAConfig:
    q_lora_rank: int        # Q low-rank dimension (1536)
    kv_lora_rank: int       # KV low-rank dimension (512)
    qk_nope_head_dim: int   # non-RoPE head dimension (128)
    qk_rope_head_dim: int   # RoPE head dimension (64)
    v_head_dim: int         # V head dimension (128)
    mla_mode: str           # "standard" | "absorb" | "auto"

@dataclass
class MoEConfig:
    num_routed_experts: int     # total routed experts (256)
    num_shared_experts: int     # shared experts (1)
    num_activated_experts: int  # activated experts (8)
    intermediate_size: int      # MoE FFN intermediate size (2048)

@dataclass
class ModelConfig:
    name: str
    hidden_size: int; num_layers: int; num_attention_heads: int
    vocab_size: int; intermediate_size: int
    num_dense_layers: int; num_moe_layers: int
    mla: MLAConfig; moe: MoEConfig
    # Runtime parameters (injected from DeploymentConfig)
    weight_dtype: str; activation_dtype: str
    batch: int; is_prefill: bool
    q_seq_len: int; kv_seq_len: int  # passed in from DeploymentConfig

@dataclass
class TopologyOverrides:
    c2c_bandwidth_gbps: float; c2c_latency_us: float
    b2b_bandwidth_gbps: float; b2b_latency_us: float
    r2r_bandwidth_gbps: float; r2r_latency_us: float
    p2p_bandwidth_gbps: float; p2p_latency_us: float
    switch_latency_us: float; cable_latency_us: float
    memory_read_latency_us: float; memory_write_latency_us: float
    noc_latency_us: float; die_to_die_latency_us: float

@dataclass
class CommOverrides:
    bw_utilization: float  # bandwidth utilization (0-1)
    sync_lat_us: float     # synchronization latency (us)

@dataclass
class DeploymentConfig:
    tp: int; pp: int; dp: int; ep: int; moe_tp: int
    seq_len: int; batch_size: int
    q_seq_len: int; kv_seq_len: int  # derived from inference_config
    enable_tp_sp: bool; enable_ring_attention: bool
    enable_zigzag: bool  # zigzag pipeline optimization
    enable_tbo: bool     # MoE compute/communication Tile-Block Overlap
    embed_tp: int; lmhead_tp: int; comm_protocol: int
    kv_cache_rate: float; is_prefill: bool

@dataclass
class BoardConfig:
    num_chips: int; chip_memory_gb: int; inter_chip_bw_gbps: float

@dataclass
class InferenceConfig:
    batch_size: int; input_seq_length: int; output_seq_length: int
    weight_dtype: str; activation_dtype: str
```
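For concreteness, the MLA/MoE sub-configs can be instantiated with the DeepSeek-V3-style values noted in the field comments above. This is a standalone sketch; the real dataclasses live in eval_config.py and carry more fields:

```python
from dataclasses import dataclass

@dataclass
class MLAConfig:
    q_lora_rank: int
    kv_lora_rank: int
    qk_nope_head_dim: int
    qk_rope_head_dim: int
    v_head_dim: int
    mla_mode: str  # "standard" | "absorb" | "auto"

@dataclass
class MoEConfig:
    num_routed_experts: int
    num_shared_experts: int
    num_activated_experts: int
    intermediate_size: int

# Values taken from the field comments above (DeepSeek-V3 defaults)
mla = MLAConfig(1536, 512, 128, 64, 128, "auto")
moe = MoEConfig(256, 1, 8, 2048)

# The per-head QK dimension is the sum of the nope and rope parts
qk_head_dim = mla.qk_nope_head_dim + mla.qk_rope_head_dim  # 192
```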
### build_eval_config() conversion logic

Inputs:

- chip_config: dict <- first chip in topology_config.chips
- model_config: dict <- benchmark_config.model (nested YAML format)
- topology_config: dict <- full topology (including interconnect.links + comm_params)
- manual_parallelism: dict <- frontend parallelism config (including enable_zigzag, enable_tbo, etc.)
- inference_config: dict <- benchmark_config.inference

Conversion steps:

1. topology_config.interconnect.links -> TopologyOverrides (c2c/b2b/r2r/p2p bandwidth + latency)
2. topology_config.interconnect.comm_params -> TopologyOverrides (switch/cable/memory/noc/d2d latency)
3. topology_config.interconnect.comm_params -> CommOverrides (bw_utilization, sync_lat)
4. manual_parallelism + inference_config -> DeploymentConfig
   - input_seq_length -> q_seq_len (prefill) / 1 (decode)
   - input_seq_length -> kv_seq_len
   - enable_tbo and enable_zigzag are passed through unchanged
5. model_config (nested) + DeploymentConfig runtime parameters -> ModelConfig
6. topology_config.pods structure -> BoardConfig (num_chips, chip_memory)
7. inference_config -> InferenceConfig
8. Any missing field raises ValueError

Output: EvalConfig (the single source of truth for the whole pipeline)
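Steps 4 and 8 can be sketched as a small pure function; the function name and error wording are illustrative, not the actual build_eval_config() internals:

```python
def derive_seq_lens(inference_config: dict, is_prefill: bool) -> tuple[int, int]:
    """Step 4: input_seq_length -> (q_seq_len, kv_seq_len).

    Prefill processes the whole prompt at once (q_seq_len == input length);
    decode emits one token per step (q_seq_len == 1). kv_seq_len always
    covers the full input context.
    """
    try:
        input_len = inference_config["input_seq_length"]
    except KeyError as e:
        # Step 8: a missing field is an error; no silent defaults
        raise ValueError(f"inference_config missing required field: {e}") from e
    q_seq_len = input_len if is_prefill else 1
    kv_seq_len = input_len
    return q_seq_len, kv_seq_len

prefill_lens = derive_seq_lens({"input_seq_length": 4096}, is_prefill=True)
decode_lens = derive_seq_lens({"input_seq_length": 4096}, is_prefill=False)
```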
## run_evaluation() Orchestration Flow

```python
def run_evaluation(eval_config: EvalConfig, progress_callback=None) -> dict:
    # L1: build the WorkloadIR
    model = DeepSeekV3Model.from_model_config(eval_config.model)
    ir = model.to_ir()
    # L2: load the ChipSpec
    chip = ChipSpecImpl.from_config(name, eval_config.chip_config)
    # L3 (shared): parallelism planning
    deployment = DeploymentSpec(...)  # from eval_config.deployment
    board = BoardSpec(...)            # from eval_config.board
    dist_model = ParallelismPlanner(deployment, board).plan(ir)
    # Route by mode
    if eval_config.mode == "math":
        # L3 Math: tiling + scheduling
        tile_plan = TilingPlanner(chip, l4_evaluator).plan(dist_model)
        exec_plan = Scheduler().plan(dist_model, tile_plan)
        # L4 Math: evaluation
        hardware = _build_hardware_spec(chip, eval_config)
        engine_result = EvaluationEngine().evaluate(
            exec_plan, dist_model, hardware,
            prefill_ops=prefill_op_ids,
            is_prefill=True,
            deployment_config=deployment_dict,
        )
    else:  # mode == "g5"
        # L3 G5: instruction emission
        core_program = G5InstructionEmitter().emit(dist_model)
        # L4 G5: simulation
        engine_result = G5Pipeline().run(core_program, chip)
    # L5: reporting
    report = ReportingEngine().run(engine_result, config=run_config)
    return report
```
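The progress_callback parameter is accepted but its contract is not spelled out in this section. A hypothetical sketch, assuming a (stage_name, fraction) callback signature (the real signature in engine.py may differ):

```python
def run_stages(stages, progress_callback=None):
    """Run named stages in order, reporting fractional progress after each.

    `stages` is a list of (name, zero-arg callable) pairs standing in for
    the L1-L5 steps of run_evaluation().
    """
    results = {}
    for i, (name, fn) in enumerate(stages, start=1):
        results[name] = fn()
        if progress_callback is not None:
            progress_callback(name, i / len(stages))
    return results

events = []
run_stages(
    [("L1:build_ir", lambda: "ir"),
     ("L3:plan", lambda: "plan"),
     ("L5:report", lambda: "report")],
    progress_callback=lambda stage, frac: events.append((stage, frac)),
)
```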
## _build_hardware_spec() Core Logic

All communication parameters are injected from eval_config.topology, and compute parameters from chip_config:

```python
topology_spec = TopologySpec(
    c2c_bandwidth_gbps=topo.c2c_bandwidth_gbps,
    b2b_bandwidth_gbps=topo.b2b_bandwidth_gbps,
    ...  # all 14 fields; no default values are used
)
comm_spec = CommProtocolSpec(
    bw_utilization=eval_config.comm.bw_utilization,
    sync_lat_us=eval_config.comm.sync_lat_us,
)
# chip_config additionally injects two efficiency parameters
hardware["compute_efficiency"] = chip_config["compute_efficiency"]
hardware["compute_dma_overlap_rate"] = chip_config["compute_dma_overlap_rate"]

hardware = merge_specs(hardware_spec, topology_spec, comm_spec)
```
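merge_specs() itself is not defined in this section; a plausible sketch is a flat dict merge where later specs win on key collisions (an assumption about its behavior, with illustrative values):

```python
from dataclasses import asdict, dataclass, is_dataclass

def merge_specs(*specs) -> dict:
    """Flatten spec objects/dicts into one hardware dict; later specs win.

    This merge order is an assumption; the real merge_specs() may differ.
    """
    merged: dict = {}
    for spec in specs:
        merged.update(asdict(spec) if is_dataclass(spec) else dict(spec))
    return merged

@dataclass
class TopologySpec:
    c2c_bandwidth_gbps: float
    c2c_latency_us: float

hardware = merge_specs(
    {"compute_efficiency": 0.85},  # from chip_config; value illustrative
    TopologySpec(c2c_bandwidth_gbps=448.0, c2c_latency_us=0.5),  # values illustrative
)
```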
## API Endpoints

| Method | Path | Description |
|---|---|---|
| GET | /api/health | Health check |
| GET | /api/presets/chips | List chip presets |
| GET | /api/presets/models | List model presets |
| GET | /api/topologies | List topology configs |
| GET | /api/benchmarks | List benchmarks |
| POST | /api/simulate | Synchronous simulation (EvaluationRequest) |
| POST | /api/validate | Config validation |
| POST | /api/evaluation/submit | Submit async evaluation task |
| GET | /api/evaluation/tasks | Query task status |
| GET | /api/evaluation/experiments | List experiments |
| POST | /api/evaluation/experiments/export | Export experiments |
| POST | /api/evaluation/experiments/check-import | Pre-import check |
| POST | /api/evaluation/experiments/execute-import | Execute import |
| WS | /ws/tasks | Real-time task status push |
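A client-side sketch of assembling a synchronous simulation call. The base URL and top-level payload keys are illustrative assumptions; the authoritative EvaluationRequest schema lives in config_schema.py:

```python
import json
import urllib.request

def build_simulate_request(base_url: str, payload: dict) -> urllib.request.Request:
    """Assemble a POST /api/simulate request carrying a JSON body."""
    return urllib.request.Request(
        f"{base_url}/api/simulate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Keys below are illustrative; see config_schema.py for the real schema.
req = build_simulate_request("http://localhost:8000", {
    "topology_config": {},
    "benchmark_config": {},
    "manual_parallelism": {"tp": 8, "pp": 1, "dp": 1, "ep": 8},
})
```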
## Data Persistence

```
localStorage (frontend)   -- transient UI state cache
  |
in-memory TaskQueue       -- session-scoped task queue (ThreadPoolExecutor)
  |
SQLite (SQLAlchemy)       -- durable storage
  +-- Experiment          -- experiment metadata (name, description, timestamps)
  +-- EvaluationResult    -- task status + result data (tps, mfu, full_result JSON)
  |
JSON Export               -- offline snapshots (import/export)
```

The database schema has 2 layers (Experiment -> EvaluationResult); there is no longer an intermediate EvaluationTask table.
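The 2-layer schema can be sketched with the stdlib sqlite3 module. Column names follow the fields listed above; the actual models are SQLAlchemy ORM classes in storage/database.py and may carry additional columns:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE experiment (
    id          INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    description TEXT,
    created_at  TEXT,   -- timestamps
    updated_at  TEXT
);
CREATE TABLE evaluation_result (
    id            INTEGER PRIMARY KEY,
    experiment_id INTEGER NOT NULL REFERENCES experiment(id),
    status        TEXT,   -- task status
    tps           REAL,
    mfu           REAL,
    full_result   TEXT    -- full result payload as JSON
);
""")
conn.execute("INSERT INTO experiment (name) VALUES ('demo')")
conn.execute(
    "INSERT INTO evaluation_result (experiment_id, status, tps, mfu)"
    " VALUES (1, 'done', 120.5, 0.41)"  # values illustrative
)
row = conn.execute(
    "SELECT e.name, r.tps FROM experiment e"
    " JOIN evaluation_result r ON r.experiment_id = e.id"
).fetchone()
```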
## Config Preset System

```
backend/perf_model/configs/
+-- chips/
|   +-- SG2262.yaml       # SG2262 (64 cores, ~786 TFLOPS FP8)
|   +-- _template.yaml    # chip config template (with field comments)
+-- models/
|   +-- DeepSeek-V3-671B-A37B.yaml
|   +-- DeepSeek-V3.2-671B-A37B.yaml
|   +-- Qwen3-235b-a22b.yaml / Qwen3-32b.yaml / LLaMA-7b.yaml
|   +-- _template.yaml
+-- topologies/
|   +-- P1-R1-B1-C8.yaml  # 1 pod, 1 rack, 1 board, 8 chips
|   +-- P1-R1-B4-C32.yaml
|   +-- _template.yaml
+-- benchmarks/
    +-- DeepSeek-V3-671B-A37B-S4K-O1-W8A8-B2048.yaml
    +-- _template.yaml
```

See 07-configs.md for config format details.
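Topology preset filenames encode the cluster hierarchy (P1-R1-B1-C8 = 1 pod, 1 rack, 1 board, 8 chips total). A small parser sketch; the function name is illustrative and this helper is not claimed to exist in the codebase:

```python
import re

def parse_topology_name(name: str) -> dict:
    """Parse a P{pods}-R{racks}-B{boards}-C{chips} preset name into counts."""
    m = re.fullmatch(r"P(\d+)-R(\d+)-B(\d+)-C(\d+)", name)
    if m is None:
        raise ValueError(f"not a P-R-B-C topology name: {name!r}")
    pods, racks, boards, chips = map(int, m.groups())
    return {"pods": pods, "racks": racks, "boards": boards, "chips": chips}

small = parse_topology_name("P1-R1-B1-C8")    # the 8-chip single-board preset
large = parse_topology_name("P1-R1-B4-C32")   # 4 boards, 32 chips total
```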