NVIDIA H100, A100, and A30 Tensor Core GPUs). For details, see: GPU Support Matrix
To enable MIG Single mode through the Operator, configure the following parameters on the installation page:
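If you install with Helm instead of the UI, the upstream NVIDIA gpu-operator chart selects the mode through the `mig.strategy` value. The sketch below is a minimal values override that you would pass to `helm install` with `-f`. It assumes the upstream chart's key names; your platform's installer may label these parameters differently.

```yaml
# values.yaml override for the upstream NVIDIA gpu-operator Helm chart (assumed key names)
mig:
  strategy: single      # every MIG instance is advertised as an ordinary nvidia.com/gpu
migManager:
  enabled: true         # MIG Manager applies the profile named in the node label nvidia.com/mig.config
```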
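After installation, label the target nodes (the nodes where the corresponding GPUs are installed) with a partition profile. If you skip this step, the GPUs remain unpartitioned by default.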
Tip
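In Single mode, a GPU can only be partitioned into instances of one uniform profile. The default strategy is recommended, but you can also define a custom partition strategy.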
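UI configuration:
In ConfigMaps, search for default-mig-parted-config, open its details, and find the partition profiles for your GPU model.
Find the target node, choose Edit Labels, and add nvidia.com/mig.config="all-1g.10gb". If you choose a different profile, the GPUs are partitioned according to that profile instead.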
Command-line configuration:
Check the configuration result
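From the command line, you can set the label with, for example, `kubectl label nodes <node-name> nvidia.com/mig.config=all-1g.10gb --overwrite`. You can then inspect the result with `kubectl get node <node-name> --show-labels`.

The excerpt below is illustrative and assumes the upstream MIG Manager, which reports progress in the `nvidia.com/mig.config.state` label. `gpu-node-01` is a hypothetical node name.

```yaml
# Illustrative excerpt of the Node object after MIG Manager has applied the profile
apiVersion: v1
kind: Node
metadata:
  name: gpu-node-01                          # hypothetical node name
  labels:
    nvidia.com/mig.config: all-1g.10gb       # the profile you requested
    nvidia.com/mig.config.state: success     # set by MIG Manager; "pending" while reconfiguring, "failed" on error
```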
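Once this is done, GPU MIG resources are available when you deploy applications.
To enable MIG Mixed mode through the Operator, configure the following parameters on the installation page:
After installation, label the target nodes (the nodes where the corresponding GPUs are installed) with a partition profile. If you skip this step, the GPUs remain unpartitioned by default.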
Tip
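In Mixed mode, a GPU can be partitioned into instances of different profiles. The default strategy is recommended, but you can also define a custom partition strategy.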
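UI configuration:
In ConfigMaps, search for default-mig-parted-config, open its details, and find the partition profiles for your GPU model.
Find the target node, choose Edit Labels, and add nvidia.com/mig.config="all-1g.10gb". If you choose a different profile, the GPUs are partitioned according to that profile instead.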
Command-line configuration:
Check the configuration result
Once this is done, GPU MIG resources are available when you deploy applications.
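How a workload requests MIG resources depends on the mode. This sketch assumes the upstream NVIDIA device plugin naming:

- In Single mode, each instance is advertised as `nvidia.com/gpu`.
- In Mixed mode, each profile gets its own resource, such as `nvidia.com/mig-1g.10gb`.

The Pod names and the CUDA image tag are placeholders.

```yaml
# Single mode: a MIG instance is requested like a whole GPU
apiVersion: v1
kind: Pod
metadata:
  name: mig-single-demo                            # hypothetical name
spec:
  restartPolicy: Never
  containers:
    - name: cuda
      image: nvidia/cuda:12.2.0-base-ubuntu22.04   # assumed image tag
      command: ["nvidia-smi", "-L"]                # lists the MIG device visible to the container
      resources:
        limits:
          nvidia.com/gpu: 1
---
# Mixed mode: the request names a specific profile
apiVersion: v1
kind: Pod
metadata:
  name: mig-mixed-demo                             # hypothetical name
spec:
  restartPolicy: Never
  containers:
    - name: cuda
      image: nvidia/cuda:12.2.0-base-ubuntu22.04   # assumed image tag
      command: ["nvidia-smi", "-L"]
      resources:
        limits:
          nvidia.com/mig-1g.10gb: 1
```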
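You can define your own partition strategy in a configuration file. A single GPU can be split into at most 7 instances. Create the ConfigMap before installing GPU Operator, and specify its name during installation.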
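A minimal sketch of such a ConfigMap follows; you could apply it with `kubectl apply -f`. Some caveats:

- The namespace `gpu-operator` is an assumption. Use whichever namespace GPU Operator is deployed in.
- Only two strategies are shown. In practice, `config.yaml` would carry the full configuration.
- `all-disabled` is kept so that nodes can still be returned to a non-MIG state.

```yaml
# Sketch of a ConfigMap holding a custom MIG partition strategy (abbreviated)
apiVersion: v1
kind: ConfigMap
metadata:
  name: custom-mig-parted-config     # must differ from default-mig-parted-config
  namespace: gpu-operator            # assumption: the namespace GPU Operator is deployed in
data:
  config.yaml: |
    version: v1
    mig-configs:
      all-disabled:
        - devices: all
          mig-enabled: false
      custom-config:
        - devices: all
          mig-enabled: true
          mig-devices:
            "1g.10gb": 4             # 4 + 2 = 6 instances, within the 7-instance limit per GPU
            "1g.20gb": 2
```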
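Create the custom partition strategy as a ConfigMap in the same namespace where GPU Operator is deployed. Its name must differ from the default default-mig-parted-config. Use the following YAML as a reference for the configuration data.
The YAML below is an example custom configuration named custom-mig-parted-config. The ConfigMap data key is config.yaml, and its value is the content below. You can add further partition strategies as needed.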
```yaml
# Custom GPU instance (GI) partition configuration
version: v1
mig-configs:
  all-disabled:
    - devices: all
      mig-enabled: false
  # A100-40GB, A800-40GB
  all-1g.5gb:
    - devices: all
      mig-enabled: true
      mig-devices:
        "1g.5gb": 7
  all-1g.5gb.me:
    - devices: all
      mig-enabled: true
      mig-devices:
        "1g.5gb+me": 1
  all-2g.10gb:
    - devices: all
      mig-enabled: true
      mig-devices:
        "2g.10gb": 3
  all-3g.20gb:
    - devices: all
      mig-enabled: true
      mig-devices:
        "3g.20gb": 2
  all-4g.20gb:
    - devices: all
      mig-enabled: true
      mig-devices:
        "4g.20gb": 1
  all-7g.40gb:
    - devices: all
      mig-enabled: true
      mig-devices:
        "7g.40gb": 1
  # H100-80GB, H800-80GB, A100-80GB, A800-80GB, A100-40GB, A800-40GB
  all-1g.10gb:
    # H100-80GB, H800-80GB, A100-80GB, A800-80GB
    - device-filter: ["0x233010DE", "0x233110DE", "0x232210DE", "0x20B210DE", "0x20B510DE", "0x20F310DE", "0x20F510DE"]
      devices: all
      mig-enabled: true
      mig-devices:
        "1g.10gb": 7
    # A100-40GB, A800-40GB
    - device-filter: ["0x20B010DE", "0x20B110DE", "0x20F110DE", "0x20F610DE"]
      devices: all
      mig-enabled: true
      mig-devices:
        "1g.10gb": 4
  # H100-80GB, H800-80GB, A100-80GB, A800-80GB
  all-1g.10gb.me:
    - devices: all
      mig-enabled: true
      mig-devices:
        "1g.10gb+me": 1
  # H100-80GB, H800-80GB, A100-80GB, A800-80GB
  all-1g.20gb:
    - devices: all
      mig-enabled: true
      mig-devices:
        "1g.20gb": 4
  all-2g.20gb:
    - devices: all
      mig-enabled: true
      mig-devices:
        "2g.20gb": 3
  all-3g.40gb:
    - devices: all
      mig-enabled: true
      mig-devices:
        "3g.40gb": 2
  all-4g.40gb:
    - devices: all
      mig-enabled: true
      mig-devices:
        "4g.40gb": 1
  all-7g.80gb:
    - devices: all
      mig-enabled: true
      mig-devices:
        "7g.80gb": 1
  # A30-24GB
  all-1g.6gb:
    - devices: all
      mig-enabled: true
      mig-devices:
        "1g.6gb": 4
  all-1g.6gb.me:
    - devices: all
      mig-enabled: true
      mig-devices:
        "1g.6gb+me": 1
  all-2g.12gb:
    - devices: all
      mig-enabled: true
      mig-devices:
        "2g.12gb": 2
  all-2g.12gb.me:
    - devices: all
      mig-enabled: true
      mig-devices:
        "2g.12gb+me": 1
  all-4g.24gb:
    - devices: all
      mig-enabled: true
      mig-devices:
        "4g.24gb": 1
  # H100 NVL, H800 NVL
  all-1g.12gb:
    - devices: all
      mig-enabled: true
      mig-devices:
        "1g.12gb": 7
  all-1g.12gb.me:
    - devices: all
      mig-enabled: true
      mig-devices:
        "1g.12gb+me": 1
  all-2g.24gb:
    - devices: all
      mig-enabled: true
      mig-devices:
        "2g.24gb": 3
  all-3g.47gb:
    - devices: all
      mig-enabled: true
      mig-devices:
        "3g.47gb": 2
  all-4g.47gb:
    - devices: all
      mig-enabled: true
      mig-devices:
        "4g.47gb": 1
  all-7g.94gb:
    - devices: all
      mig-enabled: true
      mig-devices:
        "7g.94gb": 1
  # H100-96GB, PG506-96GB
  all-3g.48gb:
    - devices: all
      mig-enabled: true
      mig-devices:
        "3g.48gb": 2
  all-4g.48gb:
    - devices: all
      mig-enabled: true
      mig-devices:
        "4g.48gb": 1
  all-7g.96gb:
    - devices: all
      mig-enabled: true
      mig-devices:
        "7g.96gb": 1
  # H100-96GB, H100 NVL, H800 NVL, H100-80GB, H800-80GB, A800-40GB, A800-80GB, A100-40GB, A100-80GB, A30-24GB, PG506-96GB
  all-balanced:
    # H100 NVL, H800 NVL
    - device-filter: ["0x232110DE", "0x233A10DE"]
      devices: all
      mig-enabled: true
      mig-devices:
        "1g.12gb": 1
        "2g.24gb": 1
        "3g.47gb": 1
    # H100-80GB, H800-80GB, A100-80GB, A800-80GB
    - device-filter: ["0x233010DE", "0x233110DE", "0x232210DE", "0x20B210DE", "0x20B510DE", "0x20F310DE", "0x20F510DE"]
      devices: all
      mig-enabled: true
      mig-devices:
        "1g.10gb": 2
        "2g.20gb": 1
        "3g.40gb": 1
    # A100-40GB, A800-40GB
    - device-filter: ["0x20B010DE", "0x20B110DE", "0x20F110DE", "0x20F610DE"]
      devices: all
      mig-enabled: true
      mig-devices:
        "1g.5gb": 2
        "2g.10gb": 1
        "3g.20gb": 1
    # A30-24GB
    - device-filter: "0x20B710DE"
      devices: all
      mig-enabled: true
      mig-devices:
        "1g.6gb": 2
        "2g.12gb": 1
    # H100-96GB, PG506-96GB
    - device-filter: ["0x233D10DE", "0x20B610DE"]
      devices: all
      mig-enabled: true
      mig-devices:
        "1g.12gb": 2
        "2g.24gb": 1
        "3g.48gb": 1
  # Once set, CI instances are partitioned according to the specified profiles
  custom-config:
    - devices: all
      mig-enabled: true
      mig-devices:
        "1g.10gb": 4
        "1g.20gb": 2
```
Define custom-config in the YAML above. Once it is set, CI instances are partitioned according to the profiles it specifies.
Specify this ConfigMap when installing GPU Operator.
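Because custom-config combines two profiles (1g.10gb and 1g.20gb), it calls for Mixed mode. Single mode only supports one uniform profile. The file lists 1g.20gb only for the 80GB-class H100, H800, A100, and A800 cards.

With the upstream NVIDIA gpu-operator Helm chart, the ConfigMap can be referenced through `migManager.config.name`. These key names are an assumption, so check them against your chart version. After installation, you would label the node with `nvidia.com/mig.config=custom-config`.

```yaml
# values.yaml override for the upstream NVIDIA gpu-operator Helm chart (assumed key names)
mig:
  strategy: mixed                      # custom-config mixes profiles, so Single mode does not fit
migManager:
  enabled: true
  config:
    name: custom-mig-parted-config     # the ConfigMap created before installation
# After installation, label each target node with: nvidia.com/mig.config=custom-config
```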