As a leader in artificial intelligence (AI) and high-performance computing (HPC), NVIDIA offers several computing platforms designed for different application scenarios. HGX and DGX sound similar, but they are different ways for NVIDIA to sell its 8x GPU systems with NVLink. NVIDIA’s business model also changed between the NVIDIA P100 “Pascal” and V100 “Volta” generations, and with the introduction of the A100 “Ampere” and H100 “Hopper” GPUs, the HGX model has reached new heights.
What is Nvidia HGX
HGX is a reference design and specification provided by NVIDIA. OEMs purchase the corresponding chips and design their own hardware products around them, and end users then develop or supply the management software.
This is similar to the conventional server market: NVIDIA provides reference designs, OEMs such as DELL and ASUS sell own-brand or white-label systems, and cloud vendors such as AWS purchase those products from OEMs and integrate them into their own networks to provide cloud services. In contrast to DGX, with HGX NVIDIA is essentially selling chips.
What is Nvidia DGX
DGX is NVIDIA’s complete solution, covering hardware, software, operating system, interconnect, and more. With DGX, you can get started right out of the box. Everything from the basic hardware to the operating system, cluster management software, and even AI models is ready-made.
DGX is comparable to an IBM-style mainframe, and NVIDIA positions it as the basis of an AI factory. However, for large cloud service providers, DGX is neither cost-effective nor a good fit for their existing network infrastructure. That’s why NVIDIA launched HGX.
Differences between NVIDIA HGX and DGX
1. Hardware Configuration
The difference can be seen at a glance from the pictures above of the NVIDIA HGX and DGX. HGX is a computing module, while DGX is a complete host.
2. Software Stacks and Integrations
The DGX software stack is more complete than that of HGX. See the comparison table below for details.
Item | HGX | DGX |
Operating System | Not included | DGX OS / Ubuntu / Red Hat Enterprise Linux / Rocky |
Software | Fully optimized NVIDIA AI and HPC software stack from NGC | NVIDIA AI Enterprise, NVIDIA Base Command |
3. Customizability
HGX is highly customizable: the GPU count (for example, the 4-GPU or 8-GPU version) and the surrounding hardware can be chosen to match the computing needs.
DGX is not as customizable as HGX, and the hardware configuration is fixed.
4. Target Users and Applications
HGX is primarily aimed at researchers and developers who need a flexible and scalable platform to meet their high-performance computing needs. It is suitable for applications such as cloud data centers, high-performance computing, large-scale AI R&D, and customizable infrastructure.
DGX is designed for businesses that need a powerful, ready-to-use AI solution. It is ideal for applications such as AI and deep learning development, edge computing, healthcare and medical research, as well as content creation and media.
5. Cost
HGX is modular in design and has flexible pricing. DGX is an all-in-one high-end solution that is expensive.
However, “DGX is expensive” does not hold in every case. If a DGX offers more performance than you need and you really only need 4 GPUs or 2 GPUs, then DGX is indeed expensive compared to HGX. But if you need 8 GPUs, DGX may be the more cost-effective choice.
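The cost argument above can be illustrated with a rough sketch. The function name and both prices are hypothetical placeholders in arbitrary units, not real quotes; the only point is that unused GPUs in a fixed 8-GPU system still have to be paid for.

```python
def cost_per_used_gpu(system_price, gpus_installed, gpus_needed):
    """Return the system price divided by the GPUs the workload actually uses."""
    if gpus_needed < 1:
        raise ValueError("gpus_needed must be at least 1")
    if gpus_needed > gpus_installed:
        raise ValueError("workload needs more GPUs than the system has")
    return system_price / gpus_needed


# Placeholder prices in arbitrary units. These are NOT real quotes.
DGX_8GPU_PRICE = 100.0  # hypothetical fixed 8-GPU DGX system
HGX_4GPU_PRICE = 60.0   # hypothetical OEM server built on the 4-GPU HGX version

print(cost_per_used_gpu(DGX_8GPU_PRICE, 8, 4))  # 25.0: half the GPUs sit idle
print(cost_per_used_gpu(HGX_4GPU_PRICE, 4, 4))  # 15.0
print(cost_per_used_gpu(DGX_8GPU_PRICE, 8, 8))  # 12.5: all eight GPUs are used
```

With real quotes substituted for the placeholders, the same calculation shows whether the fixed 8-GPU configuration pays off for a given workload.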
6. Computing Performance
Taking the H100 as an example, the 4-GPU version of HGX does not include NVSwitch, while the 8-GPU version of HGX uses the third-generation NVSwitch. NVSwitch is the building block for advanced multi-GPU communication within and between servers.
In DGX, the fourth-generation NVLink is combined with NVSwitch™ to provide 900 GB/s connectivity between each GPU in every DGX H100 system.
Equipped with eight H100 GPUs per system, connected as one via NVLink®, each DGX H100 delivers 32 petaflops of AI performance with the new FP8 precision, which is 6x higher than the previous generation.
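As a quick sanity check on these figures, the arithmetic can be worked through in a few lines. This is only a sketch that divides the numbers quoted above; the variable names are illustrative, and the "previous generation" value is merely what the 6x claim implies, not an official specification.

```python
DGX_H100_FP8_PFLOPS = 32  # system-level FP8 figure quoted above
GPUS_PER_SYSTEM = 8       # eight H100 GPUs per DGX H100
GENERATION_SPEEDUP = 6    # "6x higher than the previous generation"

# Share of the system-level FP8 figure attributable to each GPU.
per_gpu_pflops = DGX_H100_FP8_PFLOPS / GPUS_PER_SYSTEM

# Previous-generation system figure implied by the 6x claim (an inference).
implied_previous_pflops = DGX_H100_FP8_PFLOPS / GENERATION_SPEEDUP

print(per_gpu_pflops)                     # 4.0
print(round(implied_previous_pflops, 1))  # 5.3
```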
In general, DGX will have more computing power than HGX with the same number of GPUs.
Conclusion
How to choose:
If you don’t want to spend energy on building a computer, configuring software, and system environment, choose DGX.
If you want a high degree of freedom to customize software and hardware, choose HGX.
Most cloud platforms only provide HGX-based servers. The reason is that there is no unified standard for a complete HGX system, so a cloud platform can set different prices by varying the rest of the hardware. Cloud platforms often state only which GPUs their servers use, while the other hardware, including hard disks, memory, CPUs, and network cards, is not listed. Or this information is buried in the documentation, where developers have to read very carefully to find it. It’s this information gap that gives the cloud platform room for pricing.
Even with the same GPU configuration, differences in the rest of the hardware lead to different server performance and different pricing. This makes it difficult for enterprises and developers to choose a GPU cloud instance.
Computing performance is determined not only by the GPU, but also by the CPU, memory, network, hard disk, and even heat dissipation. When choosing a GPU cloud service, don’t look only at the price or only at the number of GPUs. Better yet, run an actual test and choose according to the measured performance.
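Before running any benchmark, it can help to record what hardware a rented instance actually exposes. The following is a minimal sketch, assuming a Linux host where the `nvidia-smi` tool may or may not be installed; the helper names `parse_gpu_csv` and `describe_instance` are made up for this example.

```python
import os
import shutil
import subprocess


def parse_gpu_csv(text):
    """Parse the output of
    `nvidia-smi --query-gpu=name,memory.total --format=csv,noheader`."""
    gpus = []
    for line in text.strip().splitlines():
        if "," not in line:
            continue  # skip lines that are not "name, memory" pairs
        name, memory = [part.strip() for part in line.split(",", 1)]
        gpus.append({"name": name, "memory": memory})
    return gpus


def describe_instance():
    """Collect the CPU count and, if nvidia-smi is available, the GPU list."""
    info = {"cpu_count": os.cpu_count(), "gpus": []}
    if shutil.which("nvidia-smi"):
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=name,memory.total",
             "--format=csv,noheader"],
            capture_output=True, text=True, check=False,
        )
        info["gpus"] = parse_gpu_csv(result.stdout)
    return info


if __name__ == "__main__":
    print(describe_instance())
```

Running `nvidia-smi topo -m` on the same instance additionally prints the GPU interconnect matrix, which shows how the GPUs are linked to each other. Comparing this information across offerings with the same GPU count makes the hidden differences in CPU and memory visible before any performance test.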
When purchasing optical modules/ AOC/DAC cables, it is also crucial to choose a reliable supplier. MVSLINK is a reliable provider of optical network solutions to build a fully connected, intelligent world through innovative computing and networking solutions.