Tencent’s tech team has optimized DeepSeek’s open-source DeepEP communication framework, boosting its performance across different network environments, according to the Chinese AI startup. Testing showed a 100% improvement on RoCE networks and a 30% gain on InfiniBand (IB), offering more efficient solutions for AI model training. On GitHub, DeepSeek acknowledged that the Chinese tech giant’s contribution had led to a “huge speedup.” DeepEP is a communication library tailored for mixture-of-experts (MoE) models and expert parallelism (EP), supporting high-throughput, low-latency GPU kernels and low-precision computing, including FP8. Tencent’s Starlink Networking team identified two main bottlenecks: underutilized dual-port NIC bandwidth and CPU control latency. After targeted optimizations, performance doubled on RoCE and improved by 30% on IB. The enhanced framework is now fully open source and has been successfully deployed in training Tencent’s Hunyuan large model, demonstrating strong versatility in environments built on Tencent’s Starlink and H20 servers, Chinese tech media outlet iThome reported. [iThome, in Chinese]