Open Source

FedML: A Research Library and Benchmark for Federated Machine Learning
Chaoyang He, Songze Li (HKUST, Stanford), Jinhyun So (USC), Mi Zhang (USC), Hongyi Wang (U Wisconsin Madison), Xiaoyang Wang (UIUC), Praneeth Vepakomma (MIT), Abhishek Singh (MIT), Hang Qiu (Stanford), Li Shen (Tencent), Peilin Zhao (Tencent), Yan Kang (WeBank), Yang Liu (WeBank), Ramesh Raskar (MIT), Qiang Yang (HKUST, WeBank), Murali Annavaram, Salman Avestimehr
The short version of the FedML white paper won the Best Paper Award at SpicyFL@NeurIPS 2020
[BibTex] [Homepage] [Arxiv] [Slack] [Documentation] [Video] [Slides] [Best Paper Award] [Code]
TL;DR: FedML Ecosystem is a family of open research libraries that facilitate federated learning (FL) research in diverse application domains. It includes FedML Core Framework, FedNLP (Natural Language Processing), FedCV (Computer Vision), FedGraphNN (Graph Neural Networks), FedIoT (Internet of Things), and FedMobile (Smartphones). Built on FedML Core Framework, FedML Ecosystem supports three computing paradigms: on-device training for edge devices (cross-device FL), distributed computing (cross-silo FL), and single-machine simulation. It also promotes diverse algorithmic research through a flexible and generic API design and comprehensive reference baseline implementations (federated optimizers, privacy/security algorithms, models, and datasets). Compared with TFF and LEAF, FedNLP and FedCV greatly enrich the diversity of datasets and learning tasks. FedNLP supports popular task formulations in the NLP domain, such as text classification, sequence tagging, question answering, seq2seq generation, and language modeling. FedCV helps researchers evaluate the three most representative computer vision tasks: image classification, image segmentation, and object detection. Moreover, FedGraphNN is the first FL research platform for analyzing graph-structured data using Graph Neural Networks in a distributed computing manner, filling the gap between federated learning and the data mining field. FedGraphNN collects, preprocesses, and partitions 36 datasets from 7 domains such as molecular ML (drug discovery, bioinformatics, etc.), social networks, recommendation systems, and knowledge graphs. Going beyond traditional AI applications, FedIoT and FedMobile further extend FL to wireless communication (e.g., 5G) and mobile computing (e.g., embedded IoT devices such as Raspberry Pi, and smartphones running Android OS).
All in all, FedML Ecosystem aims to provide a one-stop scientific research platform and ultimately realize trustworthy ML/AI that is more secure, scalable, efficient, and ubiquitous.
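
To give a rough feel for the single-machine simulation paradigm and the "federated optimizer" baselines mentioned above, here is a minimal sketch of federated averaging (FedAvg) in plain Python. It is not the FedML API: the function names (`local_sgd`, `fedavg_round`, `make_clients`, `simulate`), the toy linear model y = w*x + b, and all hyperparameters are assumptions chosen only for illustration.

```python
import random


def local_sgd(w, b, data, lr=0.05, epochs=5):
    """Train the toy model y = w*x + b on one client's private (x, y) pairs."""
    for _ in range(epochs):
        for x, y in data:
            err = (w * x + b) - y  # prediction error on this sample
            w -= lr * err * x      # gradient step for the slope
            b -= lr * err          # gradient step for the intercept
    return w, b


def fedavg_round(w, b, clients):
    """One communication round: local training, then sample-weighted averaging."""
    total = sum(len(data) for data in clients)
    new_w, new_b = 0.0, 0.0
    for data in clients:
        cw, cb = local_sgd(w, b, data)   # each client starts from the global model
        new_w += cw * len(data) / total  # weight by the client's share of samples
        new_b += cb * len(data) / total
    return new_w, new_b


def make_clients(num_clients=4, samples=20, seed=0):
    """Hypothetical non-IID split: each client only sees its own range of x."""
    rng = random.Random(seed)
    clients = []
    for k in range(num_clients):
        lo = -1.0 + 0.5 * k
        xs = [rng.uniform(lo, lo + 0.5) for _ in range(samples)]
        clients.append([(x, 2.0 * x + 1.0) for x in xs])  # ground truth: w=2, b=1
    return clients


def simulate(rounds=50, seed=0):
    """Simulate all clients and the server inside one process."""
    clients = make_clients(seed=seed)
    w, b = 0.0, 0.0
    for _ in range(rounds):
        w, b = fedavg_round(w, b, clients)
    return w, b


if __name__ == "__main__":
    w, b = simulate()
    print(f"global model after training: w={w:.3f}, b={b:.3f}")
```

Under these assumptions the global model should end up close to the ground-truth w = 2 and b = 1, even though no client ever shares its raw data with the server. In a cross-silo or cross-device setting, the loop over `clients` would be replaced by real network communication.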

I’ve contributed to PyTorch Pipe and PyTorch RPC. See the following two publications for details.

PipeTransformer: Automated Elastic Pipelining for Distributed Training of Transformers
Chaoyang He, Shen Li (Facebook AI Research), Mahdi Soltanolkotabi, Salman Avestimehr
Accepted to ICML 2021 (International Conference on Machine Learning 2021)
[Arxiv] [Proceeding] [Homepage] [Slides] [Animation] [Open Source Code]
TL;DR: PipeTransformer is featured on the official website:

PyTorch RPC: Distributed Deep Learning Built on Tensor-Optimized Remote Procedure Calls
Shen Li (Facebook AI Applied Research), Pritam Damania (Facebook), Luca Wehrstedt (Facebook AI Research), Rohan Varma (Facebook), Omkar Salpekar (Facebook), Pavel Belevich, Howard Huang (Facebook), Yanli Zhao (Facebook), Lucas Hosseini (Facebook), Wanchao Liang, Hongyi Jia, Shihao Xu, Satendra Gera (Facebook), Alisson Azzolini (Facebook), Chaoyang He (University of Southern California), Amir Ziashahabi (University of Southern California), Guoqiang Jerry Chen (Facebook), Zachary DeVito (Facebook AI Research), Alban Desmaison (Facebook), Edward Yang (Facebook AI Research), Dmytro Dzhulgakov (Facebook), Yi Wang (Cruise), Greg Chanan (Facebook AI Research), Soumith Chintala (Facebook), Brian Vaughan (Facebook), Boris Valkov (Facebook), Manoj Krishnan (Facebook), Dwarak Rajagopal (Facebook), Mahdi Soltanolkotabi (University of Southern California), Salman Avestimehr (University of Southern California), Joe Spisak (Facebook)

Distributed training technologies have advanced rapidly in the past few years and have unlocked unprecedented scalability with increasingly complex solutions. These technologies have made distributed training much more efficient and accessible, but impose specific constraints on the training paradigm or the model structure. As a result, applications that fail to meet these constraints have to rely on general-purpose distributed computing frameworks to scale out. However, without access to the deep learning framework’s internal components, these distributed computing frameworks usually fall far short of efficiency and usability. To address these problems, we propose PyTorch RPC as a generic and high-performance solution for distributed deep learning. Compared to generic distributed computing frameworks, PyTorch RPC natively provides essential features for implementing training applications in a distributed environment, including optimized tensor communications, remote memory management, and a distributed autograd engine. Evaluations show that PyTorch RPC attains up to two orders of magnitude faster tensor communication compared to gRPC with one-tenth of user code. Case studies further demonstrate that users can easily employ PyTorch RPC to build efficient reinforcement learning applications (Mario solver), implement large language models (175B parameters), train recommendation models (DLRM), and scale federated learning tasks (FedML). PyTorch RPC is available at