Abstract
In cross-silo federated learning (FL), the data among clients are usually statistically heterogeneous (aka not independent and identically distributed, non-IID) due to diversified data sources, lowering the accuracy of FL. Although many personalized FL (PFL) approaches have been proposed to address this issue, they are only suitable for data with specific degrees of statistical heterogeneity. In the real world, the heterogeneity of data among clients is often immeasurable due to privacy concern, making the targeted selection of PFL approaches difficult. Besides, in cross-silo FL, clients are usually from different organizations, tending to hold architecturally different private models. In this work, we propose a novel FL framework, FedAPEN, which combines mutual learning and ensemble learning to take the advantages of private and shared global models while allowing heterogeneous models. Within FedAPEN, we propose two mechanisms to coordinate and promote model ensemble such that FedAPEN achieves excellent accuracy on various data distributions without prior knowledge of data heterogeneity, and thus, obtains the adaptability to data heterogeneity. We conduct extensive experiments on four real-world datasets, including: 1) Fashion MNIST, CIFAR-10, and CIFAR-100, each with ten different types and degrees of label distribution skew; and 2) eICU with feature distribution skew. The experiments demonstrate that FedAPEN almost obtains superior accuracy on data with varying types and degrees of heterogeneity compared with baselines.
Type
Publication
in ACM SIGKDD Conference on Knowledge Discovery and Data Mining