Abstract: Binary neural networks (BNNs) represent weights and activations with 1-bit values, which dramatically reduces memory cost and computational complexity but usually causes severe accuracy degradation. Knowledge distillation is an effective way to improve the performance of a BNN by inheriting knowledge from a higher-bit network. However, given the accuracy gap and bit-width gap between the 1-bit network and different higher-bit networks, it is unclear which higher-bit network is the most suitable teacher for a given BNN. Therefore, we propose a novel multi-bit adaptive distillation (MAD) method that maximally integrates the advantages of teacher networks with various bit-widths (e.g., 2-bit, 4-bit, 8-bit, and 32-bit). In practice, the intermediate features and output logits of the teachers are utilized simultaneously to improve the performance of the BNN. Moreover, an adaptive knowledge adjusting scheme is explored to dynamically adjust the contribution of each teacher during distillation. Comprehensive experiments on the CIFAR-10/100 and ImageNet datasets with various network architectures demonstrate the superiority of MAD over many state-of-the-art binarization methods. For instance, without introducing any extra inference computation, our binarized ResNet-18 achieves a 1.5% improvement over the Bi-Real Net binarization method on ImageNet.
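To make the multi-teacher setup concrete, the sketch below shows one plausible way an adaptive multi-bit distillation loss could be assembled in PyTorch, combining logit distillation and intermediate-feature matching across teachers of different bit-widths. The function names (`adaptive_weights`, `distill_loss`), the agreement-based weighting rule, and the hyperparameters are illustrative assumptions, not the exact MAD formulation described in the paper.

```python
# Minimal sketch of multi-teacher distillation with adaptive per-teacher
# weights. All names and the weighting scheme are illustrative assumptions,
# not the paper's exact MAD formulation.
import torch
import torch.nn.functional as F

def adaptive_weights(student_logits, teacher_logits_list, tau=4.0):
    """Weight each teacher by how well its soft targets agree with the
    student (lower KL divergence -> larger weight), normalized by a softmax.
    This is one plausible adaptive scheme, assumed for illustration."""
    kls = torch.stack([
        F.kl_div(F.log_softmax(student_logits / tau, dim=1),
                 F.softmax(t / tau, dim=1), reduction="batchmean")
        for t in teacher_logits_list
    ])
    return F.softmax(-kls, dim=0)  # teachers closer to the student count more

def distill_loss(student_logits, student_feat,
                 teacher_logits_list, teacher_feat_list,
                 tau=4.0, feat_coeff=1.0):
    """Combine logit distillation and intermediate-feature matching over all
    teachers, each teacher's term scaled by its adaptive weight."""
    w = adaptive_weights(student_logits, teacher_logits_list, tau)
    loss = student_logits.new_zeros(())
    for i, (t_logits, t_feat) in enumerate(zip(teacher_logits_list,
                                               teacher_feat_list)):
        kd = F.kl_div(F.log_softmax(student_logits / tau, dim=1),
                      F.softmax(t_logits / tau, dim=1),
                      reduction="batchmean") * tau * tau
        fm = F.mse_loss(student_feat, t_feat)  # assumes matched feature shapes
        loss = loss + w[i] * (kd + feat_coeff * fm)
    return loss

# Usage with dummy tensors standing in for 2/4/8/32-bit teacher outputs.
if __name__ == "__main__":
    s_logits, s_feat = torch.randn(8, 100), torch.randn(8, 512)
    t_logits = [torch.randn(8, 100) for _ in range(4)]
    t_feats = [torch.randn(8, 512) for _ in range(4)]
    print(distill_loss(s_logits, s_feat, t_logits, t_feats))
```

In this sketch the adaptive weights are recomputed per batch, so a teacher whose predictions are currently closer to the student's contributes more to the gradient; the paper's actual adjusting scheme may differ.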