Abstract: Despite their remarkable success in many vision tasks, convolutional neural networks (CNNs) often have trouble counting people in crowded scenes for two reasons. First, ordinary CNNs with fixed receptive fields are inadequate for handling the diverse sizes and densities of people. Second, CNNs for counting are sensitive to changes in the brightness and contrast of the input image. This paper proposes a new CNN for crowd counting that resolves these two issues. First, we develop a new counting network, called the pyramid feature selection network (PFSNet), that dynamically adapts its receptive fields to the local crowd density of the input image. Second, we introduce a lightweight and effective image enhancement network, which manipulates the input image to normalize its condition and make it more counting-friendly, leading to robust and improved crowd counting. The concatenation of the two networks, dubbed E-PFSNet, achieves state-of-the-art performance on three public benchmarks for crowd counting. It also outperforms prior work in robustness to changes in image conditions as well as in counting accuracy.