Abstract: Dense target distributions, complex backgrounds, and large numbers of small objects often degrade detection performance in remote sensing imagery. To address these challenges, this paper proposes RSD-DETR, a remote sensing object detection algorithm based on RT-DETR. First, a lightweight multi-scale feature extraction module, Faster-CGLU, is designed by integrating a gating mechanism with partial convolution; it improves the aggregation of local and global feature information while reducing computational redundancy. Second, a CGA-AIFI module is built on cascaded group attention (CGA); it focuses on critical feature regions and suppresses irrelevant background information, thereby strengthening the interaction between the model and object features. Finally, a cross-scale dynamic feature fusion module (CS-DFFM) is designed, which spatially aligns and dynamically fuses multi-scale feature maps through the dynamic scale-sequence feature fusion (DySSFF) module and the triple feature encoder (TFE) module. This mitigates the loss of small-object features caused by upsampling and downsampling and enhances the network's multi-scale feature fusion capability. Experimental results on the SIMD and DOTA-v1.0 datasets show that the proposed algorithm reduces the number of parameters by 22.11% compared with the baseline model, while its mean average precision (mAP@0.5) reaches 79.9% and 86.8%, respectively, which is 2.5 and 1.7 percentage points higher than the baseline model. The model's real-time performance also improves, and its detection results are better than those of other classic models.
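The abstract names the two ingredients of Faster-CGLU, partial convolution and a gating mechanism, without giving the layer layout. The following is a minimal sketch of those two ideas only, on 1D channels in plain Python; it is not the paper's implementation. The function names, the 0.25 channel ratio, the sigmoid gate, and the reuse of the input features as the gate signal are all assumptions made for illustration.

```python
import math


def conv1d_same(signal, kernel):
    """Zero-padded 1D cross-correlation that preserves the input length."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(signal))
    ]


def partial_conv(channels, kernel, ratio=0.25):
    """Convolve only the first `ratio` share of channels; pass the rest through.

    Skipping the remaining channels is what saves computation.
    The 0.25 ratio is an assumed value, not taken from the paper.
    """
    n_conv = max(1, int(len(channels) * ratio))
    convolved = [conv1d_same(ch, kernel) for ch in channels[:n_conv]]
    untouched = [list(ch) for ch in channels[n_conv:]]
    return convolved + untouched


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_unit(value, gate):
    """Element-wise gating: each value is scaled by sigmoid(gate).

    In a real gated linear unit, value and gate come from separate learned
    projections; here they are passed in directly (a simplification).
    """
    return [
        [v * sigmoid(g) for v, g in zip(v_row, g_row)]
        for v_row, g_row in zip(value, gate)
    ]


# Four identical channels of length 5 (toy data).
features = [[1.0, 2.0, 3.0, 4.0, 5.0] for _ in range(4)]
mixed = partial_conv(features, kernel=[1.0, 1.0, 1.0], ratio=0.25)
out = gated_unit(mixed, gate=features)
```

With four channels and a ratio of 0.25, only the first channel is convolved and the other three are copied unchanged, so the convolution cost falls to roughly a quarter of a full convolution over all channels. The gate then rescales every element, letting the unit damp responses where the gate signal is low.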