Beyond Softmax:
Dual-Branch Sigmoid Architecture for Accurate Class Activation Maps

Ewha Womans University

Abstract

Class Activation Mapping (CAM) and its extensions have become indispensable tools for visualizing the evidence behind deep network predictions. However, by relying on a final softmax classifier, these methods suffer from two fundamental distortions: additive logit shifts that arbitrarily bias importance scores, and sign collapse that conflates excitatory and inhibitory features.

We propose a simple, architecture-agnostic dual-branch sigmoid head that decouples localization from classification. Given any pretrained model, we clone its classification head into a parallel branch ending in per-class sigmoid outputs, freeze the original softmax head, and fine-tune only the sigmoid branch with class-balanced binary supervision. At inference, softmax retains recognition accuracy, while class evidence maps are generated from the sigmoid branch — preserving both magnitude and sign of feature contributions. Our method integrates seamlessly with most CAM variants and incurs negligible overhead. Extensive evaluations on fine-grained tasks (CUB-200-2011, Stanford Cars) and WSOL benchmarks (ImageNet-1K, OpenImages-30K) show improved explanation fidelity and consistent Top-1 Localization gains — without any drop in classification accuracy.

Problems

All CAM variants ultimately form a heatmap by linearly combining feature maps with per-channel weights. However, when these weights are derived from softmax-based scores, softmax's shift invariance distorts this linear combination in two ways:
(a) Additive Logit Shift. Adding a constant δ to all feature weights leaves the softmax probability y_k unchanged, yet uniformly inflates every channel's contribution, disproportionately amplifying feature f_i in the heatmap.
(b) Sign Collapse. Subtracting δ flips formerly positive feature weights to negative without affecting y_k, so previously highlighted regions vanish after rectification.
In both cases, identical classification outputs produce drastically different localization maps.
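Both distortions can be reproduced numerically. The sketch below (toy NumPy example with made-up weights and activations) shifts the weights of every class by the same δ, so all logits move together and the softmax output is provably unchanged, while the per-channel heatmap contributions w_i·f_i change drastically:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Toy setup (hypothetical numbers): 2 classes, 3 channels.
f = np.array([0.3, 0.8, 0.5])        # pooled channel activations
W = np.array([[2.0, -0.5, 1.0],      # class-0 weights
              [0.5,  1.0, -1.0]])    # class-1 weights

probs = softmax(W @ f)

# (a) Additive logit shift: adding delta to every weight of every
# class shifts all logits by delta * f.sum(), so softmax is unchanged...
delta = 5.0
assert np.allclose(probs, softmax((W + delta) @ f))

# ...yet each channel's heatmap contribution w_i * f_i is inflated.
print(W[0] * f)            # original contributions
print((W[0] + delta) * f)  # every contribution grows by delta * f_i

# (b) Sign collapse: subtracting delta flips formerly positive
# weights negative; after clamping to positives, those regions
# vanish from the map even though classification is identical.
print(np.maximum(W[0] * f, 0))
print(np.maximum((W[0] - 3.0) * f, 0))
```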

Our Approach

To disentangle these distortions, we introduce a dual-branch sigmoid head that decouples localization from classification.
Training. Starting from a pretrained classifier, we copy its head h into a new branch h̃ with identical architecture but freshly initialized parameters. The sigmoid branch is trained to output class-wise scores, while the original softmax head and the backbone remain frozen.
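The training setup can be sketched in PyTorch as follows. This is a minimal illustration, not the paper's exact implementation: the head-cloning via `copy.deepcopy`, the `reset_parameters` re-initialization, and the module names are all assumptions, and plain BCE stands in for the paper's class-balanced binary supervision.

```python
import copy
import torch
import torch.nn as nn

class DualBranchHead(nn.Module):
    """Frozen backbone + frozen softmax head, plus a trainable sigmoid clone."""
    def __init__(self, backbone, softmax_head):
        super().__init__()
        self.backbone = backbone
        self.softmax_head = softmax_head
        # Clone the head architecture, then give it fresh parameters.
        self.sigmoid_head = copy.deepcopy(softmax_head)
        for m in self.sigmoid_head.modules():
            if hasattr(m, "reset_parameters"):
                m.reset_parameters()
        # Freeze everything except the sigmoid branch.
        for p in self.backbone.parameters():
            p.requires_grad = False
        for p in self.softmax_head.parameters():
            p.requires_grad = False

    def forward(self, x):
        feats = self.backbone(x)
        logits = self.softmax_head(feats)          # classification branch
        scores = torch.sigmoid(self.sigmoid_head(feats))  # localization branch
        return logits, scores

# Only the sigmoid branch receives gradients; binary one-vs-rest
# targets stand in for class-balanced binary supervision.
# optimizer = torch.optim.SGD(model.sigmoid_head.parameters(), lr=1e-3)
# loss = nn.BCELoss()(scores, one_hot_targets)
```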
Inference. After feature extraction, the frozen softmax head predicts the class label k*. In parallel, any CAM variant computes per-channel importance scores w̃k* (via weights or gradients) for sk*, which are rectified by clamping to positive values. These positive-only scores are then linearly combined with the feature maps to produce the final class evidence heatmap M̃k*.
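The map-generation step above reduces to a clamp-and-combine operation. A minimal NumPy sketch (shapes and normalization are illustrative assumptions; the channel scores would come from whichever CAM variant is plugged in):

```python
import numpy as np

def sigmoid_cam(feature_maps, channel_scores):
    """feature_maps: (C, H, W); channel_scores: (C,) importances for class k*.

    Clamps scores to non-negative values, then linearly combines
    them with the feature maps into a class evidence heatmap.
    """
    w = np.maximum(channel_scores, 0.0)           # rectify: keep positives only
    cam = np.tensordot(w, feature_maps, axes=1)   # weighted sum over channels
    cam -= cam.min()
    if cam.max() > 0:
        cam /= cam.max()                          # normalize to [0, 1] for display
    return cam

# Usage with random stand-ins for backbone features and scores:
fmaps = np.random.rand(16, 7, 7)
scores = np.random.randn(16)
heatmap = sigmoid_cam(fmaps, scores)
print(heatmap.shape)  # (7, 7)
```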

Quantitative Results

1. Fine-Grained Explanation Fidelity

table1_xai

Fine-grained explanation fidelity on CUB-200-2011 and Stanford Cars. For % Average Drop (lower is better) and % Increase in Confidence (higher is better), improved values are shown in blue and worsened values in red; parentheses indicate the change relative to the baseline.

2. Weakly-Supervised Object Localization

table2_wsol

WSOL results on ImageNet-1K and OpenImages-30K. For each base method we shade the baseline row in gray; “+ Ours” rows report updated scores with their Δ shown in parentheses (blue for gains, red for drops).

Qualitative Results

1. Fine-Grained Explanation Fidelity

qual_all_xai

Additional qualitative explanation examples on fine-grained datasets: VGG-16 on CUB-200-2011 (top) and ResNet-50 on Stanford Cars (bottom).

2. Weakly-Supervised Object Localization

qual_all_imagenet

Additional qualitative WSOL examples on ImageNet-1K using VGG-16 (top), ResNet-50 (middle), and InceptionV3 (bottom). Predicted bounding boxes are shown in green, and ground-truth boxes in red.

BibTeX


        @inproceedings{oh:2025:beyondsoftmax,
          title = {Beyond Softmax: Dual-Branch Sigmoid Architecture for Accurate Class Activation Maps},
          author = {Oh, Yoojin and Noh, Junhyug},
          booktitle = {British Machine Vision Conference (BMVC)},
          year = {2025}
        }