Troubleshooting

What you will find here

Real failure modes observed in current BNNR runs, with reproducible checks and fixes.

When to use this page

Use this when a command fails, a run hangs, dashboard is empty, or export output is unexpected.

1) Training finishes but process does not exit

Cause:

run started with --with-dashboard (default), so live server stays active.

Fix:

stop with Ctrl+C, or
run one-shot with --without-dashboard.

2) `Dashboard backend dependencies are missing`

Cause:

dashboard extras not installed.

Fix:

python3 -m pip install -e ".[dashboard]"

3) `--data-path is required ...`

Cause:

missing path for dataset types that require external structure.

Applies to:

imagefolder

Fix: pass --data-path.

4) Dashboard shows zero runs

Cause:

wrong --run-dir, or run directory missing events.jsonl.

Fix:

use a parent folder containing run_* directories, or
point directly at a run directory that has events.jsonl.

6) (Historical) Python 3.9 and union types in CLI/FastAPI

Supported releases (0.1.1+) require Python >=3.10; this scenario does not apply to current wheels.

If you run an older checkout on Python 3.9, you could see unsupported operand type(s) for | from runtime-evaluated X | None annotations. Mitigations were: compatible annotations in introspected paths, or eval-type-backport for python < 3.10 (removed when 3.9 support was dropped).

7) CI/test import error: `ModuleNotFoundError: No module named 'httpx'`

Cause:

async dashboard tests require httpx.

Fix:

include httpx in test/dev dependencies.

8) `pip install -e ".[dashboard]"` or `python -m build` fails in restricted environments

Cause:

isolated build environment cannot fetch build backend packages (for example hatchling) due network restrictions.

Fix:

in restricted/offline environments, use prepared build env and python -m build --no-isolation,
in GitHub CI (network-enabled), standard python -m build should work.

9) CUDA appears unavailable

Check:

python3 -c "import torch; print(torch.cuda.is_available(), torch.cuda.device_count())"

Fix:

install CUDA-compatible PyTorch build for your OS/driver.

10) Augmentation instability on grayscale datasets

Symptom:

shape/broadcast errors with aggressive presets on grayscale data.

Fix:

start with RGB datasets for preset stress tests,
keep grayscale smoke runs minimal and conservative.

11) `python3 -m venv` fails with `ensurepip is not available`

Cause:

Python venv support package is missing on the host OS.

Fix:

install system venv package (for example python3.12-venv on Ubuntu),
recreate the virtual environment and continue quickstart steps.

12) QR code is visible but dashboard does not open on phone

Cause:

phone is on a different network,
local firewall blocks dashboard port,
router/client isolation blocks peer-to-peer traffic.

Fix:

confirm phone and machine are on the same Wi-Fi,
open Network URL manually on phone (not only QR scan),
test with explicit port, e.g. --dashboard-port 8080,
allow Python/server traffic for that port in firewall settings.

13) Loading BNNR checkpoints for inference

BNNR checkpoints include RNG state for deterministic resume. When loading for inference with PyTorch >= 2.6, pass weights_only=False:

import torch
 
ckpt = torch.load(
    "checkpoints/iter_1_augname.pt",
    map_location="cpu",
    weights_only=False,
)
model.load_state_dict(ckpt["model_state"])
model.eval()

Checkpoint keys: model_state, iteration, augmentation_name, metrics, config_snapshot, rng_state (safe to ignore for inference).

14) Choosing XAI target layers for SimpleTorchAdapter

By default, SimpleTorchAdapter picks the last Conv2d layer for XAI. To override, pass target_layers explicitly:

# Example: EfficientNet-B0
target_layer = model.features[-1][0]  # last MBConv block
 
adapter = SimpleTorchAdapter(
    model=model,
    criterion=criterion,
    optimizer=optimizer,
    target_layers=[target_layer],
)

Common choices:

ResNet: model.layer4[-1]
EfficientNet: model.features[-1][0]
ViT: last attention block (may require custom wrapper)

15) Windows: `RuntimeError: An attempt has been made to start a new process`

Cause:

PyTorch DataLoader with num_workers > 0 on Windows requires if __name__ == "__main__": guard.

Fix:

wrap your training script entry point:

if __name__ == "__main__":
    main()

or set num_workers=0 in your DataLoader.

16) Multi-label: `task: multilabel` in YAML but training still looks single-label

Cause:

bnnr train preset pipelines (build_*_pipeline in pipelines.py) always use CrossEntropyLoss and single-label targets, regardless of task in the config file.

Fix:

integrate multi-label data with SimpleTorchAdapter(multilabel=True) and BCEWithLogitsLoss (Golden Path), or run examples/multilabel/multilabel_demo.py; do not expect --dataset cifar10 (or similar) alone to become multi-label.

17) Detection XAI with YOLO / Ultralytics

Symptom:

Log says detection XAI or probe prediction snapshots were skipped for an Ultralytics backbone.

Cause:

BNNR needs the high-level YOLO.predict path and the same image scaling as training. A raw ultralytics.nn.tasks.DetectionModel passed through DetectionAdapter does not expose that.

Fix:

Use UltralyticsDetectionAdapter from bnnr.detection_adapter for Ultralytics YOLO training. That adapter implements predict_detection_dicts so detection XAI (activation / occlusion) and probe snapshots work with xai_enabled=True.

Troubleshooting

What you will find here

When to use this page

1) Training finishes but process does not exit

2) Dashboard backend dependencies are missing

3) --data-path is required ...

4) Dashboard shows zero runs

6) (Historical) Python 3.9 and union types in CLI/FastAPI

7) CI/test import error: ModuleNotFoundError: No module named 'httpx'

8) pip install -e ".[dashboard]" or python -m build fails in restricted environments

9) CUDA appears unavailable

10) Augmentation instability on grayscale datasets

11) python3 -m venv fails with ensurepip is not available

12) QR code is visible but dashboard does not open on phone

13) Loading BNNR checkpoints for inference

14) Choosing XAI target layers for SimpleTorchAdapter

15) Windows: RuntimeError: An attempt has been made to start a new process

16) Multi-label: task: multilabel in YAML but training still looks single-label

17) Detection XAI with YOLO / Ultralytics

2) `Dashboard backend dependencies are missing`

3) `--data-path is required ...`

7) CI/test import error: `ModuleNotFoundError: No module named 'httpx'`

8) `pip install -e ".[dashboard]"` or `python -m build` fails in restricted environments

11) `python3 -m venv` fails with `ensurepip is not available`

15) Windows: `RuntimeError: An attempt has been made to start a new process`

16) Multi-label: `task: multilabel` in YAML but training still looks single-label