This paper introduces semantic features as a candidate conceptual framework for white-box neural networks. The proof-of-concept model is well-motivated, inherently interpretable, has a low parameter count, and achieves almost human-level adversarial test metrics, with no adversarial training! These results and the general nature of the approach warrant further research on semantic features. The code is available at https://github.com/314-Foundation/white-box-nn