We investigate whether continuous-control policies can be represented and learned as discrete logic circuits rather than continuous neural networks. We introduce Differentiable Weightless Controllers (DWCs), a symbolic-differentiable architecture that maps real-valued observations to actions using thermometer-encoded inputs, sparsely connected boolean lookup-table layers, and lightweight action heads. DWCs are trained end-to-end with gradient-based methods, yet compile directly into FPGA-compatible circuits with few- or even single-clock-cycle latency and nanojoule-level energy cost per action. Across five MuJoCo benchmarks, including the high-dimensional Humanoid task, DWCs achieve returns competitive with weight-based policies (full-precision or quantized neural networks). They match performance on four tasks, and on HalfCheetah we isolate network capacity as the key limiting factor. Furthermore, DWCs exhibit structurally sparse and interpretable connectivity patterns, enabling direct inspection of which input thresholds influence control decisions.
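To make the observation-to-action pipeline concrete, here is a minimal inference-time sketch in plain Python. It is not the DWC implementation. It assumes uniformly spaced thermometer thresholds, random fixed k-input connectivity, random truth tables standing in for learned ones, and a group-popcount action head. All function and class names (`thermometer_encode`, `LUTLayer`, `action_head`) and all sizes are hypothetical. The differentiable relaxation used during training is omitted. Only the discrete forward pass is shown, which is the part that would map onto circuit logic.

```python
import random

def thermometer_encode(x, lo, hi, n_bits):
    """Unary (thermometer) code: bit i is 1 iff x exceeds the i-th threshold.
    Assumption: thresholds are uniformly spaced strictly inside (lo, hi)."""
    thresholds = [lo + (hi - lo) * (i + 1) / (n_bits + 1) for i in range(n_bits)]
    return [1 if x > t else 0 for t in thresholds]

class LUTLayer:
    """Hypothetical boolean lookup-table layer: each unit reads k input bits
    through fixed sparse connections and outputs one bit from its truth table."""
    def __init__(self, n_in, n_out, k, rng):
        self.conn = [rng.sample(range(n_in), k) for _ in range(n_out)]
        # Random tables stand in for learned ones (2**k entries per unit).
        self.tables = [[rng.randint(0, 1) for _ in range(2 ** k)] for _ in range(n_out)]

    def __call__(self, bits):
        out = []
        for conn, table in zip(self.conn, self.tables):
            addr = 0
            for idx in conn:  # pack the k selected bits into a table address
                addr = (addr << 1) | bits[idx]
            out.append(table[addr])
        return out

def action_head(bits, n_actions, lo=-1.0, hi=1.0):
    """Hypothetical head: count the 1s in each equal-size group of bits and
    map the count linearly to a continuous action in [lo, hi]."""
    group = len(bits) // n_actions
    actions = []
    for a in range(n_actions):
        count = sum(bits[a * group:(a + 1) * group])
        actions.append(lo + (hi - lo) * count / group)
    return actions

# Usage: 3 observations -> 24 thermometer bits -> two LUT layers -> 2 actions.
rng = random.Random(0)
obs = [0.3, -0.7, 1.2]
bits = [b for x in obs for b in thermometer_encode(x, -2.0, 2.0, 8)]
layer1 = LUTLayer(len(bits), 32, 6, rng)
layer2 = LUTLayer(32, 16, 6, rng)
actions = action_head(layer2(layer1(bits)), n_actions=2)
print(actions)
```

Because each unit is a fixed k-input truth table, a trained layer can in principle be emitted as combinational logic with no multiplications. The group-popcount head is just one simple choice of a lightweight head used here for illustration.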