🚀 Just wrapped: ESA-funded project SoC-HEALTH2 – Hierarchical Health Management in Heterogeneous Systems 🎯
We're excited to announce the successful completion of SoC-HEALTH2, a continuation of our earlier ESA GSTP/EXPRO+ activity. That activity introduced the On-Chip Fault Management (OCFM) architecture, which enables both a) instant fault detection, isolation and recovery (FDIR) and b) data collection for prognostics in multicore CPU subsystems. The OCFM framework can collect and process data from hundreds of on-chip sensors and checkers in real time, and adaptively reschedule tasks across CPU cores according to the system health status.
With SoC-HEALTH2, we’ve significantly expanded OCFM’s capabilities beyond multicore CPUs – towards FPGA SoCs – targeting Earth observation, telecom and rover mission profiles based on COTS FPGAs, primarily in the New Space segment. The OCFM framework has been successfully implemented and verified on Versal and Zynq UltraScale+ FPGA SoCs following rigorous ECSS standards.
The developed demonstrator supports the standard PUS/CCSDS telemetry protocol and uses an image-recognition DNN accelerator as a demo payload. The OCFM framework can be customized for mission-specific FDIR goals through an open API and extensions (3rd-party libraries).
The health manager functions have been analyzed from the requirements and validation standpoints using FMEA/FMECA and fault injection campaigns. The health management functions themselves are radiation-hardened using block-level and distributed TMR in hardware (FPGA logic) and dual lock-step execution for the software components running on the FPGA SoC’s CPU cores (R cores working in lock-step as a safety island).
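
For readers new to TMR, here is a minimal, generic sketch of the idea in Python: three redundant copies of a value are compared and a bitwise 2-of-3 majority vote masks a single upset. This is an illustration only, not the OCFM implementation (which lives in FPGA logic); the names tmr_vote and tmr_mismatch are hypothetical.

```python
def tmr_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority of three redundant copies (hypothetical helper)."""
    return (a & b) | (a & c) | (b & c)


def tmr_mismatch(a: int, b: int, c: int) -> bool:
    """True when at least one copy disagrees, i.e. a fault worth reporting."""
    return not (a == b == c)


# Illustrative values: one replica suffers a single bit flip.
golden = 0b10110010
upset = golden ^ 0b00001000

voted = tmr_vote(golden, upset, golden)    # the flipped bit is outvoted
fault_seen = tmr_mismatch(golden, upset, golden)
```

The vote masks the fault, while the mismatch flag is the kind of signal a health manager could log for diagnostics or prognostics.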