Gajinder Panesar

No-one really knows what is going on inside SoCs; no-one really knows how systems-on-chip operate as a system

There’s a crisis going on in the chip industry. SoC architects, developers and EDA companies will deny it, but it’s a crisis nevertheless.

The fact is that today’s SoCs are so complicated that no-one – and I mean no-one – truly understands how they work. And this is despite billions of dollars spent every year on design, verification, EDA, shuttles and re-spins.

How did we get to this parlous state in the first place?

Over the years chips have become more and more complex: the average SoC today has more than 100 different IP blocks, many of which are processor cores that in days gone by would have been big enough to require a chip of their own. The blocks come from a mix of sources: purchased from third parties, acquired bundled with EDA licenses, developed in house or re-used from previous designs. They are strung together with staggeringly sophisticated and complicated interconnects. And the whole system runs a combination of legacy, newly-developed and third-party software that in total will have taken engineer-centuries to develop.

This complexity allows us to build powerful systems that are much more than the sum of their parts.

But many SoC architects, designers and verification and validation engineers are, frankly, in denial about the true implications of this complexity. They believe they understand how their products work. But I can safely say that they do not, because they simply cannot.

The interactions of these different hardware and software blocks make for a system that is inherently non-deterministic. The same argument that tells us that the system is more than the sum of its parts also leads us to the inevitable conclusion that it will exhibit behavior that is at times unpredictable. Validating the constituent modules in isolation is no help – a system on a chip needs a system-level view.

Whether the chip industry admits it or not, in my book this constitutes a crisis; and it’s one of the main reasons that I decided to join UltraSoC.

There has to be a better way. Developers need to bite the bullet and start understanding system-wide performance, especially in heterogeneous systems. A “system on chip” should be developed as a system: not as though it were a collection of independent pieces, each worked on in its own silo. The following phrases need to be banned: “block x works fine – it’s not my problem”; “looks like a software problem to me”; “looks like a hardware problem to me”.

Co-integration, intermittent deadlocks and hangs, host and accelerator conflicts and subtle problems with interconnects are the nightmares that keep engineering managers awake at night. And these are the problems that a system-level view, via a technology like UltraSoC’s, can solve.

It’s about much more than just debug (although debug is a big part of it): we help people to understand and tune the system as a whole. Just as importantly, we are able to do this non-intrusively via hardware – “at wire speed” – and while the device is in “mission mode”.

It’s also important that we’re IP-vendor-independent. There is no way that a vendor-supplied tool can give a holistic view of a device: in contrast, UltraSoC’s technology allows us to offer the right view, tailored to the specific needs of the situation – whether that’s centered on core operation, DMA, a specific peripheral or interrupt controller.

And we can hone those views with filtering and intelligence to focus on the transactions and processes that really matter. Stalls, contentions, deadlocks and even hard-to-replicate problems become tractable issues. In this way we can help to address the real questions that chip developers need answering: questions like “where have my MIPS gone?” and “how long does that DMA transfer really take?”
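To make the DMA question concrete, here is a minimal sketch of the kind of filtering involved. It is purely illustrative and is not UltraSoC's API: the `Transaction` record, its fields, the `summarize` helper, the block names (`dma0`, `cpu0`) and the cycle counts are all hypothetical, standing in for whatever trace data a monitoring infrastructure might export.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Transaction:
    """One observed bus transaction (hypothetical trace record)."""
    initiator: str       # block that issued the transaction, e.g. "dma0"
    start_cycle: int     # cycle at which the request was issued
    end_cycle: int       # cycle at which the final response arrived
    stalled_cycles: int  # cycles spent waiting on the interconnect


def summarize(trace: Iterable[Transaction], initiator: str) -> Optional[dict]:
    """Filter the trace down to one initiator and summarize its latency."""
    selected = [t for t in trace if t.initiator == initiator]
    if not selected:
        return None  # nothing observed for this block
    durations = [t.end_cycle - t.start_cycle for t in selected]
    return {
        "count": len(selected),
        "worst_case_cycles": max(durations),
        "mean_cycles": sum(durations) / len(durations),
        "stalled_cycles": sum(t.stalled_cycles for t in selected),
    }


# Made-up sample data: three DMA transfers, one of which is held up
# while a CPU access contends for the same interconnect.
trace = [
    Transaction("dma0", 100, 164, 0),
    Transaction("cpu0", 120, 130, 2),
    Transaction("dma0", 200, 392, 128),
    Transaction("dma0", 400, 464, 0),
]

summary = summarize(trace, "dma0")
print(summary)
```

In this invented data, two of the transfers take 64 cycles and one takes 192, with 128 of those cycles spent stalled. That is the sort of outlier that a block-level view would never show, because the DMA engine itself is behaving correctly.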

Last but not least, we can extend that data collection and analysis process beyond the lab and into the field, allowing product refinement, pre-emptive action against failure, and troubleshooting and forensics when problems occur. Analytics like these open the door to system optimization and tuning with live data, based on actual users and real-world performance.

Creating a chip should be neither “art” nor black magic – our work at UltraSoC is to provide the analytical foundation to allow the whole process to take place in a structured fashion, informed by real understanding and solid engineering principles.

That’s why I’m so excited to be working here at UltraSoC: because system designers deserve system-level tools.