Selecting the right on-chip network is critical to meeting the requirements of today’s advanced SoCs. It should allow easy integration of IP cores from many sources with different protocols, and provide a UVM verification environment.
John Bainbridge, staff technologist, CTO Office, Sonics Inc., said that the right network optimizes system performance. Virtual channels use resources efficiently, saving gates and wires. A non-blocking network improves system performance. Flexible topology choices allow the network to be matched to requirements.
Power management is key with advanced system partitioning, and an improved design flow and timing closure. Finally, the development environment allows easy design capture and has performance analysis tools.
For the record, there are several SoC integration challenges that need to be addressed, such as IP integration, frequency, throughput, physical design, power management, security, time-to-market and development costs.
SGN exceeds requirements
SGN met the tablet performance requirement with fabric frequency of 1066MHz. It has an efficient gate count of 508K gates. There are features such as an advanced system partitioning, security and I/O coherency. There is support for system concurrency as well as advanced power management.
Sonics offers system IP solutions such as SGN, a router based NoC solution, with flexible partitioning and VC (Virtual Channel) support. The frequency is optimized with credit based flow control.
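Credit-based flow control, mentioned above, can be illustrated with a minimal sketch. This is a generic model of the technique and not Sonics' implementation; the class names, the buffer depth and the drain pattern are all assumptions chosen for illustration. The sender holds one credit per free buffer slot downstream, spends a credit for each flit it sends, and stalls when it has none left.

```python
from collections import deque


class Receiver:
    """Downstream port with a fixed-depth input buffer (depth is an assumed value)."""

    def __init__(self, depth):
        self.depth = depth
        self.buffer = deque()

    def accept(self, flit):
        # With correct credit accounting this can never overflow.
        assert len(self.buffer) < self.depth, "buffer overflow"
        self.buffer.append(flit)

    def drain(self):
        """Forward one flit onward; return 1 credit if a slot was freed, else 0."""
        if self.buffer:
            self.buffer.popleft()
            return 1
        return 0


class Sender:
    """Upstream port that holds one credit per free slot in the receiver."""

    def __init__(self, receiver):
        self.receiver = receiver
        self.credits = receiver.depth

    def try_send(self, flit):
        if self.credits == 0:
            return False  # stall: no guaranteed space downstream
        self.credits -= 1
        self.receiver.accept(flit)
        return True

    def return_credits(self, n):
        self.credits += n


def simulate(num_flits=10, depth=4, drain_every=3):
    """Send num_flits; the receiver drains one flit every drain_every cycles."""
    rx = Receiver(depth)
    tx = Sender(rx)
    sent = stalls = cycle = 0
    max_occupancy = 0
    while sent < num_flits:
        cycle += 1
        if tx.try_send(sent):
            sent += 1
        else:
            stalls += 1
        max_occupancy = max(max_occupancy, len(rx.buffer))
        if cycle % drain_every == 0:
            tx.return_credits(rx.drain())
    return sent, stalls, max_occupancy
```

Because the sender never transmits without a credit, the receiver needs no back-pressure wire that must be sampled in the same cycle, which is one reason the scheme is often associated with higher achievable clock frequencies.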
SSX/SLX are message-based crossbar/ShareLink solutions based on interleaved multi-channel technology. They have target-based QoS with three arbitration levels. SonicsExpress is for power-centric clock domain crossing. There is sub-system re-use and decoupling. MemMax manages and optimizes DRAM efficiency while maintaining system QoS. There is run-time programmability for all traffic types. SonicsConnect is a non-blocking peripheral interconnect.
Tensilica DPU solutions are meant for broad applications. Tensilica is focusing on three key verticals: Hi-Fi audio/voice, IVP imaging and Diamond controllers, as well as the Xtensa. Tensilica will expand the Cadence IP footprint in SoCs. This complements Cadence and Cosmic Circuits interface and analog IPs.
How does all of this fit into Cadence’s vision of an IP factory? According to Chris Rowen, founder and CTO, Tensilica, there will likely be an IP bazaar, architected for efficiency, quality and a strong focus on integration. He was speaking on the concluding day of the 13th Global Electronics Summit at Santa Cruz, USA.
Complex imaging functions are now everywhere. There are some challenges here such as computational demands. The off-load opportunity means more operations, and lower power per operation.
The Tensilica IVP – image/video processing family consists of the IVP, a high-performance DSP subsystem. It is built for low energy handheld devices. It is also a licensable, synthesizable core with rich software tools and libraries. The IVP core has 32 parallel ‘element engines’ plus Xtensa control, programmed as a SIMD uniprocessor. Application examples include feature detection, 3D noise reduction filter, and video stabilizer.
IVP is meeting tomorrow’s imaging requirements. It is built for very high imaging efficiency. It is easy to program, is scalable, and can use multiple cores. There is a huge market in many applications. An example of how Tensilica will fit into Cadence’s IP factory is the DTV application.
Together, Cadence and Tensilica will increase customer value. They will accelerate the time-to-market with solution proven customizable design IP. There will be fully integrated data plane solutions for optimized solutions, power and area for various applications. High quality IP subsystems are tested to work optimally together. It is highly complementary to partner CPUs. It is also highly complementary to Cadence’s broad connectivity/AMS design IP, verification IP offerings, and foundry-qualified SoC design tools.
The partnership will also bolster Cadence as a next-generation IP provider. There will be an enhanced portfolio of advanced IP in advanced nodes spanning a wide range of applications. It will address seamless designs from architecture definition to silicon tape-out. It will also strengthen solutions to address key market segments.
About 318 engineers and managers completed a blind, anonymous survey on on-chip communications networks (OCCNs), also referred to as “on-chip networks”, defined as the entire interconnect fabric for an SoC. The on-chip communications network report was done by Sonics Inc. A summary of some of the highlights is as follows.
The average estimated time spent on designing, modifying and/or verifying on-chip communications networks was 28 percent (for the respondents that knew their estimated time).
The two biggest challenges for implementing OCCNs were meeting product specifications and balancing frequency, latency and throughput. Second tier challenges were integrating IP elements/sub-systems and getting timing closure.
As for 2013 SoC design expectations, a majority of respondents are targeting a core speed of at least 1 GHz for SoC design starts within the next 12 months, based on those respondents that knew their target core speeds. Forty percent of respondents expect to have 2-5 power domain partitions for their next SoC design.
A variety of topologies are being considered for respondents’ next on-chip communications networks, including NoCs (half), followed by crossbars, multi-layer bus matrices and peripheral interconnects. Respondents that knew their plans here were seriously considering an average of 1.7 different topologies.
Twenty percent of respondents stated they already had a commercial Network-on-Chip (NoC) implemented or plan to implement one in the next 12 months, while over a quarter plan to evaluate a NoC over the next 12 months. A NoC was defined as a configurable network interconnect that packetizes address/data for multicore SoCs.
For respondents who had an opinion when commercial Networks-on-Chip became an important consideration versus internal development when implementing an SoC, 43 percent said they would consider commercial NoCs at 10 or fewer cores; approximately two-thirds said they would consider commercial NoCs at 20 or fewer cores.
The survey participants’ top three criteria for selecting a Network on Chip were scalability-adaptability, quality of service and system verification, followed by layout friendliness and support for power domain partitioning. Half of respondents saw reduced wiring congestion as the primary reason to use virtual channels, followed by increased throughput and meeting system concurrency with limited bandwidth.
Functional verification is critical in advanced SoC designs. Abey Thomas, verification competency manager, Embitel Technologies, said that over 70 percent of the effort in the SoC lifecycle goes into verification. Only one in three SoCs achieves first silicon success.
Thirty percent of designs needed three or more re-spins. Three out of four designs are SoCs with one or more processors. Three out of four designs re-use existing IPs. Almost all of the embedded processor IPs have power controllability. Almost all of the SoCs have multiple asynchronous clock domains.
On average, 75 percent of designs are less than 20 million gates. A significant increase in formal checking is approaching. The average number of tests performed has increased exponentially. Regression runs now span several days and weeks. Hardware emulation and FPGA prototyping are rising exponentially. There has been a significant increase in the number of verification engineers involved. A lot of HVLs and methodologies are now available.
Verification challenges include unexpected conflicts in accessing the shared resource. Complexities can arise due to an interaction between standalone systems. Next, there are arbitration priority related issues and access deadlocks, as well as exception handling priority conflicts. There are issues related to the hardware/software sequencing, and long loops and unoptimized code segments. The leakage power management and thermal management also pose problems.
There needs to be verification of performance and system power management. Multiple power regions are turned ON and OFF. Multiple clocks are also gated ON and OFF. Next come asynchronous clock domain crossings and issues related to protocol compliance for standard interfaces. There are issues related to system stability and component reliability. Some other challenges include voltage level translators and isolation cells.
Where are we now? Current techniques include clock gating, power gating with or without retention, multi-threshold (multi-Vt) transistors, multi-supply multi-voltage (MSMV), DVFS, logic optimization, thermal compensation, 2D-3D stacking, and fab process and substrate level bias control.
So, what’s needed? There must be low power methods that do not impact performance. Careful design partitions are needed. The clock trees must be optimized. Crucial software operations need to be identified at early stages. Also, functional verification needs to be thorough.
Power hungry processes must be shortlisted. There needs to be compiler level optimization as well as hardware acceleration based optimization. There should be duplicate registers and branch prediction optimization. Finally, there should be a big-little processor approach.
Present verification trends and methodologies include clock partitions, power partitions, isolation cells, level shifters and translators, serializers-deserializers, power controller, clock domain manager, and power information format – CPF or UPF. Low-power verification covers the behavior on power-down and on power-up. In the latter, the behavioral processes are re-enabled for evaluation.
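The power-down and power-up behavior that a power-aware simulation checks can be sketched with a toy model. This is only an illustration under assumptions: the class, the register names and the clamp value are hypothetical, and real flows take this intent from a UPF or CPF description rather than from code like this. On power-down, state in the domain is corrupted to an unknown value, isolation clamps what downstream logic sees, and on power-up only the registers backed by retention come back with their old values.

```python
X = None  # stands in for the unknown ('X') value a simulator assigns to corrupted state


class PowerDomain:
    """Toy switchable power domain with optional retention and isolated outputs."""

    def __init__(self, retained, clamp=0):
        self.on = True
        self.regs = {}
        self.retained = set(retained)  # names of registers backed by retention cells
        self.clamp = clamp             # value the isolation cell drives while the domain is off
        self._saved = {}

    def write(self, name, value):
        if self.on:                    # writes to an unpowered domain are ignored
            self.regs[name] = value

    def power_down(self):
        # Save retained registers, then corrupt all state in the domain.
        self._saved = {n: v for n, v in self.regs.items() if n in self.retained}
        self.regs = {n: X for n in self.regs}
        self.on = False

    def power_up(self):
        # Restore retained registers; everything else stays unknown until rewritten.
        self.on = True
        self.regs.update(self._saved)

    def output(self, name):
        # Isolation: downstream logic sees the clamp value, never X, while the domain is off.
        return self.regs.get(name, X) if self.on else self.clamp
```

A check written against such a model would assert that no unknown value escapes the domain while it is off, and that software re-initializes every non-retained register after power-up before reading it.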
Open source verification challenges
First, the EDA vendor decides what to support! Too many versions are released in a short time frame. Object oriented concepts are used that are sometimes unfit for hardware. Modelling is sometimes done by an engineer who does not know the difference between a clock cycle and motor cycle! Next, there are too many open source implementations without much documentation. There can be multiple, confusing implementation options as well. In some cases, no open source tools are available. There is limited tech support due to open source.
Power aware simulation steps perform register/latch recognition from RTL design. They perform identification of power elements and power control signals. They support UPF or CPF based simulation. Power reports are generated, which can be exported to a unique coverage database.
Common pitfalls include wrapper-on-wrapper bugs, e.g., Verilog + e wrapper + SV. There is also a dependency on machine generated functional coverage goals. There may be a disconnect between the designer and verification language. There are meaningless coverage reports and defective reference models, as well as unclear and ambiguous specification definition. The proven IP can become buggy due to wrapper condition.
Tips and tricks
Some early planning is needed, and certain steps need to be completed. There should be completion of code coverage targets, completion of functional coverage targets, completion of targeted checker coverage, completion of correlation between functional coverage and checker coverage list, and a complete review of all known bugs, etc.
Tips and tricks include bridging the gap between design language and verification language. There must be use of minimal wrappers to avoid wrapper level bugs. There should be a thorough review of the coverage goals. There should be better interaction between designer and verification engineers. Running with basic EDA tool versions lowers costs.
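The completion steps listed above can be expressed as a simple sign-off check. The sketch below is an assumption about how such a checklist might be automated; the function name, the dictionary keys and the target percentages are hypothetical and do not come from any particular tool.

```python
def signoff_gaps(metrics, targets):
    """Return the list of closure criteria that are not yet met."""
    gaps = []
    for name in ("code_coverage", "functional_coverage", "checker_coverage"):
        if metrics[name] < targets[name]:
            gaps.append(f"{name}: {metrics[name]}% < target {targets[name]}%")
    # Correlation step: every functional coverage item should map to a checker.
    unchecked = sorted(set(metrics["coverage_items"]) - set(metrics["checked_items"]))
    if unchecked:
        gaps.append("coverage items with no checker: " + ", ".join(unchecked))
    if metrics["unreviewed_bugs"] > 0:
        gaps.append(f"{metrics['unreviewed_bugs']} known bugs not yet reviewed")
    return gaps


# Example with made-up numbers: checker coverage is short and one item has no checker.
targets = {"code_coverage": 100, "functional_coverage": 95, "checker_coverage": 95}
metrics = {
    "code_coverage": 100,
    "functional_coverage": 96,
    "checker_coverage": 90,
    "coverage_items": ["reset", "burst"],
    "checked_items": ["reset"],
    "unreviewed_bugs": 0,
}
for gap in signoff_gaps(metrics, targets):
    print(gap)
```

An empty list means every listed criterion is met; anything else is a concrete item to close before sign-off.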
It is always a pleasure to chat with Dr. Wally (Walden C.) Rhines, chairman and CEO, of Mentor Graphics. I chatted with him, trying to understand gigascale design, verification trends, strategy for power-aware verification, SERDES design challenges, migrating to 3D FinFET transistors, and Moore’s Law getting to be “Moore Stress”!
Chip design in gigascale, hertz, complex
First, I asked him to elaborate on how implementation of chip design will evolve, with respect to gigascale design, gigahertz and gigacomplex geometries.
He said: “Thanks to close co-operation among members of the foundry ecosystem, as well as cooperation between IDMs and their suppliers, serious development of design methods and software tools is running two to three generations ahead of volume manufacturing capability. For most applications, “Gigascale” power dissipation is a bigger challenge than managing the complexity but “system-level” power optimization tools will continue to allow rapid progress. Thermal analysis is becoming part of the designer’s toolkit.”
Functional verification is continually challenged by complexity but there have been, and continue to be, many orders of magnitude improvement in performance just from adoption of emulation, intelligent test benches and formal methods so this will not be a major limitation.
The complexity of new physical design problems will, however, be very challenging. Design problems ranging from basic ESD analysis, made more complex due to multiple power domains, to EMI, electromigration and intra-die variability are now being addressed with new design approaches. Fortunately, programmable electrical rule checking is being widely adopted and will help to minimize the impact of these physical effects.
Is verification keeping up?
How is the innovation in verification keeping up with trends?
Dr. Rhines added that over the past decade, microprocessor clock speeds have leveled out at 3 to 4 GHz and server performance improvement has come mostly from multi-core architectures. Although some innovative approaches have allowed simulators to gain some advantage from multi-core architectures, the speed of simulators hasn’t kept up with the growing complexity of leading edge chips.
Emulators have more than made up the difference. Emulators offer more than four orders of magnitude faster performance than simulators and emulators do so at about 0.005X the cost per cycle of simulation. The cost of power per year is more than one third the cost of hardware in a large simulation farm today, while emulation offers a 12X savings in power per verification clock cycle. For those who design really complex chips, a combination of emulation and simulation, along with formal methods and intelligent test benches, has become standard.
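The ratios quoted above can be turned into a back-of-the-envelope comparison. In the sketch below only the three ratios come from the text; the function name, the workload size, the simulator speed and the unit costs are hypothetical inputs chosen for illustration.

```python
def emulation_estimate(cycles, sim_hz, sim_cost_per_cycle, sim_energy_per_cycle):
    """Compare simulation and emulation for a workload of `cycles` verification cycles."""
    speedup = 1e4        # "more than four orders of magnitude" (taken as a lower bound)
    cost_ratio = 0.005   # emulation cost per cycle relative to simulation
    power_saving = 12    # emulation uses 1/12 of the power per verification cycle
    return {
        "sim_hours": cycles / sim_hz / 3600,
        "emu_hours": cycles / (sim_hz * speedup) / 3600,
        "sim_cost": cycles * sim_cost_per_cycle,
        "emu_cost": cycles * sim_cost_per_cycle * cost_ratio,
        "sim_energy": cycles * sim_energy_per_cycle,
        "emu_energy": cycles * sim_energy_per_cycle / power_saving,
    }


# Hypothetical workload: 3.6 billion cycles on a simulator running at 100 cycles/second,
# with cost and energy normalized to 1 unit per simulated cycle.
estimate = emulation_estimate(3.6e9, 100.0, 1.0, 1.0)
```

With these assumed inputs the workload takes about 10,000 hours in simulation and about one hour in emulation, which shows why long software-driven tests tend to move to the emulator.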
At the block and subsystem level, high level synthesis is enabling the next move up in design and verification abstraction. Since verification complexity grows at about the square of component count, we have plenty of room to handle larger chips by taking advantage of the four orders of magnitude improvement through emulation plus another three or four orders of magnitude through formal verification techniques, two to three orders of magnitude from intelligent test benches and three orders of magnitude from higher levels of abstraction.
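The headroom argument above can be worked through numerically. The sketch assumes, as a simplification not stated in the text, that the gains from the four techniques multiply, so their orders of magnitude add. If verification effort grows as the square of component count, the supportable growth in component count is half the total gain, in orders of magnitude.

```python
def headroom(orders_of_magnitude, exponent=2):
    """Return (total gain, supportable component growth), both in orders of magnitude.

    Assumes the individual gains compound multiplicatively and that
    verification effort scales as component_count ** exponent.
    """
    total = sum(orders_of_magnitude)
    return total, total / exponent


# Emulation, formal, intelligent test benches, abstraction (lower and upper figures).
low = headroom([4, 3, 2, 3])   # (12, 6.0)
high = headroom([4, 4, 3, 3])  # (14, 7.0)
```

Under these assumptions, 12 to 14 orders of magnitude of verification gain would cover a component count roughly a million to ten million times larger.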
By applying multiple engines and multiple abstraction levels to the challenge of verifying chips, the pressure is on to integrate the flow. Easily transitioning and reusing verification efforts from every level—including tests and coverage models, from high level models to RTL and from simulation to emulation—is being enabled through more powerful and adaptable verification IP and high level, graph-based test specification capabilities. These are keys to driving verification reuse to match the level of design reuse.
Powerful verification management solutions enable the collection of coverage information from all engines and abstraction levels, tracking progress against functional specifications and verification plans. Combining verification cycle productivity growth from emulation, formal, simulation and intelligent testing with higher verification abstraction, re-use and process management provides a path forward to economically verifying even the largest, most complex chips on time and within budget.
Good power-aware verification strategy for SoCs
What should be a good power-aware verification strategy for SoCs?
According to him, the most important guideline is to start power-aware design at the highest possible level of system description. The opportunity to reduce system power is typically an order of magnitude greater at the system level than at the RTL level. For most chips today, that means at least the transaction level when the design is still described in C++ or SystemC.
Significant experience and effort should then be invested at the RTL level using synthesis and UPF-enabled simulation. Verification solutions typically automate the generation of correctness checks for power-control sequences and power-state coverage metrics. As SoC power is typically managed by software, the value of a hardware/software co-verification and co-debug solution in simulation and emulation becomes apparent in power-management verification at this level.
As designers proceed to the gate and transistor level, accuracy of power estimation improves. That is why gate level analysis and verification of the fully implemented power management architecture is important. Finally, at the physical layout, designers traditionally were stuck with whatever power budget was passed down to them. Now, they increasingly have power goals that can be achieved using dozens of physical design techniques that are built into the place and route tools.
SuVolta Inc., based in California, USA, develops and licenses CMOS semiconductor technologies that significantly reduce the power consumption of integrated circuits (ICs). Back in June 2011, it introduced the PowerShrink low-power platform and its first licensee, Fujitsu. Thanks to Amanda Crnkovich of The Hoffmann Agency, I interacted with Dr. Scott E. Thompson, CTO, SuVolta, on the deeply depleted channel (DDC) technology that delivers over 50 percent reduction in IC power consumption, while maintaining performance.
What’s DDC technology all about?
First, I asked Dr. Thompson what the DDC technology is all about. He said that SuVolta’s PowerShrink platform in planar, bulk CMOS provides dramatic improvements in variability and device performance, and is compatible with existing CMOS processes. It integrates using conventional fabrication equipment and materials, and enables the reuse of existing circuit IP infrastructure. SuVolta is focusing on solving the power problem in system-on-chips (SoCs) across multiple CMOS process technology nodes.
He added: “SuVolta’s DDC transistor reduces threshold voltage (VT) variability and enables continued CMOS scaling. The structure works by forming a deeply depleted channel when a voltage is applied to the gate. In a typical implementation the DDC channel has several regions – an undoped or very lightly doped region, a VT setting offset region and a screening region. Each implementation of SuVolta’s DDC transistor may vary depending on the wafer fabrication facility and specific chip design requirements.”
The DDC transistor has a much tighter distribution of threshold voltages. In addition, DDC transistors allow for the setting of multiple VTs, which is vital for today’s low-power products.
“Perhaps, the biggest benefit is in embedded SRAM memory blocks. For most chips, lowering supply voltage is limited by the SRAM. However, with a DDC transistor, conventional 6T SRAMs have been demonstrated operating below 500 millivolts. This is significant as it is amongst the lowest voltage ever reported in a standard embedded SRAM,” added Dr. Thompson.
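Why supply voltage matters so much can be seen from the standard first-order relation for CMOS dynamic power, P = a * C * V^2 * f. The sketch below is a generic textbook calculation and not SuVolta's analysis; the function names and the example values are assumptions, and leakage power is ignored.

```python
import math


def dynamic_power(c_eff, vdd, freq, activity=1.0):
    """First-order CMOS dynamic power: activity * C * Vdd^2 * f."""
    return activity * c_eff * vdd ** 2 * freq


def vdd_for_power_ratio(vdd, ratio):
    """Supply voltage that scales dynamic power by `ratio` at fixed C, f and activity."""
    return vdd * math.sqrt(ratio)


# Hypothetical example: halving dynamic power from a 1.0 V supply.
new_vdd = vdd_for_power_ratio(1.0, 0.5)
```

Because power depends on the square of the voltage, halving dynamic power at the same frequency needs only about a 30 percent drop in supply voltage, provided blocks such as the SRAM still operate at the lower voltage.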
Impact on reducing IC power consumption in devices
So, what impact will all of this have on reducing IC power consumption in devices, such as smartphones, tablets, etc.? While the increased density in transistors enables more features for all types of devices, power has now become the biggest issue in semiconductors. This “power impasse” is critical for two reasons:
* Excessive power consumption limits battery life for mobile devices, and causes huge electricity bills for server farms.
* Devices are hitting their thermal (heat) limit, thus preventing more capabilities from being added. Power consumption directly creates heat. This is becoming a major problem in mobile devices, which have very strict thermal limits. To stay within thermal limits, chip makers must forgo adding content, or “throttle” the chip back to a slower speed.
The impact of excess power on consumers is profound: shorter battery life, lower-content mobile devices – fewer features and/or slower performance, higher electronics costs because transistors hit their scaling limit because of power, excessive energy bills and an increased global demand for energy.
Dr. Thompson added: “SuVolta’s PowerShrink platform enables semiconductor firms to cut chip power in half without sacrificing performance, losing functionality, or migrating to a more advanced, and costly, semiconductor process node. And, it does so using planar, bulk CMOS, and does not require development of new manufacturing facilities or IP blocks.”
Altera Corp. has introduced SoC FPGAs that integrate an ARM processor with the FPGA. The SoC FPGAs are said to deliver reduced board space, power and system costs, as well as increased performance. Altera also launched the FPGA industry’s first Virtual Target that enables immediate device-specific application software development prior to hardware availability.
The ARM-based FPGAs integrate 28-nm Cyclone V and Arria V FPGA fabric, a dual-core ARM Cortex-A9 MPCore processor, error correcting code (ECC) protected memory controllers, peripherals and high-bandwidth interconnect into a single chip. The Cyclone V and Arria V SoC FPGAs further extend the portfolio’s reach into the embedded processing market. Embedded developers’ needs include increasing system performance, reducing system power, and reducing board size as well as system cost. ARM + Altera = SoC FPGAs.
The SoC FPGA family highlights include the dual-core ARM Cortex-A9 MPCore processor, which includes hard memory controller, peripherals and high-bandwidth interconnect. Altera’s 28-nm FPGA fabric involves the Cyclone V SoC FPGA and the Arria V SoC FPGA, respectively. ARM’s ecosystem and Altera’s hardware development flow are also featured in the form of the Quartus II software and Qsys system integration tool. These SoC FPGAs are also said to have a proven virtual prototyping methodology in the form of SoC FPGA Virtual Target for device-specific software development.
The ARM processor has been combined with hard IP. The SoC FPGA uses the dual-core ARM Cortex-A9 MPCore processor that features 800 MHz per core (industrial grade), NEON media processing engine, single/double precision floating point unit (FPU), 32-KB/32-KB L1 caches per core and ECC-protected 512-KB shared L2 cache. The hard IP features multi-port memory controller with ECC, such as DDR2/3, mobile DDR, LPDDR2, as well as QSPI, NAND flash, NOR flash memory controller with ECC, and a wide range of common peripherals.
The advanced 28nm low-power (28LP) FPGA fabric is the optimal choice for addressing today’s power- and cost-constrained applications and boasts the lowest absolute power. The hard IP features up to three memory controllers with ECC, variable precision DSP technology, up to two hard PCIe Gen 2 x4 and high-speed transceivers operating up to 10 Gbps.
According to Patrick Maccartee, director of product management, and James Ready, CTO, MontaVista, MontaVista virtualization can be realized. The benefits to developers are clear: lowered complexity, flexibility in development, high performance, and the first Linux configured for dataplane performance.
These were the conclusions from the seminar, which I attended as an invited guest, on Beyond Virtualization: The MontaVista Approach to Multi-core SoC Resource Allocation and Control.
Use cases for virtualization in the IT world include server consolidation, underutilization, management of numerous OSs and dependent applications.
Hardware considerations include very uniform server hardware platforms, especially for I/O, and extensive processor support for virtualization. There also exists a huge uniform market for virtualization, with numerous successful companies of very large scale.
Embedded is different yet again. Embedded devices are already highly optimized, especially, in terms of size, power consumption, CPU utilization, etc. No layer of software makes a processor go faster. So far, it is not a big market.
Multi-core does not automatically mean an RTOS for the data plane, hypervisors/virtualization or multiple OSs. In this scenario, what’s useful for embedded virtualization? The answer is the MontaVista virtualization architecture.
At the fag end of day 1 of CDNLive India 2010, I had the opportunity to interact with John Bruggeman, CMO, Cadence Design Systems and Rahul Arya, director, marketing and technology sales, Cadence Design Systems (I) Pvt Ltd.
A week ago, I’d written a post: Is social media really helping semicon/VLSI firms? Of course, there was a session organized by EDA Consortium (EDAC), titled: Does Social Media Reach the Engineers You Want or Waste Your Time?
Having earlier had a chat with Karen Bartleson, a panelist at the EDAC event, I thought it best to get John’s views on some of the issues, since the EDAC panel had representation from Cadence (it wasn’t John) as well!
Lot more needs to be done on social sites
First, it is well known that the adoption of social media is in its infancy in the semicon/VLSI industry. In some other industries, the adoption is much faster. Why has it been this way, so far?
Bruggeman said: “We have an ageing population in our design community, more so than the other technology industries. So, we have been slower in adopting. The pickup on Twitter has been slow.
“We need to do whatever we can do to accelerate. We have heavily invested in bloggers and are also into driving social media. Cadence has two bloggers on staff. The blogs are promising. However, in some of the social media sites, a lot more needs to be done.” That’s quite an honest answer!
Are you building communities?
So, how are semicon/VLSI firms using the social media to build communities? Are you building or attempting to build communities? What is that particular community doing?
He added: “We need to figure out how, as an industry, we should use social media. How do you get a community of users to engage in an open dialog? We haven’t got anywhere near developing a community. We also have to expand beyond blogging.”
Is the social media really helping reach out to design engineers? Are companies hiring via social media sites?
According to Bruggeman, every recruiter of note is now involved in LinkedIn. “Hirings are happening there. Design engineers are also going there to get hired, and not merely for free exchange of information. This is where engineers can talk to engineers,” he noted. “However, it will be interesting to see whether a community can be developed. So far, social media has managed to reach out to design engineers only a little bit.”