Category Archives: Hardware design

Reduce common-cause failures for robust redundancy


Source: http://www.edn.com/design/integrated-circuit-design/4433315/Reduce-common-cause-failures-for-robust-redundancy

Ways to reduce CCFs (common-cause failures)

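  • Physically separate the hardware units.
  • Keep the synthesis of redundant components separate – if redundant components are synthesized together, the tool may merge them into shared logic, and that shared logic becomes a source of CCF.
  • Clock tree and clock monitoring – place the monitor as close as possible to the block being monitored.
  • Voltage monitoring – undervoltage detection, overvoltage detection.
  • Temperature monitoring – overtemperature can cause CCFs.
  • Common-signal monitoring – if you treat common-mode signals as non-safety and leave them unguarded, they will bite you. Examples: scan, test, debug.
  • I/O placement – clustered, they fail together; spread out, they survive.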
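The voltage and temperature items above are typically window checks: a reading below the lower limit (undervoltage) or above the upper limit (overvoltage, overtemperature) raises a fault flag. Below is a minimal software model of such a monitor; the monitor names and limits (a 1.2 V rail with a ±10 % window, a -40 °C to 125 °C junction range) are hypothetical and not taken from the EDN article.

```python
from dataclasses import dataclass
from enum import Enum


class MonitorStatus(Enum):
    OK = "ok"
    UNDER = "under"  # below the lower limit, e.g. undervoltage
    OVER = "over"    # above the upper limit, e.g. overvoltage or overtemperature


@dataclass(frozen=True)
class WindowMonitor:
    """Window comparator model: a reading is valid only inside [low, high]."""
    name: str
    low: float
    high: float

    def check(self, reading: float) -> MonitorStatus:
        if reading < self.low:
            return MonitorStatus.UNDER
        if reading > self.high:
            return MonitorStatus.OVER
        return MonitorStatus.OK


# Hypothetical limits, for illustration only.
VDD_CORE = WindowMonitor("vdd_core", low=1.08, high=1.32)        # volts, 1.2 V +/- 10 %
T_JUNCTION = WindowMonitor("t_junction", low=-40.0, high=125.0)  # degrees Celsius


def failing_monitors(readings: dict[str, float], monitors: list[WindowMonitor]) -> list[str]:
    """Return the names of all monitors whose reading is outside its window."""
    return [m.name for m in monitors if m.check(readings[m.name]) is not MonitorStatus.OK]


if __name__ == "__main__":
    monitors = [VDD_CORE, T_JUNCTION]
    print(failing_monitors({"vdd_core": 1.20, "t_junction": 85.0}, monitors))   # []
    print(failing_monitors({"vdd_core": 1.02, "t_junction": 130.0}, monitors))  # ['vdd_core', 't_junction']
```

In hardware the same idea is a pair of comparators per rail or sensor. To stay consistent with the CCF theme, the monitor should not depend on the very supply or clock it is watching, otherwise one disturbance can take out both the block and its monitor.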

Product how-to: Achieving STARC and DO-254 compliance using HDL Coder-generated code


Source: http://www.edn.com/design/integrated-circuit-design/4437919/Product-how-to–Achieving-STARC-and-DO-254-compliance-using-HDL-Coder-generated-code

 

Coding guidelines exist for HDL as well (not that I expected otherwise):

STARC DSG VHDL Rules
STARC DSG Verilog Rules

In model-based development, HDL code is generated just as C source code is, so the workflow seems to closely follow C-style development. Both are low-level source code, so I don't think they differ much, although they may diverge once you get into the details.

Overall, the article explains how to use the MathWorks tools that support the HDL development process.

Takeaway: coding guidelines exist for HDL.

 

Some of the RTL coding standards that HDL Coder-generated code is built on:

  • DO-254: The DO-254 standard was formally recognized by the FAA in 2005 via AC 20-152 as a means of compliance for the design of complex electronic hardware in airborne systems. Complex electronic hardware includes devices like Field Programmable Gate Arrays (FPGA), Programmable Logic Devices (PLD), and Application Specific Integrated Circuits (ASIC).
  • STARC: Semiconductor Technology Academic Research Center (STARC) policy guidelines are an extensive set of rules that ASIC and system on chip (SoC) designers use to perform in-depth structural analysis on Verilog and VHDL Register Transfer Level (RTL) descriptions. STARC guidelines are compiled by a consortium of 11 major Japanese semiconductor companies that promote a design standard for IP trade and reuse.
  • RMM: The Reuse Methodology Manual (RMM), 3rd Edition, (ISBN 1-4020-7141-8), outlines a set of best practices for creating reusable designs for use in SoCs.

 

 

Understanding the Impact of Silicon Errors on Functional Safety Standards Compliance


This is a good summary of what needs to be done at the HW level for functional safety.

I think it can serve as a good starting point.


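Update (2016.12.16)

The link is broken, so the material is no longer available. Then again, reposting someone else's material doesn't feel right either.
If you need the material, please leave a comment …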

 


Faults in a functional safety system can be broadly classified into two categories: systematic faults and random faults.

Safety Faults and their Causes

Systematic Faults
– Result from a fault in design or manufacturing
– Often a result of failure to follow best practices
– Rate of systematic faults can be reduced through continual and rigorous process improvement
Random Faults
– Result from random defects inherent to the process or usage conditions
– Rate of random faults generally cannot be reduced; focus must be on detection and handling in the application.
Systematic or Random Fault?
• Use of IC outside of datasheet specification? Systematic
• SEU corruption of a memory element? Random
• Logic fault upon exercising a specific path? Either
• Data corruption in presence of EMI? Either
• ESD damage to an I/O? Either

• Both systematic and random faults must be addressed
• Systematic faults are addressed by:
– Application of robust development processes
– Verification and validation activities at all levels of development
– Continuous monitoring of operations once in production
• Random faults are addressed by:
– Architectural analysis to understand impact of faults on the system
– Application of diagnostics to detect critical faults
– Transition of system into a safe state upon fault detection

ISO 26262-5:2010 notes four fault models:
– “Stuck-at” fault model: “… a fault category that can be described with continuous “0” or “1” or “on” at the pins of an element.”
– “D.C.” fault model: “includes the following failure modes: stuck-at faults, stuck-open, open or high impedance outputs, as well as short circuits between signal lines.”
– “A.C.” fault model: “transition faults … and path delays”
– “Soft error” fault model: “…These transient faults are also referred to as Single Event Upset (SEU) and Single Event Transient (SET).”
• Such statements are the result of compromises in committee to attempt to find common definitions for all types of electrical and electronic hardware.
• Any relevant failure modes known to the hardware developer or in the state of the art should be considered in the functional safety analysis.
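To make the stuck-at model concrete, here is a toy gate-level simulation that forces one net to a constant 0 or 1 and lists the input vectors for which the output differs from the fault-free circuit. It is a minimal sketch; the netlist (`n1 = a AND b`, `out = n1 OR c`) and all function names are hypothetical, not taken from the slides or from ISO 26262.

```python
from itertools import product
from typing import Optional, Tuple

Fault = Tuple[str, int]  # (net name, stuck-at value 0 or 1)

# Hypothetical toy netlist: n1 = a AND b, out = n1 OR c
NETS = ("a", "b", "c", "n1", "out")


def evaluate(a: int, b: int, c: int, fault: Optional[Fault] = None) -> int:
    """Evaluate the netlist; if `fault` is given, that net is forced to its stuck-at value."""

    def net(name: str, value: int) -> int:
        return fault[1] if fault is not None and fault[0] == name else value

    a, b, c = net("a", a), net("b", b), net("c", c)
    n1 = net("n1", a & b)
    return net("out", n1 | c)


def detecting_vectors(fault: Fault) -> list:
    """Input vectors (a, b, c) for which the faulty output differs from the fault-free one."""
    return [v for v in product((0, 1), repeat=3)
            if evaluate(*v) != evaluate(*v, fault=fault)]


if __name__ == "__main__":
    for name in NETS:
        for stuck in (0, 1):
            print(f"{name} stuck-at-{stuck}: detected by {detecting_vectors((name, stuck))}")
```

For instance, `a` stuck-at-0 is only exposed by the vector (1, 1, 0), which is one way a logic fault can stay hidden until a specific path is exercised.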

JEDEC Publications
– Failure Mechanisms and Models for Semiconductor Devices (JEP122G)
– Measurement and Reporting of Alpha Particle and Terrestrial Cosmic Ray-Induced Soft Errors in Semiconductor Devices (JESD89A)

Establishing Base Failure Rates
– IEC TR 62380
– SIEMENS SN 29500-2
– FIDES Guide 2009 Reliability Methodology for Electronic Systems

• In a perfect world, we could establish a base failure rate for every failure mode of an IC.
• From a practical standpoint, we estimate using models, field data, and published standards.
• Partitioning of failure rates between elements on an IC is typically done via circuit type, die area, and/or transistor count.
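Partitioning by die area (or transistor count) is proportional arithmetic: each block receives the IC-level rate multiplied by its share of the weight. A minimal sketch, assuming a hypothetical 50 FIT die-level rate and hypothetical block areas; in practice the die-level number would come from a source such as IEC TR 62380 or SN 29500.

```python
def partition_fit(total_fit: float, weights: dict[str, float]) -> dict[str, float]:
    """Split an IC-level failure rate (in FIT) across blocks in proportion to a weight.

    The weight can be die area, transistor count, or another proxy.
    FIT = failures per 10^9 device-hours.
    """
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return {block: total_fit * w / total_weight for block, w in weights.items()}


if __name__ == "__main__":
    # Hypothetical die areas in mm^2 and a hypothetical 50 FIT die-level rate.
    areas = {"cpu": 4.0, "sram": 3.5, "peripherals": 2.0, "debug": 0.5}
    for block, fit in partition_fit(50.0, areas).items():
        print(f"{block}: {fit:.1f} FIT")
```

With these numbers the CPU block receives 20 FIT and the debug block 2.5 FIT; passing transistor counts instead of areas gives the transistor-count partition.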

Safe vs. Dangerous Faults

• A safe fault is one that does not propagate to a safety-related failure at the system level.
• Determination of safe vs. dangerous faults is primarily based on the end application's usage of the product.
• An “architectural safeness” factor can be established via fault injection and engineering analysis, providing a baseline for the minimum percentage of safe faults.
– Typical architectural safe faults include faults in debug and DFT logic that is not activated during normal system operation.
• Fault injection techniques applied during IC simulation or on functional models can be used to establish the ratio of safe vs. dangerous faults.
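A fault injection campaign for safe vs. dangerous classification can be sketched as follows: inject every single stuck-at fault, and call it dangerous if it can change the safety-related output for some input, safe otherwise. This is a minimal, exhaustive sketch under stated assumptions: the toy netlist is hypothetical, `out` is assumed to be the only safety-related output, and `dbg` is assumed to feed a debug port that is inactive in normal operation, mirroring the debug/DFT example above.

```python
from itertools import product
from typing import Optional, Tuple

Fault = Tuple[str, int]  # (net name, stuck-at value)

# Hypothetical toy design: `out` is the safety-related output; `dbg` drives a
# debug port assumed never to be used in normal operation.
NETS = ("a", "b", "c", "n1", "out", "dbg")


def simulate(a: int, b: int, c: int, fault: Optional[Fault] = None) -> dict:
    """Evaluate both outputs, forcing the faulty net (if any) to its stuck-at value."""

    def net(name: str, value: int) -> int:
        return fault[1] if fault is not None and fault[0] == name else value

    a, b, c = net("a", a), net("b", b), net("c", c)
    n1 = net("n1", a & b)
    return {"out": net("out", n1 | c), "dbg": net("dbg", n1 ^ c)}


def classify(fault: Fault) -> str:
    """'dangerous' if the fault can change the safety-related output, else 'safe'."""
    for v in product((0, 1), repeat=3):
        if simulate(*v)["out"] != simulate(*v, fault=fault)["out"]:
            return "dangerous"
    return "safe"


def fault_campaign() -> dict:
    """Inject every single stuck-at fault and classify it."""
    return {(name, sv): classify((name, sv)) for name in NETS for sv in (0, 1)}


if __name__ == "__main__":
    results = fault_campaign()
    safe = [f for f, r in results.items() if r == "safe"]
    print(f"safe faults: {safe}")
    print(f"safe fraction: {len(safe) / len(results):.2f}")
```

Only the two `dbg` faults come out safe here, a safe fraction of 2/12. Exhaustive enumeration only works for toy designs; a real campaign samples faults and workloads on a simulator or emulator.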

Considering Timing Aspects in ISO 26262

As illustrated in ISO 26262-1:2010, Figure 4, a system must be able to detect faults and transition to a safe state before a fault can become a system-level hazard; this window is the fault tolerant time interval (FTTI).
• To impact the SPFM (single-point fault metric), any diagnostic must execute within the FTTI.
• To impact the LFM (latent-fault metric), any diagnostic must execute once per drive cycle.
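One common way to read the FTTI requirement is as a timing budget: the worst-case time to detect a fault plus the time to reach the safe state must fit inside the FTTI. A minimal sketch under that assumption; the mechanism names and all timing values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SafetyMechanism:
    name: str
    detection_time_ms: float  # worst case from fault occurrence to detection
    reaction_time_ms: float   # from detection until the safe state is reached

    def fits_ftti(self, ftti_ms: float) -> bool:
        """True if detection plus reaction completes within the FTTI."""
        return self.detection_time_ms + self.reaction_time_ms <= ftti_ms


# Hypothetical mechanisms and timings, for illustration only.
LOCKSTEP = SafetyMechanism("lockstep compare", detection_time_ms=0.001, reaction_time_ms=2.0)
SW_SELF_TEST = SafetyMechanism("periodic software self-test", detection_time_ms=50.0, reaction_time_ms=10.0)

if __name__ == "__main__":
    for ftti in (10.0, 500.0):
        for m in (LOCKSTEP, SW_SELF_TEST):
            verdict = "OK" if m.fits_ftti(ftti) else "too slow"
            print(f"FTTI {ftti} ms, {m.name}: {verdict}")
```

With these assumed numbers the software self-test misses a 10 ms FTTI but fits a 500 ms one, which matches the trend in the next slide.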

Diagnostic Timing Implications

• Shorter FTTI safety goals result in strong demand for diagnostics based on parallel redundancy.
– Airbag systems (…)
– Steering and braking (10 ms – 100 ms FTTI): converging on lockstep CPUs
• Longer FTTI safety goals allow more flexibility in diagnostic selection.
– RADAR and vision ADAS systems (500 ms+ FTTI) tend to rely on multiple samples of input sensors and on software-based diagnostics.
• Although it is attractive to repurpose DFT logic for functional safety diagnostics, its execution timing may prevent its effective use as a run-time diagnostic.
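The parallel-redundancy idea behind lockstep CPUs is that two copies run the same computation and a comparator checks their results every step, so a fault in one copy is caught almost immediately. A minimal software model of that comparison; `cpu_step` is a hypothetical stand-in for a CPU, and the single-bit upset is injected artificially.

```python
from typing import Iterable, Optional, Tuple


def cpu_step(state: int, x: int) -> int:
    """Toy 'CPU' workload: one 16-bit multiply-accumulate step (hypothetical)."""
    return (state * 31 + x) & 0xFFFF


def run_lockstep(inputs: Iterable[int], fault_step: Optional[int] = None,
                 flip_mask: int = 0x0001, common_cause: bool = False) -> Tuple[str, Optional[int]]:
    """Run two copies of cpu_step side by side and compare their state every step.

    A bit flip (flip_mask) is injected into the checker copy at `fault_step`;
    with common_cause=True the same flip also hits the main copy.
    """
    main = checker = 0
    for i, x in enumerate(inputs):
        main = cpu_step(main, x)
        checker = cpu_step(checker, x)
        if i == fault_step:
            checker ^= flip_mask
            if common_cause:
                main ^= flip_mask
        if main != checker:
            return "safe_state", i  # mismatch detected in the same step
    return "ok", None


if __name__ == "__main__":
    print(run_lockstep([1, 2, 3, 4]))
    print(run_lockstep([1, 2, 3, 4], fault_step=2))
    print(run_lockstep([1, 2, 3, 4], fault_step=2, common_cause=True))
```

The third case shows the limit of plain redundancy, and why the CCF measures from the first post matter: an upset that corrupts both copies identically produces no mismatch. Lockstep designs often add a few clock cycles of delay between the cores to reduce this risk; the sketch omits that.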

Selection of Diagnostics

• ISO 26262 does not mandate specific diagnostics to be implemented.
• It does recommend specific diagnostics for several different types of basic design elements, as illustrated in ISO 26262-5, Table D.6.
• The IC developer must demonstrate the performance of implemented diagnostics. Typically this is done with fault injection testing.
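Demonstrating diagnostic performance by fault injection comes down to injecting faults and counting how many the diagnostic flags. A minimal sketch for a hypothetical 8-bit word protected by a single parity bit; each injected fault is weighted equally here, whereas a real diagnostic coverage figure would weight failure modes by their failure rates.

```python
from itertools import combinations

WIDTH = 8  # hypothetical 8-bit data word


def parity(word: int) -> int:
    """Parity bit that makes the total number of ones even."""
    return bin(word).count("1") & 1


def parity_flags_error(word: int, stored_parity: int) -> bool:
    """True if the recomputed parity disagrees with the stored parity bit."""
    return parity(word) != stored_parity


def diagnostic_coverage(data: int, flipped_bits: int) -> float:
    """Fraction of injected faults (every combination of `flipped_bits` flips) that parity detects."""
    stored = parity(data)
    detected = total = 0
    for bits in combinations(range(WIDTH), flipped_bits):
        corrupted = data
        for b in bits:
            corrupted ^= 1 << b  # inject the bit flips
        total += 1
        detected += parity_flags_error(corrupted, stored)
    return detected / total


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(f"{n}-bit flips: coverage {diagnostic_coverage(0xA5, n):.0%}")
```

Parity catches every single-bit (and every odd-count) flip but no double-bit flip, so the fault model used in the campaign matters as much as the diagnostic itself.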

Safety Manuals and Analysis Reports

• Safety manuals are created to instruct the system integrator on how to use the safety features of the IC.
• Analysis reports allow the system integrator to derive safety-related metrics that can be applied to their system-level analysis.
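As an example of the metrics an integrator derives, ISO 26262-5 defines the single-point fault metric as SPFM = 1 − Σ(λ_SPF + λ_RF) / Σλ and the latent-fault metric as LFM = 1 − Σλ_MPF,L / Σ(λ − λ_SPF − λ_RF), summed over the safety-related hardware elements. The sketch below evaluates both from per-element failure rates; the element names and FIT values are hypothetical, and the targets are the ASIL B/C/D values from ISO 26262-5 (SPFM ≥ 90 %/97 %/99 %, LFM ≥ 60 %/80 %/90 %).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ElementFailureRates:
    """Failure rates in FIT for one safety-related hardware element."""
    name: str
    total: float         # lambda
    single_point: float  # lambda_SPF
    residual: float      # lambda_RF
    latent_mpf: float    # lambda_MPF,L (latent multiple-point faults)


def spfm(elements: list[ElementFailureRates]) -> float:
    total = sum(e.total for e in elements)
    return 1.0 - sum(e.single_point + e.residual for e in elements) / total


def lfm(elements: list[ElementFailureRates]) -> float:
    denom = sum(e.total - e.single_point - e.residual for e in elements)
    return 1.0 - sum(e.latent_mpf for e in elements) / denom


# (SPFM minimum, LFM minimum) per ASIL, from ISO 26262-5.
TARGETS = {"B": (0.90, 0.60), "C": (0.97, 0.80), "D": (0.99, 0.90)}


def meets(asil: str, elements: list[ElementFailureRates]) -> bool:
    spfm_min, lfm_min = TARGETS[asil]
    return spfm(elements) >= spfm_min and lfm(elements) >= lfm_min


# Hypothetical analysis-report data, for illustration only.
EXAMPLE = [
    ElementFailureRates("cpu", total=20.0, single_point=0.0, residual=0.1, latent_mpf=0.5),
    ElementFailureRates("sram", total=15.0, single_point=0.0, residual=0.05, latent_mpf=0.3),
    ElementFailureRates("peripherals", total=10.0, single_point=0.2, residual=0.15, latent_mpf=1.0),
]

if __name__ == "__main__":
    print(f"SPFM = {spfm(EXAMPLE):.2%}, LFM = {lfm(EXAMPLE):.2%}")
    for asil in TARGETS:
        print(f"ASIL {asil}: {'met' if meets(asil, EXAMPLE) else 'not met'}")
```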

ISO/AWI 19451 “Application of ISO 26262 to Semiconductors” currently has over 20 semiconductor companies participating.

– If you would like to participate, please contact me offline