Toyota Case: Single Bit Flip That Killed

Source: EETimes


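What on earth did these people called Barr do to Toyota? Would any product survive intact if it were tested like that? During the IMF crisis they told us to raise interest rates, let companies go under, and not nationalize anything, yet when they ran into trouble themselves they cut interest rates and nationalized companies … Are they demanding of other countries the very things that would ruin anyone who did them? I can't help wondering.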

Would products made in the US pass if they were tested the same way? …

Task X death
Now that the experts’ testimony and findings have been made public through the Oklahoma trial, let’s get into details. What defects were found in Toyota’s electronic throttle control systems?

Barr said that the 2005 Camry L4 source code and in-vehicle tests by the experts confirmed that some critical variables are not protected from corruption, and that sources of memory corruption are present. He believes that Toyota’s engineers sought to protect numerous variables against software- and hardware-caused corruption, but they failed to mirror several key critical variables, and they made no hardware protection available against bit flips.
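To make "mirroring" concrete, here is a minimal sketch in C, assuming the common scheme of storing a value together with its bitwise complement and checking the pair on every read. The names `mirrored_u16`, `mirrored_write`, and `mirrored_read` are hypothetical and are not taken from Toyota's code or from the testimony.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mirrored variable: the value plus its bitwise complement. */
typedef struct {
    uint16_t value;
    uint16_t mirror; /* equals ~value while the pair is intact */
} mirrored_u16;

/* Write both copies together so they stay consistent. */
void mirrored_write(mirrored_u16 *m, uint16_t v)
{
    m->value = v;
    m->mirror = (uint16_t)~v;
}

/* Store the value in *out and return true only if the pair is consistent.
 * A false return means corruption was detected; what the caller does next
 * (for example, entering a fail-safe mode) is outside this sketch. */
bool mirrored_read(const mirrored_u16 *m, uint16_t *out)
{
    if ((uint16_t)(m->value ^ m->mirror) != 0xFFFFu) {
        return false;
    }
    *out = m->value;
    return true;
}
```

A single flipped bit in either copy breaks the complement relation, so the read fails instead of silently returning a wrong value. The scheme only detects corruption; it cannot repair it.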

Stack overflow and software bugs led to memory corruption, he said. And it turns out that the crux of the issue was these memory corruptions, which acted “like ricocheting bullets.”

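This implies that an FMEA (failure mode and effects analysis) on critical variables is needed. Doing one would surely help from a quality standpoint, but deciding on the criteria is quite hard. Judging from the passage above alone, it looks as if FMEA at the detailed-design level is required.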

To guard against stack overflow and memory corruption, protection has to be implemented at the SW level as well as the HW level.
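One software-level measure is a guard pattern at the far end of each task stack, checked periodically. The sketch below assumes a stack that grows downward toward index 0. `task_stack`, `stack_guard_init`, `stack_guard_intact`, and the sizes are hypothetical names and values chosen for illustration, not anything described in the article.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_WORDS   64u          /* hypothetical stack size in 32-bit words */
#define GUARD_WORDS   4u           /* words reserved at the overflow end */
#define GUARD_PATTERN 0xDEADBEEFu

/* Hypothetical task stack; assumed to grow downward toward words[0]. */
typedef struct {
    uint32_t words[STACK_WORDS];
} task_stack;

/* Paint the guard region once, before the task starts. */
void stack_guard_init(task_stack *s)
{
    for (size_t i = 0; i < GUARD_WORDS; i++) {
        s->words[i] = GUARD_PATTERN;
    }
}

/* Return false if any guard word has been overwritten, which suggests
 * the task used more stack than it was given. */
bool stack_guard_intact(const task_stack *s)
{
    for (size_t i = 0; i < GUARD_WORDS; i++) {
        if (s->words[i] != GUARD_PATTERN) {
            return false;
        }
    }
    return true;
}
```

This check only notices an overflow after it has happened, and only if the overflow actually wrote into the guard words. That is why it complements hardware protection such as a memory protection unit rather than replacing it.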


Barr explains the issue this way:

Memory corruption as little as one bit flip can cause a task to die. This can happen by hardware single-event upsets — i.e., bit flip — or via one of the many software bugs, such as buffer overflows and race conditions, we identified in the code.

There are tens of millions of combinations of untested task death, any of which could happen in any possible vehicle/software state. Too many to test them all. But vehicle tests we have done in 2005 and 2008 Camrys show that even just the death of Task X by itself can cause loss of throttle control by the driver — even as combustion continues to power the engine. In a nutshell, the fail safes Toyota did install have gaps in them and are inadequate to detect all of the ways UA can occur via software.

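A task dying because of a single memory bit flip is a very serious problem. Solving it takes not only an MCU designed to handle it but also coverage from the OS. Still …. I can't help thinking they set out determined to tear Toyota apart. That level of rigor and detail has not even been standardized yet, so I wonder who would actually have designed to that extent. Then again, maybe someone did.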

Just to clarify, the “tasks” are equivalent to apps running on smartphones or PCs. All software malfunctions from time to time — we often have to reboot our machines. The 2005 Camry L4 has a set of dozens of apps (or tasks). Because they are all meant to be running always, the death of one could have dire consequences.

When asked if the whole case for unintended acceleration could be pinned on the task X death, Barr replied, “The task X death in combination with other task deaths.” There are dozens of tasks and 16 million different ways those tasks can die. The experts group was able to demonstrate at least one way for the software to cause unintended acceleration, but there are so many other ways that could have happened.
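The "16 million" figure is consistent with counting every subset of tasks that could be dead at once. As an assumption not stated in the text above, take n = 24 tasks (two dozen), each either alive or dead:

```latex
\[
  N = 2^{n}, \qquad n = 24 \;\Rightarrow\; N = 2^{24} = 16{,}777{,}216 \approx 16.8 \text{ million}
\]
```

This count ignores the order in which tasks die and the vehicle/software state at the time, which is why exhaustive testing gets out of reach so quickly.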

Barr also said more than half the dozens of tasks’ deaths studied by the experts in their experiments “were not detected by any fail safe.”
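A fail safe that can notice the death of any individual task has to track each task separately. The sketch below assumes a simple heartbeat scheme: every task bumps its own counter, and a supervisor flags any counter that has stopped advancing. `NUM_TASKS`, `task_checkin`, and `supervisor_find_dead` are hypothetical and do not describe Toyota's actual watchdog design.

```c
#include <stdint.h>

#define NUM_TASKS 4u /* hypothetical; the article only says "dozens" */

static volatile uint32_t heartbeat[NUM_TASKS]; /* bumped by the tasks */
static uint32_t last_seen[NUM_TASKS];          /* supervisor's snapshot */

/* Each task calls this once per iteration of its main loop. */
void task_checkin(uint32_t id)
{
    if (id < NUM_TASKS) {
        heartbeat[id]++;
    }
}

/* Called periodically by the supervisor. Returns a bitmask with bit i set
 * when task i has not checked in since the previous call. */
uint32_t supervisor_find_dead(void)
{
    uint32_t dead = 0;
    for (uint32_t i = 0; i < NUM_TASKS; i++) {
        uint32_t now = heartbeat[i];
        if (now == last_seen[i]) {
            dead |= (1u << i);
        }
        last_seen[i] = now;
    }
    return dead;
}
```

In a real system the supervisor would typically refresh the hardware watchdog only when the returned mask is zero, so that the death of any single task forces a reset or a fail-safe mode. The supervisor itself then has to be protected as well, for example by running it where the tasks it monitors cannot corrupt it.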

