It might have been bad solder. Some of that lead-free stuff is utter garbage and could have failed mechanically (if it hadn't been put on right) or reflowed due to the heat.
Most chip failures do not surface as "just stop working" - much more often they surface as the chip overheating. Or the chip over-volting other components. Or failing to throttle down, resulting in excessive battery wear. Or surfacing the occasional computation error. Or some feature just no longer being available.
Also remember: batteries, screen, BIOS... all are controlled by their own chips these days; chips whose failures are attributed to the component they're attached to.
> I don't think I've ever heard of someone's phone just not working mysteriously.
My Nokia 808 suddenly died after a couple of years. Blank screen, did not react to any button presses or reset attempts. Even after replacing the battery.
Bootloop is down to hardware ageing, specifically solder, as I understand it. System resets etc. don't fix it. I've had two phones die from it. The cure for the brave involves reflowing the solder in an oven.
The Nexus 5x CPU solder failed pretty regularly at 1 year. I'm not sure if that counts as the chip lasting that long, because the CPU was technically fine, just poorly soldered, but the product wasn't working.
This is a real thing. For game consoles, which run their chips pretty "hot" in order to meet the performance requirements for high-end gaming, the main SOCs have a design lifetime. There are deliberate trade-offs between performance and the expected lifetime of the part. This lifetime is difficult to estimate prior to ship.
One console got a firmware update about a year after its release that increased its clocks by about ten percent. I'm guessing they did this after seeing telemetry numbers indicating that the box's cooling system was doing a better job than expected, and that there was lifetime headroom available (probably other factors were involved, too, but heat is a major one).
Consoles are pretty serious about effective cooling. There is definitely price pressure, but that's not a one-sided thing -- you need to make sure that the unit lasts long enough, and that its cooling system doesn't generate too much noise for the environment (typically a quiet living room). And then you need headroom for environmentals (e.g., being stuffed into the back of an entertainment unit next to other equipment). The current generation of consoles is quieter than the last generation, and more powerful.
Cooling "needs" have to take into account the whole product and user experience, not just keeping a single chip from melting down.
I think all the console makers learned a lesson from the Red Ring Of Death. No one wants a massive portion of their systems becoming useless hunks of scrap because poor thermal design didn't dissipate enough heat and let the solder crack.
RROD cost Microsoft a billion dollars, maybe two billion. That doesn't scream "Please shave the cooling system down to the absolute minimum cost on the next console" to the hardware team.
The XBone cooler is a pretty decent one, because the alternative is a ton of warranty work, plus lawsuits and bad press. And worse, a bad customer experience.
It's not just about the CPUs or APUs, which generally have very high yield and long lifetimes. It's all the other chips. The SSD begins exhibiting errors after extensive use. Other chips that are considered good enough are at the high end of their acceptable performance tolerances when they are included in devices, but the effects are additive. So the more chips running at the high or low end of what is considered acceptable, the more likely that problems will crop up over time.
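To put a rough number on that additive effect (the per-chip probabilities below are invented purely for illustration, not measured data), the chance that at least one marginal chip acts up grows quickly with the number of chips:

```python
# Sketch only: the per-chip failure probabilities are made-up illustrative numbers.
def p_any_failure(per_chip_p, n_chips):
    """Probability that at least one of n independent chips misbehaves."""
    return 1 - (1 - per_chip_p) ** n_chips

for per_chip_p in (0.005, 0.02):      # hypothetical per-chip probability over some period
    for n_chips in (5, 20, 50):       # number of chips in the device
        print(f"p={per_chip_p}, n={n_chips:2d} -> {p_any_failure(per_chip_p, n_chips):.1%}")
```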
Silicon chips age due to electromigration, which is exacerbated by small feature sizes. Chips made 20 years ago could take decades for enough migration to cause failure, which is why you've never known or cared before. Today, due to the much smaller processes we use, it's closer to a few years.
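For reference, electromigration lifetime is commonly estimated with Black's equation, MTTF = A * J^-n * exp(Ea / kT). A minimal sketch of the scaling (the exponent and activation energy below are generic textbook-style values, not numbers for any specific process):

```python
import math

K_BOLTZMANN_EV = 8.617e-5  # Boltzmann constant in eV/K

def relative_mttf(current_density, temp_k, n=2.0, ea_ev=0.7):
    """Black's equation up to the process-dependent constant A:
    MTTF proportional to J**-n * exp(Ea / (k*T))."""
    return current_density ** -n * math.exp(ea_ev / (K_BOLTZMANN_EV * temp_k))

# Doubling current density (roughly what narrower interconnect implies for a
# similar current) at the same temperature cuts the estimated lifetime ~4x for n = 2.
baseline = relative_mttf(current_density=1.0, temp_k=358)  # 85 C
denser   = relative_mttf(current_density=2.0, temp_k=358)
print(f"lifetime ratio: {denser / baseline:.2f}")  # ~0.25
```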
In the past, you might have automotive-grade microcontrollers for functions like ABS, which consume less than a Watt maximum. You paid a little attention to having enough ground vias on the PCB for thermal conductivity, and that was about it to qualify the design for the high end of the temperature range.
Degraded lifetime wasn't so much of a concern.
These days, you've got ARM processors with a TDP of 15W or more, and keeping the die temperatures below the maximum when the ambient temperature is 85C, well, that starts to get interesting. Especially if you don't want to use a fan, and the processor is stuck somewhere without adequate airflow.
And then you've got high-end systems with a TDP in the 150W range. Then you've got to have a good cooling solution to run your application at office environment temperatures, never mind the full automotive temperature range. And what's going to be the lifetime for these parts running at elevated temperatures, even if you are staying within the maximum temperature limits? Sigh.
Yep. The junction temperature is rated up to that. What I'm saying is that even so, thermal design is a challenge. And I have concerns about chip lifetime under those sorts of conditions.
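As a back-of-the-envelope illustration of why that's hard (the thermal resistance figure below is an invented value; real theta-JA depends heavily on the package, heatsink, and airflow), steady-state junction temperature is roughly Tj = Ta + P * theta_JA:

```python
def junction_temp_c(ambient_c, power_w, theta_ja_c_per_w):
    """Rough steady-state estimate: Tj = Ta + P * theta_JA (junction-to-ambient)."""
    return ambient_c + power_w * theta_ja_c_per_w

# Hypothetical numbers: a 15 W part with an assumed 5 C/W junction-to-ambient path.
print(junction_temp_c(ambient_c=25, power_w=15, theta_ja_c_per_w=5))  # 100 C on a desk
print(junction_temp_c(ambient_c=85, power_w=15, theta_ja_c_per_w=5))  # 160 C at automotive ambient
```

With those assumed numbers, a cooling path that's fine in an office blows right past a typical 125 C junction limit at 85 C ambient, which is why fanless automotive designs get interesting.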
As more critical functionality in cars comes to be controlled by MCU/CPUs and software over ECUs and mechanical components, I wonder if software obsolescence will have a larger impact than potential chip failure from increased use.
It's not hard to imagine a future where Tesla stops developing and pushing updates for cars after 5-10 years, basically forcing an owner to trade up as the hardware in older models becomes incompatible with the latest self-driving algos. It's also not hard to imagine a future where Amazon/Uber/Bigco own and update large fleets of vehicles, maintaining and renting them out much like AWS does compute cycles, taking the burden off the user to maintain their own hardware.
> As more critical functionality in cars comes to be controlled by MCU/CPUs and software over ECUs and mechanical components, I wonder if software obsolescence will have a larger impact than potential chip failure from increased use.
I don't really see that. Once the software's out there and works, it'll work even if it isn't updated with the latest feature set. I can't really imagine the government changing self-driving car standards so fast that a non-updated car could not operate safely and would have to be retired.
Any broken "cloud server dependence" would probably cause so much weeping and gnashing of teeth that Tesla would be foolish to try it, and again, I doubt the government would allow it for safety reasons.
There's the flip side of that too: they can be "stored" in environments which occasionally drop to -40°C or below, and have to function properly then as well (and be subsequently heated back up to above 100°C).
15W chips use ~2% of a horsepower each, and 150W = 1/5 of a horsepower worth of parasitic load. So using several high-end chips should result in significantly worse fuel efficiency.
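Quick sanity check on those conversions (1 mechanical horsepower ≈ 745.7 W):

```python
WATTS_PER_HP = 745.7  # one mechanical horsepower in watts

for watts in (15, 150):
    print(f"{watts} W = {watts / WATTS_PER_HP:.3f} hp")
# 15 W  ~ 0.02 hp (about 2% of a horsepower)
# 150 W ~ 0.20 hp (about 1/5 of a horsepower)
```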
Are they really necessary or just cheaper in R&D terms?
Modern cars have tons of advanced features that require a decent amount of computing. I mean just look at the evolution from radio cassette players to modern integrated GPS/media player/bluetooth/etc systems. Also things like automatic parking assist, rear view camera, computer vision algorithms to detect if the driver is falling asleep. It's feature creep all over the place.
> Also things like automatic parking assist, rear view camera, computer vision algorithms to detect if the driver is falling asleep. It's feature creep all over the place.
You have a weird definition of "feature creep". Some of those things you listed save money, others save lives. I definitely wouldn't include such things in "feature creep".
Sorry, it might not have been the right word. I didn't mean that these features weren't useful, only that over the past decades the amount of software in cars really exploded, from basically zero to a bunch of fully featured computers dealing with various subsystems, from fuel injection to radio playback. Software "creeped" everywhere.
Going by your numbers and my best guess, I think the processing for self-driving cars needs about 1/4 hp. The first few generations will need more than that. (You can get by on less if you are willing to sacrifice safety.)
It takes a lot of processing to detect "objects" and figure out which are of interest. Give me a data structure of objects, with their meaning (a place that can hold the car without getting stuck) and their possible future behaviors (where they can actually get to given the laws of physics and reasonable assumptions), and a self-driving car is easy. Getting that list is hard, and frankly beyond my knowledge.
Mostly I'm going off how much energy humans use, and we do a bad job...
The turn to commercial off-the-shelf (COTS) parts to cut costs has raised the cost of using specialized ones (or forced people to develop custom solutions at great cost, due to market gaps).
High-temperature-tested CPUs (above consumer grade) are a good example.
IoT pushing more and more embedded technology might help turn that tide, given the need for components that can handle extreme conditions and longer-term deployments. But the last however many years of cutting costs through COTS have come at a big cost for sustainability.
Chips aging aside, tin whiskers are also one of the main reasons why manufacturers shouldn't use electronics except where it is really really really necessary, especially on things that move at high speeds with humans on board or in the vicinity. NASA itself (which uses electronics on stuff that goes into orbit) did some good research on the subject.
https://nepp.nasa.gov/Whisker/background/index.htm
Tl;dr: Tin whiskers are very thin spontaneous metallic formations which can short nearby PCB tracks or conductors and are believed to be the cause of many failures in electronic devices. There is no evidence of a single cause for their formation, but it seems certain that eliminating lead from solder (RoHS legislation, etc.) is one of them.
Can't they coat the PCBs to mitigate/eliminate the problem?
I remember watching a teardown of a spare-no-expense embedded military computer, and the guy couldn't stop talking about how much "conformal coating" the board had.
From what I understand, conformal coating is intended to protect against external agents (moisture, etc.), and it does that job well, but it's not equally effective when the problem is between adjacent conductors on the same PCB; for that purpose the substance would have to penetrate the tiny space between chip pins. It would probably help to some extent, though.
Depends. If it's dipped (or just sprayed on) it won't get in between fine pitches, but application in a vacuum will force it into every tiny space. It's essentially the same as vacuum-potted transformers.
Typically, the coating is Parylene, which is a room-temperature vapor-phase conformal coating (i.e. roughly uniform thickness independent of the contact angle of the materials). It's used for corrosion, moisture, vibrational wear, reduced breakdown voltage, etc.
It is not used for Sn (tin) whiskers at all, since it wouldn't penetrate the pads/package of the IC and wouldn't stop such a high modulus process in any case. Whiskers are most problematic with modern Pb-free solders and fine pitch SMT practices (QFNs & BGAs), but it's worth noting that other materials (like Zn) also have significant issues.
> The coating completely penetrates spaces as narrow as 0.01mm.
I would have thought that would make it very effective in protecting against whiskers in QFPs that might have a lead pitch of 0.4mm or larger. There's not a lot you can do under the bodies of BGAs and QFNs, but if you're worried about whiskers you're probably using QFPs instead.
This app note suggests that Parylene and a few other dip-type conformal coatings do slow down tin whiskers, but don't stop their spread:
I've read about tin whiskers extensively, and the consensus seems to be that while conformal coatings will stop tin whiskers from falling across two leads, they won't stop a whisker that's growing from one lead toward another - it will just pierce right through the coating.
Conformal coating is standard for PCBs going into space. Leaded solder (not RoHS) is used in assembly. For commercial BGA packages, companies specialize in de-balling the RoHS balls and replacing them with leaded balls.
I happen to know that the design life for the Ford EEC IV, the ignition control system for 1980s Fords, was 30 years. That was achieved; many 30-year old Fords are still running with the original electronics.
The article says "Chips developed for computers and phones were designed to operate at peak performance for an average of two to four years of normal use. After that, functionality began to degrade."
Enterprise ASICs (i.e. routers) already run at these very hot (and cold) temperatures on the very latest nodes. They also demand reliability over long periods - although the consequences of failing to meet that are obviously not as severe. Ageing is much worse at 7nm but it's already accounted for during STA. Like everything else, you just assume the worst and it's another chunk of your clock period you never see. But expecting to run above 125C... not sure about that.
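A minimal sketch of that "chunk of your clock period you never see" (the 5% derate below is an invented illustrative figure, not a real 7nm library number):

```python
def required_period_ns(critical_path_ns, setup_ns, aging_derate):
    """Clock period needed once cell delays are derated for end-of-life aging,
    the way an aging margin shows up in static timing analysis."""
    return critical_path_ns * (1 + aging_derate) + setup_ns

fresh = required_period_ns(critical_path_ns=0.90, setup_ns=0.05, aging_derate=0.00)
aged  = required_period_ns(critical_path_ns=0.90, setup_ns=0.05, aging_derate=0.05)
print(f"fresh silicon: {1000/fresh:.0f} MHz, with aging margin: {1000/aged:.0f} MHz")
```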
Err what?