It might have been bad solder. Some of that lead-free stuff is utter garbage and could have failed mechanically (if it hadn't been put on right) or reflowed due to the heat.
Most chip failures do not surface as "just stop working" - much more often they surface as the chip overheating. Or the chip over-volting other components. Or failing to throttle down, resulting in excessive battery wear. Or surfacing the occasional computation error. Or some feature just no longer being available.
Also remember: batteries, screen, BIOS... all are controlled by their own chips these days; chips whose failures are attributed to the component they're attached to.
> I don't think I've ever heard of someone's phone just not working mysteriously.
My Nokia 808 suddenly died after a couple of years. Blank screen, did not react to any button presses or reset attempts. Even after replacing the battery.
Bootloop is down to hardware ageing, specifically solder, as I understand it. System resets etc. don't fix it. I've had two phones die from it. The cure for the brave involves reflowing the solder in an oven.
The Nexus 5x CPU solder failed pretty regularly at 1 year. I'm not sure if that counts as the chip lasting that long, because the CPU was technically fine, just poorly soldered, but the product wasn't working.
This is a real thing. For game consoles, which run their chips pretty "hot" in order to meet the performance requirements for high-end gaming, the main SOCs have a design lifetime. There are deliberate trade-offs between performance and the expected lifetime of the part. This lifetime is difficult to estimate prior to ship.
One console got a firmware update about a year after its release that increased its clocks by about ten percent. I'm guessing they did this after seeing telemetry numbers indicating that the box's cooling system was doing a better job than expected, and that there was lifetime headroom available (probably other factors were involved, too, but heat is a major one).
Consoles are pretty serious about effective cooling. There is definitely price pressure, but that's not a one-sided thing -- you need to make sure that the unit lasts long enough, and that its cooling system doesn't generate too much noise for the environment (typically a quiet living room). And then you need headroom for environmentals (e.g., being stuffed into the back of an entertainment unit next to other equipment). The current generation of consoles is quieter than the last generation, and more powerful.
Cooling "needs" have to take into account the whole product and user experience, not just keeping a single chip from melting down.
I think all the console makers learned a lesson from the Red Ring Of Death. No one wants a massive portion of their systems becoming useless hunks of scrap because poor thermal design didn't dissipate enough heat and let the solder crack.
RROD cost Microsoft a billion dollars, maybe two billion. That doesn't scream "Please shave the cooling system down to the absolute minimum cost on the next console" to the hardware team.
The XBone cooler is a pretty decent one, because the alternative is a ton of warranty work, plus lawsuits and bad press. And worse, a bad customer experience.
It's not just about the CPUs or APUs, which generally have very high yield and long lifetimes. It's all the other chips. The SSD begins exhibiting errors after extensive use. Other chips that are considered good enough are at the high end of their acceptable performance tolerances when they are included in devices, but the effects are additive. So the more chips running at the high or low end of what is considered acceptable, the more likely that problems will crop up over time.
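To put a rough number on that additive effect (the per-chip probabilities below are invented purely for illustration, not measured data), the chance that at least one marginal chip acts up grows quickly with the number of chips:

```python
# Sketch only: the per-chip failure probabilities are made-up illustrative numbers.
def p_any_failure(per_chip_p, n_chips):
    """Probability that at least one of n independent chips misbehaves."""
    return 1 - (1 - per_chip_p) ** n_chips

for per_chip_p in (0.005, 0.02):      # hypothetical per-chip probability over some period
    for n_chips in (5, 20, 50):       # number of chips in the device
        print(f"p={per_chip_p}, n={n_chips:2d} -> {p_any_failure(per_chip_p, n_chips):.1%}")
```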
Silicon chips age due to electromigration, which is exacerbated by small feature sizes. Chips made 20 years ago could take decades for enough migration to cause failure, which is why you've never known or cared before. Today, due to the much smaller processes we use, it's closer to a few years.
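For reference, electromigration lifetime is commonly estimated with Black's equation, MTTF = A * J^-n * exp(Ea / kT). A minimal sketch of the scaling (the exponent and activation energy below are generic textbook-style values, not numbers for any specific process):

```python
import math

K_BOLTZMANN_EV = 8.617e-5  # Boltzmann constant in eV/K

def relative_mttf(current_density, temp_k, n=2.0, ea_ev=0.7):
    """Black's equation up to the process-dependent constant A:
    MTTF proportional to J**-n * exp(Ea / (k*T))."""
    return current_density ** -n * math.exp(ea_ev / (K_BOLTZMANN_EV * temp_k))

# Doubling current density (roughly what narrower interconnect implies for a
# similar current) at the same temperature cuts the estimated lifetime ~4x for n = 2.
baseline = relative_mttf(current_density=1.0, temp_k=358)  # 85 C
denser   = relative_mttf(current_density=2.0, temp_k=358)
print(f"lifetime ratio: {denser / baseline:.2f}")  # ~0.25
```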
In the past, you might have automotive-grade microcontrollers for functions like ABS, which consume less than a Watt maximum. You paid a little attention to having enough ground vias on the PCB for thermal conductivity, and that was about it to qualify the design for the high end of the temperature range.
Degraded lifetime wasn't so much of a concern.
These days, you've got ARM processors with a TDP of 15W or more, and keeping the die temperatures below the maximum when the ambient temperature is 85C, well, that starts to get interesting. Especially if you don't want to use a fan, and the processor is stuck somewhere without adequate airflow.
And then you've got high-end systems with a TDP in the 150W range. Then you've got to have a good cooling solution to run your application at office environment temperatures, never mind the full automotive temperature range. And what's going to be the lifetime for these parts running at elevated temperatures, even if you are staying within the maximum temperature limits? Sigh.
Yep. The junction temperature is rated up to that. What I'm saying is that even so, thermal design is a challenge. And I have concerns about chip lifetime under those sorts of conditions.
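As a back-of-the-envelope illustration of why that's hard (the thermal resistance figure below is an invented value; real theta-JA depends heavily on the package, heatsink, and airflow), steady-state junction temperature is roughly Tj = Ta + P * theta_JA:

```python
def junction_temp_c(ambient_c, power_w, theta_ja_c_per_w):
    """Rough steady-state estimate: Tj = Ta + P * theta_JA (junction-to-ambient)."""
    return ambient_c + power_w * theta_ja_c_per_w

# Hypothetical numbers: a 15 W part with an assumed 5 C/W junction-to-ambient path.
print(junction_temp_c(ambient_c=25, power_w=15, theta_ja_c_per_w=5))  # 100 C on a desk
print(junction_temp_c(ambient_c=85, power_w=15, theta_ja_c_per_w=5))  # 160 C at automotive ambient
```

With those assumed numbers, a cooling path that's fine in an office blows right past a typical 125 C junction limit at 85 C ambient, which is why fanless automotive designs get interesting.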
As more critical functionality in cars comes to be controlled by MCU/CPUs and software over ECUs and mechanical components, I wonder if software obsolescence will have a larger impact than potential chip failure from increased use.
It's not hard to imagine a future where Tesla stops developing and pushing updates for cars after 5-10 years, basically forcing an owner to trade up as the hardware in older models becomes incompatible with the latest self-driving algos. It's also not hard to imagine a future where Amazon/Uber/Bigco own and update large fleets of vehicles, maintaining and renting them out much like AWS does compute cycles, taking the burden off the user to maintain their own hardware.
> As more critical functionality in cars comes to be controlled by MCU/CPUs and software over ECUs and mechanical components, I wonder if software obsolescence will have a larger impact than potential chip failure from increased use.
I don't really see that. Once the software's out there and works, it'll work even if it isn't updated with the latest feature set. I can't really imagine the government changing self-driving car standards so fast that a non-updated car could not operate safely and would have to be retired.
Any broken "cloud server dependence" would probably cause so much weeping and gnashing of teeth that Tesla would be foolish to try it, and again, I doubt the government would allow it for safety reasons.
There's the flip side of that too: they can be "stored" in environments which occasionally drop to -40°C or below, and have to function properly then as well (and be subsequently heated back up to above 100°C).
15W chips use ~2% of a horsepower each, and 150W = 1/5 of a horsepower worth of parasitic load. So using several high-end chips should result in significantly worse fuel efficiency.
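Quick sanity check on those conversions (1 mechanical horsepower ≈ 745.7 W):

```python
WATTS_PER_HP = 745.7  # one mechanical horsepower in watts

for watts in (15, 150):
    print(f"{watts} W = {watts / WATTS_PER_HP:.3f} hp")
# 15 W  ~ 0.02 hp (about 2% of a horsepower)
# 150 W ~ 0.20 hp (about 1/5 of a horsepower)
```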
Are they really necessary or just cheaper in R&D terms?
Modern cars have tons of advanced features that require a decent amount of computing. I mean just look at the evolution from radio cassette players to modern integrated GPS/media player/bluetooth/etc systems. Also things like automatic parking assist, rear view camera, computer vision algorithms to detect if the driver is falling asleep. It's feature creep all over the place.
> Also things like automatic parking assist, rear view camera, computer vision algorithms to detect if the driver is falling asleep. It's feature creep all over the place.
You have a weird definition of "feature creep". Some of those things you listed save money, others save lives. I definitely wouldn't include such things in "feature creep".
Sorry, it might not have been the right word. I didn't mean that these features weren't useful, only that over the past decades the amount of software in cars really exploded, from basically zero to a bunch of fully featured computers dealing with various subsystems, from fuel injection to radio playback. Software "creeped" everywhere.
Going by your numbers and my best guess, I think the processing for self-driving cars needs about 1/4 hp. The first few generations will need more than that. (You can get by on less if you are willing to sacrifice safety.)
It takes a lot of processing to detect "objects" and figure out which are of interest. Give me a data structure of objects, with their meaning (a place that can hold the car without getting stuck) and their possible future behaviors (where they can actually get to given the laws of physics and reasonable assumptions), and a self-driving car is easy. Getting that list is hard, and frankly beyond my knowledge.
Mostly I'm going off how much energy humans use, and we do a bad job...
The turn to commercial off-the-shelf (COTS) parts to cut costs has raised the cost of using specialized ones (or forced people to develop custom solutions at great cost, due to market gaps).
High-temperature-tested CPUs (above consumer grade) are a good example.
IoT pushing more and more embedded technology might help turn that tide, given the need for components that can handle extreme conditions and longer-term deployments. But the last however many years of cutting costs through COTS have come at a big cost for sustainability.
Chips aging aside, tin whiskers are also one of the main reasons why manufacturers shouldn't use electronics except where it is really really really necessary, especially on things that move at high speeds with humans on board or in the vicinity. NASA itself (which uses electronics on stuff that goes into orbit) did some good research on the subject.
https://nepp.nasa.gov/Whisker/background/index.htm
Tl;dr: Tin whiskers are very thin spontaneous metallic formations which can short nearby PCB tracks or conductors and are believed to be the cause of many failures in electronic devices. There is no evidence of a single cause for their formation, but it seems certain that eliminating lead from solder (RoHS legislation, etc.) is one of them.
Can't they coat the PCBs to mitigate/eliminate the problem?
I remember watching a teardown of a spare-no-expense embedded military computer, and the guy couldn't stop talking about how much "conformal coating" the board had.
From what I understand, conformal coating is intended to protect against external agents (moisture, etc.), and it does that job well, but it's not equally effective when the problem is between adjacent conductors on the same PCB; for that purpose the substance would have to penetrate the tiny space between chip pins. It would probably help to some extent, though.
Depends. If it's dipped (or just sprayed on) it won't get in between fine pitches, but application in a vacuum will force it into every tiny space. It's essentially the same as vacuum-potted transformers.
Typically, the coating is Parylene, which is a room-temperature vapor-phase conformal coating (i.e. roughly uniform thickness independent of the contact angle of the materials). It's used for corrosion, moisture, vibrational wear, reduced breakdown voltage, etc.
It is not used for Sn (tin) whiskers at all, since it wouldn't penetrate the pads/package of the IC and wouldn't stop such a high modulus process in any case. Whiskers are most problematic with modern Pb-free solders and fine pitch SMT practices (QFNs & BGAs), but it's worth noting that other materials (like Zn) also have significant issues.
> The coating completely penetrates spaces as narrow as 0.01mm.
I would have thought that would make it very effective in protecting against whiskers in QFPs that might have a lead pitch of 0.4mm or larger. There's not a lot you can do under the bodies of BGAs and QFNs, but if you're worried about whiskers you're probably using QFPs instead.
This app note suggests that Parylene and a few other dip-type conformal coatings do slow down tin whiskers, but don't stop their spread:
I've read about tin whiskers extensively, and the consensus seems to be that while conformal coatings will stop tin whiskers from falling across two leads, they won't stop a whisker that's growing from one lead toward another - it will just pierce right through the coating.
Conformal coating is standard for PCBs going into space. Leaded solder (not RoHS) is used in assembly. For commercial BGA packages, companies specialize in de-balling the RoHS balls and replacing them with leaded balls.
I happen to know that the design life for the Ford EEC IV, the ignition control system for 1980s Fords, was 30 years. That was achieved; many 30-year old Fords are still running with the original electronics.
The article says "Chips developed for computers and phones were designed to operate at peak performance for an average of two to four years of normal use. After that, functionality began to degrade."
Enterprise ASICs (i.e. routers) already run at these very hot (and cold) temperatures on the very latest nodes. They also demand reliability over long periods - although the consequences of failing to meet that are obviously not as severe. Ageing is much worse at 7nm but it's already accounted for during STA. Like everything else, you just assume the worst and it's another chunk of your clock period you never see. But expecting to run above 125C... not sure about that.
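A minimal sketch of that "chunk of your clock period you never see" (the 5% derate below is an invented illustrative figure, not a real 7nm library number):

```python
def required_period_ns(critical_path_ns, setup_ns, aging_derate):
    """Clock period needed once cell delays are derated for end-of-life aging,
    the way an aging margin shows up in static timing analysis."""
    return critical_path_ns * (1 + aging_derate) + setup_ns

fresh = required_period_ns(critical_path_ns=0.90, setup_ns=0.05, aging_derate=0.00)
aged  = required_period_ns(critical_path_ns=0.90, setup_ns=0.05, aging_derate=0.05)
print(f"fresh silicon: {1000/fresh:.0f} MHz, with aging margin: {1000/aged:.0f} MHz")
```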
Err what?