
The target of the checklists is the problem.

“When a measure becomes a target, it ceases to be a good measure.”

The measure has to be concrete enough that it can't be manipulated.

For example, aircraft mechanical failures or hospital infections are very clear and obvious metrics. Short of outright lying, the metrics speak for themselves and can't be corrupted.

You can't, however, measure "education" or "intelligence". You can only approximate them, and the more you make depend on the results, the worse those already imperfect metrics become.




Lots of reasonable metrics become bad once you make them a target.

It's Goodhart's Law: "When a measure becomes a target, it ceases to be a good measure." Too many people are incentivized to make the metrics look good for the metrics to be trustworthy anymore.

The problem is simply that most things are very hard, if not impossible, to measure directly.

"When a measure becomes a target, it ceases to be a good measure."

https://en.wikipedia.org/wiki/Goodhart%27s_law


Once a measurement becomes a target, it ceases to be a good measurement.

When a metric is targeted, it fails to be a good metric.

Doesn't work. "When a measure becomes a target, it ceases to be a good measure."

You can totally build something that is a decent metric, but as soon as you create incentives to "game it" (optimize for the metric rather than the actual goal you're trying to measure), it will be gamed, and creating metrics that are resistant to that is nearly impossible in most cases.


Part of the problem is that an imperfect measure often isn't better than no measure, unless you're very careful about how the measurement is used. You end up with people devoting huge amounts of resources to gaming the metrics, which produces worse outcomes than even not evaluating people at all would.

They are bad metrics.

Maybe it helps you understand if you think about how easy they are to game. You could just as well create useless lines of documentation as you could create useless lines of code.

Goodhart's law says:

"When a measure becomes a target, it ceases to be a good measure".


> when a measure becomes a target, it ceases to be a good measure.

I see this get thrown around a lot but it's not really true. If a metric can be gamed then I agree it's a flawed or poor metric, but there are definitely good metrics that make good targets as well.

Concrete example: I used to work in aircraft maintenance putting Search and Rescue aircraft into the sky to go pull people out of lakes or out of crevasses, etc. One of our KPIs/metrics was the number of hours spent in what's called the "Red" state, i.e. you have no serviceable aircraft that can fly if a callout happens, meaning the region is lacking airborne SAR assets.

There isn't really a way to (legally) game this metric. Either your aircraft is serviceable or it's not. The only way you could cheat is to just lie on your statistics and release aircraft for missions that are actually not serviceable, but that's going to bite you in the ass sooner or later, would require a conspiracy of 10+ people to lie on official airworthiness documents, and doing so is a federal crime not to mention a big ethical no-no.

Our monthly target was zero, i.e. we tried to get through each month keeping at least one serviceable aircraft available at all times. We only hit that target a few times while I was in that job, but it was rewarding, and in months where the Red indicator was particularly large I would drill down with senior staff to determine whether it was an anomaly or the start of a trend, and we'd address it.

And yet it's still a good measure because it's directly measuring what was our primary objective (i.e. can you put aircraft in the sky to carry out rescue missions or can you not?)
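To make the metric concrete, here's a minimal sketch of how you might compute it (the log format, tail numbers, and dates are all made up for illustration; real airworthiness tracking obviously doesn't work off a Python dict):

    from datetime import datetime, timedelta

    # Hypothetical serviceability log: per-aircraft intervals during which the
    # airframe was unserviceable. Tail numbers, dates and format are invented.
    unserviceable = {
        "SAR-01": [(datetime(2024, 5, 3, 8, 0), datetime(2024, 5, 4, 17, 30))],
        "SAR-02": [(datetime(2024, 5, 3, 12, 0), datetime(2024, 5, 3, 20, 0)),
                   (datetime(2024, 5, 10, 6, 0), datetime(2024, 5, 12, 6, 0))],
    }

    def red_hours(unserviceable, month_start, month_end, step=timedelta(minutes=15)):
        """Hours in the month during which *no* aircraft was serviceable ("Red")."""
        def is_down(tail, t):
            return any(start <= t < end for start, end in unserviceable[tail])

        red, t = timedelta(), month_start
        while t < month_end:
            if all(is_down(tail, t) for tail in unserviceable):
                red += step
            t += step
        return red.total_seconds() / 3600

    # Both aircraft are down only from 12:00 to 20:00 on 2024-05-03 -> 8.0 hours Red.
    print(red_hours(unserviceable, datetime(2024, 5, 1), datetime(2024, 6, 1)))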


    > ...any metric oriented organisation might want to have an explicit alignment process that ensures...
Yes, I think many folks would agree that metrics aren't an easy thing to use properly.

One huge problem, however, is that organizations that are apt to rely on metrics usually aren't sensitive enough to realize that there's more to their use than coming up with a wish-list and then dreaming up bullshit KPIs which are somehow intended to "move the needle" towards correct outcomes. And when it doesn't, someone just bullshits their way around the failure instead of doing the hard work needed to honestly address mistakes and problems.

That's, I think, what happens a lot in dysfunctional school districts, though many of us might see this comedy of errors show up in orgs with a matrix management structure.


Similar to Goodhart's law: When a measure becomes a target it ceases to be a good measure [0]. You see it everywhere and it usually happens when we try to quantify a quality (like "competence"...)

[0] https://en.wikipedia.org/wiki/Goodhart%27s_law


Exactly.

> When a measure becomes a target, it ceases to be a good measure. - Goodhart's Law

> The more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor. - Campbell's Law

You want to use metrics as a very broad breadth-first search to help cull the search space, and use trust systems as a depth-first search. But once you have found signal through trust, you should completely ignore that first search and even look more into things that you had previously excluded.

Find researchers that you agree with, find the papers that they recommend, read those papers and check that you agree with them. If they recommend something from a no-name university, look at those first.
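A rough sketch of the shape of that two-phase idea (the papers, citation counts, and recommendation links are entirely made up; `judge` stands in for actually reading and evaluating a paper):

    # Hypothetical corpus: papers with a crude metric (citation count) and the
    # papers their authors recommend. All names and numbers invented.
    papers = {
        "A": {"citations": 900, "recommends": ["D", "E"]},
        "B": {"citations": 40,  "recommends": []},
        "C": {"citations": 300, "recommends": ["F"]},
        "D": {"citations": 5,   "recommends": ["G"]},  # no-name venue, but recommended
        "E": {"citations": 70,  "recommends": []},
        "F": {"citations": 12,  "recommends": []},
        "G": {"citations": 2,   "recommends": []},
    }

    def cull_by_metric(papers, min_citations=100):
        # Phase 1: broad, metric-based cull just to get a starting set.
        return [p for p, info in papers.items() if info["citations"] >= min_citations]

    def follow_trust(papers, start, judge):
        # Phase 2: depth-first walk along recommendations; the metric is now ignored.
        seen, worth_reading, stack = set(), [], list(start)
        while stack:
            p = stack.pop()
            if p in seen:
                continue
            seen.add(p)
            if judge(p):  # i.e. you read it and agree with it
                worth_reading.append(p)
                stack.extend(papers[p]["recommends"])
        return worth_reading

    start = cull_by_metric(papers)                     # e.g. ["A", "C"]
    print(follow_trust(papers, start, judge=lambda p: True))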

Unfortunately if this is not your field and you aren't able to determine quality, this becomes impossible. If it's important to you, you need to learn it. This is why I don't like non-technical managers. If the people who approve the grants do not understand the result, this is inevitable. It might work early on, when trust still lingers, but as metrics take over, the social systems always fall apart.


"When a measure becomes a target, it ceases to be a good measure."

"We can’t measure what counts, so we count what we can measure."


This is related to Goodhart's Law, which is "When a measure becomes a target, it ceases to be a good measure." It's a large area of study to figure out how to get the outcome you actually want, while measuring something close, because good metrics can be hard to come by.

Perhaps a more general observation is that a measure that faces pressure from intelligent agents needs to be monitored and adjusted by equally intelligent agents to patch the exploits.

So you can either use a simple measure that doesn't work, or a complex measure that does work, but at potentially great cost (evaluating all the terms, bureaucratic inertia by the controllers, etc.)

Some metrics seem to provide more of a free lunch - be more robust - than others. It's not obvious which are robust, however.


when a metric becomes the target, it ceases to be a good metric

This is a critical part. The problem is not using metrics (like SAT/ACT); the problem is using metrics blindly. What always bugs me is when people go on metrics alone, as if they perfectly aligned with what's being measured, when metrics are only guides from which choices must then be made through careful evaluation.

Most real-life goals can't be easily expressed with a few "easily trackable" metrics. Take the example from the article, customer satisfaction. That's half of the problem; the other half is that metrics can be gamed, and they'll end up being gamed, accidentally or for profit, if you're not careful. Like, the easiest way to make a system stable is to make it so painful to use that people don't use it - and if they don't use it, they can't break it. Or the example of return tracking from TFA.

I read a lot about how data-driven companies measure this and that, often through questionable, privacy-violating means. What I don't read about is how these companies ensure the metrics are actually valid - that they're correctly sampling the population[0], that they're measuring what the authors think they're measuring, or that they're not being misreported or otherwise gamed (very common if the value of a metric impacts someone's career or even workload).

--

[0] - e.g. voluntary surveys usually don't, telemetry increasingly doesn't either as more and more people are aware of it and disable it.
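A toy illustration of [0], with entirely made-up response rates:

    import random

    random.seed(0)

    # Hypothetical user base: true satisfaction scores from 1 to 5.
    true_scores = [random.choice([1, 2, 3, 4, 5]) for _ in range(10_000)]

    # Voluntary survey: assume happy users are far more likely to respond.
    response_rate = {1: 0.02, 2: 0.05, 3: 0.10, 4: 0.25, 5: 0.40}
    responses = [s for s in true_scores if random.random() < response_rate[s]]

    print(f"true mean satisfaction: {sum(true_scores) / len(true_scores):.2f}")
    print(f"survey-reported mean:   {sum(responses) / len(responses):.2f}")
    # The survey overstates satisfaction because response probability is
    # correlated with the very thing being measured.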


I agree that bad metrics can be worse than none. I've certainly seen other examples in books on management.

In this case, though, it sounds like they've put a lot of effort into coming up with good metrics that are hard to game and, most importantly, are convincing to the surgeons themselves. The article talks about that at some length.

(BTW thanks for plugging static code analysis -- I work in that field :-)

