>> Although these are very imperfect indicators of success, here they are.
There's your answer, right at the beginning of the post. It seems you're attacking for the sake of attacking? Sam is providing these data because people asked for them.
> did the recommendations team develop the system specifically with those KPIs in mind?
Yes they did - in fact they had input on defining them and helped in tracking them.
> Did they ever have access to adequate information to truly solve the problem your team needed solved?
They believed so. Their team was also responsible for our company's data warehousing, so they knew even better than I did what data was available. They had access to basically every piece of data that could have been available.
> And was the same result observed for other uses of their recommendation systems?
I did not have first-hand access to the results of their use in other recommendation contexts. As I mentioned in my original post I only had second-hand accounts from other teams that went the same route. They reported similar results to me.
>> their models have 20,000 vectors in determining credit worthiness. How would you begin to break that down to something explainable?
Well, somehow they decided that their 20k-parameter model was accurate. They should at least be able to explain why they made that decision, even if the model itself is too complex to walk through.
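For instance (a generic sketch, not anything this lender actually does): you can probe a black-box model with permutation importance, scoring how much its output moves when each input is scrambled. Everything below, the stand-in model, the feature count, and the data, is hypothetical.

```python
# Sketch: permutation importance against a black-box model's predictions.
# The model, features, and data are stand-ins, not the lender's real system.
import numpy as np

rng = np.random.default_rng(0)

n, p = 5_000, 20                     # pretend 20 of the 20k inputs
X = rng.normal(size=(n, p))
true_w = np.zeros(p)
true_w[:3] = [2.0, -1.5, 0.5]        # only 3 features actually matter

def black_box(X):
    """Stand-in for the opaque credit-scoring model."""
    return X @ true_w

baseline = black_box(X)
importances = []
for j in range(p):
    Xp = X.copy()
    Xp[:, j] = rng.permutation(Xp[:, j])   # break feature j's relationship
    # importance = how much the score shifts when feature j is scrambled
    importances.append(np.mean((black_box(Xp) - baseline) ** 2))

for j in np.argsort(importances)[::-1][:5]:
    print(f"feature {j}: importance {importances[j]:.3f}")
```

That doesn't explain every one of 20k parameters, but it does rank which inputs actually drive decisions, which is usually what a regulator or customer is asking for.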
> It’s not entirely clear why the University of Washington team gets such a weird result — since their data isn’t public, we can’t check it — but it’s worth noting at least two important issues with their study.
I cannot understand how economic studies are supposed to be credible when the data they use is not provided along with their methodology.
Is this for privacy reasons? If so, surely we can come up with obfuscation standards?
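For what it's worth, one common obfuscation pattern is to release noised aggregates instead of raw records, in the style of differential privacy. A toy sketch (the epsilon, the sensitivity cap, and the data are all made up; real deployments need far more care):

```python
# Toy sketch: Laplace-noised release of an aggregate statistic.
# All values here are invented for illustration.
import numpy as np

rng = np.random.default_rng(42)
incomes = rng.lognormal(mean=10.5, sigma=0.6, size=10_000)  # fake records

epsilon = 1.0               # privacy budget (smaller = more private)
sensitivity = 250_000.0     # assumed cap on any one record's contribution

clipped = np.clip(incomes, 0, sensitivity)
true_mean = clipped.mean()
# Laplace mechanism: noise scaled to the sensitivity of the mean query
noisy_mean = true_mean + rng.laplace(scale=sensitivity / (epsilon * len(clipped)))

print(f"true mean {true_mean:,.0f}, released mean {noisy_mean:,.0f}")
```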
Submission statement: The author argues for decentralising databases in science while maintaining consistent file formats.
I am not sure about this. In genetic epidemiology we have three databases for genome-wide association summary data, called "sumstats". Each has its own way of formatting the data, and they are in various states of maintenance. GWAS Atlas is no longer receiving many of the latest summary statistics, while MRC's IEU database stores its fairly up-to-date sumstats in a very different file format (a custom VCF), which is fairly simple to convert to a more standard format but is confusing for less tech-savvy users.
This is arguably a pretty centralised system already. But it is already very difficult to be sure you have the most recent (and best quality) sumstats for a particular phenotype. Decentralisation would make this much worse! On the other hand, centralisation risks the single database becoming unmaintained due to funding constraints, and because the whole thing is likely managed by a single post-doc who needs to move on from their job every 3 years to have a chance at career progression.
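To illustrate the "fairly simple to convert" point, here is a minimal sketch of flattening an IEU-style GWAS-VCF into a plain tab-separated sumstats file. The FORMAT field names (ES = effect size, SE = standard error, LP = -log10 p-value) follow my reading of the GWAS-VCF spec, so check them against your file's `##` header lines; `example.vcf.gz` is a placeholder.

```python
# Sketch: flatten a GWAS-VCF into a tab-separated sumstats file.
# FORMAT keys (ES, SE, LP) assumed per the GWAS-VCF spec; verify in your file.
import gzip

with gzip.open("example.vcf.gz", "rt") as vcf, open("sumstats.tsv", "w") as out:
    out.write("chrom\tpos\trsid\tref\talt\tbeta\tse\tpval\n")
    for line in vcf:
        if line.startswith("#"):      # skip meta and header lines
            continue
        cols = line.rstrip("\n").split("\t")
        chrom, pos, rsid, ref, alt = cols[0], cols[1], cols[2], cols[3], cols[4]
        fmt_keys = cols[8].split(":")                     # e.g. ES:SE:LP
        values = dict(zip(fmt_keys, cols[9].split(":")))  # first sample column
        beta = values.get("ES", "NA")
        se = values.get("SE", "NA")
        lp = values.get("LP")
        pval = 10 ** -float(lp) if lp not in (None, ".") else "NA"
        out.write(f"{chrom}\t{pos}\t{rsid}\t{ref}\t{alt}\t{beta}\t{se}\t{pval}\n")
```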
> Any stories of inadequate examples or meaningless correlations?
A customer's marketing group was tying visitor data to geodemographic data. They put together a database with tons of variables, went searching, found a multiple regression with a Pearson coefficient of 0.8+ and a low p-value, decided to rewrite their personas, and started devising new tactics based on the discovery.
Fortunately, they briefed the CEO and the CEO said that the dimensions in question (I honestly don't remember what they were) didn't make intuitive sense, and demanded more details before supporting such a major shift in tactics. More research was done, and this time somebody remembered that this was a product where the customers aren't the users, so they need to be treated separately. And it turned out the original analysis (done without fancy analytics) was very close to correct.
If the CEO hadn't been engaged during that meeting, they would've thrown away good tactics over a simple mistake. The regression looked "reliable" by most statistical measures, but it was noise.
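You can reproduce that failure mode with pure noise. A quick sketch (all numbers arbitrary): with enough candidate predictors relative to the sample size, an in-sample multiple correlation of 0.8+ falls out of random data, and it collapses on fresh data.

```python
# Sketch: a "reliable-looking" multiple regression fit entirely on noise.
# With ~45 random predictors and only 60 observations, in-sample fit is
# routinely high, and it evaporates out of sample.
import numpy as np

rng = np.random.default_rng(7)
n, p = 60, 45
X = rng.normal(size=(n, p))          # random "geodemographic" variables
y = rng.normal(size=n)               # random "visitor" outcome

coef, *_ = np.linalg.lstsq(X, y, rcond=None)

def r2(X, y, coef):
    resid = y - X @ coef
    return 1 - resid.var() / y.var()

print(f"in-sample R^2 = {r2(X, y, coef):.2f}")          # typically ~0.75+

X_new, y_new = rng.normal(size=(n, p)), rng.normal(size=n)
print(f"holdout   R^2 = {r2(X_new, y_new, coef):.2f}")  # ~0 or negative
```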
A similar example holds for validity: I saw a team build wonderfully accurate promotion response models, but they only measured to the first "conversion" instead of measuring LTV. After several months of the new campaign, it turned out the new customers churned at a much higher rate, so they weren't nearly as valuable as the original customers.
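To make the LTV point concrete, a toy calculation (all numbers invented): two cohorts with the same first-conversion value can have wildly different lifetime values once churn is factored in.

```python
# Toy LTV comparison: same first-conversion revenue, different churn.
# All numbers are invented for illustration.

def ltv(monthly_margin: float, monthly_churn: float) -> float:
    """Expected lifetime value as a geometric series: margin / churn."""
    return monthly_margin / monthly_churn

original = ltv(monthly_margin=30.0, monthly_churn=0.03)  # ~33-month life
promo    = ltv(monthly_margin=30.0, monthly_churn=0.12)  # churns 4x faster

print(f"original cohort LTV: ${original:,.0f}")   # $1,000
print(f"promo cohort LTV:    ${promo:,.0f}")      # $250
```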
> Care to elaborate on how to be more sure of reliability and validity?
I'm not a statistician or an actuary. I'm a guy who took four stat classes during undergrad. I know just enough to know that I don't know that much.
Disclaimer aside: my biggest rules of thumb are to make sure that you're measuring the thing you want to measure (not a substitute), to make sure the statistical methods you're using are appropriate for the data you're collecting, and to make sure you understand the segmentation of your market.
Can you post your research? How many millions of accounts did you analyse? What tools did you use?