
I've thought about this problem before. I'm a neuroscientist, and I would often like to try out other people's modeling techniques on my data, but since the vast majority of published papers do not have corresponding published implementations, I have to implement the algorithms myself, and hope I'm doing the same thing the paper did, and hope they did the same thing the paper says. There is some pedagogical value to this, since it ensures a good understanding of how the algorithms work, but often this pedagogical value is limited. In many cases the main innovation in the paper is a better optimization algorithm and not a different way of framing the problem.

There is little incentive to publish your code. Refusing to give away your implementation does not in any way constrain your ability to publish, and giving away your code has only minimal benefits for your career. On the other hand, it's risky, since someone might find a bug in your implementation that changes your results. Additionally, a competing lab might show their algorithm is better than yours, or worse, improve your algorithm and publish a higher impact paper based on it, which might affect the profile of your publication. Finally, publishing your code means you have to package it in a way that is usable by others. Given these facts, it is not entirely surprising that most people would not publish their code.

Since science is pretty decentralized, it's hard to achieve the kind of large-scale change in behavior you'd need to make code sharing standard. The only people who could simply decide that people should publish their code and make it happen as a consequence are the funding agencies (e.g. NSF), who have only recently begun thinking about data sharing and have yet to make code sharing a priority.

One thing I'd like to see happen is a "viral" AGPL-like license for scientific code. Code would be freely redistributable, but if it is used in a commercial product or publication, the license should require that the modified work be freely redistributed. Given the choice between building on publicly available code or writing everything from scratch, I feel that most researchers would choose the former, even if it meant releasing their code as well.




simonster, thanks for the comment. You touched on a lot of the reasons why we are building Algorithmia.com. If we can remove the barrier to getting the code up and running, we believe more people will get access to and use innovative scientific code.

The competition between labs is an interesting point. In a non-academic world we would see competition as good: better results, algorithm optimizations, etc. Sadly, I can see it being a deterrent to publishing code in the academic world, though.


The academic world right now is built on first mover advantage in some pretty serious ways.

I think the big one is the lack of incentives.

Publishing your code costs time to get it polished for public use, and money by way of that time. And it has very little benefit: you've lost exclusive rights to a publication platform, and GitHub doesn't show up in tenure portfolios.


In an ideal world, the additional citations that public source code generates would counter some of these problems, I guess.

The existence of papers about software is a decent step in the right direction, but it needs:

a. People to cite what software they used way more heavily than they do (this includes me).

b. The effort to be rewarded in and of itself. A paper, cited or not, is credit. There's a huge leap of faith that, after all the time spent prepping software for release, anyone will use it.

