The Safyness of Static Typing (blog.metaobject.com)
64 points by mpweiher | 2014-06-18 17:56:44 | 103 comments




But when the type system can ensure that you handle nil values safely and can check that every case is covered in a switch statement, I'm sure that covers more than 2% of bugs.

Why are you sure?

Not properly guarding for None has been over 2% of my bugs in working with Python.

I'd guess it's more like 5-10%.

There's probably another 5-10% that are problems dealing with trying to uniformly iterate over different kinds of structures, but I'm willing to admit I might just not know the correct way to do it (ie, map or fold or something).


The interesting thing is that the majority of statically-typed languages wouldn't catch the null for you. C, C++, Java, C#, and many others accept null as a valid input for any type.

There are plenty of languages like Haskell, Haxe, Idris, Agda, Roy, Rust, and Elm that get this right. Anecdotally it seems that most developers who advocate static typing aren't using these languages. It seems like the majority are writing in one of the Algol derived languages from the first paragraph.
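To make the contrast concrete, here is a minimal Haskell-style sketch (the function names are illustrative, not from any particular library): a possibly-missing value is an explicit Maybe, and the compiler rejects code that uses it without handling the Nothing case.

    -- A value that may be absent is a Maybe, not a nullable reference.
    lookupAge :: String -> Maybe Int
    lookupAge name = lookup name [("alice", 34), ("bob", 27)]

    -- You cannot treat a (Maybe Int) as an Int directly; the compiler
    -- forces you to say what happens when the value is missing.
    nextBirthday :: String -> Int
    nextBirthday name =
      case lookupAge name of
        Just age -> age + 1
        Nothing  -> 0          -- the "no value" case must be written out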

Scala gets it half right with Option, but included null for Java compat. I've not written much Scala, but I've already encountered numerous bugs that resulted from some library returning an unexpected null, and the type system decided that was totally fine.


Yes, C/C++ are pretty weakly typed and null pointers are an issue in both. They still catch some issues, though, and I find that does help me sometimes compared with Ruby, for example. If they only caught 2% of the issues I would be a little surprised, but it is certainly in the range of possibility depending on the stage of development (I think they would catch more as you type your code, but possibly 2% or less when evaluating a finished, debugged, tested product).

You can also add Swift to the list of languages that get it right in this regard. When receiving a return value from obj-c that can be null it comes back as an optional so you SHOULD safely transform it to a real value or only perform conditional actions on it. Of course there is the possibility to coerce it into the full type.


Probably experience. I can confirm this based on my work in dynamic languages. I don't check in most of these bugs though, so the methodology used to arrive at 2% would miss them.

http://www.impredicative.com/ur

Bam, a static type system that prevents a bunch of those exploits from the list of 25 most exploited bugs.

The generics paper is not exactly relevant. Or I'm not quite sure what it's supposed to prove.

The other research paper is kind of disingenuous:

a.) They measure programmer productivity on a small, 27-hour project. Advantages of static typing grow proportionally with the size of the project. I personally am of the opinion that scripting languages are OK for smaller projects, but not that great for larger projects.

b.) They use two custom languages, one inspired by Smalltalk, one inspired by Java, and based on the results make a pretty general conclusion about static typing. I'm not convinced that such a conclusion can be made based on the experiments they present.


Experience. Tony Hoare's speech about being to blame for nil, and his estimate of the financial damage it has caused. I also suspect I've seen coverage of a research paper about bugs in which nils were a noticeable share, but I've no idea where that is.

One of the recent bugs in my app that was causing a crash from time to time involved the platform library returning nil when there was no user input in the text field rather than the empty string it promised in the doc.


>Because static typing is only worth the 2%

>If you felt it helped it's because it was the placebo effect

>Remember the time you mistyped something and the program crashed at runtime when it should have been a compile error. That was the 2%.

>There's not one time you passed in the wrong object and duck typing smoothed over the error, until you find it at runtime. That was the 2%.

>Remember the time the compiler made your program faster because it could deduce additional information from the types? That was the 2%.

>Remember the time your IDE helped you write programs and detect potential errors because it could deduce more information from the types? That was the 2%.

>if you advocate static typing, then you are comparable to a religious zealot.

What a load of garbage.


You left out the most important part: there is evidence -- in the form of at least one peer-reviewed study -- to support all these claims.

>This paper presents an empirical study with 49 subjects that studies the impact of a static type system for the development of a parser over 27 hours working time.

Because all the programs you've ever written were parsers.

full text: http://courses.cs.washington.edu/courses/cse590n/10au/hanenb...

I'm not actually surprised at this result when they forced people to use their toy statically typed language, in their toy IDE (which couldn't take advantage of the static typing, e.g. IntelliSense-style prompts, type-error underlines, etc.).


And you never need to refactor them. It's also not clear how sophisticated the type system was, how familiar the 49 participants were with it, whether they were working together or independently, ...

But most damning is the line immediately following:

"In the experiments the existence of the static type system has neither a positive nor a negative impact on an application's development time (under the conditions of the experiment)."

Unless the abstract is tremendously misleading, the paper flat out does not say what TFA claims.


At a closer look (I missed the pdf link, the first time) it seems the abstract is tremendously misleading. Under the conditions of the experiment dynamic typing was faster. Of course, the conditions of the experiment involved a small project in an unfamiliar language with an unfamiliar type system over the course of 27 hours with an unchanging set of requirements, so draw your own conclusions as to how well this generalizes.

Comparing the description of the paper to the abstract, TFA seems to substantially overstate things.

From TFA: "[T]here was a study [...] which found the following to be true in experiments: not only were development times significantly shorter on average with dynamically typed languages, so were debug times."

From the abstract: "This paper presents an empirical study with 49 subjects that studies the impact of a static type system for the development of a parser over 27 hours working time. In the experiments the existence of the static type system has neither a positive nor a negative impact on an application's development time (under the conditions of the experiment)."


As I mentioned elsewhere, the abstract seems to disagree with the paper. I'm not sure what conclusions to draw from that fact...

It may be peer-reviewed, but that 2% number remains to be verified independently.

No need - if you open the linked PDF, you'll see that the percentage was arrived at by searching github issues for things like "type error", "argument exception" and so on - in projects written in dynamically typed languages.

QED.

I find that in strongly-statically-typed languages, I spend more time juggling the types of data than I do solving the problem at hand.

I find that in strongly statically-typed languages, laying out the types of the data helps me solve the problem at hand.

At least, for statically typed languages I actually use. I certainly have experience with unhelpful type juggling.


You're supposed to solve most of the problem in the type design phase.

This. If you're writing code before having at least thought out a preliminary solution, you're doing it wrong.

Types let you codify your idea, and then make sure your implementation aligns with it as you write it.


If you get the types wrong, you haven't really solved the problem, have you?

It's not just static typing. It's strong static typing like you see in Haskell, Idris, Scala, F#, etc.

This article misses the point of what a type system brings to the table. The key benefit isn't catching errors.

The benefits are largely discoverability, enabling better tooling, increasing the number of optimizations compilers can make, and guiding the programmer. These are all things we benefit from pretty much across the board. The trade-off is having to be slightly more explicit (Java and C# haven't helped here, as they insist you sign everything in triplicate rather than promoting type inference).

For functions with sufficiently rich types, there are often only a couple ways to implement the function. Sometimes, there are few enough that a compiler can actually derive it for you.
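As a rough illustration in Haskell (any language with parametric polymorphism would do): the more polymorphic the signature, the fewer total implementations it admits.

    -- Up to totality, there is exactly one function with this type;
    -- it can only return its argument, because it knows nothing about 'a'.
    identity :: a -> a
    identity x = x

    -- Likewise, the only way to produce a 'b' here is to apply the function.
    apply :: (a -> b) -> a -> b
    apply f x = f x

    -- A concrete type like Int -> Int, by contrast, admits infinitely many
    -- implementations (succ, negate, const 42, ...).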

Having switched between typed and untyped languages repeatedly, I can't emphasize enough just how much rich, strong types contribute to the readability of the code.

And on yet another front, it may only be "2%", but I'm sure most people on here know how it feels to have written a couple hundred lines of code, only to suddenly find something of the wrong type somewhere it shouldn't be... It only takes a fraction of a percent for a program to be utterly and completely useless.


To expand a little bit:

I have found at least anecdotally that the benefits of a strong, rich type system are multiplicative with other features, not additive. For example, types in Java are largely a nuisance. The type system lacks facilities to express obvious things (I want a list of things that all have this interface) in clean ways. Instead you have to resort to "clever" hacks which ultimately just circumvent the guarantees you wanted to establish. For small projects this is a non-issue. For large projects, this is hell.

By contrast, type systems thrive in contexts with algebraic or sealed case types, or any form of pattern matching really. Or just plain old enums. In conjunction, these features enable very powerful static checks. Forcing you to handle None/Null/NONE/Nothing/nil cases everywhere encourages critical thinking. This introduces more issues, such as staircase code, but these are largely (I would personally say completely, and with a nice surplus) fixed by things like pipes, monads, computation expressions, and so forth.

This extends to libraries as well. I can't count the number of times I've used libraries which changed their APIs in "non-breaking" ways, such that certain functions returned types I assumed would never be returned. Was this my fault? Yes. But if the return type had been strongly typed and I had been working in such a language as described above, then:

1) I would be forced to handle every extant constructor in that type.

2) If a new one was added, I'd get a compiler warning alerting me to the fact that I hadn't handled it (this is great!).

3) If the type changed entirely, I'd get an error alerting me to the fact that my code was no longer compatible with the API provided.

In the end, working with and writing libraries is about respecting contracts. Types are a tool for codifying those contracts. Strong typing and matching facilities are even more powerful tools for alerting you to violations.
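A hedged sketch of points 1 and 2 above, in Haskell (the type and its constructors are invented for illustration): with warnings enabled, adding a constructor to a library type flags every match that doesn't yet handle it.

    {-# OPTIONS_GHC -fwarn-incomplete-patterns #-}

    -- Hypothetical library result type; imagine a later version adds 'RateLimited'.
    data FetchResult
      = Success String
      | NotFound
      | Timeout

    describe :: FetchResult -> String
    describe r =
      case r of
        Success body -> "got " ++ body
        NotFound     -> "missing"
        Timeout      -> "try again later"
    -- If the library adds a constructor, this 'case' no longer covers every
    -- constructor and the compiler emits an incomplete-pattern warning.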


FWIW I don't think the normal syntax for bounded generic types in Java should be considered a "clever" hack, it's just a feature of the language which is actually quite powerful (in spite of type erasure):

List of things that all have interface I: List<? extends I> l;


I love refactoring in a good, statically typed language. Change the types in a carefully breaking way, and you're immediately told every single place you need to change. Seven times out of ten I can manage to squeeze this out of my C, even before static analysis tools, though I certainly wind up leaning more heavily on my tests than in some other languages.

I usually have tests for features, regardless of static or dynamic types. You need feature tests anyway, because even sophisticated/complicated static type systems can't catch serious implementation logic bugs. And these tests implicitly cover what a static type system would give you and more.

I disagree. Unit and feature testing are vastly overrated for what they do, and their current use is faddishly subscribed to in a seemingly cult-like fashion. Unit tests are excellent for testing non-stateful behavior of code. That is, they're great for testing contracts, and they have a bit more flexibility than Eiffel-style contracts. However, using them to the extent of "returns an Object of this type" is absurdly verbose and pollutes the code base with roundabout information.

These sorts of things are best left to a static type system, which is a more specialized and effective system for that portion of validating your code. On the other end of things, tests of more stateful processes or behaviors are better served by E2E testing.

There's nothing wrong with using multiple forms of testing. Different tools excel at testing different sorts of things.


> However, using them to the extent of "returns an Object of this type" is absurdly verbose and pollutes the code base with roundabout information.

I did not talk about unit tests. I do not test stupid things like "returns object of this type". I test functionality (more like integration tests). If the functionality works then things like "returns object of correct type" or "items in collection are treated correctly" are tested implicitly or don't matter.

People with ridiculously complex type system programming languages have to write these automated tests as well, because even ridiculously complex type systems like Idris don't ensure the correctness of non-trivial logic.


Your argument that typing is tested implicitly or doesn't matter is false, because your function might have 73 code paths, of which your test only covers 1.

Tests and types are complementary. A test will prove that in at least one case the function does what it should, but a type system can prove that in all cases at least the types are correct. There was a blog post describing this as establishing an upper vs. a lower bound.


You have to write tests for the other 72 code paths anyway, because types can't ensure the behavioral correctness of these 72 code paths. Correct types don't help when the behavior can be broken. And when you have to write these tests anyway then you don't need the types anymore.

C# has local type inference, and is getting a bit better in type parameter inference in the next version.

The languages with strong type inference (say OCaml or Haskell) provide less in the way of tooling like autocompletion, either through neglect (their communities don't demand it) or design limitations in Hindley-Milner (mostly algebraic and structural types, with poor support for the nominal typing where code-completion menus begin to make more sense in the ways we currently understand them). Scala would be in the sweet spot of a well-tooled statically typed language if the IDE situation could ever be straightened out enough.

> For functions with sufficiently rich types, there are often only a couple ways to implement the function.

That is only kind of true in pure function land. Any kind of state manipulation, even through monads with powerful dependent typing, drastically opens up how a function can be implemented.


> That is only kind of true in pure function land.

More than that: it's only true in total pure function land. Admitting divergence explodes the state space of functions.


Strictly, yes. However, most of the time this bites less hard than it might, because most of the non-total implementations are obviously wrong where we intend a total function. It still can bite, of course, and I've been meaning to play around in some total languages (or languages with a compiler-enforced total subset).

Readability is one of the reasons I try to avoid the use of the 'var' keyword in C#. It's useful in object instantiation, e.g.

    FooFizzBuzzProvider bar = new FooFizzBuzzProvider();
    vs
    var bar = new FooFizzBuzzProvider();
But in other instances it hides the underlying type too much, e.g.

    var query = (from entity in dbContext.Entities
                 join entity2 in dbContext.Entities2.Include(e => e.Entity3)
                   on entity.entity2Id equals entity2.entity2Id
                 select new { e1 = entity, e2 = entity2.Entity3 });
It works. It's nice. But sometimes it's too nice. At some point the returned type needs to be serialized, or some other issue comes up that requires a strong type. You end up rewriting the dynamic stuff with strong types anyway.

C#'s var keyword tells the compiler to infer the type. It still has a statically determined type so that shouldn't interfere with type-polymorphic functions such as serialization routines...

I consider regular expressions to have the rare and elusive "write-only" flag set. Those bastards are almost impossible to decipher after the fact.

I increasingly consider dynamic languages to fall into the same "write-only" camp. One week you get work done fast and efficiently. A few weeks later a new edge case is encountered and it doesn't work. Figuring out exactly why can be more than a little frustrating.

Dealing with code other people wrote can be even worse. Yes, the language is dynamic. No, the expectations are not. If you call a function with some object and it's not the right kind, then it's either not going to work or it won't work like you expect. Jumping into the middle of a system and trying to figure out what the requirements and expectations are for every object in every function is a colossal waste of time. Yes, it could be documented, and that takes even more time! Static typing makes it remarkably clear and straightforward.


The OP says that much of the static typing benefits are in documenting code.

Having code properly documented, in a machine-verifiable form, is no small benefit. It's what maintainability is based on.


It's worth noting that one can hack together some machine-checkable documentation orthogonal to typing, too. As a limited example, I do this with inline TODOs (in a greppable format) while I'm working my way through a problem.

Machine-verifiable TODO items sound interesting. If this comment was bait, it worked. Where are some examples I can look at?

I don't have much relevant posted anywhere. The gist is "git grep TODO", but I do a little extra work to make it extra convenient with vim's quickfix buffer:

    search:
        @git grep -n "$$PATTERN"

    todo:
        @PATTERN=TO''DO make search \
            | sed -n 's/\([^ \t]*\)\(.*TO''DO \(([0-9]*)\)\):\?\(.*\)/\1 \3 \2\4/p' \
            | sort -k 2
Running ":make todo" in vim then pulls everything of the form "TODO (1):" out, sorts according to priority (the value in parentheses), and dumps them into my quickfix buffer (including the locations), which I can easily step through.

I also surface changes in the todo list as comments in my git commit buffer, for reference.


This is about where I've come down.

Back when my job involved writing a lot of greenfield code that I'd also never have to look at ever again, I was pretty fond of dynamic languages. Nowadays I'm working on a big product with a codebase that ain't getting any younger. For this job, I'm inclined to say static typing is a godsend. Sometimes you can tell a lot about what a function's supposed to do by its parameter and return types that you can't tell by its probably-nonexistent comments or its cryptic name.

That said, the last two languages I've worked in both allow you to forego strong typing in favor of duck typing, and sometimes that feature's a godsend, too.

I'm starting to think of it roughly the same as I think about mutability: Dynamic typing may not be a great default for maintainability reasons, but at the same time sometimes it's great to have the option for pragmatism's sake.

(None of this being anything I can back up empirically, of course.)


Whenever you find yourself potentially writing "write-only" code, it's worth it to take the extra time to split everything with newlines and insert lots of comments.

Some people think over-commenting is bad, but in my opinion, I'd rather take verbose comments over obfuscated code any day.


In my experience:

Accurate comments and confusing code is better than no comments and confusing code, which is better than confusing comments and confusing code. Clear code is better than any of these, regardless of the comments. Necessary comments with clear code is better than no comments with clear code, which is better than superfluous comments with clear code, which is better than inaccurate and confusing comments with clear code (in which case the easiest transition is to "no comments with clear code").


Wow, this is fantastic. Now I have a word ("safyness") for what I and others have been saying for so long: that static typing prevents the bugs that are easiest to catch. If it were free, I would take it, but it often does so at a large cost to productivity, and even understanding of the dynamic/runtime behavior of the system.

That second problem I summarize using the phrase the "map is not the territory". (http://en.wikipedia.org/wiki/Map%E2%80%93territory_relation) I had heard this phrase for many years, outside the context of programming. I never quite understood what it meant until I spent time around hardcore C++ and Haskell people. Static typing is a model (a map) for runtime behavior.


> what I and others have been saying for so long: that static typing prevents the bugs that are easiest to catch

Not only are such bugs the easiest to catch, but they include bugs that can be insidious, expensive, and prohibit certain refactorings if they are not caught.

This isn't to say that static typing is the only way. It's more accurate to say: If you aren't catching those bugs automatically, you are doing it wrong.


Right. And it's not just that the typing is static, but that you are using it properly. Cf:

    struct price { int value; };
    struct quantity { int value; };

    int placeOrderUnsafe(int price, int quantity);
    int placeOrderSafer(struct price, struct quantity);

    ...

    placeOrderUnsafe(quantity.value, price.value) /* does the wrong thing */
    placeOrderSafer(quantity, price) /* caught at compile */

It's surprising to me to see "the map is not the territory" used this way, when static typing constrains and makes provable claims about the territory.

If most of those top 25 errors don't look like type errors to you, you don't know enough about good type systems to be making judgement calls about them.


"Saf" means naive in Turkish.

I am actually a big fan of statically typed languages. But it's naive to think a type system will make your programs correct by catching type errors. I do believe a type system helps make programs correct as a design aid.


Curry and Howard would like to have a word with you. http://en.wikipedia.org/wiki/Curry%E2%80%93Howard_correspond...

If they expect me to read some wikipedia article and then try to guess what they mean, they should work on their communication skills first.

In other words; if you have an argument, please do share. Don't just appeal to authority.


Tl;dr it directly proves that type systems can make your programs correct by catching type errors.
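For anyone who doesn't want to dig through the article, a tiny Haskell illustration of the correspondence: a type can be read as a proposition, and writing a program with that type exhibits a proof of it.

    -- "If (A and B) then A": the proof is just projection.
    proj1 :: (a, b) -> a
    proj1 (x, _) = x

    -- "If (A implies B) and (B implies C), then (A implies C)":
    -- the proof is function composition.
    compose :: (a -> b) -> (b -> c) -> (a -> c)
    compose f g = g . f

    -- There is no total program of type  a -> b  for arbitrary a and b,
    -- just as "anything implies anything" is not a theorem.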

A correct program is correct whether there's a type system or not.

Adding a type system to a correct program doesn't make it more correct.

Adding a type system to an incorrect program doesn't make it correct (unless you refactor it).

So, type systems DO NOT make programs correct. They help you design correct programs. I am not overlooking the role of static typing in writing correct programs. But I'd disagree if you said type systems by themselves affect correctness of a program directly. A program doesn't come into existence instantly, it is evolved by a design process.


> Adding a type system to an incorrect program doesn't make it correct (unless you refactor it).

Of course not, but the compiler will tell you that the program is incorrect since it can deduce it.


This means type systems help you make your program correct, I agree.

Your argument was "type systems make your programs correct", I still disagree with that.

I think the distinction is important enough.


Are you downvoting me because you can't stand different opinions?

No, people are downvoting you because you are just wrong.

Please see this diagram

http://ro-che.info/ccc/17.html


FWIW, you cannot downvote someone who replies to you. It could not possibly be adamnemecek.

A type system is a tool. No tool makes anything better without actually using it.

Saying that applying a type system to an incorrect program doesn't make it correct without refactoring is like saying hanging a painting with nails doesn't work without a hammer.

It's true, but misses the point. Programmers tend to introduce implicit type information (anything from dynamic types to how you choose which functions to apply to which values). Bunches of bits have to be classified somehow to be useful.

A static type system lets you specify and define that classification explicitly, rather than implicitly.

So, if your mental model of the correctness of a program includes type information (and I assume it does in some manner or another), a static type system is invaluable for proving (re CHC) one facet of the correctness of your program. Moreover, if your model isn't perfect – it rarely is – then a type system can allow you to scrutinize and analyze your model more rigorously.


A straight line is correct whether it was drawn with a straightedge or not.

Adding straightedge to a straight line doesn't make it more straight.

Adding a straightedge to a crooked line doesn't make it straight (unless you redraw it).

So, straight edges DO NOT make lines straight. They help you draw straight lines. I am not overlooking the role of straightedges in drawing straight lines. But I'd disagree if you said straightedges by themselves affect straightness of a line directly. A blueprint doesn't come into existence instantly, it is evolved by a design process.


In my current project (in C), I statically ensure that functions are run on the correct thread. "Oops, that function isn't supposed to be called there because it makes obscure assumptions" is one of the hardest bugs to catch, and my (unsophisticated!) type system regularly catches it for me.

How do you do this?

I create a struct representing each thread. Outside of a debug build, these structs are empty. This is always passed as the first argument to any interesting function (as it happens, while the C standard doesn't guarantee it, in GCC this generates the same machine code as omitting the parameter), and always named th.

When defining any global variable designed to be accessed without a lock, I also create a LOCAL_varname_ASSERT_THREAD function that takes the relevant thread as an argument and does nothing, for static assertion of type equality. This is all wrapped in macros to be convenient and readable.

When you really start using your type system, it's amazing what can be "a type error", even in something as kludgy as C.

At this point, I've been working on the codebase for a year and a half, and it's been in production for much of that. It's >100k LOC, has multiple threads, generates responses in <10us, has been through some major refactorings, and I've had maybe 3 problems involving concurrency that even hit my tests, with one (a high level livelock in my message passing) winding up in production. I don't know how I would have managed this without access to these kinds of static guarantees (which is not to say there aren't any other tools which could have replaced this one, but it is to say there is tremendous value to this tool when you know how to use it!).


Very clever!

This is interesting. Pretty sure I get it but if this is open source I'd like to see it. Two comments:

1) I'm not sure it would be much more effort to implement this scheme in a dynamic language. Most of them can introspect their own code and you could easily flag such an error at startup time.

2) I like the style of passing a thread to every function. In that case, I wonder why you even have globals that multiple threads need to access. What I try to do is initialize ALL shared data in main(). And then pass those structures ONLY to the exact threads that need them. This can be done in either static or dynamic languages; it enforces a nice structure and is easy to read.

I'm not saying a dynamic language would be better for this project. C is a great tool and appropriate for a huge number of problems (also inappropriate for a huge number of problems).

FWIW Ritchie's C (the one they wrote Unix in) was very weakly typed, and that heritage still shows. I don't think it's an accident that C is popular; its type system doesn't get in your way, doesn't bloat your code, and also allows creative use/abuse.


"1) I'm not sure it would be much more effort to implement this scheme in a dynamic language. Most of them can introspect their own code and you could easily flag such an error at startup time."

That's just implementing a static type system in the dynamic language.

"2) I like the style of passing a thread to every function. In that case, I wonder why you even have globals that multiple threads need to access. What I try to do is initialize ALL shared data in main(). And then pass those structures ONLY to the exact threads that need them. This can be done in either static or dynamic languages; it enforces a nice structure and is easy to read."

I code much closer to that when latency matters less. As it stands, reshuffling different views for different functions takes precious nanoseconds every function call.

Edited to add: My response to 1 should not be interpreted as "BAM! Point for statically typed languages!" My contention is that statically checked type systems are phenomenally useful - this really just weakens the notion of static or dynamic types being a fundamental attribute of the language. I think a more accurate perspective is that for any language and any type system, there is some subset of the language that abides by the type system. That intersection may or may not be useful, and may or may not be checkable, but the most joy is to be found where it is both. The only real wins from integration of that checker with the compiler are 1) it's unavoidable (which can be relevant if you generally have sloppy process, but fix your process), and 2) the type information may provide invariants useful for optimization.


I rotate between Java and Ruby (and CoffeeScript) in my current job.

Java is less productive: its style is much less expressive, and its library support for higher-level abstractions is abysmal.

On the other hand, in Ruby I spend hours of my day fighting various DSLs - rspec, Factory Girl, ActiveRecord etc. getting my code free of things that could be trivially eliminated with static typing, so long as it was expressive enough to cope with the abstractions I'm working with.

And that's the catch. The abstractions I'm working with in Ruby are simply not sanely expressible in Java. I'd need a much, much better type system to program at the same level of power.

Overall, I've found the productivity of dynamically typed languages highly unconvincing when compared to typed DSLs - not embedded DSLs, actual DSLs with parsers and type checkers and semantic errors at the same level of abstraction as the domain. But maintaining a DSL is not something many people can afford to do, so we muddle on, bouncing between horrifically verbose Java and Ruby that needs ridiculous levels of testing to stop it falling apart into a pile of mud - especially if you ever want to even think about refactoring it some day.


I find that the two languages are actually not as dissimilar as you may be led to believe. You can write your Ruby a bit more like Java (use more plain old class objects for most of your business logic, and wrap usage of the heftier DSLs/gems inside of simpler interfaces), and support it with the same kind of tests (which are now easier to write & maintain because your code's concerns are more isolated). Sure, you miss out on the type checking, but I'm not sure that is as painful as it sounds when you remove many of the other factors it can be easy to fall into with Ruby.

The key here is that, while Ruby lends itself to building DSLs layered within DSLs (I'm looking at you, Rails), you don't have to use it that way. And, I'd argue, when you reach a point where your Ruby looks a bit more Java- or C#-esque, the more "truthy"/"safy" aspects of static typing end up playing a much smaller role than it seems from the outset.


> for example, most of the 25 top software errors don't look like type errors to me, and neither goto fail; nor Heartbleed look like type errors either,

Most of them do to me. (But I'm coming from Haskell, so it's not only the static typing, it's also how strong it is.)

Anyway, http://ro-che.info/ccc/17.html


I'm seeing where a lot of them could be prevented if you set up your types very scrupulously, but I don't think most of them are inherently type errors.

That said, I would say that the OP's assertion is itself a type error. The link is a list of "most dangerous" errors, where "most dangerous" is specifically defined as bugs that create security vulnerabilities. This is being used in support of a statement about what the most common bugs are. "Security vulnerabilities" and "software defects" are two different (if related) things, so that's a type error in the argument. And "dangerous" and "common" are two different characteristics, so that's a second type error in the argument.


Aren't "dangerous" and "common" different values of the same type?

They might be different subtypes of a supertype called "stuff you should worry about". But not every common error leads to an arbitrary code execution exploit (for example), and not every arbitrary code execution exploit is a result of a common error.

Here is the actual statement about 2%:

http://schd.ws/hosted_files/buildstuff2013/ce/The%20unreason...

"2% of reported Github issues for Javascript, Clojure, Python, Ruby, are type errors"

The study referenced in that presentation, http://www.inf.fu-berlin.de/inst/ag-se/teaching/V-EMPIR-2014..., studies 80 implementations of the same program by 74 people. The reported working times in the study do not exceed 63 hours.

The study referenced in the blog post, http://courses.cs.washington.edu/courses/cse590n/10au/hanenb..., is "an empirical study with 49 subjects that studies the impact of a static type system for the development of a parser over 27 hours working time."

The only conclusion that can be drawn from these data points, if two are enough, is that programs written in no more than three days' time can be implemented faster in dynamic languages than in statically typed languages. There's nothing that speaks to the maintainability of a program over years, nor is the number of pre-written libraries used in those programs recorded.


I too have noticed that static typing enthusiasts use dogmatic rhetoric more often than their opponents. One reason may be that the benefits of static typing are more obvious (catching type errors, documenting code). The largest benefit of dynamic typing, I believe, is that it encourages you to solve your problems with generic data structures like lists and maps, rather than inventing new types for everything in your program. This, I think, is what Alan Perlis was talking about in his foreword to SICP [1] when he compared Pascal to Lisp, noting that in the former "the plethora of declarable data structures induces a specialization within functions that inhibits and penalizes casual cooperation".

[1] http://mitpress.mit.edu/sicp/full-text/book/book-Z-H-5.html


Do you mean heterogeneous lists and maps?

No, that's not what I was referring to. Perlis' quote refers to something much more fundamental. When you make custom data structures (classes, types, etc) to represent each distinct part of your problem, this specificity makes your functions less reusable and moves away from Perlis' famous quote (in the very next sentence) about the benefit of many functions operating on few data structures.

I'm not sure if I understand.

Your data structures can be parametrized, no? I mean nobody creates a ListOfFoos or HashMapOfBars, it's List[Foo] or HashMap[Bar] where Foo and Bar can be replaced with anything and methods defined on List and HashMap would still work.

Can you perhaps give a more concrete example?


Take the example of a simple game where you move a player around. In statically-typed languages, it is typical to define a type called Player containing two floats representing the x,y position. In a dynamically-typed language, it is typical to just use a map and store the position as key-value pairs.

Note that inheritance does not solve the problem of overly-specialized functions. The statically-typed language could make Player inherit from the built-in HashMap type, but the functions that require Player will still not work with anything else.


> but the functions that require Player will still not work with anything else.

I think you are equating statically typed with Java. There are plenty of statically typed languages which will take care of this, while still avoiding many runtime errors. Heck, even C++ templates will do this for you, although C++ with concepts would be a lot better, sans the compile time. Typeclasses also address the same problem: they signal a compile error early.
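A small, hypothetical Haskell sketch of that last point (all names invented): a typeclass lets one function work over any type that exposes the structure it needs, without giving up static checking.

    -- The class captures "has an (x, y) position".
    class HasPosition a where
      position    :: a -> (Double, Double)
      setPosition :: (Double, Double) -> a -> a

    data Player  = Player  { playerPos  :: (Double, Double), name :: String }
    data Monster = Monster { monsterPos :: (Double, Double), hp   :: Int }

    instance HasPosition Player where
      position         = playerPos
      setPosition p pl = pl { playerPos = p }

    instance HasPosition Monster where
      position        = monsterPos
      setPosition p m = m { monsterPos = p }

    -- One move function for every positioned thing, checked at compile time.
    move :: HasPosition a => (Double, Double) -> a -> a
    move (dx, dy) a =
      let (x, y) = position a
      in  setPosition (x + dx, y + dy) a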


"In statically-typed languages, it is typical to define a type called Player containing two floats representing the x,y position. In a dynamically-typed language, it is typical to just use a map and store the position as key-value pairs."

What would you say the latter buys you?

I'm not saying I don't see any advantages, particularly compared to practices in some specific statically typed languages, but I think that clarifying this would be useful to everyone in this discussion.


I believe gw's intended notion of a data structure extends beyond Lists and Hashmaps. For example consider the following two structs.

  struct Point { int X; int Y; ... }
  struct Size { int Width; int Height; ... }
In either case the data structure is (int, int). But they're typed differently: one is typed as Point and the other as Size. This makes it very annoying to write functions that are compatible with both. Let's say we write

  static Point AddThenSquare(Point A, Point B) {
    return new Point((A.X + B.X)*(A.X + B.X),
                     (A.Y + B.Y)*(A.Y + B.Y));
  }
To make Size compatible with that function you need to either refactor your code so that both Point and Size adhere to the same interface, or alternatively implement conversion functions. The latter option is what C#'s .NET framework opts for, where Point and Size are taken as parameters to each other's constructors. But it doesn't scale. What happens when you use a third-party math library with its own point implementation? The number of conversion functions grows as n*(n-1), where n is the number of classes that need to be converted between. To be fair, I think C# has technical reasons for not having an interface between Point and Size (structs don't play nicely with interfaces), but even if it was included it would still be a royal pain making sure all other math libraries adhered to the same one.

In practice it's rare anyone will have the foresight to put interfaces everywhere they're needed, and it's even rarer that programmers will use the same interfaces across libraries. It begins to sound like the "specialization within functions that inhibits and penalizes casual cooperation" that Alan Perlis was talking about.

At some point you start to wonder why the compiler cares at all what the name of the data is if the underlying information is the same. You could have dodged the headache altogether if you were in Python where a point is a tuple (x, y).

Still I'm not convinced static typing is to blame... it's more that the kind of static typing employed by Java, C++, C# is especially rigid.


Why would you want to write a function over such different data types? Yes, both are (Int, Int), but the elements of the first one are not correlated and the elements of the second one are (somewhat) correlated.

I can't think of a meaningful function that should be able to support both Point's and Size's. Except for prettyprinting or serialization maybe, but that can be handled in statically typed languages, no?


Adding a point to a size can give you a corner of a shape. You can have functions for scaling to aspect ratios, normalizing, and stretching. I don't see why any would be specific to one type or the other. In math you don't usually deal with points or sizes, you deal with vectors.

But admittedly conversion between Points and Sizes is not a pain point for most people. It's much worse when dealing with types that have the exact same purpose, just from different libraries.


Depending on your type system, Heartbleed can be a type error. As can at least (a) buffer overflows; (b) buffer size calculation; (c) format string misuse; (d) SQL injection. There are many real programming languages used in practice in which all of these are type errors. There are also more experimental and/or research-oriented type systems that protect against other vulnerabilities; for example, reliance on untrusted input in a security decision, or integer overflows.

Despite that, I do agree with the notion that type systems are a tradeoff, and some projects benefit more from strict typing than others. Dynamically typed languages certainly have their place.


Type system features are indeed tradeoffs, but whether type enforcement is done at compile time or run time need not be. It seems like the dynamic camp recognizes this best (eg Common Lisp, Dylan, Typed Racket), while beginning with a static foundation would be more suited to providing the raw efficiency that it enables.

In a sense this is already done with eg C+lua or C+python. But having them in the same language, with the exact same type system/object model, with the same syntax would be something else (hint hint ;)


> ... most of the 25 top software errors don't look like type errors to me....

First off, most of the 25 top software errors aren't actually _programming_ errors. There's nothing any programming language, whether static or dynamic, can do to help you with them. So that's really a red herring.

Secondly, look again. At least two of the top 25 _are_ actually type errors:

CWE-134 Uncontrolled format string: this goes away if you constrain the input to be a discriminated union type instead of a plain string (a sketch of this idea follows below).

CWE-190 Integer overflow or wraparound: this goes away if you use an integer type which _can't_ wrap around. In fact, they almost come out and suggest exactly this in the details:

> Use a language that does not allow this weakness to occur or provides constructs that make this weakness easier to avoid.
>
> If possible, choose a language or compiler that performs automatic bounds checking.
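Coming back to the CWE-134 point above, here is a minimal Haskell sketch (names invented) of a format as a discriminated union: once the format is data rather than a string that gets re-parsed at runtime, untrusted input can only ever be literal text, never a directive.

    -- A format is a list of pieces, not a string to be re-interpreted.
    data Piece
      = Lit String    -- literal text chosen by the programmer
      | Arg           -- a hole to be filled with runtime data

    render :: [Piece] -> [String] -> String
    render (Lit s : ps) args       = s ++ render ps args
    render (Arg   : ps) (a : rest) = a ++ render ps rest
    render _            _          = ""

    -- Untrusted input can only ever fill an Arg; it is never interpreted
    -- as "%n" or any other directive.
    greeting :: String -> String
    greeting user = render [Lit "Hello, ", Arg, Lit "!"] [user]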


Trying to discuss the merits of static type systems by using a Java-level type system as your example case is disingenuous and silly. None of the static typing advocates I know (myself included) advocate for type systems that weak. If you want to make an honest comparison, you need to compare to a strong type system like those in the ML family. Those type systems can solve many of the 25 bugs referenced.

Even stronger type systems, e.g. based on dependent types, can even solve the buffer overflow problem at type level.

Alas, no widespread language uses such a system yet.

Various problems revolving around combining tainted / untainted strings, like SQL injection, are definitely solvable by the type systems of ML, Haskell, or Scala.
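For the SQL case, a minimal newtype-based sketch in Haskell (everything here is invented for illustration, not a real library): raw user input and executable SQL get distinct types, and the only bridge between them is an escaping function.

    -- Distinct types for untrusted input and for SQL we are willing to run.
    newtype UserInput = UserInput String
    newtype SqlQuery  = SqlQuery  String

    -- The only way to move user data into a query is through 'escape'.
    escape :: UserInput -> SqlQuery
    escape (UserInput s) = SqlQuery (concatMap quote s)
      where
        quote '\'' = "''"
        quote c    = [c]

    -- Splicing a raw String (or a UserInput) into a SqlQuery is a type
    -- error; the compiler forces you through the escaping function.
    findUser :: UserInput -> SqlQuery
    findUser input =
      let SqlQuery name = escape input
      in  SqlQuery ("SELECT * FROM users WHERE name = '" ++ name ++ "'")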


I'm glad to see others find using a Java-level type system is disingenuous!

Can we just leave Haskell out of it? It really makes no sense for the Haskellers to come here and say 'but these would all be type errors in my program'. It's like when there is an argument between Bud Light and Miller Lite and you come in with 'but no seriously you have to try this new IPA I just brewed'. It's just not even the same topic.

I recently had a Professor of Computer Science state unequivocally that anyone who doesn't use static typing should have their degree revoked.

From what I have seen, static typing has one important quality: It is somewhat more clueless-management proof! It's entirely possible to have perfectly cromulent development in a dynamically typed environment. Unfortunately, over a 10 year lifespan, it's also likely that during some span of time, the project will be mismanaged and someone will do something stupid vis-a-vis putting an object of the wrong type in an instance var, temporary, or collection, thereby causing problems sometime down the line.

Does this mean that dynamic typing is no good, or should only be used for prototyping? I think not. I think it's more indicative of the poor quality of management of programming teams in the general population.


> In fact, Milner has supposedly made the claim that "well typed programs cannot go wrong". Hmmm...

That quote from Milner is from his 1978 paper "A Theory of Type Polymorphism in Programming". The actual quote is about a technical property of some type systems, usually called soundness. In his paper, "wrong" is a special value, belonging to no type, that no well-typed program evaluates to. Milner's statement that well-typed programs don't go wrong is a technical statement, not the title of an editorial piece.

Edit: here's a link to the paper:

http://www.research.ed.ac.uk/portal/files/15143545/1_s2.0_00...


The 2% figure doesn't take into account opportunity cost. Maybe a project written in a dynamic language has a well-engineered, comprehensive unit test suite that is large enough to cover for most type errors. Maybe that same project in a statically-typed language might not require so many unit tests - or maybe it might. Maybe developers working on the dynamic language version of the project spend more time reading through the code to understand how unfamiliar modules work - or maybe they don't. This sort of stuff is far more useful to know - and far more difficult to measure.

The 2% figure states little more than the fact that a project being developed by competent engineers can be debugged and tested to the point where the project becomes quite reliable and errors become rare, regardless of whether or not it is implemented using dynamic or static typing. I don't think any reasonable static typing advocate would try to argue that static typing is the only way to catch type errors.

The parser study is even less informative. It covers a project of very small scope using both a custom language (with a type system of very little expressive power) and a custom development environment, and does not take into account long-term maintainability or extensibility.

If you want evidence, keep on looking.


Types are invariants, but only some invariants can be expressed as types. The most complex invariants -- which are the most difficult to keep working as the code evolves -- generally can't be expressed as types. This is why, to me, static typing as a religion is not attractive.

On the other hand, type systems are improving. On the other other hand, no computably checkable type system will ever let us express all our invariants. So this question will never be entirely settled.


> On the other other hand, no computably checkable type system will ever let us express all our invariants. So this question will never be entirely settled.

I don't understand. Unless we can produce a type system which literally prevents all errors, type systems are useless?


Where did I say they were useless?

When I say that static typing as a religion does not appeal to me, I mean that I think dynamically-typed languages are reasonable choices for some kinds of programs, and that I do not agree with the sentiment quoted in the article that their use should be considered grounds for revocation of one's degree.

That doesn't mean I don't see the point of static typing as well.


The greatest advantage I've found for static typing is IDE support. I find it's a lot easier to change things in something like Java using Eclipse or IntelliJ IDEA, vs Python using PyCharm, just because the IDE can have deep knowledge of what the code is doing and all the places that need to change. Of course, if the language has restricted expressiveness, you might introduce other components to make up for it (e.g. XML, templating DSLs like FreeMarker, SQL); then you've basically introduced dynamic functionality and can end up with runtime errors anyway.

I miss the automated refactoring I enjoyed writing a hundred thousand lines of C#, but it's all feels. I'll never miss the time I spent fighting the type system when it got in my way. Covariance and contravariance in generics, doubly so. That's probably feels, too.

We're not talking productivity, though, but safety. Bugs that static typing would have caught are rare enough that I call them out as I make them in pairing sessions to throw a bone to the Java fans on the team. Dynamic typing is simply not causing a massive uptick in bugs.

Our maintenance problems in JavaScript come mainly from trouble following code using functional composition and callbacks. I'm not sure there's a type for "this method had better either call its callback or call an asynchronous method, and the same goes for the callback provided to that method, ad infinitum", but I'd find that handy.

