Base-1 makes sense in the sciences, where you have Fortran, MATLAB, R, and Mathematica. If anything, before Python's rise in popularity base-0 was the strange choice. Fortunately, you can use arbitrary indices with OffsetArrays.
Anything other than base 0 would be unacceptable in Python; people would go crazy. Personally, this is the biggest reason I can't use R: my brain doesn't work with position-based indexes; I'm just hardwired to think in offsets.
This is something I don't fundamentally understand. As someone who works a lot with MATLAB I'm very used to and like 1-based indexing. But when I use C or Python, 0-based indexing is not something I complain about or hold against the language. It's just the way things are.
Maybe if you don't think of it in terms of a different index basis and instead you think of it as indexing vs. offsets then it becomes easier to switch between the two?
All serious numerical languages have that. It's more natural.
0-based indexing is only good for calculating memory offsets, nothing else. For example, in Go `vec[a:b]` selects indices `a` to `b-1`, which is purely because it's more convenient with 0-indexing. This `b-1` is hugely confusing and a big gotcha for the layman.
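For anyone who hasn't run into the `b-1` convention, here is a minimal Python sketch of it (Python slices are half-open in the same way). The list `vec` and the bounds `a` and `b` are made up for illustration.

```python
# Half-open slicing: vec[a:b] covers indices a, a+1, ..., b-1.
vec = [10, 20, 30, 40, 50, 60]
a, b = 1, 4

piece = vec[a:b]            # elements at indices 1, 2, 3
assert len(piece) == b - a  # the length is b - a, with no +1

# Adjacent slices share an endpoint and tile the list with no gap or overlap.
left, right = vec[:b], vec[b:]
assert left + right == vec
```

Whether that trade is worth the surprise at index `b` is exactly what is being argued here.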
I'll start by saying that I greatly prefer 0-based, and have used both 0- and 1-based indexing, but the choice is largely arbitrary.
0 makes sense as the '0-th offset' when thinking from a pointer perspective, but I often find when teaching, that 1-based comes more naturally for many students (the 'first' item).
You mention mathematical or scientific work...but I often/mainly see enumerations (such as weights x_1, x_2, ... x_n or SUM 1 to N) start with 1, so for these 1-based can be a more natural/direct translation of mathematical notation to code.
Every language for scientific computation uses 1-based indexes, to follow the tradition of using them in formulas. FORTRAN, APL, Julia, R, MATLAB, GNU Octave, etc.
I remember that (some versions of) BASIC back in the day had an option to switch between 1-based and 0-based indexing, to appeal to both Fortran and C folks.
Programmers seem in general to know that 0-based works best, but mathematicians (and mathematical and statistical programming languages) seem to prefer 1-based, and I don't understand why that is? Isn't mathematics also easier with 0-based?
E.g. if you divide a matrix of 100 columns into 20 vertical bands of width 5 each.
Mathematicians use 1-based indexing for both the element index and the band index, so for them band n would start at coordinate "(n - 1) * 100 / 20 + 1".
For a programmer, band n would start at "n * 100 / 20".
That's two correction terms that you need to add in math which programmers don't!
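A small Python sketch of the band arithmetic above, with both conventions side by side. The function names are invented for the comparison; the 100 columns and 20 bands come from the example.

```python
COLUMNS = 100
BANDS = 20
WIDTH = COLUMNS // BANDS  # 5 columns per band

def band_start_zero_based(n):
    """First column of band n when bands and columns both count from 0."""
    return n * WIDTH

def band_start_one_based(n):
    """First column of band n when bands and columns both count from 1."""
    return (n - 1) * WIDTH + 1

zero_starts = [band_start_zero_based(n) for n in range(BANDS)]
one_starts = [band_start_one_based(n) for n in range(1, BANDS + 1)]

# Both conventions pick out the same columns; the labels differ by one.
assert [s + 1 for s in zero_starts] == one_starts
```

The `- 1` and `+ 1` in the second function are the two correction terms.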
I had to use Matlab for microphone arrays once and it was full of + 1's and - 1's everywhere due to that.
Another example of mathematics and off-by-one errors: a polynomial. They call it "degree n" if the highest power is n, yet there are n+1 coefficients in there, and I need to allocate an array of size n+1 to hold them. So why not define the degree as the number of terms, including the "x^0" one? The powers themselves in the polynomial are already hinting at 0-based indexing in this case.
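To make the polynomial point concrete, here is a rough Python sketch where the list index is the power of x. The helper `evaluate` and the sample coefficients are illustrative only.

```python
def evaluate(coeffs, x):
    """Evaluate sum(coeffs[k] * x**k) using Horner's rule."""
    result = 0
    for c in reversed(coeffs):  # start from the highest power
        result = result * x + c
    return result

# 2 + 0*x + 3*x^2: degree 2, stored in a list of length 3.
coeffs = [2, 0, 3]
degree = len(coeffs) - 1
value = evaluate(coeffs, 2)  # 2 + 0*2 + 3*4
```

Here `coeffs[k]` is the coefficient of x^k, so the constant term sits at index 0 without any shift.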
Mathematicians, please use coordinate "0,0" for the top left element of a matrix :)
I actually prefer the 0-based indexing of Python. Systems code (C/C++ etc.) already uses 0-based indexing, so it is nice that when you do data science the convention stays the same.
I know Visual Basic gets a bad rap, but as a learning language it had an interesting feature, `Option Base`. Putting an `Option Base` statement in a module changed how array indexing worked. It defaulted to 0, but for some applications (and also, when you're first learning), 1 can be convenient.
Of course, there are problems with this in a professional setting, such as how do you enforce uniformity across modules, and what happens if you copy code from one module that's Base 0 to one that's Base 1 and vice versa. But when I was first learning how to program, it was helpful to me to have a language that allowed for some choice.
In the meantime, 23 years of programming have led me to believe that index base 0 makes sense. For many applications it's moot, because we should be using higher level functions (like map and reduce) for processing lists. In every other application (such as working with data on a grid), dealing with offsets does make things easier.
Perhaps the convention is arbitrary. But, lots of industries have arbitrary conventions that we all agree on just to aid communication, and I disagree with the original author that the term "groupthink" applies in this situation.
It lets you translate freshman level math problems with the same indices. As soon as you deal with Fourier transforms, Vandermonde matrices, polynomials, or pretty much anything where the array index is related to the value in the array, zeros make more sense. People like 1-based because they're familiar with it, and that's fine, but pretending that it makes more sense because they're familiar with some math that uses 1-based is silly.
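As one illustration of an index being tied to the value, here is a minimal pure-Python sketch of a Vandermonde matrix. The function name and sample points are made up, and a real project would more likely use a numerical library.

```python
def vandermonde(xs):
    """Return V with V[i][j] = xs[i] ** j for columns j = 0 .. len(xs) - 1."""
    n = len(xs)
    return [[x ** j for j in range(n)] for x in xs]

V = vandermonde([2, 3, 5])
# With 1-based columns the same entry would be written xs[i] ** (j - 1).
```

The column index doubles as the exponent, which is the sense in which zero "makes more sense" for this kind of matrix.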
The problem is that libraries which assume 0-base break when you have a 1-based array. And vice versa. Trying to combine libraries with different conventions becomes impossible.
Therefore changing the base leads to more bugs than either base alone.
That said, the more you can just use a foreach to not worry about the index at all, the better.
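A short Python sketch of the "don't touch the index" style. The `readings` list is invented for the example.

```python
readings = [3.5, 4.0, 2.5]

# No index at all: iterate over the values directly.
total = 0.0
for r in readings:
    total += r

# When a position is only needed for display, enumerate lets the caller
# choose the base without changing how the list is stored.
labels = [f"{i}: {r}" for i, r in enumerate(readings, start=1)]
```

The underlying list is still 0-based; only the labels shown to a human start at 1.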
Of 0-based and 1-based, the only data point I have is a side comment of Dijkstra's that the language Mesa allowed both, and found that 0-based arrays lead to the fewest bugs in practice. I'd love better data on that, but this is a good reason to prefer 0-based.
That said, I can work with either. But Python uses 0-based and plpgsql uses 1-based. Switching back and forth gets... annoying.
Can you give me an example? I did all sorts of numerical indexing programming, from Fortran, MATLAB, Python/NumPy and R, and I am definitely more comfortable with 0-based indexing.
As someone who does quite a lot of image and signal processing the fact that their arrays are 1-based is a complete deal breaker. The first few years of my career I worked with Matlab and I hated 1-based arrays with a passion. I think I never encountered a situation where the Matlab way makes stuff easier, almost always the 0-based index is the more natural choice.
After I switched to Python/C I can say that I never want to work with 1-index languages again.
Historically 0 based is for low level languages. 0 based makes sense for C.
My issue is with higher-level languages like Python, Java, and C#: there are plenty of things that are just complicated, and indexing arrays from 0 is one of them. Doing data science or statistics just makes it obvious that you have two different sets of numbers. 1 doesn't mean the same thing in every instance in your program, and the functional programming side of me hates that.
In mathematics, a matrix doesn't have an 'offset' or a 'starting point.' I think that perceiving matrices in those terms is an artifact of thinking of matrices as sitting in an address space, where the elements of your matrix are part of a larger span that can contain other data.
A matrix is a collection of elements and nothing more. Indexing starts at 1 because that's the first element in your matrix, and numbering the first element 1 makes sense. Numbering from 0 makes sense when it represents an offset into something, but a matrix isn't that. 0-based indexing just always feels to me like letting the implementation details leak out. (I don't feel this way when an array actually represents a chunk of memory, rather than a math object.)
The proliferation of +1 and -1 depends on the application. Some work better with 1-indexed, some work better with 0-indexed. Personally I get annoyed at having to use `len-1` too often when working with 0-indexed arrays. This is why some languages like Julia (and FORTRAN apparently?) let you choose your index-base, which... has trade-offs.