There's one important use case for Dijkstra's alternative (c), that is, an interval closed at both ends. That is when your set has a maximum element and you want to be able to express an interval including the maximum. This is, of course, important in any programming language whose basic integer types have a bounded range. For instance, the following loop in C never halts:
for(uint8_t i = 0; i < 256; i++) { ... }
What's even worse, a loop like the following has undefined behavior, because it eventually overflows a signed integer, and signed overflow is not defined in C:
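for(int i = 0; i <= INT_MAX; i++) { ... }
(INT_MAX is from limits.h; the condition is always true, so the increment must eventually overflow.)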
These would be peculiarities of the C programming language, not issues with typed integers or with expressing ranges. Other, actually strictly typed languages will complain that 256 doesn't fit in a byte, or, if 256 is a wider integer, that implicit comparisons between differently sized integer types are not allowed.
Yes, but my point was that the solution is to use a closed range instead; in this case, use `i <= 255` as the loop condition. Another solution, of course, is to widen the type of your induction variable.
This is arguably a limitation of C-style for loops.
For example, Python:
for i in range(0, 256):
Rust:
for i in 0..256 {
and many other languages have similar things.
Proper ranges also mean that the compiler never needs to do complicated reasoning that may depend on signed integer overflow having undefined behavior in order to prove that the loop has a finite number of iterations or that `i` doesn't wrap around.
Yes, I considered mentioning Rust as a more modern example. But even in Rust, ranges such as `0..max+1` are awkward enough that a closed-range syntax was added to the language. With a range like `0u8..=255u8` you can also ensure that your induction variable can be passed directly to things expecting a u8, without a cast that might hide an accidental overflow.
I expect most counting for loops are one of two types: 'count from A to B inclusive' and 'count N numbers starting from M'.
We commonly have syntax for the first, or sometimes some variant of it designed to avoid the `-1` in the common case, at the cost of complicating some other cases. It's also common to have some rather over-general iterating mechanism called "for" that covers both, or tries to.
But I don't think I've ever seen anything special for the second, for some reason. And the iteration counter and the loop value could have different types - so you could count 128 int8_t values if you wanted. Though in most cases you'd just use it as a replacement for "for(i=m;i<m+n;++i) {do stuff with i}" or "for(i=0;i<n;++i) {do stuff with m+i}".
(One possible exception: the ancient 1980s computer I used as a boy had two syntaxes for saving blocks of memory. You could enter the range as "A B" (save from A to B inclusive), or as "M+N" (save N bytes starting from M).)
In fact, there's no way to make a C for loop run 256 times using only an 8-bit counter, because there aren't enough states. Looping N times requires N+1 evaluations of the termination condition, and with only 256 states of the index variable you can't have more than 255 iterations.
You can do it with a do { } while loop, which only needs N evaluations of the termination condition for N iterations.
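Something like this, for instance -- it relies only on unsigned wraparound, which is well-defined:
uint8_t i = 0;
do {
    /* body runs for i = 0 through 255: 256 iterations */
} while (++i != 0);  /* 255 wraps to 0 and ends the loop */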
Wow. That is the best handwriting I've ever seen. Makes sense if you want people to read what you've written. I was baffled by the fact that my mathematics peers in uni
had such illegible handwriting, like they were doing it out of spite. I found I was correct more often when I took the time to write legibly, though I concede I probably lost a letter grade on average across my exams because I ran out of time so often. Maybe that explains it.
Is that his real handwriting??? That's not a fancy font?
Wow, it really IS a treat. I've always heard people say how much a person's handwriting can tell you about that person. Mine is a mangled mess, and I'm pretty disorganized in most areas of my life.
No it doesn't. Let's define the natural numbers to start at 42. Then representing the lower bound of the sequence 42, 43, 44 as 41 < x is unnatural, since 41 is not a natural number.
It's clear that the author structured his essay carefully to avoid this exact assumption. He explicitly avoids considering whether the natural numbers start at 0 or at 1 until after he has chosen "a)".
Starting at 1 is more intuitive, and therefore IMO better.
Regarding some of the advantages of zero-based:
- Indexing backwards from the "end". Your language can always add an `end` keyword like Matlab does, and this stops being an advantage.
- Indexing cyclically using modular arithmetic. Yes, this is an advantage (see the sketch below), albeit a rare one for me.
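For instance, stepping through a ring of n slots (a sketch, with i ranging over the valid indices):
int next0 = (i + 1) % n;   // 0-based: indices 0..n-1
int next1 = (i % n) + 1;   // 1-based: indices 1..n, with an extra shuffle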
I commit fewer off-by-one errors with 1-based, and I don't have to double-check as much -- so on balance, I prefer it.
[edit]
The amount of karma this comment is getting is undergoing something like Brownian motion.
Matlab uses 1-based indexing. I've used it extensively and it was definitely a bad move. One particularly annoying example is splitting an array into blocks, e.g.
for i = 1:numel(s)
    foo(s(i))
end
That is fine, but what if you want to process in blocks of N? Now you have to do:
for i = 1:(numel(s)/N)
    foo(s((i-1)*N + (1:N)))
end
Ugh no thanks. This is as simple as it gets too. For more complex array manipulation... enjoy adding and subtracting your 1's. 0-based indexing is just way more natural.
Yes, probably reshape would be the natural MATLAB way to do it. Under the hood it doesn't change the memory allocation of the mxArray, just the metadata -- so the reshape takes virtually no time. But if you wanted to do the loop like that, I think you could simplify the indexing to `1:N:end`, btw.
Also there is a blockproc function that will do it for you -- you just pass in s, the chunk size, and a function handle -- but it's in the Image Processing Toolbox.
I can agree that's intuitive, but I'd like to point out that's not quite the same thing as numbering a sequence.
If we have a sequence a_1, a_2, a_3, ... we can talk about a_3 by calling it the third term. If we have a sequence a_0, a_1, a_2, ..., the third term is actually a_2.
Whether or not we should index starting at 0 or 1 is probably dependent not only on intuition, but the application at hand. For most analytic purposes it's generally more useful to talk about the nth term, and we don't need to know a specific number to reason about the distance between any two indices. For other purposes, such as programmatic ones, it is useful to know e.g. the traversal distance between two items in a list.
In my opinion it's best to first consider whether you're working in more of a mathematical or programmatic context, and then secondarily who will have to read it later on.
You don't have to talk about memory locations, or computers at all, for the intuition to work. `a_2` is two spaces away from the beginning of the list.
The fact that our ordinal numbers are closely connected with the off-by-one cardinal numbers (e.g., "third", meaning the element of a sequence in position 2, is closely etymologically related to the word "three") is an unfortunate defect of language.
> we can talk about a_3 by calling it the third term
It's still the "third" term. People commonly refer to the "zeroeth" term in a list as the "first" term, the "first" term as the "second", and so on. Admittedly, the usefulness of "zeroeth" depends on how often you think about the mechanics of array traversal, and that's probably not often if you don't program computers.
In other words: You noticed that array offset operations have an additive structure, and as the neutral element of addition is the zero, the only sane way to do indexing is to start at 0.
Having taught introductory programming to university students, I would say that it's subjective for the individual but still generally true that 1-based indexing is more intuitive.
Intuition is the name we give to what we learned early.
0 and 1 are both perfectly fine starting points. They both have advantages and disadvantages.
0 is more common since a lot of languages use C as a starting point in some fashion. C chose 0 because it makes array indexing plain syntactic sugar for sequential memory access via pointers.
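Concretely, a[i] is defined to mean *(a + i), so the first element has to sit at offset 0:
int a[4] = {10, 20, 30, 40};
/* a[0] == *(a + 0) == *a, and a[2] == *(a + 2) */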
> Indexing backwards from the "end". Your language can always add an `end` keyword like Matlab does, and this stops being an advantage.
Having used Matlab a bunch, in practice this sucks, because “end” is treated as this weird special case in the language grammar, and many reasonable and convenient bits of syntax that you'd expect to work turn out not to. Folks writing complicated Matlab projects end up needing to work around it, and the workarounds are brittle and confusing.
Using negative integers is a whole lot easier to reason about and work with.
Everyone learns in grade school how to do arithmetic with negative integers (adopted by European mathematicians in the 17th century). Even Matlab experts don’t always understand Matlab ‘end’ arithmetic.
Do you have any examples? In all my years of teaching and working with Matlab, I have literally never seen any student or code run into such issues, so I am very curious.
Only when counting. Why should 'numbering' be the same as 'counting'? I don't see why that would lead to a generally more intuitive scheme when most of what you calculate with 'numbering' is offsets and indexes. Offsets begin at zero. Indexes could be by ordinal and not by offset, I suppose, but I don't see how that could be a benefit in typical index calculations--mostly they're just offsets in my world, distances between indexes.
I rarely ever refer to specific, literal indexes, so I can't say that I have much of an opinion on them, but that does seem to be the area where indexing by ordinal would make the most sense. Perhaps this is the main operation that dominates numerical calculations? Rails even adds English ordinal methods (e.g. first, second, third, fourth, fifth) for this purpose.
The discussion of how to denote the bounds of a sequence may be missing one consideration: intuition from language. When speaking, we articulate a range by stating the first and last permissible value, e.g. "Pick a number between one and ten". My point is that understanding the length of the sequence is rarely as important as quickly grasping its bounds.
"In corporate religions as in others, the heretic must be cast out not because of the probability that he is wrong but because of the possibility that he is right." +1
In another thread last week, someone mentioned that 1-based indexing is more common in numerical analysis, so it has appeal for languages that are "math focused". Anyone have insight on that?
As far as I can recall, I've seen both 0 and 1-based indexing in statistics.
> To denote the subsequence of natural numbers 2, 3, ..., 12 without the pernicious three dots
What's so pernicious about them? I don't see it. It seems like a clear and intuitive way to communicate a sequence to me. I even wrote a little range generator in JS to explore parsing declarations like that. https://github.com/chrisbroski/iterize It seems to work fine.
I _think_ he calls them pernicious because of the ambiguity.
2,3,4,12 is also a "subsequence of natural numbers", but probably not what was meant by 2,3,...,12. You might counter that ... means the complete subsequence, but what do you mean by "complete"?
If I write 2,4,6,8,...,24, you probably want the ... to mean 10,12,14,...[1], not 9,10,11,12,...
Basically by the time you make ... precise, you are better off just writing the mathematical notation.
I always like to point to the Julia package TwoBasedIndexing (https://github.com/simonster/TwoBasedIndexing.jl/), which IMHO provides the very best way to index an array. Yes, Julia defaults to 1-based indexing, and allows zero-based indexing (https://github.com/JuliaArrays/OffsetArrays.jl) or indeed whatever sort of indexing you want, yet I think the world is going to eventually come around to THE ONE TRUE indexing scheme of being two-based.
I remember reading a funny quote that the obvious compromise between 1- and 0-based indexing was 0.5, and that it was dismissed without proper consideration. Unfortunately, I do not remember where I read that or who wrote/said it. :(
When I was learning QBasic some billion years ago I recall that I was happy with arrays generally starting at 1 (even though this is user specified in Basic). When I learned C I initially wasn't thrilled about re-learning arrays, but eventually grew to only like 0-based arrays.
I'm thinking this is like people who drink coffee will cite studies saying it is good for you, and people who hate it will cite opposite studies.
I had been comfortably programming in C and python for years and found 0-based indexing to be perfectly sensible. For the last couple of years I've been programming a lot in Julia. Initially I was horrified by the idea of indexing from 1 but I got the hang of it remarkably fast. Now I feel it's totally natural either way, depending what language I'm in.
I think it's only ever been a matter of preference as you say. And Julia was otherwise such a pleasure to write code in that it was easy for me to pay the small price to learn that.
Maybe indexing should be considered harmful. We're living in the age of really smart compilers anyways. It's time to move on. Elixir sets the example - do everything with TCR, provide an Enumerable interface, and beyond that use map/reduce.
There are cases where zero-based indexing is more natural than one-based indexing. An example is naming the centuries: it would be nicer if this were called the twentiethan century instead of the twenty-first. When people talk about the fifteenth century, I have to think for a bit to understand what they mean. As such, I propose adding a suffix "an" to the end of any ordinal number to indicate a zero-based convention is being used -- this century would then be the twentiethan century as well as the twenty-first. A bit like degrees vs radians.
Programmers seem in general to know that 0-based works best, but mathematicians (and mathematical and statistical programming languages) seem to prefer 1-based, and I don't understand why that is. Isn't mathematics also easier with 0-based?
E.g. if you divide a matrix of 100 columns into 20 vertical bands of width 5 each.
Mathematicians use 1-based indexing for both the element index and the band index, so there band n would start at coordinate "(n - 1) * 100 / 20 + 1"
For a programmer, band n would start at "n * 100 / 20"
That's two correction terms that the math convention needs and the programmers' convention doesn't!
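In code, with the band width written out (a sketch; n is the band index):
int w = 100 / 20;               /* band width, = 5 */
int start0 = n * w;             /* 0-based convention */
int start1 = (n - 1) * w + 1;   /* 1-based convention: two corrections */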
I had to use Matlab for microphone arrays once and it was full of + 1's and - 1's everywhere due to that.
Another example of mathematics and off-by-one errors: a polynomial. They call it "degree n" if the highest power is n, except I see n+1 coefficients in there and need to allocate an (n+1)-sized array to contain them, so why not call its degree the number of terms, including the "x^0" one? The powers in the polynomial are already hinting at 0-based indexing in this case.
Mathematicians, please use coordinate "0,0" for the top left element of a matrix :)
Interesting that you mention polynomials, as the constant term in them is generally given the subscript 0. In fact, it sounds like you want some form of 1-based indexing, where constant polynomials are degree 1, linear degree 2, etc.
So, random question, because I know I would get this wrong in an interview: would a 0-based indexing system need fewer assembly language instructions to calculate a memory offset than a 1-based system?
Naively, yes. In practice, you could treat them exactly the same and either decrement the physical pointer by 1 unit (so an index of 0 would underflow) or leave the first slot unused or use it for metadata (e.g. array length).
There might be some other tricks you can play depending on your instruction set, but nothing comes to mind.
Nope, not in most architectures. In a 0-based indexing system, the base pointer points to the first element of the array. In a 1-based indexing system, the base pointer points to the slot in memory that is one spot behind the first element of the array. This is how Julia (and Ada) do arbitrary indexing.
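You can see the trick directly in C (a classic sketch, though note that forming buf - 1 is formally out of bounds):
double *buf = (double *)malloc(n * sizeof *buf);
double *a = buf - 1;   /* a[1] ... a[n] now alias buf[0] ... buf[n-1] */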
Stan Kelly-Bootle's famous bon mot on the subject: "Should array indices start at 0 or 1? My compromise of 0.5 was rejected without, I thought, proper consideration."
Once upon a time when I was young and foolish, I was doing something [1] in C++, which has 0-based arrays. There were a lot of places in the code where 1-based arrays would have made the code quite a bit clearer, but a lot of places where they would have made it much less clear.
I wanted the best of both worlds, and so overloaded the () operator so that if arr is an array, then arr(i) = arr[i-1].
This worked reasonably well, especially for arrays where it was always clearer to go 1-based, or arrays where it was always clearer to go 0-based, so that the array was always accessed with the same operator.
The only places it was questionable whether or not it made the code clearer was where I used both in the same section of code. E.g., something like b = a[i] + a(i) is arguably less clear than b = a[i] + a[i-1] or b = a(i+1) + a(i).
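The wrapper itself is tiny; a sketch from memory (the names are made up, not the original code):
#include <cstddef>
#include <vector>

template <typename T>
struct Arr {
    std::vector<T> v;
    T& operator[](std::size_t i) { return v[i]; }      // 0-based: first element is arr[0]
    T& operator()(std::size_t i) { return v[i - 1]; }  // 1-based: first element is arr(1)
};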
[1] I think it was implementing an arbitrary precision integer library using algorithms from TAOCP, but I don't remember for sure.