Hacker Read top | best | new | newcomments | leaders | about | bookmarklet login

> No, because in the context of a binary search you should be able to map everything to a positive integer. So if your question is "find the midpoint, so I can start searching there" your "midpoint" is not going to be a negative number in the context of binary search.

That's not true. Binary search doesn't map everything to a positive integer. You are incorrectly looking at the index offset, but the issue is determining the midpoint between two signed numbers.



sort by: page size:

It is being pedantic, but the issue is in the context of binary search:

Let us say that I ask you to find the number I am thinking about between -1000 and 1000, by repeatedly guessing a number. With each guess, I tell you whether your guess is correct, smaller or larger than my number. A binary search algorithm tries to find a value in an interval by repeating finding the midpoint, using smaller and smaller intervals. You might start with 0, then use either -500 or 500 and so forth.

You don't need negative numbers to do this.... you can easily do it with only positive numbers.


The -1 will never actually happen though -- it's -1 if the binary search algorithm itself tries to do an out-of-bounds array access. If that happens, your binary search had a bug. So in this case possible bugs are being obscured, rather than surfaced. This is bad.

Integer division by 0 defining to 0 is interesting and certainly convenient; arguably you will accidentally end up obscuring bugs, when you didn't intend to divide by 0.

The broader point I'm making is it doesn't seem great to me to represent what are basically bugs with Maybe.


Other comments point out that can overflow for some values.

However if you're doing mid-point in something like binary search where you already know y >= x AND x >= 0, then x + (y - x) / 2 is indeed a fine choice. It's a good one to remember.


How does that avoid the problem? Assuming you used a signed integer for size and the list is large enough, there is a sign roll over on size. If size becomes less than -2 then low + size/2 is less than low (on the first pass of a binary search you now have an invalid iterator for mid). Assuming you used unsigned integers size can still not be the correct value if a rollover has occurred though this time mid will always be valid.

Good call. As per link in http://news.ycombinator.com/item?id=1277701 (elsewhere in this page):

the binary search program that Bentley proved correct and subsequently tested in Chapter 5 of Programming Pearls contains a bug [...] it fails if the sum of low and high is greater than the maximum positive int


Binary search is one of those algorithms which seem really easy, but in practice is hard to get exactly correct. See https://en.wikipedia.org/wiki/Binary_search_algorithm#Implem... .

One of the classic problems is numeric overflow, when people compute "(lowerBound + upperBound) / 2. This implementation avoids that problem by doing "lowerBound + (upperBound — lowerBound) / 2)". However, as the code is in Python, where integers are arbitrary sized, this isn't actually a problem. I think this was copied from a C implementation; the "cast" also reveals a C background as in Python this is a conversion.

It also appears there is a mistake in this code, in the part which says

        elif x < number:
             lowerBound += 1
It should likely be lowerBound = middleIndex. As it stands, something like binarySearch(range(9), 8) will test items [4], [5], [5], [6], [6], [7], [7], before finding the value 8 at position [8].

This is the sort of error which is hard to identify without careful testing, as the code will generate the correct answer even with the wrong algorithm.

P.S. 4 submissions to the same essay during 2 days seems like excessive promotion.


(int32_t)3975308642 == -319658654

Overflow makes intuitive arithmetic do unusual things.

Similarly, using a+b/2 for binary search midpoints is wrong. http://googleresearch.blogspot.com/2006/06/extra-extra-read-...


Ah. I was confused because of this: "the index — a one-byte integer — could never exceed 255 by definition." If it can hold the value 255, it couldn't be a signed integer, so there would be no room for negative numbers. Though I suppose this must have been their mistake.

It’s not adding negative indexes, a negative number will index from the end.

So [0, 1, 2, 3].at(-2) === 2


You can't search for -1 either. Are they going to switch to negate_number(ONE, true) for people who haven't learned that syntax?

(The "true" causes the value to be returned rather than printed, obviously.)


Also, this one:

  a + (b - a) / 2
has issues that if b is large and a is negative but "large" (in magnitude), then b - a can overflow.

I mention that one, since that was how I was taught to do it for binary search, to prevent overflow. (Binary search having the advantage that it can't overflow, b/c a and b represent indexes, and are thus non-negative, so this works there but breaks in the more general case discussed by the paper.)


I just learned that Java has an unsigned right shift operator >>> from an article cited as a reference in that Wikipedia section. The article has some more details on the subtleties of binary search. For instance, the 20 year old bug in an algorithm proven to be correct was down to integer ranges:

  int mid = (low + high)/2;
breaks (obviously in retrospect, am I missing something?) for

  low + high > Integer.MAX_INTEGER
It can be replaced by the elegant (but to me, somewhat oblique)

  int mid = (low + high) >>> 1;
http://googleresearch.blogspot.de/2006/06/extra-extra-read-a...

Also from that page, here's the link to the bug in Sun's tracker, priority 2-High, evaluation: "Can't even compute average of two ints" is pretty embarrassing.

http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=5045582


In defense of the author re: 2, since the application domain seems to be manipulating indices for a binary tree represented in an array, the integer will always be non-negative.

On top of that, this 'fix' from the article, even if it were corrected to use size_t instead of unsigned int...

  In C and C++ (where you don't have the >>> operator), you can do this:
  6:             mid = ((unsigned int)low + (unsigned int)high)) >> 1;
Has the same bug as the original snippet, just in unsigned space. Now, granted, binary searching a >2GB byte array in a 32-bit process is an unlikely use case... but it is technically feasible!

I agree with you, but it doesn't sound like it would have helped things in this domain, as a negative index into a string (if they used signed ints and subtracted larger values) would have produced (according to the spec, undefined) values that were likely just as wrong.

> My code also doesn't handle negative integers correctly, but I'm pretty sure there is a way to do it.

It's pretty easy to show by induction that you can't handle negative numbers in the general case. Consider: if memory locations 0 and 1 are both negative, so is their sum. But the only operations available to you are setting a value to 0, and incrementing it. You can't produce a negative number that way.


Conveniently, we are not shown the type of n or its valid input range: if it is a int16_t or we know it to be upper bounded by 1<<30, that's proper vanilla binary search.

-1 seems ok here. The algo is returning the index at which the array contains the search value; returning -1 is a pretty normal way of indicating that there is no such index. She could of course change the search function to return a Maybe Int, then return Nothing in the case where the array does not contain it. Either way makes sense to me. The -1 value is actually warming on me more and more because if the returned index is later used to access the array, it will definitely return Nothing. But using Maybe.andThen to monadically deal with a Maybe wrapped return value that would also be cool.

Float division by zero works the way IEEE floats work: it becomes +/- Infinity. For Integers it actually becomes zero. This makes sense because to me you probably should have handled the case where the denominator was 0 before you got to the point of doing division. Like are you taking the mean of an empty list? What's going on?


No, you get room for one more negative int. x = x < 0 ? -x : x; is a common buggy abs(), e.g. when printing an integer. One should test if it's negative, perhaps print the '-', and then ensure it's negative for the rest of the per-digit code.
next

Legal | privacy