For someone quite interested in these kind of problems, what's the current best practice for solving these kind of problems? Not on this scale but still large enough that Google's built-in api for solving it (only 25 points allowed) can't do it.
I did something similar for a client about 4 years ago and used simulated annealing to reasonable success. I would be interested to know if there are better options now!
99% of their code will be ordinary CRUD plumbing, dealing with orders and integrating with other systems. The algorithmic code is the fun part, and they've had years to tune it.
This is just self-congratulatory PR, trying to persuade potential investors that's they're a hard tech company.
Ocado is an interesting case. Their public line is that they're a technology company that happened to end up delivering groceries. They invest a lot in automation and robotics - definitely a hard tech company:
This is not a vapourware start-up, they've been leading the grocery delivery business in the UK for years. That said, they have the same problem as Amazon in that they invest so much into RnD that they're not terribly profitable (about £10M on a $1Bn turnover).
That said, in this particular example I don't know how much their solution differs from Royal Mail (which uses fixed routes) or people like DHL, Fedex, et al. Routing isn't exactly a new problem and courier companies presumably have always had a lot of people working in operations research.
The missus uses Ocado, they really have their shit together in the UK on the logistics side (a field I work in but not groceries), the deliveries are on time, the process is slick and professional and the drivers are polite and helpful.
Is there any chance I could send you an email or chat with you further about this? I've been using the ortools python package and while it works, I've found support hard to find and would just like to chat with someone about it. I'm an academic and would love to talk if you have the time.
I think it depends if you're interested in knowing how to solve these problems, or simply in solving them. If the latter, there's a bunch of commercial solvers out there. I won't name any for fear of an adverse reaction :0) but a trawl through the first page of Google results for 'optimization solvers' would probably work.
more about the how is what interests me. I just find it fascinating. I've tried nearest neighbour, ant colony and simmulated annealing thus far (Or read about them and tweaked some code)
The article is quite vague on the problem formulation, other than being somewhat equivalent to the TSP, so I'm just going to speculate a bit onto how they solve it. The TSP can be formulated as an Integer Linear Program (http://examples.gurobi.com/traveling-salesman-problem/). This type of problem is very well studied, and several free and commercial problem formulation tools and solvers are available, some that scale to thousands of variables.
They also probably decouple their problem into disjoint regions before solving it, to reduce the dimensionality of the problem(s).
The article says that their solution is in house, and they don't seem to be using integer linear programming. My guess is that they start with a random feasible solution using a fast greedy heuristic, then iteratively improve it using a simulated annealing process.
There are a few commercial options available, with different levels of complexity, performance and price. Depends on whether you need to support multiple vehicles with several deliveries on each, specific delivery time windows, and things like that.
For example, Oracle sell a product called 'Real Time Scheduler' [1]. Google finds me a bunch of other products too, but I haven't looked at them in depth.
The off-the-shelf software can deal with tens to low hundreds of orders per solution, - which can scale reasonably well if you're willing to divide your service area with boundaries you can't optimise across.
Ocado built a custom system which can support more orders with fewer boundaries; this offers better routing efficiency, which is important as delivery efficiency is a key cost driver. If your business does something like, say, washing machine repair your time-per-customer might be dominated by repair time rather than travel time; in that case you might be able to operate a nationwide business on off-the-shelf software.
The underlying algorithm in Ocado's system is based on simulated annealing [2] - although those precise words don't seem to have made it into the article!
This is a similar problem to figuring out search results when you try to find an airline reservation. It's a problem people have been working on for almost 50 years and it still works, although there are a lot of likely shortcuts. I also remember we had some kind of system that tried to optimize pricing and schedules for airlines which which ran for days sometimes (this was in 2000 or so). I think modern hardware is much better and allow algorithms that are much more capable. I wonder how you test something this complex that is never actually done; how do you know its working optimally (or at least acceptably)?
My buddy and I gave a talk about how we solve the same problem at RedMart. We're not as large as Ocado (yet?), but this video has more technical content: https://www.youtube.com/watch?v=6fmFb3r7Tmg
My use case is most often in the kitchen, hands covered in flour and realise I'm running out of product X. At the moment I bark at my Echo to add something to its internal "shopping list" feature, but I'd rather those commands hooked directly up to Ocado!
There have been internal discussions of providing an Alexa skill (or something broader to support other home automation/digital assistant products).
I believe it was a 20% time project, so I don't know how far it's got. I'll ask the person who was working on it.
We have an API for our mobile apps, but due to the constraints of backwards compatibility it's not especially elegant, which is why it's not publicly documented. I'll ask internally about making it more available, but as I'm sure you appreciate, it's difficult to show there's customer demand for that kind of thing :)
The JIT compiler works pretty well, and with tools like JITWatch, Honest Profiler and the -XX:+LogCompilation -XX:+PrintAssembly options you can understand how the hot paths are running in a lot of detail.
For our current operation, the optimisers' servers run Ubuntu on bare metal in our own data centres. Of course, like in any large organisation we make plenty of use of virtualization and EC2 - just not for our optimisers at this point in time.
Our current design isn't well suited to adaption to a GPU, because it branches a lot and the memory accesses aren't strided evenly. So we couldn't just plug our current code into a java-to-cuda compiler; we'd need to change the design.
Thank you for the answer. I was not aware that these kind of considerations impact the cpu vs gpu debate. I was under the impression that most cpu heavy computations can be ported to the gpu.
I take that it's a 'kind' of decision tree or random forest with gradient boosting, so usually it 'can' run faster on a CPU if optimized than on a GPU (if I'm not mistaken). That's at least what I get from BigML's offering.
It is not decision tree or any machine learning algorithm.
It is a local search algorithm that probably has simulated annealing or tabu search as a metaheuristic.
Although, they probably segment the deliveries to some common starting points and the problem size is reduced significantly - maybe to around thousand orders per starting point.
Research can easily optimize thousands of deliveries very effectively.
Do you route based on anything except distance? Obviously you can't deliver to people at 2AM, and drivers want to get home by a reasonable hour, but what about other stuff like weather and road traffic conditions, priority or late parcels, etc? If so, how do you weight these competing and probabilistic concerns?
We take into account long-term trends in weather and traffic conditions like road works, school holidays, and recurring traffic patterns.
The drivers' sat navs also (IIRC) have live traffic, which can send them down different _roads_ in response to things like accidents causing traffic jams.
Of course, there are limits to our ability to respond quickly to changes in traffic conditions for simple logistical reasons - many drivers' vans are loaded once at the start of an 8-hour shift, and we can't predict traffic accidents 8 hours before they happen!
Certain other competing concerns are balanced with a 'cost function' which has been tuned empirically.
Took a couple classes in graph theory in college which was interesting because of the complexity involved in solving for optimal routes. We started with brute-force search and moved on to other shortest path problem algorithms like the ones mentioned here:
reply