Tooting my own horn here, but I'm working on a book covering the Go Standard Library. Still in progress, and not at the sync package yet, but it's coming along. Check it out if you feel inclined.
You know what'd be cool? To take arbitrary code in a language, pattern match on the implementation in that language of each std lib function, and actively recommend substitutes for duplicated code.
That would be very cool. I wonder how hard it would be, though. At least in Go I know that the standard library contains lots of duplicated code, primarily so that packages that should be small don't pull in larger ones as dependencies. I think the time package's String() functions reimplement fmt functionality, for example, since fmt is a much larger dependency than time should have.
Yes. What I mean though is that if you are reimplementing, say, fmt.Printf, such a suggestion system might correctly suggest you use fmt.Printf instead, but also suggest you can use func (m Month) String() string from time, or something equally silly.
Since the standard libs in Go duplicate code, you would have to be careful that your suggestion system isn't picking up false positives. I think the idea has a lot of promise though.
Even so, why would you write that boilerplate code out each time?
Something like this would work even better:
result = src.asyncMap { |e| dowork(e) };
Except Google Go returns several values instead of tuples, so you can't just collect all the results as-is. And with no generics it would be annoying to actually use e and the result, since they would need casts. Too bad.
If you think you can do asyncMap in Google Go, without casts and without manually collecting multiple return values, by all means show us the code. I would find that really interesting.
It's a one-off script, I didn't really care about errors. If this was something run regularly inside of a bigger application yes I'd have full error handling.
But tiny throwaway scripts get built up into huge applications all the time, sometimes by other people who don't have the mental TODO to go back and handle errors.
This is one of the things I love about Go. If he wants the return value but doesn't want to address the errors, he has to actively discard them. It makes you think twice when you put a _ in place of an error return. And to another programmer coming in to maintain the code, those _s stick out like a sore thumb.
I know zero Go (it's a little far down on the to-learn list), and maybe he edited it, but I went back and read the GP's code and didn't see anything that looked like ignoring errors.
It says to call ReadFile. In Go a function can return multiple values. ReadFile returns a byte array and an error value. Normally you'd check whether err is nil; if it is, no error happened. If it isn't, you can inspect it for information about the error and handle it.
A unique feature of Go is that declaring a variable and never using it is an error, not a warning. Actually, there aren't even compiler warnings: code is either right or wrong. This means if he had called:
file_in, err := ioutil.ReadFile("domains.txt")
but never checked err, it would not build. So to get the byte array but not the error, you use the _ symbol to tell the compiler to throw away that return value. This is what I meant about having to actively ignore error handling if you want a return value.
It seems tantalising for the compiler to also protest when return values of type error are not assigned to anything. An obvious inconvenience being that use of fmt.Println and similar would suddenly become noisy.
> A unique feature of Go is that declaring a variable and never using it is an error, not a warning. Actually, there aren't even compiler warnings: code is either right or wrong.
Ahhh. Nice. That sounds like a feature I could get behind, too. Thank you.
Go is great for stuff like this, especially when part of an actual system.
That said, if this was just an adhoc job (to figure out which domains point to a specific IP address) you can just use "xargs -P" or GNU parallel and it becomes a pretty basic shell script, along the lines of:
cat domains.txt | xargs -P 1000 -n 1 host
So what's the difference between xargs and parallel? I thought the point of parallel was that it was xargs with the addition of running things in parallel. But if xargs can do that already, is there any reason to use one over the other?
For what it's worth, GNU Parallel seems to be fairly new; at work I have an Ubuntu distro from this year and it's not in the package repo yet. xargs is POSIX, so you can expect it everywhere, though no parallel option is specified (merely encouraged).
In addition to the above, it's worth noting that parallel also supports running the jobs on multiple remote systems via ssh, giving you an easy way to take advantage of a whole cluster.
To be fair to languages without such great parallelism support: you can do this using asynchronous/event-loop-based code because the parallelism will be limited by the nameserver anyway (the calling code does almost nothing, it mostly waits for the net / the nameserver).
> To be fair to languages without such great parallelism support: you can do this using asynchronous/event-loop-based code
Well you can do with callbacks anything that you can do with channels and goroutines. Go's primary appeal is that it makes concurrent[1] code easy to reason about, not that it enables you to do anything that you "couldn't do" otherwise.
Continuations are just GOTOs, and just like GOTOs, some people love them and some people hate them, but even people who like them can find them difficult in large doses. Goroutines and channels are nice, because they fit the structure of imperative code, whereas callbacks sort of resemble imperative code but "inside out".
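A hedged sketch of that imperative shape (resolve here is a made-up stand-in for a real network call such as net.LookupHost, so the example stays self-contained and deterministic):

```go
package main

import (
	"fmt"
	"strings"
)

// resolve is a hypothetical stand-in for a DNS lookup.
func resolve(domain string) string {
	return "ip-for-" + strings.TrimSuffix(domain, ".")
}

func main() {
	domains := []string{"example.com", "example.org"}
	results := make(chan string)

	// Each lookup runs in its own goroutine, but the code still
	// reads top-to-bottom like ordinary imperative code -- no
	// inside-out callback nesting.
	for _, d := range domains {
		go func(d string) {
			results <- d + " => " + resolve(d)
		}(d)
	}

	for range domains {
		fmt.Println(<-results)
	}
}
```

The same logic with callbacks would split the "start the lookup" and "use the result" halves into separate functions, which is the inversion being described.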
from concurrent.futures import ThreadPoolExecutor as Pool
from socket import getaddrinfo

def lookup(domain):
    try:
        result = getaddrinfo(domain, 80)
    except Exception as e:
        print("error %s -> %s" % (domain, e))
    else:
        print("done %s -> %s" % (domain, result))

nconcurrent = 20
with open('domains.txt') as file, Pool(nconcurrent) as pool:
    for domain in (line.strip() for line in file):
        pool.submit(lookup, domain)
To run multiple processes instead of threads, change the import to ProcessPoolExecutor.
To support multiprocessing.Pool (for Python 2 where concurrent.futures is not in stdlib), replace pool.submit() with pool.apply_async() and use contextlib.closing() around the Pool().
in the DNS case asynchronous event handling would be super easy to do. in python asyncore with something like dpkt to construct and read DNS lookups works like a champ, as does twisted. i did a simple async DNS resolver in pure python (asyncore, dpkt) and can sustain thousands of lookups a second. GNU adns also has bindings in various languages.
you can get Go's style of parallelism via CSP (e.g. python-csp, ruby-csp) and replace a lot of fragile threading/parallel code with it. i've been doing that in lieu of learning Go (i know i know .. i'm lazy) and been very pleased.
anyhow, many ways to skin cats. those are just two or three.
You wouldn't do DNS lookups asynchronously in Go to begin with. Modeling concurrency of any sort in Go the way you would with an event loop is usually a code smell.
agreed, and i should have been more clear. the author's blog post states that one of the reasons he explored Go was that his initial sketch of a solution in his language of choice (ruby) was sequential. my point was that you can get performance in ruby with asynchronous operations, and that you don't need to go parallel for something like this.
then again if it was a matter of "well, i had a problem to solve and i had a desire to explore another language, so solving it in that new language was a way to explore" then the point is moot.
however agreed 100% or more on the "code smell" of doing an event loop in Go.
If I had to simultaneously generalize the idea and make it specific enough to explain it further, I'd say fiddly callback state machines are a code smell in Go.
Scala is really great for this too. The downside is the spin-up time for the JVM, but the upside is that if you use SBT's script launcher it will compile and cache the script transparently for you, you can still pull in _any_ JVM dependency, and you can run it just like a shell script. I needed to test for a port being open in parallel and it was a cakewalk to use NIO's SocketSelector to do it reactively.
I also include a list of 100 domains in a domains.txt if anyone wants to try for themselves.
require "socket"
require "celluloid"

class IPGetter
  include Celluloid

  def get(url)
    Socket.getaddrinfo(url, "http")[0][2]
  end
end

pool = IPGetter.pool(size: 100)
ips = {}

File.open("domains.txt").each_line do |line|
  line.chomp!
  ips[line] = pool.future.get(line)
end

ips.each do |url, ip_future|
  puts "#{url} => #{ip_future.value}"
end