The Google "benchmarks" are pretty cooked, and NVIDIA posted their own press release in response at some point.
Bottom line is that the P100 is pretty powerful, but as a small-time researcher/enthusiast/wise guy you can be very competitive with just 4x 1080 Tis and 32 threads, which you can build for around $5k-$6k depending on how lucky you are.
I built something like this just 2 months ago, and prices have since come down about 10% over Black Friday.
I put an AMD 1950X into an ASUS Zenith motherboard with 32 GB of RAM (you don't need more than this, and 64 GB would not POST at the speeds I paid for), all-M.2 NVMe storage, and 4x 1080 Tis for $8k Canadian. That is about $6k USD with all premium components and liquid cooling, and it runs whisper quiet on my desktop. FYI, this is the original DevBox idea that NVIDIA was selling for $15k, now with 32 threads to boot at 4 GHz and ~13k CUDA cores, with peak 100% load around 1200 watts.
This is what you should get unless you are an institution/agency or have grant money. As an FYI, I had to remove the backplates (both the back-of-card plates and the exhaust grills) so all four cards can run at 100% on stock cooling without overheating. Running at 100%, the four GTXs will pull about 1000 watts from the wall (four cards at ~250 W TDP each), and you will be able to dry your hair behind the box. If this becomes an issue (which it won't, as it is currently really hard to keep a GPU at 100% load for more than 1-2 minutes), you can always water-cool the GPUs or buy cards with that stuff pre-installed. Watercooling is counter-intuitive, though: air cooling is actually better at bringing a system back down when it only heats up periodically.
Linus did a video on Linus Tech Tips showing that the latest Canadian supercomputer was set up similarly to this, going with AMD EPYC (roughly Threadripper x2) and Vega cards. This will be the way to go for the next 4-5 years until we get graph processors. Remember that GPU acceleration of general computing algorithms is an afterthought, so the primary bottleneck will remain (on our PCs, which don't have the fancy NVLink/high bus speeds/capacities). Google is doing good things overall, but I would not go drooling over the TPU tech, as the benchmark looks better than it is; honestly, the quiet benchmarks Amazon released for MXNet are imho more impressive (near-linear scaling up to 256 GPUs is the punchline).
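For context, the pattern behind those scaling numbers is plain data parallelism: replicate the model on every GPU, split each batch across them, and sum the gradients on update. A minimal sketch with MXNet's gluon API, assuming 4 local GPUs; the toy model and hyperparameters are my placeholders, not anything from Amazon's benchmark:

```python
# Data-parallel training sketch for a multi-GPU box (assumes 4 GPUs present).
import mxnet as mx
from mxnet import autograd, gluon

ctx = [mx.gpu(i) for i in range(4)]           # one context per 1080 Ti

net = gluon.nn.Sequential()
with net.name_scope():
    net.add(gluon.nn.Dense(128, activation='relu'))
    net.add(gluon.nn.Dense(10))
net.initialize(mx.init.Xavier(), ctx=ctx)      # replicate params on every GPU

trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.1})
loss_fn = gluon.loss.SoftmaxCrossEntropyLoss()

def train_batch(data, label):
    # Split the batch across devices; each GPU computes its own gradients.
    data_parts = gluon.utils.split_and_load(data, ctx)
    label_parts = gluon.utils.split_and_load(label, ctx)
    with autograd.record():
        losses = [loss_fn(net(X), y) for X, y in zip(data_parts, label_parts)]
    for l in losses:
        l.backward()
    trainer.step(data.shape[0])                # normalize by global batch size

if __name__ == '__main__':
    X = mx.nd.random.uniform(shape=(256, 64))             # dummy batch
    y = (mx.nd.random.uniform(shape=(256,)) * 10).floor()  # dummy labels
    train_batch(X, y)
```

Scaling past one box is, as far as I understand, the same idea with MXNet's kvstore/parameter server aggregating the gradient sums across machines, which is where the 256-GPU numbers come from.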
Anyhow, back to trying to figure out how to sample policies in my stupid self-play toy (it would have been nice if the OpenAI wizards who wrote https://arxiv.org/pdf/1710.03748.pdf could tell me whether they considered re-training previous policies and then sampling them randomly, as that would give a Monte-Carlo-minimization type of self-play policy).
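In case it helps anyone, here is roughly what I mean, as a hedged Python sketch: keep frozen snapshots of past policies and draw the opponent uniformly at random each episode, so the current policy is effectively minimized against a Monte-Carlo sample of its own history. `SelfPlayPool` and its hooks are hypothetical stand-ins of mine, not anything from the OpenAI paper:

```python
# Opponent sampling from a pool of past policy snapshots (sketch).
import copy
import random

class SelfPlayPool:
    def __init__(self, policy, snapshot_every=100, pool_size=20):
        self.policy = policy                  # the policy being trained
        self.pool = [copy.deepcopy(policy)]   # frozen past versions
        self.snapshot_every = snapshot_every
        self.pool_size = pool_size
        self.step = 0

    def sample_opponent(self):
        # Uniform over history; a recency-biased distribution is another option.
        return random.choice(self.pool)

    def after_update(self):
        # Call once per training update; periodically freeze a new snapshot.
        self.step += 1
        if self.step % self.snapshot_every == 0:
            self.pool.append(copy.deepcopy(self.policy))
            if len(self.pool) > self.pool_size:
                self.pool.pop(0)              # drop the oldest snapshot
```

Each iteration you would call `sample_opponent()`, play current-vs-snapshot, update the live policy on the outcome, then call `after_update()`. Iirc the paper does sample older opponent versions; whether they tried re-training those old snapshots too is exactly what I can't tell from the text.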