We know you’ve all been working diligently and waiting, so here it is!
The challenge phrase is:
I would much rather hear more about your whittling project
And here is the phrase dictionary.
Remember:
- The cut-off time/date for the contest is 6pm PDT tomorrow July 21st
- You must be following @engineyard for your entries to count
- Max of five entries per person
While your CPUs are burning through this, you might want to check out our developer webinar on the Engine Yard Cloud at 11am PDT tomorrow; after all, if you’re going to win the cloud credit, don’t you want to learn all about it? ;)
Good luck!
Note: If you liked this contest, stay tuned for our September contest! Similar levels of trickery + programming will prevail.

Having a blast with this contest! I suck at coding something good (using this contest to help me learn some C++) and only getting like 25k/s, but it's fun nonetheless! Already got a 47! What is everyone else getting?
Thanks for participating Spenser, glad you're having fun! Keep working on it — with so many possibilities, you've still got a great chance at doing really well :)
If it is code you wrote, 25k/s is great :)
My best right after about 15 minutes is 41
Yeah, it definitely is my own code :P But it's quite hackish… Only had like a day to work on it (Friends>Geeking it out) so my Hex-Binary conversion is an array of a-f && 0-9 :P
20 minutes in I now have a 45 and a 46 (running it on a few machines)
Feel free to use my FastHammer library in exchange for a tweet out:
It's not the fastest algorithm out there (someone wrote something in CUDA that's way faster on nvidia GPUs) but it gets 725k/s on one core of my 2.2 GHz Core Duo.
I predict a fair effort in this contest should yield about a 35, and the winner will likely have around 25. If there's some lucky dog with a giant farm of nvidia GPUs using the CUDA program, there might even be a 20. We may all be surprised, though!
(btw, I'm not participating in the contest; I'm just watching from the sidelines.)
Would love to use that! But sadly it is made for c/ruby and I don't have the time (or brainpower atm) to implement it with my C++ code :( 725k/s sounds quite nice though! I am going to laugh if it is some newbie programmer on a dinky laptop like myself that wins it by pure luck though! Crosses fingers
Same for me. This presented a perfect way for me to start learning C++. I know I won't win, but it was fun. I'm down to 46.
Have some faith! :D
Working on this has been fun (and a rush). I only wish I had more time to optimize and refine my crappy code.
Right now my best score is 43. I'm counting on a little luck to get a good result, but I'm still skeptical that it's even possible to do better than low 30s.
Glad you're having fun! 43 sounds pretty darned good, considering that you've got another 29 hours to go!
From my testing, getting down to ~40 happens quickly. Every point after that is really difficult!
43 does sound good. I'm using a custom GA and down only to 56 after 40 minutes.
Using a GA sounds interesting. I'm curious as to how well the GA will be able to learn how to modify the input to get a better score.
Unfortunately, I'm just using a dumb brute-force approach. Aside from luck, I think whoever has more compute power could get a higher score than I will generate.
Doing a Twitter search, I can see some submissions in the mid-30s already!
Actually, I ignored the whole "change the capitalisation" bit as well as the random five character word at the end.
I ended up with midpoint crossover and a mutation rate of .25
I'm using a GA too. Mine does everything, capitalization and the 5 random characters.
Also random crossover and some different mutations, like change a letter's capitalization, change a word and change the random characters.
Now I am at 53.
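For readers curious what a GA along these lines might look like, here is a minimal sketch in Python. It uses the midpoint crossover and 0.25 mutation rate quoted above, plus the three mutations mentioned (swap a letter's case, change a word, change the random characters). The word list, population size, selection scheme, and the space before the suffix are all assumptions made for illustration; none of the commenters' actual code is shown here.

```python
import hashlib
import random
import string

# Hypothetical stand-in word list; the contest supplied its own dictionary.
DICTIONARY = ["ruby", "rails", "cloud", "engine", "yard", "hash", "bit",
              "code", "core", "tweet", "phrase", "score", "notift", "sha"]
CHALLENGE = "I would much rather hear more about your whittling project"
SUFFIX_CHARS = string.ascii_letters + string.digits  # assumed suffix alphabet
MUTATION_RATE = 0.25  # rate quoted in the thread


def sha1_int(text):
    return int.from_bytes(hashlib.sha1(text.encode("utf-8")).digest(), "big")


TARGET = sha1_int(CHALLENGE)


def score(candidate):
    """Hamming distance between the candidate's hash and the challenge hash."""
    words, suffix = candidate
    return bin(sha1_int(" ".join(words) + " " + suffix) ^ TARGET).count("1")


def random_suffix(rng):
    return "".join(rng.choice(SUFFIX_CHARS) for _ in range(5))


def random_candidate(rng):
    return [rng.choice(DICTIONARY) for _ in range(12)], random_suffix(rng)


def crossover(a, b):
    """Midpoint crossover: first half of a's words, second half of b's."""
    mid = len(a[0]) // 2
    return a[0][:mid] + b[0][mid:], b[1]


def mutate(candidate, rng):
    words, suffix = list(candidate[0]), candidate[1]
    if rng.random() < MUTATION_RATE:
        i = rng.randrange(len(words))
        kind = rng.choice(["word", "case", "suffix"])
        if kind == "word":
            words[i] = rng.choice(DICTIONARY)
        elif kind == "case":
            j = rng.randrange(len(words[i]))  # flip one letter's case
            words[i] = words[i][:j] + words[i][j].swapcase() + words[i][j + 1:]
        else:
            suffix = random_suffix(rng)
    return words, suffix


def evolve(generations=50, population_size=40, seed=0):
    rng = random.Random(seed)
    population = [random_candidate(rng) for _ in range(population_size)]
    for _ in range(generations):
        population.sort(key=score)
        parents = population[: population_size // 2]  # keep the better half
        children = [mutate(crossover(rng.choice(parents), rng.choice(parents)), rng)
                    for _ in range(population_size - len(parents))]
        population = parents + children
    return min(population, key=score)


best = evolve()
print(score(best), " ".join(best[0]), best[1])
```

One caveat worth keeping in mind: SHA-1 is designed so that similar inputs produce unrelated digests, so the crossover step probably carries little useful information from parent to child, and a GA here may behave much like random search with bookkeeping.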
I wouldn't mind seeing your code on this. I did a quick refactor to get the capitalisation (String.swapcase) but eschewed the random characters.
Lowest score for me so far is 47.
46 after about 44k generations
Anyone going to be publishing their code afterwards? Mine's not great but I might put it up anyways if anyone is interested.
I would love to see some code too! I will throw mine up somewhere when it's all finished.
Tyler, I already have mine in Github..
On my 2.4 GHz MacBook Pro it does roughly 1.3M per second per core. I didn't write it to be publicly usable though, so if you want to change the keywords you have to edit them by hand and recompile.
http://github.com/JeremyChase/eypc/tree/master
Goodness! Your code went so fast on one of the builds that it crashed my terminal and I lost my three 44s :P That's one way to beat the competition I guess! But your code got a 44 almost instantly, so no biggie!
Oh… dude, I'm sorry.. You need to build with -DPRODUCTION or to pipe the output to less. Otherwise it makes a mess :)
Haha, no biggie. My code sucked compared to yours! Already at a 42 with yours, so I don't mind at all :P How long have you been coding? I am definitely going to learn a lot from your code!
The CUDA forum posted their code for it already. People are averaging about 200 megahashes/sec on a 280GTX.
We're going to collect as many of them as we can after the contest and do a follow up post showing off some of the coolest implementations :) Will post the details in the next Contest-related blog post.
I hit 43 @ try #18557, nothing better yet.
47's on both cores (running two instances).. after about 5 minutes.. nothing new since.
Anyone attempt anything cooler than just guessing at random, like trying to find a disturbance vector or path through the 80 sha1 cycles that make the hash? I know it's highly unlikely to find a collision, but finding a few local collisions in the first couple cycles would prolly help lead to some decent hamming numbers.
Got a score of 36, running the CUDA cracker from the nvidia forum. 29 hours to go. If you have an NVidia GPU you can use it to crack too.
Crowdsourcing brute-forcing: help us win the challenge! Just browse to http://rustyengines.silentmac.com/jsengine.php and let it run in the background.
Incredibly scalable, just open a new tab! :-P
Cloud computing in the browser! http://contest.cligs.ee/
Just open a tab and let JavaScript do its thing ;)
I've got 34 almost immediately with my ATI GPU cracker. But in fact it's just pure luck as I was stuck yesterday for hours on 36 with test data. I doubt anyone will go below 30.
Thanks for creating a contest that has been so much fun to dig into! So far only 41 on my humble MB pro, but it is still early :)
I'll probably publish my code in my snippets project on GitHub. It's not intended to be elegant, but it gets 1.7M SHA1+Hamming Distance checks per core per second on my Core 2 Duo. In an hour of testing with a random 1000 words against some of their test phrases, I got down to mid 30's. Given the search space drops off sharply, I'd bet that the winning distance will be in the high 20's, and I'd further guess that 5-10 people will hit the same distance.
I've only got 4 cores running it, and I started late. So far I've found a 41.
I'd have loved to see a 'worst' as well, as the results tend to form a bell curve, so worst should be equally hard as best. :)
Edit: 40, about a second after I pressed submit.
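The bell-curve remark can be made concrete. If the two 160-bit digests are treated as independent and uniformly random (an idealisation of SHA-1, not something established in this thread), each bit differs with probability 1/2, so the distance follows a Binomial(160, 1/2) distribution with mean 80 and standard deviation √40 ≈ 6.3. The sketch below computes the tail probabilities under that assumption; the function names are made up for illustration.

```python
from math import comb

BITS = 160  # SHA-1 digest length in bits


def prob_distance_at_most(k, bits=BITS):
    """P(distance <= k) for two independent, uniformly random digests."""
    return sum(comb(bits, d) for d in range(k + 1)) / 2 ** bits


def prob_distance_at_least(k, bits=BITS):
    """P(distance >= k); the mirror image of the lower tail."""
    return sum(comb(bits, d) for d in range(k, bits + 1)) / 2 ** bits


for k in (40, 35, 30, 25):
    p = prob_distance_at_most(k)
    print(f"distance <= {k}: p = {p:.3e}, about {1 / p:.3e} tries on average")
```

Because the distribution is symmetric around 80, a "worst" score of 160 - k is exactly as rare as a "best" score of k, which is the point made above.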
Only at 39 right now, but started late. Using libgcrypt's SHA-1 implementation and an unoptimized custom Hamming distance calculation. 11 cores total at work. What does "notift" mean? Was that supposed to be "notify"? If so, can I assume that "notift" is valid anyhow?
You should assume that "notift" is valid if that was in the dictionary file (even if it was a typo).
I was just thinking about tomorrow afternoon…and I think the contest is broken…
I'm probably not going to submit an answer if other people have already submitted a lower score. Since searching for @engineyard will result in other submitted answers, this yields a rather concerning alternative method for winning.
Write a sniper-bot that uses twitter's api and hashes all submitted results as of the last minute of the contest, then post the lowest submitted result as my own. I wouldn't be lowest, but the rules allow for a random drawing from the lowest score.
I'm not going to do this, of course, but if I thought of it… In the best case, it means that everyone who realizes this will wait until the last possible second to submit to prevent theft of answers.
The easiest way to deal with this is to only accept answers via some opaque method — probably email. But that defeats the purpose, doesn't it?
Hi Nathan,
Thanks for your comment :) In the case of the same hash being submitted multiple times, only the first submission counts. This should deal with your concern.
Thanks!
Yup. And the rules said that to begin with. My mistake. Sorry :-)
My twitter account has the "protect my updates" flag set b/c I dislike getting the porn spam on twitter. When I tweet, will you guys still get it?
Greetings, Props to the CUDA folks; there's some crazy high numbers there. I'm through 10 random phrases (no case distortion) times all printable combinations of 5 characters so far, and only down to 39.
@Nathan I presume that given the search space, @engineyard will disallow later tweets of the identical string. That would be…bad, otherwise.
– Morgan
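A sweep over every suffix for a fixed phrase, as described above, can be sketched like this in Python. The phrase, the alphabet, and the space between phrase and suffix are assumptions for illustration. `hashlib` hash objects support `copy()`, so the phrase is fed to SHA-1 once and each suffix is added to a copy of that state. With the 95 printable ASCII characters, a full five-character sweep is 95**5 (about 7.7 billion) candidates per phrase, so the demo below uses a much smaller alphabet and length.

```python
import hashlib
import itertools
import string

CHALLENGE = "I would much rather hear more about your whittling project"
TARGET = int.from_bytes(hashlib.sha1(CHALLENGE.encode("ascii")).digest(), "big")


def best_suffix(phrase, alphabet, length):
    """Try every suffix of the given length; return (distance, suffix)."""
    prefix = hashlib.sha1((phrase + " ").encode("ascii"))  # phrase fed in once
    best = (161, "")  # 161 is worse than any possible 160-bit distance
    for chars in itertools.product(alphabet, repeat=length):
        suffix = "".join(chars)
        h = prefix.copy()  # reuse the state that already contains the phrase
        h.update(suffix.encode("ascii"))
        distance = bin(int.from_bytes(h.digest(), "big") ^ TARGET).count("1")
        if distance < best[0]:
            best = (distance, suffix)
    return best


# Small demo: two-character lowercase suffixes (26**2 = 676 candidates).
# The phrase is a hypothetical placeholder, not a valid contest entry.
print(best_suffix("ruby rails cloud", string.ascii_lowercase, 2))
```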
Generation 381: hamming distance 47
Maybe the old GA is competitive after all…
Right you are. I misunderstood the rule that stated that "if the same phrase is submitted multiple times, only the first will be used". I thought that was to prevent spamming by a given person. Sounds like it is actually there to prevent this very hack.
NEVERMIND. NOT BROKEN! (sorry for the confusion…)
Gah; I tried the CUDA approach and it hit 39 in about 2 minutes (my CPU-based code was at 39 after a few hours), and is down to 34 now. Wow… I need to learn more about CUDA now. :)
Running around 83M hash/sec GPU, plus 4 cores at 1.7M/sec nets around 90M tests/sec. It's not a cluster of 20,000 machines like some at universities have set up, but it's not bad. And, most importantly, it's fun! :)
We could really use some kind of verifier, so that if we get a result, we can plug our result into it and get a confirmed hamming distance before we submit.
I don't know, but I would expect that being able to verify your own entries is part of the contest. There's nothing arcane about the problem specification.
If you get the expected results with the example sentence/dictionary in the original post, you should get the same with the real values.
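For anyone who wants a quick local check, a verifier only needs a few lines. This is a minimal sketch using Python's standard `hashlib`, not an official tool from the contest organisers; the `entry` string is a hypothetical placeholder, and UTF-8 encoding of the phrase is an assumption.

```python
import hashlib


def sha1_hex(text):
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def hamming_distance(a, b):
    """Number of differing bits between the SHA-1 digests of a and b."""
    x = int(sha1_hex(a), 16) ^ int(sha1_hex(b), 16)
    return bin(x).count("1")


CHALLENGE = "I would much rather hear more about your whittling project"

# Check the hash itself against the standard "abc" SHA-1 test vector first.
assert sha1_hex("abc") == "a9993e364706816aba3e25717850c26c9cd0d89d"

entry = "replace this with your twelve words and suffix"  # hypothetical entry
print(hamming_distance(CHALLENGE, entry))
```

Comparing this output with what your own program reports for the same entry is a cheap way to catch a broken hash or bit-count routine before tweeting.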
I've been verifying the ones tweeted so far. I'm worried that a few entrants may have a bad algorithm, because they tweeted entries with HDs of 61, 65, and 88, so I think an easy verifier wouldn't have been a bad thing as a sanity check so folks with bad algorithms don't go off into the weeds.
With a contest like this that is essentially going to end up being a lottery with a pre-qualifying round, it's not about winning as much as the amazingly cool ways people find to get stuff done. In that respect, I'd hate for folks to be disappointed that they ran 30 hours with a bad program.
I'm using four different SHA algorithms varying in efficiency, and I'm anal about checking 'best' strings against the safest and least efficient when a new best shows up. (I verify on a computer I'm not using for the contest because heavy CPU usage causes it to power off. :) )
I agree that you should be okay if your test results jibe with the example results, but (for instance) some algorithms I've seen don't work on single-block keys (if your 12 words + 5 characters are < 64 bytes), so special cases of inputs could break while the test strings (which are all >64 characters) work.
Thus I agree, a verifier would be a useful thing. Actually, MOST useful would be @engineyard auto-dming tweeters with the hamming distance calculated off of their tweet!
That'd put a quick resolution to the problem of bad algorithms, as users would near-instantly see that there's a disconnect… Since you have to be following @engineyard in the first place, this would work out well.
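On the single-block concern raised above: SHA-1 pads every message with one 0x80 byte, zero bytes, and an 8-byte length, so a message fits in a single 64-byte block only if it is at most 55 bytes long, a little below the 64 mentioned. The sketch below counts blocks under that padding rule and prints reference digests from `hashlib` at the boundary lengths, which a hand-written SHA-1 could be compared against. The helper name is made up for illustration.

```python
import hashlib

BLOCK = 64       # SHA-1 block size in bytes
MIN_PADDING = 9  # one 0x80 byte plus an 8-byte message length


def sha1_block_count(message: bytes) -> int:
    """Number of 64-byte blocks SHA-1 processes for this message."""
    return (len(message) + MIN_PADDING + BLOCK - 1) // BLOCK


# Lengths on either side of the one-block and two-block boundaries.
for n in (0, 55, 56, 63, 64, 119, 120):
    msg = b"a" * n
    print(n, sha1_block_count(msg), hashlib.sha1(msg).hexdigest())
```

Testing a custom implementation at lengths 55, 56, 119, and 120 exercises exactly the cases where padding spills into an extra block.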
I built this site to find entries, hit the rules, and then show the lowest Hamming. I might turn on a verifier tomorrow if I can manage.
http://digitalworkboxlabs.com
Well, don't get too excited, I just found a 'mundane' detail that I overlooked in my code. If you are actually running it you'll want to refetch and run again..
I believe the words need to be space separated. Is that what you mean by mundane? I like the speed but I could not figure out how to change that.
Adrian, the word spacing is fine.. I hadn't updated the key correctly. :)