Engine Yard Contest Challenge Phrase And Dictionary

By Engine Yard | July 20th, 2009 at 11:07AM

We know you’ve all been working diligently and waiting, so here it is!

The challenge phrase is:

I would much rather hear more about your whittling project

And here is the phrase dictionary.

Remember:

  • The cut-off time/date for the contest is 6pm PDT tomorrow July 21st
  • You must be following @engineyard for your entries to count
  • Max of five entries per person

While your CPUs are burning through this, you might want to check out our developer webinar on the Engine Yard Cloud at 11am PDT tomorrow; after all, if you’re going to win the cloud credit, don’t you want to learn all about it? ;)

Good luck!

Note: If you liked this contest, stay tuned for our September contest! Similar levels of trickery + programming will prevail.

54 Responses to “Engine Yard Contest Challenge Phrase And Dictionary”

  1. Spenser Spenser says:

    Having a blast with this contest! I suck at coding something good (using this contest to help me learn some C++) and only getting like 25k/s, but it's fun nonetheless! Already got a 47! What is everyone else getting?

    • leahsilber Leah Silber says:

      Thanks for participating Spenser, glad you're having fun! Keep working on it — with so many possibilities, you've still got a great chance at doing really well :)

    • Jeremy Jeremy says:

      If it is code you wrote, 25k/s is great :)

      My best right after about 15 minutes is 41

      • Spenser Spenser says:

        Yeah, it definitely is my own code :P But it's quite hackish… Only had like a day to work on it (Friends>Geeking it out) so my Hex-Binary conversion is an array of a-f && 0-9 :P

        20 minutes in I now have a 45 and a 46 (running it on a few machines)

    • Michael H Buselli Michael H Buselli says:

      Feel free to use my FastHammer library in exchange for a tweet out:

      http://github.com/cosine/fast_hammer/tree

      It's not the fastest algorithm out there (someone wrote something in CUDA that's way faster on nvidia GPUs) but it gets 725k/s on one core of my 2.2 GHz Core Duo.

      I predict a fair effort in this contest should yield about a 35, and the winner will likely have around 25. If there's some lucky dog with a giant farm of nvidia GPUs using the CUDA program, there might even be a 20. We may all be surprised, though!

      (btw, I'm not participating in the contest; I'm just watching from the sidelines.)

      • Spenser Spenser says:

        Would love to use that! But sadly it is made for c/ruby and I don't have the time (or brainpower atm) to implement it with my C++ code :( 725k/s sounds quite nice though! I am going to laugh if it is some newbie programmer on a dinky laptop like myself that wins it by pure luck though! Crosses fingers

  2. Tyler Smith Tyler Smith says:

    Same for me. This presented a perfect way for me to start learning C++. I know I won't win, but it was fun. I'm down to 46.

  3. Doug Doug says:

    Working on this has been fun (and a rush). I only wish I had more time to optimize and refine my crappy code.

    Right now my best score is 43. I'm counting on a little luck to get a good result, but I'm still skeptical that it's even possible to do better than low 30s.

  4. pjonesdotca pjonesdotca says:

    43 does sound good. I'm using a custom GA and down only to 56 after 40 minutes.

    • Doug Doug says:

      Using a GA sounds interesting. I'm curious as to how well the GA will be able to learn how to modify the input to get a better score.

      Unfortunately, I'm just using a dumb brute-force approach. Aside from luck, I think whoever has more compute power could get a higher score than I will generate.

      Doing a Twitter search, I can see some submissions in the mid-30s already!

      • pjonesdotca pjonesdotca says:

        Actually, I ignored the whole "change the capitalisation bit" as well as the random five character word at the end.

        I ended up with midpoint crossover and a mutation rate of .25

        • Thijs Thijs says:

          I'm using a GA too. Mine does everything, capitalization and the 5 random characters.

          Also random crossover and some different mutations, like change a letter's capitalization, change a word and change the random characters.

          Now I am at 53.

          • pjonesdotca pjonesdotca says:

            I wouldn't mind seeing your code on this. I did a quick refactor to get the capitalisation (String.swapcase) but eschewed the random characters.

            Lowest score for me so far is 47.
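The mutations described in this thread (flip one letter's case, swap a word, change one of the trailing characters) and the midpoint crossover can be sketched roughly as follows. This is an illustration in Python, not anyone's actual contest code: the word list, the suffix alphabet, and every function name are made up for the example, and the 12-words-plus-5-characters shape is taken from the comments here rather than from the official rules.

```python
import random
import string

# Hypothetical stand-in; the real contest dictionary is not reproduced here.
WORDS = ["ruby", "cloud", "engine", "yard", "rails", "deploy", "hash", "tweet"]
PHRASE_LEN = 12   # words per candidate (assumed from the comments)
SUFFIX_LEN = 5    # trailing random characters (assumed from the comments)
SUFFIX_ALPHABET = string.ascii_letters + string.digits  # assumed alphabet


def random_candidate(rng):
    """Build a random (words, suffix) pair."""
    words = [rng.choice(WORDS) for _ in range(PHRASE_LEN)]
    suffix = "".join(rng.choice(SUFFIX_ALPHABET) for _ in range(SUFFIX_LEN))
    return words, suffix


def mutate(candidate, rng):
    """Apply one of three mutations and return a new candidate."""
    words, suffix = list(candidate[0]), candidate[1]
    kind = rng.choice(("case", "word", "suffix"))
    if kind == "case":
        # Flip the capitalization of one letter in one word.
        i = rng.randrange(len(words))
        j = rng.randrange(len(words[i]))
        w = words[i]
        words[i] = w[:j] + w[j].swapcase() + w[j + 1:]
    elif kind == "word":
        # Replace one word with another dictionary word.
        words[rng.randrange(len(words))] = rng.choice(WORDS)
    else:
        # Replace one of the trailing characters.
        k = rng.randrange(len(suffix))
        suffix = suffix[:k] + rng.choice(SUFFIX_ALPHABET) + suffix[k + 1:]
    return words, suffix


def midpoint_crossover(a, b):
    """First half of a's words, second half of b's, keeping a's suffix."""
    mid = len(a[0]) // 2
    return a[0][:mid] + b[0][mid:], a[1]


def to_phrase(candidate):
    """Join words and suffix with single spaces."""
    words, suffix = candidate
    return " ".join(words + [suffix])
```

A GA loop would score each `to_phrase(...)` result by its Hamming distance and keep the lowest-scoring candidates; a mutation rate like the .25 mentioned above would decide how often `mutate` is applied to each child.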

    • pjonesdotca pjonesdotca says:

      46 after about 44k generations

  6. Tyler Smith Tyler Smith says:

    Anyone going to be publishing their code afterwards? Mine's not great but I might put it up anyways if anyone is interested.

    • Spenser Spenser says:

      I would love to see some code too! I will throw mine up somewhere when it's all finished.

    • Jeremy Jeremy says:

      Tyler, I already have mine in Github..

      On my 2.4ghz Macbook pro it does roughly 1.3M per second per core. I didn't write it to be publicly usable though, so if you want to change the keywords you have to edit them by hand and recompile.

      http://github.com/JeremyChase/eypc/tree/master

      • Spenser Spenser says:

        Goodness! Your code went so fast on one of the builds that it crashed my terminal and I lost my three 44s :P That's one way to beat the competition I guess! But your code got a 44 almost instantly, so no biggie!

        • Jeremy Jeremy says:

          Oh… dude, I'm sorry.. You need to build with -DPRODUCTION or to pipe the output to less. Otherwise it makes a mess :)

          • Spenser Spenser says:

            Haha, no biggie. My code sucked compared to yours! Already at a 42 with yours, so I don't mind at all :P How long have you been coding? I am definitely going to learn a lot from your code!

    • Bob Denver Bob Denver says:

      The CUDA forum posted their code for it already. People are averaging about 200 megahashes/sec on a 280GTX.

    • leahsilber Leah Silber says:

      We're going to collect as many of them as we can after the contest and do a follow up post showing off some of the coolest implementations :) Will post the details in the next Contest-related blog post.

  7. bcl bcl says:

    I hit 43 @ try #18557, nothing better yet.

  8. Jordan Jordan says:

    47's on both cores (running two instances).. after about 5 minutes.. nothing new since.

  9. Jordan Jordan says:

    anyone attempt anything cooler than just guessing at random, like trying to find a disturbance vector or path through the 80 sha1 cycles that make the hash? I know it's highly unlikely to find a collision, but finding a few local collisions in the first couple cycles would prolly help lead to some decent hamming numbers.

  10. dorito dorito says:

    Got a score of 36, running the CUDA cracker from the nvidia forum. 29 hours to go. If you have an NVidia GPU you can use it to crack too.

  11. sil3ntmac sil3ntmac says:

    Crowdsourcing brute-forcing: help us win the challenge! Just browse to http://rustyengines.silentmac.com/jsengine.php and let it run in the background.

    Incredibly scalable, just open a new tab! :-P

  12. Egze Egze says:

    Cloud computing in the browser! http://contest.cligs.ee/
    Just open a tab and let JavaScript do its thing ;)

  13. Ivan Ivan says:

    I've got 34 almost immediately with my ATI GPU cracker. But in fact it's just pure luck as I was stuck yesterday for hours on 36 with test data. I doubt anyone will go below 30.

  14. Joseph Joseph says:

    Thanks for creating a contest that has been so much fun to dig into! So far only 41 on my humble MB pro, but it is still early :)

  15. Morgan Morgan says:

    I'll probably publish my code in my snippets project on GitHub. It's not intended to be elegant, but it gets 1.7M SHA1+Hamming Distance checks per core per second on my Core 2 Duo. In an hour of testing with a random 1000 words against some of their test phrases, I got down to mid 30's. Given the search space drops off sharply, I'd bet that the winning distance will be in the high 20's, and I'd further guess that 5-10 people will hit the same distance.

    I've only got 4 cores running it, and I started late. So far I've found a 41.

    I'd have loved to see a 'worst' as well, as the results tend to form a bell curve, so worst should be equally hard as best. :)
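The bell-curve remark can be made concrete with a small model. The sketch below assumes each of the 160 bits of a SHA-1 digest differs from the target independently with probability 1/2, so the Hamming distance follows a Binomial(160, 1/2) distribution centred on 80. That is a modelling assumption, not something stated in the contest rules, and the function names are invented for the example.

```python
from fractions import Fraction
from math import comb

BITS = 160  # length of a SHA-1 digest in bits


def prob_at_most(k: int) -> Fraction:
    """Exact probability that a random digest is within distance k,
    under the independent fair-coin model described above."""
    return Fraction(sum(comb(BITS, i) for i in range(k + 1)), 2 ** BITS)


def prob_at_least(k: int) -> Fraction:
    """Exact probability of a distance of k or more (the 'worst' side)."""
    return 1 - prob_at_most(k - 1)


def expected_tries(k: int) -> float:
    """Average number of random candidates needed to see a distance <= k."""
    return float(1 / prob_at_most(k))


if __name__ == "__main__":
    for k in (40, 35, 30, 25):
        print(k, f"{expected_tries(k):.3g}")
```

Under this model `prob_at_most(k)` equals `prob_at_least(160 - k)`, which is the symmetry behind the claim that a "worst" score would be as hard to find as a "best" one.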

  16. manitoba98 manitoba98 says:

    Only at 39 right now, but started late. Using libgcrypt's SHA-1 implementation and an unoptimized custom Hamming distance calculation. 11 cores total at work. What does "notift" mean? Was that supposed to be "notify"? If so, can I assume that "notift" is valid anyhow?

  17. Michael Mullany Michael Mullany says:

    You should assume that "notift" is valid if that was in the dictionary file (even if it was a typo)

  18. Nathan Fritz Nathan Fritz says:

    I was just thinking about tomorrow afternoon…and I think the contest is broken…

    I'm probably not going to submit an answer if other people have already submitted a lower score. Since searching for @engineyard will result in other submitted answers, this yields a rather concerning alternative method for winning.

    Write a sniper-bot that uses twitter's api and hashes all submitted results as of the last minute of the contest, then post the lowest submitted result as my own. I wouldn't be lowest, but the rules allow for a random drawing from the lowest score.

    I'm not going to do this, of course, but if I thought of it… In the best case, it means that everyone who realizes this will wait until the last possible second to submit to prevent theft of answers.

    The easiest way to deal with this is to only accept answers via some opaque method — probably email. But that defeats the purpose, doesn't it?

  19. lisa lisa says:

    My twitter account has the "protect my updates" flag set b/c I dislike getting the porn spam on twitter. When I tweet, will you guys still get it?

  20. Morgan Morgan says:

    Greetings, Props to the CUDA folks; there's some crazy high numbers there. I'm through 10 random phrases (no case distortion) times all printable combinations of 5 characters so far, and only down to 39.

    @Nathan I presume that given the search space, @engineyard will disallow later tweets of the identical string. That would be…bad, otherwise.

    – Morgan
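The "fixed phrase times every 5-character suffix" search described above can be sketched as below. This is a hedged illustration, not Morgan's code: it assumes the suffix is joined to the phrase with a single space, that text is hashed as UTF-8 with no trailing newline, and the function name `best_suffix` is made up. It hashes the shared prefix once and copies that state for each suffix, using the `copy()` method of `hashlib` hash objects.

```python
import hashlib
from itertools import product

CHALLENGE = "I would much rather hear more about your whittling project"


def best_suffix(phrase: str, alphabet: str, length: int):
    """Try every suffix of the given length and return (distance, suffix)
    for the one whose SHA-1 is closest to the challenge phrase's SHA-1."""
    target = int.from_bytes(hashlib.sha1(CHALLENGE.encode("utf-8")).digest(), "big")
    # Hash the shared prefix once; each suffix continues from a copy.
    base = hashlib.sha1((phrase + " ").encode("utf-8"))
    best = (161, "")  # 161 is larger than any possible 160-bit distance
    for chars in product(alphabet, repeat=length):
        suffix = "".join(chars)
        h = base.copy()
        h.update(suffix.encode("utf-8"))
        distance = bin(int.from_bytes(h.digest(), "big") ^ target).count("1")
        if distance < best[0]:
            best = (distance, suffix)
    return best


if __name__ == "__main__":
    # Tiny alphabet so the demo finishes quickly.
    print(best_suffix("ruby cloud engine", "abc", 3))
```

If "printable" means the 95 printable ASCII characters, a full 5-character sweep is 95^5, roughly 7.7 billion hashes per phrase, which is why the per-second rates quoted in this thread matter so much.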

  21. pjonesdotca pjonesdotca says:

    Generation 381: hamming distance 47

    Maybe the old GA is competitive after all…

  22. Nathan Fritz Nathan Fritz says:

    Right you are. I misunderstood the rule that stated that "if the same phrase is submitted multiple times, only the first will be used". I thought that was to prevent spamming by a given person. Sounds like it is actually there to prevent this very hack.

    NEVERMIND. NOT BROKEN! (sorry for the confusion…)

  23. Morgan Morgan says:

    Gah; I tried the CUDA approach and it hit 39 in about 2 minutes (my CPU-based code was at 39 after a few hours), and is down to 34 now. Wow… I need to learn more about CUDA now. :)

    Running around 83M hash/sec GPU, plus 4 cores at 1.7M/sec nets around 90M tests/sec. It's not a cluster of 20,000 machines like some at universities have set up, but it's not bad. And, most importantly, it's fun! :)

  24. We could really use some kind of verifier, so that if we get a result, we can plug our result into it and get a confirmed hamming distance before we submit.

    • Angus Angus says:

      I don't know, but I would expect that being able to verify your own entries is part of the contest. There's nothing arcane about the problem specification.

      If you get the expected results with the example sentence/dictionary in the original post, you should get the same with the real values.

      • Morgan Morgan says:

        I've been verifying the ones tweeted so far; a few I'm worried that they may have a bad algorithm because they tweeted entries with HD's of 61, 65, and 88, so I think an easy verifier wouldn't have been a bad thing as a sanity check so folks with bad algorithms don't go off into the weeds.

        With a contest like this that is essentially going to end up being a lottery with a pre-qualifying round, it's not about winning as much as the amazingly cool ways people find to get stuff done. In that respect, I'd hate for folks to be disappointed that they ran 30 hours with a bad program.

        I'm using four different SHA algorithms varying in efficiency, and I'm anal about checking 'best' strings against the safest and least efficient when a new best shows up. (I verify on a computer I'm not using for the contest because heavy CPU usage causes it to power off. :) )

        I agree that you should be okay if your test results jibe with the example results, but (for instance) some algorithms I've seen don't work on single-block keys (if your 12 words + 5 characters are < 64 bytes), so special cases of inputs could break while the test strings (which are all >64 characters) work.

        Thus I agree, a verifier would be a useful thing. Actually, MOST useful would be @engineyard auto-dming tweeters with the hamming distance calculated off of their tweet!

        That'd put a quick resolution to the problem of bad algorithms, as users would near-instantly see that there's a disconnect… Since you have to be following @engineyard in the first place, this would work out well.
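For anyone wanting the kind of sanity check discussed in this thread, a minimal verifier could look like the sketch below. It assumes the score is the number of differing bits between the SHA-1 of the candidate and the SHA-1 of the challenge phrase, with both strings hashed as UTF-8 and without a trailing newline; the function names are invented for the example and this is not an official Engine Yard tool.

```python
import hashlib

CHALLENGE = "I would much rather hear more about your whittling project"


def sha1_int(text: str) -> int:
    """SHA-1 digest of the text, as a 160-bit integer."""
    return int.from_bytes(hashlib.sha1(text.encode("utf-8")).digest(), "big")


def hamming_distance(a: str, b: str) -> int:
    """Number of bit positions where the two SHA-1 digests differ."""
    return bin(sha1_int(a) ^ sha1_int(b)).count("1")


def score(candidate: str, challenge: str = CHALLENGE) -> int:
    """Contest-style score for a candidate phrase (lower is better)."""
    return hamming_distance(candidate, challenge)


if __name__ == "__main__":
    import sys
    for line in sys.stdin:
        phrase = line.rstrip("\n")
        if phrase:
            print(score(phrase), phrase)
```

Checking the output against a second, independent SHA-1 implementation, and against both short and long inputs, would catch the single-block problem mentioned above. Scores far above 80, like the 88 mentioned earlier, are a strong hint that something in the hashing or bit counting is off.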

  25. Kyle Daigle Kyle Daigle says:

    I built this site to find entries, hit the rules, and then show the lowest Hamming. I might turn on a verifier tomorrow if I can manage.

    http://digitalworkboxlabs.com

  26. [...] one amused me: a cloud computing company had a contest that was meant to show off Ruby and cloud computing strengths. It was won by people brute-forcing [...]

  27. Jeremy Jeremy says:

    Well, don't get too excited, I just found a 'mundane' detail that I overlooked in my code. If you are actually running it you'll want to refetch and run again..

  28. Adrian Adrian says:

    I believe the words need to be space separated. Is that what you mean by mundane? I like the speed but I could not figure out how to change that.

  29. Jeremy Jeremy says:

    Adrian, the word spacing is fine.. I hadn't updated the key correctly. :)
