• Sales: (866) 518-YARD

Programming Contest! Win iPhone 3GS & $2,000 Cloud Credit

By Engine Yard | July 14th, 2009 at 9:07AM

We’re kicking off a programming contest today that is sure to challenge even the most comp-sci heavy engineers out there, and we’re excited to see what you all come up with. With the difficulty of the challenge in mind, we’ve got some great prizes for the winner: an iPhone 3GS AND $2,000 of Cloud (Flex or Solo) credit. Now to jump right in and answer all your questions…

What is the contest?

You must tweet a sequence of twelve words that when hashed is bit-wise closest to a hash of a challenge phrase that we will announce the morning of July 20th.  All words must be from a 1,000 word dictionary we will provide at that same time. You are allowed to append up to five random characters to the end of your entry. We’re pretty confident you’ll want to write a program to automate the finding of close matches, so announcing this a week in advance should give you enough time to get your programs up and running.

How do I enter?

To enter the contest, follow @engineyard on Twitter and tweet your best candidate word sequence before 6pm Pacific Time on July 21st. This means you have about 30 hours between the availability of the challenge phrase and dictionary, and the entry submission cut-off time.

As previously mentioned, the winner of the contest will get an iPhone 3GS AND $2,000 of Cloud (Flex or Solo) credit). [You can also choose an alternative of load test credits worth $2,000 from browsermob]. Second prize is another iPhone 3GS.

So how does it work exactly? (Update! Example Now Clearer!)

Let’s take an example: a dictionary excerpt is: “Cloud, Ruby, DHH, one, eight, six, active, record, controller, data, rspec, mongrel, MySQL, postgresSQL, tokyo, MRI, jruby, rubinius, memcached, exception, metaprogramming, reflection.” Let’s also say we announce that the challenge phrase is I am not a big believer in fortune telling

To submit a contest entry, you would follow us on Twitter and tweet your best entry, e.g:
“@engineyard Rubinius one eight six active active record memcached exception JRuby DHH TOKYO sdfe3″

We will take the SHA-1 hash of this phrase: Rubinius one eight six active active record memcached exception JRuby DHH TOKYO sdfe3 which hashes to cd36b6dc8d4ed51b36dd7fce08f500392a7fb782 and compare it to the SHA-1 hash of I am not a big believer in fortune telling (which hashes to: 6cac827bae250971a8b1fb6e2a96676f7a077b60).

When we say “compare,” we mean that we will take the Hamming distance between the two hashes; the sum of the count of dissimilar bits when the hex hashes are converted to binary.

For example, here the binary of cd36...etc. is:
1100110100110110...etc.
and the binary of 6cac...etc. is:
0110110010101100...etc.

So calculating the Hamming distance is done as follows:
- first two bits (1 vs 0) don't match -> +1 to Hamming distance
- second two bits (1 vs. 1) do match -> no change to Hamming distance
- third two bits (0 vs. 1) don't match -> +1 to Hamming distance

etc.

In the case of the complete example hashes above, the Hamming difference is 74. If you are the submitter with the lowest Hamming distance, you win the prizes – it’s that simple ;)

Extra Prize: If you manage to achieve a Hamming distance of zero, we’ll throw in a MacBook Pro: you are either highly improbable, have mad algorithm cracking skills or you work for the NSA, any of which makes you cool enough to deserve random goodness and recognition. Note: we know the probability of anyone getting to a zero Hamming distance is truly vanishingly small, but we wanted to acknowledge anyone making it there!

There are some obvious brute force strategies to win this prize. We’d suggest building a really fast word permutation algorithm, and finding a fast SHA-1 hash algorithm. Then find a way to get your hands on a whole bunch of computation for the 24 hours that the contest will run (hmm… perhaps the cloud would be useful).

More details and conditions:

  1. Only US ASCII printable characters in your custom five character string please (we really want to avoid Unicode rat-holes)
  2. The words in your string must be single space separated; no other punctuation is allowed
  3. Spamming new entries as you find better ones is a fail whale; limit your contest tweets to a maximum of five
  4. The dictionary will be a Macintosh TextEdit file in RTF format, with each word on a separate line, and no white-space.
  5. In the case of a tied Hamming distance (entirely possible), the winner will be chosen by lottery among people with the best distance
  6. You may permute capitalization for the dictionary words (i.e. you may use Ruby, rUby, RUBY, and RUBy)
  7. Please scrub your custom five character string for the five words you can’t say on television
  8. If the exact same string is submitted multiple times, only the first submission counts
  9. Employees and contractors of Engine Yard, and their family members are not eligible

Okay, so maybe “It’s that simple” was a bit, well, over-simplified, but we’re confident we’ll get some great submissions. The Ruby community is nothing if not persistent, creative and intelligent — so show us what you’ve got!

Miscellaneous clarifications

1) When you convert the hexadecimal hash to binary — you need to convert the hex to the equivalent number in binary e.g. “c” = “12″ in decimal = “1100″ in (big endian) binary. Make sure that your binary conversion function is NOT treating “c” as ASCII letter “c” — which gets you the completely different answer of “63″ in decimal or “01100011″ in binary.
2) Be careful if you end up using string hashing functions in C. Remember that C strings are null terminated, and from reports we’re getting, at least some string functions out there take the Null string terminator ( “”) as an input to the hash function. This will get you in trouble because we will not be including a null terminator when we calculate the hash of your tweets. Naturally, we will treat your tweet as a (sane) Ruby string.

Another example for folks

People have asked for another example to test their hashing algorithms so here it is:

Example challenge phrase #2:
What you write today will become legacy
which hashes to:
7f83e6b422af5ca4e3112486aea3e702e98a894e or in hex to binary (big-endian):
0111 1111 1000 0011 1110 0110 1011 0100 0010 0010 1010 1111 0101 1100 1010 0100 1110 0011 0001 0001 0010 0100 1000 0110 1010 1110 1010 0011 1110 0111 0000 0010 1110 1001 1000 1010 1000 1001 0100 1110

Example contest entry tweet #2:
@engineyard RuBy one eight six rspec mongrel MRI jruby jruby memcached exception reflection utf8E

We will take the hash of RuBy one eight six rspec mongrel MRI jruby jruby memcached exception reflection utf8E
which hashes to:
075a32acb1816b570607189475ebbbaccce8b79f or in hex to binary (big-endian):
0000 0111 0101 1010 0011 0010 1010 1100 1011 0001 1000 0001 0110 1011 0101 0111 0000 0110 0000 0111 0001 1000 1001 0100 0111 0101 1110 1011 1011 1011 1010 1100 1100 1100 1110 1000 1011 0111 1001 1111

The hamming distance between these two hashes (again, remember treating the hash as HEXADECIMAL, NOT ASCII) is 80.

Example dictionary file: sampledictionary

Share this post:
  • email
  • Digg
  • del.icio.us
  • Reddit
  • Slashdot
  • StumbleUpon
  • Technorati
  • Twitter
  • Google Bookmarks
  • Facebook
  • LinkedIn
Popularity: 4% |
Rate this post: 1 Star2 Stars3 Stars4 Stars5 Stars
Loading ... Loading ...

This website uses IntenseDebate comments, but they are not currently loaded because either your browser doesn't support JavaScript, or they didn't load fast enough.

157 Responses to “Programming Contest! Win iPhone 3GS & $2,000 Cloud Credit”

  1. Jochem Jochem says:

    For the people reporting that the EngineYard's hashes or binaries are incorrect, first try this in Ruby: http://pastie.org/547790
    The expression results match the examples of EngineYard.

  2. tubby tubby says:

    What's the lowest score anyone has gotten in testing? Best I've gotten is 48 after 12 hours of testing? Anyone lower than that?

    • Angus Angus says:

      41 after about 30 minutes with a single thread in C, checking ~2.5 million hashes/second.

      Improvements in "best value" seem to drop off pretty quickly after mid-40s, I think that's the threshold between "easy" guesses and exponentially harder ones.

      I blogged some observations and comments from messing around with this: http://projectgus.com/?p=38

      • tubby tubby says:

        Thanks! I got 41 now as well. I enjoyed reading your article and agree 100% with what you wrote. The winner will have the most CPU horsepower. I'm doing C++ on an 2.5 year old AMD dual core box. I can do about 2.5 million a second. Someone who can do 20 million or more a second will blow me out of the water. I still might get lucky though… as it is a lottery ;)

      • MarkusQ MarkusQ says:

        Without algorithmic improvement each successive bit takes ~3.25 times as long, so to get two more bits (on average) you'll need an order of magnitude more resources. That means it's still got a large element of chance unless you do something tricky.

    • eric eric says:

      41 here also, after ~45 mins or so.

    • Josh Josh says:

      I'm sitting at 28 as my lowest, but thats using some random phrases and alternate words for everything and a clever algorithm, I figured it was all random pretty much anyhow. I'm wondering if Amatch::Hamming#match is returning the proper results. No, I don't plan on entering but it was fun exercise to write in about 5 minutes.

  3. [...] first read about the Engineyard Programming Contest yesterday and I thought it was a silly contest, winnable only through the application of raw brute [...]

  4. antirez antirez says:

    "memcached DHH eight Rubinius record memcached TOKIO Rubinius TOKIO DHH TOKIO eight IDIbc" with hamming distance of 36 against "I am not a big believer in fortune telling" is my best result so far. But never run my program for more than 20 minutes, just trying to get it faster for now.

    I'm using a C program with different optimizations. Hint: if you use in a smart way the fact you can postfix the words with the 5 char random string you'll get much more speed (2x at least).

    • tubby tubby says:

      Thanks for sharing. I've hit 41 now myself. Best so far for me. After about and hour. I think it's just blind luck. Speed and time help of course… may get luckier, faster. I'm a bit disappointed. I think there will be a lot of tied submissions… no clear winner.

    • MarkusQ MarkusQ says:

      I think several people here have independently found the optimization you allude to (e.g. see Angus's blog). There are also at least two other possibilities (see the questions being asked) but at this moment I've only got the "69 character answer" hack working myself.

      • Angus Angus says:

        FWIW, the optimization doesn’t need to use exactly 69 characters. From what I understand, and from a quick dip in the Nettle source to verify, SHA-1 only works on 64-byte blocks, so 69 characters get padded to 128 before the algorithm generates a digest. AFAIK there’s no computational advantage hashing padding instead of real data. However, hashing all 128 bytes each time is clearly slower than hashing 64 bytes once and then doing just the second 64 in an inner loop.

        • antirez antirez says:

          I'm using the same optimization indeed, but without to care about the 69 characters. If you save the state just before the final 5 chars update() call you can restore the structure and continue the computation.

          Of course it's wiser to avoid to go over the 1024 bits in total (128 bytes).

        • MarkusQ MarkusQ says:

          That optimization doesn't _need_ to use exactly 69, but if it does (or if it uses 68) the final 512 bits are knowable in advance and the second pass reduces to a pure function f(u160,u40) or f(u160,u32). If we are allowed to use repeated words in the preamble this reduces to g(u16,u32) or so, which (if we had the NSA's resources, and yet still cared about this contest) would be a solution right there.

          I don't say this will make much difference here(in fact, I suspect it won't), but in a real world problem being able to factor the space like that can be a huge win.

          – MarkusQ

  5. Adam Adam says:

    you don't actually need to get the binary value to get the hamming distance, you just need the integer value of the hash. (depending of course on how your calculate the hamming distance).

  6. Jordan Jordan says:

    So really.. getting a hamming of 0 would be so close to impossible its really just whoever randomly gets closest. Due ot the avalanche effect… "blah blah blah blah blah" has just as much chance of being a close hash as "holy cow holy cow holy cow holy cow" and "blAh blah blah blah"
    could make something totally different. The current complexity for finding a collision with sha1 is something like 2^52, but that is prolly assuming that any number/letter input combination could be used. It would be allot harder to find a combination that causes a collision with only 1000 words to choose from, if there even is a combination that could create a collision. It will be interesting to see how this turns out though :)

  7. N_B N_B says:

    I have determined the optimal algorithm for this contest:
    * Write a bot that constantly searches twitter for all posts @engineyard
    * Calculate the SHA1 of each and the hamming distance from the phrase
    * With only a few minutes to go, create five posts, each with the winningest phrase

    Unless someone posts a better answer in the closing five minutes or so, this will guarantee that you get in the lottery for the grand prize.

    And you scrubs are screwing around with dictionary parsing…heh.

    • Bob Bobson Bob Bobson says:

      "If the exact same string is submitted multiple times, only the first submission counts"

  8. antirez antirez says:

    My best entry so far, hamming distance=34 against the test entry: "This problems Often to with python servers client fixing Often to service 1YFWA", computed in around 10 hours using a single PC with four cores. I can compute 3.5 million hashes per second so basically this is the result of around 14M/s power. I hope to find at least another box before of the real entry.

  9. antirez antirez says:

    I mean "3.5 million hashes per second per core".

  10. MarkusQ MarkusQ says:

    My call four days before the contest: after doing some commute time mental math on it, I've concluded that I'll hit hamming distance 38±1 and the winner will be at 36±2. Anything <30 will amaze me.

    – MarkusQ

  11. antirez antirez says:

    I did some math, and I think this is the probability of two random strings having the hamming dinstance of 'D':

    (1/2^160)*(160!/((160-d)!*d!))

    This is why. For example imagine we want to compute the probability of an hamming distance of 2. I want the first bit to be equal, that occurs with a probability of .5, the seocond bit to be equal to, again .5, the others to not be equal, that is .5 each. Basically for any given pattern of equal/not equal we want there is a probability of .5^160 for a specific arrangement of equal/not-equal of the bits.

    But… with an hamming distance of 2 it is required that 2 are equals and two are not, in any kind of order. How many ways there are to arrange 160 items, two black and 158 white? 160!/(158!*2!), so I actually have to multiply the probability of a single arrangement for the number of possible arrangements.

    Ok, what are the practical implications of this? That with an optimized C program you can compute around 3.25 million sha1 hashes per second per core (this is what I got with my implementation). So if you have a server farm with 250 computers that happen to be quad cores, you need the following amount of hours to get a given hamming distance:

    6 hours for an h.d. of 33
    26 hours for an h.d. of 32
    108 hours for an h.d. of 31

    it looks like that university clusters can get us a solution with an HD of 32 or 31 with a bit of luck.

    People having just ten quadcore boxes can expect to reach an HD of 35 in 12 hours.

    • MarkusQ MarkusQ says:

      That was roughly my reasoning the other day (see my point in the middle of page 2), but I also applied a guesstimate of how much resources people will actually be willing to throw at it. People with 250 quad core servers generally have better things to do with them.

      – MarkusQ

  12. Henry Henry says:

    A 20,000 CPU Amazon EC2 cluster for one hour costs $2000.

  13. rkb rkb says:

    What time on the morning of the 20th? thanks!

  14. hosiawak hosiawak says:

    @N_B: I believe that the entries that enter the lottery are the ones with THE SAME HAMMING DISTANCE not the same text content. In an unlikely event of someone posting exactly the same message as someone else I think only whoever posted it first should be considered. Otherwise it doesn't make sense due to the cheat you just described.

  15. David Balatero David Balatero says:

    Is there a reason the dictionary is in a ghetto-ass RTF format? Why not plain-text…

  16. Tony Perrie Tony Perrie says:

    Assuming that printable ASCII charcters excludes space in the random string. So, printable chars are decimal 33 to 126?

  17. How many words long has to be the string? Can I repeat the dictionary words ?
    For example, supposing "one" is a word from the dictionary, can my submission be:

    "one one one"

    • remi remi says:

      Good question! EngineYard … do you have an answer?

      "a sequence of twelve words" … "from a 1,000 word dictionary we will provide"

      The rules don't seem to say, one way or another … :/

  18. And can the five random chars at the end of the string be numbers only?

    like:
    "My sentence is really random 12345"

    • remi remi says:

      Any US ASCII prinatable characters are allowed:

      "Only US ASCII printable characters in your custom five character string please (we really want to avoid Unicode rat-holes)"

  19. tubby tubby says:

    There is some cuda code and binaries in the Nvidia forum. So, anyone with a 8xxx or newer Nvidia graphics card can beat the pants of everyone else. No programing experience or clever sha1 tricks required… http://forums.nvidia.com/index.php?showtopic=1023...

    Well, there goes my CPU based C++ code. I got up to 2.5 mill a second, but that won't stand up to a a newer GPU and CUDA.

    • Bob Denver Bob Denver says:

      That’s pretty impressive. It’s ready to attack the contest now, so I think everyone with an NVidia board is just going to use it to beat all of us. 180M/sec is so much faster than my 2M/sec CPU, it’s pointless to compete.

      • Angus Angus says:

        There is one last chance for victory:

        1) Create web page with Javascript implementation of problem, and AJAX twitter client. Looking at Sunspider, should get about 100 hashes/sec if you're lucky.

        2) Post link everywhere with "Win a free iPhone, just keep this page open for 30 hours!"

        3) Go viral, get billions of people opening the page.

        4) Pocket your cool $2000 of cloud credit.

        ;)

      • Ivan Ivan says:

        It's not pointless actually. It's impossible to cover whole range (2^80) no matter how fast you can do computations, so luck factor is a key here. For example, I've atm about 600M hashes/sec with my ATI GPUs but after several hours I'm still on length == 36 while somebody above got 34 with CPU only being 100x times slower.

  20. duderino duderino says:

    Does anyone know what time the contest will start?

  21. can I only have 2 random chars at the end of my string? or if I opt by adding random chars, I have to add five?

    • tubby tubby says:

      My understanding is you may have 0 – 5 appended to the end of the twelve words.

      So, having twelve words with none or one or two or three or four or five chars appended to the end is fine. Six and above are bad. Got it?

  22. bantic bantic says:

    When are you going to post the challenge sha1?

  23. nickfrench engineyard says:

    We're posting the challenge at noon pacific time– as we mentioned in the blog post, you will have 30 hours to search before the 6pm cutoff tomorrow.

  24. [...] a few blades and other random desktop machines we had lying around. We’re throwing it at the Engine Yard Contest just to try out Pydra. The trial attempts have been good, but this was more about kicking the tires [...]

  25. [...] other day Engine Yard started an interesting contest (which is probably over, by the time that you’re reading [...]

  26. [...] I fail to submit my phrase to engineyard contest 23 07 2009 There is a programming contest which is conducted by engineyard. The contest is interesting so I planned to participate. I wrote [...]

  27. [...] other day Engine Yard started an interesting contest (which is probably over, by the time that you’re reading [...]

  28. [...] July 14th, hosting company Engine Yard posted a programming contest that was (presumably) to promote their services and generally create PR. First and second prize [...]