Blog

Programming Contest! Win iPhone 3GS & $2,000 Cloud Credit

By | July 14th, 2009 at 9:07AM

We’re kicking off a programming contest today that is sure to challenge even the most comp-sci heavy engineers out there, and we’re excited to see what you all come up with. With the difficulty of the challenge in mind, we’ve got some great prizes for the winner: an iPhone 3GS AND $2,000 of Cloud (Flex or Solo) credit. Now to jump right in and answer all your questions…

What is the contest?

You must tweet a sequence of twelve words that when hashed is bit-wise closest to a hash of a challenge phrase that we will announce the morning of July 20th.  All words must be from a 1,000 word dictionary we will provide at that same time. You are allowed to append up to five random characters to the end of your entry. We’re pretty confident you’ll want to write a program to automate the finding of close matches, so announcing this a week in advance should give you enough time to get your programs up and running.

How do I enter?

To enter the contest, follow @engineyard on Twitter and tweet your best candidate word sequence before 6pm Pacific Time on July 21st. This means you have about 30 hours between the availability of the challenge phrase and dictionary, and the entry submission cut-off time.

As previously mentioned, the winner of the contest will get an iPhone 3GS AND $2,000 of Cloud (Flex or Solo) credit). [You can also choose an alternative of load test credits worth $2,000 from browsermob]. Second prize is another iPhone 3GS.

So how does it work exactly? (Update! Example Now Clearer!)

Let’s take an example: a dictionary excerpt is: “Cloud, Ruby, DHH, one, eight, six, active, record, controller, data, rspec, mongrel, MySQL, postgresSQL, tokyo, MRI, jruby, rubinius, memcached, exception, metaprogramming, reflection.” Let’s also say we announce that the challenge phrase is I am not a big believer in fortune telling

To submit a contest entry, you would follow us on Twitter and tweet your best entry, e.g:
“@engineyard Rubinius one eight six active active record memcached exception JRuby DHH TOKYO sdfe3″

We will take the SHA-1 hash of this phrase: Rubinius one eight six active active record memcached exception JRuby DHH TOKYO sdfe3 which hashes to cd36b6dc8d4ed51b36dd7fce08f500392a7fb782 and compare it to the SHA-1 hash of I am not a big believer in fortune telling (which hashes to: 6cac827bae250971a8b1fb6e2a96676f7a077b60).

When we say “compare,” we mean that we will take the Hamming distance between the two hashes; the sum of the count of dissimilar bits when the hex hashes are converted to binary.

For example, here the binary of cd36...etc. is:
1100110100110110...etc.
and the binary of 6cac...etc. is:
0110110010101100...etc.

So calculating the Hamming distance is done as follows:
- first two bits (1 vs 0) don't match -> +1 to Hamming distance
- second two bits (1 vs. 1) do match -> no change to Hamming distance
- third two bits (0 vs. 1) don't match -> +1 to Hamming distance

etc.

In the case of the complete example hashes above, the Hamming difference is 74. If you are the submitter with the lowest Hamming distance, you win the prizes – it’s that simple ;)

Extra Prize: If you manage to achieve a Hamming distance of zero, we’ll throw in a MacBook Pro: you are either highly improbable, have mad algorithm cracking skills or you work for the NSA, any of which makes you cool enough to deserve random goodness and recognition. Note: we know the probability of anyone getting to a zero Hamming distance is truly vanishingly small, but we wanted to acknowledge anyone making it there!

There are some obvious brute force strategies to win this prize. We’d suggest building a really fast word permutation algorithm, and finding a fast SHA-1 hash algorithm. Then find a way to get your hands on a whole bunch of computation for the 24 hours that the contest will run (hmm… perhaps the cloud would be useful).

More details and conditions:

  1. Only US ASCII printable characters in your custom five character string please (we really want to avoid Unicode rat-holes)
  2. The words in your string must be single space separated; no other punctuation is allowed
  3. Spamming new entries as you find better ones is a fail whale; limit your contest tweets to a maximum of five
  4. The dictionary will be a Macintosh TextEdit file in RTF format, with each word on a separate line, and no white-space.
  5. In the case of a tied Hamming distance (entirely possible), the winner will be chosen by lottery among people with the best distance
  6. You may permute capitalization for the dictionary words (i.e. you may use Ruby, rUby, RUBY, and RUBy)
  7. Please scrub your custom five character string for the five words you can’t say on television
  8. If the exact same string is submitted multiple times, only the first submission counts
  9. Employees and contractors of Engine Yard, and their family members are not eligible

Okay, so maybe “It’s that simple” was a bit, well, over-simplified, but we’re confident we’ll get some great submissions. The Ruby community is nothing if not persistent, creative and intelligent — so show us what you’ve got!

Miscellaneous clarifications

1) When you convert the hexadecimal hash to binary — you need to convert the hex to the equivalent number in binary e.g. “c” = “12″ in decimal = “1100″ in (big endian) binary. Make sure that your binary conversion function is NOT treating “c” as ASCII letter “c” — which gets you the completely different answer of “63″ in decimal or “01100011″ in binary.
2) Be careful if you end up using string hashing functions in C. Remember that C strings are null terminated, and from reports we’re getting, at least some string functions out there take the Null string terminator ( “�”) as an input to the hash function. This will get you in trouble because we will not be including a null terminator when we calculate the hash of your tweets. Naturally, we will treat your tweet as a (sane) Ruby string.

Another example for folks

People have asked for another example to test their hashing algorithms so here it is:

Example challenge phrase #2:
What you write today will become legacy
which hashes to:
7f83e6b422af5ca4e3112486aea3e702e98a894e or in hex to binary (big-endian):
0111 1111 1000 0011 1110 0110 1011 0100 0010 0010 1010 1111 0101 1100 1010 0100 1110 0011 0001 0001 0010 0100 1000 0110 1010 1110 1010 0011 1110 0111 0000 0010 1110 1001 1000 1010 1000 1001 0100 1110

Example contest entry tweet #2:
@engineyard RuBy one eight six rspec mongrel MRI jruby jruby memcached exception reflection utf8E

We will take the hash of RuBy one eight six rspec mongrel MRI jruby jruby memcached exception reflection utf8E
which hashes to:
075a32acb1816b570607189475ebbbaccce8b79f or in hex to binary (big-endian):
0000 0111 0101 1010 0011 0010 1010 1100 1011 0001 1000 0001 0110 1011 0101 0111 0000 0110 0000 0111 0001 1000 1001 0100 0111 0101 1110 1011 1011 1011 1010 1100 1100 1100 1110 1000 1011 0111 1001 1111

The hamming distance between these two hashes (again, remember treating the hash as HEXADECIMAL, NOT ASCII) is 80.

Example dictionary file: sampledictionary

  • http://gocomics.com Samuel

    Wow, so now anyone who is searching for one of your 1,000 words in twitter will most likely be pointed towards Engine Yard.

    Well played.

    Samuel

  • Gedanken

    Cool Marketing!
    Cost = 1 x iPhone 3GS + *$2000*

  • Joc

    US only?

  • MrMr

    I get a totally different hash for the second phrase "I’m not a big believer in fortune telling" (without the quotes), the hash for the first sentence however is right.
    What could be the reason for this?

  • http://intensedebate.com/people/LuigiMontanez Luigi Montanez

    Already got my highly-optimized entry done:

    @engineyard Best Ruby on Rails cloud computing server hosting solution on-demand deployment management FTW!!

  • francois

    Yes, I'd like to know if Canada will be considered. Thanks!

  • http://intensedebate.com/people/wifelette Leah Silber

    @Joc:

    The contest is open to anyone, but the iPhone prize will be the standard North American version sold at Apple stores. Hope that works for you!

  • Michael Mullany

    Ooops. We changed the example phrase and forgot to update the hash. I will correct right now (sorry!)

  • http://intensedebate.com/people/wifelette Leah Silber

    Awwe.. thanks Luigi! We try :D

  • http://intensedebate.com/people/wifelette Leah Silber

    Canadians are just as awesome as the rest of us ;) Just keep in mind, as noted below, that we'll be purchasing the iPhone at an American Apple store, so you'll need to make sure that works for you :)

  • http://twitter.com/herrherr Chris

    By no means I am a good programmer nor a comp-sci heavy engineer, but I just tried to understand what this challenge is about since I am a curious guy.
    But when trying to reconstruct your example I don't get the same result as you did.
    Here is what I did:
    - generate a SHA1 hash out of the two sentences
    - convert the hash into binary
    - calculate the hamming distance
    But I always get 114 as the distance.

    Would be great if someone has a clue what I did wrong.
    http://pastie.org/545774

  • http://intensedebate.com/people/LuigiMontanez Luigi Montanez

    I'm confident that come contest time, that entry will coincidentally have a Hamming distance of not just 0, but of negative infinity.

  • Henry

    If it were $1M like the Netflix Prize, I would have tried. Not for as little as $2000. yes it is a Marketing ploy.
    Someone can go setup Amazon's Elastic MapReduce system
    http://aws.amazon.com/elasticmapreduce/
    fire up thousands of images and do it.
    SHA-1 digest is so random that even if you change one character from lowercase to uppercase the results will be very different and unpredictable.

    seems like a brute force attack will work. it doesn't need much *grey matter* to solve a brute force problem. it's an scaling problem.

  • Darnel

    What format will the dictionary be in?

  • Michael Mullany

    The dictionary will be a Macintosh TextEdit file with each valid word on a separate line with no white-space.

  • chris

    as far as i can tell,

    "I'm not a big believer in fortune telling" SHA1=c89afb8107a21d49d76e2d6e7c426a3658bf5255

    Hamming distance = 36

  • Mark

    What's the tie-breaker in case of two submissions with the same hamming distance, or (a special case of that) two identical submissions?

  • Dan

    In the case of a tied Hamming distance (entirely possible), the winner will be chosen by lottery among people with the best distance

  • Mark

    Bah, I see it now. Lottery. Would have been better if it was first submission wins, imho. But you get different dynamics – with the lottery option, people will use up all their time before posting in the last minute or so. Otherwise they would have spaced their postings over the contest period. So maybe lottery is what you want, if you're going for one burst at the end.

  • Michael Mullany

    Hi Chris,

    The second hash (as pointed out by MrMr) was incorrect
    Corrected hash is: d325e29428175a863b105555f086d0fd08bc5a27 (no quotes, no whitespace).

    – michael

  • Dan

    Is the string limited to using a single space between words or can multiple spaces be used between words? Can leading and trailing spaces be used?

    Thanks

  • http://intensedebate.com/people/wifelette Leah Silber

    Hi Dan,

    Single space between words only :)

  • http://holloway.me Josh Holloway

    I'm just in the process of learning programming, so I don't have the prowess to tackle this problem. However, it seems the actual algorithm for solving the problem is relatively simple, so won't this just come down to who has the most processing power at their disposal? I understand that 30 hours is not nearly enough to reasonably expect anyone to get a Hamming distance of 0, but with enough time, it's inevitable, right? Someone please correct me if I'm wrong.

  • http://gilesbowkett.blogspot.com Giles Bowkett

    There are seven words you can't say on television (not five).

  • Marcin Raczkowski

    Congratulations on great marketing plan :) I'm sure many will follow :) Not many people from ruby comunity will be able to partivipate If their secondary language is not assembler, c or scheme

  • http://intensedebate.com/people/wifelette Leah Silber

    Indeed — if I learned anything from George Carlin it was that ;)

    However… two of the seven words are over five characters, and thus cannot be used for the five character string for that reason alone.

  • http://twitter.com/herrherr Chris

    For me it is:
    042edf75c0438c8faa11b1d7189464d9076ac200
    I guess the problem is the _'_ .

  • Nanda

    yep, you are bound to get close to hamming distance of 0 if you work for NSA…its because u can have access to Huge clusters and/or supercomputers. :)

  • http://intensedebate.com/people/wifelette Leah Silber

    Hi Josh,

    The key is that your solution has to be efficient enough to make it happen on the cheap – which will lead to creative and innovative solutions.

    Yes, you could spin up a ton of EC2 instances, or rent time on your local NASA mainframe (okay, probably not, but you get it), but what's the fun in that, and would those really be good investments? It's an algorithms problem — even if you do do distributed computing, how to you efficiently avoid duplicating work? How do you write the program efficiently?

    Focus on the joys of writing a crafty program — it should be an enjoyable challenge :)

  • http://test.com Mars

    Gr8 marketing idea

  • JAH

    The apostrophe in "I’m not a big believer in fortune telling" is actually the Unicode right quotation mark (#8217).

  • JAH

    The apostrophe in "I'm" is actually the Unicode right quotation character (#8217).

  • http://intensedebate.com/people/wycats wycats

    Two of them are longer than 5 characters ;)

  • chris

    right-o, is that on purpose or automatic conversion from the blog software?

  • Dan

    The point is that finding yourself a cluster of machines, building a clever algorithm (hint: you can probably do better than totally random search), and automating and parallelizing the algorithm and then testing the hell out of it will take up any programmer's free time for a week – at least.

  • @sunneach

    confirmed your result.
    However they do not use <<'>> ascii(39), they are using <<’>> which has a code 8217 in some representation. How about the "simple" ASCII rule here???

  • Ben

    Even with the correct hash I get 114 as the distance using that conversion tool.

  • Ben

    Sorry, I mean 116.

  • aslam

    "….We will take the SHA-1 hash of this phrase: “Rubinius one eight six active active record memcached exception JRuby DHH TOKYO sdfe3″ which hashes… "
    - whats the sdfe3 at the end of the phrase for. It doesn't seem to be a part of the example dictionary excerpt…

  • Cinnyb

    Nicely played EY Marketing. They should all get iPhone 3Gs for this teaser….

  • Michael Mullany

    Aslam — you're allowed to postpend 5 random US ASCII printable characters to your tweet that's what the example shows

  • Michael Mullany

    To avoid confusion, I'm going to redo that section with code formatting and in US ASCII only.

  • http://fb-contrib.sf.net MeBigFatGuy

    aslam: You are allowed to append up to five random characters to the end of your entry

  • Michael Mullany

    Hi Chris, Thanks for pointing out the wisdom of avoiding unicode conversion rat-holes. I just updated the example with US ASCII only examples and showed you an example of how to calculate Hamming distance. Good luck with the contest!

  • http://footle.org Brad G.

    Does it have to be 12 *different* words from the dictionary?

  • http://paulhart.ca Paul Hart

    If you are of an academic persuasion, the current state-of-the-art for finding SHA-1 collisions is 2^52. This was only announced in the last couple of months.

    Paper available at http://eprint.iacr.org/2009/259.pdf

  • http://www.ahadamdani.com Ahad L. Amdani

    It doesn't seem to specify any technology use. Meaning, we may write our automation application in any language/platform of choice and simply 'tweet' the result?

  • Mike W.

    Has anyone gotten the same hamming distance of 74 in the example. I can't seem to get that result.

  • brainopia

    You posted wrong binaries. The correct ones are:

    cd36…etc. – 0110001101100100…etc.

    6cac…etc. – 0011011001100011…etc.

    You can check these values in ruby with unpack('B*') or online via http://www.geek-notes.com/tools/17/text-to-binary…

    Furthermore, the correct Hamming distance is 110. (http://gist.github.com/147217)

  • Mike W.

    That is what I am getting. Wasn't sure if I was doing something wrong.

  • http://intensedebate.com/people/wifelette Leah Silber

    Pretty sure the problem is that you're using unpack on the SHA1 String, which is ASCII. So what you're getting is the result of converting the ASCII string, rather than the hex String, to binary. The easiest way to get the binary solution in Ruby (that I know of) is "%0160b" % sha1.to_i(16)

    Hope that makes sense :)

  • http://intensedebate.com/people/wifelette Leah Silber

    Yep!

  • Michael Mullany

    Repeats are fine.

  • brainopia

    gotcha, thanks for clarification ;)

  • http://www.netcrema.com/?p=7522 Programming Contest! Win iPhone 3GS & $2,000 Cloud Credit | Engine Yard Blog « Netcrema – creme de la social news via digg + delicious + stumpleupon + reddit

    [...] Programming Contest! Win iPhone 3GS & $2,000 Cloud Credit | Engine Yard Blogengineyard.com [...]

  • Sara

    Any bonus points for a sentence that makes grammatical sense? e.g. "The quick brown fox jumped over the lazy dog and liked it" Seems like that should be worth a tie-breaker…

  • http://blog.tracefun.com Jamie Macey

    String#hex is even easier

  • Knodi

    Can't get my sha1 to match in C but matchs in ruby.
    http://pastie.org/546246

  • peele

    and don't forget you can '8'.to_s(2) too

  • Michael Mullany

    Just a guess, but perhaps your C SHA-1 implementation is taking the C string terminator ()as input?

  • http://weatherfinder.info Jeremy

    I agree with brainopia that their example's Hamming distance is 110

  • http://intensedebate.com/people/wifelette Leah Silber

    Hi Sara —

    Not in the current rules, but we'll keep it in mind for future contests :)

  • http://intensedebate.com/people/wifelette Leah Silber

    Hi Jeremy,

    I'm guessing you're making the same miscalculation as some other folks: pretty sure the problem is that you're using unpack on the SHA1 String, which is ASCII. So what you're getting is the result of converting the ASCII string, rather than the hex String, to binary. The easiest way to get the binary solution in Ruby (that I know of) is "%0160b" % sha1.to_i(16)

    Hope that makes sense :)

  • http://intensedebate.com/people/wifelette Leah Silber

    Hi Ben,

    Pretty sure the problem is that you're using unpack on the SHA1 String, which is ASCII. So what you're getting is the result of converting the ASCII string, rather than the hex String, to binary. The easiest way to get the binary solution in Ruby (that I know of) is "%0160b" % sha1.to_i(16)

    Does that work it out for you?

  • http://intensedebate.com/people/wifelette Leah Silber

    The post has been updated to correct for this :)

  • Bill

    Can target words be repeated? The example suggests so because "active" is repeated.

  • medina

    To get the sha1sums posted, avoid appending extra chars to your input:

    $ echo -n 'I am not a big believer in fortune telling' | sha1sum
    6cac827bae250971a8b1fb6e2a96676f7a077b60 –

    $ echo -n 'Rubinius one eight six active active record memcached exception JRuby DHH TOKYO sdfe3' | sha1sum
    cd36b6dc8d4ed51b36dd7fce08f500392a7fb782 -

  • http://www.theskolor.net Skolor

    To verify: Will the phrase to get to be one of the sets in the dictionary? Does the phrase to match fit into the requirements for the contest?

  • http://www.mashd.cc/ James Harton

    Hamming requires both strings to be the same length. I'm currently zero-padding the shorter string to the correct length, but this could be very wrong. What are others doing?

  • http://www.blik.it hosiawak

    Are we limited to any particular technology/programming language or is anything allowed ?

  • Tienshiao

    You should be hamming the SHA1 hash, which is always the same length.

  • Slava

    James Harton: I guess others are comparing hashes instead of source strings.

  • http://www.mashd.cc/ James Harton

    I'm comparing the hashes, but nettle seems to give me different length hashes. Odd:

    > hash("foo");
    (1) Result: "beec7b5ea3ffdbc95ddd47f3c5bc275da8a33"
    > hash("fooff");
    (2) Result: "2576bf457ccd4d1ed275be6973c447a9878f334"
    > hash("bar");
    (3) Result: "62cdb72ff920e5aa642c3d406695dd1f01f4d"
    > hash("barf");
    (4) Result: "dd88611b3ac9bf96c2d0f9828b87f2cb857fac2e"

  • http://intensedebate.com/people/SaxtonHale SaxtonHale

    Are duplicate words permitted, such as in "six tokyo one six controller rubinius controller DHH data tokyo Cloud controller", where "six", "Tokyo", and "controller" are repeated?

  • http://afreshcup.com/2009/07/15/double-shot-496/ Double Shot #496 « A Fresh Cup

    [...] Programming Contest! Win iPhone 3GS & $2,000 Cloud Credit – Engine Yard is planning a contest that is going to make Twitter even more annoying for a while. [...]

  • Koraktor

    Is it allowed to use words from the dictionary multiple times?
    Example: Would "ruby ruby Ruby RUBY ruby ruby Ruby RUBY ruby ruby Ruby RUBY" be allowed?

  • pete g

    A question: do we hash the strings including a null-terminating byte, or without?

  • http://intensedebate.com/people/wifelette Leah Silber

    In the case of identical submissions, the first one counts.

  • http://intensedebate.com/people/wifelette Leah Silber

    Nope :) If you can build it, you can use it!

  • http://intensedebate.com/people/wifelette Leah Silber

    Yep! Dupes are fine :)

  • http://intensedebate.com/people/wifelette Leah Silber

    No issues with that submission, except that it's hard to say five times fast ;)

  • Michael Mullany

    We won't be using a termination byte when we hash, so make sure your C functions are not string hash functions that take the null-terminating byte as an input (aka we're treating the tweet as a Ruby etc. string, not a C String)

  • http://www.brianlane.com bcl

    Here's some example python code, and discussion at Hacker News – http://news.ycombinator.com/item?id=705249

  • Anthony

    Is there only 1 prize and only the person who entered the lowest hamming sentence quickest get the prize. Or anyone who got the lowest hamming distance gets it, i.e., there could be multiple winners and time isn't the factor as long as you submit the answer by the deadline?

  • Anthony

    nvm, it says lottery draw.

  • tubby

    The 5 random chars at the end are optional, right? So long as I have 12 words from the list, I'm OK?

  • Anthony

    Are spaces allowed for the 5 random chars at the end? e.g. a_cde, ab_de.
    I suppose the only space not allowed is right at the beginning or right at the end of the 5 random chars?

  • http://engineyard.com Tom Mornini

    Yes, tubby, that is correct, the 5 chars at the end are entirely optional.

  • Michael Mullany

    Any printable US ASCII character is good for the last five spacess

  • Dan

    nope, i'm pretty sure a brute force attack won't work. There are 1953840414726664053684327000 different hashes that an 1000-word dictionary with 12 choices can make. Even if you check a trillion hashes / sec, this would take over 62 million years to brute force.

    Go ahead, Henry.

    I just love it when people have no clue what they're talking about.

  • http://www.intensedebate.com/people/wifelette Leah Silber

    Hi Anthony,

    The winner receives an iPhone + $2000 in Cloud Credit, and the runner up receives an iPhone. Unfortunately, there can only be one winner and one runner up :)

    That said, if multiple people end up with the same lowest hamming distance, there will be a lottery.

  • Anthony

    Can we get an sample dictionay file (for instance with the words given above), in the format that will be supplied for the competition (ie, a textedit rtf file). I want to make sure I have a program to read and convert it to a usable form. Thanks.

  • @sunneach

    seen that – case dismissed thx

  • http://intensedebate.com/people/pjonesdotca pjonesdotca

    I notice that there is a difference in the reported binary ouput (this one has spaces for each hex digit) where as the first example was a long stream of the binary output.

    Will spaces be considered in the hamming distance?

  • grant

    The examples have a space between the dictionary words and the random string. Is that mandatory? Does it count against the 5 random characters (does not appear to in examples)?

  • http://intensedebate.com/people/charlieok39431 charlieok

    I'm wondering how else to attack this — I mean, if you could easily reverse a hash like this, that would basically undermine anything out there that depends on SHA1 for security, yes?

  • Jochem

    For the people reporting that the EngineYard's hashes or binaries are incorrect, first try this in Ruby: http://pastie.org/547790
    The expression results match the examples of EngineYard.

  • anon

    Then there is the obvious problem that you can simply observe what is submitted on twitter and resubmit the entry with the best score under your own name…

  • tubby

    What's the lowest score anyone has gotten in testing? Best I've gotten is 48 after 12 hours of testing? Anyone lower than that?

  • http://projectgus.com/?p=38 Project Gus » Blog Archive » Computer-powered lottery tickets : The Engineyard programming contest

    [...] first read about the Engineyard Programming Contest yesterday and I thought it was a silly contest, winnable only through the application of raw brute [...]

  • http://www.projectgus.com/ Angus

    41 after about 30 minutes with a single thread in C, checking ~2.5 million hashes/second.

    Improvements in "best value" seem to drop off pretty quickly after mid-40s, I think that's the threshold between "easy" guesses and exponentially harder ones.

    I blogged some observations and comments from messing around with this: http://projectgus.com/?p=38

  • http://invece.org antirez

    "memcached DHH eight Rubinius record memcached TOKIO Rubinius TOKIO DHH TOKIO eight IDIbc" with hamming distance of 36 against "I am not a big believer in fortune telling" is my best result so far. But never run my program for more than 20 minutes, just trying to get it faster for now.

    I'm using a C program with different optimizations. Hint: if you use in a smart way the fact you can postfix the words with the 5 char random string you'll get much more speed (2x at least).

  • tubby

    Thanks for sharing. I've hit 41 now myself. Best so far for me. After about and hour. I think it's just blind luck. Speed and time help of course… may get luckier, faster. I'm a bit disappointed. I think there will be a lot of tied submissions… no clear winner.

  • tubby

    Thanks! I got 41 now as well. I enjoyed reading your article and agree 100% with what you wrote. The winner will have the most CPU horsepower. I'm doing C++ on an 2.5 year old AMD dual core box. I can do about 2.5 million a second. Someone who can do 20 million or more a second will blow me out of the water. I still might get lucky though… as it is a lottery ;)

  • http://adamhunter.me Adam

    you don't actually need to get the binary value to get the hamming distance, you just need the integer value of the hash. (depending of course on how your calculate the hamming distance).

  • http://intensedebate.com/people/wifelette Leah Silber

    Thanks Jochem!

  • Jordan

    So really.. getting a hamming of 0 would be so close to impossible its really just whoever randomly gets closest. Due ot the avalanche effect… "blah blah blah blah blah" has just as much chance of being a close hash as "holy cow holy cow holy cow holy cow" and "blAh blah blah blah"
    could make something totally different. The current complexity for finding a collision with sha1 is something like 2^52, but that is prolly assuming that any number/letter input combination could be used. It would be allot harder to find a combination that causes a collision with only 1000 words to choose from, if there even is a combination that could create a collision. It will be interesting to see how this turns out though :)

  • http://intensedebate.com/people/wifelette Leah Silber

    Hi Anthony,

    Working on this, should have something soon :)

  • eric

    41 here also, after ~45 mins or so.

  • http://intensedebate.com/people/MarkusQ MarkusQ

    Without algorithmic improvement each successive bit takes ~3.25 times as long, so to get two more bits (on average) you'll need an order of magnitude more resources. That means it's still got a large element of chance unless you do something tricky.

  • http://intensedebate.com/people/MarkusQ MarkusQ

    I think several people here have independently found the optimization you allude to (e.g. see Angus's blog). There are also at least two other possibilities (see the questions being asked) but at this moment I've only got the "69 character answer" hack working myself.

  • N_B

    I have determined the optimal algorithm for this contest:
    * Write a bot that constantly searches twitter for all posts @engineyard
    * Calculate the SHA1 of each and the hamming distance from the phrase
    * With only a few minutes to go, create five posts, each with the winningest phrase

    Unless someone posts a better answer in the closing five minutes or so, this will guarantee that you get in the lottery for the grand prize.

    And you scrubs are screwing around with dictionary parsing…heh.

  • http://www.projectgus.com/ Angus

    FWIW, the optimization doesn’t need to use exactly 69 characters. From what I understand, and from a quick dip in the Nettle source to verify, SHA-1 only works on 64-byte blocks, so 69 characters get padded to 128 before the algorithm generates a digest. AFAIK there’s no computational advantage hashing padding instead of real data. However, hashing all 128 bytes each time is clearly slower than hashing 64 bytes once and then doing just the second 64 in an inner loop.

  • http://www.projectgus.com/ Angus

    I reckon you’re on the money, Markus. I think there’s also a good chance of an N-way draw.

  • http://invece.org antirez

    I'm using the same optimization indeed, but without to care about the 69 characters. If you save the state just before the final 5 chars update() call you can restore the structure and continue the computation.

    Of course it's wiser to avoid to go over the 1024 bits in total (128 bytes).

  • http://invece.org antirez

    My best entry so far, hamming distance=34 against the test entry: "This problems Often to with python servers client fixing Often to service 1YFWA", computed in around 10 hours using a single PC with four cores. I can compute 3.5 million hashes per second so basically this is the result of around 14M/s power. I hope to find at least another box before of the real entry.

  • http://invece.org antirez

    I mean "3.5 million hashes per second per core".

  • http://www.intensedebate.com/people/MarkusQ MarkusQ

    My call four days before the contest: after doing some commute time mental math on it, I've concluded that I'll hit hamming distance 38±1 and the winner will be at 36±2. Anything <30 will amaze me.

    – MarkusQ

  • http://invece.org antirez

    I did some math, and I think this is the probability of two random strings having the hamming dinstance of 'D':

    (1/2^160)*(160!/((160-d)!*d!))

    This is why. For example imagine we want to compute the probability of an hamming distance of 2. I want the first bit to be equal, that occurs with a probability of .5, the seocond bit to be equal to, again .5, the others to not be equal, that is .5 each. Basically for any given pattern of equal/not equal we want there is a probability of .5^160 for a specific arrangement of equal/not-equal of the bits.

    But… with an hamming distance of 2 it is required that 2 are equals and two are not, in any kind of order. How many ways there are to arrange 160 items, two black and 158 white? 160!/(158!*2!), so I actually have to multiply the probability of a single arrangement for the number of possible arrangements.

    Ok, what are the practical implications of this? That with an optimized C program you can compute around 3.25 million sha1 hashes per second per core (this is what I got with my implementation). So if you have a server farm with 250 computers that happen to be quad cores, you need the following amount of hours to get a given hamming distance:

    6 hours for an h.d. of 33
    26 hours for an h.d. of 32
    108 hours for an h.d. of 31

    it looks like that university clusters can get us a solution with an HD of 32 or 31 with a bit of luck.

    People having just ten quadcore boxes can expect to reach an HD of 35 in 12 hours.

  • http://intensedebate.com/people/MarkusQ MarkusQ

    That optimization doesn't _need_ to use exactly 69, but if it does (or if it uses 68) the final 512 bits are knowable in advance and the second pass reduces to a pure function f(u160,u40) or f(u160,u32). If we are allowed to use repeated words in the preamble this reduces to g(u16,u32) or so, which (if we had the NSA's resources, and yet still cared about this contest) would be a solution right there.

    I don't say this will make much difference here(in fact, I suspect it won't), but in a real world problem being able to factor the space like that can be a huge win.

    – MarkusQ

  • Bob Bobson

    "If the exact same string is submitted multiple times, only the first submission counts"

  • http://intensedebate.com/people/MarkusQ MarkusQ

    That was roughly my reasoning the other day (see my point in the middle of page 2), but I also applied a guesstimate of how much resources people will actually be willing to throw at it. People with 250 quad core servers generally have better things to do with them.

    – MarkusQ

  • Michael Mullany

    Those spaces are just the way that the <code> html tags format binary. They are not real spaces

  • Michael Mullany

    space is mandatory. does not count against 5 random characters

  • Henry

    A 20,000 CPU Amazon EC2 cluster for one hour costs $2000.

  • http://www.hack.net rkb

    What time on the morning of the 20th? thanks!

  • http://www.blik.it hosiawak

    @N_B: I believe that the entries that enter the lottery are the ones with THE SAME HAMMING DISTANCE not the same text content. In an unlikely event of someone posting exactly the same message as someone else I think only whoever posted it first should be considered. Otherwise it doesn't make sense due to the cheat you just described.

  • David Balatero

    Is there a reason the dictionary is in a ghetto-ass RTF format? Why not plain-text…

  • Henry

    thank you very much for your calculation. your big number is for a zero hamming distance, so please do not brag about it. with randomly trying with 20000 EC2 cores for 1 hour (which is possible if you want to take the risk, and spend 2000$ for it) you can pretty much reach a hamming distance to win the prize. and for how SHA-1 works, it's just a damn brute attack to lower the hamming distance. common sense huh?

  • tubby

    Select all and copy into a plain text file… problem solved, right?

  • http://involution.com Tony Perrie

    Assuming that printable ASCII charcters excludes space in the random string. So, printable chars are decimal 33 to 126?

  • http://regedor.com Miguel Regedor

    How many words long has to be the string? Can I repeat the dictionary words ?
    For example, supposing "one" is a word from the dictionary, can my submission be:

    "one one one"

  • http://regedor.com Miguel Regedor

    And can the five random chars at the end of the string be numbers only?

    like:
    "My sentence is really random 12345"

  • tubby

    There is some cuda code and binaries in the Nvidia forum. So, anyone with a 8xxx or newer Nvidia graphics card can beat the pants of everyone else. No programing experience or clever sha1 tricks required… http://forums.nvidia.com/index.php?showtopic=1023…

    Well, there goes my CPU based C++ code. I got up to 2.5 mill a second, but that won't stand up to a a newer GPU and CUDA.

  • http://remi.org remi

    Any US ASCII prinatable characters are allowed:

    "Only US ASCII printable characters in your custom five character string please (we really want to avoid Unicode rat-holes)"

  • http://remi.org remi

    Good question! EngineYard … do you have an answer?

    "a sequence of twelve words" … "from a 1,000 word dictionary we will provide"

    The rules don't seem to say, one way or another … :/

  • Bob Denver

    That’s pretty impressive. It’s ready to attack the contest now, so I think everyone with an NVidia board is just going to use it to beat all of us. 180M/sec is so much faster than my 2M/sec CPU, it’s pointless to compete.

  • http://www.projectgus.com/ Angus

    There is one last chance for victory:

    1) Create web page with Javascript implementation of problem, and AJAX twitter client. Looking at Sunspider, should get about 100 hashes/sec if you're lucky.

    2) Post link everywhere with "Win a free iPhone, just keep this page open for 30 hours!"

    3) Go viral, get billions of people opening the page.

    4) Pocket your cool $2000 of cloud credit.

    ;)

  • http://golubev.com Ivan

    It's not pointless actually. It's impossible to cover whole range (2^80) no matter how fast you can do computations, so luck factor is a key here. For example, I've atm about 600M hashes/sec with my ATI GPUs but after several hours I'm still on length == 36 while somebody above got 34 with CPU only being 100x times slower.

  • Josh

    I'm sitting at 28 as my lowest, but thats using some random phrases and alternate words for everything and a clever algorithm, I figured it was all random pretty much anyhow. I'm wondering if Amatch::Hamming#match is returning the proper results. No, I don't plan on entering but it was fun exercise to write in about 5 minutes.

  • duderino

    Does anyone know what time the contest will start?

  • http://regedor.com Miguel Regedor

    not sure, but I think it should be two hours from now.

  • http://regedor.com Miguel Regedor

    can I only have 2 random chars at the end of my string? or if I opt by adding random chars, I have to add five?

  • tubby

    My understanding is you may have 0 – 5 appended to the end of the twelve words.

    So, having twelve words with none or one or two or three or four or five chars appended to the end is fine. Six and above are bad. Got it?

  • http://regedor.com Miguel Regedor

    yes, it makes sense. thank you.

  • http://intensedebate.com/people/bantic bantic

    When are you going to post the challenge sha1?

  • http://intensedebate.com/people/engineyard engineyard

    We're posting the challenge at noon pacific time– as we mentioned in the blog post, you will have 30 hours to search before the 6pm cutoff tomorrow.

  • http://intensedebate.com/people/wifelette Leah Silber

    Noon pacific time – so, very soon!

  • http://intensedebate.com/people/wifelette Leah Silber

    Duplicate words are indeed allowed :) See the second sample provided for a usage example.

  • http://intensedebate.com/people/wifelette Leah Silber

    Hi hosiawak,

    You're correct. If the exact same phrase is submitted, only the first entry gets in.

  • http://intensedebate.com/people/wifelette Leah Silber

    Noon!

  • Michael Mullany

    Well, technically space is a printable ascii character, so go for it.

  • http://blogs.osuosl.org/kreneskyp/2009/07/20/pydra-status-update-july-09/ Pydra Status Update July ‘09 at Free Range

    [...] a few blades and other random desktop machines we had lying around. We’re throwing it at the Engine Yard Contest just to try out Pydra. The trial attempts have been good, but this was more about kicking the tires [...]

  • http://ejohn.org/blog/web-workers/ John Resig – Computing with JavaScript Web Workers

    [...] other day Engine Yard started an interesting contest (which is probably over, by the time that you’re reading [...]

  • http://prajwalaa.wordpress.com/2009/07/23/why-i-fail-to-submit-my-phrase-to-engineyard-contest/ Why I fail to submit my phrase to engineyard contest « Flame’s blog

    [...] I fail to submit my phrase to engineyard contest 23 07 2009 There is a programming contest which is conducted by engineyard. The contest is interesting so I planned to participate. I wrote [...]

  • http://www.dogfeeds.com/?p=457 Computing with JavaScript Web Workers « Dogfeeds IT Telescope

    [...] other day Engine Yard started an interesting contest (which is probably over, by the time that you’re reading [...]

  • http://fnord.phfactor.net/2009/07/23/paul-wins-an-iphone/ Fnord. » Blog Archive » Paul wins an iPhone!

    [...] July 14th, hosting company Engine Yard posted a programming contest that was (presumably) to promote their services and generally create PR. First and second prize [...]

  • http://www.freelottosystems.com/ Charlotte Thomas

    This sounds interesting..but how about an interested person from Philippines? Would it be possible?

  • Anonymous

     iPhone owners are the proudest techno owners of the latest technology gizmos.
    iPhone 5