Rubinius wants to help YOU make Ruby better

By | August 30th, 2010 at 4:08PM

It is a great time to be a Rubyist. This year we have already seen IronRuby 1.0 and JRuby 1.5, with Ruby 1.9 due to be released shortly. Ruby is simply becoming better and faster on every platform. And, wherever Ruby is, Rails is sure to be nearby. Rails 3 looks more awesome each day.

Recently, our very own Rubinius officially joined the ranks with a 1.0 release. We are excited to see folks trying it out. All the feedback and issues reported have been a great help. Many people are reporting that their apps “just work”.

With all this great news, the Ruby world looks rosy indeed. However, we can make Ruby even better. To do so, we need your help. You may not realize this, but the quality of the Ruby code you write can have a significant impact on how great we can make Ruby. I’d like to share some tips about how you can improve your Ruby code while helping us make Ruby better too.

0. Rubinius

Rubinius is a completely new implementation of Ruby. When Evan Phoenix started Rubinius, he put some stakes in the sand. Rubinius has a modern, bytecode virtual machine, a cutting-edge garbage collector, a just-in-time (JIT) compiler utilizing the awesome LLVM project, and a Ruby core library and bytecode compiler written in Ruby. We are only just getting started with 1.0. We have a whole list of features coming, including support for Windows and Ruby version 1.9, as well as improvements to the JIT compiler that should make Ruby several times faster, and removal of the global interpreter lock (GIL) so that your threads will execute Ruby code concurrently. Rubinius does a lot of things differently than MRI under the covers. As Rubinius has grown up, we’ve definitely seen a wide cross-section of Ruby code while working on features and compatibility. The tips for writing better Ruby code below are based on some of the challenges we have faced.

1. Sending Messages

Rubinius is unique among the various Ruby implementations in that it implements the Ruby core library primarily in Ruby. Even the primitive methods, operations implemented in C++ that must access the virtual machine directly, appear to other Ruby code as normal Ruby methods. Importantly, calling these primitive methods from Ruby code is like calling any other Ruby method.

Early on in the Rubinius project, a lot of attention was focused on the idea of Ruby in Ruby. This was a good idea for several reasons: Ruby is a more elegant and expressive language than C or Java, and Ruby programmers tend to understand Ruby code pretty well. This familiarity with Ruby makes Rubinius easier to develop and maintain, and more approachable for many Ruby developers. The validity of these reasons has been demonstrated in the life of the project. However, there are two other very important reasons that don’t attract quite as much attention.

The first of these is performance. As Evan often points out, Ruby is the currency of the Rubinius VM. It understands Ruby inside and out. The VM knows how to find a Ruby method, how to look up a constant, and what it means for an object to reference another object. The Rubinius VM operates on a special representation of Ruby code. This representation is often referred to as bytecode and is essentially a stream of instructions for the virtual machine. The JIT compiler, which can significantly improve Ruby performance, also operates on bytecode. What this means is that to the JIT, your program and the Ruby core library look an awful lot alike. So much, in fact, that the JIT compiler can mix them all together, which gives the optimizer much greater opportunity to generate really fast code.

The second reason is the consistency and elegance of an object-oriented language. When the Ruby core library is written in Ruby, you call a Ruby method, well, by calling a Ruby method. That may sound redundant, but I assure you, it is not. In MRI, for example, with the Ruby core library written in C, the code will often call directly to a C function rather than dispatching normally through Ruby method calls. What this means for you is that MRI may invoke “Ruby” functionality without engaging you in the conversation at all. That inconsistency may prevent you from using simple and elegant object-oriented code that extends the functionality of core classes.
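To make the difference concrete, here is a minimal sketch. LoudArray is a hypothetical subclass, not something taken from Rubinius or MRI, that counts how often its overridden each is invoked. On an implementation whose Array#map iterates internally without dispatching to each, which I believe is the case for MRI, the count should not change after calling map. On an implementation that builds map on top of each, it would.

```ruby
# Hypothetical subclass used only for illustration.
class LoudArray < Array
  attr_reader :each_calls

  # Count every dispatch to each, then defer to Array#each.
  def each(&block)
    @each_calls = @each_calls.to_i + 1
    super
  end
end

list = LoudArray.new([1, 2, 3])

list.each { |n| n }                 # goes through our override
calls_after_each = list.each_calls

doubled = list.map { |n| n * 2 }    # may or may not dispatch to each
calls_after_map = list.each_calls

puts "each calls after each: #{calls_after_each}"
puts "each calls after map:  #{calls_after_map}"
```

If the two numbers are the same, map never asked your each for anything, and your subclass was left out of the conversation.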

In contrast, when functionality is invoked through normal Ruby dispatch, your code can be elegant and participate in the process. However, this is a significant double-edged sword, as we have become painfully aware in Rubinius. When we implement all the complex behavior of the core library in Ruby, it’s quite possible to do something crazy, like remove all the Ruby methods we need to make an object work! That is pretty crazy, right? Fortunately, in this coding wild west, there is a very important principle that can lend some law and order.

2. Liskov Substitution Principle

You may have heard this term tossed around in discussions. If you haven’t, don’t worry, we’ll delve into this fairly intuitive idea. If you have, I hope to renew your commitment and respect for this principle.

So, what are we talking about here? Barbara Liskov and her collaborators were concerned with how to write reliable object-oriented software. As you know, one of the principal ideas in class-based object-oriented languages is inheritance, or the relationship between a class and its subclasses. What sort of rules should govern this relationship? What should we expect when we use a subtype in place of a supertype in our program? These are the questions that Barbara Liskov and others were pondering.

What they proposed is referred to as the Subtype Requirement, which they defined as:
Let q(x) be a property provable about objects x of type T. Then q(y) should be true for objects y of type S where S is a subtype of T.
(see Behavioral Subtyping Using Invariants and Constraints, by Barbara H. Liskov and Jeannette M. Wing.)

Let’s consider this in terms of some Ruby code. Suppose you have this class in your program:

  class FancyArray < Array
    def initialize(size)
       # ...
    end
  end

What is wrong with this picture? Well, in my Ruby code, I can do x = Array.new. But what happens when I attempt to use the FancyArray class in place of Array? If I do x = FancyArray.new, I will surely get an ArgumentError exception because FancyArray requires that I pass one argument when calling the new method.

Let’s phrase this in terms of the Subtype Requirement: Let x be an instance of Array. Then q(x) = the arity of the initialize method is -1. Let y be an instance of FancyArray, which is a subclass of Array. Then, by the Subtype Requirement, q(y) should hold as well: the arity of the initialize method is -1.

Now let’s relate the above to Ruby code and check if the Subtype Requirement holds:

irb(main):001:0> x = Array.instance_method(:initialize).arity
=> -1
irb(main):002:0> y = FancyArray.instance_method(:initialize).arity
=> 1
irb(main):003:0> x == y
=> false

It is clear from this that FancyArray does not conform to the Subtype Requirement. Consequently, code that expects to use an Array will not function correctly when a FancyArray is substituted. It’s important to also note that the Subtype Requirement applies to any observable property of the object. The example used in the paper is of a Stack and Queue. Both classes may provide push and pop methods, but the semantics of the methods are quite different between the two classes.
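One way to keep a subclass substitutable is to leave its constructor as flexible as the parent’s. The sketch below uses a hypothetical LabeledArray class (the name and its label attribute are assumptions made for illustration) that accepts everything Array.new accepts and sets its extra state separately.

```ruby
# Hypothetical subclass: accepts whatever Array#initialize accepts,
# so it can stand in for Array when code calls new on it.
class LabeledArray < Array
  attr_accessor :label

  def initialize(*args, &block)
    super           # forwards the same arguments and block to Array
    @label = nil    # extra state is filled in after construction
  end
end

array_arity   = Array.instance_method(:initialize).arity
labeled_arity = LabeledArray.instance_method(:initialize).arity

empty  = LabeledArray.new          # works, just like Array.new
filled = LabeledArray.new(3, 0)    # same arguments as Array.new(3, 0)
filled.label = "zeros"

puts array_arity == labeled_arity
puts filled.inspect
```

The extra information travels through an accessor rather than through a required constructor argument, so code that only knows about Array can still create instances.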

Now, you may say, “But, I have a very good reason for requiring an argument to new.” Well then, I would venture to say you have an important reason to consider the difference between composition and inheritance for designing your program.

3. Composition versus Inheritance

Of the three object-oriented principles—inheritance, encapsulation, and polymorphism—inheritance has been so abused there could be a 12-step program devoted entirely to it. Fortunately, the remedy for inappropriate use of inheritance is quite simple: compose your objects of other objects.

Inheritance models an “is a” relationship, while composition models a “has a” relationship. If your object is a String, then it will do all the normal String things just as a String would do them. This is very important. It needs to do String things not just externally, when you call the methods, but internally, when the other String methods call each other. Is your FancyTemplate class really a String? Then, for example, I should always be able to request its length. However, your FancyTemplate instance probably doesn’t have a length when it is being built. Therefore, String methods that may be employed during the construction phase could be highly confused. In such a case, I suggest your FancyTemplate has a String internally, and it can be urged to give you a representation of that String at some point in time. Yet, it is not a String from the perspective of inheritance and conforming to the Liskov Substitution Principle.
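As a rough sketch of the composition approach, the FancyTemplate below has a String rather than being one. The constructor, the add method, and the internal @buffer are assumptions for illustration; only the class name comes from the discussion above.

```ruby
# FancyTemplate *has a* String; it does not inherit from String.
class FancyTemplate
  def initialize
    @buffer = String.new   # internal String, hidden from callers
  end

  # Append a fragment while the template is being built.
  def add(fragment)
    @buffer << fragment.to_s
    self                   # return self to allow chaining
  end

  # Hand out a String representation on request.
  def to_s
    @buffer.dup
  end
end

template = FancyTemplate.new
template.add("Hello, ").add("world")

puts template.to_s
puts template.is_a?(String)
```

Because the template never claims to be a String, none of String’s methods can be surprised by its half-built state.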

Only you can tell whether your model is best represented by inheritance or composition. When designing your classes, be sure to consider the view from inside and out. If you are contorting your methods to act like the class you are inheriting from, perhaps your class only has one of those things, rather than being one of them. Most importantly, remember that you are not the only kid on the playground.

4. Playing Nicely

This is more about general advice than specific admonitions. We are lucky to have such a powerful, expressive language in Ruby. Opening a core class to patch a method is tremendously useful and powerful. However, remember that with great power comes great responsibility.

First and foremost, simply be conscious of what you are asking Ruby to do for you. I used this example earlier, and I’m going to repeat it because in Rubinius we have encountered this more times than we can count. Ruby is an object-oriented language. You cause computation to occur by sending messages to an object. How can the object work if it has no methods? (I say with my best Zoolander impersonation). If your code does:

  class SomeClass
    instance_methods(false).each { |m| undef_method m }
  end

you are (most likely) doing it wrong. There are many variations on this theme, but they all share the same problem: the assumption that those methods you are removing are as superfluous as Johnny’s appendix. I assure you, we don’t randomly add methods to classes in Rubinius. Again, your code may work fine in MRI when you do this because MRI calls C functions on that object behind your back with impunity. But, we do want to have nice things, right? If you ever wonder what consequences your code may have, just drop into the #rubinius channel on freenode. We will happily discuss it with you.
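If the goal is a blank-slate object, for example a proxy that forwards everything to another object, one alternative worth considering is to start from a class that is already nearly empty instead of stripping methods from a full one. The sketch below assumes Ruby 1.9 or later, where BasicObject is available; ForwardingProxy is a hypothetical name.

```ruby
# Hypothetical proxy: inherits the near-empty BasicObject instead of
# undefining methods that the implementation may rely on.
class ForwardingProxy < BasicObject
  def initialize(target)
    @target = target
  end

  # Forward any message we do not define to the wrapped object.
  def method_missing(name, *args, &block)
    @target.__send__(name, *args, &block)
  end
end

proxy  = ForwardingProxy.new([3, 1, 2])
sorted = proxy.sort   # answered by the wrapped Array
size   = proxy.size   # answered by the wrapped Array

puts sorted.inspect
puts size
```

Nothing is removed from any class here, so the methods an implementation needs internally stay where it put them.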

A related problem occurs when code inherits from a core Ruby class and redefines one of the core methods. When the core classes are implemented in Ruby, the methods may depend on one another to perform their tasks. For example, in Hash it would not be entirely unreasonable for each_value to be implemented in terms of each. Well, not unreasonable, that is, until you try to run REXML in the Ruby Standard Library. REXML has an Attributes class that inherits from Hash. The Attributes class then implements an each_attribute method. For good measure, it overrides each to use each_attribute. And each_attribute calls each_value. Waiter, I believe there’s a StackError in my Attributes. The moral of the story: the two edges on this wonderful Ruby sword are sharp. It does take extra work to consider how methods on a particular class interact with one another; to some extent, this is an implementation detail. However, it’s something to be aware of when you write code. Of course, you can always browse the Ruby implementation of the core classes in Rubinius.
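The loop is easy to reproduce without REXML. In the sketch below, TinyHash is a hypothetical stand-in for a core class written in Ruby, with each_value built on each; it is not how any particular implementation defines Hash. The Attributes subclass mirrors the shape described above.

```ruby
# Hypothetical stand-in for a core class implemented in Ruby.
class TinyHash
  def initialize(pairs)
    @pairs = pairs.to_a
  end

  def each
    @pairs.each { |key, value| yield key, value }
  end

  # Implemented in terms of each, which a subclass may override.
  def each_value
    each { |_key, value| yield value }
  end
end

class Attributes < TinyHash
  def each_attribute
    each_value { |value| yield value }
  end

  # Overriding each in terms of each_attribute closes the loop:
  # each -> each_attribute -> each_value -> each -> ...
  def each
    each_attribute { |attribute| yield attribute }
  end
end

outcome =
  begin
    Attributes.new("id" => "42").each { |attribute| attribute }
    :finished
  rescue SystemStackError
    :stack_overflow
  end

puts outcome
```

Neither class is wrong on its own. The trouble only appears when the subclass’s override meets an assumption the parent makes about its own methods.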

Playing nicely is more than being conscientious about how you write your own code. It’s also important to consider how you use code others have written. Your code should not depend on implementation details of the classes and libraries you use. However, it’s often hard to know what those implementation details are. Often the dependency will be subtle and implicit. Your code will appear to work fine in MRI but break in one of the alternative implementations. There is no general solution to this problem, but you can usually avoid it by checking the assumptions your code makes about the other code it uses. One example of this is mutating a collection in the block passed to an iterating method. Consider the following code:

  some_hash.each { |key, value| some_hash.delete(key) if fancy_test(value) }

Hash is a fairly complex data structure and this bit of code can have very different behavior depending on how Hash is implemented. Thankfully, Matz has explicitly said this behavior is undefined.
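
A safer pattern is to let the collection manage its own removal, or to decide what to delete before deleting anything. The sketch below shows both; fancy_test is given a hypothetical body here purely so the example runs.

```ruby
# Hypothetical predicate standing in for fancy_test from the text.
def fancy_test(value)
  value.odd?
end

# Option 1: delete_if lets Hash handle removal during its own iteration.
some_hash = { "a" => 1, "b" => 2, "c" => 3 }
some_hash.delete_if { |_key, value| fancy_test(value) }

# Option 2: collect the keys first, then mutate outside the iteration.
other_hash = { "a" => 1, "b" => 2, "c" => 3 }
doomed = other_hash.keys.select { |key| fancy_test(other_hash[key]) }
doomed.each { |key| other_hash.delete(key) }

puts some_hash.inspect
puts other_hash.inspect
```

Both options avoid relying on what happens when a Hash is changed underneath its own each.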

5. Neighborly C Extensions

While playing nicely in Ruby code is important, it’s also very important when writing C extensions. These are programs typically written in C/C++ that directly access the C functions that MRI uses to implement Ruby. You probably regularly use one or more gems or libraries that are partially implemented by a C extension. C extensions are often used to access native libraries from Ruby, for example, when writing database adapters.

C extensions are not the only way to access native libraries from Ruby. There are also the FFI and DL libraries. Rubinius was the first implementation to popularize the use of the foreign-function interface (FFI) library for accessing native code. In fact, vital pieces of the core library in Rubinius are implemented via FFI, which is a modern implementation of DL, the dynamic load library that MRI has included for years. There are now quality implementations of FFI available on both JRuby and MRI.

FFI is generally the preferred way to interface with native libraries. The benefits include not needing a C compiler and being able to harness the speed or power of a native library while writing pure Ruby code. However, there are still two core use cases for C extensions: 1) when the data marshaling through the FFI layer imposes too large a performance cost; or 2) when your code already relies on an existing C extension. These use cases are hard to get around. Fortunately, we have put a lot of effort into getting C extensions working quite well on Rubinius. In fact, many C extensions just work.

However, there is one particular problem with some C extensions that limits our ability to support them: some have explicit dependencies on MRI data structures, for example, RHash. Depending on a data structure your code does not control makes your program vulnerable to breaking if the other code changes its implementation. Unfortunately, the C programming language doesn’t do much to enforce good practices here. If the C compiler can see a structure or function in a header file, you are free to use it in your program. Yet, just because you can, does not mean you should. Instead, you should always use a function interface (also known as an API) to access the data. Treat data structures that are not your own as opaque.

Of course, that is the ideal world. MRI cannot foretell every use case that a C extension may have. So some of these problems are simply the result of people being more creative than the MRI developers imagined, which is mostly a good thing. In version 1.9, MRI is enforcing the use of APIs over raw struct access. For example, rather than using RSTRING(obj)->ptr, your code should do RSTRING_PTR(obj) instead. Since Rubinius is compatible with MRI version 1.8.7, we still support both forms in this case. However, to make your code robust and portable, you should use the RSTRING_PTR API.

One thing Rubinius does not support is code like RHASH(obj)->tbl that accesses the RHash struct directly. This is partially because, in Rubinius, Hash is implemented entirely in Ruby. However, most C extension code needs to do something like iterate over the entries rather than just access the structure. In this case, the rb_hash_foreach function is available, so it’s quite easy to change a C extension so it will run on Rubinius. In fact, a number of C extensions have already been updated in this way. If you encounter a problem with a C extension, please file an issue for it.

We understand there are valid use cases for writing C extensions. While Rubinius is implemented very differently than MRI, we want your C extensions to be able to run in Rubinius and we have worked hard to ensure that most C extensions do run. If you encounter cases where there is no function API to work with MRI data, let us know. We can collaborate with Matz and the MRI developers to add such APIs. That way, you can help us help you to make Ruby better for everyone. Win!

Ruby is a terrific language and with your help, it can be even better. Do you have any tips for writing better Ruby code? Please, let us know.

If you are new to Rubinius, you may find these previous posts informative:
  • cnutter

    Rubinius has done a great job blazing a trail with regards to native libraries and native extensions, and we're very happy we've been able to support the same FFI and C extension APIs in JRuby. That makes Brian's comments about "Neighborly C extensions" even more important, since JRuby suffers from the same limitations that Rubinius does when it comes to writing C extensions to MRI's API. RHASH and RSTRING->ptr are just the tip of the iceberg.

    Do not expect to have access to class method tables (we both implement them differently).

    Do not try to find the various MRI runtime structures (like the AST, frames, or scope objects)…we have completely different execution models.

    Do not expect C extension code to run concurrently (on JRuby now or on Rubinius when it supports concurrent execution). Because of the global nature of many C extensions' state, we both must ensure only one thread hits C code at a time.

    And above all know that because JRuby and Rubinius must isolate C code from the locations of Ruby objects in memory, C extensions with very fine-grained APIs (e.g. every method on every class does a call-out to C) are going to perform *much* worse on JRuby and Rubinius due to the extra copying and indirection required. Design your APIs well and with these limitations very much in mind.

    (I would also be lying if I said JRuby's C extension support were the recommended path forward. It will work well as a stopgap measure, but the ultimate solution will usually be to port to Java or find an equivalent Java library to use that doesn't suffer from performance and concurrency penalties.)

    That said, we still want C extensions to work as well as possible just like the Rubinius guys, and we'll stand with them to educate people (and to work with CRuby's core team!) on how to write fast, reliable extensions.

    (And of course the remaining points Brian makes about good OO design apply to any environment…I wholeheartedly agree!)

  • Pradeep

    I believe your example for LSP (Liskov Substitution Principle) is flawed. LSP is about being able to replace objects in a program with instances of their subtypes without affecting the correctness of the program. It only applies after you've created objects, not when you're creating them. If your code expected an Array object and you supplied a FancyArray object, you'd expect your new object to respond to the same messages. An example would be if FancyArray did #remove_method on the "length" method.

  • Brian Ford

    The application of LSP to this case is correct. Classes are Objects in Ruby. You can assign them to a variable and call them like you call any object. The example is from actual problems that we have encountered. See this Rubinius commit: http://bit.ly/b4iNEV.

    All observable behaviors of objects need to respect LSP if we are to have reliable and robust object-oriented software.

  • Pradeep

    Makes sense now. Thanks for the clarification.

  • Brian Ford

    You're welcome. Thanks for the question. Ruby is such a wonderful and powerful language. It's truly a joy to work with. Sometimes I think LSP can come across like it's trying to steal some of our fun, but really, it's a powerful ally that can help us write really good software.

  • Theo

    I think the confusion here stems from the peculiarity of Ruby's .new and #initialize methods. Pradeep is right if we just consider the .new method, because it belongs to FancyArray, which is a subtype of Class, not Array. However, .new is intrinsically coupled to the #initialize method, which belongs to objects of type FancyArray. In practice it's not very different from most other languages with class based inheritance, but in theory it makes what you say correct, especially since you put an emphasis on the #initialize method.

    It is an odd thing to argue though. If we consider the .new method instead your argument would not hold — classes and the objects that are created from their .new method are unrelated in the eyes of the LSP.

  • Brian Ford

    I think the commit I reference fully illustrates why LSP applies to this case and why it is especially important. Your subclass *expects* to do initialization in #initialize, which as you correctly point out is coupled with .new. Calling self.class.new is the correct way to create a subclass instance. You should expect that when methods operate on your object, which may be a subclass, that you get back a subclass instance. The only way to do this robustly is to respect LSP. It's an odd thing to argue against LSP. :)

  • Joeri Samson

    Object#initialize does not take any parameters, so under your reasoning it would not be possible to have any class requiring parameters to initialize/new without breaking the Liskov Substitution Principle.

    I don't think that's very realistic.

  • Mason

    You could always just supply a default value to the parameters, e.g. def initialize(size = 0)
