Complex
Ruby concepts
simplified

Matt Aimonetti
RubyConf 2011

Ruby App: Cross Section

Matt Aimonetti's Ruby cross section

90%
of our work
is on the surface

that's OK!

why should you care?

Source: Abhisek Sarda

Matt Aimonetti as Sam Wheat

Source: Sam Wheat

We won't cover

what you should know

  • Tools
  • Testing
  • Model layer
  • Controller layer
  • View layer
  • Yoga
  • How to be successful & good looking

We will cover

what you don't have to know

  • Lexer / Tokenizer / Parser
  • Virtual Machine
  • C Extensions
  • Concurrency
  • Memory Management
Ruby 1.9 only!
Source: Joseph Ducreux
Source: vxla
Source Code:
puts 'Hello RubyWorld'
Tokenized representation:
["puts", " ", "'", "Hello RubyWorld", "'"]
Lexed representation:
[[[1, 0], :on_ident, "puts"],
 [[1, 4], :on_sp, " "],
 [[1, 5], :on_tstring_beg, "'"],
 [[1, 6], :on_tstring_content, "Hello RubyWorld"],
 [[1, 21], :on_tstring_end, "'"]]

Lexer format: [[line number, column], type, token]

Tokenizer / Lexer

Lexed representation:
[[[1, 0], :on_ident, "puts"],
 [[1, 4], :on_sp, " "],
 [[1, 5], :on_tstring_beg, "'"],
 [[1, 6], :on_tstring_content, "Hello RubyWorld"],
 [[1, 21], :on_tstring_end, "'"]]
              
Abstract Syntax Tree (AST):
[:program,
 [[:command,
   [:@ident, "puts", [1, 0]],
   [:args_add_block,
    [[:string_literal,
      [:string_content, [:@tstring_content, "Hello RubyWorld", 
                                                          [1, 6]]]]],
    false]]]]

Parser

Implementation

ruby syntax parser
  keywords        : reserved keywords
    -> lex.c      : automatically generated
  parse.y
    -> parse.c    : automatically generated
            

Look at how Ruby does it

require 'ripper'
require 'pp'
src = "puts 'Hello RubyWorld'"
puts "source: #{src}"
puts "tokenized:"
pp Ripper.tokenize(src)
puts "Lexed:"
pp Ripper.lex(src)
puts "Parsed:"
pp Ripper.sexp(src)
            

Ruby uses a hand-written lexer (in parse.y, with a generated keyword table in lex.c) & Bison (parser generator) but exposes the steps via Ripper

Virtual Machine

Koichi Sasada (@Ko1)

Source: Tim Bray

vm: compiler

Compiles the AST into bytecode

vm: interpreter

Executes the bytecode
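
Both VM steps can be observed from Ruby itself. The sketch below assumes C Ruby (MRI) 1.9 or later, where RubyVM::InstructionSequence is available; it is specific to that implementation, and the exact instruction names in the listing vary between versions.

```ruby
# Assumes C Ruby (MRI); RubyVM is not available on other implementations.
src = "puts 'Hello RubyWorld'"

# vm: compiler -- turn the source into a bytecode instruction sequence
iseq = RubyVM::InstructionSequence.compile(src)
puts iseq.disasm   # human-readable listing of the VM instructions

# vm: interpreter -- execute the compiled bytecode
iseq.eval
```

The instructions shown by disasm are the ones defined in insns.def.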

Also
handles concurrency
& extension libraries

vm Implementation

  compile.c
  eval.c
  eval_error.c
  eval_jump.c
  eval_safe.c
  insns.def           : definition of VM instructions
  iseq.c              : implementation of VM::ISeq
  thread.c            : thread management and context switching
  thread_win32.c      : thread implementation
  thread_pthread.c    : ditto
  vm.c
  vm_dump.c
  vm_eval.c
  vm_exec.c
  vm_insnhelper.c
  vm_method.c

  opt_insns_unif.def  : instruction unification
  opt_operand.def     : definitions for optimization 

VM Implementation (continued)

  dmyext.c
  dmydln.c
  dmyencoding.c
  id.c
  inits.c
  main.c
  ruby.c
  version.c

  gem_prelude.rb
  prelude.rb
  

Parts that make up the Ruby implementation

C extensions

Most common case:

Use existing C libs (libxml, DB drivers, ...)

Define Ruby world objects

#include <ruby.h>
VALUE mNokogiri ;
VALUE mNokogiriXml ;
VALUE mNokogiriXmlSax ;

mNokogiri         = rb_define_module("Nokogiri");
mNokogiriXml      = rb_define_module_under(mNokogiri, "XML");
mNokogiriXmlSax   = rb_define_module_under(mNokogiriXml, "SAX");

rb_const_set(mNokogiri, rb_intern("LIBXML_ICONV_ENABLED"), Qfalse);
            

Expose C functions as Ruby methods

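#include <ruby.h>

/* Defined before use so the compiler knows its signature */
static VALUE c_bonjour(VALUE self) {
  return rb_str_new2("bonjour RubyConf!");
}

/* Init_rubyconf is an assumed entry-point name; it must match the extension's file name */
void Init_rubyconf(void) {
  VALUE rConf = rb_define_module("RubyConf");
  rb_define_singleton_method(rConf, "bonjour", c_bonjour, 0);
}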
            

Teach Ruby how to manage memory

rb_define_alloc_func(cMysql2Client, allocate);

static VALUE allocate(VALUE klass) {
  VALUE obj;
  mysql_client_wrapper * wrapper;
  /* Arguments: current class, C data type, function pointer called when marked,
   * function pointer called when freed, pointer to the data
   */
  obj = Data_Make_Struct(klass, mysql_client_wrapper, 
   rb_mysql_client_mark, rb_mysql_client_free, wrapper);
  wrapper->encoding = Qnil;
  wrapper->active = 0;
  wrapper->reconnect_enabled = 0;
  wrapper->closed = 1;
  wrapper->client = (MYSQL*)xmalloc(sizeof(MYSQL));
  return obj;
}

static void rb_mysql_client_mark(void * wrapper) {
  mysql_client_wrapper * w = wrapper;
  if (w) {
    rb_gc_mark(w->encoding);
  }
}

static void rb_mysql_client_free(void * ptr) {
  mysql_client_wrapper *wrapper = (mysql_client_wrapper *)ptr;
  nogvl_close(wrapper);
  xfree(ptr);
}

Challenges

concurrency

Execute code concurrently, e.g.:
1 process -> many concurrent web requests

Example:
simple client/server

Dummy client

class Client
  def query(id)
    Server.dispatch(self, id)
  end
  def reply(id, response)
    print "Response: #{id} | "
  end
end

Server implementation

module Server
  module_function
  def dispatch(client, id)
    client.reply(id, fake_response(id))
  end
  def fake_response(id)
    response = ""
    n = id.even? ? id+1 : id*999
    n.times do
      response << ("a".."z").to_a[rand(26)]
    end
    response
  end
end

10.times{|n| Client.new.query(n) }

# Response: 0 | Response: 1 | Response: 2 | Response: 3 | Response: 4 |
# Response: 5 | Response: 6 | Response: 7 | Response: 8 | Response: 9 |

Sequential responses

What's a thread?

Threaded server

Switch the Server.dispatch method from

def dispatch(client, id)
  client.reply(id, fake_response(id))
end

to:

def dispatch(client, id)
  Thread.new { client.reply(id, fake_response(id)) }
end

Client responses:

# query returns the Thread created by dispatch; wait for all of them
threads = 10.times.map{|n| Client.new.query(n) }
threads.each(&:join)

# Response: 0 | Response: 2 | Response: 1 | Response: 4 | Response: 6 | 
# Response: 8 | Response: 3 | Response: 7 | Response: 9 | Response: 5 |

Threaded responses

Threads aren't magical

A CPU core can only execute one thread at a time

Context switching

Ruby fair scheduler

Green threads
vs
native threads

Pros of green threads

Cons of green threads

Pros of native threads

Cons of native threads

Fibers/Continuations

Manual scheduling of code execution
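
A minimal sketch of manual scheduling with Fiber, which is part of core Ruby 1.9. The two tasks and the round-robin loop are made up for illustration; the point is that the code, not the VM, decides when each task runs.

```ruby
# Two hypothetical tasks that hand control back explicitly with Fiber.yield.
log = []
ping = Fiber.new do
  3.times { |i| log << "ping #{i}"; Fiber.yield }
end
pong = Fiber.new do
  3.times { |i| log << "pong #{i}"; Fiber.yield }
end

# Our own "scheduler": strict round-robin, decided by us rather than by the VM.
3.times do
  ping.resume
  pong.resume
end

puts log.join(", ")
# prints: ping 0, pong 0, ping 1, pong 1, ping 2, pong 2
```

Unlike the threaded server, the order of the output here is fully deterministic.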

Why aren't threads more popular with Ruby developers?

Reasons:

Why?

Should we remove the GIL?
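
In C Ruby 1.9, the GIL (global interpreter lock) lets only one thread run Ruby code at a time. A rough way to see its effect, assuming MRI and a made-up CPU-bound method busy_work: threads should give little or no speedup for this kind of work. Timings vary by machine, so treat the numbers as indicative only.

```ruby
require 'benchmark'

# Hypothetical CPU-bound task: sum the integers 0...200_000.
def busy_work
  200_000.times.inject(0) { |sum, i| sum + i }
end

sequential = Benchmark.realtime { 4.times { busy_work } }

threaded = Benchmark.realtime do
  4.times.map { Thread.new { busy_work } }.each(&:join)
end

printf("sequential: %.3fs, threaded: %.3fs\n", sequential, threaded)
```

Threads still help when the work is mostly waiting on IO, since a thread blocked on IO does not hold the lock.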

Other ways to achieve concurrency

memory management

Object declaration
==
object allocation


100.times{ "RubyConf" }

Allocates 100 string objects

{"location" => "New Orleans"}

Allocates 1 hash and 3 strings

class Foo; end; Foo.new

Allocates 1 node, 2 classes, 1 object

Garbage Collection prior to Ruby 1.9.3

What if:
there are no available slots?

What if:
the Garbage Collector can't free slots?
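
The slot / freelist / mark / sweep vocabulary can be sketched with a toy heap written in Ruby. This is a teaching model only: ToyHeap, ToySlot and their methods are hypothetical names, and the real collector lives in C inside the VM. When nothing can be freed, the toy simply raises, whereas a real VM would be expected to grow its heap.

```ruby
# Toy stop-the-world mark-and-sweep heap. ToyHeap, ToySlot and every method
# name here are hypothetical; this is a teaching model, not MRI's gc.c.
class ToySlot
  attr_accessor :value, :refs, :marked
  def initialize
    @value = nil
    @refs = []
    @marked = false
  end
end

class ToyHeap
  attr_reader :freelist

  def initialize(size)
    @slots = Array.new(size) { ToySlot.new }
    @freelist = @slots.dup   # every slot starts out free
    @roots = []              # slots the program can reach directly
  end

  # Take a slot from the freelist; run a full GC first if none is available.
  def allocate(value, root = false)
    collect if @freelist.empty?
    # Assumption: a real VM would grow the heap here; the toy just gives up.
    raise "toy heap exhausted" if @freelist.empty?
    slot = @freelist.pop
    slot.value = value
    @roots << slot if root
    slot
  end

  # Mark everything reachable from the roots, then sweep the rest.
  def collect
    @slots.each { |slot| slot.marked = false }
    @roots.each { |root| mark(root) }
    @freelist = @slots.reject { |slot| slot.marked }
    @freelist.each { |slot| slot.value = nil; slot.refs = [] }
  end

  private

  def mark(slot)
    return if slot.marked
    slot.marked = true
    slot.refs.each { |child| mark(child) }
  end
end

heap = ToyHeap.new(4)
a = heap.allocate("a", true)   # root: always reachable
b = heap.allocate("b")
a.refs << b                    # reachable through a
heap.allocate("c")             # never referenced: garbage
heap.allocate("d")             # never referenced: garbage
heap.allocate("e")             # freelist empty -> collect frees c and d
puts "free slots: #{heap.freelist.size}"
# prints: free slots: 1
```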

Garbage Collector in 1.9.3

Lazy sweeping

What if:
the freelist is empty?

What if:
all slots are marked after a full scan?

Different types of GCs, in the case of C Ruby's:

Multi-threaded, generational GC

Example: MacRuby

Stop the world, precise, moving, generational GC

Example: Rubinius

Reference counting

Example: CPython

Hybrid solution

Example: JVM

Tricks

Concrete effect on daily code

Generating the RDoc documentation takes about 80 seconds on my machine. 30% of that time is spent on GC.
Narihiro Nakamura
We burn 20% of our front-end CPU on garbage collection.
Evan Weaver, Twitter, Oct 2009

See for yourself using GC::Profiler & ObjectSpace.count_objects

GC::Profiler.enable
# your code
puts GC::Profiler.result
Index    Invoke Time(sec)       Use Size(byte)     Total Size(byte)         Total Object                    GC Time(ms)
    1               0.005               110600               393216                 9816         0.24300000000000016032
    2               0.006               110560               393216                 9816         0.40999999999999975353
    3               0.008               110680               393216                 9816         1.00400000000000133582
GC.disable
ref = ObjectSpace.count_objects[:T_STRING]
10_000.times{|n| 'test' }
count = ObjectSpace.count_objects[:T_STRING] - ref
puts "#{count} new strings added to memory"
puts ObjectSpace.count_objects.inspect
"10026 new strings added to memory"
{:TOTAL=>24526, :FREE=>308, :T_OBJECT=>8, :T_CLASS=>478, 
:T_MODULE=>21, :T_FLOAT=>7, :T_STRING=>16295, :T_REGEXP=>24, 
:T_ARRAY=>985, :T_HASH=>16, :T_BIGNUM=>3, :T_FILE=>9, 
:T_DATA=>398, :T_MATCH=>108, :T_COMPLEX=>1, :T_NODE=>5846, 
:T_ICLASS=>19}

GC Stats Rack middleware

https://github.com/mattetti/GC-stats-middleware

GC run, previous cycle was 255 requests ago.
GC 40 invokes.
Index    Total Object                    GC Time(ms)
    1          101432        14.47700000000007314327
    2          101432        13.95699999999999718625
    3          101432        13.84699999999994268762
    4          101432        14.65799999999983782573
    5          101432        15.47099999999979047516
    6          101432        14.96900000000001007550
    7          101432        17.90399999999969793407
    8          101432        15.38599999999989975663
    9          101432        15.29500000000005854872
   10          101432        16.75899999999996836664
   11          101432        14.70199999999977080734

[60%] 14414 freed strings.
[12%] 2927 freed arrays.
[9%] 2268 freed big numbers.
[2%] 564 freed hashes.
[1%] 373 freed objects.
[5%] 1351 freed parser nodes (eval usage).
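
The idea behind such a middleware can be sketched in a few lines. GCCountLogger below is a hypothetical, much simpler stand-in and not the code from the repository above; it only assumes the Rack calling convention (an object responding to call(env)) and GC.count, and logs how many GC runs happened during a request.

```ruby
# Hypothetical minimal middleware: reports GC runs that happen during a request.
class GCCountLogger
  def initialize(app, log = $stderr)
    @app = app
    @log = log
  end

  def call(env)
    before = GC.count                 # number of GC runs so far
    response = @app.call(env)
    runs = GC.count - before
    @log.puts "GC ran #{runs} time(s) during #{env['PATH_INFO']}" if runs > 0
    response
  end
end

# Stand-in app that forces a GC so there is something to report.
app = lambda { |env| GC.start; [200, { "Content-Type" => "text/plain" }, ["ok"]] }
status, _headers, _body = GCCountLogger.new(app).call("PATH_INFO" => "/demo")
puts status
```

In a real Rack application it would be enabled with `use GCCountLogger` in config.ru.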

Questions?


Matt Aimonetti

Slides: http://rubyconf2011.merbist.com

Twitter: @merbist

Blog:

Google Profile
