Jesse Storimer's Blog

August 16, 2013

Industrial strength signal handling in Ruby

This is a video clip from the latest edition of my Unix fu workshop, an online course about Unix programming for Ruby folk. Give it a watch to see part of the fun, where we do industrial-strength signal handling (deferred signal handlers and the self-pipe trick) for a pre-forking web server we built. If you like it, sign up for the waiting list for the next edition of the workshop.
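
If you haven't seen the self-pipe trick before, here's a minimal sketch of the idea in Ruby (my own illustration, not the server from the workshop): the signal handler does nothing but write a byte to a pipe, and the main loop notices that byte and does the real work outside the handler, where it's safe.


reader, writer = IO.pipe

trap(:CHLD) do
  # Keep the handler tiny and reentrant-safe: just wake up the main loop.
  writer.write_nonblock('.') rescue nil
end

loop do
  # In a real server this select would also watch client sockets;
  # the self-pipe makes a pending signal look like readable IO.
  readable, = IO.select([reader])
  next unless readable.include?(reader)

  reader.read_nonblock(1) rescue nil

  # Deferred handling: reap exited children here, outside the handler.
  begin
    Process.wait(-1, Process::WNOHANG)
  rescue Errno::ECHILD
  end
end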




July 8, 2013

Unix fu workshop: August edition + giveaway

I just opened registration for an August edition of my online Unix system programming course. I announced the first edition of the class a few weeks ago, but apparently I scheduled it right in the middle of summer vacation time. Thanks to everyone who got on the mailing list and let me know: this is for you!


The August edition is happening August 12 & 13 from 1pm to 5pm ET on both days. Early bird tickets are $459 until August 1.


To mark the occasion, I'm giving away a free ticket to the workshop + a copy of a great Unix book.


Update: The contest is now closed. Thanks to everyone who entered!


The grand prize

There will be one grand prize winner who will receive:


1 free ticket to the Unix fu workshop: This is my online Unix programming course for Ruby developers. All of the information is on the website, so head there to learn more about it.



A free copy of The Linux Programming Interface: This is a fantastic Unix system programming reference manual, and my favourite source of Unix lore. I've always got my copy handy right on my desk. I'll ship a hard-copy right to the winner's door.


What if I already have a ticket?

If you've already signed up for the class and you win the big prize, I'll happily refund you. Same goes for you folks who want to get a seat in the class. No need to wait and see if you win the contest: get your spot now, and if you win, I'll give you your money back.


Good luck!




June 20, 2013

Nobody understands the GIL - Part 3: Thread Safety

There are some misconceptions in the Ruby community about the thread-safety question surrounding MRI's GIL. If you only take one thing away from this article today, let it be this: the GIL does not make your Ruby code thread-safe.



But you shouldn't take my word for it.


This series started off just trying to understand what the GIL is at a technical level. Part 1 explained how race conditions could occur in the C code that's used to implement MRI. Yet, the GIL seemed to eliminate that risk, at least for the Array#<< method we looked at.


Part 2 confirmed that the GIL did, in fact, make MRI's native C method implementations atomic. In other words, these native implementations were free from race conditions. These guarantees only applied to MRI's native C functions, not to the Ruby code that you write. So we were left with a lingering question:



Does the GIL provide any guarantee that your Ruby code will be thread-safe?
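
To see why the question matters, here's a hedged little example of my own (not from the article): even on MRI, an innocent-looking counter += 1 is a read, an add, and a write, three separate steps that a thread switch can land between.


counter = 0

5.times.map do
  Thread.new do
    100_000.times { counter += 1 }
  end
end.each(&:join)

puts counter # not guaranteed to print 500000, even on MRI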



Read the full article on RubyInside.com.



June 19, 2013

[Screencast] Faster Rails test runs...with Unix!

Anybody who works on a moderately sized Rails app can probably tell you: it takes forever to run a single test. I've definitely experienced this with some of the big Rails projects I've worked on, and it sucks! Goodbye productivity.


Some smart people have been advocating ways around this. But if you've got an existing app, seeing the full benefit of these techniques involves some pretty serious rearchitecting. Not an option for most of us.


Today I'm going to show you a technique to speed up those test runs that doesn't involve touching your app code. In this screencast, I took the rubygems.org codebase, a big Rails app, and cut the time to run a unit test in half. When I started, it took a little over 5 seconds to run the UserTest. With the tool that I built in 20 minutes in the screencast, the same test run took just under 2 and a half seconds. Yay for Unix!
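
The core trick is a classic Unix move: pay the cost of booting the app once, then fork(2) a cheap child process for each test run. Here's a minimal sketch of that shape (my own illustration, not the actual code from the screencast; see the gist below for that):


# Boot the slow Rails environment exactly once, in the parent process
# (assumes a standard Rails layout with config/environment.rb).
require_relative 'config/environment'

loop do
  print 'test file> '
  file = gets
  break if file.nil? || file.strip.empty?

  pid = fork do
    # The child inherits the preloaded app via copy-on-write,
    # so it only pays for loading the single test file.
    require File.expand_path(file.strip)
  end
  Process.wait(pid)
end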


If you want to avoid my blathering at the beginning and skip right to the code, jump to the 2:30 mark.



Random bit of trivia: if you listen right to the end, around the 20:45 mark, you can hear me stumble on a word. If you turn the volume up, you'll hear the coyotes that I was hearing howling outside my window!


The code from the screencast is available in a gist.


Registration just opened for my Unix fu online workshop. Want to level up your system programming and learn more of the awesome things you saw in this video? Join me in class on July 18-19.



June 14, 2013

Nobody understands the GIL - Part 2: Implementation


Last time, I began wanting to take you on a deep dive into MRI to see how the GIL is implemented. But first, I wanted to make sure I was asking the right question. Part 1 formulated the question; today we'll look for answers inside MRI itself. We'll go looking for that elusive creature they call the GIL.


In the first draft of this post, I really emphasized the C code that underlies the GIL, showing as much of it as I could. But after a while, the message was drowning in the details. I went back and reworked things so that you now see less C code with more explanations and diagrams. But for the source code divers, I'll at least mention the C functions, so you can go track them down.


Last time on...

Part 1 left off asking these two questions:



Does the GIL guarantee that array << nil is atomic?
Does the GIL guarantee that your Ruby code will be thread-safe?

The first question is answered by looking at the implementation, so let's start there.


This snippet was spinning us around last time:



array = []

5.times.map do
  Thread.new do
    1000.times do
      array << nil
    end
  end
end.each(&:join)

puts array.size


If you assume that Array is thread-safe, the expected result would be an Array with 5000 elements. Since Array isn't thread-safe, implementations like JRuby and Rubinius produce an unexpected result: something less than 5000. It's the interaction of, and switching between, multiple threads that corrupts the underlying data.


MRI produces the expected result, but is it a fluke or a guarantee? Let's begin our technical deep dive with a snippet of Ruby code to study.



Thread.new do
  array << nil
end


From the top 

To study what happens with this snippet, we need to look at how a thread is spawned inside MRI. We'll mostly be looking at functions from the thread*.c files in MRI. There's lots of indirection in these files to support both the Windows and POSIX thread APIs, but all of the functions we look at come from those source files.


The first underlying operation of Thread.new is to spawn a new native thread to back the Ruby thread. The C function that becomes the body of the new thread is called thread_start_func_2. Here's a look at this function at a high-level.


[Figure: thread_start_func_2 at a high level, with the GIL acquire/release highlighted]


There's a lot of boilerplate code here that you don't need to see. I highlighted the parts of the function that we care about. Near the top, this new thread acquires the GIL. Remember that this thread remains idle until it actually owns the GIL. Near the middle of the function, it calls the block that you pass to Thread.new. After wrapping things up, it releases the GIL and exits the native thread.


In our snippet, this new thread is spawned from the main thread. Given this, we can assume that the main thread is currently holding the GIL. This new thread will have to wait until the main thread releases it before it can continue.


Let's look at how that happens when this new thread tries to acquire the GIL.



static void
gvl_acquire_common(rb_vm_t *vm)
{
    if (vm->gvl.acquired) {
        vm->gvl.waiting++;
        if (vm->gvl.waiting == 1) {
            rb_thread_wakeup_timer_thread_low();
        }

        while (vm->gvl.acquired) {
            native_cond_wait(&vm->gvl.cond, &vm->gvl.lock);
        }



This is a snippet from the gvl_acquire_common function. This function is called when our new thread tries to acquire the GIL.


First, it checks if the GIL is currently acquired. If it is, then it increments the waiting attribute of the GIL. With our snippet, this value should now be 1. The very next line checks to see if waiting is 1. It is, so the next line triggers a wakeup of a timer thread.


This timer thread is the secret sauce that keeps MRI's threading system humming along, and keeps any one thread from hogging the GIL. But before we jump too far ahead, let's illustrate the state of things with the GIL, then introduce this timer thread.


[Diagram: each Ruby thread backed by its own native thread]
I've said a few times that an MRI thread is backed by a native OS thread. This is true, but this diagram suggests that each MRI thread is running in parallel on its native thread. The GIL prevents this. We need to draw in the GIL to make this more realistic.


[Diagram: the GIL mediating between Ruby threads and their native threads]



When a Ruby thread wants to execute code in its native thread, it must first acquire the GIL. The GIL mediates access between a Ruby thread and its underlying native thread, severely reducing parallelism! In the previous diagram, the Ruby threads could execute code in their native threads in parallel. This second diagram is closer to reality for MRI: only one thread can hold the GIL at any given time, so parallel execution of code is completely disabled.





For the MRI core team, the GIL protects the internal state of the system. With a GIL, they don't require any locks or synchronization around the internal data structures. If two threads can't be mutating the internals at the same time, then no race conditions can occur.


For you, the developer, this will severely limit the parallelism you get from running your Ruby code on MRI.


The timer thread

I said that the timer thread is what keeps one thread from hogging the GIL. The timer thread is just a native thread that exists internally in MRI; it has no equivalent Ruby thread. It's spawned when MRI boots, via the rb_thread_create_timer_thread function.


When MRI boots up and only the main thread is running, the timer thread sleeps. But remember, once one thread is waiting on the GIL, it wakes up the timer thread.


[Diagram: threads contending for the GIL, with the timer thread alongside]
This is closer to how the GIL is implemented in MRI. Harking back to the snippet that kicked this off, the thread on the far right is the one we just spawned. Since it's the only thread waiting for the GIL, it wakes up the timer thread.


This timer thread is what keeps a thread from hogging the GIL. Every 100ms, the timer thread sets an interrupt flag on the thread currently holding the GIL using the RUBY_VM_SET_TIMER_INTERRUPT macro. It's important to note the details here because this will give us the clue as to whether or not array << nil is atomic.


If you're familiar with the concept of timeslices, this is very similar.


Every 100ms the timer thread sets an interrupt flag on the thread currently holding the GIL. Setting an interrupt flag doesn't actually interrupt the execution of the thread. If that were the case, we could be certain that array << nil was not atomic.


Handling the interrupt flag

Deep in a file called vm_eval.c is the code for handling Ruby method calls. It's responsible for setting up the context for a method call and calling the right function. At the end of a function called vm_call0_body, right before it returns the return value of the current method, these interrupts are checked.


If this thread's interrupt flag has been set, then it stops execution on the spot, before returning its value. Before executing any more Ruby code, the current thread will release the GIL and call sched_yield. sched_yield is a system function prompting the thread scheduler to schedule another thread. Once this has been done, this interrupted thread attempts to re-acquire the GIL, now having to wait for another thread to release it.


Oh hey, this answers our question. array << nil is atomic. Thanks to the GIL, all Ruby methods implemented in C are atomic.


So this example:



array = []

5.times.map do
  Thread.new do
    1000.times do
      array << nil
    end
  end
end.each(&:join)

puts array.size


is guaranteed to produce the expected result every time when run on MRI.


But keep in mind that this guarantee isn't spelled out in the Ruby code. If you take this code to another implementation with no GIL, it will produce an unexpected result. It's good to know what the GIL guarantees, but it's not a great idea to write code that depends on it. In doing so, you basically put yourself in a vendor lock-in situation with MRI.


Similarly, the GIL is not a public API. It has no documentation and no specification. There's Ruby code out there that implicitly depends on the GIL, but the MRI team has talked before about getting rid of the GIL or changing its semantics. For these reasons, you certainly don't want to be writing code that depends on the current behaviour of the GIL.


Non-native methods

All I've said so far is that array << nil is atomic. This is an easy one because the Array#<< method takes a parameter that is a simple literal. There's only one method invocation in this expression, and it's implemented in C. If it's interrupted partway through, it simply continues until finished, then releases the GIL.


What about something like this?



array << User.find(1)


Before the Array#<< method can execute, it must evaluate the expression on the right hand side so it can pass its value as a parameter. So User.find(1) has to be invoked first. And as you know, User.find(1) will call lots of other Ruby code inside its implementation.


So, methods implemented with Ruby code have no atomicity guarantees on MRI. Only methods implemented with native C code have this guarantee.


So, does that mean Array#<< is still atomic in the above example? Yes, but only once the right hand side has been evaluated. In other words, the User.find(1) method will be invoked with no atomicity guarantee. Then its return value will be passed to Array#<<, which retains its atomicity guarantee.
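
Spelled out step by step (my own desugaring, for illustration), the expression behaves like this:


user = User.find(1) # plain Ruby inside: no atomicity guarantee
array << user       # C-implemented Array#<<: atomic under MRI's GIL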


Update: @headius wrote an excellent comment that expands on what guarantees the GIL provides. If you've read this far, consider it required reading!


What does it all mean?

The GIL makes method invocations atomic. What does this mean for you?


In part 1, I showed what could happen to an example C function when a context switch happened right in the middle of it. With a GIL, that situation is no longer possible. Rather, if that context switch were to happen, the other thread would remain idle waiting for the GIL, giving the current thread a chance to continue uninterrupted. This behaviour is only applicable when MRI implements the Ruby method in C.


This behaviour eliminates a host of race conditions that would otherwise happen inside MRI and need to be guarded against. From this perspective, the GIL is strictly an internal implementation detail of MRI. It keeps MRI safe.


But there's still a lingering question that wasn't answered. Does the GIL provide any guarantee that your Ruby code will be thread-safe?


This is an important question for anyone using MRI, and if you're familiar with multi-threaded programming in other environments, you probably already know that the answer is a resounding no. But this article is long enough for one sitting, so I address this question more thoroughly in part 3.



June 12, 2013

Nobody understands the GIL

Throughout most of my time in the Ruby community, MRI's infamous GIL has been an elusive creature for me. This is a story about thread safety, and finally catching that elusive creature to get a good look at it.



The first time I heard mention of the GIL, it had nothing to do with how it worked, what it did, or why it existed. I only heard that it was silly because it restricted parallelism, or that it was great because it made my code thread-safe. In time, I've gotten more comfortable with multi-threaded programming and realized that the world is more complicated than that.


I wanted to know, at a deep technical level, how the GIL worked. Only, there's no specification for the GIL, and no documentation. It's essentially unspecified behaviour: an MRI implementation detail. The Ruby core team makes no promises about how it will work or what it guarantees.


But I may be getting ahead of myself.


If you're completely unfamiliar with the GIL, here's my 30 second explanation:


MRI has something called a global interpreter lock (GIL). It's a lock around the execution of Ruby code. This means that in a multi-threaded context, only one thread can execute Ruby code at any one time.

So if you have 8 threads busily working on an 8-core machine, only one thread and one core will be busy at any given time. The GIL exists to protect Ruby internals from race conditions that could corrupt data. There are caveats and optimizations, but this is the gist.

The problem

Back in 2008, Ilya Grigorik gave me a high-level understanding of the GIL with Parallelism is a Myth in Ruby. Even as I learned more about multi-threaded programming in Ruby, that high-level understanding stuck with me. Heck, I recently wrote a book about multi-threading in Ruby, and this high-level understanding of the GIL is all I had.


The problem with having just this high-level understanding is that I can't answer the deep technical questions. Specifically, I want to know if the GIL provides any kind of guarantee about the thread-safety of certain Ruby operations. Let me demonstrate.


Appending to arrays is not thread-safe

Very few things are implicitly thread-safe in Ruby. For example, appending to arrays:



array = []

5.times.map do
  Thread.new do
    1000.times do
      array << nil
    end
  end
end.each(&:join)

puts array.size


Here there are 5 threads sharing one Array object. Each thread pushes nil into the Array 1000 times. So in the end, there should be 5000 elements in the Array, right?



$ ruby pushing_nil.rb
5000

$ jruby pushing_nil.rb
4446

$ rbx pushing_nil.rb
3088


:(


Even this trivial example exposes an operation in Ruby that's not implicitly thread-safe. Or is it? What really happened here?


Notice that MRI produced the correct result, 5000. Both JRuby and Rubinius produced an incorrect result. If you were to run it again, you would probably see MRI give the correct result again, with JRuby and Rubinius giving different incorrect results.


These inconsistent results are because of the GIL. Because MRI has a GIL, even when there are 5 threads running at once, only one thread is active at a time. In other words, things aren't truly parallel. JRuby and Rubinius don't have a GIL, so when you have 5 threads running, you really have 5 threads running in parallel across the available cores.


On the parallel Ruby implementations, the 5 threads are stepping through code that's not thread-safe. They end up interrupting each other and, ultimately, corrupting the underlying data.


How multiple threads can corrupt data

How is this possible? I thought Ruby had our back, right? In keeping with preferring technical details over high-level explanations (that seems to be the theme here), I'll show you how this is technically possible.


Whether you're using MRI, JRuby, or Rubinius, the Ruby language is implemented on top of another language. MRI is implemented in C, JRuby in Java, and Rubinius in a mixture of Ruby and C++. So when you have this single operation in Ruby:



array << nil


that actually expands to dozens or hundreds of lines of code in the underlying implementation. For example, here's the implementation of Array#<< from MRI:



VALUE
rb_ary_push(VALUE ary, VALUE item)
{
    long idx = RARRAY_LEN(ary);

    ary_ensure_room_for_push(ary, 1);
    RARRAY_ASET(ary, idx, item);
    ARY_SET_LEN(ary, idx + 1);
    return ary;
}


Notice at least 4 distinct underlying operations.



Get the current length of the Array.
Check if there's room for another element in the Array.
Append the element to the Array.
Set the length property of the Array to the old value + 1.

Each of these operations calls yet other functions or macros. The reason I'm bringing this up is to show you how multiple threads can corrupt data. In a single-threaded context, you can look at this short C function and easily trace the path of the code.


In other words, we're used to stepping through code in a linear fashion and reasoning about the state of 'the world'. That's often how we write code.


When multiple threads are involved, this is no longer possible. It's as if the rules of physics change. When there are two threads, each thread is tracing its own path through the code. Now you have to maintain 2 (or more) 'pointers' to where each thread is currently executing. Since threads share the same memory space, both threads can be mutating the state of 'the world' at the same time.


It's possible for one thread to interrupt another, change the state of things right from underneath its feet, then have the original proceed, completely unaware that the state of things has changed.


This is the reason that some of the Ruby implementations produced incorrect results when simply appending to an Array. A situation like the following happened.


Here's the base state of our little system.


[Diagram: two threads, both about to enter the Array#<< pseudo-code]
There are two active threads, both about to enter this function. Consider steps 1-4 to be pseudo-code for the implementation of Array#<< from MRI you saw earlier. Once both threads enter this function, here's a possible sequence of events, beginning with Thread A.


[Diagram: one possible interleaving of Thread A and Thread B through steps 1-4]
This looks more complicated, but just follow the directional arrows to take you through the flow of what happens here. I've added little labels at each step to reflect the state of things from the perspective of each thread.


This is just one possible sequence of events.
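
Written out as a timeline (my own rendering of the diagram's steps), one such interleaving looks like this:


Thread A: idx = length            # idx = 0
Thread A: ensure room for push
Thread A: array[idx] = item       # element written into slot 0
-- context switch --
Thread B: idx = length            # length is still 0, so idx = 0
Thread B: ensure room for push
Thread B: array[idx] = item       # writes into slot 0 as well
Thread B: length = idx + 1        # length = 1
-- context switch --
Thread A: length = idx + 1        # length = 1 again: corrupted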


So what happens is that Thread A starts down the usual sequential path of this function, but when it gets to step 3, it hits a context switch! This pauses Thread A right where it is. Then Thread B takes over and it runs through the entire function, appending its element and incrementing the length attribute.


Once Thread B is done, Thread A is resumed. It picks up exactly where it left off. Remember, Thread A was paused right before it incremented the length attribute, so it goes ahead and increments the length attribute. Only, it doesn't know that Thread B came and changed the state of things right out from under it.


So Thread B set the length to 1, then Thread A set the length to 1, even though both appended their element to the Array. This data has become corrupted. That's why there's a little lightning strike beside the last step in Thread A.


But I thought Ruby had our back?

This sequence of events is what could lead to incorrect results, as shown by JRuby and Rubinius in the trivial example way back.


Except, in the case of JRuby and Rubinius, things are even more complicated because the threads can actually run in parallel. In this diagram, one thread is paused while the other progresses; with true parallelism, both threads can progress at the same time.


If you actually ran the example from earlier against one of these implementations, you would've seen that the incorrect result is always different. The context switch that happens here is non-deterministic; it's not predictable. It might happen earlier in the function, or later, or not at all. The next section will tell you more about this.


So, why doesn't Ruby shield us from this madness? For the same reason that core collections in other programming languages don't offer thread safety guarantees: it's expensive. It's possible for all of the Ruby implementations to provide thread-safe data structures, but that requires extra overhead that would make single-threaded code slower.


Rather than make this tradeoff, the onus is on you, the developer, to provide the thread safety guarantees when you need them.


This leaves two open questions for me, and we still haven't dived into the technical details of the GIL.



If this context switch is possible, why does MRI still produce the correct answer?
What the heck is this context switch?

Question #1 is the reason that I started writing this article. A high-level understanding of the GIL can't answer this question. That high-level understanding makes it clear that only one thread will be executing Ruby code at one time. But can a context switch still happen in the middle of a function underlying Ruby? What are the GIL semantics?


But first...


It's the scheduler's fault!

The context switch comes from the operating system thread scheduler. In all of the Ruby implementations that I showed output from, one Ruby thread is backed by one native, operating system thread. The operating system has to ensure that no single thread can hog all of the available resources, like CPU time, so it implements scheduling so that each thread has fair access to the resources.


This manifests as a series of pauses and resumes. Each thread gets a turn to consume resources, then it's paused in its tracks so another thread can have a turn. In time, this thread will be resumed, on and on again.


This is efficient for the operating system, but it introduces a degree of randomness, and a new correctness concern, into your programs. For example, the Array#<< operation now needs to be aware that it may be paused at any time, and another thread may perform the same operation in parallel, changing the state of 'the world' right under its feet.


What to do? Make key operations atomic.


If you want to be sure that this kind of interruption between threads can't happen, you should make the operation atomic. By making it atomic, you ensure that it can't be interrupted until it's finished. This would prevent our example from being interrupted in step #3 and, ultimately, prevent it from corrupting the data when it resumes at step #4.


The simplest way to make an operation like this atomic is to use a lock. This code is guaranteed to produce the correct result every time on MRI, JRuby, and Rubinius.



array = []
mutex = Mutex.new

5.times.map do
  Thread.new do

    mutex.synchronize do
      1000.times do
        array << nil
      end
    end

  end
end.each(&:join)

puts array.size


It's guaranteed to be correct because it uses a shared mutex or lock. Once a thread enters the block of code inside the mutex.synchronize block, all other threads must wait until it's finished before they can enter the same code. If you think back to the beginning, I said that many lines of C code may underlie this operation, and the thread scheduler could schedule a context switch between any two such lines.


By making an operation like this atomic, you guarantee that if a context switch occurs inside this block, other threads will not be able to proceed into the same code. The thread scheduler will see this and switch again to another thread. This also guarantees that no other thread can come along and change the state of 'the world' from right underneath its feet. This example is now thread-safe.


The GIL is a lock too

I've just shown how you can use a lock to make something atomic and provide a thread-safety guarantee. The GIL is also a lock, so does it guarantee that all of your Ruby code is thread-safe? Does the GIL make array << nil atomic?


This article has gotten long enough for one sitting. In part 2 I look deeper into MRI's GIL implementation to answer that question.



June 11, 2013

New release of Working With Ruby Threads!

I'm happy to report that I've made a BIG update to Working With Ruby Threads. Previous buyers always get free updates (you should have already received an email from me with the news and instructions).



The update weighs in at about 40 pages of new content, with more diagrams and charts, and lots of clarifications and small improvements.

Here's a laundry list of the changes, starting with the most important. If you want to pick up a copy, make sure you scroll to the bottom and check out the discount available this week only.




New chapters on real world code. I wasn't happy with the hands-on chapter at the end of the book because it lacked a real-world edge. So that chapter's been removed and replaced by two new chapters that walk through real-world multi-threaded code from the Puma web server and the Sidekiq project. 
Updated chapter on mutexes. This chapter got two new additions to introduce you to some deeper aspects of synchronization. One covers the concept of memory visibility between threads, which wasn't covered in the original release. The other addition is a section about deadlocks, what they look like and how you can avoid them.
New appendices on advanced topics. The book now includes two appendices, one about atomic updates and the other about immutability and how it relates to thread-safety. Neither of these approaches is supported in Ruby proper, but both are really powerful techniques that are taking hold in other concurrency-friendly languages, like Clojure for instance. There's nothing stopping us from doing the same things in Ruby (see the tiny sketch after this list). Check these out when you feel you've got a solid understanding of the basics.
Better code examples. Many of the code examples in the original release were based on code that you would never write in your day-to-day work. Where I could, I replaced these with slightly more realistic examples.
The example code is now tested against JRuby 1.7.4, Rubinius 2.0.0-rc1, and MRI 2.0.0-p195.
Introduced the concept of atomicity earlier in the book, in the first discussion of thread-safety.
Fixed chart rendering issues on both the .epub and .mobi versions. The .epub now has real code formatting too.
Updated the dedication to include my second daughter! She was born shortly after the book was released.
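
As a taste of the immutability appendix, here's a tiny sketch of my own (not an excerpt from the book): a frozen object can be shared freely between threads, because no thread can mutate it out from under another.


# Frozen, so it's safe to share between threads without a lock.
CONFIG = { host: 'localhost', port: 8080 }.freeze

threads = 5.times.map do
  Thread.new { CONFIG[:host] } # concurrent reads, no mutex needed
end
threads.each(&:join)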

Head over to the product page to buy a copy!


To celebrate the new release, I'm offering the same discount that I offered at launch time: if you buy the middle package (the one with the expert interviews), I'll give you a free copy of Working With Unix processes. This discount is good until Thursday (June 13). Click here to take advantage of this deal.



June 4, 2013

Self publishing AND traditional publishing?

Today my friend and fellow self-publisher Pat Shaughnessy announced that his excellent self-published title will be re-released in the fall backed by a publisher.


After waving the self-publishing eBook banner last year, I’ve gone over to the dark side. This Spring I decided to work with No Starch Press to bring Ruby Under a Microscope to print. We plan to have it available online and in bookstores sometime this Fall.

I appreciate his comment about the dark side, but, these days, there's lots of overlap between self-publishing and traditional publishing, with lots of books existing in both worlds. This is a good thing. I've seen many examples where the two blend nicely, including my own experience.



@jstorimer self-publishing gave me the time and freedom to figure out what i wanted to do


— Pat Shaughnessy (@pat_shaughnessy) June 4, 2013


My experience

I've self-published three books now. You can buy them from me, but you can also buy them from PragProg. PragProg is a great publisher with a great reputation and an audience bigger than mine. I'm lucky to work with them.


Though I self-published, I still get to reap some of the benefits of a traditional publisher. After a few months of self-publishing I contacted PragProg to see if they were interested in distributing my work. The best part about this? I didn't go to them with an idea and predictions, I went to them with a finished manuscript and sales numbers to prove that I really had something valuable. 


Sure, I took the risk. Nobody paid me an advance, but this gave me the freedom to write the book how I wanted, on my schedule, and do some major tweaking based on feedback after the launch. All of these things are historically tricky with a traditional publisher. PragProg even offered the services of an editor to work on the book with me, but I declined so I could focus on other projects.


The best part about all this is that PragProg has non-exclusive rights to my work, so I've continued to sell directly and do my self-publishing thing while getting the benefits of being listed as a PragProg title.


I didn't have to choose between self-publishing and traditional publishing, I got benefits from both.


Other Examples

Pat's book is the latest example of a niche tech book that started as a self-published title, then moved into the world of traditional publishing.


Learn You a Haskell For Great Good and Learn You Some Erlang For Great Good were both published online for free before No Starch worked with their authors to provide dead-tree and ebook editions.


Rails Test Prescriptions and 3D Game Programming For Kids were both available as self-published titles before being made available as official PragProg titles. The Prags have also distributed a number of self-published works without re-branding.


Chris Guillebeau self-published a number of titles before writing two books with a publisher.


I'm sure there are more, let me know in the comments which I forgot.


Not every book will find a home in both worlds, but some will. Other times, self-publishing can lead to a traditional publishing deal. Self-publishing by no means bars you from crossing the line into traditional publishing; you don't have to choose.



May 30, 2013

Is lock-free logging safe?

This is a bit of an odd question without some context.


Recently, I saw the mono_logger project. It claims to provide "a lock-free Logger for Ruby 2.0". When I saw it I thought "Cool! I want to see what tricks they used to implement this". It looked pretty straightforward at first. When I looked closer and compared it to the Logger found in Ruby's stdlib, I saw that it was exactly the same, minus the mutex.


This led me to ask the question: is lock-free logging safe? Ruby probably put a mutex there for a reason. What if multiple threads try to log a message at the same time? Will their log messages be interleaved like this?



A programmer had a problem. He thought to himself, "I know, I'll solve it with threads!". has Now problems. two he


— Davidlohr Bueso (@davidlohr) January 8, 2013




When I started looking around for the answer to this question, I realized that the overarching question I was asking was: are calls to write(2) atomic? In other words, if two threads or two processes write to the same file at the same time, are the writes atomic? Or can they be interleaved?


Unix beard too short...

This is a deep Unix question. When I posed this question on the mono_logger bug tracker, I admitted that my Unix beard wasn't long enough to know the answer. So I turned to my favourite source of Unix lore: The Linux Programming Interface.



It had the answers I was looking for. There are a few bits of knowledge that lead to the answer to this question.


For starters, all system calls are atomic. From TLPI:


"All system calls are executed atomically. By this, we mean that the kernel guarantees that all of the steps in a system call are completed as a single operation, without being interrupted by another process of thread."

Great! So if two threads or two processes write to a file at the same time, we can be sure their data won't be interleaved. This confirms that we don't really need a mutex around the call to write(2) to ensure that it's atomic, the kernel already guarantees that.


But there's one problem. Typically when writing to a file (from C-land), you need to seek to the location in the file where you want to write, then perform the actual write to the file descriptor. Unfortunately, this involves two system calls, lseek(2) and write(2). Individually they're atomic, but as two operations that's not the case. Let me demonstrate.


As pseudo-code, this might look like:



lseek(fd, 0, SEEK_END); // seek to the end of the file
write(fd, "log message", len); // perform the write


This isn't safe in a multi-threaded situation. Since there's no atomicity guarantees across these two system calls, the following situation is possible:



Thread A seeks to the end of the file
-- the thread scheduler decides to switch context to Thread B --
Thread B seeks to the end of the file
Thread B writes its data
-- thread scheduler switches context back to Thread A --
Thread A has already done its lseek, so it doesn't do it again, it just performs its write

D'oh: Thread A is no longer pointing at the true end of the file; it's pointing at the previous end of the file, the location where Thread B wrote its message. When Thread A performs its write, it overwrites whatever Thread B wrote.


So with this approach, it's possible to lose data.


But there's a solution!



In order for the seek and the write to happen atomically, you can set the O_APPEND flag when opening the file. Then any writes performed will atomically be appended to the file. This is exactly what we want in the case of a shared logfile. Both Logger and MonoLogger use this option when opening the logfile.



open(filename, (File::WRONLY | File::APPEND))


So MonoLogger should be able to append to the logfile atomically without a mutex.


Update: @jacknagel raised some good points on the MonoLogger issue tracker. Most notably, write(2) doesn't guarantee that all of the data you pass it will be written; it returns the number of bytes actually written. This can happen, for example, when the call is interrupted by the arrival of a signal.
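
In C-land you'd check the return value and retry. Here's a hedged Ruby sketch of that idea (mine, not code from either logger). Note the catch: retrying like this splits one message across multiple write(2) calls, which gives up the very single-write atomicity we were counting on.


def write_all(io, data)
  until data.empty?
    written = io.syswrite(data)        # may write fewer bytes than asked
    data = data.byteslice(written..-1) # retry with whatever remains
  end
end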


Update #2: Eric Wong got in touch and confirmed that MonoLogger's behaviour is safe. His messages are posted to the usp.ruby email list.


PIPE_BUF_MAX?

In the github issue, I raised a point about something called PIPE_BUF_MAX. I won't say much about it here, because it's not relevant when writing to files. Essentially, a write(2) of any size is guaranteed to be atomic when the target is a file. However, when writing to a pipe, your write(2) is only guaranteed to be atomic if it's less than PIPE_BUF_MAX.


I'm interested to see what happens with MonoLogger in the wild. Anyone out there using it yet?



May 26, 2013

How many threads is too many?

This is a sample chapter from my ebook Working With Ruby Threads. If you're curious about all the talk in the community about multi-threaded concurrency, this book will give you a gentle introduction so you can join the conversation.



This question is relevant whether you're whipping up a quick script to scrape some websites, trying to speed up a long-running calculation, or tuning your multi-threaded webserver.


I hope you're not surprised, but the answer is: well, that depends.


The only way to be sure is to measure and compare. Try it out with 1 thread per CPU core, try it out with 5 threads per CPU core, compare the results, move forward.


However, there are some heuristics you can use to get started. Plus, if you're new to this, it's good to have a ballpark idea of what's sane. Is 100 threads too many? How about 10,000?


ALL the threads

How about we start by spawning ALL the threads? How many threads can we spawn?



1.upto(10_000) do |i|
  Thread.new { sleep }
  puts i
end


This code attempts to spawn 10,000 sleeping threads. If you run this on OSX you'll get about this far:



2042
2043
2044
2045
2046
ThreadError: can't create Thread (35)


On OSX, there's a hard limit on the number of threads that one process can spawn. On recent versions, that limit is somewhere around 2000 threads.


However, we were able to spawn at least that many threads without the system falling down around us. As mentioned earlier, spawning a thread is relatively cheap.


If I run this same bit of code on a Linux machine, I'm able to spawn 10,000 threads without blinking. So, it's possible to spawn a lot of threads, but you probably don't want to.


Context Switching

If you think about it for a minute, it should be clear why you don't want to spawn 10,000 threads on a 4-core CPU.


Even though each thread requires little memory overhead, there is overhead for the thread scheduler to manage those threads. If you only have 4 cores, then only 4 threads can be executing instructions at any given time. You may have some threads blocking on IO, but that still leaves a lot of threads idle a lot of the time, requiring overhead to manage them.


But even in the face of increased overhead for the thread scheduler, it does sometimes make sense to spawn more threads than the number of CPU cores. Let's review the difference between IO-bound code and CPU-bound code.


IO-bound

Your code is said to be IO-bound if it is bound by the IO capabilities of your machine. Let me try that again with an example.


If your code were doing web requests to external services, and your machine magically got a faster network connection, your program would likely be sped up as well.


If your code were doing a lot of reading and writing to disk, and you got a new hard drive that could seek faster, your code would likely be sped up as well.


These are examples of IO-bound code. In these situations, your code typically spends some time waiting for a response from some IO device, and the results aren't immediate.


In this case, it does make sense to spawn more threads than CPU cores.


If you need to make 10 web requests, you're probably best off making them all in parallel threads, rather than spawning 4 threads, making 4 requests, waiting, then making 4 more requests. You want to minimize that idle time spent waiting.


Let's make this concrete with some code.


In this example, the task is to fetch http://nytimes.com 30 times. I tested this using different numbers of threads. With one thread, 30 fetches will be performed serially. With two threads, each thread will need to perform 15 fetches, etc.


I tested different numbers of threads in order to find the sweet spot. Make no mistake, there will be a sweet spot between utilizing available resources and context switching overhead.



require 'benchmark'
require 'open-uri'

URL = 'http://www.nytimes.com/'
ITERATIONS = 30

def fetch_url(thread_count)
  threads = []

  thread_count.times do
    threads << Thread.new do
      fetches_per_thread = ITERATIONS / thread_count

      fetches_per_thread.times do
        open(URL)
      end
    end
  end

  threads.each(&:join)
end

Benchmark.bm(20) do |bm|
  [1, 2, 3, 5, 6, 10, 15, 30].each do |thread_count|
    bm.report("with #{thread_count} threads") do
      fetch_url(thread_count)
    end
  end
end


I plotted the results across the studied Ruby implementations to get this graph illustrating the results:

[Graph: IO-bound fetch benchmark results across Ruby implementations]

I ran this benchmark on my 4-core Macbook Pro.


As expected, all of the implementations exhibited similar behaviour with respect to concurrent IO.


The sweet spot for code that is fully IO-bound is right around 10 threads on my machine, even though I tested up to 30 threads. Using more than 10 threads no longer improved performance in this case; it just used more resources. In some cases, it actually made performance worse.


If the latency of the IO operations were higher, I might need more threads to hit that sweet spot, because more threads would be blocked waiting for responses more of the time. Conversely, if the latency were lower, I might need fewer threads to hit the sweet spot: there would be less time spent waiting, and each thread would be freed up more quickly.


Finding the sweet spot is really important. Once you've done this a few times, you can probably start to make good guesses about how many threads to use, but the sweet spot will be different with different IO workloads, or different hardware.


CPU-bound

The flip side of IO-bound code is CPU-bound code. Your code is CPU-bound if its execution is bound by the CPU capabilities of your machine. Let's try another example.


If your code needed to calculate millions of cryptographic hashes, or perform really complex mathematical calculations, it would be bound by how many instructions your CPU could muster in a given amount of time.


If you upgraded to a CPU with a faster clock speed, your code would be able to do these calculations in a shorter time period.


Let's re-use the example from the last chapter that calculated digits of pi. That's certainly a CPU-bound bit of code. Here's the benchmark code:



require 'benchmark'

require 'bigdecimal'
require 'bigdecimal/math'

DIGITS = 10_000
ITERATIONS = 24

def calculate_pi(thread_count)
  threads = []

  thread_count.times do
    threads << Thread.new do
      iterations_per_thread = ITERATIONS / thread_count

      iterations_per_thread.times do
        BigMath.PI(DIGITS)
      end
    end
  end

  threads.each(&:join)
end

Benchmark.bm(20) do |bm|
  [1, 2, 3, 4, 6, 8, 12, 24].each do |thread_count|
    bm.report("with #{thread_count} threads") do
      calculate_pi(thread_count)
    end
  end
end


This graph illustrates the results:

[Graph: CPU-bound pi benchmark results across Ruby implementations]

Note: JRuby seems to have an issue with calculating that many digits of pi. Its results weren't comparable, so I left them out.


This graph has a bit of a different shape than the last one. Again, the absolute values aren't important here; the shape is the important part.


For MRI, there's no curve this time. Performance isn't impacted by the introduction of more threads. This is a direct result of the GIL: since the GIL is a lock around the execution of Ruby code, and in this case we're benchmarking the execution of Ruby code, adding more threads has no effect.


For Rubinius, the curve was lowest when running with 4 threads. I ran this example locally on my 4-core CPU. That's no coincidence.


CPU-bound code is inherently bound by the rate at which the CPU can execute instructions. So when I have my code running in parallel across 4 threads, I'm utilizing all of the power of my 4-core CPU without any overhead.


Once you introduce more than 4 threads, it's still possible for you to fully utilize all of your CPU resources, but now there's more overhead. That's why we see the curve going back up: the thread scheduler is incurring overhead for each extra thread.


The takeaway from all this is that each thread you spawn incurs overhead to manage it. More threads isn't always faster. On the other hand, introducing more threads improved performance in these two examples by anywhere from 100% to 600%. Finding that sweet spot is certainly worth it.


So... how many should you use for your code?

I showed some heuristics for code that is fully IO-bound or fully CPU-bound. In reality, your application is probably not so clear cut. Your app may be IO-bound in some places and CPU-bound in others. It may not be bound by CPU or IO at all; it may be memory-bound, or simply not maximizing resources in any way.


A Rails application is a prime example of this. Between communicating with the database, communicating with the client, and calling out to external services, there's lots of chances for it to be IO-bound. On the other hand, there are things that will tax the CPU, like rendering HTML templates, or transforming data to JSON documents.


So, as I said at the beginning of this chapter, the only way to a surefire answer is to measure. Run your code with different thread counts, measure the results, and then decide. Without measuring, you may never find the 'right' answer.

