Saturday, January 24, 2009

To Lock or Not to Lock?

Over the weekend, I was at CUSEC 2009 and an interesting computer science problem occured to me. Let me know what you think.

Consider a multi-threaded application with (at least) two threads: the main thread and a message-based worker thread.

Consider also an indexed ADT and consider a synchronized iteration through the elements of this enumeration (shown for clarity in Java without loss of generalization), running in the main thread:

List list = getList();
synchronized (list) {
    for (int i = 0; i < list.size(); i++) {
        T element = list.get(i);
        process(element);
    }
}

And consider a process(T element) method that sends a message to the worker thread to process this element. This worker thread, in turn, processes the element by accessing the original enumeration in a synchronized fashion. For example:

private void processInWorkerThread(T element) {
    List list = getList();
    sychronize (list) {
        // For example, get the index
        // within the list:
        // int index = list.indexOf(element);
        // setResult(index);
        respondToMainThread();
    }
}

When the enumeration is run, this algorithm will cause a deadlock. The main thread is waiting for the background thread to unblock, which will only happen when the main thread releases its lock on the list, which will never happen. This is deadlock.

Spotting a deadlock is one thing. Fixing a deadlock is another. And fixing a deadlock in a clean, future-proof, backwards-compatible way is practically an extinct species.

Clearly there is something wrong with the above design. Point out the flaws, what the options are, how you would fix them, and why you chose the solution that you chose. I welcome clarifying questions regarding the context or scope of the application.

Let the commenting begin!

3 comments:

Anonymous said...

Hey man, I dig your blog. Chantale sent me the link.

Anyway, in this specific example, the "list" referred to by your main method is different than the "list" in processInWorkerThread() is declared locally and allocated separately. So, they don't refer to the same thing and should not deadlock.

I actually don't know much about Java, but have had to deal with my share of syncronization problems with the Linux kernel. In TLK, they are *very* conscious of which functions are allowed to take which locks, and if a function must take multiple locks, they are done in a very specific order (for example, process A will not try to take locks x and y, while process B tries to take locks y and x). This all seems a bit clunky. I'm sure there's a nicer way to do it. I wonder what others have to say about it.

-dsc
david.scott.clawson AT gmail

Eric Savoie said...

I'm not sure I understand the problem...

What is the point of having a worker thread when it is never allowed to run concurrently with the main thread anyway?

I would just put all that work in a single thread thus avoiding any possibilities of deadlock.

Catalin Patulea said...

@dsc:
Thanks for the compliment -- hopefully what this blog lacks in quantity I make up in quality ;-)

In fact, "list" by definition refers to the same object -- otherwise the code wouldn't semantically make sense.

Experiment with this Eclipse project: http://vv.carleton.ca/~cat/code/locker.zip

@Eric:
I like your thinking. In any real situation the first thing I ask myself is: is there a simpler way to do X? Keep it simple, stupid!

However, let's assume for the sake of discussion that collapsing the algorithm into the main thread is not an option. For example, we may want to be able to take advantage of multicore processors by launching multiple worker threads. Perhaps, even, this application is distributed on a cluster and the "lock" is actually a distributed lock.

In retrospect, I think I forgot to ask myself one question when I wrote this code: what is the purpose of synchronization in this case? I didn't give it much thought past "when in doubt, use a lock!"

It's helpful to think of this problem as a readers-writers problem. It becomes clear then, that the lock being acquired by the main thread as well as the worker threads is a read lock. Since the main thread has already acquired a read lock on the list, there is no reason the worker thread (a reader as well) should have to wait to, well, read the list.

I won't go into details on the actual implementation of the fix, because so many have done so before me. I think what's important is to recognize well-known problems in our own work and not reinvent the wheel (often incorrectly).