Sunday, August 23, 2009

Single-Process Google Chrome

I was running out of RAM at work one day and realized that Google Chrome actually takes up a considerable amount of memory. Most of this is due to creating several renderer processes, sometimes as many as one per tab. The Chromium core supports running all renderers inside a single process (simply pass --single-process at the command line), but unfortunately this is an unsupported and disabled mode in Google Chrome (see this message on the chromium-dev mailing list). But this wasn't going to stop me...

The first thing to do was to figure out how --single-process is implemented and how it is disabled in Google Chrome official builds. The first link in the mailing list posting points us in the right direction:

From src/chrome/app/chrome_dll_main.cc:
  bool single_process =
#if defined (GOOGLE_CHROME_BUILD)
// This is an unsupported and not fully tested mode, so don't enable it for
// official Chrome builds.
false;
#else
parsed_command_line.HasSwitch(switches::kSingleProcess);
#endif
if (single_process)
RenderProcessHost::set_run_renderer_in_process(true);

What we want to do is change the value of single_process after it is initialized to false but before the if. Unfortunately, here this is impossible because the compiler has optimized away that variable! This snippet of code actually does not even appear in chrome.dll since the compiler has also optimized away the call to RenderProcessHost::set_run_renderer_in_process.

We are lucky though, because single_process is never used again so if we can emulate a call to RenderProcessHost::set_run_renderer_in_process(true), we're golden. Digging deeper, that method does the following:

Google Code Search finds src/chrome/browser/renderer_host/render_process_host.h:
  static void set_run_renderer_in_process(bool value) {
run_renderer_in_process_ = value;
}

...with run_renderer_in_process_ a static bool in RenderProcessHost. This means it behaves like a global variable, with storage in .data.

Now, of course, we don't have symbols for chrome.dll so we have to figure out a way of finding the variable's location in memory. This can be done by looking for other references to this variable, fingerprinting those and using them to find run_renderer_in_process_. In this particular case, we need to look for the accessors of this field, which the compiler has also inlined.

There are a few references scattered across the codebase, but remember that we are looking for ones that are easy to fingerprint and unlikely to change across versions. This means uncommon constants, code sequences, etc. I chose two places: BrowserRenderProcessHost::BrowserRenderProcessHost and RenderProcessHost::ShouldTryToUseExistingProcessHost.

First, let's look at BrowserRenderProcessHost::BrowserRenderProcessHost:
BrowserRenderProcessHost::BrowserRenderProcessHost(Profile* profile)
: RenderProcessHost(profile),
[...]
if (run_renderer_in_process()) {
// We need a "renderer pid", but we don't have one when there's no renderer
// process. So pick a value that won't clash with other child process pids.
// Linux has PID_MAX_LIMIT which is 2^22. Windows always uses pids that are
// divisible by 4. So...
static int next_pid = 4 * 1024 * 1024;
next_pid += 3;
SetProcessID(next_pid);
}

Now this is a very nice snippet because we will have the constant (4 * 1024 * 1024) somewhere in .data and a reference to it from an ADD instruction with "3" as the second operand. This is fairly uncommon so we'll start looking for this one. (Open up IDA.) We come up with this (in chrome.dll 3.0.195.6 11,844,080 bytes):

.text:021749A8                 cmp     byte_2659CEC, 0
.text:021749AF jz short loc_21749C5
.text:021749B1 add dword_2639A58, 3

Here run_renderer_in_process_ is at byte_2659CEC.
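To make the fingerprint idea concrete, here is a minimal Python sketch that scans raw code bytes for the three-instruction sequence above and pulls out the address being compared. It is not the code from chromehack.py. The opcode bytes are the standard 32-bit x86 encodings for these instruction forms, and the assumption (labeled in the comments) is that other builds emit the same forms. The function name and the sample bytes are made up for illustration.

```python
import re
import struct

# Assumed instruction forms (32-bit x86), matching the listing above:
#   80 3D <addr32> 00   cmp byte ptr [addr32], 0
#   74 <rel8>           jz short
#   83 05 <addr32> 03   add dword ptr [addr32], 3
FINGERPRINT = re.compile(
    rb"\x80\x3d(.{4})\x00"
    rb"\x74."
    rb"\x83\x05(.{4})\x03",
    re.DOTALL,  # '.' must also match the newline byte 0x0A
)


def find_flag_address(code):
    """Return the address compared against 0, or None if nothing matches."""
    match = FINGERPRINT.search(code)
    if match is None:
        return None
    (flag_address,) = struct.unpack("<I", match.group(1))
    return flag_address


# Synthetic bytes that mimic the listing; not read from a real chrome.dll.
sample = (
    b"\x90\x90"
    + b"\x80\x3d" + struct.pack("<I", 0x02659CEC) + b"\x00"
    + b"\x74\x14"
    + b"\x83\x05" + struct.pack("<I", 0x02639A58) + b"\x03"
    + b"\xc3"
)
print(hex(find_flag_address(sample)))
```

Matching on instruction forms rather than on fixed addresses is what lets the same pattern survive a rebuild, as long as the compiler keeps emitting the same sequence.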

Now I wanted to be on the safe side and confirm this with another reference. One of the references to run_renderer_in_process_ is in RenderProcessHost itself:

From src/chrome/browser/renderer_host/render_process_host.cc:
bool RenderProcessHost::ShouldTryToUseExistingProcessHost() {
size_t renderer_process_count = all_hosts.size();
[...]
return run_renderer_in_process() ||
(renderer_process_count >= GetMaxRendererProcessCount());
}

Remember that that getter is inlined. In fact, GetMaxRendererProcessCount is inlined too. Let's look at the source for that as well to get an idea of what to expect.

From src/chrome/browser/renderer_host/render_process_host.cc:
size_t GetMaxRendererProcessCount() {
[...]
static const size_t kMaxRenderersByRamTier[] = {
3, // less than 256MB
6, // 256MB
9, // 512MB
12, // 768MB
14, // 1024MB
[...]
};

static size_t max_count = 0;
if (!max_count) {
size_t memory_tier = base::SysInfo::AmountOfPhysicalMemoryMB() / 256;
if (memory_tier >= arraysize(kMaxRenderersByRamTier))
max_count = chrome::kMaxRendererProcessCount;
else
max_count = kMaxRenderersByRamTier[memory_tier];
}
return max_count;
}

Going back to the original boolean expression in ShouldTryToUseExistingProcessHost, we expect to see a CMP on run_renderer_in_process_, followed at some point by an indexed access into kMaxRenderersByRamTier. I chose this one because kMaxRenderersByRamTier is very easy to find and unlikely to change across versions. Anyway, the assembly ends up looking like this:

.text:02038775                 cmp     byte_2659CEC, 0
.text:0203877C push esi
.text:0203877D mov esi, dword_2636D08
[...]
.text:020387A9 loc_20387A9: ; CODE XREF: sub_2038775+2Dj
.text:020387A9 mov eax, ds:dword_25CF518[eax*4]

So if we find kMaxRenderersByRamTier (dword_25CF518 here, the array indexed by eax*4), we can find the reference to it, then look in its neighborhood for a "CMP m8, 0".
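A rough Python sketch of this second fingerprint follows. It assumes the table can be located by the first five values shown in the source (3, 6, 9, 12, 14) stored as little-endian dwords, and that the indexed load uses the same encoding as the mov in the listing (8B 04 85 followed by the table address). The function names, the section base address and the 64-byte look-back window are all hypothetical choices for illustration.

```python
import struct

# First five entries of kMaxRenderersByRamTier, as 32-bit little-endian values.
RAM_TIER_PREFIX = struct.pack("<5I", 3, 6, 9, 12, 14)


def find_table_address(data, data_base):
    """Return the virtual address of the table inside a data section."""
    offset = data.find(RAM_TIER_PREFIX)
    return None if offset < 0 else data_base + offset


def find_flag_near_table(code, table_address, window=64):
    """Find 'mov eax, [table + eax*4]', then look back for 'cmp byte [addr], 0'."""
    reference = b"\x8b\x04\x85" + struct.pack("<I", table_address)
    ref_offset = code.find(reference)
    if ref_offset < 0:
        return None
    start = max(0, ref_offset - window)
    cmp_offset = code.rfind(b"\x80\x3d", start, ref_offset)
    while cmp_offset >= 0:
        # 80 3D <addr32> <imm8>: accept only a compare against 0.
        if cmp_offset + 7 <= ref_offset and code[cmp_offset + 6] == 0:
            return struct.unpack_from("<I", code, cmp_offset + 2)[0]
        cmp_offset = code.rfind(b"\x80\x3d", start, cmp_offset)
    return None


# Synthetic sections that mimic the listing; not read from a real chrome.dll.
data_base = 0x025CF500
data = b"\x00" * 0x18 + RAM_TIER_PREFIX + b"\x00" * 8
table = find_table_address(data, data_base)
code = (
    b"\x80\x3d" + struct.pack("<I", 0x02659CEC) + b"\x00"  # cmp byte [flag], 0
    + b"\x56"                                               # push esi
    + b"\x8b\x35" + struct.pack("<I", 0x02636D08)           # mov esi, [dword]
    + b"\x90" * 8                                           # stands in for the elided code
    + b"\x8b\x04\x85" + struct.pack("<I", table)            # mov eax, [table + eax*4]
)
print(hex(table), hex(find_flag_near_table(code, table)))
```

If both fingerprints report the same address, that is a decent sanity check before touching the byte.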

And of course it would be nice if a computer could do this automatically for us :) Luckily there's an awesome Python module called pydbg that lets you write debugging scripts in Python.

Check out the finished product: chromehack.py.

So.. what have we learned?
  • Beware of compiler optimizations
  • If you cannot cause your target to execute a particular piece of code, try to emulate all the side-effects of that code
  • Don't give up! Anything and everything is possible in assembly :)
Until next time..

-Cat

Wednesday, June 10, 2009

Google Chrome Rocks! (or: Facebook Message Forensics)

There's a little trick I sometimes pull when my mouse accidentally slips onto the back button while I'm writing something on a web page (don't ask). Before I show you a demo, I want to explain what's going on behind the scenes so you can understand what I'm doing.

When you navigate away from a page, one of two things happens behind the scenes: either the browser frees the memory associated with the page, or it keeps the rendered page in a cache in case you decide to revisit it. Even when the memory is freed, the data usually stays around for a little while longer, at least until enough free()'s have accumulated for the CRT to hand the memory back to the OS allocator. (Remember that the CRT is an allocation cache on top of the OS allocator.) Either way, if you've got a long enough pattern, you can just search for it in the process's memory space. It's crude, but it sometimes works. And here's how it's done:

First Step: Get the page's PID from Chrome's about:memory page
This is why this post is called "Google Chrome Rocks!" Normally this step is painful with Internet Explorer and Firefox because all pages are in the same (main) process, so they all become unresponsive when I debug the process. Chrome allows me to debug a single page, while still browsing in the other tabs.

(Yes, I have a lot of tabs open.)

Second Step: Fire up WinDbg and attach to that process
If you don't have WinDbg installed, use Process Explorer to suspend the process (right-click, Suspend). You could even get Visual Studio attached in the interim, but you will have to also suspend with Process Explorer during the hand-off to WinDbg.

Go to File | Attach to a Process... (F6) and choose the process:
Once WinDbg has attached, you're safe -- the process is suspended so its memory won't be reclaimed by the OS.

Third Step: Look for a pattern in what you were writing
It can be as simple as a word like "gliding". Of course, this will entirely depend on what you were writing.

For example, use the WinDbg command "s -a".

In my case, I can see that the last match is the right one. Use "db" to work your way backwards to the beginning of the string.

Remember that your string might be stored in Unicode and try different patterns.
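The same search can be sketched in a few lines of Python over a buffer of bytes, trying both an ASCII and a UTF-16 encoding of the word (roughly what "s -a" and "s -u" look for). This is only an illustration on a made-up buffer; the function name is hypothetical and nothing here reads a real process.

```python
def find_text(memory, text):
    """Return sorted (offset, encoding) pairs for every match of text."""
    hits = []
    for encoding in ("ascii", "utf-16-le"):
        needle = text.encode(encoding)
        start = memory.find(needle)
        while start >= 0:
            hits.append((start, encoding))
            start = memory.find(needle, start + 1)
    return sorted(hits)


# Made-up "memory": one UTF-16 copy and one ASCII copy of the text.
dump = (
    b"\x00\x01"
    + "I was gliding".encode("utf-16-le")
    + b"\xff"
    + b"gliding home"
)
for offset, encoding in find_text(dump, "gliding"):
    print(offset, encoding)
```

A longer pattern cuts down on false matches, which is why a distinctive word or phrase works best.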

Fourth Step: Extract the recovered string
The text is too long for "da", but WinDbg has a ".printf" command. Don't be afraid to use the manual to come up with crazy techniques like this one. Sadly, the comma is not documented.

Fifth Step: Paste into notepad

Once this is done, you can release the process from the debugger. Make sure to use Debug | Detach Debuggee so you don't kill the process.

Then simply hit the Forward button and paste your text back into the editor :)

What would be really fun is writing a driver that can freeze the entire system and search physical memory for a particular pattern. This would catch even memory freed by the OS. Unfortunately, under Windows, VirtualFree'd memory is zero'ed by a background kernel thread, so the window of opportunity could be quite small.

Until next time,

Good night.

Saturday, April 11, 2009

Missing Ingress Qdisc

If you're trying to set up shaping on your Linux box (either manually through "tc qdisc add dev ppp0 handle ffff: ingress", or with The Wonder Shaper), you may run into this error:
RTNETLINK answers: No such file or directory

It means that you haven't compiled the "ingress" queueing discipline into your kernel (either built-in or as a module). Here's where to find it:

  • Option name: CONFIG_NET_SCH_INGRESS
  • menuconfig: Networking support ---> Networking options ---> QoS and/or fair queueing ---> Ingress Qdisc
  • Module location: net/sched/sch_ingress.ko
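If you have the text of your kernel configuration handy, a check for the option can be sketched as below. Where that text comes from is an assumption: many distributions ship it as /boot/config-<version> or expose it through /proc/config.gz, but yours may not. The function name is made up for this example.

```python
def ingress_support(config_text):
    """Return 'built-in', 'module' or 'missing' for CONFIG_NET_SCH_INGRESS."""
    for line in config_text.splitlines():
        line = line.strip()
        if line == "CONFIG_NET_SCH_INGRESS=y":
            return "built-in"
        if line == "CONFIG_NET_SCH_INGRESS=m":
            return "module"
    # Disabled options appear as '# CONFIG_... is not set' or not at all.
    return "missing"


# Made-up config fragment for illustration.
sample = "CONFIG_NET_SCHED=y\n# CONFIG_NET_SCH_INGRESS is not set\n"
print(ingress_support(sample))
```

When the option is built as a module, the module still has to be loadable on the running kernel for tc to find it.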

That is all.

Wednesday, March 04, 2009

Teach Google a Lesson!

Google is a fantastic search engine. The first thing I do in the morning is Gmail. Without Google Calendar, my life would be a mess. But sometimes, one human can do a better job than computers.

The PIC16F688 has been the focus of my attention for the last few months. In particular, I needed to read about its EUSART (aka UART) peripheral for a school project. Who do I ask? Google of course.

Sure enough, the second search result is the datasheet. But here's what Google didn't tell you: it's not the latest revision of the datasheet. This sucked particularly hard in my case because I had already printed a hardcopy of the relevant part of the documentation.

Luckily, Google has recently introduced a feature that lets me amend that. It's called SearchWiki and it rocks! I was tempted to also click on the "X" button to tell Google that that's the wrong answer to my query. I tried to tell Google what the right answer was by searching with my original keywords ("pic16f688"), plus some keywords that bring up the correct answer ("41203D"):

I clicked the green "Promote" icon, but alas, that result doesn't show up in the result page for just the original keywords ("pic16f688").

Nevertheless, SearchWiki lets me make sure that I never make that same mistake again. At least until Microchip releases a new revision of the datasheet... ;-)

-Cat

Sunday, February 15, 2009

Glassfish and Intermediate SSL Certificates

About a week ago, EngSoc bought a wildcard SSL certificate for our online properties. We've been running with self-signed certificates for a few years now and our users have had to put up with those security warnings.

One of the properties we wanted to secure with this certificate was a Glassfish installation that the engineering society uses for elections. I installed the certificate and confirmed it worked in Internet Explorer and Chrome -- at the time I didn't have Firefox installed on my computer so I didn't give it much thought.. but when I started getting reports that the warning was still popping up in Firefox, I had to investigate.

There are a couple of aggravating circumstances that caused this to happen -- Murphy must have been having a field day.
  1. Internet Explorer and Chrome access the same set of root CA certificates, as managed by the Windows Crypto API. Firefox uses its own certificate store.
  2. The server cert we bought is issued by an intermediate CA. This means that the server cert is not directly certified by the most commonly distributed root CA certs. Any server using this cert must send the intermediate cert along with the server cert, or else the browser cannot build a trust path from the server cert to one of the root CAs.
  3. For some reason, the Windows CA store happens to have the intermediate CA cert, while the Firefox store does not. For example, Windows could simply have a newer set of certs, but the reason is immaterial for this discussion. The consequence is that IE/Chrome do not show a warning, while FF does:


  4. Apache's mod_ssl can be configured to send this intermediate CA using the SSLCertificateChainFile directive. Glassfish does not have documentation for this, though it is possible. Keep reading.
First, we start off with a keystore.jks that contains your private key and CA-issued cert. Presumably you got to this point by generating a key pair, sending a certificate request and importing the certificate into your store.

# keytool -keystore keystore.jks -storepass changeit -list -v

Keystore type: JKS
Keystore provider: SUN

Your keystore contains 1 entry

Alias name: https
Creation date: Jan 30, 2009
Entry type: PrivateKeyEntry
Certificate chain length: 1
Certificate[1]:
Owner: CN=*.engsoc.org, OU=EngSoc Project, O=Carleton University, L=Ottawa, ST=Ontario, C=CA
Issuer: CN=DigiCert Global CA, OU=www.digicert.com, O=DigiCert Inc, C=US
Serial number: 780a3e6ee2948ad9a36c504fcf2c4c1
Valid from: Wed Jan 28 19:00:00 EST 2009 until: Tue Feb 02 18:59:59 EST 2010
Certificate fingerprints:
         MD5:  23:35:C5:7F:E8:77:55:4E:CE:47:FA:8E:18:E8:F0:9C
         SHA1: 9E:8E:E1:44:C8:02:4A:07:2B:6E:E9:59:34:B9:46:7B:56:AC:CE:E3
         Signature algorithm name: SHA1withRSA
         Version: 3

[...]

Note "Certificate chain length: 1". This is the server cert only and is not sufficient for Firefox to generate a trust path to one of its root CA certs. We can see that Glassfish indeed sends only this cert during the SSL handshake:

Our goal is to add the intermediate CA to this key entry and make the chain length be 2: the server cert and the intermediate CA cert.
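Why the chain length matters can be shown with a toy model in Python. It follows issuer names only and skips everything a real browser does (signature checks, validity dates, extensions), so treat it as an illustration of the path-building idea, not of Firefox's or the Windows Crypto API's actual logic. The function name and the data layout are made up.

```python
def build_trust_path(sent, store, roots):
    """Walk issuer names from the server cert up to a trusted root.

    sent:  list of (subject, issuer) pairs sent by the server, server cert first
    store: dict of subject -> issuer for certs the client already holds
    roots: set of trusted root CA names
    Returns the list of names on the path, or None if no path can be built.
    """
    known = dict(store)
    known.update(sent)
    subject, issuer = sent[0]
    path = [subject]
    while issuer not in roots:
        if issuer not in known or issuer in path:
            return None  # missing intermediate, or a loop
        path.append(issuer)
        issuer = known[issuer]
    path.append(issuer)
    return path


ROOT = "Entrust.net Secure Server Certification Authority"
server_cert = ("*.engsoc.org", "DigiCert Global CA")
intermediate = ("DigiCert Global CA", ROOT)

# Chain length 1, client without the intermediate (the Firefox case).
print(build_trust_path([server_cert], {}, {ROOT}))
# Chain length 1, client that already has the intermediate (the Windows case).
print(build_trust_path([server_cert], dict([intermediate]), {ROOT}))
# Chain length 2: the server sends the intermediate itself.
print(build_trust_path([server_cert, intermediate], {}, {ROOT}))
```

Only the first call fails, which matches the behaviour described above: the warning appears exactly when neither the server nor the client supplies the intermediate cert.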

We first need to create an X.509 file that contains both certs. Luckily, this is as simple as concatenating the two certs:
# cat certs/DigiCertCA.crt certs/star_engsoc_org.crt > certs/DigiCertCA+star_engsoc_org.crt

And now we import them both, overwriting the cert associated with the key entry:
# keytool -keystore keystore.jks -storepass changeit -importcert -alias https -file certs/DigiCertCA+star_engsoc_org.crt -noprompt
Certificate reply was installed in keystore

Now we look at the key entry again:
# keytool -keystore keystore.jks -storepass changeit -list -v

Keystore type: JKS
Keystore provider: SUN

Your keystore contains 1 entry

Alias name: https
Creation date: Feb 15, 2009
Entry type: PrivateKeyEntry
Certificate chain length: 2
Certificate[1]:
Owner: CN=*.engsoc.org, OU=EngSoc Project, O=Carleton University, L=Ottawa, ST=Ontario, C=CA
Issuer: CN=DigiCert Global CA, OU=www.digicert.com, O=DigiCert Inc, C=US
Serial number: 780a3e6ee2948ad9a36c504fcf2c4c1
Valid from: Wed Jan 28 19:00:00 EST 2009 until: Tue Feb 02 18:59:59 EST 2010
Certificate fingerprints:
         MD5:  23:35:C5:7F:E8:77:55:4E:CE:47:FA:8E:18:E8:F0:9C
         SHA1: 9E:8E:E1:44:C8:02:4A:07:2B:6E:E9:59:34:B9:46:7B:56:AC:CE:E3
         Signature algorithm name: SHA1withRSA
         Version: 3

[...]

Certificate[2]:
Owner: CN=DigiCert Global CA, OU=www.digicert.com, O=DigiCert Inc, C=US
Issuer: CN=Entrust.net Secure Server Certification Authority, OU=(c) 1999 Entrust.net Limited, OU=www.entrust.net/CPS incorp. by ref. (limits liab.), O=Entrust.net, C=US
Serial number: 4286aba0
Valid from: Fri Jul 14 13:10:28 EDT 2006 until: Mon Jul 14 13:40:28 EDT 2014
Certificate fingerprints:
         MD5:  FB:14:1E:91:00:CA:CB:77:D8:01:62:D8:8C:B8:84:48
         SHA1: 25:B7:8E:B9:36:A4:00:CE:34:13:1D:9A:6D:E8:BE:A0:4B:34:76:07
         Signature algorithm name: SHA1withRSA
         Version: 3

[...]

Indeed, there are now two certs for this key and Glassfish will now send them both during the SSL handshake:


Note also that, after connecting to the server once, Firefox added the intermediate CA cert (DigiCert Global CA) to its cert store:


Things get even more interesting when you have multiple servers using this cert. Our Apache server was properly configured to send both certs, so when a Firefox user went to an Apache site, Firefox would add the intermediate CA cert to its store. Subsequent visits to the Glassfish server would show no warnings. However, if someone visited the Glassfish server first, they would get the warning. This behaviour is reproducible by simply deleting the "DigiCert Global CA" cert from Firefox's store (it's under "Entrust.net").

Well, I hope this helped. It definitely solved our problem.

-Cat

Saturday, January 24, 2009

To Lock or Not to Lock?

Over the weekend, I was at CUSEC 2009 and an interesting computer science problem occurred to me. Let me know what you think.

Consider a multi-threaded application with (at least) two threads: the main thread and a message-based worker thread.

Consider also an indexed ADT and consider a synchronized iteration through the elements of this enumeration (shown for clarity in Java without loss of generality), running in the main thread:

List list = getList();
synchronized (list) {
    for (int i = 0; i < list.size(); i++) {
        T element = list.get(i);
        process(element);
    }
}

And consider a process(T element) method that sends a message to the worker thread to process this element. This worker thread, in turn, processes the element by accessing the original enumeration in a synchronized fashion. For example:

private void processInWorkerThread(T element) {
    List list = getList();
    synchronized (list) {
        // For example, get the index
        // within the list:
        // int index = list.indexOf(element);
        // setResult(index);
        respondToMainThread();
    }
}

When the enumeration is run, this algorithm will deadlock. The main thread holds the lock on the list while it waits for the worker thread to respond, and the worker thread can only respond once it acquires that same lock, which the main thread never releases.
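For anyone who wants to watch it happen, here is a small Python restatement of the same design. It is a sketch and not the original application: a threading.Lock stands in for synchronized (list), two queues stand in for the message passing, and timeouts are added so that the demo ends instead of hanging forever. All names are made up.

```python
import queue
import threading


def main_thread_gets_reply(wait_seconds=0.2):
    """Return True if the main thread receives a reply while holding the lock."""
    items = ["a", "b", "c"]
    lock = threading.Lock()    # stands in for synchronized (list)
    requests = queue.Queue()   # main thread -> worker thread
    replies = queue.Queue()    # worker thread -> main thread
    stop = threading.Event()

    def worker():
        while not stop.is_set():
            try:
                element = requests.get(timeout=0.05)
            except queue.Empty:
                continue
            # The real worker would block here forever; the timeout exists
            # only so that this demo can finish.
            if lock.acquire(timeout=wait_seconds * 2):
                try:
                    replies.put(items.index(element))
                finally:
                    lock.release()

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    got_reply = True
    with lock:                                # main thread holds the lock...
        requests.put(items[0])                # ...asks the worker for help...
        try:
            replies.get(timeout=wait_seconds)  # ...and waits for the answer
        except queue.Empty:
            got_reply = False                 # a real program would hang here
    stop.set()
    thread.join()
    return got_reply


print(main_thread_gets_reply())
```

The reply can only be produced once the main thread leaves the locked block, so the wait inside that block always times out.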

Spotting a deadlock is one thing. Fixing a deadlock is another. And fixing a deadlock in a clean, future-proof, backwards-compatible way is practically an extinct species.

Clearly there is something wrong with the above design. Point out the flaws, what the options are, how you would fix them, and why you chose the solution that you chose. I welcome clarifying questions regarding the context or scope of the application.

Let the commenting begin!