Thursday, December 29, 2005

Bug Hunting

Instead of completing the paging implementation, I got distracted by a couple of crashing bugs that showed up while testing Haiku, and they kept me busy for the last few days. Strangely enough, each bug was only easy to reproduce on a different system.

At least I can now run Haiku with BitmapDrawing and Pulse in the background for over an hour (after which I shut it down myself). While playing around with it, I found some weird behaviour in the Backgrounds application, which I am currently working on. If those problems can be fixed quickly, I will have a look at enabling Deskbar add-ons and replicants under Haiku.

After this hiatus, I will continue to work on the virtual memory manager. I hope that I can finish the bigger part of the work this year, so that I can complete it easily after my official employment period ends (only 2 days are left!).

Monday, December 19, 2005

Back To The Kernel

It's not that the app_server is ready and polished, or anything close - but it's in an acceptable state. For now, my main focus is back on the kernel, although I'll come back to the app_server from time to time over the coming days and weeks.

I am currently looking into getting paging support into Haiku. That's the feature you know by the terms "virtual memory" or "swapping". Plain and simple, it lets Haiku use more memory than you have installed in your computer: when the RAM is full, Haiku will use the hard disk as an additional backing store.

But why would I start working on this before Haiku even runs stable? One reason is indeed to increase system stability: currently, it's almost impossible to let the system run out of memory. Am I contradicting myself here? It looks like I am, granted, but let me try to explain what I mean. No, you don't have to know. If you like, you can just skip to the next paragraph. Yeah, you don't have to read that one either if you don't like :-))

Anyway, when an area of memory is allocated, the memory is not really taken from the system memory - it's just reserved. Only the memory you are really using is actually taken from the system page pool (where one page is an architecture-dependent amount of memory, usually 4096 bytes); we call this "committing memory". Especially with binaries, the areas created for them are usually much larger than what finally makes its way into memory: functionality of your web browser that you never use, or its debug info, isn't loaded into memory, to save space and time.

So theoretically, we could always promise more memory than we can actually deliver. Just think about the main stack area in BeOS: it's a 16 MB area per application. You don't need that many fingers to figure out how many applications you could run if the system were entirely honest with you (well, at least a few years ago you would have been successful doing so). So yes, it's lying. If every application actually needed its whole stack, the system would have to stop launching them long before memory got really tight. This technique is known as "overcommitting" - the system pretends to have what you might need, because it assumes that you won't need it.

Of course, it shouldn't lie to you that often, and it should choose those occasions wisely. Haiku only uses this for stacks. For everything else it makes sure it can deliver the memory it has previously promised to you. This can result in "out of memory" situations even though there are plenty of free pages left - the problem with those pages is that they are promised to someone else. They may still get used for system caches and the like, but they are unavailable to applications.
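
To make the "committing" part a bit more tangible, here's a minimal userland sketch using the standard create_area() call; the area name and the idea of touching just the first page are purely illustrative:

    #include <OS.h>
    #include <stdio.h>
    #include <string.h>

    int
    main()
    {
        // Ask for a 16 MB area - about the size of a BeOS main stack.
        // With B_NO_LOCK, no physical pages are taken from the page pool
        // up front; pages are only claimed once we actually touch them.
        void* base;
        area_id area = create_area("lazy example", &base, B_ANY_ADDRESS,
            16 * 1024 * 1024, B_NO_LOCK, B_READ_AREA | B_WRITE_AREA);
        if (area < B_OK) {
            fprintf(stderr, "create_area() failed: %s\n", strerror(area));
            return 1;
        }

        // Touch only the first page (usually 4096 bytes): only this one
        // page needs to be backed by real memory now.
        *(char*)base = 1;

        delete_area(area);
        return 0;
    }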

And that's where the swap file comes in: with some extra space in there, the system can promise much more memory, and thus can actually use up those pages for real - it can really end up with no free pages left. In other words, to run out of memory (and to be able to test the kernel in these situations), the system at least needs to think it has a swap file.

The Haiku swap file implementation will be anything but spectacular, but it'll hopefully work well enough for our target audience - the typical desktop user doesn't have very heavy requirements there. On the other hand, it will probably work better than the one in BeOS - at least I can hardly imagine how it could run that badly :-)

Wednesday, December 14, 2005

MTRR

Sure, you too! Since Stephan made a BDirectWindow based version of our app_server that directly uses the hardware frame buffer and acceleration features, we noticed that it felt much faster there than when running Haiku on real hardware. How could that be?

The reason is actually very simple. Parts of our rendering pipeline, like text output, aren't optimized to use 32/64-bit memory access - that means they don't make full use of the memory bus. While we'd like to change this in the future, Intel introduced a functionality called write-combining in something like 1998 that is supposed to optimize write access to something like a frame buffer. Instead of writing each byte back to the buffer instantly, the CPU waits until you have written 32 sequential bytes, and then writes them back at once, in a single burst. Enabling write-combining is therefore a good idea even if you have already optimized your graphics output, although the effect is less noticeable in that case.
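
As a rough illustration of what "not making full use of the memory bus" means, compare these two simplified routines for filling one scanline of a 32 bits-per-pixel frame buffer (the function names are made up, this is not actual app_server code):

    #include <SupportDefs.h>

    // Byte by byte: four separate stores per pixel - slow without
    // write-combining, since each store is its own bus transaction.
    static void
    fill_row_bytewise(uint8* row, int32 width, uint8 r, uint8 g, uint8 b)
    {
        for (int32 x = 0; x < width; x++) {
            row[x * 4 + 0] = b;
            row[x * 4 + 1] = g;
            row[x * 4 + 2] = r;
            row[x * 4 + 3] = 255;
        }
    }

    // One 32-bit store per pixel: the same work in a quarter of the accesses.
    static void
    fill_row_wordwise(uint32* row, int32 width, uint32 color)
    {
        for (int32 x = 0; x < width; x++)
            row[x] = color;
    }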

This brings us back to MTRR, "memory type range register" - just in case you wondered what that might be :-) Using them, you can specify that the CPU should access a part of memory in a specific way - like write-combining, but there are other options, too. In BeOS and Haiku, they can only be specified via the map_physical_memory() call (using the B_MTR_* flags). Graphics drivers usually try to map their frame buffer in write-combining mode - at least all of ours do - so they benefit directly from the new functionality.
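
To give an idea of what that looks like, here's a rough sketch of how a graphics driver might map its frame buffer write-combined - the area name and parameters are invented, and the exact spot where the B_MTR_WC flag is OR'ed in may differ slightly between drivers:

    #include <KernelExport.h>
    #include <OS.h>

    static void* sFrameBuffer;
    static area_id sFrameBufferArea;

    // Hypothetical driver snippet: map the card's frame buffer with
    // write-combining enabled, so byte-wise writes get bursted.
    status_t
    map_frame_buffer(void* physicalAddress, size_t size)
    {
        sFrameBufferArea = map_physical_memory("sample frame buffer",
            physicalAddress, size,
            B_ANY_KERNEL_BLOCK_ADDRESS | B_MTR_WC,
            B_READ_AREA | B_WRITE_AREA, &sFrameBuffer);
        if (sFrameBufferArea < B_OK)
            return sFrameBufferArea;

        return B_OK;
    }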

MTRRs are a CPU dependent feature that is programmed using model-specific registers (MSRs). Luckily, Intel and AMD use the exact same mechanism here, and thus we support it for processors of both vendors. We'll make sure that it is supported for other brands like VIA or Transmeta as well.
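
For the curious, setting one of the variable-range MTRRs boils down to writing a base/mask pair of model-specific registers. The following is only a bare-bones sketch of that idea (register numbers as documented in the IA-32 manuals; 36 physical address bits are assumed, and all the necessary locking, cache flushing, and synchronization of the other CPUs is left out):

    #include <SupportDefs.h>

    // MSR numbers for the first variable-range MTRR pair.
    #define IA32_MTRR_PHYSBASE0         0x200
    #define IA32_MTRR_PHYSMASK0         0x201
    #define MTRR_TYPE_WRITE_COMBINING   0x01
    #define MTRR_MASK_VALID             (1ULL << 11)

    static inline void
    write_msr(uint32 msr, uint64 value)
    {
        asm volatile("wrmsr" : : "c" (msr), "a" ((uint32)value),
            "d" ((uint32)(value >> 32)));
    }

    // Mark [base, base + size) as write-combining using MTRR pair 'index'.
    // 'size' must be a power of two and 'base' aligned to it.
    static void
    set_write_combining(uint32 index, uint64 base, uint64 size)
    {
        uint64 physMask = ((1ULL << 36) - 1) & ~(size - 1);

        write_msr(IA32_MTRR_PHYSBASE0 + 2 * index,
            (base & ~0xfffULL) | MTRR_TYPE_WRITE_COMBINING);
        write_msr(IA32_MTRR_PHYSMASK0 + 2 * index,
            physMask | MTRR_MASK_VALID);
    }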

Even though the app_server has lots of potential optimizations left, it already feels pretty good now. You can still manage to lock it up, but those problems should go away soon.

Monday, December 05, 2005

app_server Progress

In the last couple of days, I reworked the workspace and modal/floating window code of the app_server. But that work got interrupted for the weekend: you know, I don't work on the weekend. Nah, that's not it. Actually, Stephan Aßmus finished prototyping the new clipping code for the app_server.

That leads to some interesting changes, and should gain a noticeable amount of speed, especially on multi-CPU machines. Before, all objects on the screen, and even the screen itself, had a common base class (Layer) and were managed by the Layer subclass RootLayer.
While the original idea of having a common base class sounds nice, it doesn't work out in reality. RootLayer used to be the mother of everything, and hence was a quite bloated piece of code. When you moved a window, the RootLayer locked the screen, computed the new clipping for all affected layers on the screen, updated the window borders (Decorators), and triggered a redraw of the BViews. During that time, no other window could draw on the screen. So when you moved a window around the screen, you would have experienced lots of frame drops in a window playing some video (if that had worked yet :-)).

With the new clipping code, the RootLayer is completely gone, and the Desktop thread takes over its job (I had already started this transition before, but it's pretty much complete now). And it only cares about the windows visible on screen, the WindowLayers - it doesn't really know what a ViewLayer is (that's now the counterpart of a BView in an application). When you now move a window, the Desktop will lock the screen, compute the visible regions of the affected windows (note, only the windows, not the views they contain), and notify each window that it must be updated. A video running in another window is halted for such a short time that you should not notice any frame drops.

Each window can then recalculate the visible regions of its layers and trigger a redraw of them. And, you guessed it, the Desktop and the windows run in different threads. That means (unlike before) that every window can draw whatever it has to without affecting other windows at all. Since windows are now more independent, it also noticeably simplifies the code.
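
To give a rough idea of the flow described above, here is a much simplified sketch - the class and method names are only illustrative, not the ones from the actual sources:

    // Simplified sketch of the new design - names are illustrative and
    // not taken literally from the app_server sources.

    class WindowLayer {
    public:
        void MoveBy(float x, float y) {}
        void MarkContentDirty() {}
            // wakes up the window's own drawing thread, which then
            // recomputes its ViewLayers' clipping and asks the client's
            // BViews to redraw themselves
    };

    class Desktop {
    public:
        void MoveWindowBy(WindowLayer* window, float x, float y)
        {
            _LockScreen();

            window->MoveBy(x, y);
            _RecomputeWindowClipping();
                // only the visible regions of whole windows are rebuilt
                // here - the views inside them are not touched

            window->MarkContentDirty();
                // the actual redraw happens in the window's own thread,
                // so other windows aren't blocked by it

            _UnlockScreen();
        }

    private:
        void _LockScreen() {}
        void _UnlockScreen() {}
        void _RecomputeWindowClipping() {}
    };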

In the last few days, we made good progress integrating the prototype Stephan had written (in src/tests/servers/app/newerClipping), and we'll probably complete that work in the next couple of days.