[vbox-dev] Docs on how page fusion works?

Tue Apr 17 18:39:53 GMT 2012

Our current page fusion logic relies on knowledge from within the guest 
about what can be fused. Instead of walking through the entire guest 
memory with some sort of daemon/service, maintaining hashes and 
last-touch times of pages and comparing them, our guest additions 
(page fusion is currently implemented only for Windows guests) give hints 
about which pages are the most likely candidates for fusion.

This saves a lot of time compared to sweeping the memory, but it also means we 
will not be squeezing out every last bit. We made this trade-off 
because we felt it is a good approach for fulfilling the 
objective.

A daemon runs on the guest; it locates common system files, DLLs, read-only 
kernel memory etc. that are paged in on the guest and reports the physical pages 
that can be deduplicated. We don't scan the guest memory actively 
looking for fusion candidates. If the guest touches a page for write 
access, that page is marked as no longer a candidate. Because of the 
contextual knowledge from within the guest, VirtualBox's page fusion 
identifies only long-term fusion candidates that are very unlikely to be 
touched often.
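
To make the flow above concrete, here is a rough toy model in Python. It is 
only a sketch and not VirtualBox code: the class, its method names and the 
4 KiB page size are hypothetical, and matching reported pages by a content 
hash is an assumption made for the illustration (the text above does not say 
how reported pages are matched). It shows the guest hinting pages, a write 
dropping a page from the candidate set, and how many pages the sharing saves.

```python
import hashlib

PAGE_SIZE = 4096  # assumed page size, for illustration only


class FusionRegistry:
    """Toy model of hint-driven page sharing (hypothetical, not VirtualBox code)."""

    def __init__(self):
        self.shared = {}      # content digest -> set of (vm, page number) sharing one copy
        self.candidates = {}  # (vm, page number) -> content digest

    def report_candidate(self, vm, pfn, content):
        """Guest-side hint: this page is read-mostly and may be fused."""
        # Assumption: identical pages are recognised by hashing their content.
        digest = hashlib.sha256(content).hexdigest()
        self.candidates[(vm, pfn)] = digest
        self.shared.setdefault(digest, set()).add((vm, pfn))

    def on_write(self, vm, pfn):
        """A write to a hinted page means it is no longer a candidate."""
        digest = self.candidates.pop((vm, pfn), None)
        if digest is None:
            return  # page was never hinted, nothing to undo
        users = self.shared[digest]
        users.discard((vm, pfn))
        if not users:
            del self.shared[digest]

    def pages_saved(self):
        """N identical pages need one backing copy, so each group saves N - 1."""
        return sum(len(users) - 1 for users in self.shared.values())


reg = FusionRegistry()
dll_page = b"\x90" * PAGE_SIZE          # stand-in for a page of a common DLL
reg.report_candidate("vm1", 10, dll_page)
reg.report_candidate("vm2", 77, dll_page)
reg.report_candidate("vm3", 5, dll_page)
saved_before = reg.pages_saved()        # three identical pages share one copy
reg.on_write("vm2", 77)                 # vm2 writes to its page
saved_after = reg.pages_saved()         # only vm1 and vm3 still share
```

Nothing in this model ever scans memory; it only reacts to hints and writes, 
which is the trade-off described above.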

That's just the broad overview.

Regards,
Ram

On 04/16/12 05:23 PM, Alexey Eromenko wrote:
>>
>> What kind of obstacles would I face if I tried to implement the
>> same behavior (scanning processes) for Linux guests? I plan on scanning every
>> process and then checking the memory maps from /proc/<pid>/maps. If the permissions
>> are set to r only or rx then I'll register the pages with the host. This
>> wouldn't cover the process itself, but it would cover a major portion of
>> the wasted memory.
>>
>> Sounds simple (everything does these days) and I plan to work on it. However, I
>> just need an expert to give me the go-ahead since this is all new to me.
>
> I think before undertaking such a massive effort, it pays off to
> compare existing (Open-Source) technologies: Linux KSM vs. VBox
> PageFusion.
>
> Why ?
> KSM *avoids* the need of developing guest-side drivers altogether.
> With KSM all mem dedup logic is done host-side-only, so all legacy and
> future OSes work out-of-the-box.
> VBox PageFusion requires GuestAdditions, which means developing and
> testing (!) drivers for lots of guest OSes and OS versions.
> Developing a KSM equivalent for VBox may pay off better than extending
> VBox PageFusion to Linux guests (this will require writing new Linux
> kernel drivers).
> KSM itself is Linux-host-only, so cannot be used directly. (While VBox
> PageFusion is Windows-guest-only ATM)
> A KSM-like system will require only host-side development and testing.
>
> What needs to be considered ? (KSM-like approach vs. VBox PageFusion approach)
> 1. performance - how much CPU usage does it take ?
> 2. convergence speed [related to 1.] - how much time does it take to
> find 1 GiB of RAM and dedup it ?
> 3. efficiency - how many pages were actually shared ?
> 4. any other advantages/disadvantages of both approaches.
>
> Disclaimer: I have NOT tested either solution. Just my 2 cents.
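
For reference, the /proc/<pid>/maps scan proposed in the quoted message could 
start out roughly like the Python sketch below. The function names are 
hypothetical. The sketch only lists mappings whose permissions are r only or 
rx; it does not translate virtual addresses to physical pages, and it does 
not register anything with the host, since no such interface is described in 
this thread.

```python
import os


def parse_maps_line(line):
    """Parse one /proc/<pid>/maps line into (start, end, perms, path)."""
    # Fields: address range, perms, offset, device, inode, optional pathname.
    fields = line.split(None, 5)
    start, end = (int(part, 16) for part in fields[0].split("-"))
    perms = fields[1]
    path = fields[5].strip() if len(fields) > 5 else ""
    return start, end, perms, path


def is_readonly(perms):
    """True for mappings that are readable but not writable (r-- or r-x)."""
    return perms[0] == "r" and perms[1] == "-"


def candidate_regions(pid):
    """Yield the read-only mappings of one process."""
    with open(f"/proc/{pid}/maps") as maps:
        for line in maps:
            start, end, perms, path = parse_maps_line(line)
            if is_readonly(perms):
                yield start, end, perms, path


def all_pids():
    """Numeric entries under /proc correspond to running processes."""
    return [int(name) for name in os.listdir("/proc") if name.isdigit()]
```

Reading another process's maps file generally needs sufficient privileges, 
and a process can exit between all_pids() and candidate_regions(), so a real 
scanner would have to tolerate errors from open().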