
Multiple processes using the same dll: Tips and tricks for loading and unloading dynamic libraries

opovidooseg


I'm trying to figure out how an operating system handles multiple unrelated processes loading the same DLL/shared library. The OSes I'm concerned with are Linux and Windows, but to a lesser extent Mac as well. I presume the answers to my questions will be identical for all operating systems.







1.) Do unrelated processes load the same DLL redundantly (that is, the DLL exists more than once in memory) instead of using reference counting? (I.e., into each process's own "address space", as I think I understand it.)


If the DLL is unloaded as soon as a process terminates, that leads me to believe that the other processes using the exact same DLL have it redundantly loaded into memory; otherwise the system should not be allowed to ignore the reference count.


2.) If that is true, then what's the point of reference counting DLLs when you load them multiple times in the same process? What would be the point of loading the same DLL twice into the same process? The only feasible reason I can come up with is that if an EXE references two DLLs, and one of the DLLs references the other, there will be at least two LoadLibrary() and two FreeLibrary() calls for the same library.


The shared library or DLL will be loaded once for the code part, and multiple times for any writeable data parts [possibly via "copy-on-write", so if you have a large chunk of memory which is mostly read, with only some small parts being written, all the processes can share the same pages as long as those pages haven't been changed from their original value].


It is POSSIBLE that a DLL will be loaded more than once, however. When a DLL is loaded, it is loaded at a base address, which is where its code starts. Suppose some process uses two DLLs that, because of how they were previously loaded, use the same base address [possible because none of the other processes uses both]. One of the two DLLs then has to be loaded again at a different base address. For most DLLs this is rather unusual, but it can happen.


The point of reference counting every load is that it allows the system to know when it is safe to unload the module (when the reference count is zero). If we have two distinct parts of the system, both wanting to use the same DLL, and they both load that DLL, you don't really want to cause the system to crash when the first part of the system closes the DLL. But we also don't want the DLL to stay in memory when the second part of the system has closed the DLL, because that would be a waste of memory. [Imagine that this application is a process that runs on a server, and new DLLs are downloaded every week from a server, so each week, the "latest" DLL (which has a different name) is loaded. After a few months, you'd have the entire memory full of this application's "old, unused" DLLs.] There are of course also scenarios such as what you describe, where a DLL loads another DLL using the LoadLibrary call, and the main executable loads the very same DLL. Again, you do need two FreeLibrary calls to close it.
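The per-process bookkeeping described above can be sketched as a toy model. This is only an illustration of the counting rule, not how any real loader is implemented; the class name ModuleTable and the module name "helper.dll" are made up for the example.

```python
class ModuleTable:
    """Toy model of one process's loaded-module list (not a real loader)."""

    def __init__(self):
        self._refcounts = {}

    def load_library(self, name):
        # The first load "maps" the module; later loads only bump the count.
        self._refcounts[name] = self._refcounts.get(name, 0) + 1
        return self._refcounts[name]

    def free_library(self, name):
        if name not in self._refcounts:
            raise KeyError(f"{name} is not loaded")
        self._refcounts[name] -= 1
        remaining = self._refcounts[name]
        if remaining == 0:
            # Only now is it safe to unmap the module from this process.
            del self._refcounts[name]
        return remaining

    def is_loaded(self, name):
        return name in self._refcounts


table = ModuleTable()
table.load_library("helper.dll")   # loaded by the EXE
table.load_library("helper.dll")   # loaded again by another DLL
table.free_library("helper.dll")   # first release: module must stay mapped
still_loaded = table.is_loaded("helper.dll")
```

The count belongs to one process only. Another process using the same DLL would have its own table, so one process exiting says nothing about whether the others still need the module.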


If your DLL creates a named MemoryMappedFile (in memory or on disk) then the two applications can share the memory created by the DLL. Each application will have a different pointer to the shared memory but the memory will actually be shared. You have to use the same name for the shared memory, and you're on your own as far as being thread safe between processes. (Named semaphores or mutexes will work, CriticalSection will not.)


You can achieve what you're asking for by making use of OS services to explicitly set up a shared memory area that can be accessed by multiple processes. In Windows, this can be done by creating named shared memory objects, using a name that is known in advance by all participants. You can then typecast that memory block to a structure type and read and write fields in that memory area, and all processes that have a view onto that shared memory will see the same data.
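As a rough sketch of the idea, here is a named shared memory block using Python's multiprocessing.shared_memory module (Python 3.8+), standing in for the Windows named shared memory objects mentioned above. The function name and the default block name are assumptions for illustration. Both handles are opened in one process to keep the example short; a second process would attach with the same name.

```python
import os
import struct
from multiprocessing import shared_memory


def demo_named_shared_memory(name=None):
    # The name must be agreed on by all participants; this default is hypothetical.
    if name is None:
        name = f"demo_shm_{os.getpid()}"
    creator = shared_memory.SharedMemory(name=name, create=True, size=8)
    try:
        # A second participant attaches by name and gets its own view (pointer).
        attached = shared_memory.SharedMemory(name=name)
        try:
            # Treat the block as a struct with one 64-bit integer field.
            struct.pack_into("<q", creator.buf, 0, 42)
            value = struct.unpack_from("<q", attached.buf, 0)[0]
        finally:
            attached.close()
        return value
    finally:
        creator.close()
        creator.unlink()  # remove the named block once nobody needs it


value_seen_by_second_view = demo_named_shared_memory()
```

The write made through the first handle is visible through the second because both views are backed by the same memory, even though each handle exposes a different buffer object.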


Since multiple processes are running concurrently, you will also need to think about how the data in the shared memory area is updated. If multiple processes need to update a counter field or whatnot in the shared memory area then you need to implement thread safe practices around read and write of that data, such as interlocked increment or using a named mutex object as an exclusive access lock.
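A minimal sketch of guarded updates, with loud assumptions: it is POSIX-only and uses an exclusive fcntl.flock on a lock file as a stand-in for a named mutex, and it uses threads inside one process in place of separate processes. The function names are made up for the example.

```python
import fcntl  # POSIX-only; stand-in for a named mutex
import struct
import tempfile
import threading
from multiprocessing import shared_memory


def locked_increment(shm, lock_path):
    """Increment the 64-bit counter at offset 0 while holding an exclusive lock."""
    with open(lock_path, "w") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)  # blocks while another holder exists
        try:
            (value,) = struct.unpack_from("<q", shm.buf, 0)
            struct.pack_into("<q", shm.buf, 0, value + 1)
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)


def run_demo(workers=4, increments=100):
    shm = shared_memory.SharedMemory(create=True, size=8)
    try:
        struct.pack_into("<q", shm.buf, 0, 0)
        with tempfile.NamedTemporaryFile() as lock:
            def work():
                for _ in range(increments):
                    locked_increment(shm, lock.name)

            threads = [threading.Thread(target=work) for _ in range(workers)]
            for t in threads:
                t.start()
            for t in threads:
                t.join()
        return struct.unpack_from("<q", shm.buf, 0)[0]
    finally:
        shm.close()
        shm.unlink()


final_count = run_demo()
```

Each call opens the lock file separately, so every worker has to acquire the lock before its read-modify-write. The same pattern applies when the workers are separate processes that agree on the lock file path.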


A potential disadvantage to using DLLs is that the application is not self-contained; it depends on the existence of a separate DLL module. The system terminates processes using load-time dynamic linking if they require a DLL that is not found at process startup and gives an error message to the user. The system does not terminate a process using run-time dynamic linking in this situation, but functions exported by the missing DLL are not available to the program.
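The run-time case can be sketched with Python's ctypes, which loads libraries at run time. A missing library surfaces as an OSError that the program can handle, rather than ending the process. The library name below is deliberately made up and the helper name is an assumption.

```python
import ctypes


def try_load(library_name):
    """Return a handle to the library, or None if it cannot be loaded."""
    try:
        return ctypes.CDLL(library_name)
    except OSError:
        # The library is missing or unloadable; the process keeps running.
        return None


# Hypothetical library that is not expected to exist on any system.
plugin = try_load("hypothetical_missing_plugin_12345")
feature_available = plugin is not None
```

The program can then disable whatever feature depended on the missing library and carry on.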


Copy-on-write protection is an optimization that allows multiple processes to map their virtual address spaces such that they share a physical page until one of the processes modifies the page. This is part of a technique called lazy evaluation, which allows the system to conserve physical memory and time by not performing an operation until absolutely necessary.


For example, suppose two processes load pages from the same DLL into their virtual memory spaces. These virtual memory pages are mapped to the same physical memory pages for both processes. As long as neither process writes to these pages, they can map to and share the same physical pages, as shown in the following diagram.
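The effect can be observed with private (copy-on-write) file mappings. In this sketch a temporary file stands in for the DLL image, and two mmap views opened with ACCESS_COPY stand in for the two processes. The file contents and the function name are assumptions made for illustration.

```python
import mmap
import os
import tempfile


def demo_copy_on_write():
    # A temporary file plays the role of the DLL image on disk.
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"ORIGINAL" + b"\0" * (mmap.PAGESIZE - 8))
        # Two independent copy-on-write views of the same file.
        view_a = mmap.mmap(fd, 0, access=mmap.ACCESS_COPY)
        view_b = mmap.mmap(fd, 0, access=mmap.ACCESS_COPY)
        before = (bytes(view_a[:8]), bytes(view_b[:8]))
        # Writing through one view gives that view its own private page.
        view_a[:8] = b"MODIFIED"
        after = (bytes(view_a[:8]), bytes(view_b[:8]))
        view_a.close()
        view_b.close()
        # The backing file is untouched by the private write.
        os.lseek(fd, 0, os.SEEK_SET)
        on_disk = os.read(fd, 8)
        return before, after, on_disk
    finally:
        os.close(fd)
        os.remove(path)


before, after, on_disk = demo_copy_on_write()
```

After the write, only the first view sees the modified bytes. The second view and the file on disk still hold the original contents.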


When multiple instances of the same Windows-based application are loaded, each instance is run in its own protected virtual address space. However, their instance handles (hInstance) typically have the same value. This value represents the base address of the application in its virtual address space. If each instance can be loaded into its default base address, it can map to and share the same physical pages with the other instances, using copy-on-write protection. The system allows these instances to share the same physical pages until one of them modifies a page. If for some reason one of these instances cannot be loaded in the desired base address, it receives its own physical pages.


DLLs are created with a default base address. Every process that uses a DLL will try to load the DLL within its own address space at the default virtual address for the DLL. If multiple applications can load a DLL at its default virtual address, they can share the same physical pages for the DLL. If for some reason a process cannot load the DLL at the default address, it loads the DLL elsewhere. Copy-on-write protection forces some of the DLL's pages to be copied into different physical pages for this process, because the fixes for jump instructions are written within the DLL's pages, and they will be different for this process. If the code section contains many references to the data section, this can cause the entire code section to be copied to new physical pages.
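The fix-up arithmetic behind this can be sketched as follows. It is a simplified model, not a PE relocation parser: the function name and the addresses are made up, and real images keep the list of locations to patch in a relocation table.

```python
def apply_relocations(absolute_addresses, preferred_base, actual_base):
    """Shift absolute addresses by the gap between actual and preferred base."""
    delta = actual_base - preferred_base
    patched = [address + delta for address in absolute_addresses]
    # Any non-zero delta means the pages holding these addresses get written,
    # so copy-on-write gives this process private copies of them.
    pages_written = delta != 0 and len(absolute_addresses) > 0
    return patched, pages_written


# Hypothetical DLL built for base 0x10000000 with two absolute jump targets.
targets = [0x10001000, 0x10002040]

same, written_same = apply_relocations(targets, 0x10000000, 0x10000000)
moved, written_moved = apply_relocations(targets, 0x10000000, 0x20000000)
```

When the DLL lands at its preferred base the delta is zero, nothing is patched, and the pages stay shareable. When it lands elsewhere, every patched location dirties its page for that process.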


DLL data will not be shared. If two processes use the same DLL, each process has its own copy of the DLL's data, and the values stored in variables declared in the DLL will not interfere with the values seen by the other process.


For EXEs, there is no concern about two EXEs overlapping since they would never be loaded into the same process. There would be nothing wrong with loading the first instance of an EXE at 0x400000 and the second instance at 0x500000, even if the image is larger than 0x100000 bytes. Windows just chooses to share code among multiple instances of a given EXE.


Observe that even if a Windows system must ensure that multiple instances of one DLL or EXE all get loaded at the same base address, the system need not keep track of the base address once the last instance of the DLL or EXE is unloaded. If the DLL or EXE is loaded again, it can get a fresh base address.


!peb lists the process environment block, and with it all the assemblies and where they are loaded from. If you see the same assembly multiple times in this list, you should think about strong naming it and putting it in the GAC.


When using 3rd party libraries, sometimes the vendor releases a new version. And sometimes that new version breaks something that used to work in the old version, but also adds features you need. So you find yourself having to use one method from the old version and another from the new version of the same library.


However, I now have some customers that complain that every new version has higher memory requirements. Moving to DLL's might solve some of these problems, because I think that Windows only loads DLL's once in memory (so if you have 50 processes all using the same DLL, the DLL only takes space once in physical memory).


On the other hand, if I have 50 processes all using the same EXE file, I would expect that Windows also shares this EXE file among the processes. But I have the impression that Windows doesn't do this (and only does it for DLL files). Is this observation true?


DLL and EXE code is absolutely shared - there is only one copy of the code in RAM, regardless of how many processes are using the EXE or the DLLs. (And not all of the code will necessarily be in RAM - only that which has been recently referenced.)


Processes do not, per se, have stacks. Threads have stacks. (One could say that a process with only one thread has "a" stack for that process, but really, the stack is an attribute of the thread, not the process.) But DLLs are not "loaded in a process's stack" (nor in a thread's stack for that matter). DLLs and EXEs are mapped into the shareable - not private - virtual address space of a process. It is true that this is done for each process using the DLL or EXE, but these multiple instances are for the virtual memory mappings. Since this is shareable virtual address space, there's still just one copy of the code in RAM.

