Reducing the heap size of GC programs
Several programs in Vesta use the Borhm garbage collector library. We have had some problems with garbage collected programs growing to extremely large amounts of memory. Mostly this has been the evaluator and the cache server, though with complicated instructions the rpelicator has been known to have this problem as well.
This page lists a variety of options for attacking this problem as well as some notes on what we've already done.
Actually free heap blocks in GC pograms
The basics library provides a delete operator which simply drops all heap blocks on the floor. However the garbage collector does provide the ability to explicitly free heap blocks. We could make the delete operator call this. This would cause memory to be reclaimed earlier, reducing the rate of heap growth.
The problem with doing this is that the current code base has developed under the assumption that the delete operator doesn't actually free anything. We would need to audit every use of the delete operator in every application using garbage collection (including all the libraries they use) to make sure that there is no possibility of additional references to the same block. In cases where there may be additional references, the delete operator could simply be removed.
This may be difficult in shared code (such as that in libraries) which needs to work both with and without garbage collection. In such cases, we may need to have the library alter its code at compile time based on whether garbage collection is enabled, or check at run-time whether garbage collection is being used.
Help the GC by clearing pointers
Since the garbage collector searches for pointers, any time we can clear a pointer we no longer need it can help keep the garbage collector from treating garbage as non-garbage.
When exiting a stack frame, local pointer variables can be cleared. Stack-allocated instances will have their destrucotrs called, so it would be good if they cleared instance member variables which are pointers. (For example, the ListT template class in the evalautor, which is the actualy type of the commonly used Context typedef, could have a destructor which cleared its list and last member variables.)
Use the garbage collector's explicitly-typed allocation interface
By default the garbage collector assumes htat any bytes in an allocated heap block may contain pointers. This allows data to be mistaknely treated as pointer references to heap blocks which would otherwise be treated as garbage.
We took steps to make this less likely by changing the garbage collectors options to keep it from treating pointers to any position within a heap block as valid references (disabling its ALL_INTERIOR_POINTERS configuration option). However this only reduces the propbability of mis-identifying data as pointers. It does not completely eliminate the problem.
The garbage collector allows a caller to provide an explicit specification of where the pointers are within a heap block. This makes it possible to avoid mis-identifying data as pointers in blocks allocated using this method.
This is unfortunately not a simple matter, especially for C++ classes. Some initial work has been done with this. See the currently abandoned branches:
And the tracker entry Mark more memory as pointer-free with garbage collector.
Explicitly collect garbage during idle periods
Collecting garbage requires pausing the entire application. Normally the garbage collector tries to automatically determine when it should do this. It tries to avoid doing it very often, as while the applicaiton is collecitng garbage it's not getting any "real" work done.
Some of the GC applications have periods where they're mostly idle anyway. It would be possible to explicitly invoke the gabrage collector during these periods.
For example, when the vrepl is waiting for the repository to copy an immutable version from a peer repository, it's completely idle until the RPC to the repository returns. When the evaluator is performing a call the cache server, it is simiarly stalled until the RPC to the cache server returns.
However, implementing this would require a fair amount of care. A multi-threaded program would have to have all threads waiting for a response from a server before it would really be idle. An evaluator waiting for a tool invocation to complete is not idle, as it still needs to service incoming ToolDirectoryServer requests. The SRPC layer would have to signal some other code that it is blocked waiting for a response (perhps with a condition variable) , and similarly signal when it is no longer blocked waiting for a reponse. When receiving such a signal that a thread is blocking, it would not be appropriate to start garbage collection immediately, as the SRPC layer could only send such a signal before it begins waiting.
GC Tuning Parameters
The garbage collector has some parameters which can give us some additional control over how it works which may allow us to improve the situation.
One parameter is GC_free_space_divisor which controls how the garbage collector decides whether to expand the heap or perform a garbage collection. To quote the documentation:
If the total amount of allocation since the last collection is less than the heap size divided by GC_free_space_divisor, we try to expand the heap. Otherwise, we initiate a garbage collection. This ensures that the amount of garbage collection work per allocated byte remains constant.
GC_free_space_divisor defaults to 3. Increasing it would make garbage collections occur more frequently and heap expansions occur less frequently. This could help keep the heap size from growing as large as it does, and may be warranted given how much garbage the evaluator generates. However, there would be some performance cost as it would make the evaluator spend more time collecting garbage and less time doing "real work".
Another setting is GC_max_retries which is:
The maximum number of GCs attempted before reporting out of memory after heap expansion fails.
The default value is 0, though we now initialize it to 1. It's not clear whether values higher than 1 would have any effect.
Another potentially interesting setting is GC_dont_expand, described as follows:
Dont expand heap unless explicitly requested or forced to.
I've never tried using this, but my impression is that it would make the garbage collector always try to collect garbage before trying to expand the heap. It sounds theoretically equivalent to setting GC_free_space_divisor to infinity (i.e. always try to perform a garbage collection before resorting to expanding the heap).
We've now made these settable with environment variables.
There are also other settings in gc.h. Also see README.environment.
We might want to adjust some of these settings as th heap increases in size, essentially making them a function of the heap size. For example, if we increased GC_free_space_divisor as the heap got larger, we could favor garbage collection over heap expansion more. Another possibility would be setting GC_dont_expand after the heap size crosses some threshold.
GC Incremental Mode
The garbage collector has an incremental mode which allows garbage collection to proceed in parallel with other work. It can be enabled by setting the GC_ENABLE_INCREMENTAL environment variable (though it can also be explicitly enabled by a function call to GC_enable_incremental. In theory, this could reduce heap expansion by collecting more garbage sooner. This could also be combined with other approaches, for example by calling GC_collect_a_little when waiting for a response from a server.
Unfortunately, tests with the evaluator have shown that this feature actually increases memory use. In small tests (builds of publicly-available models like Vesta release builds typically using 10s of MB of memory), we found these approximate differences:
- Total memory size at process exit: 19% increase
- Resident memory size at process exit: 46% increase
- GC heap size at process exit: 39% increase
In larger tests (builds at Intel typically using a small number of GB of memory) the results were even worse:
- Total memory size at process exit: 67% increase
- Resident memory size at process exit: 48% increase
- GC heap size at process exit: 43% increase
Based on these results, we have left incremental garbage collection mode disabled.
Completed Changes
Disable ALL_INTERIOR_POINTERS in the Garbage Collector Configuration
Some time ago, we disabled the garbage collector's ALL_INTERIOR_POINTERS configuration option. This makes it less likely for data to be mis-identified as a pointer and cause heap blocks to be kept when they're really garbage. (Note that this only reduces the propbability, it does not completely eliminate the problem.)
See:
Use Atom more and Text less in the Evaluator
The Atom class in generics library is a subclass of the Text class from the basics library. Atom uses a global string table to avoid storing the same string more than once.
Applications which manipulate text strings can use this to avoid the need to store the same string multiple times.
With eval/80 and cache/50, the evaluator uses Atom rather than Text in a few key places:
Member variables in subclasses of ExprC (which are used for the in-memory representation of parsed SDL code): NameEC, ApplyOpEC, ApplyUnOpEC, FileEC, PrimitiveEC, FuncEC, BaseTEC
The elements of the dependency paths. ArcSeq in Dep.H is actully an FV2::T from FV2.H in the cache package, which is now a subclass of AtomSeq from generics.
The "name" member variable of the AssocVC data type (which is used for both assigned variables and binding elements)
Thes changes were started in cache/49.Atom_FV and eval/79.Atom.
Don't accumulate a string into a buffer in memory just to fingerprint it
In a few places the evaluator will compute a fingerprint by printing some text to a stream:
FuncEC::FingerPrint (in Expr.C)
FingerPrint(Expr expr) (in Expr.C)
It used to used Basics::OBufStream, accumulating the test in heap blocks and then fingerprinting it once done. Now it uses the FP::FPStream class which fingerprints bytes as they are written.
FP::FPStream was introduced in fp/17 and the evaluator started using it in eval/80.
Make More Garbage Collector Settings Accessible
To make it easier to experiment with the garbage collector's settings, basics/33 made a few more settable with environment variables:
- GC_MAX_RETRIES
Sets GC_max_retries, accepts values between 1 and 10
- GC_FREE_SPACE_DIVISOR
Sets GC_free_space_divisor, accepts values between 3 and 100
- GC_DONT_EXPAND
If set (to any value), sets GC_dont_expand to 1
- GC_ENV_VERBOSE
If set, the code in the basics library produces output about the settings being used. (This does not affect the code in the garbage collector library itself which uses other environment variables.)
Set GC_max_retries to 1
To make sure the garbage collector doesn't give up immediately when it fails to expand the heap, we set GC_max_retries to a minimum of 1 as of basics/33.
Eliminate the use of ostringstream
Some implementations of the standard ostringstream class (notably the gcc implementation) cause a lot of heap allocations (and are generally inefficient). These implementations tend to allocate just enough space for each successive write, copying the bytes form the previous heap block and freeing the previosu heap block. Since the basics library provides a delete operator which simply drops all heap blocks on the floor, this creates a lot of allocations and garbage. (Creating less garbage would keep the rate of heap growth down.)
The basics library provides an alternative class (Basics::OBufStream) which performs far fewer allocations and thus produces less garbage in general.
SimonPeffers: I looked at all uses of ostringstream that I could find in Vesta and found that all uses were for printing error messages. I don't think fixing these would reduce the heap size much.
KenSchalk: I'm not sure when we completely eliminated it, but ostringstream isn't used in the Vesta code any more.
List of programs using GC
Some of the changes outlined above could have a pervasive effect on all programs which use garbage collection. When making such a change, every program which uses garbage collection should be checked to make sure it still works.
Here's a hopefully complete list of programs using the garbage collector:
- All programs in the eval package:
vesta (the evaluator)
PickleStats
- All programs in the repltools package:
vrepl (the replicator)
- vmaster
- vglob
- vcheckagreement
- All programs in the cache server package:
VCache (the cache server)
VCacheMonitor
ChkptCache
FlushCache
VCacheStats
PrintCacheVal
- PrintMPKFile
PrintCacheLog
PrintGraphLog
TestCache
TestCacheRandom
TestCacheSRPC
TestBitVector
TestCompactFV
- TestPKPrefix
TestPrefixTbl
TestTimer
TestFlushQueue
- All programs in the perftools package:
CountShortIds
ShortIdSetDiff
- The test programs in the fp package:
- TestFP
- TestFPPerf
TestUniqueId
- TestFPTable
- The performance data analysis programs in the repos package:
- read_timing
- read_lock_timing
- All programs in the weeder package:
VestaWeed
QuickWeed
PrintWeederVars
- Some of the test programs in the basics package:
- TestTextGC
- TestAllocatorGC
- Some of the test programs in the generics package:
TestAtom
- TestIntSTbl