Ticket #361 (closed Defect: fixed)

Opened 4 years ago

Last modified 4 years ago

Python's logging package causes large memory leaks in "in process" MGET tools under ArcGIS 9.3

Reported by: jjr8 Owned by: jjr8
Priority: Medium Milestone: 0.7
Component: Core - ArcGIS Interop Version:
Keywords: Cc:

Description

I discovered two large memory leaks caused by the logging package when it is used by "in process" tools under ArcGIS 9.3. These leaks will not occur for typical Python users, but occur in Arc 9.3 due to how it runs Python tools. When it runs a tool, it does not create a brand-new Python interpreter, run the tool, and deallocate the interpeter. Instead it appears to create a new "sub-interpreter" by calling Python's Py_NewInterpreter, run the tool, and then (maybe) deallocate the sub-interpreter (by calling Py_EndInterpreter).

ESRI probably chose this design to maximize performance. It is probably much quicker to initialize and tear down a sub-interpreter than a full interpreter. They also may have discovered that they were affected by a problem noted in the Python documentation for the Py_Finalize function: "Some extensions may not work properly if their initialization routine is called more than once; this can happen if an application calls Py_Initialize() and Py_Finalize() more than once."

Sub-interpreters appear to be tricky. Consider all of issues raised in the documentation for the Py_NewInterpreter function:

PyThreadState?* Py_NewInterpreter( )

Create a new sub-interpreter. This is an (almost) totally separate environment for the execution of Python code. In particular, the new interpreter has separate, independent versions of all imported modules, including the fundamental modules builtin, main and sys. The table of loaded modules (sys.modules) and the module search path (sys.path) are also separate. The new environment has no sys.argv variable. It has new standard I/O stream file objects sys.stdin, sys.stdout and sys.stderr (however these refer to the same underlying FILE structures in the C library).

The return value points to the first thread state created in the new sub-interpreter. This thread state is made in the current thread state. Note that no actual thread is created; see the discussion of thread states below. If creation of the new interpreter is unsuccessful, NULL is returned; no exception is set since the exception state is stored in the current thread state and there may not be a current thread state. (Like all other Python/C API functions, the global interpreter lock must be held before calling this function and is still held when it returns; however, unlike most other Python/C API functions, there needn't be a current thread state on entry.)

Extension modules are shared between (sub-)interpreters as follows: the first time a particular extension is imported, it is initialized normally, and a (shallow) copy of its module's dictionary is squirreled away. When the same extension is imported by another (sub-)interpreter, a new module is initialized and filled with the contents of this copy; the extension's init function is not called. Note that this is different from what happens when an extension is imported after the interpreter has been completely re-initialized by calling Py_Finalize() and Py_Initialize(); in that case, the extension's initmodule function is called again.

Bugs and caveats: Because sub-interpreters (and the main interpreter) are part of the same process, the insulation between them isn't perfect -- for example, using low-level file operations like os.close() they can (accidentally or maliciously) affect each other's open files. Because of the way extensions are shared between (sub-)interpreters, some extensions may not work properly; this is especially likely when the extension makes use of (static) global variables, or when the extension manipulates its module's dictionary after its initialization. It is possible to insert objects created in one sub-interpreter into a namespace of another sub-interpreter; this should be done with great care to avoid sharing user-defined functions, methods, instances or classes between sub-interpreters, since import operations executed by such objects may affect the wrong (sub-)interpreter's dictionary of loaded modules. (XXX This is a hard-to-fix bug that will be addressed in a future release.)

Also note that the use of this functionality is incompatible with extension modules such as PyObjC and ctypes that use the PyGILState_* APIs (and this is inherent in the way the PyGILState_* functions work). Simple things may work, but confusing behavior will always be near.

The leaks caused by logging result from circular object references that are created every time a new sub-interpreter is initialized and the logging modules are used. When the sub-interpreter is shut down, the circular references prevent the memory associated with that sub-interpreter from being deallocated.

Problem 1: logging creates leak by calling atexit.register

This leak is exposed by creating a geoprocessing tool that has the following single line of Python code:

import logging.config

The problem is ultimately from this this code in logging/__init__.py:

#Let's try and shutdown automatically on application exit...
try:
    import atexit
    atexit.register(shutdown)
except ImportError: # for Python versions < 2.0
    def exithook(status, old_exit=sys.exit):
        try:
            shutdown()
        finally:
            old_exit(status)

    sys.exit = exithook

And from atexit.py:

def register(func, *targs, **kargs):
    """register a function to be executed upon normal program termination

    func - function to be called at exit
    targs - optional arguments to pass to func
    kargs - optional keyword arguments to pass to func
    """
    _exithandlers.append((func, targs, kargs))

Nothing seems to free the _exithandlers; there is no explicit code in atexit to do this, nor does it appear that they are released by some other mechanism such as garbage collection when the sub-interpreter shuts down. As a result, the list entry made when logging was imported is leaked.

Problem 2: logging fails to free handler if the handler's flush() raises an exception

Here is the code from logging/__init__.py:

def shutdown(handlerList=_handlerList):
    """
    Perform any cleanup actions in the logging system (e.g. flushing
    buffers).

    Should be called at application exit.
    """
    for h in handlerList[:]:
        #errors might occur, for example, if files are locked
        #we just ignore them if raiseExceptions is not set
        try:
            h.flush()
            h.close()
        except:
            if raiseExceptions:
                raise
            #else, swallow

The problem is that if h.flush() raises an exception, h.close() is never called, and it is h.close() that removes the handler from the module global _handlerList:

class Handler(Filterer):

    ...

    def close(self):
        """
        Tidy up any resources used by the handler.

        This version does removes the handler from an internal list
        of handlers which is closed when shutdown() is called. Subclasses
        should ensure that this gets called from overridden close()
        methods.
        """
        #get the module data lock, as we're updating a shared structure.
        _acquireLock()
        try:    #unlikely to raise an exception, but you never know...
            del _handlers[self]
            _handlerList.remove(self)
        finally:
            _releaseLock()

Normally this is not an issue. But when GeoEco's Logging.ini file initializes a StreamHandler for sys.stdout, that sys.stdout does not have a flush method when the code is run from within ArcGIS 9.3. I does appear to have a flush method when run outside of ArcGIS 9.3. Thus, StreamHandler.flush hits an AttributeError when it tries to run self.stream.flush()under Arc 9.3:

class StreamHandler(Handler):

    ...

    def flush(self):
        """
        Flushes the stream.
        """
        self.stream.flush()

Solution to both problems

When a GeoEco tool is run from ArcGIS, the tool's function is called by the function ExecuteMethodFromCommandLineAsArcGISTool in GeoEco.ArcGIS. That function will now execute the following code before exiting:

def _ShutdownLoggingToAvoidArcGIS93MemoryLeak():
    try:
        logging.raiseExceptions = 0
        logging.shutdown()
        while len(logging._handlerList) > 0:
            del logging._handlerList[0]
        import atexit
        for i in range(len(atexit._exithandlers)):
            if atexit._exithandlers[i][0] == logging.shutdown:
                del atexit._exithandlers[i]
                break
    except:
        pass

This code manually executes the logging.shutdown function, empties logging._handlerList of any entries that were leaked by problem 2, and removes the entry from atexit._exithandlers to address problem 1.

Testing shows that this fix eliminates the memory leak related to logging. As a result most MGET tools do not leak large amounts of memory under Arc 9.3. Most still leak small amounts of memory. This appears to be due to many modules, including Python's own modules and GeoEco modules, allocating static variables (e.g. module globals or class statics) that are not freed at sub-interpreter shut down. Fixing all of these small leaks will be very difficult, so I will probably not attempt it.

Change History

Changed 4 years ago by jjr8

  • status changed from new to closed
  • resolution set to fixed

Fixed in [404], released in MGET 0.7b1.

Note: See TracTickets for help on using tickets.