Update (3/7/2012): Fixed an infinite-loop bug found by Simon Lundmark in section 5b that occurred in the following sequence of events:

1. The search for whole tracking units runs past the end of the tracking array
2. The for-loop terminates on the i < kMemNumTrackingUnits condition
3. uBeginningTrackingUnit does not get updated
4. The uNumContiguousTrackingUnitsAvailable != uNumContiguousTrackingUnitsToFind condition is true.
5. The while-loop is continued, with uBeginningTrackingUnit at the same value it started with in the previous iteration.

Original (with edits):

In today’s part I will be discussing the three stages of the search for pages in the malloc rewrite. Special thanks to Simon Lundmark for digging in, pointing out, and providing corrections for a couple of flaws.

First though, here is a quick recounting of the computer used for development.  I’m working on an old laptop running Windows XP Pro Service Pack 3, with a Pentium-M 1.6GHz 32-bit processor, 400MHz FSB, and 512MB of RAM.

With the given machine architecture being Little Endian, I want to be clear that I conceptualize the units of the tracking array as if they were Big Endian: when the end of the tracking unit is referred to, this means the least significant bit, and the beginning or start of the tracking unit means the most significant bit.

Figure 1: Big Endian bit field

The array of tracking units positions the end of each preceding tracking unit next to the start of the subsequent tracking unit.

Figure 2: Preceding / Subsequent tracking units

Just to give a quick reminder, here’s the list of things malloc needs to do:

  1. Add padding to compensate for alignment of the allocation.
  2. Align the header so the beginning address of the memory will be aligned.
  3. Add the size of the header to the allocation.
  4. Check the request to make sure it will fit in the memory being tracked.
  5. Find enough contiguous pages of memory to satisfy the allocation.
    1. Find a starting point of available pages within a tracking unit.
    2. Find whole tracking units where all pages are available and required in fulfilling the request.
    3. Find the remaining pages needed to fulfill the request.
  6. Return the memory, if enough available memory was found, otherwise return NULL.

The search begins at the first tracking unit, and continues iterating over the array of tracking units until all tracking units have been exhausted.

u32 uBeginningTrackingUnit = 0;

  while (uBeginningTrackingUnit < kMemNumTrackingUnits)
  {

5a. Find a starting point of available pages within a tracking unit.

Four potential situations exist within the first tracking unit.

  1. All the pages have been used.
  2. Not enough free pages exist internal to the tracking unit to fulfill the request.
  3. All the pages needed to fulfill the request are contained within the starting tracking unit.
  4. Available pages exist starting at some point within the tracking unit, extending to the end of the tracking unit, and may be used in conjunction with the next tracking unit to fulfill the request if enough available contiguous pages exist.

The first two situations are not useful for fulfilling the request, so the last two are the ones for which to test.  If there are enough pages contained within a tracking unit to fulfill the request, then they can be located by building a bitmask that represents enough pages and using that bitmask to find a match with the available pages.  The number of pages not needed in a tracking unit is calculated by subtracting the number of pages needed from the number of pages per tracking unit.  Then the bitmask is created by bit shifting the ‘all pages in use’ constant towards the end of the tracking unit by the number of pages not needed; this leaves the number of pages that are needed as the bitmask.

        u32 uPreOffset = 0;
        TRACKING_UNIT uPreBitMask = 0;
        if (uNumPagesRequested < kMemNumPagesPerUnit)
        {
              // Build a PreBitMask to deal with there being enough contiguous pages internal to the
              // beginning unit to satisfy the request, or there is enough room to start a request
              // across the tracking unit boundary.

              // Build a PreBitMask that will find internal contiguous pages.
              uPreBitMask = kMemTrackingUnitAllPagesInUse << (kMemNumPagesPerUnit - uNumPagesRequested);
        }

If a single tracking unit cannot contain enough available pages to fulfill the request, then either the entire tracking unit (if all of its pages are available) or whatever available pages sit at the end of the tracking unit can be used in conjunction with the available pages contained in the start, or entirety, of the subsequent tracking unit.  In this case the bitmask is built to represent all pages being available.

        else
        {
              uPreBitMask = kMemTrackingUnitAllPagesInUse;
        }

Once the bitmask is created, it is repeatedly checked and shifted toward the end of the tracking unit until a match to the bitmask is found, or all bits in the bitmask are exhausted.  If a match isn’t found before the bitmask is shifted, then the bits shifted off the end of the tracking unit, which represent needed pages, will have to be found in the next tracking unit; this count is kept in the uPreOffset counter used in the for loop.

        for ( ; uPreOffset < kMemNumPagesPerUnit; ++uPreOffset)
        {
              if ((~(aMemPageTrackingBits[uBeginningTrackingUnit]) & uPreBitMask) == uPreBitMask)
              {
                    // Found a tracking unit with enough contiguous pages internal to the tracking unit
                    // or with available contiguous pages at the end of the tracking unit.
                    break;
              }
              uPreBitMask = uPreBitMask >> 1;
        }

Next, a check is done to make sure the bitmask wasn’t completely exhausted; if it was, the search moves to the next tracking unit and starts over.

        // Check if the PreOffset has completely exhausted the current beginning tracking unit.
        if (uPreOffset == kMemNumPagesPerUnit)
        {
              uBeginningTrackingUnit++;
              // Minimize worst case searches, where contiguous allocations, or one large allocation, fills contiguous tracking units.
              // Stay within the bounds of the array while skipping fully used tracking units.
              while (uBeginningTrackingUnit < kMemNumTrackingUnits &&
                     aMemPageTrackingBits[uBeginningTrackingUnit] == kMemTrackingUnitAllPagesInUse)
              {
                    uBeginningTrackingUnit++;
              }
              continue;
        }

If the bitmask wasn’t completely exhausted, then the number of remaining pages needed is calculated and checked.  If more pages are needed, then a check is performed to ensure more tracking units are available; when no more tracking units are available but more pages are needed, null is returned.  In the case that enough pages have been found to fulfill the request, the pages to be assigned are marked as in use, the beginning memory address is calculated, the memory used to pad the allocation header, if any, is cleared, the allocation header is written, and the memory address immediately following the allocation header is returned.  By placing the allocation header immediately prior to the memory address returned, the size and alignment information can be easily retrieved during a free method call, or anywhere it needs to be viewed, such as in a debugger, simply by moving the memory address pointer backwards.

        TRACKING_UNIT uNumRemainingPagesNeeded = ((uNumPagesRequested >= (kMemNumPagesPerUnit - uPreOffset)) ? (uNumPagesRequested - (kMemNumPagesPerUnit - uPreOffset)) : 0);

        // Check if all the required pages needed have been found.
        if (uNumRemainingPagesNeeded == 0)
        {
              // Mark the pages of the tracking unit as used.
              aMemPageTrackingBits[uBeginningTrackingUnit] |= uPreBitMask;

              // Calculate the memory address.
              u32 uAddress = ((uBeginningTrackingUnit * kMemNumPagesPerUnit) + uPreOffset) * kMemPageSize;

              // Zero out the Allocation Header Padding memory.
              memset((void*)((u32)(const_cast<void*>(kpMemory)) + uAddress), 0, uAllocationHeaderPaddingSize);
              // Store the size and alignment of the allocation (Pad the front of the header, so _free and realloc can get the alignment data).
              ((TALLOCATION_HEADER*)((u32)(const_cast<void*>(kpMemory)) + uAddress + uAllocationHeaderPaddingSize))->uSize = uSize;
              ((TALLOCATION_HEADER*)((u32)(const_cast<void*>(kpMemory)) + uAddress + uAllocationHeaderPaddingSize))->uAlignment = uAlignment;

              // Return the memory.
              return (void*)((u32)(const_cast<void*>(kpMemory)) + uAddress + uAllocationHeaderSize);
        }

        // Check if beginning tracking unit was the last one (early termination saves unnecessary checking).
        // If the next tracking unit is less than the beginning tracking unit, uNextTrackingUnitToCheck was larger than the data type would hold,
        // and therefore the last tracking unit is already included.
        u32 uNextTrackingUnitToCheck = (uBeginningTrackingUnit + 1) % kMemNumTrackingUnits;
        if (uNextTrackingUnitToCheck < uBeginningTrackingUnit)
        {
              return NULL;
        }

5b. Find whole tracking units where all pages are available and required in fulfilling the request.

Since the tracking units are 32-bit unsigned integers, any tracking units where all pages in the tracking unit are available will have a value of zero.  The number of contiguous whole tracking units to find is calculated and then the tracking units subsequent to the starting tracking unit are evaluated.  If a partially or fully used tracking unit is encountered before all the required whole tracking units are found, then the search for whole tracking units is terminated.  If a tracking unit is completely available, it is counted and evaluation continues until failure or all required contiguous whole tracking units are found and confirmed.

        // The last tracking unit may be partial, so begin by determining how many whole units to find.
        // This will ensure that any remaining individual available page searches are limited to one tracking unit.
        u32 uNumContiguousTrackingUnitsAvailable = 0;
        u32 uNumContiguousTrackingUnitsToFind = uNumRemainingPagesNeeded / kMemNumPagesPerUnit;
        for (u32 i = uNextTrackingUnitToCheck; i < kMemNumTrackingUnits && uNumContiguousTrackingUnitsAvailable < uNumContiguousTrackingUnitsToFind; ++i)
        {
              if (aMemPageTrackingBits[i] == 0)
              {
                    uNumContiguousTrackingUnitsAvailable++;
              }
              else
              {
                    break;
              }
        }

Once the search has terminated or completed, the number of available whole tracking units found is compared to the number of contiguous whole tracking units required, to make sure the search wasn’t terminated early.  If the search was terminated early, then the tracking unit to start searching from is moved up to the current tracking unit, and the search for the requested memory must begin again at the beginning.

        // Check for failure to locate enough contiguous whole tracking units.
        if (uNumContiguousTrackingUnitsAvailable != uNumContiguousTrackingUnitsToFind)
        {
              // Skip past all the memory that was just checked for whole contiguous tracking units,
              // because the allocation isn't going to fit in there, and trying again would be wasted effort.
              uBeginningTrackingUnit = uNextTrackingUnitToCheck + uNumContiguousTrackingUnitsAvailable;
              continue;
        }

The number of pages needed is recalculated, subtracting out the pages contained in the whole tracking units that were just located.

        uNumRemainingPagesNeeded = uNumRemainingPagesNeeded - (uNumContiguousTrackingUnitsAvailable * kMemNumPagesPerUnit);

A check is then performed to determine if all the pages needed have been found.  If all the pages needed have been found, then the same steps are taken as described at the end of the search for the starting point, except when the pages are marked as used the whole tracking units used to satisfy the request are marked as well.  If more pages are required then the algorithm continues.

        if (uNumRemainingPagesNeeded == 0)
        {
              // Mark the pages of the tracking unit as used.
              aMemPageTrackingBits[uBeginningTrackingUnit] |= uPreBitMask;
              for (u32 i = uNextTrackingUnitToCheck; i < (uNextTrackingUnitToCheck + uNumContiguousTrackingUnitsAvailable); ++i)
              {
                    aMemPageTrackingBits[i] |= kMemTrackingUnitAllPagesInUse;
              }

              // Calculate the memory address.
              u32 uAddress = ((uBeginningTrackingUnit * kMemNumPagesPerUnit) + uPreOffset) * kMemPageSize;

              // Zero out the Allocation Header Padding memory.
              memset((void*)((u32)(const_cast<void*>(kpMemory)) + uAddress), 0, uAllocationHeaderPaddingSize);
              // Store the size and alignment of the allocation (Pad the front of the header, so _free and realloc can get the alignment data).
              ((TALLOCATION_HEADER*)((u32)(const_cast<void*>(kpMemory)) + uAddress + uAllocationHeaderPaddingSize))->uSize = uSize;
              ((TALLOCATION_HEADER*)((u32)(const_cast<void*>(kpMemory)) + uAddress + uAllocationHeaderPaddingSize))->uAlignment = uAlignment;

              // Return the memory.
              return (void*)((u32)(const_cast<void*>(kpMemory)) + uAddress + uAllocationHeaderSize);
        }

5c. Find the remaining pages needed to fulfill the request.

Before checking for the last tracking unit to fulfill the request, a check is performed to make sure the index for the next tracking unit to check is not past the end of the array of tracking units.  If the next tracking unit to check would be past the end of the array of tracking units, then not enough contiguous memory exists, and null is returned. When the index for the next tracking unit to check is valid, then a bitmask is created for the remaining pages needed; the bits for this are shifted to begin at the start of the tracking unit, because the pages must be contiguous with the previously located pages.

        // Check the next tracking unit for the last of the contiguous pages needed.

        // Check if available tracking units already includes the last one (early termination saves unnecessary checking).
        // If the next tracking unit is less than or equal to the beginning tracking unit, uNextTrackingUnitToCheck was larger
        // than the last tracking unit or larger than the data type would hold, and therefore the last tracking unit is
        // already included.
        uNextTrackingUnitToCheck = (uNextTrackingUnitToCheck + uNumContiguousTrackingUnitsAvailable) % kMemNumTrackingUnits;
        if (uNextTrackingUnitToCheck <= uBeginningTrackingUnit)
        {
              return NULL;
        }

        // Build a bitmask for finding the last batch of contiguous pages.
        TRACKING_UNIT uPostBitMask = kMemTrackingUnitAllPagesInUse << (kMemNumPagesPerUnit - uNumRemainingPagesNeeded);

One last check is performed to determine if enough contiguous pages have been located to fulfill the request.  The new bitmask is compared to the next tracking unit to check; if a match is found, then the same steps are taken as described at the end of the search for the starting point, except that when the pages are marked as used, the whole tracking units used to satisfy the request, and the pages required in the next tracking unit to check, are marked as well.  If a match is not found, then the tracking unit to start at is moved to the tracking unit that was evaluated for the end in this iteration, because it may be the start of the tracking units that will fulfill the request in the next iteration of the search over the tracking unit array.

        // The pages must be at the beginning of the tracking unit to be contiguous with the previous pages.
        // If the negation of the next tracking unit bits AND'd with the pages needed bit mask is equal to the pages needed bit mask, then the memory has been found.
        if ((~(aMemPageTrackingBits[uNextTrackingUnitToCheck]) & uPostBitMask) == uPostBitMask)
        {
              // Mark the pages of the tracking unit as used.
              aMemPageTrackingBits[uBeginningTrackingUnit] |= uPreBitMask;
              for (u32 i = (uBeginningTrackingUnit + 1); i < uNextTrackingUnitToCheck; ++i)
              {
                    aMemPageTrackingBits[i] |= kMemTrackingUnitAllPagesInUse;
              }
              aMemPageTrackingBits[uNextTrackingUnitToCheck] |= uPostBitMask;

              // Calculate the memory address.
              u32 uAddress = ((uBeginningTrackingUnit * kMemNumPagesPerUnit) + uPreOffset) * kMemPageSize;

              // Zero out the Allocation Header Padding memory.
              memset((void*)((u32)(const_cast<void*>(kpMemory)) + uAddress), 0, uAllocationHeaderPaddingSize);
              // Store the size and alignment of the allocation (Pad the front of the header, so _free and realloc can get the alignment data).
              ((TALLOCATION_HEADER*)((u32)(const_cast<void*>(kpMemory)) + uAddress + uAllocationHeaderPaddingSize))->uSize = uSize;
              ((TALLOCATION_HEADER*)((u32)(const_cast<void*>(kpMemory)) + uAddress + uAllocationHeaderPaddingSize))->uAlignment = uAlignment;

              // Return the memory.
              return (void*)((u32)(const_cast<void*>(kpMemory)) + uAddress + uAllocationHeaderSize);
        }

        // Set the next tracking unit to begin checking.
        uBeginningTrackingUnit = uNextTrackingUnitToCheck;
  } // End of while (uBeginningTrackingUnit < kMemNumTrackingUnits)

If all the tracking units have been exhausted, without locating enough pages to fulfill the request, then null is returned.

        // If this has been reached, then not enough contiguous memory was found.
        return NULL;

And that’s malloc.

In the bigger picture, there are issues that still need to be considered and dealt with, such as fragmentation or frequent small requests. Issues like those can be handled through other schemes, like memory heaps, hashes containing lists of available memory in frequently used sizes, and object factories, implemented on top of malloc. These other schemes are not handled in malloc itself because in my implementation I chose not to incur the costs associated with them when trying to allocate larger blocks of memory.  In other words, only incur the execution costs when necessary.

I hope this post has been informative, and I promise that next time, when I cover the free method, the post will not be so long.