Recyclable.Collections 0.0.4

There is a newer version of this package available.
See the version list below for details.
dotnet add package Recyclable.Collections --version 0.0.4                
NuGet\Install-Package Recyclable.Collections -Version 0.0.4                
This command is intended to be used within the Package Manager Console in Visual Studio, as it uses the NuGet module's version of Install-Package.
<PackageReference Include="Recyclable.Collections" Version="0.0.4" />                
For projects that support PackageReference, copy this XML node into the project file to reference the package.
paket add Recyclable.Collections --version 0.0.4                
#r "nuget: Recyclable.Collections, 0.0.4"                
#r directive can be used in F# Interactive and Polyglot Notebooks. Copy this into the interactive tool or source code of the script to reference the package.
// Install Recyclable.Collections as a Cake Addin
#addin nuget:?package=Recyclable.Collections&version=0.0.4

// Install Recyclable.Collections as a Cake Tool
#tool nuget:?package=Recyclable.Collections&version=0.0.4                

Recyclable.Collections

Recyclable.Collections project is an open source framework for operating dynamic lists at performance close to raw arrays, but fairly unlimited in size. It aims at providing minimal memory footprint. It implements IList<T> interface and is targeted as direct replacements of List<T>, SortableList<T>, PriorityQueue<T> & similar.

Included

  • RecyclableList<T>
  • RecyclableLongList<T>

Milestones

  1. ✅ Create basic classes

    1. RecyclableList<T>
    2. RecyclableLongList<T>
    3. 🅿️ RecyclableQueue<T>
    4. 🅿️ RecyclableSortedList<T>
    5. 🅿️ RecyclableStack<T>
    6. 🅿️ RecyclableUnorderedList<T>
  2. ✅ Create basic unit tests

    1. RecyclableList<T>
    2. RecyclableLongList<T>
    3. 🅿️ RecyclableQueue<T>
    4. 🅿️ RecyclableSortedList<T>
    5. 🅿️ RecyclableStack<T>
    6. 🅿️ RecyclableUnorderedList<T>
  3. ✅ Optimize RecyclableList<T>

    1. Add
    2. AddRange
      1. ✅ when source is array<T>
      2. ✅ when source is List<T>
      3. ✅ when source is IList<T>
      4. ✅ when source is RecyclableList<T>
      5. ✅ when source is RecyclableLongList<T>
      6. ✅ when source is IEnumerable<T>
      7. ✅ when source has non-enumerated count
    3. Clear
    4. Contains
    5. CopyTo
    6. Dispose
    7. EnsureCapacity
    8. GetEnumerator
    9. IndexOf
    10. Insert
    11. Remove
    12. RemoveAt
    13. Resize
    14. this[int index]
  4. ✅ Port RecyclableList<T> implementation to RecyclableLongList<T>

  5. 👉 Optimize RecyclableLongList<T>

    1. Add
    2. AddRange
      1. ✅ when source is array<T>
      2. ✅ when source is List<T>
      3. ✅ when source is IList<T>
      4. ✅ when source is RecyclableList<T>
      5. ✅ when source is RecyclableLongList<T>
      6. ✅ when source is IEnumerable<T>
      7. ✅ when source has non-enumerated count
    3. Clear
    4. Contains
    5. 🅿️ CopyTo
    6. Dispose
    7. EnsureCapacity
    8. GetEnumerator
    9. IndexOf
    10. LongIndexOf
    11. Insert
    12. Remove
    13. RemoveAt(int index)
    14. RemoveAt(long index)
    15. RemoveAt
    16. Resize
    17. this[int index]
    18. this[long index]
    19. ✅ Rename RecyclableList<T> to RecyclableLongList<T>
    20. ✅ Rename RecyclableArrayList<T> to RecyclableList<T>
    21. ✅ Fix failing tests
    22. ✅ Hide not ready classes
      1. RecyclableQueue<T>
      2. RecyclableSortedList<T>
      3. RecyclableStack<T>
      4. RecyclableUnorderedList<T>
    23. ✅ Add support for ReadOnlySpan<T>
    24. ✅ Release 0.0.3-alpha
    25. ✅ Implement List<T> interfaces
      1. IList<T>
      2. ICollection<T>
      3. IEnumerable<T>
      4. IEnumerable
      5. IReadOnlyList<T>
      6. IReadOnlyCollection<T>
      7. IList
      8. ICollection
    26. ✅ Implement list versioning to allow data change identification
    27. ✅ Make sure that NeedsClearing is used & items are cleared in
      1. RecyclableList<T>
        1. Clear
        2. Dispose
        3. Remove
        4. RemoveAt
        5. ✅ Properties
        6. ✅ Non-versioned & Versioned Enumerators
        7. RecyclableListHelpers
          1. EnsureCapacity
          2. Resize
        8. IList<T> implementation
      2. RecyclableLongList<T>
        1. Clear
        2. Dispose
        3. Remove
        4. RemoveAt
        5. RemoveBlock
        6. ✅ Properties
        7. ✅ Non-versioned & Versioned Enumerators
        8. RecyclableLongListHelpers
          1. CopyFollowingItems
          2. CopyTo
          3. EnsureCapacity
          4. Resize
        9. RecyclableLongList<T>.IndexOfHelpers
    28. ✅ Complete the implementation of RecyclableLongList<T>
      1. InsertAt to allow inserting items anywhere
      2. RemoveAt to allow removing item at any index
    29. ✅ Release 0.0.4
    30. 👉 Add .ToRecyclableList / .ToRecyclableLongList variants for all supported collection types
      1. 🅿️ RecyclableList
      2. 🅿️ RecyclableLongList
      3. 🅿️ IList<T>
      4. 🅿️ ICollection<T>
      5. 🅿️ IEnumerable<T>
      6. 🅿️ IEnumerable
      7. 🅿️ IReadOnlyList<T>
      8. 🅿️ IReadOnlyCollection<T>
      9. 🅿️ IList
      10. 🅿️ ICollection
    31. 🅿️ Release 0.0.5
    32. 🅿️ Add support for ulong indexing
      1. 🅿️ Convert _memoryBlocks to Array to allow ulong lengths
      2. 🅿️ Convert block indexes from int to ulong or long
    33. 🅿️ Overflow review
      1. 🅿️ Add type casting to long for << & >> operations, where required
      2. 🅿️ Make type castings checked
    34. 🅿️ Port RecyclableLongList<T> optimizations to RecyclableList<T>
      1. IndexOf parallelization
    35. 🅿️ Release 0.0.6
  6. 🅿️ Implement ILongList<T> interface

    1. 🅿️ RecyclableList<T>
    2. 🅿️ RecyclableLongList<T>
  7. 🅿️ Implement RecyclableQueue<T>

    1. 🅿️ Port RecyclableLongList<T> optimizations to RecyclableQueue<T>
    2. 🅿️ Release 0.0.7
  8. 🅿️ Implement RecyclableStack<T>

    1. 🅿️ Port RecyclableLongList<T> optimizations to RecyclableStack<T>
    2. 🅿️ Release 0.0.8
  9. 🅿️ Implement RecyclableSortedList<T>

    1. 🅿️ Port RecyclableLongList<T> optimizations to RecyclableSortedList<T>
    2. 🅿️ Release 0.0.9
  10. 🅿️ Implement RecyclableUnorderedList<T>

    1. 🅿️ Port RecyclableLongList<T> optimizations to RecyclableUnorderedList<T>
    2. 🅿️ Release 0.0.10
  11. 🅿️ Implement RecyclableVersionedList<T>

    1. 🅿️ Port RecyclableList<T> to RecyclableVersionedList<T>
    2. 🅿️ Port RecyclableLongList<T> to RecyclableVersionedLongList<T>
    3. 🅿️ Release 0.0.11
  12. 🅿️ Optimize OneSizeArrayPool

    1. 🅿️ Review locks
    2. 🅿️ Measure multi-threading performance
    3. 🅿️ Implement memory bucket disposal in high RAM pressure scenario
  13. 🅿️ Optimize RecyclableCollectionVersionObjectPool

    1. 🅿️ Implement custom ObjectPool<T>
    2. 🅿️ Measure multi-threading performance
    3. 🅿️ Implement memory bucket disposal in high RAM pressure scenario
  14. 🅿️ Review RecyclableArrayPool

    1. 🅿️ Review locks
    2. 🅿️ Measure multi-threading performance
    3. 🅿️ Implement array disposal in high RAM pressure scenario
  15. 🅿️ Optimize MemoryBucket<T>

    1. 🅿️ Convert to struct, if possible
    2. 🅿️ Find out if there are better replacements
    3. 🅿️ Release 0.0.12
  16. 🅿️ Optimize

    1. 🅿️ IndexOfSynchronizationContext
    2. 🅿️ IndexOfSynchronizationContextPool
    3. 🅿️ ManualResetEventSlimmer
      1. 🅿️ Multi-threading benchmarks
    4. 🅿️ ManualResetEventSlimmerPool
      1. 🅿️ Multi-threading benchmarks
    5. 🅿️ SpinLockSlimmer
      1. 🅿️ Multi-threading benchmarks
  17. 🅿️ Release 0.0.13

  18. 🅿️ Extend unit tests

    1. 🅿️ .Add / .AddRange must allow null values
    2. 🅿️ .Remove / .RemoveAt / .Clear must clear reference when reference type
    3. 🅿️ RecyclableLongListExtensions.CopyTo
  19. 🅿️ Cleanup

    1. 🅿️ Replace LastBlockWithData property with _lastBlockWithData field
    2. 🅿️ Replace properties with field refs, where possible
    3. RecyclableLongListExtensions
    4. ListExtensions
    5. 🅿️ MathUtils
    6. 🅿️ Resolve TODOs
    7. 🅿️ Remove obsolete methods
    8. 🅿️ RecyclableLongListHelpers.Enumerate to benchmarks
  20. 🅿️ Release 0.0.14

  21. 🅿️ Optimize

    1. 🅿️ ListExtensions
    2. 🅿️ MathUtils
    3. 🅿️ SystemRandomNumberGenerator
    4. 🅿️ RecyclableLongListExtensions
  22. 🅿️ Release 0.0.15

  23. ✅ Review and remove warnings & hints

    1. ✅ Warnings
    2. ✅ Hints
  24. Documentation

    1. 🅿️ Document the most efficient iteration over array collections
    2. 🅿️ Document the most efficient iteration over blocked collections
    3. 🅿️ Document differences in behavior
    4. 🅿️ Document other specifics
  25. 🅿️ Release 1.0.0

  26. 🅿️ Implement source code generators

    1. 🅿️ Add attributes
      1. 🅿️ GeneratorBaseClassAttribute for marking base class used for generation
      2. 🅿️ VersionedAttribute for marking classes as versioned
      3. 🅿️ IncVersionAttribute for marking methods & setters as resulting in version increase
    2. 🅿️ Implement source code generator
      1. 🅿️ Generate partial class for classes marked with GeneratorBaseClassAttribute
      2. 🅿️ Add support for fields
      3. 🅿️ Add support for properties
      4. 🅿️ Add support for methods
      5. 🅿️ Add support for constructor
      6. 🅿️ Skip base fields, methods, properties etc. when they're overridden in the child class
  27. 🅿️ Optimize

    1. 🅿️ RecyclableLongList<T>.Resize
    2. 🅿️ RecyclableLongList<T>.CopyTo
    3. 🅿️ Check if we can benefit from Sse2 in .IndexOf/.Contains methods as given in MS blog.
    4. 🅿️ Memory pools
      1. 🅿️ OneSizeArrayPool<T>
      2. 🅿️ RecyclableArrayPool<T>
  28. 🅿️ Add support for ICollection<T> interface in .AddRange & constructor

  29. 🅿️ Final optimizations

    1. 🅿️ Replace Math class usages with if statements
    2. 🅿️ Replace a - b > 0 & a - b < 0 comparisons with a > b & a < b
    3. 🅿️ Replace a + b > 0 & a + b < 0 comparisons with a > b & a < b
    4. 🅿️ Replace a / b & a * b calculations with equivalents, where possible
    5. 🅿️ Replace virtual calls with static calls
    6. 🅿️ Separate ++number, number++, --number & number-- operations to their own lines (they add extra variable)
    7. 🅿️ Replace blockSize sums by powers of 2, minus 1
    8. 🅿️ Remove type castings, if possible
    9. 🅿️ Convert generic methods to non-generic
    10. 🅿️ Replace integer comparisons with comparisons to 0 / 1, if possible
  30. 🅿️ Release 2.0.0

Characteristics of the classes

Common

  • All classes implement IDisposable interface & SHOULD be disposed after use. That's to return block arrays taken from the shared pool. It may be foreseen as an issue for replacement in existing code, which obviously is missing using clause. But considering that Dispose will be called by GC anyway, it should cause issues in specific scenarios, only. In either case the fix is one-word addition of using.

  • Memory blocks are created in 2 ways, depending on blockSize value

    • when blockSize < 128 ⇨ blocks are created as regular arrays.
    • when blockSize >= 128 ⇨ blocks are taken from and returned to ArrayPool<T>.Shared, when blocks are removed.
    • maximum blockSize value is 2_147_483_591, which corresponds more or less to the maximum array size.
    • blockSize MUST be a power of 2. It will be rounded up to the closest power of 2, if needed. That is due to high performance gain on some operations, like the calculation of item index in a block.
  • Array pools are shared between the same T type. I.e. List<int> will use a different pool from List<short> and so on. For high concurrency environments you may want to provide your own pools, when this feature becomes available in the upcoming releases.

  • Trying to access this[int index], this[long index], Count, LongCount etc. when Capacity == 0 will raise NullReferenceException. This is by design to remove all non-critical code from the constructor.

  • ⚠️ The default enumerators don't check if the collection was modified. That's by design due to high performance hit of the check. That should be fine in most use-cases. If you need to check for modifications, you're welcomed to use .GetVersionedEnumerator() / IRecyclableVersionedXxxList<T> type-casting, e.g. in foreach loops.

      foreach (var item in (IRecyclableVersionedList<long>)recyclableList)
      {
      	...
      	...
      	...
      }
    
      foreach (var item in (IRecyclableVersionedLongList<long>)recyclableLongList)
      {
      	...
      	...
      	...
      }
    
  • ⚠️ The state of the default enumerators before the first & after the last call to .MoveNext() is undefined, but no exception is raised. That's by design due to high performance hit of the checks. This behavior is compatible with foreach loops, which will never access .Current property

    • when the collection is empty, or
    • when .MoveNext() returned false, after reaching the end of the collection.
  • ⚠️ If you decide to use .GetEnumerator() / .GetVersionedEnumerator(), instead of relying on foreach loops

    • you MUST ensure that you always respect the result of .MoveNext() call, and
    • you MUST NOT access .Current property before calling .MoveNext() or after reaching the end of the collection.


    Failing to do so WILL result in unpredictable behavior.

    [Benchmark results](https://github.com/mlemanczyk/Recyclable.Collections/blob/main/Recyclable.Collections.Benchmarks/Results/Benchmarks.md#`IndexOf` Benchmarks)

RecyclableList<T>

  • Range: int
  • Interfaces: IList<T>, IEnumerable<T>, IDisposable

    This is the direct equivalent of List<T> class, except that the arrays are taken from the shared pool, instead of directly being allocated. It proves to out perform List<T> & the standard arrays in certain operations and is fairly close in others. But they may perform worse in certain scenarios. If you're interested in details, please refer to benchmarks.

RecyclableLongList<T>, RecyclableSortableList<T>, RecyclableQueue<T>, RecyclableStack<T>

  • Range: long

  • Interfaces: IList<T>, ILongList<T>, IEnumerable<T>, IDisposable

  • All classes provide large storage capabilities, in practice limited only by the memory available in your system.

  • All data is stored in blocks, with the provided int blockSize or the default, currently being 10,240 items in each block. It's recommended to use smaller numbers, if you foresee storing low no. of items on the list.

  • There is no memory copying to increase the capacity of the recyclable classes, unless the no. of items exceeds the no. of allocated blocks. In that case, a new array of blocks is created to accommodate . If more capacity is needed, a new memory block is allocate & added to the internal list of blocks. This is the only list that may re-allocate memory block & copy their content to grow. That shouldn't be an issue considering limited no. of blocks stored by each class in practical scenarios. Currently I'm not planning any works around that. Shall there be a need to eliminate it, you're welcomed to file a PR with the proposed changes.

  • ⚠️ IndexOf will automatically switch to parallel search mode. Collections bigger than (currently) 850_000 items will be scanned using parallelization. In either case it'll return the index of the first matching item. This note also applies to Contains, because it internally uses IndexOf. The value of 850_000 may become configurable in the future, unless it proves to result in much worse performance.

    The parallelization utilizes default Task scheduling, resulting in non-deterministic order in which search tasks are executed. In most cases it resulted in significant performance improvements. Below you can find the benchmarks for various item counts.

    As RecyclableLongList<T> is foreseen for use-cases with big data, you should automatically benefit from it in many use-cases.

    Benchmark Results

Architecture Decisions

The architecture of the classes included in this project is driven primarily by performance & memory footprints. The following decisions were made to support it.

Code duplication

There is high level of code duplication in each of the classes. When you look at them, you will see they are alike & some of the code could be extracted to extension methods. It has been done & has proven to perform worse. For this reason, properties & methods expected to be on the hot-path, duplicate much of the code to get the most out of it.

Common base - ILongList<T>

Having a common base class for all the classes looks very attractive. But it is also proved to perform worse, if you start making properties & methods virtual & there are subsequent calls to base. For this reason, each of the classes is the root class & there are hardly any virtual methods & properties. All classes are meant to be final, but I've not marked them as sealed. If you feel that you'd benefit from it, you're welcomed to inherit from them. The protected stuff is there for your use.

If you desire to have a common base for calls, each of the classes implements IList<T> & ILongList<T> interfaces. ILongList<T> is a new interface, made alike IList<T>, but unlocking full support for long indexing & use. Considering the characteristics of the classes I feel that the generic name is justified, instead of IRecyclableList<T> what you could expect.

Protection against incorrect use & state

Having protection against invalid state of the object is an important part of software engineering. The classes included in this project provide the same level of protection as the regular array & List class. They will raise exceptions when invalid index is given. But there is no additional protection that the content & state of the objects internally used by these classes are valid. You can break them, if you want to.

This decision is driven by performance measures. Considering wide use of the lists on the hot-paths, each additional check & operation starts counting. I've decided to eliminate all the checks for the sake of performance. If you feel that you need additional protection, feel free to inherit or reuse any of the provided classes to provide that.

this[int index], Count, IndexOf(T item) etc. may overflow, but...

All the classes included in this package support long indexing and addressing. The LongCount property will always provide the total no. of items stored on the list. Since Count is just LongCount type-casted to int, its value may overflow.

But List<T> addressing is kind of currently limited to int ranges anyway in practical scenarios. As such Count overflowing shouldn't be an issue when using the classes with existing code. If you use Count in your loops, it's recommended to replace it with LongCount, tough. You don't need to make any changes in foreach or other iterator loops. They are expected to transparently use the new classes & yield all items, beyond int.MaxValue range.

This is by design, for the best performance. Initially there have been checks in place to prevent overflowing & return int.MaxValue instead, but they proved to slow down the operations. Considering low risk for existing code, I've decided to take it out.

If you foresee the overflowing as an issue, you're welcomed to propose an improvement & file a PR with proposed solution.

Auto-properties, new language features & code style

When you look at the code you may note, that it looks like in early c# days. Where the new language features yield the same or better performance, they are & will be used.

However, many of the new language features have proved to provide worse performance at some point in my testing. It includes auto-properties, for which I could see getters & setters generated in IL, instead of fields. Because all the classes are expected to be on the hot-paths, I've eliminated most of them.

Similarly, you will find places in code with hints or warnings disabled. The decision was always driven by performance measures. The use case & business logic was carefully reviewed to ensure it's safe to take shortcuts, before making the change. If you run into failing scenario, please feel free to create a work item against it and/or file a PR to fix it.

Finally, you'll find places with code commented out. That's to easily track what has been tried & what proved to perform worse.

At any time, if you know & can show in your benchmarks that the code can be simplified without impacting the performance, feel free to file a PR with proposed changes. Please include benchmark results from your testing to speed up things.

Thread Safety

The public static members of the classes are thread safe. Any instance members are not guaranteed to be thread safe. You need to implement locking mechanism to safely use the classes in a multi-threading environment.

Use

All the classes included in this package are meant to be direct replacements of their corresponding system classes.

  • IList<T>ILongList<T>
  • List<T>RecyclableLongList<T>
  • Queue<T>RecyclableQueue<T>
  • PriorityQueue<T>RecyclableSortedList<T>
  • SortedList<T>RecyclableSortedList<T>
  • Stack<T>RecyclableStack<T>

Examples of each of the above classes are provided below. Please refer automated tests for more.

RecyclableList<T>

It's the closest equivalent to List<T> in regards to functionality. It supports items' removal & delivers the best performance in most scenarios. But it's limited to int range, tough. If you find that acceptable, my recommendation is to use this class as the direct replacement for List<T>.

Do note, that it implements ILongList<T> interface, too. This is to allow having common code base regardless, if you use the basic RecyclableList<T> or it's long-range version - RecyclableLongList<T>. By doing that you can switch between them with highly limited changes in code.

RecyclableLongList<T>

public void RecyclableListExample()
{
	int[] testNumbers = new[] { 1, 2, 2, 3 };

	// Create
	using var list = new RecyclableLongList<int>();

	// Add items
	foreach (var index in testNumbers)
	{
		list.Add(index);
	}

	foreach (var item in list)
	{
		Console.WriteLine(item);
	}

	// This will iterate over this[long index]
	for (var itemIdx = 0; itemIdx <= list.LongCount; itemIdx++)
	{
		Console.WriteLine(list[itemIdx]);
	}

	// This will iterate over this[int index] and may overflow if there are > int.MaxValue no. of items on the list. See: Architecture Decisions
	for (var itemIdx = 0; itemIdx <= list.LongCount; itemIdx++)
	{
		Console.WriteLine(list[itemIdx]);
	}

	list.Clear();
}

[Benchmark results](https://github.com/mlemanczyk/Recyclable.Collections/blob/main/Recyclable.Collections.Benchmarks/Results/Benchmarks.md#`IndexOf` Benchmarks)

Product Compatible and additional computed target framework versions.
.NET net6.0 is compatible.  net6.0-android was computed.  net6.0-ios was computed.  net6.0-maccatalyst was computed.  net6.0-macos was computed.  net6.0-tvos was computed.  net6.0-windows was computed.  net7.0 is compatible.  net7.0-android was computed.  net7.0-ios was computed.  net7.0-maccatalyst was computed.  net7.0-macos was computed.  net7.0-tvos was computed.  net7.0-windows was computed.  net8.0 was computed.  net8.0-android was computed.  net8.0-browser was computed.  net8.0-ios was computed.  net8.0-maccatalyst was computed.  net8.0-macos was computed.  net8.0-tvos was computed.  net8.0-windows was computed. 
Compatible target framework(s)
Included target framework(s) (in package)
Learn more about Target Frameworks and .NET Standard.

NuGet packages (4)

Showing the top 4 NuGet packages that depend on Recyclable.Collections:

Package Downloads
Recyclable.Collections.TestData

`Recyclable.Collections.TestData` project is a basic open source package for containing test data useful for testing collections like `RecyclableList<T>`, `RecyclableLongList<T>`, `List<T>`, `SortableList<T>`, `PriorityQueue<T>` & similar.

Collections.Benchmarks.Core

`Collections.Benchmarks.Core` project is a basic open source framework for performance measurements based on various collection types like `RecyclableList<T>`, `RecyclableLongList<T>`, `List<T>`, `SortableList<T>`, `PriorityQueue<T>` & similar.

Recyclable.Collections.Compatibility.List

Recyclable.Collections.Compatibility.List project is an open source framework for operating dynamic lists at performance close to raw arrays. It aims at providing minimal memory footprint. It implements IList<T>'s interface and is targeted as direct replacements of List<T>, SortableList<T>, PriorityQueue<T> and similar.

Recyclable.Collections.TestData.xUnit

`Recyclable.Collections.TestData.xUnit` project is a basic open source package for working with repeatable xUnit test data useful for testing collections like `RecyclableList<T>`, `RecyclableLongList<T>`, `List<T>`, `SortableList<T>`, `PriorityQueue<T>` & similar.

GitHub repositories

This package is not used by any popular GitHub repositories.

Various performance improvements and fixes. Initial RecyclableLongList implementation finished.