HDF5.NET
1.0.0-alpha.22
The package has been renamed to PureHDF.
dotnet add package HDF5.NET --version 1.0.0-alpha.22
See https://github.com/Apollo3zehn/HDF5.NET/issues/9 for not yet implemented features.
API Documentation |
---|
.NET Standard 2.0 |
.NET Standard 2.1 |
.NET 5 |
.NET 6 |
HDF5.NET
A pure C# library without native dependencies that makes reading of HDF5 files (groups, datasets, attributes, links, ...) very easy.
The minimum supported target framework is .NET Standard 2.0 which includes
- .NET Framework 4.6.1+
- .NET Core (all versions)
- .NET 5+
This library runs on all platforms (ARM, x86, x64) and operating systems (Linux, Windows, MacOS, Raspbian, etc) that are supported by the .NET ecosystem without special configuration.
The implementation follows the HDF5 File Format Specification.
Overwhelmed by the number of different HDF5 libraries? Here is a comparison table.
1. Objects
// open HDF5 file, the returned H5File instance represents the root group ('/')
using var root = H5File.OpenRead(filePath);
1.1 Get Object
Group
// get nested group
var group = root.Group("/my/nested/group");
Dataset
// get dataset in group
var dataset = group.Dataset("myDataset");
// alternatively, use the full path
var dataset = group.Dataset("/my/nested/group/myDataset");
Committed Data Type
// get committed data type in group
var commitedDatatype = group.CommitedDatatype("myCommitedDatatype");
Any Object Type
When you do not know what kind of link to expect at a given path, use the following code:
// get H5Object (base class of all HDF5 object types)
var myH5Object = group.Get("/path/to/unknown/object");
1.2 Additional Info
External File Link
With an external link pointing to a relative file path it might be necessary to provide a file prefix (see also this overview).
You can either set an environment variable:
Environment.SetEnvironmentVariable("HDF5_EXT_PREFIX", "/my/prefix/path");
Or you can pass the prefix as an overload parameter:
var linkAccess = new H5LinkAccess()
{
    ExternalLinkPrefix = prefix
};
var dataset = group.Dataset(path, linkAccess);
Iteration
Iterate through all links in a group:
foreach (var link in group.Children)
{
    // the pattern variables use distinct names to avoid clashing with the outer 'group' and 'dataset' variables
    var message = link switch
    {
        H5Group childGroup => $"I am a group and my name is '{childGroup.Name}'.",
        H5Dataset childDataset => $"I am a dataset, call me '{childDataset.Name}'.",
        H5CommitedDatatype datatype => $"I am the data type '{datatype.Name}'.",
        H5UnresolvedLink lostLink => $"I cannot find my link target =( shame on '{lostLink.Name}'.",
        _ => throw new Exception("Unknown link type")
    };
    Console.WriteLine(message);
}
An H5UnresolvedLink becomes part of the Children collection when a symbolic link is dangling, i.e. the link target does not exist or cannot be accessed.
2. Attributes
// get attribute of group
var attribute = group.Attribute("myAttributeOnAGroup");
// get attribute of dataset
var attribute = dataset.Attribute("myAttributeOnADataset");
3. Data
The following code samples work for datasets as well as attributes.
// class: fixed-point
var data = dataset.Read<int>();
// class: floating-point
var data = dataset.Read<double>();
// class: string
var data = dataset.ReadString();
// class: bitfield
[Flags]
enum SystemStatus : ushort /* make sure the enum in the HDF5 file is based on the same type */
{
    MainValve_Open = 0x0001,
    AuxValve_1_Open = 0x0002,
    AuxValve_2_Open = 0x0004,
    MainEngine_Ready = 0x0008,
    FallbackEngine_Ready = 0x0010,
    // ...
}
var data = dataset.Read<SystemStatus>();
var readyToLaunch = data[0].HasFlag(SystemStatus.MainValve_Open | SystemStatus.MainEngine_Ready);
// class: opaque
var data = dataset.Read<byte>();
var data = dataset.Read<MyOpaqueStruct>();
// class: compound
/* option 1 (faster) */
var data = dataset.Read<MyNonNullableStruct>();
/* option 2 (slower, for more info see the link below after this code block) */
var data = dataset.ReadCompound<MyNullableStruct>();
// class: reference
var data = dataset.Read<H5ObjectReference>();
var firstRef = data.First();
/* NOTE: Dereferencing would be quite fast if the object's name
* was known. Instead, the library searches recursively for the
* object. Do not dereference using a parent (group) that contains
* any circular soft links. Hard links are no problem.
*/
/* option 1 (faster) */
var firstObject = directParent.Get(firstRef);
/* option 2 (slower, use if you don't know the object's parent) */
var firstObject = root.Get(firstRef);
// class: enumerated
enum MyEnum : short /* make sure the enum in HDF file is based on the same type */
{
MyValue1 = 1,
MyValue2 = 2,
// ...
}
var data = dataset.Read<MyEnum>();
// class: variable length
var data = dataset.ReadString();
// class: array
var data = dataset
.Read<int>()
/* dataset dims = int[2, 3] */
/* array dims = int[4, 5] */
.ToArray4D(2, 3, 4, 5);
// class: time
// -> not supported (reason: the HDF5 C lib itself does not fully support H5T_TIME)
For more information on compound data, see section Reading compound data.
4. Partial I/O and Hyperslabs
4.1 Overview
Partial I/O is one of the strengths of HDF5 and is applicable to all dataset types (contiguous, compact and chunked). With HDF5.NET, the full dataset can be read with a simple call to dataset.Read(). However, if you want to read only parts of the dataset, hyperslab selections are your friend. The following code shows how to work with these selections using a three-dimensional dataset (source) and a two-dimensional memory buffer (target):
var dataset = root.Dataset("myDataset");
var memoryDims = new ulong[] { 75, 25 };
var datasetSelection = new HyperslabSelection(
rank: 3,
starts: new ulong[] { 2, 2, 0 },
strides: new ulong[] { 5, 8, 2 },
counts: new ulong[] { 5, 3, 2 },
blocks: new ulong[] { 3, 5, 2 }
);
var memorySelection = new HyperslabSelection(
rank: 2,
starts: new ulong[] { 2, 1 },
strides: new ulong[] { 35, 17 },
counts: new ulong[] { 2, 1 },
blocks: new ulong[] { 30, 15 }
);
var result = dataset
.Read<int>(
fileSelection: datasetSelection,
memorySelection: memorySelection,
memoryDims: memoryDims
)
.ToArray2D(75, 25);
All shown parameters are optional. For example, when the fileSelection parameter is unspecified, the whole dataset will be read. Note that the number of data points in the file selection must always match that of the memory selection.
Additionally, there is an overload method that allows you to provide your own buffer.
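The rule that both selections must contain the same number of data points can be checked by hand. The following is a minimal sketch in plain C# (it does not call HDF5.NET), assuming the usual hyperslab semantics in which each dimension selects count * block elements. The helper name CountElements is hypothetical; the numbers are taken from the selections in the example above.

```csharp
using System;

// Number of elements selected by a hyperslab:
// the product over all dimensions of (count * block).
static ulong CountElements(ulong[] counts, ulong[] blocks)
{
    if (counts.Length != blocks.Length)
        throw new ArgumentException("counts and blocks must have the same rank.");

    ulong total = 1;

    for (var i = 0; i < counts.Length; i++)
        total *= counts[i] * blocks[i];

    return total;
}

// counts and blocks of the file (3D) and memory (2D) selections shown above
var fileElements = CountElements(new ulong[] { 5, 3, 2 }, new ulong[] { 3, 5, 2 });
var memoryElements = CountElements(new ulong[] { 2, 1 }, new ulong[] { 30, 15 });

// (5*3) * (3*5) * (2*2) = 900 and (2*30) * (1*15) = 900
Console.WriteLine($"file: {fileElements}, memory: {memoryElements}");
```

Both selections cover 900 elements, which is why the read in the example is valid even though the ranks differ.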
4.2 Experimental: IQueryable (1-dimensional data only)
Another way to build the file selection is to invoke the AsQueryable method which can then be used as follows:
var result = dataset.AsQueryable<int>()
.Skip(5) // start
.Stride(5) // stride
.Repeat(2) // count
.Take(3) // block
.ToArray();
All methods are optional, i.e. the code
var result = dataset.AsQueryable<int>()
.Skip(5)
.ToArray();
will simply skip the first 5 elements and return the rest of the dataset.
This way of building a hyperslab / selection has been implemented in an effort to provide a more .NET-like experience when working with data.
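To see which elements such a selection covers, the indices can be enumerated by hand. This is a sketch in plain C# that does not use HDF5.NET. It assumes the standard one-dimensional hyperslab semantics and the mapping given in the comments above (Skip = start, Stride = stride, Repeat = count, Take = block). The helper SelectedIndices is hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Enumerates the element indices of a 1D hyperslab:
// 'count' blocks of 'block' consecutive elements, with the blocks
// beginning 'stride' elements apart, the first one at 'start'.
static IEnumerable<long> SelectedIndices(long start, long stride, long count, long block)
{
    for (long c = 0; c < count; c++)
        for (long b = 0; b < block; b++)
            yield return start + c * stride + b;
}

// same values as Skip(5).Stride(5).Repeat(2).Take(3)
var indices = new List<long>(SelectedIndices(start: 5, stride: 5, count: 2, block: 3));

Console.WriteLine(string.Join(", ", indices)); // 5, 6, 7, 10, 11, 12
```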
5. Filters
5.1 Built-in Filters
- Shuffle (hardware accelerated<sup>1</sup>, SSE2/AVX2)
- Fletcher32
- Deflate (zlib)
- Scale-Offset
<sup>1</sup> NET Standard 2.1 and above
5.2 External Filters
Before you can use external filters, you need to register them using H5Filter.Register(...). This method accepts a filter identifier, a filter name and the actual filter function.
This function could look like the following and should be adapted to your specific filter library:
public static Memory<byte> FilterFunc(
H5FilterFlags flags,
uint[] parameters,
Memory<byte> buffer)
{
// Decompressing
if (flags.HasFlag(H5FilterFlags.Decompress))
{
// pseudo code
byte[] decompressedData = MyFilter.Decompress(parameters, buffer.Span);
return decompressedData;
}
// Compressing
else
{
throw new Exception("Writing data chunks is not yet supported by HDF5.NET.");
}
}
5.3 Tested External Filters
- deflate (based on Intrinsics.ISA-L.PInvoke, SSE2 / AVX2 / AVX512, benchmark results)
- c-blosc2 (based on Blosc2.PInvoke, SSE2 / AVX2)
- bzip2 (based on SharpZipLib)
5.4 How to use Deflate (hardware accelerated)
(1) Install the P/Invoke package:
dotnet add package Intrinsics.ISA-L.PInvoke
(2) Add the Deflate filter registration helper function to your code.
(3) Register Deflate:
H5Filter.Register(
identifier: H5FilterID.Deflate,
name: "deflate",
filterFunc: DeflateHelper_Intel_ISA_L.FilterFunc);
(4) Enable unsafe code blocks in .csproj:
<PropertyGroup>
<AllowUnsafeBlocks>true</AllowUnsafeBlocks>
</PropertyGroup>
5.5 How to use Blosc / Blosc2 (hardware accelerated)
(1) Install the P/Invoke package:
dotnet add package Blosc2.PInvoke
(2) Add the Blosc filter registration helper function to your code.
(3) Register Blosc:
H5Filter.Register(
identifier: (H5FilterID)32001,
name: "blosc2",
filterFunc: BloscHelper.FilterFunc);
5.6 How to use BZip2
(1) Install the SharpZipLib package:
dotnet add package SharpZipLib
(2) Add the BZip2 filter registration helper function and the MemorySpanStream implementation to your code.
(3) Register BZip2:
H5Filter.Register(
identifier: (H5FilterID)307,
name: "bzip2",
filterFunc: BZip2Helper.FilterFunc);
6. Reading Compound Data
There are three ways to read structs which are explained in the following sections. Here is an overview:
method | return value | speed | restrictions |
---|---|---|---|
Read<T>() | T | fast | predefined type with correct field offsets required; nullable fields are not allowed |
ReadCompound<T>() | T | slow | predefined type with matching names required |
ReadCompound() | Dictionary<string, object> | slow | - |
6.1 Structs without nullable fields
Structs without any nullable fields (i.e. no strings and other reference types) can be read like any other dataset using a high performance copy operation:
[StructLayout(LayoutKind.Explicit, Size = 5)]
struct SimpleStruct
{
[FieldOffset(0)]
public byte ByteValue;
[FieldOffset(1)]
public ushort UShortValue;
[FieldOffset(3)]
public TestEnum EnumValue;
}
var compoundData = dataset.Read<SimpleStruct>();
Just make sure the field offset attributes match the field offsets defined in the HDF5 file when the dataset was created.
This method does not require that the struct's field names match since they are simply mapped by their offset.
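To illustrate what mapping by offset means, the following sketch decodes one 5-byte element by hand using only the base class library. It is not how HDF5.NET reads data internally. The sample bytes are made up, little-endian byte order is assumed, and TestEnum is assumed to be based on a 2-byte integer (which is consistent with Size = 5 above).

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical raw bytes of a single compound element (5 bytes, little-endian).
byte[] record = { 0x2A, 0x34, 0x12, 0x01, 0x00 };

// Each field is located purely by its byte offset; field names play no role.
var byteValue = record[0];                                                      // offset 0, 1 byte
var ushortValue = BinaryPrimitives.ReadUInt16LittleEndian(record.AsSpan(1, 2)); // offset 1, 2 bytes
var enumValue = BinaryPrimitives.ReadInt16LittleEndian(record.AsSpan(3, 2));    // offset 3, 2 bytes

Console.WriteLine($"{byteValue}, 0x{ushortValue:X4}, {enumValue}"); // 42, 0x1234, 1
```

If a FieldOffset in the struct differs from the offset stored in the file, the bytes are still copied, but they end up in the wrong field.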
If your struct contains an array of fixed size (here: 3), you would need to add the unsafe modifier to the struct definition and define the struct as follows:
[StructLayout(LayoutKind.Explicit, Size = 17)] // 5 bytes for the fields above + 3 * 4 bytes for the float array
unsafe struct SimpleStructWithArray
{
    // ... all the fields from the struct above, plus:
    [FieldOffset(5)]
    public fixed float FloatArray[3];
}
var compoundData = dataset.Read<SimpleStructWithArray>();
6.2 Structs with nullable fields (strings, arrays)
If you have a struct with string or normal array fields, you need to use the slower ReadCompound method:
struct NullableStruct
{
public float FloatValue;
public string StringValue1;
public string StringValue2;
public byte ByteValue;
public short ShortValue;
[MarshalAs(UnmanagedType.ByValArray, SizeConst = 3)]
public float[] FloatArray;
}
var compoundData = dataset.ReadCompound<NullableStruct>();
var compoundData = attribute.ReadCompound<NullableStruct>();
Please note the use of the MarshalAs attribute on the array field. This attribute tells the runtime that the array is of fixed size (here: 3) and that it should be treated as a value embedded in the struct instead of as a separate object.
Nested structs with nullable fields are not supported with this method.
Arrays with nullable element type are not supported with this method.
It is mandatory that the field names match exactly those in the HDF5 file. If you would like to use custom field names, consider the following approach:
// Apply the H5NameAttribute to the field with custom name.
struct NullableStructWithCustomFieldName
{
[H5Name("FloatValue")]
public float FloatValueWithCustomName;
// ... more fields
}
// Create a name translator.
Func<FieldInfo, string> converter = fieldInfo =>
{
var attribute = fieldInfo.GetCustomAttribute<H5NameAttribute>(true);
return attribute is not null ? attribute.Name : fieldInfo.Name;
};
// Use that name translator.
var compoundData = dataset.ReadCompound<NullableStructWithCustomFieldName>(converter);
6.3 Unknown structs
You have no idea what the struct in the H5 file looks like? Or it is so large that it is no fun to predefine it? In that case, you can fall back to the non-generic dataset.ReadCompound(), which returns a Dictionary<string, object?>[] where the dictionary values can be anything from simple value types to arrays or nested dictionaries (or even H5ObjectReference), depending on the kind of data in the file. Use the standard .NET dictionary methods to work with this kind of data.
The type mapping is as follows:
H5 type | .NET type |
---|---|
fixed point, 1 byte, unsigned | byte |
fixed point, 1 byte, signed | sbyte |
fixed point, 2 bytes, unsigned | ushort |
fixed point, 2 bytes, signed | short |
fixed point, 4 bytes, unsigned | uint |
fixed point, 4 bytes, signed | int |
fixed point, 8 bytes, unsigned | ulong |
fixed point, 8 bytes, signed | long |
floating point, 4 bytes | float |
floating point, 8 bytes | double |
string | string |
bitfield | byte[] |
opaque | byte[] |
compound | Dictionary<string, object?> |
reference | H5ObjectReference |
enumerated | <base type> |
variable length, type = string | string |
array | <base type>[] |
Unsupported data types like time and variable length type = sequence will be represented as null.
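Because the values are typed as object?, pattern matching is a convenient way to handle them. The sketch below uses a hand-built dictionary as a stand-in for one element returned by the non-generic ReadCompound(); the field names and values are invented for illustration, and the helper Describe is hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hand-built stand-in for one compound element (not read from a real file).
var record = new Dictionary<string, object?>
{
    ["id"] = 7,
    ["name"] = "sensor-a",
    ["samples"] = new float[] { 1.5f, 2.5f },
    ["position"] = new Dictionary<string, object?> { ["x"] = 1.0, ["y"] = 2.0 },
    ["timestamp"] = null // unsupported types are represented as null
};

// Dispatch on the runtime type of each value.
static string Describe(object? value) => value switch
{
    null => "null (unsupported type)",
    string s => $"string '{s}'",
    Array a => $"array of {a.Length} element(s)",
    Dictionary<string, object?> d => $"nested compound with {d.Count} field(s)",
    _ => $"{value.GetType().Name} {value}"
};

var lines = new List<string>();

foreach (var (name, value) in record)
    lines.Add($"{name}: {Describe(value)}");

Console.WriteLine(string.Join(Environment.NewLine, lines));
```

The same switch can be applied recursively to nested dictionaries when the compound type contains further compound members.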
7. Advanced Scenarios
7.1 Memory-Mapped File
In some cases, it might be useful to read data from a memory-mapped file instead of a regular FileStream to reduce the number of (costly) system calls. Depending on the file structure this may heavily increase random access performance. Here is an example:
using var mmf = MemoryMappedFile.CreateFromFile(
    fileStream,
    mapName: default,
    capacity: 0,
    MemoryMappedFileAccess.Read,
    HandleInheritability.None,
    leaveOpen: false); // the FileStream overload requires this argument

using var mmfStream = mmf.CreateViewStream(
    offset: 0,
    size: 0,
    MemoryMappedFileAccess.Read);

using var root = H5File.Open(mmfStream);
...
7.2 Reading Multidimensional Data
7.2.1 Generic Method
Sometimes you want to read the data as multidimensional arrays. In that case use one of the byte[] overloads like ToArray3D (there are overloads up to 6D). Here is an example:
var data3D = dataset
.Read<int>()
.ToArray3D(new long[] { -1, 7, 2 });
The method accepts a long[] with the new array dimensions. This feature works similarly to Matlab's reshape function. A slightly adapted citation explains the behavior:
When you use -1 to automatically calculate a dimension size, the dimensions that you do explicitly specify must divide evenly into the number of elements in the input array.
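The arithmetic behind the -1 placeholder can be sketched in a few lines of plain C#. This is an illustration of the rule quoted above, not the library's implementation; the helper ResolveDimensions is hypothetical and handles at most one -1 entry.

```csharp
using System;

// Replaces a single -1 entry with the size that makes the product of all
// dimensions equal to the number of elements.
static long[] ResolveDimensions(long elementCount, long[] dims)
{
    var resolved = (long[])dims.Clone();
    var inferredIndex = Array.IndexOf(resolved, -1L);

    // product of all explicitly specified dimensions
    long knownProduct = 1;

    for (var i = 0; i < resolved.Length; i++)
        if (i != inferredIndex)
            knownProduct *= resolved[i];

    if (inferredIndex >= 0)
    {
        if (knownProduct == 0 || elementCount % knownProduct != 0)
            throw new ArgumentException("The explicit dimensions must divide the element count evenly.");

        resolved[inferredIndex] = elementCount / knownProduct;
    }
    else if (knownProduct != elementCount)
    {
        throw new ArgumentException("The dimensions do not match the element count.");
    }

    return resolved;
}

// 42 elements reshaped to { -1, 7, 2 }: 42 / (7 * 2) = 3
var shape = ResolveDimensions(elementCount: 42, dims: new long[] { -1, 7, 2 });

Console.WriteLine(string.Join(" x ", shape)); // 3 x 7 x 2
```

With 40 elements instead of 42, the same call would throw because 40 is not divisible by 14.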
7.2.2 High-Performance Method (2D only)
The previously shown method (ToArrayXD) performs a copy operation. If you would like to avoid this, you might find the Span2D type interesting, which is part of CommunityToolkit.HighPerformance. To make use of it, run dotnet add package CommunityToolkit.HighPerformance and then use it like this:
using CommunityToolkit.HighPerformance;
var data2D = dataset
.Read<int>()
.AsSpan()
.AsSpan2D(height: 20, width: 10);
No data are being copied and you can work with the array similar to a normal Span<T>, i.e. you may want to slice through it.
8. Asynchronous Data Access (.NET 6+)
HDF5.NET supports reading data asynchronously to allow the CPU to work on other tasks while waiting for the result.
Note: All async methods shown below are only truly asynchronous if the FileStream is opened with the useAsync parameter set to true:
var h5File = H5File.Open(
filePath,
FileMode.Open,
FileAccess.Read,
FileShare.Read,
useAsync: true);
// alternative
var stream = new FileStream(..., useAsync: true);
var h5File = H5File.Open(stream);
Sample 1: Load data of two datasets
async Task LoadDataAsynchronously()
{
var data1Task = dataset1.ReadAsync<int>();
var data2Task = dataset2.ReadAsync<int>();
await Task.WhenAll(data1Task, data2Task);
}
Sample 2: Load data of two datasets and process it
async Task LoadAndProcessDataAsynchronously()
{
var processedData1Task = Task.Run(async () =>
{
var data1 = await dataset1.ReadAsync<int>();
ProcessData(data1);
});
var processedData2Task = Task.Run(async () =>
{
var data2 = await dataset2.ReadAsync<int>();
ProcessData(data2);
});
await Task.WhenAll(processedData1Task, processedData2Task);
}
Sample 3: Load data of a single dataset and process it
async Task LoadAndProcessDataAsynchronously()
{
var processedData1Task = Task.Run(async () =>
{
var fileSelection1 = new HyperslabSelection(start: 0, block: 50);
var data1 = await dataset1.ReadAsync<int>(fileSelection1);
ProcessData(data1);
});
var processedData2Task = Task.Run(async () =>
{
var fileSelection2 = new HyperslabSelection(start: 50, block: 50);
var data2 = await dataset1.ReadAsync<int>(fileSelection2);
ProcessData(data2);
});
await Task.WhenAll(processedData1Task, processedData2Task);
}
9. Comparison Table
The following table considers only projects listed on Nuget.org.
Name | Arch | Platform | Kind | Mode | Version | License | Maintainer | Comment |
---|---|---|---|---|---|---|---|---|
v1.10 | ||||||||
HDF5.NET | all | all | managed | ro | N/A | MIT | Apollo3zehn | version does not apply, standalone implementation |
HDF5-CSharp | x86,x64 | Win,Lin,Mac | HL | rw | 1.10.6 | MIT | LiorBanai | |
SciSharp.Keras.HDF5 | x86,x64 | Win,Lin,Mac | HL | rw | 1.10.5 | MIT | SciSharp | fork of HDF5-CSharp |
ILNumerics.IO.HDF5 | x64 | Win,Lin | HL | rw | ? | proprietary | IL_Numerics_GmbH | probably 1.10 |
LiteHDF | x86,x64 | Win,Lin,Mac | HL | ro | 1.10.5 | MIT | silkfire | |
hdflib | x86,x64 | Windows | HL | wo | 1.10.6 | MIT | bdebree | |
Mbc.Hdf5Utils | x86,x64 | Win,Lin,Mac | HL | rw | 1.10.6 | Apache-2.0 | bqstony | |
HDF.PInvoke | x86,x64 | Windows | bindings | rw | 1.8,1.10 | HDF5 | hdf,gheber | |
HDF.PInvoke.1.10 | x86,x64 | Win,Lin,Mac | bindings | rw | 1.10.6 | HDF5 | hdf,Apollo3zehn | |
HDF.PInvoke.NETStandard | x86,x64 | Win,Lin,Mac | bindings | rw | 1.10.5 | HDF5 | surban | |
v1.8 | ||||||||
HDF5DotNet.x64 | x64 | Windows | HL | rw | 1.8 | HDF5 | thieum | |
HDF5DotNet.x86 | x86 | Windows | HL | rw | 1.8 | HDF5 | thieum | |
sharpHDF | x64 | Windows | HL | rw | 1.8 | MIT | bengecko | |
HDF.PInvoke | x86,x64 | Windows | bindings | rw | 1.8,1.10 | HDF5 | hdf,gheber | |
hdf5-v120-complete | x86,x64 | Windows | native | rw | 1.8 | HDF5 | daniel.gracia | |
hdf5-v120 | x86,x64 | Windows | native | rw | 1.8 | HDF5 | keen | |
Abbreviations:
Term | .NET API | Native dependencies |
---|---|---|
managed | high-level | none |
HL | high-level | C-library |
bindings | low-level | C-library |
native | none | C-library |
Product | Versions Compatible and additional computed target framework versions. |
---|---|
.NET | net5.0 is compatible. net5.0-windows was computed. net6.0 is compatible. net6.0-android was computed. net6.0-ios was computed. net6.0-maccatalyst was computed. net6.0-macos was computed. net6.0-tvos was computed. net6.0-windows was computed. net7.0 was computed. net7.0-android was computed. net7.0-ios was computed. net7.0-maccatalyst was computed. net7.0-macos was computed. net7.0-tvos was computed. net7.0-windows was computed. net8.0 was computed. net8.0-android was computed. net8.0-browser was computed. net8.0-ios was computed. net8.0-maccatalyst was computed. net8.0-macos was computed. net8.0-tvos was computed. net8.0-windows was computed. |
.NET Core | netcoreapp2.0 was computed. netcoreapp2.1 was computed. netcoreapp2.2 was computed. netcoreapp3.0 was computed. netcoreapp3.1 was computed. |
.NET Standard | netstandard2.0 is compatible. netstandard2.1 is compatible. |
.NET Framework | net461 was computed. net462 was computed. net463 was computed. net47 was computed. net471 was computed. net472 was computed. net48 was computed. net481 was computed. |
MonoAndroid | monoandroid was computed. |
MonoMac | monomac was computed. |
MonoTouch | monotouch was computed. |
Tizen | tizen40 was computed. tizen60 was computed. |
Xamarin.iOS | xamarinios was computed. |
Xamarin.Mac | xamarinmac was computed. |
Xamarin.TVOS | xamarintvos was computed. |
Xamarin.WatchOS | xamarinwatchos was computed. |
- .NETStandard 2.0
  - System.Memory (>= 4.5.5)
  - System.Runtime.CompilerServices.Unsafe (>= 6.0.0)
  - System.Threading.Tasks.Extensions (>= 4.5.4)
- .NETStandard 2.1
  - System.Runtime.CompilerServices.Unsafe (>= 6.0.0)
- net5.0
  - No dependencies.
- net6.0
  - No dependencies.
Version | Downloads | Last updated | |
---|---|---|---|
1.0.0-alpha.22 | 1,402 | 1/20/2023 | |
1.0.0-alpha.21 | 201 | 1/6/2023 | |
1.0.0-alpha.20 | 166 | 1/3/2023 | |
1.0.0-alpha.19 | 154 | 1/2/2023 | |
1.0.0-alpha.18 | 303 | 12/13/2022 | |
1.0.0-alpha.16.final | 30,246 | 6/28/2022 | |
1.0.0-alpha.15.final | 215 | 6/21/2022 | |
1.0.0-alpha.13.final | 178 | 6/20/2022 | |
1.0.0-alpha.12.final | 8,419 | 12/17/2021 | |
1.0.0-alpha.11.final | 3,678 | 9/29/2021 | |
1.0.0-alpha.10.final | 5,396 | 5/7/2021 | |
1.0.0-alpha.9.final | 222 | 5/3/2021 | |
1.0.0-alpha.8.final | 330 | 4/9/2021 | |
1.0.0-alpha.7.final | 226 | 4/8/2021 | |
1.0.0-alpha.6.final | 213 | 4/1/2021 | |
1.0.0-alpha.5.final | 224 | 4/1/2021 | |
1.0.0-alpha.4.final | 281 | 3/27/2021 | |
1.0.0-alpha.3.final | 246 | 3/25/2021 | |
1.0.0-alpha.2.final | 213 | 3/25/2021 | |
1.0.0-alpha.1.final | 242 | 3/25/2021 |