FilePrepper 0.4.8
FilePrepper
A powerful .NET library and CLI tool for data preprocessing. Features a Pipeline API for efficient in-memory data transformations with 67-90% reduction in file I/O. Perfect for ML data preparation, ETL pipelines, and data analysis workflows.
🚀 Quick Start
SDK Installation
# Install FilePrepper SDK for programmatic use
dotnet add package FilePrepper
# Or install CLI tool globally
dotnet tool install -g fileprepper-cli
SDK Usage (Recommended)
using FilePrepper.Pipeline;
// CSV Processing: Only 2 file I/O operations (read + write)
await DataPipeline
.FromCsvAsync("data.csv")
.Normalize(columns: new[] { "Age", "Salary", "Score" },
method: NormalizationMethod.MinMax)
.FillMissing(columns: new[] { "Score" }, method: FillMethod.Mean)
.FilterRows(row => int.Parse(row["Age"]) >= 30)
.ToCsvAsync("output.csv");
// Multi-Format Support: Excel → Transform → JSON
await DataPipeline
.FromExcelAsync("sales.xlsx")
.AddColumn("Total", row =>
(double.Parse(row["Price"]) * double.Parse(row["Quantity"])).ToString())
.FilterRows(row => double.Parse(row["Total"]) >= 1000)
.ToJsonAsync("high_value_sales.json");
// Multi-File CSV Concatenation: Merge 33 files ⭐ NEW
await DataPipeline
.ConcatCsvAsync("kemp-*.csv", "dataset/")
.ParseKoreanTime("Time", "ParsedTime") // Korean time format ⭐ NEW
.ExtractDateFeatures("ParsedTime", DateFeatures.Hour | DateFeatures.Minute)
.ToCsvAsync("processed.csv");
CLI Usage
# Normalize numeric columns
fileprepper normalize-data --input data.csv --output normalized.csv \
--columns "Age,Salary,Score" --method MinMax
# Fill missing values
fileprepper fill-missing-values --input data.csv --output filled.csv \
--columns "Age,Salary" --method Mean
# Get help
fileprepper --help
fileprepper <command> --help
📦 Supported Formats
Process data in multiple formats:
- CSV (Comma-Separated Values)
- TSV (Tab-Separated Values)
- JSON (JavaScript Object Notation)
- XML (Extensible Markup Language)
- Excel (XLSX/XLS files)
🛠️ Available Commands (26+)
Data Transformation
- `normalize-data` - Normalize columns (MinMax, ZScore)
- `scale-data` - Scale numeric data (StandardScaler, MinMaxScaler, RobustScaler)
- `one-hot-encoding` - Convert categorical values to binary columns
- `data-type-convert` - Convert column data types
- `date-extraction` - Extract date features (Year, Month, Day, DayOfWeek)
- `datetime` - Parse datetime and extract features ⭐ Phase 2
- `string` - String transformations (upper, lower, trim, substring) ⭐ Phase 2
- `conditional` - Conditional column creation with if-then-else logic ⭐ Phase 2
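For reference, here is a minimal sketch of the two normalization methods named above, in plain C# with no FilePrepper dependency. The `MinMax` and `ZScore` helpers are illustrative names for this sketch, not library API:

```csharp
using System;
using System.Globalization;
using System.Linq;

class NormalizationSketch
{
    // Min-max: (x - min) / (max - min), scales values into [0, 1].
    static double[] MinMax(double[] xs)
    {
        double min = xs.Min(), max = xs.Max();
        return xs.Select(x => (x - min) / (max - min)).ToArray();
    }

    // Z-score: (x - mean) / stddev, centers on 0 with unit spread
    // (population standard deviation used here).
    static double[] ZScore(double[] xs)
    {
        double mean = xs.Average();
        double std = Math.Sqrt(xs.Select(x => (x - mean) * (x - mean)).Average());
        return xs.Select(x => (x - mean) / std).ToArray();
    }

    static void Main()
    {
        var ages = new[] { 20.0, 30.0, 40.0 };
        Console.WriteLine(string.Join(",",
            MinMax(ages).Select(v => v.ToString(CultureInfo.InvariantCulture))));
        // min-max of 20,30,40 -> 0,0.5,1
    }
}
```

This is what `normalize-data --method MinMax` conceptually applies per target column; the library's exact edge-case handling (e.g. constant columns where max == min) may differ.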
Data Cleaning
- `fill-missing-values` - Fill missing data (Mean, Median, Mode, Forward, Backward, Constant)
- `drop-duplicates` - Remove duplicate rows
- `value-replace` - Replace values in columns
Column Operations
- `add-columns` - Add new calculated columns
- `remove-columns` - Delete unwanted columns
- `rename-columns` - Rename column headers
- `reorder-columns` - Change column order
- `column-interaction` - Create interaction features
Data Analysis
- `basic-statistics` - Calculate statistics (Mean, Median, StdDev, ZScore)
- `aggregate` - Group and aggregate data
- `filter-rows` - Filter rows by conditions
- `merge-asof` - Time-series merge with tolerance ⭐ Phase 2
Data Organization
- `merge` - Combine multiple files (horizontal/vertical merge)
- `merge-asof` - Time-series merge with tolerance ⭐ Phase 2
- `data-sampling` - Sample rows (Random, Stratified, Systematic)
- `file-format-convert` - Convert between formats
- `unpivot` - Reshape data from wide to long format ⭐ Phase 2
Feature Engineering
- `create-lag-features` - Create time-series lag features
- `window` - Window operations (resample, rolling aggregations) ⭐ Phase 2
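To make the lag-feature idea concrete: a lag-k column gives each row the value from k rows earlier, leaving the first k rows empty. A stand-alone sketch in plain C# (no FilePrepper dependency; the `Lag` helper is illustrative, not library API):

```csharp
using System;
using System.Linq;

class LagSketch
{
    // Lag-k: row i receives the value from row i-k; the first k rows are empty.
    static string[] Lag(string[] values, int k) =>
        values.Select((_, i) => i >= k ? values[i - k] : "").ToArray();

    static void Main()
    {
        var sales = new[] { "100", "120", "90", "110" };
        Console.WriteLine(string.Join("|", Lag(sales, 1)));
        // lag-1 of 100,120,90,110 -> "",100,120,90
    }
}
```

Lagged columns like this are a common way to feed "previous period" context to ML models on time-series data.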
💡 Common Use Cases
Data Cleaning Pipeline (CLI)
# 1. Remove unnecessary columns
fileprepper remove-columns --input raw.csv --output step1.csv \
--columns "Debug,TempCol,Notes"
# 2. Drop duplicates
fileprepper drop-duplicates --input step1.csv --output step2.csv \
--columns "Email" --keep First
# 3. Fill missing values
fileprepper fill-missing-values --input step2.csv --output step3.csv \
--columns "Age,Salary" --method Mean
# 4. Normalize numeric columns
fileprepper normalize-data --input step3.csv --output clean.csv \
--columns "Age,Salary,Score" --method MinMax
Time-Series Processing (Phase 2) ⭐
# 5-minute window aggregation for sensor data
fileprepper window --input sensor_current.csv --output aggregated.csv \
  --type resample --method mean \
  --columns "RMS[A]" --time-column "Time_s[s]" \
  --window 5T --header
# Rolling window for smoothing
fileprepper window --input noisy_data.csv --output smoothed.csv \
  --type rolling --method mean \
  --columns temperature,humidity --window-size 3 \
  --suffix "_smooth" --header
ML Feature Engineering (SDK - Efficient!)
using FilePrepper.Pipeline;
// Single pipeline: Only 2 file I/O operations instead of 8!
await DataPipeline
.FromCsvAsync("orders.csv")
.AddColumn("Year", row => DateTime.Parse(row["OrderDate"]).Year.ToString())
.AddColumn("Month", row => DateTime.Parse(row["OrderDate"]).Month.ToString())
.Normalize(columns: new[] { "Revenue", "Quantity" },
method: NormalizationMethod.MinMax)
.FilterRows(row => int.Parse(row["Year"]) >= 2023)
.ToCsvAsync("features.csv");
// 67-90% reduction in file I/O compared to CLI approach!
Format Conversion
# CSV to JSON
fileprepper file-format-convert --input data.csv --output data.json --format JSON
# Excel to CSV
fileprepper file-format-convert --input report.xlsx --output report.csv --format CSV
# CSV to XML
fileprepper file-format-convert --input data.csv --output data.xml --format XML
Data Analysis
# Calculate statistics
fileprepper basic-statistics --input data.csv --output stats.csv \
--columns "Age,Salary,Score" --statistics Mean,Median,StdDev,ZScore
# Aggregate by group
fileprepper aggregate --input sales.csv --output summary.csv \
--group-by "Region,Category" --agg-columns "Revenue:Sum,Quantity:Mean"
# Sample data
fileprepper data-sampling --input large.csv --output sample.csv \
--method Random --sample-size 1000
🔧 Programmatic Usage (SDK)
FilePrepper provides a powerful SDK with Pipeline API for efficient data processing:
dotnet add package FilePrepper
✨ Pipeline API (Recommended)
Benefits: 67-90% reduction in file I/O, fluent API, in-memory processing
using FilePrepper.Pipeline;
using FilePrepper.Tasks.NormalizeData;
// Efficient: Only 2 file I/O operations (read + write)
await DataPipeline
.FromCsvAsync("data.csv")
.Normalize(columns: new[] { "Age", "Salary", "Score" },
method: NormalizationMethod.MinMax)
.FillMissing(columns: new[] { "Score" }, method: FillMethod.Mean)
.FilterRows(row => int.Parse(row["Age"]) >= 30)
.AddColumn("ProcessedDate", _ => DateTime.Now.ToString())
.ToCsvAsync("output.csv");
// Or work in-memory without any file I/O
var result = DataPipeline
.FromData(inMemoryData)
.Normalize(columns: new[] { "Age", "Salary" },
method: NormalizationMethod.MinMax)
.ToDataFrame(); // Get immutable snapshot
Advanced Pipeline Features
// Chain multiple transformations
var pipeline = await DataPipeline
.FromCsvAsync("sales.csv")
.RemoveColumns(new[] { "Debug", "TempCol" })
.RenameColumn("OldName", "NewName")
.AddColumn("Total", row =>
(double.Parse(row["Price"]) * double.Parse(row["Quantity"])).ToString())
.FilterRows(row => double.Parse(row["Total"]) > 100)
.Normalize(columns: new[] { "Total" }, method: NormalizationMethod.MinMax);
// Get intermediate results without file I/O
var dataFrame = pipeline.ToDataFrame();
Console.WriteLine($"Processed {dataFrame.RowCount} rows");
// Continue processing
await pipeline
.AddColumn("ProcessedAt", _ => DateTime.UtcNow.ToString("o"))
.ToCsvAsync("output.csv");
In-Memory Processing
// Work entirely in memory - zero file I/O
var data = new List<Dictionary<string, string>>
{
new() { ["Name"] = "Alice", ["Age"] = "25", ["Salary"] = "50000" },
new() { ["Name"] = "Bob", ["Age"] = "30", ["Salary"] = "60000" }
};
var result = DataPipeline
.FromData(data)
.Normalize(columns: new[] { "Age", "Salary" },
method: NormalizationMethod.MinMax)
.AddColumn("Category", row =>
int.Parse(row["Age"]) < 30 ? "Junior" : "Senior")
.ToDataFrame();
// Access results directly
foreach (var row in result.Rows)
{
Console.WriteLine($"{row["Name"]}: {row["Category"]}");
}
Traditional Task API
using FilePrepper.Tasks.NormalizeData;
using Microsoft.Extensions.Logging;
var options = new NormalizeDataOption
{
InputPath = "data.csv",
OutputPath = "normalized.csv",
TargetColumns = new[] { "Age", "Salary", "Score" },
Method = NormalizationMethod.MinMax
};
// Any ILogger works; NullLogger keeps the example self-contained
var logger = Microsoft.Extensions.Logging.Abstractions.NullLogger.Instance;
var task = new NormalizeDataTask(logger);
var context = new TaskContext(options);
bool success = await task.ExecuteAsync(context);
See SDK Usage Guide for comprehensive examples and best practices.
📖 Documentation
Getting Started
- Quick Start Guide - Get started in 5 minutes
- CLI Guide - Complete command reference
- Installation Guide - Detailed installation
SDK & Programming
- API Reference - Pipeline API and Task API reference
- Quick Start Guide - Get started with SDK in 5 minutes
Advanced Features
- Phase 2 Complete Guide - Window operations, datetime, string, conditional features ⭐
- Common Scenarios - Real-world use cases
For more documentation, see the docs/ directory.
🎯 Use Cases
- Machine Learning - Prepare datasets for training (normalization, encoding, feature engineering)
- Time-Series Analysis - Window aggregations, resampling, lag features ⭐ Phase 2
- Data Analysis - Clean and transform data for analysis
- ETL Pipelines - Extract, transform, and load data workflows with minimal I/O overhead
- Data Migration - Convert between formats and clean legacy data
- Automation - Script data processing with SDK or CLI
- In-Memory Processing - Chain transformations without file I/O costs
📋 Requirements
- .NET 9.0 or later
- Cross-platform - Windows, Linux, macOS
- Flexible Usage - CLI tool (no coding) or SDK (programmatic)
🤝 Contributing
Contributions are welcome! Please feel free to submit issues, feature requests, or pull requests.
📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
🔗 Links
- SDK NuGet Package: https://www.nuget.org/packages/FilePrepper
- CLI NuGet Package: https://www.nuget.org/packages/fileprepper-cli
- GitHub Repository: https://github.com/iyulab/FilePrepper
- Issues: https://github.com/iyulab/FilePrepper/issues
- Documentation: docs/
- Changelog: CHANGELOG.md
Made with ❤️ by iyulab | Efficient Data Preprocessing - CLI & SDK | Phase 2 Complete ⭐
| Product | Compatible and additional computed target frameworks |
|---|---|
| .NET | net10.0 is compatible. net10.0-android, net10.0-browser, net10.0-ios, net10.0-maccatalyst, net10.0-macos, net10.0-tvos, and net10.0-windows were computed. |
Dependencies (net10.0):
- CsvHelper (>= 33.1.0)
- EPPlus (>= 8.2.1)
- ExcelDataReader (>= 3.8.0)
- ExcelDataReader.DataSet (>= 3.8.0)
- Microsoft.Extensions.DependencyInjection.Abstractions (>= 10.0.0)
- Microsoft.Extensions.Logging.Abstractions (>= 10.0.0)
- Microsoft.Extensions.Options (>= 10.0.0)
- Scrutor (>= 6.1.0)