| Title: | Read Near-Infrared Spectroscopy and Spectral File Formats |
|---|---|
| Description: | Thin R binding for the Rust-first 'nirs4all-formats' near-infrared spectroscopy (NIRS) file-loading core. When installed via R CMD INSTALL with Cargo available, the package compiles a native 'extendr' static library from 'src/rust/' and dispatches probe, read, and walk calls directly through Rust. Without Cargo it falls back to invoking the 'nirs4all-formats' command-line interface. This is the complete build: it ships every reader, including the optional large ones (HDF5/netCDF, Parquet/Arrow, MATLAB) on top of the core readers (JCAMP-DX, Galactic SPC, Bruker OPUS, ASD, ENVI, CSV, Excel, and many vendor ASCII/binary formats). A smaller sibling package 'nirs4allformats.lite' drops only the Parquet/Arrow reader for size-sensitive installs. |
| Authors: | Gregory Beurier [aut, cre] |
| Maintainer: | Gregory Beurier <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.1.0 |
| Built: | 2026-06-12 11:45:34 UTC |
| Source: | https://github.com/GBeurier/nirs4all-formats |
as.data.frame() method for nirs4allformats_dataset
objects. Returns a wide table whose first column is sample_id, followed by
any target columns, followed by one spectral column per wavelength. Spectral
columns are named x_<wavelength> (the axis value formatted without
scientific notation).
## S3 method for class 'nirs4allformats_dataset'
as.data.frame(x, row.names = NULL, optional = FALSE, ...)
x |
An nirs4allformats_dataset object. |
row.names |
Ignored; present for S3 method signature compatibility. |
optional |
Ignored; present for S3 method signature compatibility. |
... |
Ignored; present for S3 method consistency. |
A data.frame with columns sample_id, the target columns (if any),
and x_<wavelength> spectral columns.
nirs4allformats_open_dataset(), as.matrix.nirs4allformats_dataset(),
nirs4allformats_as_tibble().
## Not run:
ds <- nirs4allformats_open_dataset("samples/csv_tsv/synthetic_nirs.csv")
df <- as.data.frame(ds)
names(df)[1:5]
## End(Not run)
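The column layout described for this method can be illustrated with base R alone. The following is a minimal sketch on a mock object, not the package's implementation: the helper name wide_from_mock and all mock values are hypothetical, and formatting each wavelength with format(scientific = FALSE) is only an assumption about how the x_<wavelength> names are produced.

```r
# Hypothetical stand-in for a dataset: two samples, three wavelengths.
mock <- list(
  x = matrix(c(0.10, 0.20, 0.30, 0.40, 0.50, 0.60), nrow = 2, byrow = TRUE),
  wavelengths = c(1100, 1100.5, 1e5),
  sample_ids = c("s1", "s2"),
  targets = data.frame(protein = c(12.5, 11.0))
)

wide_from_mock <- function(ds) {
  # Format each axis value on its own so no shared padding or exponent appears.
  labels <- vapply(
    ds$wavelengths,
    function(w) format(w, scientific = FALSE, trim = TRUE),
    character(1)
  )
  spectra <- as.data.frame(ds$x)
  names(spectra) <- paste0("x_", labels)
  # Column order: sample_id, then targets, then spectral columns.
  cbind(
    data.frame(sample_id = ds$sample_ids, stringsAsFactors = FALSE),
    ds$targets,
    spectra
  )
}

wide <- wide_from_mock(mock)
names(wide)
```

Formatting the values one at a time avoids the common-width padding that format() applies when it receives a whole vector.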
as.matrix() method for nirs4allformats_dataset
objects. Returns the stored n_samples x n_wavelengths numeric matrix of
spectra. Rows correspond to x$sample_ids and columns to x$wavelengths.
## S3 method for class 'nirs4allformats_dataset'
as.matrix(x, ...)
x |
An nirs4allformats_dataset object. |
... |
Ignored; present for S3 method consistency. |
A numeric matrix with one row per sample and one column per wavelength.
nirs4allformats_open_dataset(), as.data.frame.nirs4allformats_dataset().
## Not run:
ds <- nirs4allformats_open_dataset("samples/csv_tsv/synthetic_nirs.csv")
m <- as.matrix(ds)
dim(m)
## End(Not run)
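Since rows correspond to sample_ids and columns to wavelengths, the matrix can be labelled for indexing by name. This is a small sketch on a hypothetical mock object; whether the matrix returned by as.matrix() already carries dimnames is not stated here, so the labelling step is shown explicitly.

```r
# Hypothetical mock with the same field names as a dataset object.
mock <- list(
  x = matrix(1:6 / 10, nrow = 2),
  sample_ids = c("s1", "s2"),
  wavelengths = c(1100, 1102, 1104)
)

m <- mock$x
# Rows follow sample_ids, columns follow wavelengths.
dimnames(m) <- list(mock$sample_ids, as.character(mock$wavelengths))

# Look up one absorbance-like value by sample and wavelength label.
m["s2", "1102"]
```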
Convenience wrapper that converts an
nirs4allformats_dataset to a
tibble via as.data.frame.nirs4allformats_dataset(). The
optional tibble package must be installed.
nirs4allformats_as_tibble(dataset)
dataset |
An nirs4allformats_dataset object. |
A tibble::tibble with the same columns as
as.data.frame.nirs4allformats_dataset() (sample_id, target columns,
x_<wavelength> spectral columns).
nirs4allformats_open_dataset(), as.data.frame.nirs4allformats_dataset().
## Not run:
ds <- nirs4allformats_open_dataset("samples/csv_tsv/synthetic_nirs.csv")
nirs4allformats_as_tibble(ds)
## End(Not run)
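Because tibble is an optional dependency, calling code may want to check for it first. The sketch below uses a hypothetical helper, as_tibble_if_available, which falls back to the plain data.frame when tibble is missing; that fallback is a choice made for this illustration and not the behaviour of nirs4allformats_as_tibble() itself.

```r
# Hypothetical helper: convert when tibble is installed, otherwise pass through.
as_tibble_if_available <- function(df) {
  if (requireNamespace("tibble", quietly = TRUE)) {
    tibble::as_tibble(df)
  } else {
    df
  }
}

# Mock wide table with the documented column naming.
df <- data.frame(sample_id = c("s1", "s2"), x_1100 = c(0.1, 0.2))
out <- as_tibble_if_available(df)
class(out)
```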
Returns TRUE when the compiled extendr static library is registered in the
running R session, i.e. the package was installed with Cargo on PATH and
the Rust core is callable directly. When FALSE, filesystem reads fall back
to the nirs4all-formats CLI, and the in-memory paths
(nirs4allformats_open_bytes(), nirs4allformats_open_with_sidecars()) are
unavailable.
nirs4allformats_native_available()
A length-one logical: TRUE if the native backend is loaded,
otherwise FALSE.
nirs4allformats_open_bytes(), nirs4allformats_open_with_sidecars().
## Not run:
if (nirs4allformats_native_available()) {
  message("native extendr backend active")
} else {
  message("using nirs4all-formats CLI fallback")
}
## End(Not run)
Decodes an in-memory byte buffer through the Rust registry and returns the
normalized records as nested R lists, without touching the filesystem. The
name drives extension-based sniffing and provenance. This path requires the
native extendr static library; it is unavailable through the CLI fallback and
raises an error when the native library is absent.
Formats that need companion files (sidecars) are rejected here; use
nirs4allformats_open_with_sidecars() for those.
nirs4allformats_open_bytes(name, bytes)
name |
Character scalar. Logical file name (with extension) used for
format sniffing and recorded in provenance, e.g. |
bytes |
A |
A list of records, identical in shape to nirs4allformats_open_records().
nirs4allformats_open_with_sidecars(), nirs4allformats_open_records(),
nirs4allformats_native_available().
## Not run:
bytes <- readBin("spectrum.dx", what = "raw", n = file.info("spectrum.dx")$size)
records <- nirs4allformats_open_bytes("spectrum.dx", bytes)
## End(Not run)
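Preparing the byte buffer needs only base R. The following sketch writes a throwaway file and reads it back as a raw vector; the file content is a placeholder for illustration and is not claimed to be a valid JCAMP-DX spectrum. The final call is left as a comment because it requires the native backend.

```r
# Write a placeholder file (content is illustrative only).
path <- tempfile(fileext = ".dx")
writeLines(c("##TITLE=demo", "##END="), path)

# Read the whole file into a raw vector, sized from the file itself.
bytes <- readBin(path, what = "raw", n = file.size(path))

is.raw(bytes)
length(bytes) == file.size(path)

# With the native backend available, the buffer could then be handed over:
# records <- nirs4allformats_open_bytes(basename(path), bytes)
```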
Loads a file with nirs4allformats_open_records() and projects one signal per
record into a rectangular, R-friendly dataset: a samples-by-wavelengths
matrix plus sample IDs, targets and metadata. All records must share the same
spectral axis (an error is raised otherwise), so this is intended for a
homogeneous set of spectra. For heterogeneous or N-dimensional data, work
from nirs4allformats_open_records() directly.
Parsing happens only in Rust; the R layer just selects a signal and reshapes the JSON the core returns.
nirs4allformats_open_dataset(path, signal = NULL)
path |
Character scalar. Path to the input file (resolved with
|
signal |
Optional character scalar naming the signal channel to project.
When |
An object of class nirs4allformats_dataset: a named list with
- x: Numeric matrix of spectra, n_samples x n_wavelengths.
- wavelengths: Numeric vector of axis coordinates (length n_wavelengths).
- targets: data.frame of reference values, one column per target key (zero columns when none were parsed).
- sample_ids: Character vector of per-row identifiers.
- metadata: List of per-record metadata lists.
- signal_type: Signal type of the selected channel.
- axis_unit: Unit string of the spectral axis (e.g. "nm").
- formats: Character vector of the source format per row.
Use as.matrix() / as.data.frame() / nirs4allformats_as_tibble() to project
it into common R shapes.
When signal is NULL the channel is chosen per record in this order:
1. the first signal whose signal_type equals the record-level signal_type;
2. otherwise the first present of "reflectance", "absorbance", "transmittance", "signal";
3. otherwise the alphabetically first signal name.
Passing an explicit signal name selects that channel and errors if a record
lacks it.
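The selection order can be sketched in plain R on mock records. This is an illustration of the rules as described, not the package's code: pick_signal is a hypothetical helper, and it assumes that the preferred names in the second rule are matched against channel names (rather than channel types) and that "alphabetically first" means byte-wise ordering.

```r
# Hypothetical helper mirroring the documented selection order.
pick_signal <- function(record, signal = NULL) {
  available <- names(record$signals)
  if (!is.null(signal)) {
    # An explicit name must exist in the record.
    if (!signal %in% available) {
      stop("record has no signal named '", signal, "'")
    }
    return(signal)
  }
  # Rule 1: first channel whose type equals the record-level type.
  channel_types <- vapply(
    record$signals,
    function(s) if (is.null(s$signal_type)) NA_character_ else s$signal_type,
    character(1)
  )
  matching <- which(channel_types == record$signal_type)
  if (length(matching) > 0) return(available[[matching[[1]]]])
  # Rule 2: first preferred name that is present (assumed to match names).
  preferred <- c("reflectance", "absorbance", "transmittance", "signal")
  present <- preferred[preferred %in% available]
  if (length(present) > 0) return(present[[1]])
  # Rule 3: alphabetically first name (byte-wise, locale independent).
  sort(available, method = "radix")[[1]]
}

rec_a <- list(
  signal_type = "absorbance",
  signals = list(ch1 = list(signal_type = "reflectance"),
                 ch2 = list(signal_type = "absorbance"))
)
rec_b <- list(
  signal_type = "unknown",
  signals = list(transmittance = list(signal_type = "t"),
                 reflectance = list(signal_type = "r"))
)
rec_c <- list(
  signal_type = "unknown",
  signals = list(zeta = list(signal_type = "a"),
                 alpha = list(signal_type = "b"))
)

c(pick_signal(rec_a), pick_signal(rec_b), pick_signal(rec_c))
```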
Each row's identifier is taken from metadata$sample_id when present;
otherwise it is derived from the source file basename and 0-based row index
("<basename>:<i>"), falling back to "record:<i>" when no source path is
known.
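The identifier fallback chain can be written as a short function. The helper name derive_sample_id and its arguments are hypothetical; the sketch only restates the rule above on plain R values.

```r
# Hypothetical helper: metadata$sample_id, else "<basename>:<i>", else "record:<i>".
derive_sample_id <- function(metadata, source_path, index) {
  if (!is.null(metadata$sample_id)) {
    return(as.character(metadata$sample_id))
  }
  if (!is.null(source_path) && nzchar(source_path)) {
    return(paste0(basename(source_path), ":", index))
  }
  paste0("record:", index)
}

# index is the 0-based row index, as in the description above.
derive_sample_id(list(sample_id = "S1"), "data/run.csv", 0)
derive_sample_id(list(), "data/run.csv", 2)
derive_sample_id(list(), NULL, 0)
```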
Reference values found under each record's targets are gathered into a
data.frame (missing values become NA). The full per-record metadata
lists are preserved verbatim in the metadata field of the returned object.
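Gathering per-record targets into one table with NA for missing keys might look like the sketch below. gather_targets is a hypothetical helper working on mock records, and it assumes every target is a single numeric value; the actual package may handle other types.

```r
# Hypothetical helper: one column per target key, NA where a record lacks it.
gather_targets <- function(records) {
  keys <- unique(unlist(lapply(records, function(r) names(r$targets))))
  if (length(keys) == 0) {
    # No targets anywhere: zero columns, one row per record.
    return(data.frame(row.names = seq_along(records)))
  }
  cols <- lapply(keys, function(k) {
    vapply(records, function(r) {
      v <- r$targets[[k]]
      if (is.null(v)) NA_real_ else as.numeric(v)
    }, numeric(1))
  })
  names(cols) <- keys
  data.frame(cols, check.names = FALSE)
}

records <- list(
  list(targets = list(protein = 12.5)),
  list(targets = list(protein = 11.0, moisture = 8.2))
)
targets <- gather_targets(records)
targets
```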
The call is served by the native extendr static library when it is present
(built by R CMD INSTALL with Cargo on PATH). Otherwise it shells out to
the nirs4all-formats CLI: the NIRS4ALL_FORMATS_CLI environment variable may point
to a prebuilt binary (it is whitespace-split into command + arguments), a
nirs4all-formats binary on PATH is used if found, and in a source checkout it
falls back to cargo run -p nirs4all-formats-cli.
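The CLI lookup can be sketched as a pure function of its inputs. resolve_cli is a hypothetical helper; the order used here (environment variable, then PATH, then cargo run) follows the sentence above as read, and the exact cargo arguments beyond `run -p nirs4all-formats-cli` are not known, so none are added.

```r
# Hypothetical helper: decide which command would serve the CLI fallback.
resolve_cli <- function(env_value = Sys.getenv("NIRS4ALL_FORMATS_CLI"),
                        on_path = Sys.which("nirs4all-formats")) {
  env_value <- trimws(env_value)
  if (nzchar(env_value)) {
    # The variable is whitespace-split into command + arguments.
    parts <- strsplit(env_value, "[[:space:]]+")[[1]]
    return(list(command = parts[[1]], args = parts[-1]))
  }
  if (nzchar(on_path)) {
    return(list(command = unname(on_path), args = character(0)))
  }
  # Source checkout fallback.
  list(command = "cargo", args = c("run", "-p", "nirs4all-formats-cli"))
}

resolve_cli("/opt/bin/nirs4all-formats --quiet", "")
resolve_cli("", "/usr/local/bin/nirs4all-formats")
resolve_cli("", "")
```

The `--quiet` flag and both paths in the usage lines are made-up inputs for the demonstration, not documented options.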
nirs4allformats_open_records() for the lossless record view,
as.matrix.nirs4allformats_dataset(), as.data.frame.nirs4allformats_dataset(),
nirs4allformats_as_tibble().
## Not run:
ds <- nirs4allformats_open_dataset("samples/csv_tsv/synthetic_nirs.csv")
dim(as.matrix(ds))
head(as.data.frame(ds))
ds$wavelengths[1:5]

# Select a specific channel by name
ds_abs <- nirs4allformats_open_dataset("spectrum.dx", signal = "absorbance")
## End(Not run)
Decodes a single file through the Rust nirs4all-formats registry and returns the
normalized records exactly as the core emits them, as nested R lists. Format
detection is content-based: the file is sniffed and dispatched to the
highest-confidence reader. No reshaping, alignment or column-building is done
here – this is the faithful, lossless view of the Rust SpectralRecord
model. For a flat spectral matrix use nirs4allformats_open_dataset().
Parser logic lives only in Rust; this function never parses bytes itself. It
dispatches through the native extendr library when available and otherwise
through the nirs4all-formats CLI (see Transport in nirs4allformats_open_dataset()).
nirs4allformats_open_records(path)
path |
Character scalar. Path to the input file. It is resolved with
|
A list of records. Each record is a named list mirroring the Rust
SpectralRecord:
- signals: Named list of signal channels. Each channel carries values (flat C-order buffer), shape, dims (exactly one is "x"), optional coords, signal_type, unit, role, source and an axis (values, unit, kind, order).
- signal_type: Record-level signal type (e.g. "absorbance", "reflectance", "unknown").
- targets: Named list of reference values parsed from the file.
- metadata: Named list of typed metadata key/value pairs.
- provenance: Reader name/version, per-source SHA-256 (sources), format, record_schema_version and warnings.
- quality_flags: Character vector of quality annotations.
nirs4allformats_open_dataset() for a flat matrix view,
nirs4allformats_probe_path() to inspect candidate readers,
nirs4allformats_walk_path() to scan a directory.
## Not run:
records <- nirs4allformats_open_records("samples/csv_tsv/synthetic_nirs.csv")
length(records)
records[[1]]$provenance$format
names(records[[1]]$signals)
## End(Not run)
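Because each channel's values are a flat C-order (row-major) buffer while R arrays are column-major, rebuilding an array from values and shape needs a transpose step. The sketch below uses a hypothetical helper, c_order_to_array, on made-up numbers; how values and shape arrive in R (atomic vectors or lists) is an assumption, so both are passed through unlist().

```r
# Hypothetical helper: rebuild an R array from a row-major buffer and its shape.
c_order_to_array <- function(values, shape) {
  shape <- as.integer(unlist(shape))
  values <- unlist(values)
  stopifnot(length(values) == prod(shape))
  if (length(shape) == 1) {
    return(array(values, dim = shape))
  }
  # Fill with reversed dims (column-major), then reverse the axis order.
  aperm(array(values, dim = rev(shape)))
}

# A 2 x 3 signal stored row by row.
m <- c_order_to_array(1:6, c(2, 3))
m[1, ]

# The same idea for a 2 x 3 x 4 cube.
a <- c_order_to_array(1:24, c(2, 3, 4))
dim(a)
```

Passing the buffer straight to matrix() or array() without this step would silently interleave the values along the wrong axis.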
Decodes an in-memory primary buffer together with a named map of companion
files (sidecars) through the Rust registry, returning normalized records as
nested R lists. This serves formats that split a measurement across multiple
files, such as ENVI Standard cubes (.img + .hdr) or ERDAS LAN. Sidecar
names are interpreted as paths relative to the primary file.
This path requires the native extendr static library and raises an error when it is absent; it has no CLI fallback.
nirs4allformats_open_with_sidecars(name, bytes, sidecars = list())
name |
Character scalar. Logical file name of the primary file (with
extension), used for sniffing and provenance, e.g. |
bytes |
A |
sidecars |
A named list of |
A list of records, identical in shape to nirs4allformats_open_records().
nirs4allformats_open_bytes(), nirs4allformats_open_records(),
nirs4allformats_native_available().
## Not run:
read_raw <- function(p) readBin(p, "raw", n = file.info(p)$size)
records <- nirs4allformats_open_with_sidecars(
  "cube.img",
  read_raw("cube.img"),
  list("cube.hdr" = read_raw("cube.hdr"))
)
## End(Not run)
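Assembling the named sidecar map from a folder can be done in base R. In this sketch the files are throwaway placeholders (not a valid ENVI cube), and treating every other file in the folder as a sidecar named by its basename is an assumption made for the illustration. The final call is left as a comment because it requires the native backend.

```r
# Create a scratch folder with a placeholder primary file and header.
dir <- tempfile("cube_")
dir.create(dir)
writeBin(as.raw(0:7), file.path(dir, "cube.img"))
writeLines("ENVI", file.path(dir, "cube.hdr"))

read_raw <- function(p) readBin(p, "raw", n = file.size(p))

primary_name <- "cube.img"
primary_bytes <- read_raw(file.path(dir, primary_name))

# Every other file becomes a sidecar, keyed by its name relative to the primary.
sidecar_names <- setdiff(list.files(dir), primary_name)
sidecars <- lapply(file.path(dir, sidecar_names), read_raw)
names(sidecars) <- sidecar_names

names(sidecars)

# With the native backend available:
# records <- nirs4allformats_open_with_sidecars(primary_name, primary_bytes, sidecars)
```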
Sniffs a file (reading only its head, not a full parse) and returns the ordered list of readers that recognize it, highest confidence first. Useful for diagnosing format detection without decoding the whole file.
Sniffing is performed entirely in Rust. The native extendr library is used
when present; otherwise the nirs4all-formats probe CLI command is invoked (see
Transport in nirs4allformats_open_dataset()).
nirs4allformats_probe_path(path)
path |
Character scalar. Path to the file to probe (resolved with
|
A list of candidate descriptors. Each entry includes at least a
format name and a confidence indication, ordered from most to least
confident. The list is empty when no reader recognizes the file.
nirs4allformats_open_records(), nirs4allformats_walk_path().
## Not run:
probes <- nirs4allformats_probe_path("samples/csv_tsv/synthetic_nirs.csv")
probes[[1]]$format
## End(Not run)
Returns the version of the nirs4allformats R binding as a character scalar.
This is the binding's own version and is independent of the underlying Rust
nirs4all-formats core version.
nirs4allformats_version()
A length-one character vector with the binding version.
nirs4allformats_native_available().
nirs4allformats_version()
Recursively visits a directory (or a single file) and reports the detection outcome for each visited file: whether it parsed, errored, or is unsupported, together with its detected format. Only sniffing and walking happen here; no file is fully decoded.
The walk runs in Rust. The native extendr library is used when present;
otherwise the nirs4all-formats scan --json CLI command is invoked and its
entries are returned (see Transport in nirs4allformats_open_dataset()).
nirs4allformats_walk_path(
  path,
  max_depth = NULL,
  include_hidden = FALSE,
  follow_symlinks = FALSE,
  include_unsupported = FALSE
)
path |
Character scalar. Directory or file to scan (resolved with
|
max_depth |
Optional integer. Maximum recursion depth; |
include_hidden |
Logical. Include hidden files/directories. Defaults to FALSE. |
follow_symlinks |
Logical. Follow symbolic links during the walk. Defaults to FALSE. |
include_unsupported |
Logical. Include entries for files no reader recognizes. Defaults to FALSE. |
A list of per-file outcome entries. Each entry includes at least a
status (e.g. "parsed", "error", "unsupported") and, when detected,
a format.
nirs4allformats_probe_path(), nirs4allformats_open_records().
## Not run:
entries <- nirs4allformats_walk_path("samples/asd")
length(entries)
entries[[1]]$status
entries[[1]]$format

# Limit recursion depth and include unsupported files
nirs4allformats_walk_path("samples", max_depth = 1, include_unsupported = TRUE)
## End(Not run)
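The returned list of entries can be flattened into a small table for review. The sketch below works on mock entries: the status and format fields follow the description above, while the path field and the specific format strings are hypothetical placeholders.

```r
# Mock walk result; `path` and the format labels are illustrative only.
entries <- list(
  list(path = "a.asd", status = "parsed", format = "asd"),
  list(path = "b.txt", status = "unsupported"),
  list(path = "c.spc", status = "error", format = "spc")
)

# Flatten to a data.frame, using NA where no format was detected.
summary_df <- data.frame(
  status = vapply(entries, function(e) e$status, character(1)),
  format = vapply(
    entries,
    function(e) if (is.null(e$format)) NA_character_ else e$format,
    character(1)
  ),
  stringsAsFactors = FALSE
)

# Count outcomes per status.
status_counts <- table(summary_df$status)
status_counts
```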