Features¶

API Documentation

siapy.features

The features module provides automated feature engineering and selection capabilities specifically designed for spectral data analysis.

Spectral Indices¶

API Documentation

siapy.features.spectral_indices

Spectral indices are mathematical combinations of spectral bands that highlight specific characteristics of materials or conditions. The module provides functions to discover available indices and compute them from spectral data.
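
To make "mathematical combination of bands" concrete, here is a minimal sketch that computes two well-known normalized-difference indices by hand with NumPy. It does not use siapy. The reflectance values are made-up illustrative numbers, and the variable names are chosen for this example only.

```python
import numpy as np

# Reflectance for three pixels (illustrative values, not real measurements)
red = np.array([0.10, 0.20, 0.30])
green = np.array([0.15, 0.25, 0.20])
nir = np.array([0.50, 0.60, 0.30])

# Normalized Difference Vegetation Index: (N - R) / (N + R)
ndvi = (nir - red) / (nir + red)

# Green NDVI: (N - G) / (N + G)
gndvi = (nir - green) / (nir + green)

print("NDVI:", np.round(ndvi, 3))
print("GNDVI:", np.round(gndvi, 3))
```

Both indices are bounded to [-1, 1] for non-negative reflectance, which is one reason normalized differences are popular as model inputs.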

Getting available indices¶

The get_spectral_indices() function returns all spectral indices that can be computed from the available bands:

from siapy.features.spectral_indices import get_spectral_indices

# Get indices computable from Red and Green bands
bands = ["R", "G"]
available_indices = get_spectral_indices(bands)
print(f"Found {len(available_indices)} indices")

# Display the short name and long name of each available index
for name, index in available_indices.items():
    print(f"{name}: {index.long_name}")
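
The idea of an index being "computable from the available bands" can be pictured as a subset check: an index qualifies when every band in its formula is present. The sketch below illustrates that rule under this assumption. The three-entry catalogue and the helper `computable_indices` are hypothetical and are not part of siapy, whose real catalogue is far larger.

```python
# Hypothetical mini-catalogue: index name -> bands its formula requires.
# For illustration only; this is not the catalogue siapy uses.
REQUIRED_BANDS = {
    "NDVI": {"N", "R"},
    "GNDVI": {"N", "G"},
    "NGRDI": {"G", "R"},
}


def computable_indices(available_bands):
    """Return the names of indices whose required bands are all available."""
    available = set(available_bands)
    return sorted(name for name, needed in REQUIRED_BANDS.items() if needed <= available)


print(computable_indices(["R", "G"]))       # ['NGRDI']
print(computable_indices(["R", "G", "N"]))  # ['GNDVI', 'NDVI', 'NGRDI']
```

Adding a band can only grow the result, which is why supplying every band you have is a reasonable default when exploring candidates.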

Computing spectral indices¶

The compute_spectral_indices() function calculates spectral indices from band values stored in a pandas DataFrame, one column per band:

import numpy as np
import pandas as pd

from siapy.features.spectral_indices import compute_spectral_indices

# Create sample spectral data
np.random.seed(42)
data = pd.DataFrame(
    {
        "R": np.random.random(100),
        "G": np.random.random(100),
    }
)

indices_df = compute_spectral_indices(
    data=data,
    spectral_indices=["BIXS", "RI"],  # Indices to compute
)
print(f"Computed indices:\n{indices_df.head()}")

Band mapping¶

When your data uses non-standard column names, use the bands_map parameter:

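import numpy as np
import pandas as pd

from siapy.features.spectral_indices import compute_spectral_indices

# Data with custom column names
custom_data = pd.DataFrame(
    {
        "red_band": np.random.random(100),
        "green_band": np.random.random(100),
        "nir_band": np.random.random(100),
    }
)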

# Map custom names to standard band acronyms
bands_map = {"red_band": "R", "green_band": "G", "nir_band": "N"}

indices_df = compute_spectral_indices(data=custom_data, spectral_indices=["NDVI", "GNDVI"], bands_map=bands_map)
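
One way to think about band mapping is as a column rename applied before the index formulas are evaluated. The sketch below shows that mental model in plain pandas. Treating the mapping as equivalent to `DataFrame.rename` is an assumption made for illustration, not a description of how siapy implements `bands_map`. The sample values are made up.

```python
import pandas as pd

custom = pd.DataFrame(
    {"red_band": [0.10, 0.20], "green_band": [0.15, 0.25], "nir_band": [0.50, 0.60]}
)
bands_map = {"red_band": "R", "green_band": "G", "nir_band": "N"}

# Assumption: the mapping behaves like renaming columns to standard acronyms
standard = custom.rename(columns=bands_map)

# With standard names in place, NDVI = (N - R) / (N + R) can be written directly
ndvi = (standard["N"] - standard["R"]) / (standard["N"] + standard["R"])

print(list(standard.columns))
print(ndvi.round(3).tolist())
```

The original DataFrame is left untouched because `rename` returns a copy by default.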

Automatic feature generation¶

API Documentation

siapy.features.AutoFeatClassification
siapy.features.AutoFeatRegression
siapy.features.AutoSpectralIndicesClassification
siapy.features.AutoSpectralIndicesRegression

Mathematically extracted features¶

The AutoFeat classes provide deterministic wrappers around the AutoFeat library, which automatically generates and selects engineered features through symbolic regression.

import pandas as pd
from sklearn.datasets import make_classification

from siapy.features import AutoFeatClassification

# Generate sample data
X, y = make_classification(n_samples=200, n_features=5, random_state=42)
data = pd.DataFrame(X, columns=[f"f{i}" for i in range(5)])
target = pd.Series(y)

# Create and configure AutoFeat
autofeat = AutoFeatClassification(
    random_seed=42,  # For reproducibility
    verbose=1,  # Show progress
)

# Fit and transform
features_engineered = autofeat.fit_transform(data, target)
print(f"Original features: {data.shape[1]}")
print(f"Engineered features: {features_engineered.shape[1]}")
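
For intuition about the generate-then-select pattern, the sketch below uses only scikit-learn. It is a deliberately simplified stand-in and not the AutoFeat algorithm: candidates are limited to degree-2 polynomial terms, and selection is a univariate F-test rather than AutoFeat's own procedure. The choice of 8 retained features is arbitrary.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.preprocessing import PolynomialFeatures

X, y = make_classification(n_samples=200, n_features=5, random_state=42)

# Step 1: expand the 5 inputs into all degree-2 terms (originals, squares, pairwise products)
expander = PolynomialFeatures(degree=2, include_bias=False)
X_expanded = expander.fit_transform(X)

# Step 2: keep the 8 candidates with the highest ANOVA F-score against the labels
selector = SelectKBest(score_func=f_classif, k=8)
X_selected = selector.fit_transform(X_expanded, y)

print(f"Original: {X.shape[1]}, expanded: {X_expanded.shape[1]}, selected: {X_selected.shape[1]}")
```

The expansion step grows the feature count quickly (5 inputs become 20 candidates here), which is why a selection step is normally paired with it.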

Features extracted using spectral indices¶

The AutoSpectralIndices classes combine spectral index computation with automated feature selection, providing a single fit/transform pipeline that identifies the most relevant spectral indices for a given target.

import pandas as pd
from sklearn.datasets import make_classification

from siapy.features import AutoSpectralIndicesClassification
from siapy.features.helpers import FeatureSelectorConfig
from siapy.features.spectral_indices import get_spectral_indices

# Create spectral-like data
X, y = make_classification(n_samples=300, n_features=4, random_state=42)
data = pd.DataFrame(X, columns=["R", "G", "B", "N"])  # Red, Green, Blue, NIR
target = pd.Series(y)

# Get available spectral indices
available_indices = get_spectral_indices(["R", "G", "B", "N"])
print(f"Available indices: {len(available_indices)}")

# Configure feature selection
config = FeatureSelectorConfig(
    k_features=(5, 20),  # Select 5-20 best indices
    cv=5,  # Cross-validation for feature selection
    verbose=1,
)

# Create automated spectral indices classifier
auto_spectral = AutoSpectralIndicesClassification(
    spectral_indices=list(available_indices.keys()),
    selector_config=config,
    merge_with_original=True,  # Include original bands
)

# Fit and transform
enhanced_features = auto_spectral.fit_transform(data, target)
print(f"\nOriginal features: {data.shape[1]}")
print(f"Enhanced features: {enhanced_features.shape[1]}")
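
The selection step can be pictured as a cross-validated search over candidate indices. The sketch below illustrates that idea with scikit-learn's `SequentialFeatureSelector` on hand-computed normalized differences. It is an analogy under stated assumptions and not siapy's selector: the data and labels are synthetic, the logistic regression estimator is an arbitrary choice, and `ND_B_R` is a hypothetical name used only here.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
bands = pd.DataFrame(rng.uniform(0.05, 0.95, size=(120, 4)), columns=["R", "G", "B", "N"])


def norm_diff(a, b):
    """Normalized difference (a - b) / (a + b) of two band columns."""
    return (bands[a] - bands[b]) / (bands[a] + bands[b])


# Candidate features built from band pairs
candidates = pd.DataFrame(
    {
        "NDVI": norm_diff("N", "R"),
        "GNDVI": norm_diff("N", "G"),
        "NGRDI": norm_diff("G", "R"),
        "ND_B_R": norm_diff("B", "R"),  # hypothetical name, for illustration
    }
)

# Synthetic labels that depend on NDVI only, so NDVI should be picked up
target = (candidates["NDVI"] > 0).astype(int)

# Forward selection: greedily add the index that most improves 3-fold CV accuracy
selector = SequentialFeatureSelector(
    LogisticRegression(), n_features_to_select=2, direction="forward", cv=3
)
selector.fit(candidates, target)

selected = list(candidates.columns[selector.get_support()])
print(f"Selected indices: {selected}")
```

Because every candidate is scored with cross-validation, the cost grows with both the number of candidates and the number of folds, so trimming the candidate list beforehand helps on large index catalogues.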

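Integration with siapy entities¶

The features module works directly with the siapy entity system: signatures extracted from a SpectralImage can be converted to a DataFrame, passed through index computation or feature selection, and wrapped back into a Signatures object.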

import numpy as np
import pandas as pd
from sklearn.datasets import make_classification

from siapy.entities import Pixels, Signatures, SpectralImage
from siapy.features import AutoSpectralIndicesClassification
from siapy.features.helpers import FeatureSelectorConfig
from siapy.features.spectral_indices import compute_spectral_indices, get_spectral_indices

# Create a mock spectral image with 4 bands (Red, Green, Blue, Near-infrared)
rng = np.random.default_rng(seed=42)
image_array = rng.random((50, 50, 4))  # height, width, bands (R, G, B, N)
image = SpectralImage.from_numpy(image_array)

# Define region of interest (ROI) pixels for sampling
roi_pixels = Pixels.from_iterable(
    [(10, 15), (12, 18), (15, 20), (18, 22), (20, 25), (25, 30), (28, 32), (30, 35), (32, 38), (35, 40)]
)

# Extract spectral signatures from ROI pixels
signatures = image.to_signatures(roi_pixels)
print(f"Extracted {len(signatures)} signatures from the image")

# Convert signatures to DataFrame and assign standard band names
spectral_data = signatures.signals.df.copy()
spectral_data = spectral_data.rename(columns=dict(zip(spectral_data.columns, ["R", "G", "B", "N"])))

# Create synthetic classification labels for demonstration purposes (one label per signature)
_, y = make_classification(n_samples=len(spectral_data), n_features=4, random_state=42)
target = pd.Series(y)

# Get all spectral indices that can be computed with available bands
available_indices = get_spectral_indices(["R", "G", "B", "N"])
print(f"Found {len(available_indices)} computable spectral indices")

# Method 1: Manually compute spectral indices
indices_df = compute_spectral_indices(
    data=spectral_data,
    spectral_indices=list(available_indices.keys())[:10],  # Use first 10 indices
)
print(f"Computed {indices_df.shape[1]} spectral indices")

# Method 2: Automated feature selection with spectral indices
# Configure the feature selector
config = FeatureSelectorConfig(
    k_features=5,  # Select 5 best performing indices
    cv=3,  # Use 3-fold cross-validation
    verbose=0,
)

# Create automated selector that finds optimal spectral indices
auto_spectral = AutoSpectralIndicesClassification(
    spectral_indices=list(available_indices.keys())[:15],  # Use first 15 indices as candidates
    selector_config=config,
    merge_with_original=False,  # Return only selected indices, not original bands
)

# Apply feature selection to find the best spectral indices
selected_features = auto_spectral.fit_transform(spectral_data, target)
print(f"Selected {selected_features.shape[1]} optimal spectral indices")

# Create new signatures object with selected features
enhanced_signatures = Signatures.from_signals_and_pixels(signals=selected_features, pixels=signatures.pixels)
print(f"Created enhanced signatures with shape: {enhanced_signatures.signals.df.shape}")

# Display results - the enhanced signatures contain only the most informative spectral indices
print(f"Enhanced signatures DataFrame:\n{enhanced_signatures.to_dataframe().head()}")