Schemas

siapy.datasets.schemas

Target

Bases: BaseModel, ABC

Abstract base class for machine learning target variables.

This class defines the interface for target variables used in machine learning datasets, supporting both classification and regression targets.

model_config class-attribute instance-attribute

model_config = ConfigDict(arbitrary_types_allowed=True)

value instance-attribute

value: Series

from_dict abstractmethod classmethod

from_dict(data: dict[str, Any]) -> Target

Create a Target instance from a dictionary.

PARAMETER DESCRIPTION
data

Dictionary containing target data with appropriate keys.

TYPE: dict[str, Any]

RETURNS DESCRIPTION
Target

New Target instance created from the dictionary data.

Source code in siapy/datasets/schemas.py
@classmethod
@abstractmethod
def from_dict(cls, data: dict[str, Any]) -> "Target":
    """Create a Target instance from a dictionary.

    Args:
        data: Dictionary containing target data with appropriate keys.

    Returns:
        New Target instance created from the dictionary data.
    """
    ...

from_iterable abstractmethod classmethod

from_iterable(data: Iterable[Any]) -> Target

Create a Target instance from an iterable of values.

PARAMETER DESCRIPTION
data

Iterable containing target values.

TYPE: Iterable[Any]

RETURNS DESCRIPTION
Target

New Target instance created from the iterable data.

Source code in siapy/datasets/schemas.py
@classmethod
@abstractmethod
def from_iterable(cls, data: Iterable[Any]) -> "Target":
    """Create a Target instance from an iterable of values.

    Args:
        data: Iterable containing target values.

    Returns:
        New Target instance created from the iterable data.
    """
    ...

to_dict abstractmethod

to_dict() -> dict[str, Any]

Convert the target to a dictionary representation.

RETURNS DESCRIPTION
dict[str, Any]

Dictionary containing the target data.

Source code in siapy/datasets/schemas.py
@abstractmethod
def to_dict(self) -> dict[str, Any]:
    """Convert the target to a dictionary representation.

    Returns:
        Dictionary containing the target data.
    """
    ...

to_dataframe abstractmethod

to_dataframe() -> DataFrame

Convert the target to a pandas DataFrame.

RETURNS DESCRIPTION
DataFrame

DataFrame representation of the target data.

Source code in siapy/datasets/schemas.py
@abstractmethod
def to_dataframe(self) -> pd.DataFrame:
    """Convert the target to a pandas DataFrame.

    Returns:
        DataFrame representation of the target data.
    """
    ...

reset_index abstractmethod

reset_index() -> Target

Reset the index of all internal pandas objects to a default integer index.

RETURNS DESCRIPTION
Target

New Target instance with reset indices.

Source code in siapy/datasets/schemas.py
@abstractmethod
def reset_index(self) -> "Target":
    """Reset the index of all internal pandas objects to a default integer index.

    Returns:
        New Target instance with reset indices.
    """
    ...
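
The interface above can be illustrated with a small stand-in. The sketch below is not siapy's code: `TargetLike` and `ListTarget` are hypothetical names, and plain lists replace the pandas Series and pydantic model used by the real class. It only shows how placing `@classmethod` on top of `@abstractmethod` forces every subclass to supply its own constructors.

```python
from abc import ABC, abstractmethod
from typing import Any, Iterable


class TargetLike(ABC):
    """Hypothetical stand-in for the abstract Target interface."""

    @classmethod
    @abstractmethod
    def from_iterable(cls, data: Iterable[Any]) -> "TargetLike":
        ...

    @abstractmethod
    def to_dict(self) -> dict[str, Any]:
        ...


class ListTarget(TargetLike):
    """Hypothetical concrete subclass that stores its values in a list."""

    def __init__(self, values: list[Any]) -> None:
        self.values = values

    @classmethod
    def from_iterable(cls, data: Iterable[Any]) -> "ListTarget":
        return cls(list(data))

    def to_dict(self) -> dict[str, Any]:
        return {"value": self.values}


target = ListTarget.from_iterable(range(3))
exported = target.to_dict()

# The abstract base itself cannot be instantiated.
try:
    TargetLike()
    base_is_abstract = False
except TypeError:
    base_is_abstract = True
```

`abstractmethod` has to be the innermost decorator, which is the order used in the listings above.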

ClassificationTarget

Bases: Target

Target variable for classification tasks.

Represents categorical target variables with string labels, numerical values, and optional encoding information for machine learning classification tasks.

label instance-attribute

label: Series

value instance-attribute

value: Series

encoding instance-attribute

encoding: Series

model_config class-attribute instance-attribute

model_config = ConfigDict(arbitrary_types_allowed=True)

from_iterable classmethod

from_iterable(data: Iterable[Any]) -> ClassificationTarget

Create a ClassificationTarget from an iterable of labels.

Automatically generates numerical values and encoding for the provided labels.

PARAMETER DESCRIPTION
data

Iterable containing classification labels.

TYPE: Iterable[Any]

RETURNS DESCRIPTION
ClassificationTarget

New ClassificationTarget instance with generated values and encoding.

Source code in siapy/datasets/schemas.py
@classmethod
def from_iterable(cls, data: Iterable[Any]) -> "ClassificationTarget":
    """Create a ClassificationTarget from an iterable of labels.

    Automatically generates numerical values and encoding for the provided labels.

    Args:
        data: Iterable containing classification labels.

    Returns:
        New ClassificationTarget instance with generated values and encoding.
    """
    label = pd.DataFrame(data, columns=["label"])
    return generate_classification_target(label, "label")

from_dict classmethod

from_dict(data: dict[str, Any]) -> ClassificationTarget

Create a ClassificationTarget from a dictionary.

PARAMETER DESCRIPTION
data

Dictionary with keys 'label', 'value', and 'encoding'.
- label: List of string labels
- value: List of numerical values corresponding to labels
- encoding: List of encoding information for labels

TYPE: dict[str, Any]

RETURNS DESCRIPTION
ClassificationTarget

New ClassificationTarget instance created from the dictionary.

Source code in siapy/datasets/schemas.py
@classmethod
def from_dict(cls, data: dict[str, Any]) -> "ClassificationTarget":
    """Create a ClassificationTarget from a dictionary.

    Args:
        data: Dictionary with keys 'label', 'value', and 'encoding'.<br>
            - label: List of string labels <br>
            - value: List of numerical values corresponding to labels <br>
            - encoding: List of encoding information for labels

    Returns:
        New ClassificationTarget instance created from the dictionary.
    """
    label = pd.Series(data["label"], name="label")
    value = pd.Series(data["value"], name="value")
    encoding = pd.Series(data["encoding"], name="encoding")
    return cls(label=label, value=value, encoding=encoding)

to_dict

to_dict() -> dict[str, Any]

Convert the classification target to a dictionary.

RETURNS DESCRIPTION
dict[str, Any]

Dictionary with keys 'label', 'value', and 'encoding' containing list representations of the respective pandas Series.

Source code in siapy/datasets/schemas.py
def to_dict(self) -> dict[str, Any]:
    """Convert the classification target to a dictionary.

    Returns:
        Dictionary with keys 'label', 'value', and 'encoding' containing list representations of the respective pandas Series.
    """
    return {
        "label": self.label.to_list(),
        "value": self.value.to_list(),
        "encoding": self.encoding.to_list(),
    }
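
As a rough illustration of the round trip between `from_dict` and `to_dict`, the sketch below repeats the same Series construction with plain pandas, without importing siapy. The sample labels and the layout of the `encoding` list are assumptions made for the example; the real layout is whatever `generate_classification_target` produces.

```python
import pandas as pd

# Hypothetical input in the shape from_dict expects.
data = {
    "label": ["healthy", "diseased", "healthy"],
    "value": [0, 1, 0],
    "encoding": ["healthy", "diseased"],  # assumed layout, for illustration only
}

# Same construction as in from_dict: one named Series per key.
label = pd.Series(data["label"], name="label")
value = pd.Series(data["value"], name="value")
encoding = pd.Series(data["encoding"], name="encoding")

# Same conversion as in to_dict: each Series back to a plain list.
round_trip = {
    "label": label.to_list(),
    "value": value.to_list(),
    "encoding": encoding.to_list(),
}
```

Because each Series is converted back with `to_list()`, the exported dictionary can be passed to `from_dict` again.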

to_dataframe

to_dataframe() -> DataFrame

Convert the classification target to a pandas DataFrame.

RETURNS DESCRIPTION
DataFrame

DataFrame containing 'value' and 'label' columns. The encoding information is not included in the DataFrame representation.

Source code in siapy/datasets/schemas.py
def to_dataframe(self) -> pd.DataFrame:
    """Convert the classification target to a pandas DataFrame.

    Returns:
        DataFrame containing 'value' and 'label' columns. The encoding information is not included in the DataFrame representation.
    """
    return pd.concat([self.value, self.label], axis=1)

reset_index

reset_index() -> ClassificationTarget

Reset indices of label and value Series to default integer index.

RETURNS DESCRIPTION
ClassificationTarget

New ClassificationTarget instance with reset indices for label and value. The encoding Series is preserved as-is since it represents the overall encoding scheme rather than instance-specific data.

Source code in siapy/datasets/schemas.py
def reset_index(self) -> "ClassificationTarget":
    """Reset indices of label and value Series to default integer index.

    Returns:
        New ClassificationTarget instance with reset indices for label and value. The encoding Series is preserved as-is since it represents the overall encoding scheme rather than instance-specific data.
    """
    return ClassificationTarget(
        label=self.label.reset_index(drop=True),
        value=self.value.reset_index(drop=True),
        encoding=self.encoding,
    )
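
Why resetting matters can be seen with pandas alone. In this sketch (the sample data is made up), a `value` Series that kept a non-default index no longer lines up with a freshly built `label` Series, because `pd.concat(..., axis=1)` aligns rows by index label rather than by position.

```python
import pandas as pd

# value kept its old index (for example after filtering); label has the default 0..2.
value = pd.Series([0, 1, 0], index=[10, 11, 12], name="value")
label = pd.Series(["a", "b", "a"], name="label")

# Index labels do not overlap, so concat pads with NaN instead of pairing rows.
misaligned = pd.concat([value, label], axis=1)

# After resetting both indices, rows pair up by position again.
aligned = pd.concat(
    [value.reset_index(drop=True), label.reset_index(drop=True)],
    axis=1,
)
```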

RegressionTarget

Bases: Target

Target variable for regression tasks.

Represents continuous numerical target variables for machine learning regression tasks with an optional descriptive name.

value instance-attribute

value: Series

name class-attribute instance-attribute

name: str = 'value'

model_config class-attribute instance-attribute

model_config = ConfigDict(arbitrary_types_allowed=True)

from_iterable classmethod

from_iterable(data: Iterable[Any]) -> RegressionTarget

Create a RegressionTarget from an iterable of numerical values.

PARAMETER DESCRIPTION
data

Iterable containing numerical regression target values.

TYPE: Iterable[Any]

RETURNS DESCRIPTION
RegressionTarget

New RegressionTarget instance with default name "value".

Source code in siapy/datasets/schemas.py
@classmethod
def from_iterable(cls, data: Iterable[Any]) -> "RegressionTarget":
    """Create a RegressionTarget from an iterable of numerical values.

    Args:
        data: Iterable containing numerical regression target values.

    Returns:
        New RegressionTarget instance with default name "value".
    """
    value = pd.DataFrame(data, columns=["value"])
    return generate_regression_target(value, "value")

from_dict classmethod

from_dict(data: dict[str, Any]) -> RegressionTarget

Create a RegressionTarget from a dictionary.

PARAMETER DESCRIPTION
data

Dictionary with required key 'value' and optional key 'name'.
- value: List of numerical target values
- name: Optional descriptive name for the target variable

TYPE: dict[str, Any]

RETURNS DESCRIPTION
RegressionTarget

New RegressionTarget instance. Uses "value" as default name if not provided in the dictionary.

Source code in siapy/datasets/schemas.py
@classmethod
def from_dict(cls, data: dict[str, Any]) -> "RegressionTarget":
    """Create a RegressionTarget from a dictionary.

    Args:
        data: Dictionary with required key 'value' and optional key 'name'.<br>
            - value: List of numerical target values <br>
            - name: Optional descriptive name for the target variable

    Returns:
        New RegressionTarget instance. Uses "value" as default name if not provided in the dictionary.
    """
    value = pd.Series(data["value"], name="value")
    name = data["name"] if "name" in data else "value"
    return cls(value=value, name=name)

to_dict

to_dict() -> dict[str, Any]

Convert the regression target to a dictionary.

RETURNS DESCRIPTION
dict[str, Any]

Dictionary with keys 'value' and 'name' containing the list representation of values and the descriptive name.

Source code in siapy/datasets/schemas.py
def to_dict(self) -> dict[str, Any]:
    """Convert the regression target to a dictionary.

    Returns:
        Dictionary with keys 'value' and 'name' containing the list representation of values and the descriptive name.
    """
    return {
        "value": self.value.to_list(),
        "name": self.name,
    }

to_dataframe

to_dataframe() -> DataFrame

Convert the regression target to a pandas DataFrame.

RETURNS DESCRIPTION
DataFrame

DataFrame containing a single column with the target values. The column name corresponds to the Series name, not the target name.

Source code in siapy/datasets/schemas.py
def to_dataframe(self) -> pd.DataFrame:
    """Convert the regression target to a pandas DataFrame.

    Returns:
        DataFrame containing a single column with the target values. The column name corresponds to the Series name, not the target name.
    """
    return pd.DataFrame(self.value)
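
The distinction between the Series name and the target name is easy to miss. The following sketch reproduces the relevant pandas behaviour outside siapy; the dictionary contents and the name `"chlorophyll"` are invented for the example.

```python
import pandas as pd

# Hypothetical input in the shape RegressionTarget.from_dict expects.
data = {"value": [1.5, 2.0, 3.25], "name": "chlorophyll"}

# from_dict always names the Series "value"; the descriptive name is stored separately.
value = pd.Series(data["value"], name="value")
name = data.get("name", "value")

# Wrapping a named Series in a DataFrame uses the Series name as the column label.
frame = pd.DataFrame(value)
```

The flat DataFrame therefore has a `value` column, while the descriptive name only appears where it is used explicitly, as in `to_dataframe_multiindex`.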

reset_index

reset_index() -> RegressionTarget

Reset the index of the value Series to a default integer index.

RETURNS DESCRIPTION
RegressionTarget

New RegressionTarget instance with reset index for the value Series. The name is preserved.

Source code in siapy/datasets/schemas.py
def reset_index(self) -> "RegressionTarget":
    """Reset the index of the value Series to a default integer index.

    Returns:
        New RegressionTarget instance with reset index for the value Series. The name is preserved.
    """
    return RegressionTarget(value=self.value.reset_index(drop=True), name=self.name)

TabularDatasetData dataclass

TabularDatasetData(
    signatures: Signatures,
    metadata: DataFrame,
    target: Target | None = None,
)

Container for tabular machine learning dataset components.

Combines spectral signatures, metadata, and an optional target into a unified dataset structure for machine learning workflows. Ensures data consistency through length validation and exposes the data as a dictionary, a flat DataFrame, or a DataFrame with MultiIndex columns.

signatures instance-attribute

signatures: Signatures

metadata instance-attribute

metadata: DataFrame

target class-attribute instance-attribute

target: Target | None = None

from_dict classmethod

from_dict(data: dict[str, Any]) -> TabularDatasetData

Create a TabularDatasetData instance from a dictionary.

PARAMETER DESCRIPTION
data

Dictionary containing dataset components with keys:
- pixels: Dictionary for pixel data
- signals: Dictionary for signal data
- metadata: Dictionary for metadata
- target: Optional dictionary for target data

TYPE: dict[str, Any]

RETURNS DESCRIPTION
TabularDatasetData

New TabularDatasetData instance created from the dictionary data.

Source code in siapy/datasets/schemas.py
@classmethod
def from_dict(cls, data: dict[str, Any]) -> "TabularDatasetData":
    """Create a TabularDatasetData instance from a dictionary.

    Args:
        data: Dictionary containing dataset components with keys: <br>
            - pixels: Dictionary for pixel data <br>
            - signals: Dictionary for signal data <br>
            - metadata: Dictionary for metadata <br>
            - target: Optional dictionary for target data

    Returns:
        New TabularDatasetData instance created from the dictionary data.
    """
    signatures = Signatures.from_dict({"pixels": data["pixels"], "signals": data["signals"]})
    metadata = pd.DataFrame(data["metadata"])
    target = TabularDatasetData.target_from_dict(data.get("target", None))
    return cls(signatures=signatures, metadata=metadata, target=target)

target_from_dict staticmethod

target_from_dict(
    data: dict[str, Any] | None = None,
) -> Optional[Target]

Create an appropriate Target instance from a dictionary.

Automatically determines whether to create a ClassificationTarget or RegressionTarget based on the keys present in the data dictionary.

PARAMETER DESCRIPTION
data

Optional dictionary containing target data. If None, returns None.
The keys determine the target type, and regression is checked first:
- Regression: every key is a RegressionTarget field ('value', 'name')
- Classification: every key is a ClassificationTarget field ('label', 'value', 'encoding')

TYPE: dict[str, Any] | None DEFAULT: None

RETURNS DESCRIPTION
Optional[Target]

ClassificationTarget, RegressionTarget, or None based on input data.

RAISES DESCRIPTION
InvalidInputError

If the dictionary keys don't match any known target type.

Source code in siapy/datasets/schemas.py
@staticmethod
def target_from_dict(data: dict[str, Any] | None = None) -> Optional[Target]:
    """Create an appropriate Target instance from a dictionary.

    Automatically determines whether to create a ClassificationTarget or
    RegressionTarget based on the keys present in the data dictionary.

    Args:
        data: Optional dictionary containing target data. If None, returns None. <br>
            Keys determine the target type: <br>
            - Classification: Requires keys compatible with ClassificationTarget <br>
            - Regression: Requires keys compatible with RegressionTarget <br>

    Returns:
        ClassificationTarget, RegressionTarget, or None based on input data.

    Raises:
        InvalidInputError: If the dictionary keys don't match any known target type.
    """
    if data is None:
        return None

    regression_keys = set(RegressionTarget.model_fields.keys())
    classification_keys = set(ClassificationTarget.model_fields.keys())
    data_keys = set(data.keys())

    if data_keys.issubset(regression_keys):
        return RegressionTarget.from_dict(data)
    elif data_keys.issubset(classification_keys):
        return ClassificationTarget.from_dict(data)
    else:
        raise InvalidInputError(data, "Invalid target dict.")
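
The key-based dispatch can be tried in isolation. The sketch below is a simplified stand-in, not the library function: `infer_target_kind` is a hypothetical helper, the two key sets are copied by hand from the fields documented above, and `ValueError` stands in for `InvalidInputError`.

```python
from typing import Any, Optional

# Field names taken from the RegressionTarget and ClassificationTarget sections.
REGRESSION_KEYS = {"value", "name"}
CLASSIFICATION_KEYS = {"label", "value", "encoding"}


def infer_target_kind(data: Optional[dict[str, Any]]) -> Optional[str]:
    """Return which target type a dictionary would be dispatched to."""
    if data is None:
        return None
    keys = set(data)
    # Regression is tested first, so a dict holding only "value" is regression.
    if keys <= REGRESSION_KEYS:
        return "regression"
    if keys <= CLASSIFICATION_KEYS:
        return "classification"
    raise ValueError(f"Invalid target dict with keys {sorted(keys)}")


only_value = infer_target_kind({"value": [1.0, 2.0]})
with_labels = infer_target_kind({"label": ["a"], "value": [0], "encoding": ["a"]})
```

A dictionary that mixes keys from both types, such as `label` together with `name`, is a subset of neither set and is rejected.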

set_attributes

set_attributes(
    *,
    signatures: Signatures | None = None,
    metadata: DataFrame | None = None,
    target: Target | None = None,
) -> TabularDatasetData

Create a new dataset with updated attributes.

Creates a copy of the current dataset with specified attributes replaced. Unspecified attributes are copied from the current dataset.

PARAMETER DESCRIPTION
signatures

Optional new Signatures to replace current signatures.

TYPE: Signatures | None DEFAULT: None

metadata

Optional new DataFrame to replace current metadata.

TYPE: DataFrame | None DEFAULT: None

target

Optional new Target to replace current target.

TYPE: Target | None DEFAULT: None

RETURNS DESCRIPTION
TabularDatasetData

New TabularDatasetData instance with updated attributes.

Note

The returned dataset will be validated for length consistency.

Source code in siapy/datasets/schemas.py
def set_attributes(
    self,
    *,
    signatures: Signatures | None = None,
    metadata: pd.DataFrame | None = None,
    target: Target | None = None,
) -> "TabularDatasetData":
    """Create a new dataset with updated attributes.

    Creates a copy of the current dataset with specified attributes replaced.
    Unspecified attributes are copied from the current dataset.

    Args:
        signatures: Optional new Signatures to replace current signatures.
        metadata: Optional new DataFrame to replace current metadata.
        target: Optional new Target to replace current target.

    Returns:
        New TabularDatasetData instance with updated attributes.

    Note:
        The returned dataset will be validated for length consistency.
    """
    current_data = self.copy()
    signatures = signatures if signatures is not None else current_data.signatures
    metadata = metadata if metadata is not None else current_data.metadata
    target = target if target is not None else current_data.target
    return TabularDatasetData(signatures=signatures, metadata=metadata, target=target)
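
One consequence of the `is not None` fallback is that passing `None` keeps the current value, so this method cannot be used to clear an existing target. The sketch below shows the same pattern on a hypothetical `Record` dataclass with string fields; it is only meant to show the fallback logic, not the real dataset class.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Record:
    """Hypothetical stand-in with the same three attribute names."""

    signatures: str
    metadata: str
    target: Optional[str] = None

    def set_attributes(
        self,
        *,
        signatures: Optional[str] = None,
        metadata: Optional[str] = None,
        target: Optional[str] = None,
    ) -> "Record":
        # Any argument left as None falls back to the current value.
        return Record(
            signatures=signatures if signatures is not None else self.signatures,
            metadata=metadata if metadata is not None else self.metadata,
            target=target if target is not None else self.target,
        )


original = Record(signatures="s1", metadata="m1", target="t1")
updated = original.set_attributes(metadata="m2")
still_has_target = original.set_attributes(target=None)
```

To drop the target under this pattern, a new instance would have to be constructed directly with `target=None`.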

to_dict

to_dict() -> dict[str, Any]

Convert the dataset to a dictionary representation.

RETURNS DESCRIPTION
dict[str, Any]

Dictionary with keys 'pixels', 'signals', 'metadata', and 'target'. The target key contains None if no target is present.

Source code in siapy/datasets/schemas.py
def to_dict(self) -> dict[str, Any]:
    """Convert the dataset to a dictionary representation.

    Returns:
        Dictionary with keys 'pixels', 'signals', 'metadata', and 'target'. The target key contains None if no target is present.
    """
    signatures_dict = self.signatures.to_dict()
    return {
        "pixels": signatures_dict["pixels"],
        "signals": signatures_dict["signals"],
        "metadata": self.metadata.to_dict(),
        "target": self.target.to_dict() if self.target is not None else None,
    }
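
The `metadata` entry is produced by `DataFrame.to_dict()` with its default orientation, which nests values as column, then index label, then value. A short pandas-only sketch with invented metadata shows the resulting shape and that `pd.DataFrame(...)`, as used in `from_dict`, accepts it back.

```python
import pandas as pd

# Hypothetical metadata table.
metadata = pd.DataFrame({"plot": ["A", "B"], "rep": [1, 2]})

# Default orientation: {column: {index_label: value}}.
as_dict = metadata.to_dict()

# from_dict rebuilds the table with pd.DataFrame(data["metadata"]).
restored = pd.DataFrame(as_dict)
```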

to_dataframe

to_dataframe() -> DataFrame

Convert the dataset to a single pandas DataFrame.

Combines signatures, metadata, and target (if present) into a single DataFrame with all columns at the same level.

RETURNS DESCRIPTION
DataFrame

DataFrame containing all dataset components as columns.

Source code in siapy/datasets/schemas.py
def to_dataframe(self) -> pd.DataFrame:
    """Convert the dataset to a single pandas DataFrame.

    Combines signatures, metadata, and target (if present) into a single
    DataFrame with all columns at the same level.

    Returns:
        DataFrame containing all dataset components as columns.
    """
    combined_df = pd.concat([self.signatures.to_dataframe(), self.metadata], axis=1)
    if self.target is not None:
        target_series = self.target.to_dataframe()
        combined_df = pd.concat([combined_df, target_series], axis=1)
    return combined_df

to_dataframe_multiindex

to_dataframe_multiindex() -> DataFrame

Convert the dataset to a pandas DataFrame with MultiIndex columns.

Creates a DataFrame where columns are organized hierarchically by category (pixel, signal, metadata, target) and field names within each category.

RETURNS DESCRIPTION
DataFrame

DataFrame with MultiIndex columns having levels ['category', 'field'].

RAISES DESCRIPTION
InvalidInputError

If target type is not ClassificationTarget or RegressionTarget.

Source code in siapy/datasets/schemas.py
def to_dataframe_multiindex(self) -> pd.DataFrame:
    """Convert the dataset to a pandas DataFrame with MultiIndex columns.

    Creates a DataFrame where columns are organized hierarchically by category
    (pixel, signal, metadata, target) and field names within each category.

    Returns:
        DataFrame with MultiIndex columns having levels ['category', 'field'].

    Raises:
        InvalidInputError: If target type is not ClassificationTarget or RegressionTarget.
    """
    signatures_df = self.signatures.to_dataframe_multiindex()

    metadata_columns = pd.MultiIndex.from_tuples(
        [("metadata", col) for col in self.metadata.columns], names=["category", "field"]
    )
    metadata_df = pd.DataFrame(self.metadata.values, columns=metadata_columns)

    combined_df = pd.concat([signatures_df, metadata_df], axis=1)

    if self.target is not None:
        target_df = self.target.to_dataframe()
        if isinstance(self.target, ClassificationTarget):
            target_columns = pd.MultiIndex.from_tuples(
                [("target", col) for col in target_df.columns],
                names=["category", "field"],
            )
        elif isinstance(self.target, RegressionTarget):
            target_columns = pd.MultiIndex.from_tuples(
                [("target", self.target.name)],
                names=["category", "field"],
            )
        else:
            raise InvalidInputError(
                self.target,
                "Invalid target type. Expected ClassificationTarget or RegressionTarget.",
            )
        target_df = pd.DataFrame(target_df.values, columns=target_columns)
        combined_df = pd.concat([combined_df, target_df], axis=1)

    return combined_df
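
The column layout can be reproduced with pandas alone. This sketch builds only the metadata and target parts, using made-up data and a hypothetical regression target named `"yield"`; the signatures part is omitted because its layout comes from `Signatures.to_dataframe_multiindex`, which is not shown here.

```python
import pandas as pd

# Hypothetical metadata and regression target values.
metadata = pd.DataFrame({"plot": ["A", "B"], "date": ["2024-05-01", "2024-05-02"]})
target_values = [[1.2], [3.4]]

# Metadata columns become ("metadata", <column name>) pairs.
metadata_columns = pd.MultiIndex.from_tuples(
    [("metadata", col) for col in metadata.columns], names=["category", "field"]
)
metadata_df = pd.DataFrame(metadata.values, columns=metadata_columns)

# A regression target contributes a single ("target", <name>) column.
target_columns = pd.MultiIndex.from_tuples(
    [("target", "yield")], names=["category", "field"]
)
target_df = pd.DataFrame(target_values, columns=target_columns)

combined = pd.concat([metadata_df, target_df], axis=1)

# Selecting a top-level category returns only that group's fields.
metadata_only = combined["metadata"]
```

Since the frames are rebuilt from `.values`, the original row index is discarded and rows are matched by position.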

reset_index

reset_index() -> TabularDatasetData

Reset indices of all dataset components to default integer indices.

Creates a new dataset with all pandas objects having their indices reset to consecutive integers starting from 0.

RETURNS DESCRIPTION
TabularDatasetData

New TabularDatasetData instance with reset indices for all components.

Source code in siapy/datasets/schemas.py
def reset_index(self) -> "TabularDatasetData":
    """Reset indices of all dataset components to default integer indices.

    Creates a new dataset with all pandas objects having their indices
    reset to consecutive integers starting from 0.

    Returns:
        New TabularDatasetData instance with reset indices for all components.
    """
    return TabularDatasetData(
        signatures=self.signatures.reset_index(),
        metadata=self.metadata.reset_index(drop=True),
        target=self.target.reset_index() if self.target is not None else None,
    )

copy

copy() -> TabularDatasetData

Create a deep copy of the dataset.

Creates a new TabularDatasetData instance with copied versions of all components, ensuring that modifications to the copy don't affect the original.

RETURNS DESCRIPTION
TabularDatasetData

New TabularDatasetData instance that is a deep copy of the current dataset.

Source code in siapy/datasets/schemas.py
def copy(self) -> "TabularDatasetData":
    """Create a deep copy of the dataset.

    Creates a new TabularDatasetData instance with copied versions of all
    components, ensuring that modifications to the copy don't affect the original.

    Returns:
        New TabularDatasetData instance that is a deep copy of the current dataset.
    """
    return TabularDatasetData(
        signatures=self.signatures.copy(),
        metadata=self.metadata.copy(),
        target=self.target.model_copy() if self.target is not None else None,
    )
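
For the metadata component, independence of the copy follows from pandas, where `DataFrame.copy()` is deep by default. The sketch below checks this on invented data. Note that pydantic's `model_copy()` is shallow unless `deep=True` is passed, so whether the target's Series are duplicated as well is worth verifying against the installed siapy version.

```python
import pandas as pd

# Hypothetical metadata table.
metadata = pd.DataFrame({"plot": ["A", "B"], "rep": [1, 2]})

# DataFrame.copy() defaults to deep=True, so the data is duplicated.
duplicate = metadata.copy()
duplicate.loc[0, "rep"] = 99

# The original is unaffected by the change to the copy.
original_rep = metadata.loc[0, "rep"]
```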