Schemas
siapy.datasets.schemas
Target
Bases: BaseModel, ABC
Abstract base class for machine learning target variables.
This class defines the interface for target variables used in machine learning datasets, supporting both classification and regression targets.
model_config
class-attribute
instance-attribute
model_config = ConfigDict(arbitrary_types_allowed=True)
from_dict
abstractmethod
classmethod
Create a Target instance from a dictionary.
PARAMETER | DESCRIPTION |
---|---|
data | Dictionary containing target data with appropriate keys. |

RETURNS | DESCRIPTION |
---|---|
Target | New Target instance created from the dictionary data. |
from_iterable
abstractmethod
classmethod
Create a Target instance from an iterable of values.
PARAMETER | DESCRIPTION |
---|---|
data | Iterable containing target values. |

RETURNS | DESCRIPTION |
---|---|
Target | New Target instance created from the iterable data. |
to_dict
abstractmethod
to_dict() -> dict[str, Any]
Convert the target to a dictionary representation.
RETURNS | DESCRIPTION |
---|---|
dict[str, Any] | Dictionary containing the target data. |
to_dataframe
abstractmethod
to_dataframe() -> DataFrame
Convert the target to a pandas DataFrame.
RETURNS | DESCRIPTION |
---|---|
DataFrame | DataFrame representation of the target data. |
reset_index
abstractmethod
reset_index() -> Target
Reset the index of all internal pandas objects to a default integer index.
RETURNS | DESCRIPTION |
---|---|
Target | New Target instance with reset indices. |
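To make the abstract interface concrete, here is a minimal sketch that uses only the standard library abc module. TargetSketch and ListTarget are hypothetical names, not part of siapy; the real Target also inherits from pydantic's BaseModel and declares to_dataframe, which is left out here to avoid a pandas dependency. The sketch shows that an abstract classmethod needs @classmethod stacked on top of @abstractmethod, and that the base class cannot be instantiated until every abstract method is implemented.

```python
from abc import ABC, abstractmethod
from typing import Any, Iterable


class TargetSketch(ABC):
    """Hypothetical stand-in mirroring the documented Target interface."""

    # @classmethod must be the outer decorator so the abstract flag is kept
    @classmethod
    @abstractmethod
    def from_dict(cls, data: dict[str, Any]) -> "TargetSketch": ...

    @classmethod
    @abstractmethod
    def from_iterable(cls, data: Iterable[Any]) -> "TargetSketch": ...

    @abstractmethod
    def to_dict(self) -> dict[str, Any]: ...

    @abstractmethod
    def reset_index(self) -> "TargetSketch": ...


class ListTarget(TargetSketch):
    """Minimal concrete subclass used only for illustration."""

    def __init__(self, values: list[Any]) -> None:
        self.values = values

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> "ListTarget":
        return cls(list(data["value"]))

    @classmethod
    def from_iterable(cls, data: Iterable[Any]) -> "ListTarget":
        return cls(list(data))

    def to_dict(self) -> dict[str, Any]:
        return {"value": list(self.values)}

    def reset_index(self) -> "ListTarget":
        # A plain list already has a 0..n-1 index, so this just returns a copy
        return ListTarget(list(self.values))
```

Calling TargetSketch() directly raises TypeError, while ListTarget.from_iterable([3, 1, 2]).to_dict() returns {'value': [3, 1, 2]}.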
ClassificationTarget
Bases: Target
Target variable for classification tasks.
Represents categorical target variables with string labels, numerical values, and optional encoding information for machine learning classification tasks.
model_config
class-attribute
instance-attribute
model_config = ConfigDict(arbitrary_types_allowed=True)
from_iterable
classmethod
from_iterable(data: Iterable[Any]) -> ClassificationTarget
Create a ClassificationTarget from an iterable of labels.
Automatically generates numerical values and encoding for the provided labels.
PARAMETER | DESCRIPTION |
---|---|
data | Iterable containing classification labels. |

RETURNS | DESCRIPTION |
---|---|
ClassificationTarget | New ClassificationTarget instance with generated values and encoding. |
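from_iterable is described as generating numerical values and an encoding automatically. One way to do that, sketched here in plain Python, is to assign integer codes to labels in order of first appearance. The helper encode_labels is hypothetical, and the first-appearance ordering is an assumption; siapy may order the codes differently, and it stores the encoding as a pandas Series rather than a dict.

```python
from typing import Any, Iterable


def encode_labels(labels: Iterable[Any]) -> tuple[list[Any], list[int], dict[int, Any]]:
    """Hypothetical helper: map each distinct label to an integer code.

    Codes are assigned in order of first appearance (an assumption).
    """
    label_list = list(labels)
    code_of: dict[Any, int] = {}
    for label in label_list:
        # setdefault assigns the next free code only the first time a label is seen
        code_of.setdefault(label, len(code_of))
    values = [code_of[label] for label in label_list]
    # Invert the mapping so each code points back to its label
    encoding = {code: label for label, code in code_of.items()}
    return label_list, values, encoding
```

For example, encode_labels(["cat", "dog", "cat", "bird"]) yields the values [0, 1, 0, 2] and the encoding {0: "cat", 1: "dog", 2: "bird"}.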
from_dict
classmethod
from_dict(data: dict[str, Any]) -> ClassificationTarget
Create a ClassificationTarget from a dictionary.
PARAMETER | DESCRIPTION |
---|---|
data | Dictionary with keys 'label', 'value', and 'encoding'. |

RETURNS | DESCRIPTION |
---|---|
ClassificationTarget | New ClassificationTarget instance created from the dictionary. |
to_dict
to_dict() -> dict[str, Any]
Convert the classification target to a dictionary.
RETURNS | DESCRIPTION |
---|---|
dict[str, Any] | Dictionary with keys 'label', 'value', and 'encoding' containing list representations of the respective pandas Series. |
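The dictionary form used by from_dict and to_dict is meant to round-trip. The sketch below uses a hypothetical dataclass, ClassificationTargetSketch, that stores plain lists where the real class stores pandas Series. It also assumes the encoding is a list whose position is the numeric code, which this section does not confirm.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class ClassificationTargetSketch:
    """Hypothetical list-based stand-in; the real class holds pandas Series."""

    label: list[Any]
    value: list[int]
    encoding: list[Any]

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> "ClassificationTargetSketch":
        # Expects exactly the keys documented for from_dict
        return cls(
            label=list(data["label"]),
            value=list(data["value"]),
            encoding=list(data["encoding"]),
        )

    def to_dict(self) -> dict[str, Any]:
        # Every field is emitted as a plain list, as the to_dict docs describe
        return {
            "label": list(self.label),
            "value": list(self.value),
            "encoding": list(self.encoding),
        }
```

Feeding the output of to_dict back into from_dict produces an equal object.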
to_dataframe
to_dataframe() -> DataFrame
Convert the classification target to a pandas DataFrame.
RETURNS | DESCRIPTION |
---|---|
DataFrame | DataFrame containing 'value' and 'label' columns. The encoding information is not included in the DataFrame representation. |
reset_index
reset_index() -> ClassificationTarget
Reset indices of label and value Series to default integer index.
RETURNS | DESCRIPTION |
---|---|
ClassificationTarget | New ClassificationTarget instance with reset indices for label and value. The encoding Series is preserved as-is since it represents the overall encoding scheme rather than instance-specific data. |
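The reset_index behaviour described above maps naturally onto pandas' Series.reset_index(drop=True). The helper reset_classification_indices is hypothetical and only sketches that behaviour; it is not siapy's implementation and requires pandas.

```python
import pandas as pd


def reset_classification_indices(
    label: pd.Series, value: pd.Series, encoding: pd.Series
) -> tuple[pd.Series, pd.Series, pd.Series]:
    """Hypothetical helper mirroring the documented reset_index behaviour."""
    # drop=True discards the old index instead of turning it into a column
    new_label = label.reset_index(drop=True)
    new_value = value.reset_index(drop=True)
    # The encoding describes the whole scheme, so it is returned unchanged
    return new_label, new_value, encoding
```

With label and value indexed by [10, 20], both come back indexed by [0, 1], while the encoding object is the same one that was passed in.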
RegressionTarget
Bases: Target
Target variable for regression tasks.
Represents continuous numerical target variables for machine learning regression tasks with an optional descriptive name.
model_config
class-attribute
instance-attribute
model_config = ConfigDict(arbitrary_types_allowed=True)
from_iterable
classmethod
from_iterable(data: Iterable[Any]) -> RegressionTarget
Create a RegressionTarget from an iterable of numerical values.
PARAMETER | DESCRIPTION |
---|---|
data | Iterable containing numerical regression target values. |

RETURNS | DESCRIPTION |
---|---|
RegressionTarget | New RegressionTarget instance with default name "value". |
from_dict
classmethod
from_dict(data: dict[str, Any]) -> RegressionTarget
Create a RegressionTarget from a dictionary.
PARAMETER | DESCRIPTION |
---|---|
data | Dictionary with required key 'value' and optional key 'name'. |

RETURNS | DESCRIPTION |
---|---|
RegressionTarget | New RegressionTarget instance. Uses "value" as default name if not provided in the dictionary. |
to_dict
to_dict() -> dict[str, Any]
Convert the regression target to a dictionary.
RETURNS | DESCRIPTION |
---|---|
dict[str, Any] | Dictionary with keys 'value' and 'name' containing the list representation of values and the descriptive name. |
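RegressionTarget falls back to the name "value" whenever no name is supplied, both in from_iterable and in from_dict. The list-based dataclass below, RegressionTargetSketch (hypothetical, not siapy's class, which stores a pandas Series), sketches that defaulting logic together with the matching to_dict output.

```python
from dataclasses import dataclass
from typing import Any, Iterable


@dataclass
class RegressionTargetSketch:
    """Hypothetical list-based stand-in for RegressionTarget."""

    value: list[Any]
    name: str = "value"  # documented default name

    @classmethod
    def from_iterable(cls, data: Iterable[Any]) -> "RegressionTargetSketch":
        # No name can be passed here, so the default "value" is used
        return cls(value=list(data))

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> "RegressionTargetSketch":
        # 'value' is required (KeyError if missing); 'name' falls back to "value"
        return cls(value=list(data["value"]), name=data.get("name", "value"))

    def to_dict(self) -> dict[str, Any]:
        return {"value": list(self.value), "name": self.name}
```

RegressionTargetSketch.from_dict({"value": [1.0, 2.5]}).name evaluates to "value", and an explicit "name" key overrides it.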
to_dataframe
to_dataframe() -> DataFrame
Convert the regression target to a pandas DataFrame.
RETURNS | DESCRIPTION |
---|---|
DataFrame | DataFrame containing a single column with the target values. The column name corresponds to the Series name, not the target name. |
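The distinction between the Series name and the target name can be reproduced with pandas' Series.to_frame(), which labels its single column with the Series name. Whether siapy calls exactly this method is an assumption; the data and both names below are illustrative.

```python
import pandas as pd

# Illustrative data: the Series is named "ndvi" while the target name differs
values = pd.Series([0.2, 0.5, 0.7], name="ndvi")
target_name = "crop_vigour"

# Series.to_frame() uses the Series name as the single column label
frame = values.to_frame()
print(list(frame.columns))  # ['ndvi'], not ['crop_vigour']
```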
reset_index
reset_index() -> RegressionTarget
Reset the index of the value Series to a default integer index.
RETURNS | DESCRIPTION |
---|---|
RegressionTarget | New RegressionTarget instance with reset index for the value Series. The name is preserved. |
TabularDatasetData
dataclass
TabularDatasetData(
signatures: Signatures,
metadata: DataFrame,
target: Target | None = None,
)
Container for tabular machine learning dataset components.
Combines spectral signatures, metadata, and an optional target variable into a unified dataset structure for machine learning workflows. Ensures data consistency through length validation and provides several views of the data: a dictionary, a flat DataFrame, and a DataFrame with MultiIndex columns.
from_dict
classmethod
from_dict(data: dict[str, Any]) -> TabularDatasetData
Create a TabularDatasetData instance from a dictionary.
PARAMETER | DESCRIPTION |
---|---|
data | Dictionary containing dataset components with keys: |

RETURNS | DESCRIPTION |
---|---|
TabularDatasetData | New TabularDatasetData instance created from the dictionary data. |
target_from_dict
staticmethod
Create an appropriate Target instance from a dictionary.
Automatically determines whether to create a ClassificationTarget or RegressionTarget based on the keys present in the data dictionary.
PARAMETER | DESCRIPTION |
---|---|
data | Optional dictionary containing target data. If None, returns None. |

RETURNS | DESCRIPTION |
---|---|
Optional[Target] | ClassificationTarget, RegressionTarget, or None based on input data. |

RAISES | DESCRIPTION |
---|---|
InvalidInputError | If the dictionary keys don't match any known target type. |
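target_from_dict chooses the target type from the dictionary keys. The function below sketches one plausible dispatch rule, assuming the key sets documented for ClassificationTarget.from_dict ('label', 'value', 'encoding') and RegressionTarget.from_dict ('value' plus an optional 'name'). The name target_kind_from_dict is hypothetical; the sketch returns a string instead of constructing a Target, and raises ValueError where siapy raises InvalidInputError.

```python
from typing import Any, Optional

# Assumed key sets, taken from the from_dict documentation above
CLASSIFICATION_KEYS = {"label", "value", "encoding"}
REGRESSION_KEYS = {"value", "name"}


def target_kind_from_dict(data: Optional[dict[str, Any]]) -> Optional[str]:
    """Return "classification", "regression" or None based on the keys present."""
    if data is None:
        return None
    keys = set(data)
    if keys == CLASSIFICATION_KEYS:
        return "classification"
    # 'name' is optional for regression, so {'value'} alone also qualifies
    if "value" in keys and keys <= REGRESSION_KEYS:
        return "regression"
    raise ValueError(f"Unrecognised target keys: {sorted(keys)}")
```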
set_attributes
set_attributes(
*,
signatures: Signatures | None = None,
metadata: DataFrame | None = None,
target: Target | None = None,
) -> TabularDatasetData
Create a new dataset with updated attributes.
Creates a copy of the current dataset with specified attributes replaced. Unspecified attributes are copied from the current dataset.
PARAMETER | DESCRIPTION |
---|---|
signatures | Optional new Signatures to replace current signatures. TYPE: Optional[Signatures] |
metadata | Optional new DataFrame to replace current metadata. TYPE: Optional[DataFrame] |
target | Optional new Target to replace current target. TYPE: Optional[Target] |

RETURNS | DESCRIPTION |
---|---|
TabularDatasetData | New TabularDatasetData instance with updated attributes. |
Note
The returned dataset will be validated for length consistency.
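set_attributes follows a copy-with-changes pattern. A standard-library sketch uses dataclasses.replace, which builds the new instance through __init__, so a length check in __post_init__ runs again on the result. DatasetSketch is a hypothetical stand-in that stores lists instead of Signatures, DataFrame, and Target objects.

```python
from dataclasses import dataclass, replace
from typing import Any, Optional


@dataclass(frozen=True)
class DatasetSketch:
    """Hypothetical list-based stand-in for TabularDatasetData."""

    signatures: list[Any]
    metadata: list[Any]
    target: Optional[list[Any]] = None

    def __post_init__(self) -> None:
        # Length validation: every component must describe the same rows
        lengths = {len(self.signatures), len(self.metadata)}
        if self.target is not None:
            lengths.add(len(self.target))
        if len(lengths) != 1:
            raise ValueError(f"Inconsistent component lengths: {sorted(lengths)}")

    def set_attributes(
        self,
        *,
        signatures: Optional[list[Any]] = None,
        metadata: Optional[list[Any]] = None,
        target: Optional[list[Any]] = None,
    ) -> "DatasetSketch":
        # Keep only the attributes that were passed; replace() re-runs __post_init__
        candidates = {"signatures": signatures, "metadata": metadata, "target": target}
        changes = {key: val for key, val in candidates.items() if val is not None}
        return replace(self, **changes)
```

Because None means "keep the current value" in this sketch, it cannot be used to remove an existing target; if siapy treats None the same way, the same limitation would apply.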
to_dict
to_dict() -> dict[str, Any]
Convert the dataset to a dictionary representation.
RETURNS | DESCRIPTION |
---|---|
dict[str, Any] | Dictionary with keys 'pixels', 'signals', 'metadata', and 'target'. The target key contains None if no target is present. |
to_dataframe
to_dataframe() -> DataFrame
Convert the dataset to a single pandas DataFrame.
Combines signatures, metadata, and target (if present) into a single DataFrame with all columns at the same level.
RETURNS | DESCRIPTION |
---|---|
DataFrame | DataFrame containing all dataset components as columns. |
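The flat layout produced by to_dataframe corresponds to concatenating the component DataFrames side by side. This pandas sketch uses made-up column names; siapy's actual pixel and signal column names are not listed in this section, and it is not claimed that siapy uses pd.concat internally.

```python
import pandas as pd

# Illustrative components that share the same row index
pixels = pd.DataFrame({"x": [0, 1], "y": [5, 5]})
signals = pd.DataFrame({"b0": [0.1, 0.3], "b1": [0.2, 0.4]})
metadata = pd.DataFrame({"plot": ["A", "B"]})
target = pd.DataFrame({"value": [0, 1], "label": ["soil", "leaf"]})

# axis=1 places all columns side by side, aligned on the row index
flat = pd.concat([pixels, signals, metadata, target], axis=1)
print(list(flat.columns))
# ['x', 'y', 'b0', 'b1', 'plot', 'value', 'label']
```

In a flat layout, two components that share a column name would produce duplicate column labels, which is one reason to prefer the MultiIndex variant.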
to_dataframe_multiindex
to_dataframe_multiindex() -> DataFrame
Convert the dataset to a pandas DataFrame with MultiIndex columns.
Creates a DataFrame where columns are organized hierarchically by category (pixel, signal, metadata, target) and field names within each category.
RETURNS | DESCRIPTION |
---|---|
DataFrame | DataFrame with MultiIndex columns having levels ['category', 'field']. |

RAISES | DESCRIPTION |
---|---|
InvalidInputError | If target type is not ClassificationTarget or RegressionTarget. |
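A MultiIndex layout with levels ['category', 'field'] can be built by passing a dict of DataFrames to pd.concat, whose keys become the outer column level. The category names follow the documentation above; the field names and data are illustrative, and this is a sketch rather than siapy's own code.

```python
import pandas as pd

# Illustrative components (field names are hypothetical)
pixels = pd.DataFrame({"x": [0, 1], "y": [5, 5]})
signals = pd.DataFrame({"b0": [0.1, 0.3], "b1": [0.2, 0.4]})
metadata = pd.DataFrame({"plot": ["A", "B"]})
target = pd.DataFrame({"value": [0, 1], "label": ["soil", "leaf"]})

# Dict keys become the outer ("category") column level
combined = pd.concat(
    {"pixel": pixels, "signal": signals, "metadata": metadata, "target": target},
    axis=1,
)
combined.columns.names = ["category", "field"]

# Selecting a category returns just that component's columns
print(combined["signal"].columns.tolist())  # ['b0', 'b1']
```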
reset_index
reset_index() -> TabularDatasetData
Reset indices of all dataset components to default integer indices.
Creates a new dataset with all pandas objects having their indices reset to consecutive integers starting from 0.
RETURNS | DESCRIPTION |
---|---|
TabularDatasetData | New TabularDatasetData instance with reset indices for all components. |
copy
copy() -> TabularDatasetData
Create a deep copy of the dataset.
Creates a new TabularDatasetData instance with copied versions of all components, ensuring that modifications to the copy don't affect the original.
RETURNS | DESCRIPTION |
---|---|
TabularDatasetData | New TabularDatasetData instance that is a deep copy of the current dataset. |
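The difference between a deep and a shallow copy matters because the dataset holds mutable objects. This standard-library sketch contrasts copy.deepcopy with copy.copy. DatasetSketch is hypothetical and uses nested lists and a dict in place of pandas objects; siapy may instead copy each component individually, for example with DataFrame.copy().

```python
import copy
from dataclasses import dataclass, field
from typing import Any


@dataclass
class DatasetSketch:
    """Hypothetical stand-in holding mutable components."""

    signatures: list[list[float]]
    metadata: dict[str, list[Any]] = field(default_factory=dict)


original = DatasetSketch(signatures=[[0.1, 0.2]], metadata={"plot": ["A"]})

# Deep copy: nested lists are duplicated, so edits stay in the clone
clone = copy.deepcopy(original)
clone.signatures[0][0] = 9.9
clone.metadata["plot"].append("B")

# Shallow copy: the new object shares the same inner lists with the original
shallow = copy.copy(original)
shallow.signatures[0][1] = 7.7  # this change is visible through original too
```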