01 - Data Loading and Exploration
This notebook demonstrates how to prepare your results and use the EvaluationData class, which is the foundation of the labicompare library.
In this tutorial, you will learn:
- How to structure your raw results in a Pandas DataFrame.
- How to initialize the EvaluationData object.
- How the library handles missing values and automatic ranking.
import pandas as pd
import numpy as np
from labicompare.core.data import EvaluationData
# Setting display options for better visualization
pd.options.display.precision = 4
Step 1: Preparing Raw Data
The library expects a pandas.DataFrame where:
- Columns are the models/algorithms.
- Rows are the datasets or cross-validation folds.
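The layout described above can be sketched with a tiny in-memory CSV. The names `csv_text` and `toy_df` are hypothetical and exist only for this illustration; the scores are copied from three rows of the results table used later in this notebook.

```python
import io

import pandas as pd

# Illustrative CSV text in the expected wide layout:
# one row per dataset, one column per model.
csv_text = """dataset,FCN,ResNet,ROCKET
Adiac,0.8445,0.8332,0.7834
Wine,0.6259,0.7222,0.8130
Yoga,0.8424,0.8667,0.9104
"""

# index_col="dataset" moves the dataset names into the row index,
# so the columns contain model scores only.
toy_df = pd.read_csv(io.StringIO(csv_text), index_col="dataset")

print(toy_df.shape)          # (number of datasets, number of models)
print(list(toy_df.columns))  # model names
print(list(toy_df.index))    # dataset names
```

If your results are stored in long format (one row per dataset/model pair), they would need to be pivoted into this wide shape first.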
Let's load a sample dataset to see how the library reacts.
df = pd.read_csv("./results.csv", index_col="dataset")
print("Raw DataFrame:")
display(df)
Raw DataFrame:
| dataset | FCN | ResNet | Inception | InceptionTime | LITE | LITETime | ROCKET | MultiROCKET |
|---|---|---|---|---|---|---|---|---|
| ACSF1 | 0.8960 | 0.9160 | NaN | 0.9400 | 0.8980 | 0.9100 | 0.8860 | 0.9200 |
| Adiac | 0.8445 | 0.8332 | 0.8220 | 0.8465 | 0.8102 | 0.8338 | 0.7834 | 0.8338 |
| AllGestureWiimoteX | 0.7100 | 0.7406 | 0.7729 | 0.7871 | 0.7423 | 0.7657 | 0.7900 | 0.7271 |
| AllGestureWiimoteY | 0.7820 | 0.7937 | 0.8089 | 0.8229 | 0.7717 | 0.7886 | 0.7727 | 0.7729 |
| AllGestureWiimoteZ | 0.6871 | 0.7257 | 0.7763 | 0.7929 | 0.7317 | 0.7529 | 0.7661 | 0.7529 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| Wine | 0.6259 | 0.7222 | 0.6667 | 0.6667 | 0.6407 | 0.6667 | 0.8130 | 0.8889 |
| WordSynonyms | 0.5643 | 0.6166 | 0.7292 | 0.7602 | 0.6787 | 0.7179 | 0.7534 | 0.7759 |
| Worms | 0.7766 | 0.7610 | 0.7688 | 0.7922 | 0.8104 | 0.8182 | 0.7403 | 0.7532 |
| WormsTwoClass | 0.7325 | 0.7481 | 0.7584 | 0.7792 | 0.7844 | 0.7662 | 0.7974 | 0.7922 |
| Yoga | 0.8424 | 0.8667 | 0.8998 | 0.9077 | 0.9067 | 0.9177 | 0.9104 | 0.9190 |
128 rows × 8 columns
Step 2: Initializing EvaluationData
When you create the object, you specify higher_is_better.
- For Accuracy/F1: True
- For Error/Loss: False
Note: Observe the warning about NaNs. The library drops 'ACSF1' because it has a missing value for 'Inception', leaving 127 of the original 128 datasets.
# Wrap the data
eval_data = EvaluationData(df, higher_is_better=True)
# Inspect the object summary
print(eval_data)
WARNING:labicompare.core.data:Null values detected. Rows (or datasets) with NaNs will be removed to ensure the integrity of paired statistical tests and methods.
<EvaluationData: 127 datasets, 8 models>
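The row removal reported in the warning can be approximated with plain pandas. This is a minimal sketch of the documented behavior, not labicompare's internal code; the variable names are hypothetical, and the two rows are taken from the raw table above.

```python
import numpy as np
import pandas as pd

# Two rows from the raw results; ACSF1 has no Inception score.
scores = pd.DataFrame(
    {
        "FCN": [0.8960, 0.8445],
        "ResNet": [0.9160, 0.8332],
        "Inception": [np.nan, 0.8220],
    },
    index=pd.Index(["ACSF1", "Adiac"], name="dataset"),
)

# Datasets with at least one missing score.
incomplete = scores.index[scores.isna().any(axis=1)].tolist()

# Keep only complete rows so every model is compared on the same datasets,
# which paired tests require.
complete = scores.dropna(axis=0, how="any")

print(incomplete)
print(list(complete.index))
```

Listing the incomplete datasets before wrapping the DataFrame makes it easy to decide whether to rerun the missing experiments instead of losing the row.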
Step 3: Exploring the Attributes
The EvaluationData class exposes several attributes to check the state of your data after cleaning and ranking.
print(f"Models: {eval_data.model_names}")
print(f"Valid Datasets: {eval_data.dataset_names}")
print(f"Metric direction: {'Higher is Better' if eval_data.higher_is_better else 'Lower is Better'}")
print("\n--- Automatic Ranking (ranks_df) ---")
# The best model in each row gets rank 1.0
display(eval_data.ranks_df)
Models: ['FCN', 'ResNet', 'Inception', 'InceptionTime', 'LITE', 'LITETime', 'ROCKET', 'MultiROCKET']
Valid Datasets: ['Adiac', 'AllGestureWiimoteX', 'AllGestureWiimoteY', 'AllGestureWiimoteZ', 'ArrowHead', 'BME', 'Beef', 'BeetleFly', 'BirdChicken', 'CBF', 'Car', 'Chinatown', 'ChlorineConcentration', 'CinCECGTorso', 'Coffee', 'Computers', 'CricketX', 'CricketY', 'CricketZ', 'Crop', 'DiatomSizeReduction', 'DistalPhalanxOutlineAgeGroup', 'DistalPhalanxOutlineCorrect', 'DistalPhalanxTW', 'DodgerLoopDay', 'DodgerLoopGame', 'DodgerLoopWeekend', 'ECG200', 'ECG5000', 'ECGFiveDays', 'EOGHorizontalSignal', 'EOGVerticalSignal', 'Earthquakes', 'ElectricDevices', 'EthanolLevel', 'FaceAll', 'FaceFour', 'FacesUCR', 'FiftyWords', 'Fish', 'FordA', 'FordB', 'FreezerRegularTrain', 'FreezerSmallTrain', 'Fungi', 'GestureMidAirD1', 'GestureMidAirD2', 'GestureMidAirD3', 'GesturePebbleZ1', 'GesturePebbleZ2', 'GunPoint', 'GunPointAgeSpan', 'GunPointMaleVersusFemale', 'GunPointOldVersusYoung', 'Ham', 'HandOutlines', 'Haptics', 'Herring', 'HouseTwenty', 'InlineSkate', 'InsectEPGRegularTrain', 'InsectEPGSmallTrain', 'InsectWingbeatSound', 'ItalyPowerDemand', 'LargeKitchenAppliances', 'Lightning2', 'Lightning7', 'Mallat', 'Meat', 'MedicalImages', 'MelbournePedestrian', 'MiddlePhalanxOutlineAgeGroup', 'MiddlePhalanxOutlineCorrect', 'MiddlePhalanxTW', 'MixedShapesRegularTrain', 'MixedShapesSmallTrain', 'MoteStrain', 'NonInvasiveFetalECGThorax1', 'NonInvasiveFetalECGThorax2', 'OSULeaf', 'OliveOil', 'PLAID', 'PhalangesOutlinesCorrect', 'Phoneme', 'PickupGestureWiimoteZ', 'PigAirwayPressure', 'PigArtPressure', 'PigCVP', 'Plane', 'PowerCons', 'ProximalPhalanxOutlineAgeGroup', 'ProximalPhalanxOutlineCorrect', 'ProximalPhalanxTW', 'RefrigerationDevices', 'Rock', 'ScreenType', 'SemgHandGenderCh2', 'SemgHandMovementCh2', 'SemgHandSubjectCh2', 'ShakeGestureWiimoteZ', 'ShapeletSim', 'ShapesAll', 'SmallKitchenAppliances', 'SmoothSubspace', 'SonyAIBORobotSurface1', 'SonyAIBORobotSurface2', 'StarLightCurves', 'Strawberry', 'SwedishLeaf', 'Symbols', 'SyntheticControl', 'ToeSegmentation1', 'ToeSegmentation2', 'Trace', 'TwoLeadECG', 'TwoPatterns', 'UMD', 'UWaveGestureLibraryAll', 'UWaveGestureLibraryX', 'UWaveGestureLibraryY', 'UWaveGestureLibraryZ', 'Wafer', 'Wine', 'WordSynonyms', 'Worms', 'WormsTwoClass', 'Yoga']
Metric direction: Higher is Better

--- Automatic Ranking (ranks_df) ---
| dataset | FCN | ResNet | Inception | InceptionTime | LITE | LITETime | ROCKET | MultiROCKET |
|---|---|---|---|---|---|---|---|---|
| Adiac | 2.0 | 5.0 | 6.0 | 1.0 | 7.0 | 3.5 | 8.0 | 3.5 |
| AllGestureWiimoteX | 8.0 | 6.0 | 3.0 | 2.0 | 5.0 | 4.0 | 1.0 | 7.0 |
| AllGestureWiimoteY | 5.0 | 3.0 | 2.0 | 1.0 | 8.0 | 4.0 | 7.0 | 6.0 |
| AllGestureWiimoteZ | 8.0 | 7.0 | 2.0 | 1.0 | 6.0 | 4.5 | 3.0 | 4.5 |
| ArrowHead | 4.5 | 6.0 | 3.0 | 2.0 | 7.0 | 4.5 | 8.0 | 1.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| Wine | 8.0 | 3.0 | 5.0 | 5.0 | 7.0 | 5.0 | 2.0 | 1.0 |
| WordSynonyms | 8.0 | 7.0 | 4.0 | 2.0 | 6.0 | 5.0 | 3.0 | 1.0 |
| Worms | 4.0 | 6.0 | 5.0 | 3.0 | 2.0 | 1.0 | 8.0 | 7.0 |
| WormsTwoClass | 8.0 | 7.0 | 6.0 | 4.0 | 3.0 | 5.0 | 1.0 | 2.0 |
| Yoga | 8.0 | 7.0 | 6.0 | 4.0 | 5.0 | 2.0 | 3.0 | 1.0 |
127 rows × 8 columns
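The ranks shown for a row can be reproduced with pandas' own `DataFrame.rank`, which also makes the effect of the metric direction visible. This is a sketch under the assumption that ties receive the average rank (consistent with the 3.5 and 4.5 values above); `rank_rows` is a hypothetical helper, not part of labicompare.

```python
import pandas as pd


def rank_rows(scores: pd.DataFrame, higher_is_better: bool = True) -> pd.DataFrame:
    """Rank models within each row; 1.0 is best, ties share the average rank."""
    # ascending=False assigns rank 1 to the largest value in the row.
    return scores.rank(axis=1, method="average", ascending=not higher_is_better)


# The Wine row from the raw results, which contains a three-way tie at 0.6667.
wine = pd.DataFrame(
    {
        "FCN": [0.6259],
        "ResNet": [0.7222],
        "Inception": [0.6667],
        "InceptionTime": [0.6667],
        "LITE": [0.6407],
        "LITETime": [0.6667],
        "ROCKET": [0.8130],
        "MultiROCKET": [0.8889],
    },
    index=pd.Index(["Wine"], name="dataset"),
)

as_accuracy = rank_rows(wine, higher_is_better=True)
as_error = rank_rows(wine, higher_is_better=False)

print(as_accuracy.loc["Wine"].tolist())
print(as_error.loc["Wine"].tolist())
```

With `higher_is_better=True` the result matches the Wine row of the rank table; treating the same numbers as errors reverses the order, which is why the flag must match the metric.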
Visualizing the Transformation
The library converts scores into ranks to perform non-parametric tests. Here is the transformation for the first valid dataset:
first_dataset = eval_data.dataset_names[0]
comparison = pd.DataFrame({
'Score': eval_data._df.loc[first_dataset],
'Rank': eval_data.ranks_df.loc[first_dataset]
})
print(f"Transformation for {first_dataset}:")
display(comparison.sort_values(by='Rank'))
Transformation for Adiac:
| | Score | Rank |
|---|---|---|
| InceptionTime | 0.8465 | 1.0 |
| FCN | 0.8445 | 2.0 |
| MultiROCKET | 0.8338 | 3.5 |
| LITETime | 0.8338 | 3.5 |
| ResNet | 0.8332 | 5.0 |
| Inception | 0.8220 | 6.0 |
| LITE | 0.8102 | 7.0 |
| ROCKET | 0.7834 | 8.0 |
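The 3.5 shared by MultiROCKET and LITETime comes from averaging the positions the tied models occupy (3 and 4). The arithmetic can be spelled out in plain Python; `average_ranks` is a hypothetical function written for this illustration and is not claimed to be how labicompare computes its ranks.

```python
def average_ranks(scores, higher_is_better=True):
    """Return {name: rank}; tied scores share the mean of their positions."""
    # Best score first: descending order when higher is better.
    ordered = sorted(scores, key=scores.get, reverse=higher_is_better)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        # Extend j to cover the run of models tied with the one at position i.
        while j + 1 < len(ordered) and scores[ordered[j + 1]] == scores[ordered[i]]:
            j += 1
        # Positions are 1-based, so the run occupies i + 1 through j + 1.
        shared = ((i + 1) + (j + 1)) / 2
        for name in ordered[i : j + 1]:
            ranks[name] = shared
        i = j + 1
    return ranks


# Adiac scores from the raw results table.
adiac = {
    "FCN": 0.8445,
    "ResNet": 0.8332,
    "Inception": 0.8220,
    "InceptionTime": 0.8465,
    "LITE": 0.8102,
    "LITETime": 0.8338,
    "ROCKET": 0.7834,
    "MultiROCKET": 0.8338,
}

adiac_ranks = average_ranks(adiac)
print(adiac_ranks["LITETime"], adiac_ranks["MultiROCKET"])
```

Averaging tied positions keeps the sum of ranks in every row equal to 1 + 2 + ... + k for k models, so no dataset weighs more than another when ranks are aggregated.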