01 - Data Loading and Exploration

This notebook demonstrates how to prepare your results and use the EvaluationData class, which is the foundation of the labicompare library.

In this tutorial, you will learn:

How to structure your raw results in a Pandas DataFrame.
How to initialize the EvaluationData object.
How the library handles missing values and automatic ranking.

In [11]:

import pandas as pd
import numpy as np
from labicompare.core.data import EvaluationData

# Setting display options for better visualization
pd.options.display.precision = 4

Step 1: Preparing Raw Data

The library expects a pandas.DataFrame where:

Columns are the models/algorithms.
Rows are the datasets or cross-validation folds.
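
The layout described above can be sketched with a small hand-made table. This is only an illustration: the model names, dataset names, and scores are made up, and the CSV is kept in memory rather than written to a real results.csv.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical scores: three datasets (rows) by three models (columns).
# All names and values here are made up for illustration.
toy_results = pd.DataFrame(
    {
        "ModelA": [0.91, 0.78, 0.66],
        "ModelB": [0.89, np.nan, 0.70],  # a missing run is stored as NaN
        "ModelC": [0.93, 0.81, 0.64],
    },
    index=pd.Index(["Dataset_1", "Dataset_2", "Dataset_3"], name="dataset"),
)

# Round-trip through CSV text to mimic a results file on disk.
# The index name becomes the "dataset" column that index_col refers to.
csv_text = toy_results.to_csv()
loaded = pd.read_csv(io.StringIO(csv_text), index_col="dataset")

print(loaded.shape)          # (rows, columns) = (datasets, models)
print(list(loaded.columns))  # model names
```

Naming the index "dataset" before saving is what makes index_col="dataset" work when the file is read back.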

Let's load a sample dataset to see how the library reacts.

In [12]:

df = pd.read_csv("./results.csv", index_col="dataset")

print("Raw DataFrame:")
display(df)

Raw DataFrame:

	FCN	ResNet	Inception	InceptionTime	LITE	LITETime	ROCKET	MultiROCKET
dataset
ACSF1	0.8960	0.9160	NaN	0.9400	0.8980	0.9100	0.8860	0.9200
Adiac	0.8445	0.8332	0.8220	0.8465	0.8102	0.8338	0.7834	0.8338
AllGestureWiimoteX	0.7100	0.7406	0.7729	0.7871	0.7423	0.7657	0.7900	0.7271
AllGestureWiimoteY	0.7820	0.7937	0.8089	0.8229	0.7717	0.7886	0.7727	0.7729
AllGestureWiimoteZ	0.6871	0.7257	0.7763	0.7929	0.7317	0.7529	0.7661	0.7529
...	...	...	...	...	...	...	...	...
Wine	0.6259	0.7222	0.6667	0.6667	0.6407	0.6667	0.8130	0.8889
WordSynonyms	0.5643	0.6166	0.7292	0.7602	0.6787	0.7179	0.7534	0.7759
Worms	0.7766	0.7610	0.7688	0.7922	0.8104	0.8182	0.7403	0.7532
WormsTwoClass	0.7325	0.7481	0.7584	0.7792	0.7844	0.7662	0.7974	0.7922
Yoga	0.8424	0.8667	0.8998	0.9077	0.9067	0.9177	0.9104	0.9190

128 rows × 8 columns

Step 2: Initializing EvaluationData

When you create the object, set higher_is_better to tell the library which direction of the metric counts as better:

For Accuracy/F1: True
For Error/Loss: False

Note: Observe the warning about NaNs. The library drops the 'ACSF1' dataset because it has a missing value for 'Inception', which leaves 127 of the 128 datasets.
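
To see in advance which rows a row-wise removal like this would affect, plain pandas is enough. The sketch below uses made-up scores and standard pandas calls only; it mirrors the behaviour described in the warning and is not the library's internal code.

```python
import numpy as np
import pandas as pd

# Made-up scores; "Dataset_2" is missing a value for "ModelB".
scores = pd.DataFrame(
    {
        "ModelA": [0.91, 0.78, 0.66],
        "ModelB": [0.89, np.nan, 0.70],
        "ModelC": [0.93, 0.81, 0.64],
    },
    index=pd.Index(["Dataset_1", "Dataset_2", "Dataset_3"], name="dataset"),
)

# Rows with at least one NaN: these are the ones a row-wise drop removes.
incomplete = scores.index[scores.isna().any(axis=1)].tolist()
print(f"Rows with missing values: {incomplete}")

# Keep only complete rows so every model is scored on the same datasets.
complete = scores.dropna(axis=0, how="any")
print(f"{len(scores)} rows before, {len(complete)} rows after")
```

Dropping the whole row, rather than a single cell, keeps the comparison paired: every remaining dataset has a score for every model.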

In [13]:

# Wrap the data
eval_data = EvaluationData(df, higher_is_better=True)

# Inspect the object summary
print(eval_data)

WARNING:labicompare.core.data:Null values detected. Rows (or datasets) with NaNs will be removed to ensure the integrity of paired statistical tests and methods.

<EvaluationData: 127 datasets, 8 models>

Step 3: Exploring the Attributes

The EvaluationData class exposes several attributes to check the state of your data after cleaning and ranking.

In [16]:

print(f"Models: {eval_data.model_names}")
print(f"Valid Datasets: {eval_data.dataset_names}")
print(f"Metric direction: {'Higher is Better' if eval_data.higher_is_better else 'Lower is Better'}")

print("\n--- Automatic Ranking (ranks_df) ---")
# The best model in each row gets rank 1.0
display(eval_data.ranks_df)

Models: ['FCN', 'ResNet', 'Inception', 'InceptionTime', 'LITE', 'LITETime', 'ROCKET', 'MultiROCKET']
Valid Datasets: ['Adiac', 'AllGestureWiimoteX', 'AllGestureWiimoteY', 'AllGestureWiimoteZ', 'ArrowHead', 'BME', 'Beef', 'BeetleFly', 'BirdChicken', 'CBF', 'Car', 'Chinatown', 'ChlorineConcentration', 'CinCECGTorso', 'Coffee', 'Computers', 'CricketX', 'CricketY', 'CricketZ', 'Crop', 'DiatomSizeReduction', 'DistalPhalanxOutlineAgeGroup', 'DistalPhalanxOutlineCorrect', 'DistalPhalanxTW', 'DodgerLoopDay', 'DodgerLoopGame', 'DodgerLoopWeekend', 'ECG200', 'ECG5000', 'ECGFiveDays', 'EOGHorizontalSignal', 'EOGVerticalSignal', 'Earthquakes', 'ElectricDevices', 'EthanolLevel', 'FaceAll', 'FaceFour', 'FacesUCR', 'FiftyWords', 'Fish', 'FordA', 'FordB', 'FreezerRegularTrain', 'FreezerSmallTrain', 'Fungi', 'GestureMidAirD1', 'GestureMidAirD2', 'GestureMidAirD3', 'GesturePebbleZ1', 'GesturePebbleZ2', 'GunPoint', 'GunPointAgeSpan', 'GunPointMaleVersusFemale', 'GunPointOldVersusYoung', 'Ham', 'HandOutlines', 'Haptics', 'Herring', 'HouseTwenty', 'InlineSkate', 'InsectEPGRegularTrain', 'InsectEPGSmallTrain', 'InsectWingbeatSound', 'ItalyPowerDemand', 'LargeKitchenAppliances', 'Lightning2', 'Lightning7', 'Mallat', 'Meat', 'MedicalImages', 'MelbournePedestrian', 'MiddlePhalanxOutlineAgeGroup', 'MiddlePhalanxOutlineCorrect', 'MiddlePhalanxTW', 'MixedShapesRegularTrain', 'MixedShapesSmallTrain', 'MoteStrain', 'NonInvasiveFetalECGThorax1', 'NonInvasiveFetalECGThorax2', 'OSULeaf', 'OliveOil', 'PLAID', 'PhalangesOutlinesCorrect', 'Phoneme', 'PickupGestureWiimoteZ', 'PigAirwayPressure', 'PigArtPressure', 'PigCVP', 'Plane', 'PowerCons', 'ProximalPhalanxOutlineAgeGroup', 'ProximalPhalanxOutlineCorrect', 'ProximalPhalanxTW', 'RefrigerationDevices', 'Rock', 'ScreenType', 'SemgHandGenderCh2', 'SemgHandMovementCh2', 'SemgHandSubjectCh2', 'ShakeGestureWiimoteZ', 'ShapeletSim', 'ShapesAll', 'SmallKitchenAppliances', 'SmoothSubspace', 'SonyAIBORobotSurface1', 'SonyAIBORobotSurface2', 'StarLightCurves', 'Strawberry', 'SwedishLeaf', 'Symbols', 'SyntheticControl', 'ToeSegmentation1', 'ToeSegmentation2', 
'Trace', 'TwoLeadECG', 'TwoPatterns', 'UMD', 'UWaveGestureLibraryAll', 'UWaveGestureLibraryX', 'UWaveGestureLibraryY', 'UWaveGestureLibraryZ', 'Wafer', 'Wine', 'WordSynonyms', 'Worms', 'WormsTwoClass', 'Yoga']
Metric direction: Higher is Better

--- Automatic Ranking (ranks_df) ---

	FCN	ResNet	Inception	InceptionTime	LITE	LITETime	ROCKET	MultiROCKET
dataset
Adiac	2.0	5.0	6.0	1.0	7.0	3.5	8.0	3.5
AllGestureWiimoteX	8.0	6.0	3.0	2.0	5.0	4.0	1.0	7.0
AllGestureWiimoteY	5.0	3.0	2.0	1.0	8.0	4.0	7.0	6.0
AllGestureWiimoteZ	8.0	7.0	2.0	1.0	6.0	4.5	3.0	4.5
ArrowHead	4.5	6.0	3.0	2.0	7.0	4.5	8.0	1.0
...	...	...	...	...	...	...	...	...
Wine	8.0	3.0	5.0	5.0	7.0	5.0	2.0	1.0
WordSynonyms	8.0	7.0	4.0	2.0	6.0	5.0	3.0	1.0
Worms	4.0	6.0	5.0	3.0	2.0	1.0	8.0	7.0
WormsTwoClass	8.0	7.0	6.0	4.0	3.0	5.0	1.0	2.0
Yoga	8.0	7.0	6.0	4.0	5.0	2.0	3.0	1.0

127 rows × 8 columns

Visualizing the Transformation

The library converts scores into ranks to perform non-parametric tests. Here is the transformation for the first valid dataset:

In [17]:

first_dataset = eval_data.dataset_names[0]
comparison = pd.DataFrame({
  'Score': eval_data._df.loc[first_dataset],
  'Rank': eval_data.ranks_df.loc[first_dataset]
})
print(f"Transformation for {first_dataset}:")
display(comparison.sort_values(by='Rank'))

Transformation for Adiac:

	Score	Rank
InceptionTime	0.8465	1.0
FCN	0.8445	2.0
MultiROCKET	0.8338	3.5
LITETime	0.8338	3.5
ResNet	0.8332	5.0
Inception	0.8220	6.0
LITE	0.8102	7.0
ROCKET	0.7834	8.0
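
The tied 3.5 ranks for MultiROCKET and LITETime suggest that tied scores share the mean of the ranks they cover. The sketch below reproduces the Adiac row with pandas under that assumption. The helper rank_rows is hypothetical and not part of labicompare; it only illustrates how a higher_is_better flag could map onto the sort direction.

```python
import pandas as pd

# Adiac scores copied from the table above.
adiac = pd.DataFrame(
    {
        "FCN": [0.8445],
        "ResNet": [0.8332],
        "Inception": [0.8220],
        "InceptionTime": [0.8465],
        "LITE": [0.8102],
        "LITETime": [0.8338],
        "ROCKET": [0.7834],
        "MultiROCKET": [0.8338],
    },
    index=pd.Index(["Adiac"], name="dataset"),
)


def rank_rows(scores: pd.DataFrame, higher_is_better: bool) -> pd.DataFrame:
    """Rank models within each row; rank 1.0 is best, ties share the mean rank."""
    # Hypothetical helper: when higher is better, the largest score must get
    # rank 1, so the ranking runs in descending order.
    return scores.rank(axis=1, method="average", ascending=not higher_is_better)


accuracy_ranks = rank_rows(adiac, higher_is_better=True)
print(accuracy_ranks.loc["Adiac"].sort_values())

# The same data expressed as an error rate (1 - accuracy) gives the same
# ranks once the direction is flipped.
error_ranks = rank_rows(1 - adiac, higher_is_better=False)
print(accuracy_ranks.equals(error_ranks))
```

With method="average", the two models tied at 0.8338 occupy positions 3 and 4 and each receive (3 + 4) / 2 = 3.5, so the ranks in every row still sum to the same total.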