UMAP Performance Analysis: By Metrics¶

Research Questions¶

RQ1: What is the performance impact of enabling individual WASM components (Distance, Tree, Matrix, NN Descent, Optimizer) compared to pure JavaScript?

RQ2: How does enabling all WASM components together compare to the configurations incorporating individual components and to pure JavaScript?

Methodology¶

  • Test Environment: All benchmarks run on WSL2 (Windows Subsystem for Linux)
  • Baseline: Pure JavaScript UMAP implementation (no WASM)
  • Configurations incorporating individual components: Each WASM component enabled separately (Distance, Tree, Matrix, NN Descent, Optimizer)
  • Fully WASM-enabled configuration: All WASM components enabled together
  • Metrics: Execution Time (ms), Memory (MB), Quality (trustworthiness), FPS, Responsiveness (ms)
  • Statistical Analysis: Mann-Whitney U tests, bootstrap confidence intervals, effect sizes (see the sketch after this list)
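
A minimal, hedged sketch of this statistical pipeline on synthetic data (the sample sizes mirror this notebook's baseline and all-WASM groups, but the values and names are made up for illustration):

# Sketch: Mann-Whitney U test, bootstrap CI, and Cliff's delta on synthetic data
import numpy as np
from scipy.stats import mannwhitneyu, bootstrap

rng = np.random.default_rng(42)
baseline = rng.normal(3500, 800, 60)     # stand-in for Baseline (JS) times (ms)
candidate = rng.normal(2400, 1100, 100)  # stand-in for one WASM configuration

# Mann-Whitney U: non-parametric test for a location shift between the samples
u_stat, p_value = mannwhitneyu(baseline, candidate, alternative='two-sided')

# Bootstrap 95% CI for the difference of medians (percentile method)
res = bootstrap((baseline, candidate),
                lambda a, b: np.median(a) - np.median(b),
                n_resamples=5000, method='percentile', random_state=42)

# Cliff's delta effect size: P(baseline > candidate) - P(baseline < candidate)
cliffs_delta = ((baseline[:, None] > candidate[None, :]).mean()
                - (baseline[:, None] < candidate[None, :]).mean())

print(f"U={u_stat:.1f}, p={p_value:.4g}")
print(f"median diff 95% CI: ({res.confidence_interval.low:.1f}, "
      f"{res.confidence_interval.high:.1f}) ms")
print(f"Cliff's delta: {cliffs_delta:.2f}")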

Notebook Structure¶

This notebook is organized by metrics to support thesis chapter writing:

  1. Setup & Data Preparation - Load data and configure environment
  2. Overview - Quick summary of all metrics across configurations
  3. Baseline Analysis - Pure JavaScript performance characteristics
  4. Execution Time & Speedup - Execution time analysis and speedup calculations
  5. Memory Usage - Memory consumption patterns and WASM overhead
  6. Embedding Quality - Trustworthiness preservation and quality deltas
  7. Responsiveness & UX Metrics - FPS, interaction latency, and percentile analysis (p50/p95)
  8. Dataset Size Effects - How metrics scale with data size (small/medium/large)
  9. Overall Rankings - Composite performance scores and aggregated comparison table
  10. Export Results - Save tables and figures for thesis
  11. Final Conclusions - Recommendations by use case and dataset size
  12. Summary - Quick reference guide to notebook structure

1. Setup and Data Loading¶

In [173]:
# Core data manipulation and analysis
import pandas as pd
import numpy as np
import os
from pathlib import Path
import warnings
warnings.filterwarnings('ignore')

# Visualization
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D
import seaborn as sns
from IPython.display import Markdown, display
%config InlineBackend.figure_formats = ['svg']

# Statistical analysis
from scipy import stats
from scipy.stats import mannwhitneyu, bootstrap
from scipy.optimize import curve_fit

# Set styling for publication-quality figures
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")
plt.rcParams['figure.dpi'] = 100
plt.rcParams['savefig.dpi'] = 300
plt.rcParams['svg.fonttype'] = 'none'
plt.rcParams['font.size'] = 11



def save_figure(path, **kwargs):
    """Save figures as SVG for document-friendly vector exports."""
    output_path = Path(path).with_suffix('.svg')
    plt.savefig(output_path, format='svg', **kwargs)

# Pandas display options
pd.set_option('display.max_columns', None)
pd.set_option('display.width', 120)
pd.set_option('display.precision', 3)

# Set random seed for reproducibility
np.random.seed(42)

print("✓ All dependencies loaded successfully")
print(f"  pandas: {pd.__version__}")
print(f"  numpy: {np.__version__}")
✓ All dependencies loaded successfully
  pandas: 3.0.2
  numpy: 2.4.4
In [174]:
# Load cleaned data (run preprocess.ipynb first to generate this file)
df_analysis = pd.read_csv('../outputs/preprocessed.csv')

# Ensure dataset scope is available for downstream analysis
if 'Scope' not in df_analysis.columns:
    def _scope_from_size(size):
        if pd.isna(size):
            return 'unknown'
        if size <= 200:
            return 'small'
        if size <= 800:
            return 'mid'
        return 'large'

    if 'dataset_size' in df_analysis.columns:
        df_analysis['Scope'] = df_analysis['dataset_size'].apply(_scope_from_size)
    else:
        df_analysis['Scope'] = 'unknown'

# Define standard configuration order
configuration_order = [
    'Baseline (JS)',
    'Configuration incorporating Distance',
    'Configuration incorporating Tree',
    'Configuration incorporating Matrix',
    'Configuration incorporating NN Descent',
    'Configuration incorporating Optimizer',
    'Fully WASM-enabled configuration',
]
configuration_order = [
    configuration for configuration in configuration_order
    if configuration in df_analysis['configuration_name'].unique()
]

# Shared categorical colors for consistent figures
configuration_colors = {
    'Baseline (JS)': '#1f77b4',  # blue
    'Configuration incorporating Distance': '#ff7f0e',        # orange
    'Configuration incorporating Tree': '#2ca02c',            # green
    'Configuration incorporating Matrix': '#d62728',          # red
    'Configuration incorporating NN Descent': '#9467bd',      # purple
    'Configuration incorporating Optimizer': '#8c564b',       # brown
    'Fully WASM-enabled configuration': '#e377c2',    # pink
}
configuration_palette = {
    configuration: configuration_colors[configuration]
    for configuration in configuration_order
    if configuration in configuration_colors
}

# Short labels used only in figures so tick labels and legends stay readable.
configuration_plot_labels = {
    'Baseline': 'Baseline (JS)',
    'Baseline (JS)': 'Baseline (JS)',
    'JavaScript baseline': 'Baseline (JS)',
    'Distance': 'Distance',
    'Configuration incorporating Distance': 'Distance',
    'Tree': 'Tree',
    'Configuration incorporating Tree': 'Tree',
    'Matrix': 'Matrix',
    'Configuration incorporating Matrix': 'Matrix',
    'NN Descent': 'NN-Descent',
    'NN-Descent': 'NN-Descent',
    'Configuration incorporating NN Descent': 'NN-Descent',
    'Optimizer': 'Optimizer',
    'Configuration incorporating Optimizer': 'Optimizer',
    'All Features': 'All WASM',
    'All features': 'All WASM',
    'All Configurations': 'All WASM',
    'All WASM components': 'All WASM',
    'Fully WASM-enabled configuration': 'All WASM',
}

def configuration_plot_label(configuration):
    return configuration_plot_labels.get(configuration, configuration)

df_analysis['configuration_plot_label'] = df_analysis['configuration_name'].map(configuration_plot_label)
plot_configuration_order = [configuration_plot_label(configuration) for configuration in configuration_order]
configuration_plot_palette = {
    configuration_plot_label(configuration): color
    for configuration, color in configuration_colors.items()
}

scope_order_default = ['small', 'mid', 'large']
scope_order = [scope for scope in scope_order_default if scope in df_analysis['Scope'].unique()]
scope_order += [scope for scope in sorted(df_analysis['Scope'].unique()) if scope not in scope_order]
scope_colors = {
    'small': '#4c78a8',
    'mid': '#f58518',
    'large': '#54a24b',
    'unknown': '#9d9d9d',
}
scope_palette = {scope: scope_colors.get(scope, '#9d9d9d') for scope in scope_order}
reference_line_color = '#333333'

# Shared style for scaling plots that combine measured points with fitted or predicted curves.
observed_marker = 'o'
observed_linestyle = '-'
observed_markersize = 7
observed_linewidth = 2.4
observed_alpha = 0.85
fitted_linestyle = '--'
fitted_linewidth = 1.5
fitted_alpha = 0.45
predicted_marker = 'x'
predicted_marker_size = 90
predicted_alpha = 0.6

def add_scaling_legends(ax, configurations, *, include_predicted_points=False,
                        include_reference_line=False, config_loc='upper left',
                        style_loc='upper right', config_ncol=2):
    config_handles = [
        Line2D([0], [0], color=configuration_colors.get(configuration, '#9d9d9d'),
               marker=observed_marker, linestyle=observed_linestyle,
               linewidth=observed_linewidth, markersize=observed_markersize,
               label=configuration_plot_label(configuration))
        for configuration in configurations
    ]
    config_legend = ax.legend(handles=config_handles, title='Configuration',
                              fontsize=9, title_fontsize=10,
                              loc=config_loc, ncol=config_ncol)
    ax.add_artist(config_legend)

    style_handles = [
        Line2D([0], [0], color=reference_line_color, marker=observed_marker,
               linestyle=observed_linestyle, linewidth=observed_linewidth,
               markersize=observed_markersize, alpha=observed_alpha,
               label='Observed'),
        Line2D([0], [0], color=reference_line_color, linestyle=fitted_linestyle,
               linewidth=fitted_linewidth, alpha=fitted_alpha,
               label='Fitted / predicted curve'),
    ]
    if include_predicted_points:
        style_handles.append(
            Line2D([0], [0], color=reference_line_color, marker=predicted_marker,
                   linestyle='None', markersize=observed_markersize,
                   alpha=predicted_alpha, label='Predicted points')
        )
    if include_reference_line:
        style_handles.append(
            Line2D([0], [0], color=reference_line_color, linestyle='--',
                   linewidth=2, alpha=0.7, label='Baseline reference')
        )

    ax.legend(handles=style_handles, title='Series', fontsize=9,
              title_fontsize=10, loc=style_loc)


print(f"✓ Loaded {len(df_analysis):,} cleaned measurements")
print(f"Configurations: {sorted(df_analysis['configuration_name'].unique())}")
print(f"Datasets: {df_analysis['dataset_name'].nunique()}")
✓ Loaded 500 cleaned measurements
Configurations: ['Baseline (JS)', 'Configuration incorporating Distance', 'Configuration incorporating Matrix', 'Configuration incorporating NN Descent', 'Configuration incorporating Optimizer', 'Configuration incorporating Tree', 'Fully WASM-enabled configuration']
Datasets: 6

2. Overview: All Metrics Summary¶

Quick overview of all performance metrics across configurations.

In [175]:
# Compute summary statistics for all metrics
baseline_label = 'Baseline (JS)'

# Calculate medians for each metric by configuration
summary_stats = df_analysis.groupby('configuration_name').agg({
    'execution_time_ms': ['median', 'mean', 'std'],
    'memory_delta_mb': ['median', 'mean', 'std'],
    'trustworthiness': ['median', 'mean', 'std'],
    'fps_avg': ['median', 'mean', 'std'],
    'responsiveness_ms': ['median', 'mean', 'std']
}).round(3)

print("Summary Statistics by Configuration (ordered):")
display(summary_stats.loc[configuration_order].rename(columns={'execution_time_ms': 'Execution Time (ms)'}, level=0))

# Calculate speedups
execution_time_medians = df_analysis.groupby('configuration_name')['execution_time_ms'].median()
speedup_rows = []
if baseline_label in execution_time_medians.index:
    baseline_execution_time = execution_time_medians[baseline_label]
    for configuration, configuration_median_ms in execution_time_medians.drop(baseline_label).items():
        if configuration_median_ms > 0:
            speedup_rows.append({
                'configuration': configuration,
                'speedup': baseline_execution_time / configuration_median_ms,
                'improvement_%': (baseline_execution_time / configuration_median_ms - 1) * 100,
            })

speedup_summary = pd.DataFrame(speedup_rows).sort_values('speedup', ascending=False)
print("\nSpeedup vs Baseline:")
display(speedup_summary.round(3))
Summary Statistics by Configuration (ordered):
                                         Execution Time (ms)           memory_delta_mb          trustworthiness        fps_avg                  responsiveness_ms
                                         median   mean      std        median  mean    std      median  mean   std     median  mean    std      median  mean    std
configuration_name
Baseline (JS)                            3512.85  3552.003   827.634   14.478  13.002   9.452   0.969   0.900  0.139   57.463  55.425   4.960   21.487  34.197  33.610
Configuration incorporating Distance     3504.15  3610.205   871.606   12.275  11.694   9.505   0.969   0.901  0.139   57.461  55.151   5.428   21.530  35.152  34.678
Configuration incorporating Tree         3478.60  3496.438   788.287    9.784  14.369  15.581   0.970   0.901  0.139   56.978  55.296   5.141   20.817  32.765  30.853
Configuration incorporating Matrix       3415.65  3501.413   811.153   10.721  11.783  10.638   0.970   0.900  0.141   57.510  55.552   4.783   19.517  31.793  32.978
Configuration incorporating NN Descent   3489.85  3603.920   844.058   16.870  17.982  11.250   0.970   0.899  0.142   57.006  55.307   5.058   20.427  34.698  35.178
Configuration incorporating Optimizer    2357.95  2516.780  1248.133    6.294   8.424   7.699   0.967   0.883  0.167   39.660  34.472  16.727   24.637  42.048  39.075
Fully WASM-enabled configuration         2237.20  2396.718  1176.987   12.668  13.215  12.102   0.967   0.884  0.165   38.262  33.735  17.241   20.933  36.713  35.388
Speedup vs Baseline:
   configuration                            speedup  improvement_%
0  Fully WASM-enabled configuration           1.570         57.020
4  Configuration incorporating Optimizer      1.490         48.979
2  Configuration incorporating Matrix         1.028          2.846
5  Configuration incorporating Tree           1.010          0.985
3  Configuration incorporating NN Descent     1.007          0.659
1  Configuration incorporating Distance       1.002          0.248
In [176]:
# Overview visualization: 4-panel metric comparison
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# Execution time
sns.boxplot(data=df_analysis, x='configuration_plot_label', y='execution_time_ms', order=plot_configuration_order,
            ax=axes[0, 0], showfliers=False, palette=configuration_plot_palette)
axes[0, 0].set_title('Execution Time', fontsize=13, fontweight='bold')
axes[0, 0].set_xlabel('')
axes[0, 0].set_ylabel('Execution Time (ms)', fontsize=11)
axes[0, 0].tick_params(axis='x', rotation=45)
axes[0, 0].grid(axis='y', alpha=0.3)

# Quality
if 'trustworthiness' in df_analysis:
    sns.boxplot(data=df_analysis, x='configuration_plot_label', y='trustworthiness', order=plot_configuration_order,
                ax=axes[0, 1], showfliers=False, palette=configuration_plot_palette)
    axes[0, 1].set_title('Embedding Quality', fontsize=13, fontweight='bold')
    axes[0, 1].set_xlabel('')
    axes[0, 1].set_ylabel('Trustworthiness', fontsize=11)
    axes[0, 1].tick_params(axis='x', rotation=45)
    axes[0, 1].grid(axis='y', alpha=0.3)

# FPS
if 'fps_avg' in df_analysis:
    sns.boxplot(data=df_analysis, x='configuration_plot_label', y='fps_avg', order=plot_configuration_order,
                ax=axes[1, 0], showfliers=False, palette=configuration_plot_palette)
    axes[1, 0].set_title('Responsiveness (FPS)', fontsize=13, fontweight='bold')
    axes[1, 0].set_xlabel('WASM Configuration', fontsize=12)
    axes[1, 0].set_ylabel('FPS', fontsize=11)
    axes[1, 0].tick_params(axis='x', rotation=45)
    axes[1, 0].grid(axis='y', alpha=0.3)

# Memory
if 'memory_delta_mb' in df_analysis:
    sns.boxplot(data=df_analysis, x='configuration_plot_label', y='memory_delta_mb', order=plot_configuration_order,
                ax=axes[1, 1], showfliers=False, palette=configuration_plot_palette)
    axes[1, 1].set_title('Memory Usage', fontsize=13, fontweight='bold')
    axes[1, 1].set_xlabel('WASM Configuration', fontsize=12)
    axes[1, 1].set_ylabel('Memory Delta (MB)', fontsize=11)
    axes[1, 1].tick_params(axis='x', rotation=45)
    axes[1, 1].grid(axis='y', alpha=0.3)

plt.tight_layout()
save_figure('../outputs/figures/overview_all_metrics.svg', bbox_inches='tight', dpi=200)
plt.show()
[Figure: four-panel overview — execution time, trustworthiness, FPS, and memory delta by configuration]

2.5 Baseline Analysis: Pure JavaScript Performance¶

Understanding the baseline performance characteristics before comparing WASM configurations. This section isolates the pure JavaScript implementation to establish reference distributions.

Baseline Takeaways¶

Performance Characteristics:

  • Execution time scales with dataset size (Scope), with larger datasets showing predictably longer execution times
  • Memory consumption is relatively stable across datasets, with modest variation by scope
  • Embedding quality (trustworthiness) remains consistent across different datasets and scopes, indicating reproducible UMAP behavior

Observations:

  • Pure JavaScript UMAP provides stable, predictable performance across test configurations
  • FPS and responsiveness metrics show variability depending on dataset complexity and scope
  • No anomalous outliers in baseline measurements, confirming data quality

This baseline establishes the reference point for evaluating WASM configuration improvements in subsequent sections.

In [177]:
# Execution time (baseline)
baseline_df = df_analysis[df_analysis['configuration_name'] == baseline_label].copy()
plot_scope_order = [scope for scope in scope_order if scope in baseline_df['Scope'].unique()]
plt.figure(figsize=(10, 6))
sns.boxplot(data=baseline_df, x='dataset_name', y='execution_time_ms', hue='Scope',
            hue_order=plot_scope_order, palette=scope_palette)
plt.title('Baseline (JS): Execution Time Distribution', fontsize=13, fontweight='bold')
plt.xlabel('Dataset')
plt.ylabel('Execution Time (ms)')
plt.xticks(rotation=45)
plt.legend(title='Scope', fontsize=9)
plt.grid(axis='y', alpha=0.3)
plt.tight_layout()
save_figure('../outputs/figures/baseline_execution_time.svg', bbox_inches='tight', dpi=200)
plt.show()
[Figure: Baseline (JS) execution time distribution by dataset and scope]
In [178]:
# Memory (baseline)
baseline_df = df_analysis[df_analysis['configuration_name'] == baseline_label].copy()
plot_scope_order = [scope for scope in scope_order if scope in baseline_df['Scope'].unique()]
if 'memory_delta_mb' in baseline_df.columns:
    plt.figure(figsize=(10, 6))
    sns.boxplot(data=baseline_df, x='dataset_name', y='memory_delta_mb', hue='Scope',
                hue_order=plot_scope_order, palette=scope_palette)
    plt.title('Baseline (JS): Memory Usage', fontsize=13, fontweight='bold')
    plt.xlabel('Dataset')
    plt.ylabel('Memory Delta (MB)')
    plt.xticks(rotation=45)
    plt.legend(title='Scope', fontsize=9)
    plt.grid(axis='y', alpha=0.3)
    plt.tight_layout()
    save_figure('../outputs/figures/baseline_memory.svg', bbox_inches='tight', dpi=200)
    plt.show()
else:
    print('⚠️ memory_delta_mb not present in baseline_df; memory plot skipped')
[Figure: Baseline (JS) memory delta by dataset and scope]
In [179]:
# Quality (baseline)
baseline_df = df_analysis[df_analysis['configuration_name'] == baseline_label].copy()
plot_scope_order = [scope for scope in scope_order if scope in baseline_df['Scope'].unique()]
if 'trustworthiness' in baseline_df.columns:
    plt.figure(figsize=(10, 6))
    sns.boxplot(data=baseline_df, x='dataset_name', y='trustworthiness', hue='Scope',
                hue_order=plot_scope_order, palette=scope_palette)
    plt.title('Baseline (JS): Embedding Quality', fontsize=13, fontweight='bold')
    plt.xlabel('Dataset')
    plt.ylabel('Trustworthiness')
    plt.xticks(rotation=45)
    plt.legend(title='Scope', fontsize=9)
    plt.grid(axis='y', alpha=0.3)
    plt.tight_layout()
    save_figure('../outputs/figures/baseline_quality.svg', bbox_inches='tight', dpi=200)
    plt.show()
else:
    print('⚠️ trustworthiness not present in baseline_df; quality plot skipped')
[Figure: Baseline (JS) trustworthiness by dataset and scope]
In [180]:
# FPS (baseline)
baseline_df = df_analysis[df_analysis['configuration_name'] == baseline_label].copy()
plot_scope_order = [scope for scope in scope_order if scope in baseline_df['Scope'].unique()]
if 'fps_avg' in baseline_df.columns:
    plt.figure(figsize=(10, 6))
    sns.boxplot(data=baseline_df, x='dataset_name', y='fps_avg', hue='Scope',
                hue_order=plot_scope_order, palette=scope_palette)
    plt.title('Baseline (JS): Frame Rate', fontsize=13, fontweight='bold')
    plt.xlabel('Dataset')
    plt.ylabel('FPS')
    plt.xticks(rotation=45)
    plt.axhline(y=60, color=reference_line_color, linestyle='--', linewidth=1, alpha=0.5)
    plt.legend(title='Scope', fontsize=9)
    plt.grid(axis='y', alpha=0.3)
    plt.tight_layout()
    save_figure('../outputs/figures/baseline_fps.svg', bbox_inches='tight', dpi=200)
    plt.show()
else:
    print('⚠️ fps_avg not present in baseline_df; fps plot skipped')
[Figure: Baseline (JS) frame rate by dataset and scope, with 60 FPS reference line]
In [181]:
# Responsiveness (baseline)
baseline_df = df_analysis[df_analysis['configuration_name'] == baseline_label].copy()
plot_scope_order = [scope for scope in scope_order if scope in baseline_df['Scope'].unique()]
if 'responsiveness_ms' in baseline_df.columns:
    plt.figure(figsize=(10, 6))
    sns.boxplot(data=baseline_df, x='dataset_name', y='responsiveness_ms', hue='Scope',
                hue_order=plot_scope_order, palette=scope_palette)
    plt.title('Baseline (JS): Interaction Latency', fontsize=13, fontweight='bold')
    plt.xlabel('Dataset')
    plt.ylabel('Responsiveness (ms)')
    plt.xticks(rotation=45)
    plt.legend(title='Scope', fontsize=9)
    plt.grid(axis='y', alpha=0.3)
    plt.tight_layout()
    save_figure('../outputs/figures/baseline_responsiveness.svg', bbox_inches='tight', dpi=200)
    plt.show()
else:
    print('⚠️ responsiveness_ms not present in baseline_df; responsiveness plot skipped')

print('✓ Baseline distributions visualized (each metric shown in its own output)')
[Figure: Baseline (JS) interaction latency by dataset and scope]
✓ Baseline distributions visualized (each metric shown in its own output)
In [182]:
# Filter baseline-only data
baseline_df = df_analysis[df_analysis['configuration_name'] == baseline_label].copy()

print(f"Baseline Measurements: {len(baseline_df)} observations")
print(f"Datasets: {baseline_df['dataset_name'].unique()}")
print(f"Scopes: {sorted(baseline_df['Scope'].unique())}")

# Summary statistics for baseline
baseline_summary = baseline_df.groupby(['dataset_name', 'Scope']).agg({
    'execution_time_ms': ['median', 'std'],
    'memory_delta_mb': ['median', 'std'],
    'trustworthiness': ['median', 'std'],
    'fps_avg': ['median', 'std'],
    'responsiveness_ms': ['median', 'std']
}).round(2)

print("\nBaseline Statistics by Dataset and Scope:")
display(baseline_summary.rename(columns={'execution_time_ms': 'Execution Time (ms)'}, level=0))
Baseline Measurements: 60 observations
Datasets: <StringArray>
[       'Iris Dataset (150 points, 4D)',             'Small Random (80 points)',
 'Swiss Roll (600 points, 3D manifold)',        'Medium Clustered (600 points)',
         'MNIST-like (1K points, 784D)',        '3D Dense Clusters (1K points)']
Length: 6, dtype: str
Scopes: ['large', 'mid', 'small']

Baseline Statistics by Dataset and Scope:
                                             Execution Time (ms)    memory_delta_mb    trustworthiness    fps_avg          responsiveness_ms
                                             median       std       median    std      median    std      median    std    median    std
dataset_name                          Scope
3D Dense Clusters (1K points)         large  4549.55    115.29       16.53    8.85      1.00     0.00      51.00    0.85     29.01    1.50
Iris Dataset (150 points, 4D)         small  2350.15     21.12        1.09    1.46      0.99     0.00      60.00    0.01      8.37    2.67
MNIST-like (1K points, 784D)          large  4517.80     70.19       16.05   10.90      0.61     0.00      60.00    0.09    105.68    4.81
Medium Clustered (600 points)         mid    3676.10     58.82       12.12    8.48      0.95     0.00      54.05    0.60     22.02    1.17
Small Random (80 points)              small  3150.00    173.47        7.92    8.59      0.86     0.01      48.20    1.03     21.03    1.25
Swiss Roll (600 points, 3D manifold)  mid    2959.85     27.11       17.98    2.53      0.99     0.00      59.97    0.05     17.36    0.86

3. Execution Time & Speedup¶

Detailed analysis of execution time and speedup relative to baseline.
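
Throughout this section, speedup is the ratio of median execution times. A minimal sketch of the definition (the helper name is ours, not part of the notebook's pipeline):

# Sketch: speedup definition used in this section (medians, not means)
def speedup(baseline_median_ms: float, configuration_median_ms: float) -> float:
    """Values > 1.0 mean the configuration is faster than the baseline."""
    return baseline_median_ms / configuration_median_ms

# Example with the overview medians: Baseline (JS) vs the fully WASM-enabled configuration
print(f"{speedup(3512.85, 2237.20):.3f}x")  # ≈ 1.570x, matching the table in Section 2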

3.1 Execution Time Distribution by Configuration¶

In [183]:
# Execution time statistics by configuration
execution_time_stats = df_analysis.groupby('configuration_name')['execution_time_ms'].describe()
print("Execution Time Statistics (ms):")
display(execution_time_stats.loc[configuration_order].round(2))

# Baseline metrics
if baseline_label in df_analysis['configuration_name'].values:
    baseline_execution_time = df_analysis[df_analysis['configuration_name'] == baseline_label]['execution_time_ms']
    print(f"\nBaseline (Pure JavaScript):") 
    print(f"  Median: {baseline_execution_time.median():.2f} ms")
    print(f"  Mean: {baseline_execution_time.mean():.2f} ms (±{baseline_execution_time.std():.2f})")
    print(f"  Range: {baseline_execution_time.min():.2f} - {baseline_execution_time.max():.2f} ms")
Execution Time Statistics (ms):
                                         count     mean       std     min     25%      50%      75%     max
configuration_name
Baseline (JS)                             60.0  3552.00    827.63  2316.5  2959.92  3512.85  4508.85  4790.0
Configuration incorporating Distance      60.0  3610.20    871.61  2345.6  2959.25  3504.15  4533.58  5896.3
Configuration incorporating Tree          60.0  3496.44    788.29  2314.5  2956.05  3478.60  4309.85  4722.8
Configuration incorporating Matrix        60.0  3501.41    811.15  2312.3  2928.30  3415.65  4364.98  4788.4
Configuration incorporating NN Descent    60.0  3603.92    844.06  2366.2  2959.63  3489.85  4530.22  5295.8
Configuration incorporating Optimizer    100.0  2516.78   1248.13   638.1  1601.47  2357.95  3611.50  6834.9
Fully WASM-enabled configuration         100.0  2396.72   1176.99   629.0  1569.50  2237.20  3494.87  4342.0
Baseline (Pure JavaScript):
  Median: 3512.85 ms
  Mean: 3552.00 ms (±827.63)
  Range: 2316.50 - 4790.00 ms
In [184]:
# Execution time distribution visualization
fig, ax = plt.subplots(figsize=(12, 6))

sns.boxplot(data=df_analysis, x='configuration_plot_label', y='execution_time_ms', order=plot_configuration_order, 
            ax=ax, showfliers=False, palette=configuration_plot_palette)
ax.set_title('Execution Time Distribution by WASM Configuration', fontsize=14, fontweight='bold')
ax.set_xlabel('WASM Configuration', fontsize=12, fontweight='bold')
ax.set_ylabel('Execution Time (ms)', fontsize=12, fontweight='bold')
ax.tick_params(axis='x', rotation=45)
ax.grid(axis='y', alpha=0.3)

plt.tight_layout()
save_figure('../outputs/figures/execution_time_distribution.svg', bbox_inches='tight', dpi=200)
plt.show()
[Figure: execution time distribution by WASM configuration]

3.2 Speedup Analysis¶

In [185]:
# Calculate detailed speedup metrics
def calculate_speedup(df, baseline='Baseline (JS)'):
    results = []
    data = df
    
    for (dataset, mach), group in data.groupby(['dataset_name', 'machine_type']):
        baseline_data = group[group['configuration_name'] == baseline]['execution_time_ms']
        if len(baseline_data) == 0:
            continue
        baseline_median = baseline_data.median()
        
        for configuration in group['configuration_name'].unique():
            if configuration == baseline:
                continue
            configuration_data = group[group['configuration_name'] == configuration]
            if len(configuration_data) == 0:
                continue
            
            configuration_median = configuration_data['execution_time_ms'].median()
            speedup = baseline_median / configuration_median
            
            results.append({
                'dataset': dataset,
                'machine': mach,
                'configuration': configuration,
                'baseline_median_ms': baseline_median,
                'configuration_median_ms': configuration_median,
                'speedup': speedup,
                'improvement_pct': (speedup - 1) * 100
            })
    
    return pd.DataFrame(results)

speedup_df = calculate_speedup(df_analysis)

# Aggregate speedup statistics
speedup_summary = speedup_df.groupby('configuration').agg({
    'speedup': ['mean', 'median', 'std', 'min', 'max'],
    'improvement_pct': ['mean', 'median']
}).round(3)

print("Speedup Summary (vs Baseline):")
display(speedup_summary)
Speedup Summary (vs Baseline):
                                         speedup                                      improvement_pct
                                         mean    median  std     min     max          mean     median
configuration
Fully WASM-enabled configuration         2.263   1.743   1.218   1.177   4.038        126.309  74.320
Configuration incorporating Distance     0.989   0.994   0.022   0.947   1.008         -1.103  -0.556
Configuration incorporating Matrix       1.014   1.015   0.007   1.004   1.023          1.370   1.508
Configuration incorporating NN Descent   0.985   0.991   0.026   0.941   1.014         -1.455  -0.862
Configuration incorporating Optimizer    2.111   1.662   1.046   1.171   3.461        111.050  66.200
Configuration incorporating Tree         1.017   1.008   0.023   0.997   1.052          1.666   0.817
In [186]:
# Speedup visualization
fig, ax = plt.subplots(figsize=(10, 6))

# Calculate median speedup for each configuration
configuration_speedups = speedup_df.groupby('configuration')['speedup'].median().sort_values(ascending=False)
colors = [configuration_colors.get(configuration, '#9d9d9d') for configuration in configuration_speedups.index]

bars = ax.barh(
    [configuration_plot_label(configuration) for configuration in configuration_speedups.index],
    configuration_speedups.values, color=colors, alpha=0.8)

# Add value labels
for bar in bars:
    width = bar.get_width()
    ax.text(width, bar.get_y() + bar.get_height()/2., f'{width:.2f}x',
           ha='left', va='center', fontsize=10, fontweight='bold',
           bbox=dict(boxstyle='round,pad=0.3', facecolor='white', alpha=0.8))

ax.axvline(x=1.0, color=reference_line_color, linestyle='--', linewidth=2, label='Baseline (JS) (1.0x)', alpha=0.7)
ax.set_xlabel('Speedup (vs Baseline (JS))', fontsize=12, fontweight='bold')
ax.set_title('Median Speedup by WASM Configuration', fontsize=14, fontweight='bold')
ax.legend(fontsize=10)
ax.grid(axis='x', alpha=0.3)

plt.tight_layout()
save_figure('../outputs/figures/speedup_analysis.svg', bbox_inches='tight', dpi=200)
plt.show()

print("\nInterpretation:")
print("  >1.0x = Faster than baseline (performance improvement)")
print("  <1.0x = Slower than baseline (performance regression)")
[Figure: median speedup by WASM configuration, with 1.0x baseline reference line]
Interpretation:
  >1.0x = Faster than baseline (performance improvement)
  <1.0x = Slower than baseline (performance regression)

4. Memory Usage¶

Analysis of memory consumption patterns across configurations. Note that memory_delta_mb can be negative; this most plausibly reflects garbage collection reclaiming more memory during a run than the run allocates.

In [187]:
# Memory statistics by configuration
if 'memory_delta_mb' in df_analysis:
    memory_stats = df_analysis.groupby('configuration_name')['memory_delta_mb'].describe()
    print("Memory Usage Statistics (MB):")
    display(memory_stats.loc[configuration_order].round(2))
    
    # Baseline comparison
    if baseline_label in df_analysis['configuration_name'].values:
        baseline_mem = df_analysis[df_analysis['configuration_name'] == baseline_label]['memory_delta_mb']
        print(f"\nBaseline Memory Usage:")
        print(f"  Median: {baseline_mem.median():.2f} MB")
        print(f"  Mean: {baseline_mem.mean():.2f} MB (±{baseline_mem.std():.2f})")
Memory Usage Statistics (MB):
                                         count   mean    std      min     25%    50%    75%    max
configuration_name
Baseline (JS)                             60.0   13.00    9.45    -5.31   4.52  14.48  19.83  39.47
Configuration incorporating Distance      60.0   11.69    9.50    -8.78   4.83  12.28  18.71  42.08
Configuration incorporating Tree          60.0   14.37   15.58   -18.60   4.21   9.78  19.78  60.22
Configuration incorporating Matrix        60.0   11.78   10.64   -17.46   4.50  10.72  16.82  40.21
Configuration incorporating NN Descent    60.0   17.98   11.25    -8.42   9.03  16.87  26.68  47.45
Configuration incorporating Optimizer    100.0    8.42    7.70    -4.81   1.80   6.29  14.49  26.41
Fully WASM-enabled configuration         100.0   13.22   12.10   -11.73   4.63  12.67  21.13  36.28
Baseline Memory Usage:
  Median: 14.48 MB
  Mean: 13.00 MB (±9.45)
In [188]:
# Memory usage visualization
if 'memory_delta_mb' in df_analysis:
    fig, ax = plt.subplots(figsize=(12, 6))
    
    sns.boxplot(data=df_analysis, x='configuration_plot_label', y='memory_delta_mb', order=plot_configuration_order,
                ax=ax, showfliers=False, palette=configuration_plot_palette)
    ax.set_title('Memory Usage by WASM Configuration', fontsize=14, fontweight='bold')
    ax.set_xlabel('WASM Configuration', fontsize=12, fontweight='bold')
    ax.set_ylabel('Memory Delta (MB)', fontsize=12, fontweight='bold')
    ax.tick_params(axis='x', rotation=45)
    ax.grid(axis='y', alpha=0.3)
    
    plt.tight_layout()
    save_figure('../outputs/figures/memory_usage.svg', bbox_inches='tight', dpi=200)
    plt.show()
[Figure: memory delta by WASM configuration]

5. Embedding Quality (Trustworthiness)¶

Analysis of UMAP embedding quality across configurations.
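
For reference, assuming the trustworthiness column follows the standard definition (as implemented, e.g., in scikit-learn's sklearn.manifold.trustworthiness), the metric for neighborhood size $k$ on $n$ points is

$$T(k) = 1 - \frac{2}{n k (2n - 3k - 1)} \sum_{i=1}^{n} \sum_{j \in \mathcal{U}_k(i)} \bigl(r(i, j) - k\bigr),$$

where $\mathcal{U}_k(i)$ is the set of points among the $k$ nearest neighbors of point $i$ in the embedding that are not among its $k$ nearest neighbors in the original space, and $r(i, j)$ is the rank of $j$ among the original-space neighbors of $i$. Values close to 1 indicate that the embedding introduces few spurious neighbors.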

In [189]:
# Quality statistics by configuration
if 'trustworthiness' in df_analysis:
    quality_stats = df_analysis.groupby('configuration_name')['trustworthiness'].describe()
    print("Embedding Quality (Trustworthiness):")
    display(quality_stats.loc[configuration_order].round(4))
    
    # Check if quality is preserved
    baseline_quality = df_analysis[df_analysis['configuration_name'] == baseline_label]['trustworthiness'].median()
    print(f"\nBaseline Quality: {baseline_quality:.4f}")
    
    for configuration in configuration_order:
        if configuration == baseline_label:
            continue
        configuration_quality = df_analysis[df_analysis['configuration_name'] == configuration]['trustworthiness'].median()
        diff = configuration_quality - baseline_quality
        pct_diff = (diff / baseline_quality) * 100
        status = "✓" if abs(pct_diff) < 1 else ("↑" if diff > 0 else "↓")
        print(f"  {configuration}: {configuration_quality:.4f} ({pct_diff:+.2f}%) {status}")
Embedding Quality (Trustworthiness):
                                         count   mean    std     min     25%     50%     75%     max
configuration_name
Baseline (JS)                             60.0   0.900   0.139   0.601   0.863   0.969   0.994   0.998
Configuration incorporating Distance      60.0   0.901   0.139   0.602   0.869   0.969   0.994   0.997
Configuration incorporating Tree          60.0   0.901   0.139   0.606   0.865   0.970   0.994   0.997
Configuration incorporating Matrix        60.0   0.900   0.141   0.601   0.865   0.970   0.994   0.997
Configuration incorporating NN Descent    60.0   0.899   0.142   0.599   0.860   0.970   0.994   0.997
Configuration incorporating Optimizer    100.0   0.883   0.167   0.553   0.845   0.967   0.994   0.997
Fully WASM-enabled configuration         100.0   0.884   0.165   0.562   0.840   0.968   0.994   0.997
Baseline Quality: 0.9688
  Configuration incorporating Distance: 0.9691 (+0.03%) ✓
  Configuration incorporating Tree: 0.9701 (+0.14%) ✓
  Configuration incorporating Matrix: 0.9696 (+0.09%) ✓
  Configuration incorporating NN Descent: 0.9698 (+0.11%) ✓
  Configuration incorporating Optimizer: 0.9666 (-0.22%) ✓
  Fully WASM-enabled configuration: 0.9675 (-0.13%) ✓
In [190]:
# Quality distribution visualization
if 'trustworthiness' in df_analysis:
    fig, ax = plt.subplots(figsize=(12, 6))
    
    sns.boxplot(data=df_analysis, x='configuration_plot_label', y='trustworthiness', order=plot_configuration_order,
                ax=ax, showfliers=False, palette=configuration_plot_palette)
    ax.set_title('Embedding Quality by WASM Configuration', fontsize=14, fontweight='bold')
    ax.set_xlabel('WASM Configuration', fontsize=12, fontweight='bold')
    ax.set_ylabel('Trustworthiness', fontsize=12, fontweight='bold')
    ax.tick_params(axis='x', rotation=45)
    ax.grid(axis='y', alpha=0.3)
    
    plt.tight_layout()
    save_figure('../outputs/figures/quality_analysis.svg', bbox_inches='tight', dpi=200)
    plt.show()
[Figure: trustworthiness by WASM configuration]

6. Responsiveness & UX Metrics¶

6.1 Frame Rate (FPS)¶

FPS statistics by configuration:

In [191]:
# FPS statistics by configuration
if 'fps_avg' in df_analysis:
    fps_stats = df_analysis.groupby('configuration_name')['fps_avg'].describe()
    print("FPS Statistics:")
    display(fps_stats.loc[configuration_order].round(2))
    
    # Baseline comparison
    baseline_fps = df_analysis[df_analysis['configuration_name'] == baseline_label]['fps_avg'].median()
    print(f"\nBaseline FPS: {baseline_fps:.2f}")
    
    for configuration in configuration_order:
        if configuration == baseline_label:
            continue
        configuration_fps = df_analysis[df_analysis['configuration_name'] == configuration]['fps_avg'].median()
        diff = configuration_fps - baseline_fps
        pct_diff = (diff / baseline_fps) * 100
        print(f"  {configuration}: {configuration_fps:.2f} FPS ({pct_diff:+.2f}%)")
FPS Statistics:
                                         count   mean    std     min     25%     50%     75%     max
configuration_name
Baseline (JS)                             60.0   55.43    4.96   45.99   51.00   57.46   60.00   60.14
Configuration incorporating Distance      60.0   55.15    5.43   38.43   50.15   57.46   60.00   60.13
Configuration incorporating Tree          60.0   55.30    5.14   44.99   51.05   56.98   60.00   60.13
Configuration incorporating Matrix        60.0   55.55    4.78   47.00   50.75   57.51   59.99   60.15
Configuration incorporating NN Descent    60.0   55.31    5.06   45.99   50.24   57.01   59.99   60.13
Configuration incorporating Optimizer    100.0   34.47   16.73    0.00   32.95   39.66   46.34   51.09
Fully WASM-enabled configuration         100.0   33.73   17.24    0.00   34.81   38.26   47.58   50.76
Baseline FPS: 57.46
  Configuration incorporating Distance: 57.46 FPS (-0.00%)
  Configuration incorporating Tree: 56.98 FPS (-0.84%)
  Configuration incorporating Matrix: 57.51 FPS (+0.08%)
  Configuration incorporating NN Descent: 57.01 FPS (-0.80%)
  Configuration incorporating Optimizer: 39.66 FPS (-30.98%)
  Fully WASM-enabled configuration: 38.26 FPS (-33.41%)
In [192]:
# FPS visualization
if 'fps_avg' in df_analysis:
    fig, ax = plt.subplots(figsize=(12, 6))
    
    sns.boxplot(data=df_analysis, x='configuration_plot_label', y='fps_avg', order=plot_configuration_order,
                ax=ax, showfliers=False, palette=configuration_plot_palette)
    ax.set_title('FPS Distribution by WASM Configuration', fontsize=14, fontweight='bold')
    ax.set_xlabel('WASM Configuration', fontsize=12, fontweight='bold')
    ax.set_ylabel('FPS (avg)', fontsize=12, fontweight='bold')
    ax.tick_params(axis='x', rotation=45)
    ax.grid(axis='y', alpha=0.3)
    
    # Add 60 FPS reference line
    ax.axhline(y=60, color=reference_line_color, linestyle='--', linewidth=2, alpha=0.7, label='60 FPS target')
    ax.legend()
    
    plt.tight_layout()
    save_figure('../outputs/figures/fps_analysis.svg', bbox_inches='tight', dpi=200)
    plt.show()
[Figure: FPS distribution by WASM configuration, with 60 FPS target line]

6.2 Interaction Latency (Responsiveness)¶

In [193]:
# Responsiveness statistics by configuration
if 'responsiveness_ms' in df_analysis:
    resp_stats = df_analysis.groupby('configuration_name')['responsiveness_ms'].describe()
    print("Responsiveness Statistics (ms):")
    display(resp_stats.loc[configuration_order].round(2))
    
    # Baseline comparison
    baseline_resp = df_analysis[df_analysis['configuration_name'] == baseline_label]['responsiveness_ms'].median()
    print(f"\nBaseline Responsiveness: {baseline_resp:.2f} ms")
    
    for configuration in configuration_order:
        if configuration == baseline_label:
            continue
        configuration_resp = df_analysis[df_analysis['configuration_name'] == configuration]['responsiveness_ms'].median()
        diff = configuration_resp - baseline_resp
        pct_diff = (diff / baseline_resp) * 100
        status = "✓" if diff < 0 else "↑"
        print(f"  {configuration}: {configuration_resp:.2f} ms ({pct_diff:+.2f}%) {status}")
    
    print("\nNote: Lower responsiveness = Better (less latency)")
Responsiveness Statistics (ms):
                                         count   mean    std     min    25%     50%     75%      max
configuration_name
Baseline (JS)                             60.0   34.20   33.61   0.00   17.39   21.49   28.97   116.88
Configuration incorporating Distance      60.0   35.15   34.68   7.77   18.16   21.53   30.07   130.15
Configuration incorporating Tree          60.0   32.76   30.85   0.00   17.53   20.82   29.72   104.05
Configuration incorporating Matrix        60.0   31.79   32.98   0.00   15.56   19.52   27.12   111.88
Configuration incorporating NN Descent    60.0   34.70   35.18   8.14   16.90   20.43   28.10   121.46
Configuration incorporating Optimizer    100.0   42.05   39.07   9.09   19.27   24.64   34.21   143.05
Fully WASM-enabled configuration         100.0   36.71   35.39   9.12   15.92   20.93   27.68   128.20
Baseline Responsiveness: 21.49 ms
  Configuration incorporating Distance: 21.53 ms (+0.20%) ↑
  Configuration incorporating Tree: 20.82 ms (-3.12%) ✓
  Configuration incorporating Matrix: 19.52 ms (-9.17%) ✓
  Configuration incorporating NN Descent: 20.43 ms (-4.93%) ✓
  Configuration incorporating Optimizer: 24.64 ms (+14.66%) ↑
  Fully WASM-enabled configuration: 20.93 ms (-2.58%) ✓

Note: Lower responsiveness = Better (less latency)
In [194]:
# Responsiveness visualization
if 'responsiveness_ms' in df_analysis:
    fig, ax = plt.subplots(figsize=(12, 6))
    
    sns.boxplot(data=df_analysis, x='configuration_plot_label', y='responsiveness_ms', order=plot_configuration_order,
                ax=ax, showfliers=False, palette=configuration_plot_palette)
    ax.set_title('Interaction Latency by WASM Configuration', fontsize=14, fontweight='bold')
    ax.set_xlabel('WASM Configuration', fontsize=12, fontweight='bold')
    ax.set_ylabel('Responsiveness (ms)', fontsize=12, fontweight='bold')
    ax.tick_params(axis='x', rotation=45)
    ax.grid(axis='y', alpha=0.3)
    
    plt.tight_layout()
    save_figure('../outputs/figures/responsiveness_analysis.svg', bbox_inches='tight', dpi=200)
    plt.show()
[Figure: interaction latency by WASM configuration]

UX Impact Summary:

  • p50 < 50ms: Imperceptible latency, feels instant
  • p50 50-100ms: Noticeable but acceptable for interactive tasks
  • p95 < 100ms: Smooth experience for 95% of interactions (RAIL guidelines)
  • p95 > 200ms: Degraded UX, users perceive sluggishness

Low p95/p50 ratios indicate predictable performance, critical for user trust in interactive visualizations.
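
As a hedged illustration, these thresholds can be folded into a small classifier (the function name and exact tiering are ours, for exposition only, not part of the measurement pipeline):

# Sketch: qualitative UX rating derived from the p50/p95 thresholds above
def rate_latency(p50_ms: float, p95_ms: float) -> str:
    if p95_ms > 200:
        return 'degraded: users perceive sluggishness (p95 > 200 ms)'
    if p50_ms < 50 and p95_ms < 100:
        return 'smooth: instant-feeling p50, RAIL-compliant p95'
    return 'acceptable: noticeable latency in the tail'

# Example with the Baseline (JS) percentiles reported later in this section
print(rate_latency(21.49, 107.82))  # acceptable: noticeable latency in the tail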

In [23]:
# Recompute percentile_df for downstream visualization so reruns cannot reuse stale state
if 'responsiveness_ms' in df_analysis:
    percentile_results = []
    
    for configuration in configuration_order:
        configuration_data = df_analysis[df_analysis['configuration_name'] == configuration]['responsiveness_ms'].dropna()
        if len(configuration_data) == 0:
            continue
        
        p50 = configuration_data.median()
        p95 = configuration_data.quantile(0.95)
        p99 = configuration_data.quantile(0.99)
        
        percentile_results.append({
            'configuration': configuration,
            'p50_median': p50,
            'p95': p95,
            'p99': p99,
            'p95_p50_ratio': p95 / p50 if p50 > 0 else float('inf')
        })
    
    percentile_df = pd.DataFrame(percentile_results)
else:
    percentile_df = pd.DataFrame()

# Visualize p50/p95 percentiles
if 'responsiveness_ms' in df_analysis and len(percentile_df) > 0:
    fig, axes = plt.subplots(1, 2, figsize=(16, 6))
    
    # Side-by-side p50 and p95 comparison
    x_pos = np.arange(len(percentile_df))
    width = 0.35
    
    p50_colors = [configuration_colors.get(configuration, '#9d9d9d') for configuration in percentile_df['configuration']]
    p95_colors = [configuration_colors.get(configuration, '#9d9d9d') for configuration in percentile_df['configuration']]
    bars1 = axes[0].bar(x_pos - width/2, percentile_df['p50_median'], width, 
                        label='p50 (Median)', alpha=0.55, color=p50_colors,
                        edgecolor=reference_line_color, linewidth=0.6)
    bars2 = axes[0].bar(x_pos + width/2, percentile_df['p95'], width,
                        label='p95', alpha=0.9, color=p95_colors, hatch='//',
                        edgecolor=reference_line_color, linewidth=0.6)
    
    axes[0].set_xlabel('WASM Configuration', fontsize=12, fontweight='bold')
    axes[0].set_ylabel('Latency (ms)', fontsize=12, fontweight='bold')
    axes[0].set_title('Latency: p50 vs p95 by Configuration', fontsize=14, fontweight='bold')
    axes[0].set_xticks(x_pos)
    axes[0].set_xticklabels([configuration_plot_label(configuration) for configuration in percentile_df['configuration']], rotation=45, ha='right')
    for label in axes[0].get_xticklabels():
        label.set_color(configuration_plot_palette.get(label.get_text(), reference_line_color))
    axes[0].axhline(y=100, color=reference_line_color, linestyle='--', linewidth=2, alpha=0.5,
                    label='100ms threshold')
    axes[0].legend(fontsize=11)  # drawn after the axhline so the threshold label appears
    axes[0].grid(axis='y', alpha=0.3)
    
    # Add value labels on bars
    for bar in bars1:
        height = bar.get_height()
        axes[0].text(bar.get_x() + bar.get_width()/2., height,
                    f'{height:.1f}', ha='center', va='bottom', fontsize=8)
    for bar in bars2:
        height = bar.get_height()
        axes[0].text(bar.get_x() + bar.get_width()/2., height,
                    f'{height:.1f}', ha='center', va='bottom', fontsize=8)
    
    # p95/p50 ratio (consistency)
    percentile_df_sorted = percentile_df.sort_values('p95_p50_ratio')
    colors_ratio = [configuration_colors.get(configuration, '#9d9d9d') for configuration in percentile_df_sorted['configuration']]
    
    bars_ratio = axes[1].barh([configuration_plot_label(configuration) for configuration in percentile_df_sorted['configuration']], 
                               percentile_df_sorted['p95_p50_ratio'],
                               color=colors_ratio, alpha=0.85,
                               edgecolor=reference_line_color, linewidth=0.6)
    for label in axes[1].get_yticklabels():
        label.set_color(configuration_plot_palette.get(label.get_text(), reference_line_color))
    
    # Add value labels
    for bar in bars_ratio:
        width = bar.get_width()
        axes[1].text(width, bar.get_y() + bar.get_height()/2., f'{width:.2f}',
                    ha='left', va='center', fontsize=10, fontweight='bold',
                    bbox=dict(boxstyle='round,pad=0.3', facecolor='white', alpha=0.8))
    
    axes[1].axvline(x=1.5, color=reference_line_color, linestyle='--', linewidth=1, alpha=0.4)
    axes[1].axvline(x=2.0, color=reference_line_color, linestyle='--', linewidth=1, alpha=0.5)
    axes[1].set_xlabel('p95/p50 Ratio', fontsize=12, fontweight='bold')
    axes[1].set_title('Latency Consistency (Lower = More Predictable)', fontsize=14, fontweight='bold')
    axes[1].grid(axis='x', alpha=0.3)
    
    plt.tight_layout()
    save_figure('../outputs/figures/latency_percentiles.svg', bbox_inches='tight', dpi=200)
    plt.show()
    
    print("\nConsistency Assessment:")
    print("  Bars use the shared configuration colors used throughout the notebook")
    print("  Lower p95/p50 ratio = more predictable latency")
    print("  Dashed reference lines mark 1.5x and 2.0x p95/p50 ratios")
[Figure: p50 vs p95 latency by configuration (left) and p95/p50 consistency ratio (right)]
Consistency Assessment:
  Bars use the shared configuration colors used throughout the notebook
  Lower p95/p50 ratio = more predictable latency
  Dashed reference lines mark 1.5x and 2.0x p95/p50 ratios
In [196]:
# Calculate p50 and p95 latency percentiles
if 'responsiveness_ms' in df_analysis:
    percentile_results = []
    
    for configuration in configuration_order:
        configuration_data = df_analysis[df_analysis['configuration_name'] == configuration]['responsiveness_ms'].dropna()
        if len(configuration_data) == 0:
            continue
        
        p50 = configuration_data.median()
        p95 = configuration_data.quantile(0.95)
        p99 = configuration_data.quantile(0.99)
        
        percentile_results.append({
            'configuration': configuration,
            'p50_median': p50,
            'p95': p95,
            'p99': p99,
            'p95_p50_ratio': p95 / p50 if p50 > 0 else float('inf')
        })
    
    percentile_df = pd.DataFrame(percentile_results)
    
    print("Latency Percentiles (ms):")
    print("="*80)
    display(percentile_df.round(2))
    
    print("\nInterpretation:")
    print("  p50 (median): Typical user experience")
    print("  p95: 95% of interactions complete within this time (worst-case threshold)")
    print("  p95/p50 ratio: Consistency indicator (lower = more consistent)")
    print("  Ideal p95 < 100ms for smooth interactive experience")
Latency Percentiles (ms):
================================================================================
   configuration                            p50_median     p95     p99  p95_p50_ratio
0  Baseline (JS)                                 21.49  107.82  114.66           5.02
1  Configuration incorporating Distance          21.53  109.97  118.35           5.11
2  Configuration incorporating Tree              20.82  101.23  103.81           4.86
3  Configuration incorporating Matrix            19.52  102.17  109.53           5.23
4  Configuration incorporating NN Descent        20.43  113.92  119.76           5.58
5  Configuration incorporating Optimizer         24.64  122.73  141.43           4.98
6  Fully WASM-enabled configuration              20.93  107.96  120.43           5.16
Interpretation:
  p50 (median): Typical user experience
  p95: 95% of interactions complete within this time (worst-case threshold)
  p95/p50 ratio: Consistency indicator (lower = more consistent)
  Ideal p95 < 100ms for smooth interactive experience

Latency Percentiles (p50/p95)¶

Percentile analysis provides insight into typical (p50/median) and worst-case (p95) user experience. For interactive applications, p95 latency is critical as it represents the experience for 95% of interactions.

7. Dataset Size Effects¶

How each metric scales with dataset size.

In [197]:
# Prepare dataset size analysis
df_analysis['dataset_size'] = pd.to_numeric(df_analysis['dataset_size'], errors='coerce')

# Create size categories
df_analysis['size_category'] = pd.cut(
    df_analysis['dataset_size'],
    bins=[0, 200, 800, float('inf')],
    labels=['Small (≤200)', 'Medium (200-800)', 'Large (>800)']
)

print("Dataset Size Distribution:")
print(df_analysis.groupby('dataset_name')['dataset_size'].first().sort_values())
print(f"\nSize category counts:")
print(df_analysis['size_category'].value_counts().sort_index())
Dataset Size Distribution:
dataset_name
Small Random (80 points)                  80
Iris Dataset (150 points, 4D)            150
Medium Clustered (600 points)            600
Swiss Roll (600 points, 3D manifold)     600
3D Dense Clusters (1K points)           1000
MNIST-like (1K points, 784D)            1000
Name: dataset_size, dtype: int64

Size category counts:
size_category
Small (≤200)        140
Medium (200-800)    180
Large (>800)        180
Name: count, dtype: int64

Execution Time Scaling¶

In [26]:
# Execution time scaling with dataset size
fig, ax = plt.subplots(figsize=(12, 7))

for configuration in configuration_order:
    configuration_data = df_analysis[df_analysis['configuration_name'] == configuration]
    if len(configuration_data) == 0:
        continue
    size_execution_time = configuration_data.groupby('dataset_size')['execution_time_ms'].median().sort_index()
    
    sizes = size_execution_time.index.values
    execution_times = size_execution_time.values
    
    color = configuration_colors.get(configuration, '#9d9d9d')

    # Observed measurements are shown as markers; the solid line is the smoothed observed trend.
    if len(sizes) >= 3:
        try:
            # Polynomial fit in log-space for smooth curves
            log_sizes = np.log10(sizes)
            log_execution_times = np.log10(execution_times)
            
            # Use degree 2 polynomial for better fit
            poly_degree = min(2, len(sizes) - 1)
            poly_coeffs = np.polyfit(log_sizes, log_execution_times, poly_degree)
            poly_func = np.poly1d(poly_coeffs)
            
            # Generate smooth curve
            log_sizes_smooth = np.linspace(log_sizes.min(), log_sizes.max(), 100)
            sizes_smooth = 10 ** log_sizes_smooth
            log_execution_times_smooth = poly_func(log_sizes_smooth)
            execution_times_smooth = 10 ** log_execution_times_smooth
            
            ax.plot(sizes_smooth, execution_times_smooth, linestyle=observed_linestyle,
                    linewidth=observed_linewidth, alpha=observed_alpha, color=color,
                    label=configuration_plot_label(configuration))
            ax.scatter(sizes, execution_times, s=observed_markersize ** 2,
                       marker=observed_marker, alpha=observed_alpha, color=color, zorder=3)
        except Exception:
            # Fall back to a direct observed line if smoothing fails.
            ax.plot(sizes, execution_times, marker=observed_marker, linestyle=observed_linestyle,
                    linewidth=observed_linewidth, markersize=observed_markersize,
                    label=configuration_plot_label(configuration), alpha=observed_alpha,
                    zorder=3, color=color)
    else:
        ax.plot(sizes, execution_times, marker=observed_marker, linestyle=observed_linestyle,
                linewidth=observed_linewidth, markersize=observed_markersize,
                label=configuration_plot_label(configuration), alpha=observed_alpha,
                zorder=3, color=color)

ax.set_xlabel('Dataset Size (samples)', fontsize=12, fontweight='bold')
ax.set_ylabel('Median Execution Time (ms)', fontsize=12, fontweight='bold')
ax.set_title('Execution Time Scaling by Dataset Size (Smoothed)', fontsize=14, fontweight='bold')
ax.legend(fontsize=10)
ax.grid(alpha=0.3)
ax.set_xscale('log')
ax.set_yscale('log')

plt.tight_layout()
save_figure('../outputs/figures/execution_time_scaling.svg', bbox_inches='tight', dpi=200)
plt.show()
[Figure: execution time scaling with dataset size, log-log axes, smoothed observed trends]

7.1 Execution Time Scaling Prediction¶

Based on the observed, approximately linear relationship between dataset size and execution time in log-log space, we can extrapolate performance for larger datasets. This prediction assumes:

  • Computational complexity remains consistent (UMAP's approximate O(n log n) to O(n^1.3) complexity)
  • No fundamental algorithm changes at scale
  • Similar hardware constraints (memory, CPU cache behavior)

The following analysis fits low-degree polynomial models in log-space (locally a power law) to predict execution time for datasets beyond our test range (up to 10,000 samples).
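
Concretely, the degree-2 fit in log-space generalizes the pure power law $t(n) = a\,n^b$:

$$\log_{10} t = c_2 (\log_{10} n)^2 + c_1 \log_{10} n + c_0, \qquad b_{\mathrm{eff}}(n) = \frac{\mathrm{d}\,\log_{10} t}{\mathrm{d}\,\log_{10} n} = 2 c_2 \log_{10} n + c_1,$$

so the "effective exponent at the largest observed size" reported below is the local slope of the fitted curve in log-log space; setting $c_2 = 0$ recovers a constant exponent $b$.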

In [199]:
# Predict execution time scaling using polynomial fits in log-space
# This allows the curve to follow data more closely than a simple power-law

# Prepare data for fitting
execution_time_predictions = {}
predicted_sizes = np.array([2000, 5000, 10000])  # Extrapolate to larger datasets

fig, ax = plt.subplots(figsize=(14, 8))
used_configurations = []

for configuration in configuration_order:
    configuration_data = df_analysis[df_analysis['configuration_name'] == configuration]
    if len(configuration_data) == 0:
        continue
    
    # Get observed data
    size_execution_time = configuration_data.groupby('dataset_size')['execution_time_ms'].median().sort_index()
    if len(size_execution_time) < 3:  # Need at least 3 points to fit
        continue
    
    sizes = size_execution_time.index.values
    execution_times = size_execution_time.values
    
    color = configuration_colors.get(configuration, '#9d9d9d')

    # Fit polynomial in log-space for better flexibility
    try:
        log_sizes = np.log10(sizes)
        log_execution_times = np.log10(execution_times)
        
        # Use degree 2 polynomial if we have enough points, otherwise degree 1
        poly_degree = min(2, len(sizes) - 1)
        poly_coeffs = np.polyfit(log_sizes, log_execution_times, poly_degree)
        poly_func = np.poly1d(poly_coeffs)
        
        # Predict for larger sizes
        log_predicted_sizes = np.log10(predicted_sizes)
        log_predicted_execution_times = poly_func(log_predicted_sizes)
        predicted_execution_times = 10 ** log_predicted_execution_times
        
        # Calculate approximate exponent at the last observed point
        derivative = np.polyder(poly_func)
        effective_exponent = derivative(log_sizes[-1])
        
        execution_time_predictions[configuration] = {
            'exponent': effective_exponent,
            'poly_coeffs': poly_coeffs.tolist(),
            'predictions': dict(zip(predicted_sizes, predicted_execution_times))
        }
        used_configurations.append(configuration)
        
        # Observed measurements
        ax.plot(sizes, execution_times, marker=observed_marker, linestyle=observed_linestyle,
                linewidth=observed_linewidth, markersize=observed_markersize,
                alpha=observed_alpha, color=color, zorder=3)
        
        # Fitted curve and extrapolation
        all_log_sizes = np.linspace(np.log10(sizes.min()), np.log10(10000), 100)
        all_sizes = 10 ** all_log_sizes
        fitted_log_execution_times = poly_func(all_log_sizes)
        fitted_execution_times = 10 ** fitted_log_execution_times
        ax.plot(all_sizes, fitted_execution_times, linestyle=fitted_linestyle,
                linewidth=fitted_linewidth, alpha=fitted_alpha, color=color)
        
        # Predicted points beyond the observed range
        ax.scatter(predicted_sizes, predicted_execution_times, s=predicted_marker_size,
                   marker=predicted_marker, linewidth=2.5, alpha=predicted_alpha,
                   color=color, zorder=4)
        
    except Exception as e:
        print(f"Warning: Could not fit {configuration}: {e}")
        continue

ax.set_xlabel('Dataset Size (samples)', fontsize=12, fontweight='bold')
ax.set_ylabel('Predicted Execution Time (ms)', fontsize=12, fontweight='bold')
ax.set_title('Execution Time Scaling Prediction (Polynomial Fit in Log-Space)', fontsize=14, fontweight='bold')
ax.set_xscale('log')
ax.set_yscale('log')
ax.grid(alpha=0.3, which='both')
add_scaling_legends(ax, used_configurations, include_predicted_points=True,
                    config_loc='upper left', style_loc='lower right', config_ncol=2)

plt.tight_layout()
save_figure('../outputs/figures/execution_time_prediction.svg', bbox_inches='tight', dpi=200)
plt.show()

# Display prediction table
print("\n" + "="*100)
print("EXECUTION TIME SCALING PREDICTIONS")
print("="*100)
print()

for configuration, pred_data in execution_time_predictions.items():
    print(f"📊 {configuration}:")
    print(f"   Effective exponent at largest observed size: {pred_data['exponent']:.3f}")
    print(f"   Predicted execution_times:")
    for size, execution_time in pred_data['predictions'].items():
        print(f"      - {size:,} samples: {execution_time:,.1f} ms ({execution_time/1000:.1f}s)")
    print()
[Figure: execution time scaling prediction — polynomial fit in log-space with extrapolated points at 2,000/5,000/10,000 samples]
====================================================================================================
EXECUTION TIME SCALING PREDICTIONS
====================================================================================================

📊 Baseline (JS):
   Effective exponent at largest observed size: 0.883
   Predicted execution times:
      - 2,000 samples: 9,823.1 ms (9.8s)
      - 5,000 samples: 40,307.4 ms (40.3s)
      - 10,000 samples: 161,312.3 ms (161.3s)

📊 Configuration incorporating Distance:
   Effective exponent at largest observed size: 0.917
   Predicted execution times:
      - 2,000 samples: 10,269.5 ms (10.3s)
      - 5,000 samples: 45,387.4 ms (45.4s)
      - 10,000 samples: 196,495.9 ms (196.5s)

📊 Configuration incorporating Tree:
   Effective exponent at largest observed size: 0.785
   Predicted execution times:
      - 2,000 samples: 8,698.9 ms (8.7s)
      - 5,000 samples: 29,909.2 ms (29.9s)
      - 10,000 samples: 99,986.0 ms (100.0s)

📊 Configuration incorporating Matrix:
   Effective exponent at largest observed size: 0.864
   Predicted execution times:
      - 2,000 samples: 9,467.0 ms (9.5s)
      - 5,000 samples: 37,688.3 ms (37.7s)
      - 10,000 samples: 146,400.9 ms (146.4s)

📊 Configuration incorporating NN Descent:
   Effective exponent at largest observed size: 0.935
   Predicted execution times:
      - 2,000 samples: 10,444.5 ms (10.4s)
      - 5,000 samples: 47,774.7 ms (47.8s)
      - 10,000 samples: 214,314.7 ms (214.3s)

📊 Configuration incorporating Optimizer:
   Effective exponent at largest observed size: 1.743
   Predicted execution times:
      - 2,000 samples: 16,475.2 ms (16.5s)
      - 5,000 samples: 211,762.0 ms (211.8s)
      - 10,000 samples: 2,423,859.8 ms (2423.9s)

📊 Fully WASM-enabled configuration:
   Effective exponent at largest observed size: 1.710
   Predicted execution times:
      - 2,000 samples: 15,283.0 ms (15.3s)
      - 5,000 samples: 179,665.9 ms (179.7s)
      - 10,000 samples: 1,862,995.6 ms (1863.0s)

Speedup by Dataset Size¶
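Median execution-time speedup versus the JS baseline is computed per (dataset size, machine type) group and plotted against dataset size, with a smoothed trend per configuration.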

In [28]:
# Calculate speedup by size
def calculate_speedup_by_size(df, baseline='Baseline (JS)'):
    results = []
    for (size, machine), group in df.groupby(['dataset_size', 'machine_type']):
        baseline_data = group[group['configuration_name'] == baseline]['execution_time_ms']
        if len(baseline_data) == 0:
            continue
        baseline_median = baseline_data.median()
        
        for configuration in group['configuration_name'].unique():
            if configuration == baseline:
                continue
            configuration_data = group[group['configuration_name'] == configuration]
            if len(configuration_data) == 0:
                continue
            configuration_median = configuration_data['execution_time_ms'].median()
            speedup = baseline_median / configuration_median
            
            results.append({
                'dataset_size': size,
                'configuration': configuration,
                'speedup': speedup
            })
    return pd.DataFrame(results)

speedup_by_size = calculate_speedup_by_size(df_analysis)

# Visualize speedup trends with smooth curves
fig, ax = plt.subplots(figsize=(12, 7))

for configuration in [f for f in configuration_order if f in speedup_by_size['configuration'].unique()]:
    color = configuration_colors.get(configuration, '#9d9d9d')
    configuration_data = speedup_by_size[speedup_by_size['configuration'] == configuration].sort_values('dataset_size')
    if len(configuration_data) == 0:
        continue
    
    sizes = configuration_data['dataset_size'].values
    speedups = configuration_data['speedup'].values
    
    # Observed measurements are shown as markers; the solid line is the smoothed observed trend.
    if len(configuration_data) >= 3:
        try:
            # Polynomial fit in log-space for x-axis (since it's log scale)
            log_sizes = np.log10(sizes)
            
            # Use degree 2 polynomial for smoothness
            poly_degree = min(2, len(configuration_data) - 1)
            poly_coeffs = np.polyfit(log_sizes, speedups, poly_degree)
            poly_func = np.poly1d(poly_coeffs)
            
            # Generate smooth curve
            log_sizes_smooth = np.linspace(log_sizes.min(), log_sizes.max(), 100)
            sizes_smooth = 10 ** log_sizes_smooth
            speedups_smooth = poly_func(log_sizes_smooth)
            
            ax.plot(sizes_smooth, speedups_smooth, linestyle=observed_linestyle,
                    linewidth=observed_linewidth, alpha=observed_alpha, color=color,
                    label=configuration_plot_label(configuration))
            ax.scatter(sizes, speedups, s=observed_markersize ** 2,
                       marker=observed_marker, alpha=observed_alpha, color=color, zorder=3)
        except Exception:
            # Fall back to a direct observed line if smoothing fails.
            ax.plot(sizes, speedups, marker=observed_marker, linestyle=observed_linestyle,
                    linewidth=observed_linewidth, markersize=observed_markersize,
                    label=configuration_plot_label(configuration), alpha=observed_alpha,
                    color=color, zorder=3)
    else:
        ax.plot(sizes, speedups, marker=observed_marker, linestyle=observed_linestyle,
                linewidth=observed_linewidth, markersize=observed_markersize,
                label=configuration_plot_label(configuration), alpha=observed_alpha,
                color=color, zorder=3)

ax.axhline(y=1.0, color=reference_line_color, linestyle='--', linewidth=2, alpha=0.7, label='Baseline (JS) (1.0x)')
ax.set_xlabel('Dataset Size (samples)', fontsize=12, fontweight='bold')
ax.set_ylabel('Speedup vs Baseline (JS)', fontsize=12, fontweight='bold')
ax.set_title('Speedup Trend by Dataset Size (Smoothed)', fontsize=14, fontweight='bold')
ax.legend(fontsize=10)
ax.grid(alpha=0.3)
ax.set_xscale('log')

plt.tight_layout()
save_figure('../outputs/figures/speedup_by_size.svg', bbox_inches='tight', dpi=200)
plt.show()
[Figure: Speedup Trend by Dataset Size (Smoothed)]

7.2 Speedup Trend Prediction¶

By analyzing how speedup (WASM vs baseline) changes with dataset size, we can predict whether WASM optimizations become more or less beneficial at scale. Key questions:

  • Do WASM configurations show increasing returns as datasets grow?
  • Are there diminishing returns or convergence at larger scales?
  • Which configurations are most scalable?

The analysis below models speedup trends and extrapolates to predict performance gains for 2K, 5K, and 10K sample datasets.
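As a concrete check, each predicted speedup is simply the ratio of two execution-time predictions from the cell above: at 10,000 samples the baseline model predicts ≈161,312 ms and the Tree configuration ≈99,986 ms, so the predicted speedup is 161312 / 99986 ≈ 1.61x, matching the Tree value reported below.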

In [201]:
# Predict speedup trends for larger datasets
# Helper function to predict execution time from polynomial coefficients
def predict_execution_time_from_poly(size, poly_coeffs):
    """Predict execution time using polynomial model in log-space"""
    log_size = np.log10(size)
    poly_func = np.poly1d(poly_coeffs)
    log_execution_time = poly_func(log_size)
    return 10 ** log_execution_time

# Use execution time predictions from previous cell to compute predicted speedups
speedup_predictions = {}
predicted_sizes = np.array([2000, 5000, 10000])

fig, ax = plt.subplots(figsize=(14, 8))
used_configurations = []

# Get baseline predictions
baseline_configuration = 'Baseline (JS)'
if baseline_configuration in execution_time_predictions:
    baseline_pred = execution_time_predictions[baseline_configuration]
    
    for configuration in [f for f in configuration_order if f in speedup_by_size['configuration'].unique()]:
        color = configuration_colors.get(configuration, '#9d9d9d')
        configuration_data = speedup_by_size[speedup_by_size['configuration'] == configuration].sort_values('dataset_size')
        if len(configuration_data) < 3:
            continue
        
        sizes = configuration_data['dataset_size'].values
        speedups = configuration_data['speedup'].values
        
        # Predict speedup using ratio of predicted execution_times
        try:
            if configuration in execution_time_predictions:
                configuration_pred = execution_time_predictions[configuration]
                predicted_speedups = []
                for size in predicted_sizes:
                    baseline_time = predict_execution_time_from_poly(size, baseline_pred['poly_coeffs'])
                    configuration_time = predict_execution_time_from_poly(size, configuration_pred['poly_coeffs'])
                    predicted_speedup = baseline_time / configuration_time
                    predicted_speedups.append(predicted_speedup)
                
                speedup_predictions[configuration] = dict(zip(predicted_sizes, predicted_speedups))
                used_configurations.append(configuration)
                
                # Observed measurements
                ax.plot(sizes, speedups, marker=observed_marker, linestyle=observed_linestyle,
                        linewidth=observed_linewidth, markersize=observed_markersize,
                        alpha=observed_alpha, color=color, zorder=3)
                
                # Fitted and extrapolated trend
                all_sizes = np.logspace(np.log10(sizes.min()), np.log10(10000), 100)
                fitted_speedups = []
                for s in all_sizes:
                    b_time = predict_execution_time_from_poly(s, baseline_pred['poly_coeffs'])
                    configuration_time = predict_execution_time_from_poly(s, configuration_pred['poly_coeffs'])
                    fitted_speedups.append(b_time / configuration_time)
                ax.plot(all_sizes, fitted_speedups, linestyle=fitted_linestyle,
                        linewidth=fitted_linewidth, alpha=fitted_alpha, color=color)
                
                # Predicted points beyond the observed range
                ax.scatter(predicted_sizes, predicted_speedups, s=predicted_marker_size,
                           marker=predicted_marker, linewidth=2.5, alpha=predicted_alpha,
                           color=color, zorder=4)
        
        except Exception as e:
            print(f"Warning: Could not fit speedup model for {configuration}: {e}")
            continue

ax.axhline(y=1.0, color=reference_line_color, linestyle='--', linewidth=2, alpha=0.5)
ax.set_xlabel('Dataset Size (samples)', fontsize=12, fontweight='bold')
ax.set_ylabel('Predicted Speedup vs Baseline (JS)', fontsize=12, fontweight='bold')
ax.set_title('Speedup Trend Prediction (Polynomial Fit Extrapolation)', fontsize=14, fontweight='bold')
ax.set_xscale('log')
ax.grid(alpha=0.3)
add_scaling_legends(ax, used_configurations, include_predicted_points=True,
                    include_reference_line=True, config_loc='upper left',
                    style_loc='lower right', config_ncol=2)

plt.tight_layout()
save_figure('../outputs/figures/speedup_prediction.svg', bbox_inches='tight', dpi=200)
plt.show()

# Display speedup predictions
print("\n" + "="*100)
print("SPEEDUP TREND PREDICTIONS")
print("="*100)
print()

for configuration, predictions in sorted(speedup_predictions.items(), 
                                   key=lambda x: x[1].get(10000, 0), 
                                   reverse=True):
    print(f"🚀 {configuration}:")
    print(f"   Predicted speedups at scale:")
    for size, speedup in predictions.items():
        improvement_pct = (speedup - 1) * 100
        print(f"      - {size:,} samples: {speedup:.2f}x ({improvement_pct:+.1f}% vs baseline)")
    print()

print("="*100)
print("KEY INSIGHTS:")
print("="*100)
# Analyze trends
if speedup_predictions:
    print("\n📈 Scalability Analysis:")
    for configuration, preds in speedup_predictions.items():
        sizes_ordered = sorted(preds.keys())
        speedups_ordered = [preds[s] for s in sizes_ordered]
        if len(speedups_ordered) >= 2:
            trend = speedups_ordered[-1] - speedups_ordered[0]
            if trend > 0.1:
                print(f"   • {configuration}: INCREASING returns at scale (+{trend:.2f}x from 2K→10K)")
            elif trend < -0.1:
                print(f"   • {configuration}: DIMINISHING returns at scale ({trend:.2f}x from 2K→10K)")
            else:
                print(f"   • {configuration}: STABLE performance across scales")
    print()
[Figure: Speedup Trend Prediction (Polynomial Fit Extrapolation)]
====================================================================================================
SPEEDUP TREND PREDICTIONS
====================================================================================================

🚀 Configuration incorporating Tree:
   Predicted speedups at scale:
      - 2,000 samples: 1.13x (+12.9% vs baseline)
      - 5,000 samples: 1.35x (+34.8% vs baseline)
      - 10,000 samples: 1.61x (+61.3% vs baseline)

🚀 Configuration incorporating Matrix:
   Predicted speedups at scale:
      - 2,000 samples: 1.04x (+3.8% vs baseline)
      - 5,000 samples: 1.07x (+6.9% vs baseline)
      - 10,000 samples: 1.10x (+10.2% vs baseline)

🚀 Configuration incorporating Distance:
   Predicted speedups at scale:
      - 2,000 samples: 0.96x (-4.3% vs baseline)
      - 5,000 samples: 0.89x (-11.2% vs baseline)
      - 10,000 samples: 0.82x (-17.9% vs baseline)

🚀 Configuration incorporating NN Descent:
   Predicted speedups at scale:
      - 2,000 samples: 0.94x (-5.9% vs baseline)
      - 5,000 samples: 0.84x (-15.6% vs baseline)
      - 10,000 samples: 0.75x (-24.7% vs baseline)

🚀 Fully WASM-enabled configuration:
   Predicted speedups at scale:
      - 2,000 samples: 0.64x (-35.7% vs baseline)
      - 5,000 samples: 0.22x (-77.6% vs baseline)
      - 10,000 samples: 0.09x (-91.3% vs baseline)

🚀 Configuration incorporating Optimizer:
   Predicted speedups at scale:
      - 2,000 samples: 0.60x (-40.4% vs baseline)
      - 5,000 samples: 0.19x (-81.0% vs baseline)
      - 10,000 samples: 0.07x (-93.3% vs baseline)

====================================================================================================
KEY INSIGHTS:
====================================================================================================

📈 Scalability Analysis:
   • Configuration incorporating Distance: DIMINISHING returns at scale (-0.14x from 2K→10K)
   • Configuration incorporating Tree: INCREASING returns at scale (+0.48x from 2K→10K)
   • Configuration incorporating Matrix: STABLE performance across scales
   • Configuration incorporating NN Descent: DIMINISHING returns at scale (-0.19x from 2K→10K)
   • Configuration incorporating Optimizer: DIMINISHING returns at scale (-0.53x from 2K→10K)
   • Fully WASM-enabled configuration: DIMINISHING returns at scale (-0.56x from 2K→10K)

7.3 Memory Scaling by Dataset Size¶

Analysis of how memory consumption changes with dataset size across different WASM configurations.
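The summary printed below derives two quantities from the per-size medians. A minimal sketch of both formulas (the helper names are illustrative and not defined elsewhere in this notebook):

def samples_per_mb(n_samples, memory_delta_mb):
    """Memory efficiency: samples embedded per MB of memory delta (higher is better)."""
    return n_samples / memory_delta_mb if memory_delta_mb > 0 else 0.0

def overhead_vs_baseline_pct(config_mb, baseline_mb):
    """Memory overhead of a configuration relative to the JS baseline, in percent."""
    return (config_mb - baseline_mb) / baseline_mb * 100.0

# Example with the large-scope medians reported in Section 9.5:
# overhead_vs_baseline_pct(10.29, 16.53) ≈ -37.7% (Distance configuration vs baseline)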

In [202]:
# Memory scaling with dataset size
if 'memory_delta_mb' in df_analysis:
    fig, ax = plt.subplots(figsize=(10, 6))
    
    # Select representative dataset sizes
    all_sizes = sorted(df_analysis['dataset_size'].unique())
    if len(all_sizes) >= 4:
        # Pick the smallest, two interior (≈1/3 and 2/3 positions), and the largest sizes
        key_sizes = [all_sizes[0], all_sizes[len(all_sizes)//3], 
                    all_sizes[2*len(all_sizes)//3], all_sizes[-1]]
    else:
        key_sizes = all_sizes
    
    x_pos = np.arange(len(key_sizes))
    width = 0.8 / len(configuration_order)
    
    # Compute baseline memory for comparison
    baseline_data = df_analysis[df_analysis['configuration_name'] == baseline_label]
    baseline_memory_series = baseline_data.groupby('dataset_size')['memory_delta_mb'].median().sort_index()
    
    for i, configuration in enumerate(configuration_order):
        configuration_data = df_analysis[df_analysis['configuration_name'] == configuration]
        mem_by_size = configuration_data.groupby('dataset_size')['memory_delta_mb'].median()
        
        mem_values = [mem_by_size.get(size, 0) for size in key_sizes]
        
        offset = (i - len(configuration_order)/2) * width + width/2
        bars = ax.bar(x_pos + offset, mem_values, width, 
                      label=configuration_plot_label(configuration), alpha=0.8,
                      color=configuration_colors.get(configuration, '#9d9d9d'))
        
        # Add value labels on bars
        for j, (bar, val) in enumerate(zip(bars, mem_values)):
            if val != 0:  # Show labels for both positive and negative values
                height = bar.get_height()
                # Position label above for positive, below for negative
                va = 'bottom' if val > 0 else 'top'
                y_pos = height if val > 0 else 0
                ax.text(bar.get_x() + bar.get_width()/2., y_pos,
                       f'{val:.0f}', ha='center', va=va, 
                       fontsize=8, rotation=0)
    
    ax.set_xlabel('Dataset Size (samples)', fontsize=12, fontweight='bold')
    ax.set_ylabel('Memory Delta (MB)', fontsize=12, fontweight='bold')
    ax.set_title('Memory Usage at Key Dataset Sizes', fontsize=14, fontweight='bold')
    ax.set_xticks(x_pos)
    ax.set_xticklabels([f'{size:,}' for size in key_sizes], rotation=15, ha='right')
    ax.legend(fontsize=9, loc='upper left', ncol=2)
    ax.grid(axis='y', alpha=0.3, linestyle='--')
    ax.axhline(y=0, color=reference_line_color, linestyle='-', linewidth=0.8, alpha=0.3)
    
    plt.tight_layout()
    save_figure('../outputs/figures/memory_scaling_by_size.svg', bbox_inches='tight', dpi=200)
    plt.show()
    
    # Summary statistics
    print("\nMemory Scaling Summary:")
    print("="*80)
    
    for configuration in configuration_order:
        configuration_data = df_analysis[df_analysis['configuration_name'] == configuration]
        if len(configuration_data) == 0:
            continue
        size_memory = configuration_data.groupby('dataset_size')['memory_delta_mb'].median().sort_index()
        if len(size_memory) >= 2:
            mem_min = size_memory.iloc[0]
            mem_max = size_memory.iloc[-1]
            mem_ratio = mem_max / mem_min if mem_min > 0 else float('inf')
            
            # Calculate efficiency metrics
            eff_min = size_memory.index[0] / mem_min if mem_min > 0 else 0
            eff_max = size_memory.index[-1] / mem_max if mem_max > 0 else 0
            eff_trend = "improving" if eff_max > eff_min else "degrading"
            
            print(f"{configuration}:")
            print(f"  Memory Usage: {mem_min:.1f} MB → {mem_max:.1f} MB (×{mem_ratio:.2f})")
            print(f"  Efficiency: {eff_min:.0f} → {eff_max:.0f} samples/MB ({eff_trend})")
            
            # Compare to baseline
            if configuration != baseline_label and len(baseline_memory_series) > 0:
                common_sizes = baseline_memory_series.index.intersection(size_memory.index)
                if len(common_sizes) >= 2:
                    overhead_small = ((size_memory.loc[common_sizes[0]] - baseline_memory_series.loc[common_sizes[0]]) 
                                     / baseline_memory_series.loc[common_sizes[0]]) * 100
                    overhead_large = ((size_memory.loc[common_sizes[-1]] - baseline_memory_series.loc[common_sizes[-1]]) 
                                     / baseline_memory_series.loc[common_sizes[-1]]) * 100
                    print(f"  Overhead vs Baseline: {overhead_small:+.1f}% → {overhead_large:+.1f}%")
            print()
else:
    print("Memory delta data not available in dataset.")
[Figure: Memory Usage at Key Dataset Sizes]
Memory Scaling Summary:
================================================================================
Baseline (JS):
  Memory Usage: 7.9 MB → 16.5 MB (×2.09)
  Efficiency: 10 → 60 samples/MB (improving)

Configuration incorporating Distance:
  Memory Usage: 4.8 MB → 10.3 MB (×2.13)
  Efficiency: 17 → 97 samples/MB (improving)
  Overhead vs Baseline: -39.0% → -37.7%

Configuration incorporating Tree:
  Memory Usage: 7.4 MB → 17.6 MB (×2.39)
  Efficiency: 11 → 57 samples/MB (improving)
  Overhead vs Baseline: -6.8% → +6.7%

Configuration incorporating Matrix:
  Memory Usage: 6.4 MB → 16.6 MB (×2.62)
  Efficiency: 13 → 60 samples/MB (improving)
  Overhead vs Baseline: -19.8% → +0.5%

Configuration incorporating NN Descent:
  Memory Usage: 13.5 MB → 21.7 MB (×1.61)
  Efficiency: 6 → 46 samples/MB (improving)
  Overhead vs Baseline: +70.0% → +31.5%

Configuration incorporating Optimizer:
  Memory Usage: 1.7 MB → 4.2 MB (×2.45)
  Efficiency: 47 → 240 samples/MB (improving)
  Overhead vs Baseline: -78.5% → -74.8%

Fully WASM-enabled configuration:
  Memory Usage: -3.0 MB → 22.7 MB (×inf)
  Efficiency: 0 → 44 samples/MB (improving)
  Overhead vs Baseline: -137.5% → +37.6%

7.4 Quality Preservation by Dataset Size¶

Analysis of embedding quality (trustworthiness) across dataset sizes to determine if WASM configurations maintain quality as data scales.
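For context, trustworthiness measures how well local neighborhoods of the high-dimensional input survive in the low-dimensional embedding (1.0 = perfectly preserved). A minimal, self-contained sketch of how such a score can be computed with scikit-learn, assuming the raw inputs and a 2-D embedding are available as arrays (illustrative only, not part of this notebook's data pipeline):

import numpy as np
from sklearn.manifold import trustworthiness

rng = np.random.default_rng(42)
X = rng.normal(size=(500, 20))  # illustrative high-dimensional inputs
X_2d = X[:, :2]                 # stand-in for a 2-D UMAP embedding
# Fraction of k-nearest-neighbor structure preserved by the embedding
print(f"trustworthiness (k=5): {trustworthiness(X, X_2d, n_neighbors=5):.3f}")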

In [203]:
# Quality scaling with dataset size - HEATMAP VERSION
if 'trustworthiness' in df_analysis:
    # Prepare data for heatmap
    quality_pivot = df_analysis.pivot_table(
        values='trustworthiness', 
        index='configuration_name', 
        columns='dataset_size', 
        aggfunc='median'
    )
    
    # Reorder rows by configuration_order
    quality_pivot = quality_pivot.reindex([f for f in configuration_order if f in quality_pivot.index])
    
    # Create figure with single subplot
    fig, ax = plt.subplots(1, 1, figsize=(14, 5))
    
    # Calculate quality delta (vs baseline)
    baseline_quality = df_analysis[df_analysis['configuration_name'] == baseline_label].groupby('dataset_size')['trustworthiness'].median()
    
    # Calculate delta for each configuration
    quality_delta_df = pd.DataFrame(index=quality_pivot.index, columns=quality_pivot.columns)
    for configuration in quality_pivot.index:
        if configuration == baseline_label:
            quality_delta_df.loc[configuration, :] = 0.0
        else:
            for size in quality_pivot.columns:
                if size in baseline_quality.index and not np.isnan(quality_pivot.loc[configuration, size]):
                    delta_pct = ((quality_pivot.loc[configuration, size] - baseline_quality[size]) / baseline_quality[size]) * 100
                    quality_delta_df.loc[configuration, size] = delta_pct
    
    quality_delta_df = quality_delta_df.astype(float)
    
    # Plot delta heatmap
    vmax = max(abs(quality_delta_df.min().min()), abs(quality_delta_df.max().max()))
    im = ax.imshow(quality_delta_df.values, cmap='RdBu_r', aspect='auto', vmin=-vmax, vmax=vmax)
    
    # Set ticks and labels
    ax.set_xticks(range(len(quality_delta_df.columns)))
    ax.set_xticklabels([f'{int(s):,}' for s in quality_delta_df.columns], rotation=45, ha='right')
    ax.set_yticks(range(len(quality_delta_df.index)))
    ax.set_yticklabels([configuration_plot_label(configuration) for configuration in quality_delta_df.index], fontsize=10)
    
    # Add text annotations with delta percentages
    for i in range(len(quality_delta_df.index)):
        for j in range(len(quality_delta_df.columns)):
            val = quality_delta_df.values[i, j]
            if not np.isnan(val):
                text_color = 'white' if abs(val) > vmax * 0.5 else 'black'
                ax.text(j, i, f'{val:+.2f}%', ha='center', va='center', 
                           color=text_color, fontsize=9, fontweight='bold')
    
    ax.set_xlabel('Dataset Size (samples)', fontsize=12, fontweight='bold')
    ax.set_ylabel('Configuration', fontsize=12, fontweight='bold')
    ax.set_title('Quality Delta vs Baseline (JS) (%)', fontsize=14, fontweight='bold')
    
    # Add colorbar
    cbar = plt.colorbar(im, ax=ax)
    cbar.set_label('% Change', fontsize=11, fontweight='bold')
    
    plt.tight_layout()
    save_figure('../outputs/figures/quality_scaling_by_size.svg', bbox_inches='tight', dpi=200)
    plt.show()
    
    # Summary statistics
    print("\nQuality Preservation Summary:")
    print("="*80)
    print("All configurations should maintain quality within ±1% of baseline across dataset sizes.\n")
    
    for configuration in configuration_order:
        if configuration == baseline_label:
            continue
        configuration_data = df_analysis[df_analysis['configuration_name'] == configuration]
        if len(configuration_data) == 0:
            continue
        
        configuration_quality_by_size = configuration_data.groupby('dataset_size')['trustworthiness'].median()
        common_sizes = baseline_quality.index.intersection(configuration_quality_by_size.index)
        
        if len(common_sizes) > 0:
            quality_deltas = configuration_quality_by_size.loc[common_sizes] - baseline_quality.loc[common_sizes]
            quality_deltas_pct = (quality_deltas / baseline_quality.loc[common_sizes]) * 100
            
            print(f"{configuration}:")
            print(f"  Quality delta range: {quality_deltas_pct.min():+.3f}% to {quality_deltas_pct.max():+.3f}%")
            print(f"  Average delta: {quality_deltas_pct.mean():+.3f}%")
            status = "✓ Preserved" if abs(quality_deltas_pct.mean()) < 1.0 else "⚠ Degraded"
            print(f"  Status: {status}")
            print()
else:
    print("Trustworthiness data not available in dataset.")
[Figure: Quality Delta vs Baseline (JS) (%) heatmap, Configuration × Dataset Size]
Quality Preservation Summary:
================================================================================
All configurations should maintain quality within ±1% of baseline across dataset sizes.

Configuration incorporating Distance:
  Quality delta range: -0.154% to +0.723%
  Average delta: +0.155%
  Status: ✓ Preserved

Configuration incorporating Tree:
  Quality delta range: -0.009% to +0.169%
  Average delta: +0.063%
  Status: ✓ Preserved

Configuration incorporating Matrix:
  Quality delta range: +0.009% to +0.118%
  Average delta: +0.082%
  Status: ✓ Preserved

Configuration incorporating NN Descent:
  Quality delta range: -0.446% to +0.105%
  Average delta: -0.163%
  Status: ✓ Preserved

Configuration incorporating Optimizer:
  Quality delta range: -2.440% to -0.097%
  Average delta: -1.243%
  Status: ⚠ Degraded

Fully WASM-enabled configuration:
  Quality delta range: -2.574% to -0.072%
  Average delta: -1.296%
  Status: ⚠ Degraded

7.5 FPS & Responsiveness by Dataset Size¶

Analysis of how user experience metrics (FPS and interaction latency) change with dataset size.
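Both panels plot the median percent change relative to the JS baseline at each dataset size, i.e. 100 * (configuration - baseline) / baseline.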

In [32]:
# FPS and Responsiveness scaling with dataset size
fig, axes = plt.subplots(1, 2, figsize=(16, 6))

# Plot 1: FPS delta vs baseline
if 'fps_avg' in df_analysis:
    baseline_fps = df_analysis[df_analysis['configuration_name'] == baseline_label].groupby('dataset_size')['fps_avg'].median()
    
    for configuration in configuration_order:
        if configuration == baseline_label:
            continue
        configuration_data = df_analysis[df_analysis['configuration_name'] == configuration]
        if len(configuration_data) == 0:
            continue
        
        configuration_fps_by_size = configuration_data.groupby('dataset_size')['fps_avg'].median()
        common_sizes = baseline_fps.index.intersection(configuration_fps_by_size.index)
        
        if len(common_sizes) == 0:
            continue
        
        fps_delta_pct = ((configuration_fps_by_size.loc[common_sizes] - baseline_fps.loc[common_sizes]) / baseline_fps.loc[common_sizes]) * 100
        
        color = configuration_colors.get(configuration, '#9d9d9d')
        axes[0].plot(common_sizes, fps_delta_pct, marker=observed_marker,
                     linestyle=observed_linestyle, linewidth=observed_linewidth,
                     markersize=observed_markersize, label=configuration_plot_label(configuration),
                     alpha=observed_alpha, color=color)
    
    axes[0].axhline(y=0, color=reference_line_color, linestyle='--', linewidth=2, alpha=0.5, label=configuration_plot_label('Baseline'))
    axes[0].set_xlabel('Dataset Size (samples)', fontsize=12, fontweight='bold')
    axes[0].set_ylabel('FPS Change vs Baseline (JS) (%)', fontsize=12, fontweight='bold')
    axes[0].set_title('FPS Impact by Dataset Size', fontsize=14, fontweight='bold')
    axes[0].legend(fontsize=9, ncol=2)
    axes[0].grid(alpha=0.3)
    axes[0].set_xscale('log')
else:
    axes[0].axis('off')

# Plot 2: Responsiveness delta vs baseline
if 'responsiveness_ms' in df_analysis:
    baseline_resp = df_analysis[df_analysis['configuration_name'] == baseline_label].groupby('dataset_size')['responsiveness_ms'].median()
    
    for configuration in configuration_order:
        if configuration == baseline_label:
            continue
        configuration_data = df_analysis[df_analysis['configuration_name'] == configuration]
        if len(configuration_data) == 0:
            continue
        
        configuration_resp_by_size = configuration_data.groupby('dataset_size')['responsiveness_ms'].median()
        common_sizes = baseline_resp.index.intersection(configuration_resp_by_size.index)
        
        if len(common_sizes) == 0:
            continue
        
        resp_delta_pct = ((configuration_resp_by_size.loc[common_sizes] - baseline_resp.loc[common_sizes]) / baseline_resp.loc[common_sizes]) * 100
        
        color = configuration_colors.get(configuration, '#9d9d9d')
        axes[1].plot(common_sizes, resp_delta_pct, marker=observed_marker,
                     linestyle=observed_linestyle, linewidth=observed_linewidth,
                     markersize=observed_markersize, label=configuration_plot_label(configuration),
                     alpha=observed_alpha, color=color)
    
    axes[1].axhline(y=0, color=reference_line_color, linestyle='--', linewidth=2, alpha=0.5, label=configuration_plot_label('Baseline'))
    axes[1].set_xlabel('Dataset Size (samples)', fontsize=12, fontweight='bold')
    axes[1].set_ylabel('Latency Change vs Baseline (JS) (%)', fontsize=12, fontweight='bold')
    axes[1].set_title('Latency Impact by Dataset Size', fontsize=14, fontweight='bold')
    axes[1].legend(fontsize=9, ncol=2)
    axes[1].grid(alpha=0.3)
    axes[1].set_xscale('log')
else:
    axes[1].axis('off')

plt.tight_layout()
save_figure('../outputs/figures/fps_responsiveness_scaling_by_size.svg', bbox_inches='tight', dpi=200)
plt.show()

# Summary statistics
print("\nUX Metrics by Dataset Size Summary:")
print("="*80)

if 'fps_avg' in df_analysis and 'responsiveness_ms' in df_analysis:
    for configuration in configuration_order:
        configuration_data = df_analysis[df_analysis['configuration_name'] == configuration]
        if len(configuration_data) == 0:
            continue
        
        size_fps = configuration_data.groupby('dataset_size')['fps_avg'].median().sort_index()
        size_resp = configuration_data.groupby('dataset_size')['responsiveness_ms'].median().sort_index()
        
        if len(size_fps) >= 2 and len(size_resp) >= 2:
            fps_trend = size_fps.iloc[-1] - size_fps.iloc[0]
            resp_trend = size_resp.iloc[-1] - size_resp.iloc[0]
            
            print(f"{configuration}:")
            print(f"  FPS: {size_fps.iloc[0]:.1f} (small) → {size_fps.iloc[-1]:.1f} (large) | Trend: {fps_trend:+.1f}")
            print(f"  Latency: {size_resp.iloc[0]:.1f}ms (small) → {size_resp.iloc[-1]:.1f}ms (large) | Trend: {resp_trend:+.1f}ms")
            print()
else:
    print("UX metrics data not completely available.")
[Figure: FPS Impact by Dataset Size | Latency Impact by Dataset Size]
UX Metrics by Dataset Size Summary:
================================================================================
Baseline (JS):
  FPS: 48.2 (small) → 55.6 (large) | Trend: +7.4
  Latency: 21.0ms (small) → 67.6ms (large) | Trend: +46.6ms

Configuration incorporating Distance:
  FPS: 48.0 (small) → 55.6 (large) | Trend: +7.6
  Latency: 21.6ms (small) → 70.0ms (large) | Trend: +48.4ms

Configuration incorporating Tree:
  FPS: 47.6 (small) → 55.9 (large) | Trend: +8.3
  Latency: 20.0ms (small) → 63.7ms (large) | Trend: +43.7ms

Configuration incorporating Matrix:
  FPS: 49.6 (small) → 55.6 (large) | Trend: +6.0
  Latency: 20.6ms (small) → 64.6ms (large) | Trend: +44.0ms

Configuration incorporating NN Descent:
  FPS: 48.2 (small) → 55.6 (large) | Trend: +7.3
  Latency: 20.6ms (small) → 66.3ms (large) | Trend: +45.7ms

Configuration incorporating Optimizer:
  FPS: 0.0 (small) → 37.4 (large) | Trend: +37.4
  Latency: 25.5ms (small) → 68.7ms (large) | Trend: +43.1ms

Fully WASM-enabled configuration:
  FPS: 0.0 (small) → 37.1 (large) | Trend: +37.1
  Latency: 22.1ms (small) → 62.1ms (large) | Trend: +40.0ms

7.6 Multi-Metric Scaling Summary¶

Comprehensive overview of how all metrics scale together with dataset size, highlighting trade-offs and identifying which configurations maintain the best overall performance characteristics at scale.
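Each growth factor is the large-scope median divided by the small-scope median; for the baseline, for example, 4535.25 ms / 2680.45 ms ≈ 1.69x execution-time growth (the same medians appear in the aggregated table in Section 9.5).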

In [205]:
# Create comprehensive scaling summary table
scaling_summary = []

for configuration in configuration_order:
    configuration_data = df_analysis[df_analysis['configuration_name'] == configuration]
    if len(configuration_data) == 0:
        continue
    
    row = {'Configuration': configuration}
    
    # Get small and large dataset metrics
    small_data = configuration_data[configuration_data['Scope'] == 'small']
    large_data = configuration_data[configuration_data['Scope'] == 'large']
    
    if len(small_data) > 0 and len(large_data) > 0:
        # Execution time
        execution_time_small = small_data['execution_time_ms'].median()
        execution_time_large = large_data['execution_time_ms'].median()
        row['Execution Time Growth'] = f"{execution_time_large / execution_time_small:.2f}x"
        
        # Memory
        if 'memory_delta_mb' in df_analysis:
            mem_small = small_data['memory_delta_mb'].median()
            mem_large = large_data['memory_delta_mb'].median()
            row['Memory Growth'] = f"{mem_large / mem_small:.2f}x"
        
        # Quality
        if 'trustworthiness' in df_analysis:
            qual_small = small_data['trustworthiness'].median()
            qual_large = large_data['trustworthiness'].median()
            qual_change = ((qual_large - qual_small) / qual_small) * 100
            row['Quality Δ'] = f"{qual_change:+.2f}%"
        
        # FPS
        if 'fps_avg' in df_analysis:
            fps_small = small_data['fps_avg'].median()
            fps_large = large_data['fps_avg'].median()
            fps_change = ((fps_large - fps_small) / fps_small) * 100
            row['FPS Δ'] = f"{fps_change:+.1f}%"
        
        # Responsiveness
        if 'responsiveness_ms' in df_analysis:
            resp_small = small_data['responsiveness_ms'].median()
            resp_large = large_data['responsiveness_ms'].median()
            resp_growth = resp_large / resp_small
            row['Latency Growth'] = f"{resp_growth:.2f}x"
        
        scaling_summary.append(row)

scaling_summary_df = pd.DataFrame(scaling_summary)

print("\n" + "="*100)
print("MULTI-METRIC SCALING SUMMARY (Small → Large Datasets)")
print("="*100)
print()
display(scaling_summary_df)

print("\nInterpretation Guide:")
print("  • Execution Time Growth: How much slower for large datasets (lower is better)")
print("  • Memory Growth: How much more memory needed (lower is better)")
print("  • Quality Δ: Change in trustworthiness (near 0% is ideal)")
print("  • FPS Δ: Change in frame rate (negative = slower rendering)")
print("  • Latency Growth: How much latency increases (lower is better)")
print()

# Identify best scalers
if len(scaling_summary_df) > 0:
    print("="*100)
    print("BEST SCALABILITY BY METRIC:")
    print("="*100)
    
    if 'Execution Time Growth' in scaling_summary_df:
        execution_time_values = scaling_summary_df['Execution Time Growth'].str.replace('x', '').astype(float)
        best_execution_time_scaler = scaling_summary_df.iloc[execution_time_values.idxmin()]['Configuration']
        print(f"  ⚡ Best Execution Time Scaling: {best_execution_time_scaler}")
    
    if 'Memory Growth' in scaling_summary_df:
        memory_values = scaling_summary_df['Memory Growth'].str.replace('x', '').astype(float)
        best_memory_scaler = scaling_summary_df.iloc[memory_values.idxmin()]['Configuration']
        print(f"  💾 Best Memory Scaling: {best_memory_scaler}")
    
    if 'Quality Δ' in scaling_summary_df:
        quality_values = scaling_summary_df['Quality Δ'].str.replace('%', '').astype(float).abs()
        best_quality_scaler = scaling_summary_df.iloc[quality_values.idxmin()]['Configuration']
        print(f"  🎯 Most Stable Quality: {best_quality_scaler}")
    
    if 'FPS Δ' in scaling_summary_df:
        fps_values = scaling_summary_df['FPS Δ'].str.replace('%', '').astype(float)
        # Caveat: configurations reporting 0 FPS on small datasets produce +inf% deltas,
        # which dominate idxmax, so this "best" pick should be interpreted with care
        best_fps_scaler = scaling_summary_df.iloc[fps_values.idxmax()]['Configuration']
        print(f"  🎬 Best FPS Scaling: {best_fps_scaler}")
    
    print()
    print("💡 Recommendation: Configurations with low growth factors maintain better performance at scale.")
====================================================================================================
MULTI-METRIC SCALING SUMMARY (Small → Large Datasets)
====================================================================================================

| Configuration | Execution Time Growth | Memory Growth | Quality Δ | FPS Δ | Latency Growth |
|---|---|---|---|---|---|
| Baseline (JS) | 1.69x | 5.46x | -13.30% | +1.8% | 4.82x |
| Configuration incorporating Distance | 1.65x | 6.41x | -14.08% | +1.1% | 4.46x |
| Configuration incorporating Tree | 1.64x | 3.03x | -13.77% | +3.3% | 4.64x |
| Configuration incorporating Matrix | 1.67x | 3.78x | -14.14% | +0.9% | 4.78x |
| Configuration incorporating NN Descent | 1.62x | 1.84x | -14.06% | +1.8% | 4.17x |
| Configuration incorporating Optimizer | 4.62x | 18.40x | -14.52% | +inf% | 3.57x |
| Fully WASM-enabled configuration | 5.32x | 2.06x | -14.74% | +inf% | 3.97x |
Interpretation Guide:
  • Execution Time Growth: How much slower for large datasets (lower is better)
  • Memory Growth: How much more memory needed (lower is better)
  • Quality Δ: Change in trustworthiness (near 0% is ideal)
  • FPS Δ: Change in frame rate (negative = slower rendering)
  • Latency Growth: How much latency increases (lower is better)

====================================================================================================
BEST SCALABILITY BY METRIC:
====================================================================================================
  ⚡ Best Execution Time Scaling: Configuration incorporating NN Descent
  💾 Best Memory Scaling: Configuration incorporating NN Descent
  🎯 Most Stable Quality: Baseline (JS)
  🎬 Best FPS Scaling: Configuration incorporating Optimizer

💡 Recommendation: Configurations with low growth factors maintain better performance at scale.

9. Overall Rankings: Composite Performance Scores¶

Rank configurations using a weighted composite score across all metrics.
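The composite score is the weighted sum 0.50·speedup + 0.25·quality_ratio + 0.15·fps_ratio + 0.10·memory_score, where memory_score maps a 0 MB delta to 1.0 and deltas of 100 MB or more to 0.0. As a worked example, the fully WASM-enabled configuration scores 0.50 × 1.570 + 0.25 × 0.999 + 0.15 × 0.666 + 0.10 × 0.873 ≈ 1.222, the top value in the ranking below.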

In [206]:
# Calculate composite performance scores
def calculate_composite_scores(df, baseline='Baseline (JS)'):
    results = []
    
    for configuration in df['configuration_name'].unique():
        if configuration == baseline:
            continue
        
        configuration_data = df[df['configuration_name'] == configuration]
        baseline_data = df[df['configuration_name'] == baseline]
        
        # Execution-time speedup
        speedup = baseline_data['execution_time_ms'].median() / configuration_data['execution_time_ms'].median()
        
        # Quality ratio
        quality_ratio = configuration_data['trustworthiness'].median() / baseline_data['trustworthiness'].median() if 'trustworthiness' in df else 1.0
        
        # FPS ratio
        fps_ratio = configuration_data['fps_avg'].median() / baseline_data['fps_avg'].median() if 'fps_avg' in df else 1.0
        
        # Memory impact (lower is better, normalize to 0-1 scale)
        memory_delta = configuration_data['memory_delta_mb'].median() if 'memory_delta_mb' in df else 0
        memory_score = max(0, 1 - abs(memory_delta) / 100)  # Normalize: 0 MB delta -> 1.0, >=100 MB -> 0.0
        
        # Composite score: weighted average
        # Weights: 50% speedup, 25% quality, 15% FPS, 10% memory
        composite = (0.50 * speedup + 0.25 * quality_ratio + 0.15 * fps_ratio + 0.10 * memory_score)
        
        results.append({
            'configuration': configuration,
            'speedup': speedup,
            'quality_ratio': quality_ratio,
            'fps_ratio': fps_ratio,
            'memory_score': memory_score,
            'composite_score': composite
        })
    
    return pd.DataFrame(results).sort_values('composite_score', ascending=False)

rankings = calculate_composite_scores(df_analysis)

print("Overall Performance Rankings:")
print("="*80)
display(rankings.round(3))

print("\nTop 3 Configurations:")
for i, (idx, row) in enumerate(rankings.head(3).iterrows(), 1):
    print(f"{i}. {row['configuration']} (score: {row['composite_score']:.3f})")
    print(f"   - Speedup: {row['speedup']:.2f}x")
    print(f"   - Quality ratio: {row['quality_ratio']:.3f}")
    print(f"   - FPS ratio: {row['fps_ratio']:.3f}")
Overall Performance Rankings:
================================================================================
| configuration | speedup | quality_ratio | fps_ratio | memory_score | composite_score |
|---|---|---|---|---|---|
| Fully WASM-enabled configuration | 1.570 | 0.999 | 0.666 | 0.873 | 1.222 |
| Configuration incorporating Optimizer | 1.490 | 0.998 | 0.690 | 0.937 | 1.192 |
| Configuration incorporating Matrix | 1.028 | 1.001 | 1.001 | 0.893 | 1.004 |
| Configuration incorporating Tree | 1.010 | 1.001 | 0.992 | 0.902 | 0.994 |
| Configuration incorporating Distance | 1.002 | 1.000 | 1.000 | 0.877 | 0.989 |
| Configuration incorporating NN Descent | 1.007 | 1.001 | 0.992 | 0.831 | 0.986 |
Top 3 Configurations:
1. Fully WASM-enabled configuration (score: 1.222)
   - Speedup: 1.57x
   - Quality ratio: 0.999
   - FPS ratio: 0.666
2. Configuration incorporating Optimizer (score: 1.192)
   - Speedup: 1.49x
   - Quality ratio: 0.998
   - FPS ratio: 0.690
3. Configuration incorporating Matrix (score: 1.004)
   - Speedup: 1.03x
   - Quality ratio: 1.001
   - FPS ratio: 1.001
In [207]:
# Rankings visualization
if len(rankings) > 0:
    fig, ax = plt.subplots(figsize=(10, 7))
    
    colors = [configuration_colors.get(configuration, '#9d9d9d') for configuration in rankings['configuration']]
    bars = ax.barh([configuration_plot_label(configuration) for configuration in rankings['configuration']], rankings['composite_score'], color=colors, alpha=0.85)
    
    # Add value labels
    for bar in bars:
        width = bar.get_width()
        ax.text(width, bar.get_y() + bar.get_height()/2., f'{width:.3f}',
               ha='left', va='center', fontsize=10, fontweight='bold',
               bbox=dict(boxstyle='round,pad=0.3', facecolor='white', alpha=0.8))
    
    ax.set_xlabel('Composite Score (Higher is Better)', fontsize=12, fontweight='bold')
    ax.set_title('Overall Configuration Rankings', fontsize=14, fontweight='bold')
    ax.grid(axis='x', alpha=0.3)
    
    plt.tight_layout()
    save_figure('../outputs/figures/overall_rankings.svg', bbox_inches='tight', dpi=200)
    plt.show()
[Figure: Overall Configuration Rankings (composite score, horizontal bars)]
In [208]:
# Ensure aggregated_table exists before export
if 'aggregated_table' not in globals():
    aggregated_table = df_analysis.groupby(['Scope', 'configuration_name']).agg({
        'execution_time_ms': 'median',
        'memory_delta_mb': 'median',
        'trustworthiness': 'median',
        'fps_avg': 'median',
        'responsiveness_ms': 'median'
    }).round(2)

    # Calculate speedup for each Scope × Configuration combination
    speedup_data = []
    for scope in df_analysis['Scope'].unique():
        scope_data = df_analysis[df_analysis['Scope'] == scope]
        baseline_execution_time = scope_data[scope_data['configuration_name'] == baseline_label]['execution_time_ms'].median()
        
        if pd.notna(baseline_execution_time) and baseline_execution_time > 0:
            for configuration in scope_data['configuration_name'].unique():
                configuration_execution_time = scope_data[scope_data['configuration_name'] == configuration]['execution_time_ms'].median()
                if pd.notna(configuration_execution_time) and configuration_execution_time > 0:
                    speedup = baseline_execution_time / configuration_execution_time
                    speedup_data.append({
                        'Scope': scope,
                        'configuration_name': configuration,
                        'speedup': speedup
                    })

    speedup_table = pd.DataFrame(speedup_data)
    if len(speedup_table) > 0:
        aggregated_table = aggregated_table.reset_index()
        aggregated_table = aggregated_table.merge(
            speedup_table, 
            on=['Scope', 'configuration_name'], 
            how='left'
        )
        aggregated_table = aggregated_table.set_index(['Scope', 'configuration_name'])

        # Reorder columns for clarity
        column_order = ['execution_time_ms', 'speedup', 'trustworthiness', 'fps_avg', 'responsiveness_ms', 'memory_delta_mb']
        aggregated_table = aggregated_table[column_order]

        # Rename columns for better readability
        aggregated_table.columns = [
            'Execution Time (ms)', 
            'Speedup (×)', 
            'Quality (Trust.)', 
            'FPS', 
            'Latency (ms)', 
            'Memory (MB)'
        ]

# Export aggregated table for thesis
import os
os.makedirs('../outputs/tables', exist_ok=True)

aggregated_table.to_csv('../outputs/tables/aggregated_comparison_table.csv')
print("✓ Saved aggregated comparison table to ../outputs/tables/aggregated_comparison_table.csv")

# Summary statistics across all scopes
print("\n" + "="*100)
print("SUMMARY: Average Performance Across All Scopes")
print("="*100)

overall_summary = aggregated_table.groupby(level='configuration_name').mean().round(2)
overall_summary = overall_summary.reindex([f for f in configuration_order if f in overall_summary.index])

display(overall_summary)

print("\nKey Findings:")
best_speedup = overall_summary['Speedup (×)'].idxmax()
best_quality = overall_summary['Quality (Trust.)'].idxmax()
best_fps = overall_summary['FPS'].idxmax()
best_latency = overall_summary['Latency (ms)'].idxmin()

print(f"  • Best Average Speedup: {best_speedup} ({overall_summary.loc[best_speedup, 'Speedup (×)']:.2f}x)")
print(f"  • Best Average Quality: {best_quality} ({overall_summary.loc[best_quality, 'Quality (Trust.)']:.3f})")
print(f"  • Best Average FPS: {best_fps} ({overall_summary.loc[best_fps, 'FPS']:.1f})")
print(f"  • Best Average Latency: {best_latency} ({overall_summary.loc[best_latency, 'Latency (ms)']:.1f} ms)")
✓ Saved aggregated comparison table to ../outputs/tables/aggregated_comparison_table.csv

====================================================================================================
SUMMARY: Average Performance Across All Scopes
====================================================================================================
| configuration_name | Execution Time (ms) | Speedup (×) | Quality (Trust.) | FPS | Latency (ms) | Memory (MB) |
|---|---|---|---|---|---|---|
| Baseline (JS) | 3511.83 | 1.00 | 0.90 | 55.88 | 33.74 | 12.43 |
| Configuration incorporating Distance | 3554.90 | 0.99 | 0.90 | 56.02 | 34.86 | 10.20 |
| Configuration incorporating Tree | 3463.57 | 1.01 | 0.90 | 55.67 | 32.19 | 13.85 |
| Configuration incorporating Matrix | 3458.37 | 1.01 | 0.91 | 56.06 | 32.00 | 11.66 |
| Configuration incorporating NN Descent | 3572.50 | 0.98 | 0.90 | 55.73 | 33.38 | 18.35 |
| Configuration incorporating Optimizer | 2191.83 | 2.05 | 0.89 | 28.24 | 36.07 | 6.30 |
| Fully WASM-enabled configuration | 2089.13 | 2.28 | 0.89 | 28.34 | 31.56 | 14.32 |
Key Findings:
  • Best Average Speedup: Fully WASM-enabled configuration (2.28x)
  • Best Average Quality: Configuration incorporating Matrix (0.910)
  • Best Average FPS: Configuration incorporating Matrix (56.1)
  • Best Average Latency: Fully WASM-enabled configuration (31.6 ms)
In [209]:
# Create heatmap visualization of the aggregated table
fig, axes = plt.subplots(2, 3, figsize=(20, 12))
axes = axes.flatten()

metrics = ['Execution Time (ms)', 'Speedup (×)', 'Quality (Trust.)', 'FPS', 'Latency (ms)', 'Memory (MB)']
cmaps = ['YlOrRd_r', 'RdYlGn', 'RdYlGn', 'RdYlGn', 'YlOrRd', 'RdYlGn_r']
vmin_vmax = [
    None,  # Execution time - use data range
    (0.5, 2.0),  # Speedup - center around 1.0
    None,  # Quality - use data range
    None,  # FPS - use data range
    None,  # Latency - use data range
    None,  # Memory - use data range
]

for idx, (metric, cmap, vlim) in enumerate(zip(metrics, cmaps, vmin_vmax)):
    if metric not in aggregated_table.columns:
        axes[idx].axis('off')
        continue
    
    # Pivot for heatmap
    heatmap_data = aggregated_table.reset_index().pivot(
        index='Scope', 
        columns='configuration_name', 
        values=metric
    )
    
    # Reorder columns to match configuration_order
    cols_present = [f for f in configuration_order if f in heatmap_data.columns]
    heatmap_data = heatmap_data[cols_present]
    heatmap_data = heatmap_data.rename(columns=configuration_plot_label)
    
    # Create heatmap
    if vlim:
        sns.heatmap(heatmap_data, annot=True, fmt='.2f', cmap=cmap, 
                   ax=axes[idx], cbar_kws={'label': metric},
                   vmin=vlim[0], vmax=vlim[1], center=(vlim[0] + vlim[1]) / 2)
    else:
        sns.heatmap(heatmap_data, annot=True, fmt='.2f', cmap=cmap, 
                   ax=axes[idx], cbar_kws={'label': metric})
    
    axes[idx].set_title(f'{metric} by Scope × Configuration', fontsize=13, fontweight='bold')
    axes[idx].set_xlabel('WASM Configuration', fontsize=11)
    axes[idx].set_ylabel('Scope', fontsize=11)
    axes[idx].tick_params(axis='x', rotation=45)
    for label in axes[idx].get_xticklabels():
        label.set_color(configuration_plot_palette.get(label.get_text(), reference_line_color))

plt.tight_layout()
save_figure('../outputs/figures/aggregated_comparison_heatmaps.svg', bbox_inches='tight', dpi=200)
plt.show()

print("Heatmap Color Interpretation:")
print("  Green = Better performance | Red = Worse performance")
print("  Darker colors = More extreme values")
[Figure: Aggregated comparison heatmaps, one panel per metric by Scope × Configuration]
Heatmap Color Interpretation:
  Green = Better performance | Red = Worse performance
  Darker colors = More extreme values
In [210]:
# Create comprehensive aggregated table: Scope × Configuration with all metrics
aggregated_table = df_analysis.groupby(['Scope', 'configuration_name']).agg({
    'execution_time_ms': 'median',
    'memory_delta_mb': 'median',
    'trustworthiness': 'median',
    'fps_avg': 'median',
    'responsiveness_ms': 'median'
}).round(2)

# Calculate speedup for each Scope × Configuration combination
speedup_data = []
for scope in df_analysis['Scope'].unique():
    scope_data = df_analysis[df_analysis['Scope'] == scope]
    baseline_execution_time = scope_data[scope_data['configuration_name'] == baseline_label]['execution_time_ms'].median()
    
    if pd.notna(baseline_execution_time) and baseline_execution_time > 0:
        for configuration in scope_data['configuration_name'].unique():
            configuration_execution_time = scope_data[scope_data['configuration_name'] == configuration]['execution_time_ms'].median()
            if pd.notna(configuration_execution_time) and configuration_execution_time > 0:
                speedup = baseline_execution_time / configuration_execution_time
                speedup_data.append({
                    'Scope': scope,
                    'configuration_name': configuration,
                    'speedup': speedup
                })

speedup_table = pd.DataFrame(speedup_data)
speedup_pivot = speedup_table.pivot(index='Scope', columns='configuration_name', values='speedup')

# Merge speedup into aggregated table
aggregated_table = aggregated_table.reset_index()
aggregated_table = aggregated_table.merge(
    speedup_table, 
    on=['Scope', 'configuration_name'], 
    how='left'
)
aggregated_table = aggregated_table.set_index(['Scope', 'configuration_name'])

# Reorder columns for clarity
column_order = ['execution_time_ms', 'speedup', 'trustworthiness', 'fps_avg', 'responsiveness_ms', 'memory_delta_mb']
aggregated_table = aggregated_table[column_order]

# Rename columns for better readability
aggregated_table.columns = [
    'Execution Time (ms)', 
    'Speedup (×)', 
    'Quality (Trust.)', 
    'FPS', 
    'Latency (ms)', 
    'Memory (MB)'
]

print("="*100)
print("AGGREGATED COMPARISON TABLE: Median Metrics by Scope × WASM Configuration")
print("="*100)
print()

# Display the full table
display(aggregated_table.round(2))

print()
print("Table Interpretation:")
print("  • Execution time: Lower is better (faster execution)")
print("  • Speedup: Higher is better (>1.0 = faster than baseline)")
print("  • Quality: Higher is better (trustworthiness score)")
print("  • FPS: Higher is better (smoother visualization)")
print("  • Latency: Lower is better (more responsive)")
print("  • Memory: Context-dependent (delta from baseline)")
====================================================================================================
AGGREGATED COMPARISON TABLE: Median Metrics by Scope × WASM Configuration
====================================================================================================

| Scope | configuration_name | Execution Time (ms) | Speedup (×) | Quality (Trust.) | FPS | Latency (ms) | Memory (MB) |
|---|---|---|---|---|---|---|---|
| large | Fully WASM-enabled configuration | 3715.45 | 1.22 | 0.79 | 37.10 | 62.13 | 22.74 |
| large | Baseline (JS) | 4535.25 | 1.00 | 0.81 | 55.56 | 67.63 | 16.53 |
| large | Configuration incorporating Distance | 4562.20 | 0.99 | 0.80 | 55.60 | 70.00 | 10.29 |
| large | Configuration incorporating Matrix | 4447.45 | 1.02 | 0.81 | 55.57 | 64.64 | 16.61 |
| large | Configuration incorporating NN Descent | 4590.80 | 0.99 | 0.80 | 55.55 | 66.34 | 21.74 |
| large | Configuration incorporating Optimizer | 3766.50 | 1.20 | 0.79 | 37.37 | 68.67 | 4.17 |
| large | Configuration incorporating Tree | 4380.40 | 1.04 | 0.81 | 55.87 | 63.67 | 17.64 |
| mid | Fully WASM-enabled configuration | 1853.45 | 1.79 | 0.97 | 47.93 | 16.92 | 9.14 |
| mid | Baseline (JS) | 3319.80 | 1.00 | 0.97 | 57.49 | 19.56 | 17.73 |
| mid | Configuration incorporating Distance | 3340.85 | 0.99 | 0.97 | 57.48 | 18.88 | 18.71 |
| mid | Configuration incorporating Matrix | 3269.05 | 1.02 | 0.97 | 57.54 | 17.85 | 13.97 |
| mid | Configuration incorporating NN Descent | 3286.65 | 1.01 | 0.97 | 57.05 | 17.89 | 21.46 |
| mid | Configuration incorporating Optimizer | 1993.35 | 1.67 | 0.97 | 47.36 | 20.30 | 14.50 |
| mid | Configuration incorporating Tree | 3341.35 | 0.99 | 0.97 | 57.03 | 19.17 | 18.10 |
| small | Fully WASM-enabled configuration | 698.50 | 3.84 | 0.92 | 0.00 | 15.64 | 11.07 |
| small | Baseline (JS) | 2680.45 | 1.00 | 0.93 | 54.59 | 14.04 | 3.03 |
| small | Configuration incorporating Distance | 2761.65 | 0.97 | 0.94 | 54.98 | 15.69 | 1.61 |
| small | Configuration incorporating Matrix | 2658.60 | 1.01 | 0.94 | 55.08 | 13.52 | 4.40 |
| small | Configuration incorporating NN Descent | 2840.05 | 0.94 | 0.93 | 54.58 | 15.92 | 11.84 |
| small | Configuration incorporating Optimizer | 815.65 | 3.29 | 0.92 | 0.00 | 19.24 | 0.23 |
| small | Configuration incorporating Tree | 2668.95 | 1.00 | 0.93 | 54.10 | 13.73 | 5.82 |
Table Interpretation:
  • Execution time: Lower is better (faster execution)
  • Speedup: Higher is better (>1.0 = faster than baseline)
  • Quality: Higher is better (trustworthiness score)
  • FPS: Higher is better (smoother visualization)
  • Latency: Lower is better (more responsive)
  • Memory: Context-dependent (delta from baseline)

9.5 Aggregated Comparison Table¶

Comprehensive comparison of all metrics organized by Scope and WASM Configuration. The table built in the cell above provides a single reference for comparing performance characteristics across all dimensions.

In [211]:
# Create tables directory if needed
os.makedirs('../outputs/tables', exist_ok=True)

10. Export Results¶

Save all analysis results to CSV files for thesis inclusion.

In [212]:
# Create summaries directory if it doesn't exist
os.makedirs('../outputs/summaries', exist_ok=True)

# Export summary tables
if len(speedup_df) > 0:
    speedup_df.to_csv('../outputs/summaries/speedup_analysis.csv', index=False)
    print("✓ Saved speedup_analysis.csv")


if len(rankings) > 0:
    rankings.to_csv('../outputs/summaries/configuration_rankings.csv', index=False)
    print("✓ Saved configuration_rankings.csv")

# Export metric-specific summaries
summary_stats.to_csv('../outputs/summaries/metrics_summary.csv')
print("✓ Saved metrics_summary.csv")

print("\n" + "="*80)
print("All analysis results exported to ../outputs/summaries/")
print("All figures saved to ../outputs/figures/")
print("="*80)
✓ Saved speedup_analysis.csv
✓ Saved configuration_rankings.csv
✓ Saved metrics_summary.csv

================================================================================
All analysis results exported to ../outputs/summaries/
All figures saved to ../outputs/figures/
================================================================================

11. Final Conclusions & Recommendations¶

This section translates the notebook's multi-dimensional performance analysis into actionable guidance for choosing WASM configurations.

Recommendation Basis: The following recommendations are based on aggregated and scope-specific metrics including execution time, trustworthiness, FPS, latency, and memory usage. These metrics are combined using composite scores computed in earlier sections to identify optimal configurations for different use cases and dataset sizes.

11.1 Best Configurations by Optimization Goal¶

No single configuration dominates all dimensions. Choose based on your priorities:

In [213]:
# Determine best configurations for each optimization goal
best_configs = {}

# 1. Raw Performance (Speedup)
if len(speedup_df) > 0:
    best_speedup = speedup_df.groupby('configuration')['speedup'].median().sort_values(ascending=False).head(3)
    best_configs['Raw Performance (Speedup)'] = best_speedup

# 2. Quality Preservation
# No explicit quality-delta dataframe is built here; rank by median trustworthiness instead
if 'trustworthiness' in df_analysis:
    quality_preservation = df_analysis.groupby('configuration_name')['trustworthiness'].median().sort_values(ascending=False).head(3)
    best_configs['Quality (median trustworthiness)'] = quality_preservation

# 3. UI Smoothness (FPS)
if 'fps_avg' in df_analysis:
    best_fps = df_analysis.groupby('configuration_name')['fps_avg'].median().sort_values(ascending=False).head(3)
    best_configs['UI Smoothness (FPS)'] = best_fps

# 4. Responsiveness (Low Latency)
if 'responsiveness_ms' in df_analysis:
    best_latency = df_analysis.groupby('configuration_name')['responsiveness_ms'].median().sort_values().head(3)
    best_configs['Responsiveness (Low Latency)'] = best_latency

# 5. Memory Efficiency (minimal delta)
if 'memory_delta_mb' in df_analysis:
    memory_efficiency = df_analysis.groupby('configuration_name')['memory_delta_mb'].apply(
        lambda x: abs(x).mean()
    ).sort_values().head(3)
    best_configs['Memory Efficiency'] = memory_efficiency

# Display recommendations
print("="*100)
print("BEST CONFIGURATIONS BY OPTIMIZATION GOAL")
print("="*100)
print()

for goal, goal_ranking in best_configs.items():
    print(f"🎯 {goal}:")
    for rank, (config, value) in enumerate(goal_ranking.items(), 1):
        if 'Speedup' in goal:
            print(f"   {rank}. {config}: {value:.2f}x faster")
        elif 'Quality' in goal:
            print(f"   {rank}. {config}: {value:.4f} median trustworthiness")
        elif 'FPS' in goal:
            print(f"   {rank}. {config}: {value:.1f} FPS")
        elif 'Latency' in goal:
            print(f"   {rank}. {config}: {value:.1f} ms")
        elif 'Memory' in goal:
            print(f"   {rank}. {config}: {value:.1f} MB avg delta")
    print()
====================================================================================================
BEST CONFIGURATIONS BY OPTIMIZATION GOAL
====================================================================================================

🎯 Raw Performance (Speedup):
   1. Fully WASM-enabled configuration: 1.74x faster
   2. Configuration incorporating Optimizer: 1.66x faster
   3. Configuration incorporating Matrix: 1.02x faster

🎯 Quality (median trustworthiness):
   1. Configuration incorporating Tree: 0.9701 median trustworthiness
   2. Configuration incorporating NN Descent: 0.9698 median trustworthiness
   3. Configuration incorporating Matrix: 0.9696 median trustworthiness

🎯 UI Smoothness (FPS):
   1. Configuration incorporating Matrix: 57.5 FPS
   2. Baseline (JS): 57.5 FPS
   3. Configuration incorporating Distance: 57.5 FPS

🎯 Responsiveness (Low Latency):
   1. Configuration incorporating Matrix: 19.5 ms
   2. Configuration incorporating NN Descent: 20.4 ms
   3. Configuration incorporating Tree: 20.8 ms

🎯 Memory Efficiency:
   1. Configuration incorporating Optimizer: 8.7 MB avg delta
   2. Configuration incorporating Distance: 12.3 MB avg delta
   3. Configuration incorporating Matrix: 12.9 MB avg delta

11.2 Recommendations by Dataset Size¶

Different WASM configurations perform optimally at different scales:

In [214]:
# Analyze performance by dataset scope
scope_recommendations = {}

for scope in sorted(df_analysis['Scope'].unique()):
    scope_data = df_analysis[df_analysis['Scope'] == scope]
    baseline_data = scope_data[scope_data['configuration_name'] == baseline_label]
    
    if len(baseline_data) == 0:
        continue
    
    baseline_execution_time = baseline_data['execution_time_ms'].median()
    
    # Calculate composite score for this scope
    scope_scores = []
    for configuration in scope_data['configuration_name'].unique():
        if configuration == baseline_label:
            continue
        
        configuration_data = scope_data[scope_data['configuration_name'] == configuration]
        
        # Execution-time speedup
        execution_time = configuration_data['execution_time_ms'].median()
        speedup = baseline_execution_time / execution_time if execution_time > 0 else 0
        
        # Quality preservation (1.0 = perfect)
        baseline_quality = baseline_data['trustworthiness'].median() if 'trustworthiness' in baseline_data else 1.0
        configuration_quality = configuration_data['trustworthiness'].median() if 'trustworthiness' in configuration_data else 1.0
        quality_ratio = configuration_quality / baseline_quality if baseline_quality > 0 else 1.0
        
        # FPS ratio
        baseline_fps = baseline_data['fps_avg'].median() if 'fps_avg' in baseline_data else 60
        configuration_fps = configuration_data['fps_avg'].median() if 'fps_avg' in configuration_data else 60
        fps_ratio = configuration_fps / baseline_fps if baseline_fps > 0 else 1.0
        
        # Composite score (balanced weights)
        score = 0.5 * speedup + 0.3 * quality_ratio + 0.2 * fps_ratio
        
        scope_scores.append({
            'configuration': configuration,
            'speedup': speedup,
            'quality_ratio': quality_ratio,
            'fps_ratio': fps_ratio,
            'composite_score': score
        })
    
    if scope_scores:
        scope_df = pd.DataFrame(scope_scores).sort_values('composite_score', ascending=False)
        scope_recommendations[scope] = scope_df.head(3)

# Display scope-specific recommendations
print("="*100)
print("RECOMMENDATIONS BY DATASET SIZE (SCOPE)")
print("="*100)
print()

for scope, recs in scope_recommendations.items():
    print(f"📊 {scope.upper()} Datasets:")
    print()
    
    for rank, (idx, row) in enumerate(recs.iterrows(), 1):
        print(f"   {rank}. {row['configuration']}")
        print(f"      - Speedup: {row['speedup']:.2f}x")
        print(f"      - Quality Ratio: {row['quality_ratio']:.3f} (1.0 = perfect preservation)")
        print(f"      - FPS Ratio: {row['fps_ratio']:.2f}x")
        print(f"      - Composite Score: {row['composite_score']:.3f}")
        print()
    
    print("-" * 100)
    print()
====================================================================================================
RECOMMENDATIONS BY DATASET SIZE (SCOPE)
====================================================================================================

📊 LARGE Datasets:

   1. Fully WASM-enabled configuration
      - Speedup: 1.22x
      - Quality Ratio: 0.976 (1.0 = perfect preservation)
      - FPS Ratio: 0.67x
      - Composite Score: 1.037

   2. Configuration incorporating Optimizer
      - Speedup: 1.20x
      - Quality Ratio: 0.976 (1.0 = perfect preservation)
      - FPS Ratio: 0.67x
      - Composite Score: 1.029

   3. Configuration incorporating Tree
      - Speedup: 1.04x
      - Quality Ratio: 1.000 (1.0 = perfect preservation)
      - FPS Ratio: 1.01x
      - Composite Score: 1.019

----------------------------------------------------------------------------------------------------

📊 MID Datasets:

   1. Fully WASM-enabled configuration
      - Speedup: 1.79x
      - Quality Ratio: 0.999 (1.0 = perfect preservation)
      - FPS Ratio: 0.83x
      - Composite Score: 1.362

   2. Configuration incorporating Optimizer
      - Speedup: 1.67x
      - Quality Ratio: 0.999 (1.0 = perfect preservation)
      - FPS Ratio: 0.82x
      - Composite Score: 1.297

   3. Configuration incorporating Matrix
      - Speedup: 1.02x
      - Quality Ratio: 1.001 (1.0 = perfect preservation)
      - FPS Ratio: 1.00x
      - Composite Score: 1.008

----------------------------------------------------------------------------------------------------

📊 SMALL Datasets:

   1. Fully WASM-enabled configuration
      - Speedup: 3.84x
      - Quality Ratio: 0.992 (1.0 = perfect preservation)
      - FPS Ratio: 0.00x
      - Composite Score: 2.216

   2. Configuration incorporating Optimizer
      - Speedup: 3.29x
      - Quality Ratio: 0.990 (1.0 = perfect preservation)
      - FPS Ratio: 0.00x
      - Composite Score: 1.940

   3. Configuration incorporating Matrix
      - Speedup: 1.01x
      - Quality Ratio: 1.010 (1.0 = perfect preservation)
      - FPS Ratio: 1.01x
      - Composite Score: 1.009

----------------------------------------------------------------------------------------------------

11.3 Explicit Trade-off Statements¶

Critical Understanding: There is no universal "winner".

Each WASM configuration represents specific trade-offs in the performance-quality-memory space:

In [215]:
# Generate trade-off analysis for each configuration
print("="*100)
print("TRADE-OFF ANALYSIS: What You Gain vs What You Pay")
print("="*100)
print()

for configuration in sorted(df_analysis['configuration_name'].unique()):
    if configuration == baseline_label:
        continue
    
    print(f"⚖️  {configuration}")
    print("-" * 100)
    
    configuration_data = df_analysis[df_analysis['configuration_name'] == configuration]
    baseline_data = df_analysis[df_analysis['configuration_name'] == baseline_label]
    
    # Execution time analysis
    speedup_val = baseline_data['execution_time_ms'].median() / configuration_data['execution_time_ms'].median()
    if speedup_val > 1.1:
        print(f"   ✅ GAIN: {(speedup_val - 1) * 100:.1f}% faster execution (speedup: {speedup_val:.2f}x)")
    elif speedup_val < 0.9:
        print(f"   ❌ COST: {(1 - speedup_val) * 100:.1f}% slower execution (speedup: {speedup_val:.2f}x)")
    else:
        print(f"   ⚪ NEUTRAL: Similar execution time to baseline (speedup: {speedup_val:.2f}x)")
    
    # Quality analysis
    if 'trustworthiness' in configuration_data:
        quality_delta = (configuration_data['trustworthiness'].median() - baseline_data['trustworthiness'].median())
        quality_pct = (quality_delta / baseline_data['trustworthiness'].median()) * 100
        if abs(quality_pct) < 1:
            print(f"   ✅ Quality effectively preserved ({quality_pct:+.2f}%)")
        elif quality_delta > 0:
            print(f"   ✅ Quality slightly improved ({quality_pct:+.2f}%)")
        else:
            print(f"   ⚠️  Quality degradation ({quality_pct:.2f}%) - may impact embedding fidelity")
    
    # FPS analysis
    if 'fps_avg' in configuration_data:
        fps_delta_pct = ((configuration_data['fps_avg'].median() - baseline_data['fps_avg'].median()) 
                         / baseline_data['fps_avg'].median()) * 100
        if fps_delta_pct > 10:
            print(f"   ✅ Smoother UI ({fps_delta_pct:+.1f}% FPS increase)")
        elif fps_delta_pct < -10:
            print(f"   ❌ COST: Reduced smoothness ({fps_delta_pct:.1f}% FPS decrease)")
        else:
            print(f"   ⚪ Similar UI smoothness ({fps_delta_pct:+.1f}% FPS)")
    
    # Latency analysis
    if 'responsiveness_ms' in configuration_data:
        latency_delta = configuration_data['responsiveness_ms'].median() - baseline_data['responsiveness_ms'].median()
        latency_pct = (latency_delta / baseline_data['responsiveness_ms'].median()) * 100
        if latency_delta < -5:
            print(f"   ✅ More responsive ({latency_delta:.1f}ms faster, {latency_pct:.1f}%)")
        elif latency_delta > 5:
            print(f"   ❌ COST: Increased latency (+{latency_delta:.1f}ms, {latency_pct:+.1f}%)")
        else:
            print(f"   ⚪ Similar responsiveness ({latency_delta:+.1f}ms, {latency_pct:+.1f}%)")
    
    # Memory analysis
    if 'memory_delta_mb' in configuration_data:
        mem_delta = configuration_data['memory_delta_mb'].median() - baseline_data['memory_delta_mb'].median()
        if abs(mem_delta) < 5:
            print(f"   ⚪ Negligible memory impact ({mem_delta:+.1f}MB)")
        elif mem_delta > 0:
            print(f"   ❌ COST: Increased memory footprint (+{mem_delta:.1f}MB) - WASM linear memory overhead")
        else:
            print(f"   ✅ Reduced memory usage ({mem_delta:.1f}MB)")
    
    print()

print("="*100)
print("KEY INSIGHT: Performance optimization is a multi-objective problem.")
print("Choose configurations that align with your specific constraints and priorities.")
print("="*100)
====================================================================================================
TRADE-OFF ANALYSIS: What You Gain vs What You Pay
====================================================================================================

⚖️  Fully WASM-enabled configuration
----------------------------------------------------------------------------------------------------
   ✅ GAIN: 57.0% faster execution (speedup: 1.57x)
   ✅ Quality effectively preserved (-0.13%)
   ❌ COST: Reduced smoothness (-33.4% FPS decrease)
   ⚪ Similar responsiveness (-0.6ms, -2.6%)
   ⚪ Negligible memory impact (-1.8MB)

⚖️  Configuration incorporating Distance
----------------------------------------------------------------------------------------------------
   ⚪ NEUTRAL: Similar execution time to baseline (speedup: 1.00x)
   ✅ Quality effectively preserved (+0.03%)
   ⚪ Similar UI smoothness (-0.0% FPS)
   ⚪ Similar responsiveness (+0.0ms, +0.2%)
   ⚪ Negligible memory impact (-2.2MB)

⚖️  Configuration incorporating Matrix
----------------------------------------------------------------------------------------------------
   ⚪ NEUTRAL: Similar execution time to baseline (speedup: 1.03x)
   ✅ Quality effectively preserved (+0.09%)
   ⚪ Similar UI smoothness (+0.1% FPS)
   ⚪ Similar responsiveness (-2.0ms, -9.2%)
   ⚪ Negligible memory impact (-3.8MB)

⚖️  Configuration incorporating NN Descent
----------------------------------------------------------------------------------------------------
   ⚪ NEUTRAL: Similar execution time to baseline (speedup: 1.01x)
   ✅ Quality effectively preserved (+0.11%)
   ⚪ Similar UI smoothness (-0.8% FPS)
   ⚪ Similar responsiveness (-1.1ms, -4.9%)
   ⚪ Negligible memory impact (+2.4MB)

⚖️  Configuration incorporating Optimizer
----------------------------------------------------------------------------------------------------
   ✅ GAIN: 49.0% faster execution (speedup: 1.49x)
   ✅ Quality effectively preserved (-0.22%)
   ❌ COST: Reduced smoothness (-31.0% FPS decrease)
   ⚪ Similar responsiveness (+3.2ms, +14.7%)
   ✅ Reduced memory usage (-8.2MB)

⚖️  Configuration incorporating Tree
----------------------------------------------------------------------------------------------------
   ⚪ NEUTRAL: Similar execution time to baseline (speedup: 1.01x)
   ✅ Quality effectively preserved (+0.14%)
   ⚪ Similar UI smoothness (-0.8% FPS)
   ⚪ Similar responsiveness (-0.7ms, -3.1%)
   ⚪ Negligible memory impact (-4.7MB)

====================================================================================================
KEY INSIGHT: Performance optimization is a multi-objective problem.
Choose configurations that align with your specific constraints and priorities.
====================================================================================================

11.4 Decision Framework¶

Use this framework to select the optimal WASM configuration for your use case:

Scenario 1: Research/Scientific Computing¶

  • Priority: Embedding quality > Performance
  • Recommendation: Choose configurations with quality_delta closest to zero
  • Trade-off acceptance: Can tolerate moderate slowdown for quality assurance

Scenario 2: Interactive Data Exploration¶

  • Priority: Responsiveness (FPS + Low Latency) > Raw speed
  • Recommendation: Optimize for FPS and latency metrics
  • Trade-off acceptance: Slightly slower overall execution acceptable if interactions feel smooth

Scenario 3: Batch Processing / Production Pipelines¶

  • Priority: Raw performance (speedup) > Memory
  • Recommendation: Choose highest speedup configuration
  • Trade-off acceptance: Higher memory usage acceptable in server environments

Scenario 4: Resource-Constrained Environments¶

  • Priority: Memory efficiency > Performance
  • Recommendation: Choose configurations that minimize the memory delta
  • Trade-off acceptance: Slower execution to stay within memory budgets

Scenario 5: Balanced General Use¶

  • Priority: Composite score across all metrics
  • Recommendation: Use the overall rankings from Section 9
  • Trade-off acceptance: Average performance across dimensions without extreme compromises
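These scenarios can be made executable by mapping each one onto a weight preset over the composite-score ingredients computed in Section 11.2 (speedup, quality_ratio, fps_ratio). The presets below are illustrative assumptions, not values derived from the data:

# Illustrative weight presets; each dict sums to 1.0
SCENARIO_WEIGHTS = {
    'research':    {'speedup': 0.2, 'quality_ratio': 0.6, 'fps_ratio': 0.2},
    'interactive': {'speedup': 0.2, 'quality_ratio': 0.2, 'fps_ratio': 0.6},
    'batch':       {'speedup': 0.7, 'quality_ratio': 0.2, 'fps_ratio': 0.1},
    'balanced':    {'speedup': 0.5, 'quality_ratio': 0.3, 'fps_ratio': 0.2},
}

def scenario_score(row, scenario):
    """Weighted composite score for one configuration under a scenario."""
    weights = SCENARIO_WEIGHTS[scenario]
    return sum(w * row[key] for key, w in weights.items())

Applied to a per-scope dataframe like the scope_df built in Section 11.2, scope_df.apply(lambda r: scenario_score(r, 'interactive'), axis=1) would re-rank configurations for interactive use.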

11.5 Summary of Key Findings¶

No configuration is universally optimal. The "best" choice depends on:

  1. Your bottleneck: compute-bound runs benefit most from the Optimizer and fully WASM-enabled configurations, which deliver the largest speedups in these benchmarks
  2. Dataset scale: smaller datasets saw the largest speedups here, with gains narrowing at larger scopes
  3. Use context: interactive vs. batch, research vs. production, client vs. server
  4. Quality tolerance: whether exact reproducibility matters for your application
  5. Resource constraints: available memory, target devices, performance budgets

General Guidelines:

  • The fully WASM-enabled configuration typically provides the best overall speedup; its memory impact varies with dataset scope
  • Configurations incorporating individual components allow fine-grained control over trade-offs
  • Quality is generally preserved (|Δ| < 1%) across most configurations
  • FPS effects are dataset- and scope-dependent
  • Memory deltas relative to the baseline are small and predictable

Statistical Note: Performance differences reported throughout this notebook are based on median values across multiple benchmark runs, supported by non-parametric tests (Mann-Whitney U), bootstrap confidence intervals, and percentile distributions to ensure robustness.
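For reference, a sketch of the kind of test behind these statements (the specific pairing of the baseline against the fully WASM-enabled configuration is illustrative; labels match those used in this notebook):

# Sketch: Mann-Whitney U test plus a bootstrap CI on execution times
from scipy.stats import mannwhitneyu, bootstrap
import numpy as np

base = df_analysis.loc[df_analysis['configuration_name'] == baseline_label,
                       'execution_time_ms'].to_numpy()
wasm = df_analysis.loc[df_analysis['configuration_name'] == 'Fully WASM-enabled configuration',
                       'execution_time_ms'].to_numpy()

# Two-sided, non-parametric comparison (no normality assumption)
u_stat, p_value = mannwhitneyu(base, wasm, alternative='two-sided')

# Bootstrap 95% CI for the median WASM execution time
ci = bootstrap((wasm,), np.median, confidence_level=0.95,
               n_resamples=5000, random_state=42)
print(f"U={u_stat:.1f}, p={p_value:.4g}, median CI="
      f"({ci.confidence_interval.low:.0f}, {ci.confidence_interval.high:.0f}) ms")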

12. Notebook Summary¶

This notebook provides a comprehensive analysis of UMAP performance with WASM optimizations across multiple dimensions:

Analysis Sections:¶

  1. Baseline Analysis (Section 2.5): Pure JavaScript performance characteristics
  2. Execution Time & Speedup (Section 3): Execution time and performance gains vs baseline
  3. Memory Usage (Section 4): Memory consumption patterns and WASM overhead
  4. Embedding Quality (Section 5): Trustworthiness preservation and quality deltas
  5. Responsiveness (Section 6): FPS, interaction latency, and p50/p95 percentiles
  6. Dataset Size Effects (Section 7): Scaling behavior across small/medium/large datasets
  7. Overall Rankings (Section 9): Composite performance scores and aggregated comparison tables
  8. Final Conclusions (Section 11): Actionable recommendations by use case and dataset size

Key Outputs:¶

  • Figures: 15+ publication-quality visualizations in ../outputs/figures/
  • Tables: CSV and LaTeX tables in ../outputs/tables/ and ../outputs/summaries/
  • Recommendations: Specific guidance for selecting optimal WASM configurations

For Quick Insights:¶

  • Section 11 provides complete recommendations organized by optimization goal and dataset size
  • Section 9 contains the comprehensive aggregated comparison table
  • Section 11.3 explicitly states all trade-offs without claiming any "universal winner"

Main Finding: No single WASM configuration is optimal for all scenarios. Performance optimization requires understanding specific use case requirements and acceptable trade-offs.