Data Access and Manipulation Tutorial#
This tutorial explains how to access and manipulate experimental data beyond the basic run_table_data view. While run_table_data provides a convenient DataFrame view of your experiments, it's important to understand that this is just a view of the data, not the actual data store itself.
For detailed data exploration, custom analysis, and data manipulation, you need to use Minto's underlying data access interfaces.
1. Understanding Data Views vs. Data Store#
Let's start by creating some experimental data and understanding the difference between views and the actual data store.
from minto import Experiment
import numpy as np
import pandas as pd
# Create an experiment with multiple runs
exp = Experiment(
    name="data_access_tutorial",
    auto_saving=True,
)

# Generate sample data across multiple runs
algorithms = ["SimulatedAnnealing", "QuantumAnnealing", "GeneticAlgorithm"]
temperatures = [0.1, 0.5, 1.0, 2.0]

for i, algorithm in enumerate(algorithms):
    for j, temp in enumerate(temperatures):
        run = exp.run()
        with run:
            # Log parameters
            run.log_parameter("algorithm", algorithm)
            run.log_parameter("temperature", temp)
            run.log_parameter("max_iterations", 1000 + i * 100)
            run.log_parameter("seed", 42 + i + j)

            # Log complex objects (these won't appear in run_table_data)
            config_data = {
                "solver_config": {
                    "tolerance": 1e-6,
                    "max_time": 300,
                    "parallel": True
                },
                "problem_metadata": {
                    "variables": 100 + i * 10,
                    "constraints": 50 + i * 5,
                    "density": 0.1 + i * 0.05
                }
            }
            run.log_object("configuration", config_data)

            # Simulate optimization results
            np.random.seed(42 + i + j)
            energy = np.random.uniform(-100, -50) * (1 + temp)
            run.log_parameter("final_energy", energy)

            # Log solution metadata (simplified for compatibility)
            solution_metadata = {
                "solution_vector": np.random.choice([0, 1], size=20).tolist(),
                "energy": energy,
                "is_feasible": True
            }
            run.log_object("best_solution", solution_metadata)

            # Log sample analysis (simplified for compatibility)
            sample_analysis = {
                "num_samples": 100,
                "energy_range": [energy - 10, energy + 5],
                "best_energy": energy,
                "avg_energy": energy + 2.0
            }
            run.log_object("sample_analysis", sample_analysis)

print(f"Created experiment with {len(exp.runs)} runs")
[2025-08-01 21:54:05] Starting experiment 'data_access_tutorial'
[2025-08-01 21:54:05] ├─ Environment: OS: Linux 6.6.93+, CPU: Intel(R) Xeon(R) CPU @ 2.80GHz (4 cores), Memory: 15.6 GB, Python: 3.11.10
[2025-08-01 21:54:05] ├─ Environment Information
[2025-08-01 21:54:05] ├─ OS: Linux 6.6.93+
[2025-08-01 21:54:05] ├─ Platform: Linux-6.6.93+-x86_64-with-glibc2.35
[2025-08-01 21:54:05] ├─ CPU: Intel(R) Xeon(R) CPU @ 2.80GHz (4 cores)
[2025-08-01 21:54:05] ├─ Memory: 15.6 GB
[2025-08-01 21:54:05] ├─ Architecture: x86_64
[2025-08-01 21:54:05] ├─ Python: 3.11.10
[2025-08-01 21:54:05] ├─ Key Package Versions:
[2025-08-01 21:54:05] ├─ Created run #0
[2025-08-01 21:54:05] ├─ Parameter: algorithm = SimulatedAnnealing
[2025-08-01 21:54:05] ├─ Parameter: temperature = 0.1
[2025-08-01 21:54:05] ├─ Parameter: max_iterations = 1000
[2025-08-01 21:54:05] ├─ Parameter: seed = 42
[2025-08-01 21:54:05] ├─ Object 'configuration' (dict): 2 keys
[2025-08-01 21:54:05] ├─ Parameter: final_energy = -89.40029346339507
[2025-08-01 21:54:05] ├─ Object 'best_solution' (dict): 3 keys
[2025-08-01 21:54:05] ├─ Object 'sample_analysis' (dict): 4 keys
[2025-08-01 21:54:05] ├─ Run #0 completed (0.0s)
[2025-08-01 21:54:05] ├─ Created run #1
[2025-08-01 21:54:05] ├─ Parameter: algorithm = SimulatedAnnealing
[2025-08-01 21:54:05] ├─ Parameter: temperature = 0.5
[2025-08-01 21:54:05] ├─ Parameter: max_iterations = 1000
[2025-08-01 21:54:05] ├─ Parameter: seed = 43
[2025-08-01 21:54:05] ├─ Object 'configuration' (dict): 2 keys
[2025-08-01 21:54:05] ├─ Parameter: final_energy = -141.37090752076656
[2025-08-01 21:54:05] ├─ Object 'best_solution' (dict): 3 keys
[2025-08-01 21:54:05] ├─ Object 'sample_analysis' (dict): 4 keys
[2025-08-01 21:54:05] ├─ Run #1 completed (0.1s)
[2025-08-01 21:54:05] ├─ Created run #2
[2025-08-01 21:54:05] ├─ Parameter: algorithm = SimulatedAnnealing
[2025-08-01 21:54:05] ├─ Parameter: temperature = 1.0
[2025-08-01 21:54:05] ├─ Parameter: max_iterations = 1000
[2025-08-01 21:54:05] ├─ Parameter: seed = 44
[2025-08-01 21:54:05] ├─ Object 'configuration' (dict): 2 keys
[2025-08-01 21:54:05] ├─ Parameter: final_energy = -116.51578513343506
[2025-08-01 21:54:05] ├─ Object 'best_solution' (dict): 3 keys
[2025-08-01 21:54:05] ├─ Object 'sample_analysis' (dict): 4 keys
[2025-08-01 21:54:05] ├─ Run #2 completed (0.0s)
[2025-08-01 21:54:05] ├─ Created run #3
[2025-08-01 21:54:05] ├─ Parameter: algorithm = SimulatedAnnealing
[2025-08-01 21:54:05] ├─ Parameter: temperature = 2.0
[2025-08-01 21:54:05] ├─ Parameter: max_iterations = 1000
[2025-08-01 21:54:05] ├─ Parameter: seed = 45
[2025-08-01 21:54:05] ├─ Object 'configuration' (dict): 2 keys
[2025-08-01 21:54:05] ├─ Parameter: final_energy = -151.64827297865997
[2025-08-01 21:54:05] ├─ Object 'best_solution' (dict): 3 keys
[2025-08-01 21:54:05] ├─ Object 'sample_analysis' (dict): 4 keys
[2025-08-01 21:54:05] ├─ Run #3 completed (0.0s)
[2025-08-01 21:54:05] ├─ Created run #4
[2025-08-01 21:54:05] ├─ Parameter: algorithm = QuantumAnnealing
[2025-08-01 21:54:05] ├─ Parameter: temperature = 0.1
[2025-08-01 21:54:05] ├─ Parameter: max_iterations = 1100
[2025-08-01 21:54:05] ├─ Parameter: seed = 43
[2025-08-01 21:54:05] ├─ Object 'configuration' (dict): 2 keys
[2025-08-01 21:54:05] ├─ Parameter: final_energy = -103.67199884856215
[2025-08-01 21:54:05] ├─ Object 'best_solution' (dict): 3 keys
[2025-08-01 21:54:05] ├─ Object 'sample_analysis' (dict): 4 keys
[2025-08-01 21:54:05] ├─ Run #4 completed (0.0s)
[2025-08-01 21:54:05] ├─ Created run #5
[2025-08-01 21:54:05] ├─ Parameter: algorithm = QuantumAnnealing
[2025-08-01 21:54:05] ├─ Parameter: temperature = 0.5
[2025-08-01 21:54:05] ├─ Parameter: max_iterations = 1100
[2025-08-01 21:54:05] ├─ Parameter: seed = 44
[2025-08-01 21:54:05] ├─ Object 'configuration' (dict): 2 keys
[2025-08-01 21:54:05] ├─ Parameter: final_energy = -87.38683885007629
[2025-08-01 21:54:05] ├─ Object 'best_solution' (dict): 3 keys
[2025-08-01 21:54:05] ├─ Object 'sample_analysis' (dict): 4 keys
[2025-08-01 21:54:05] ├─ Run #5 completed (0.0s)
[2025-08-01 21:54:05] ├─ Created run #6
[2025-08-01 21:54:05] ├─ Parameter: algorithm = QuantumAnnealing
[2025-08-01 21:54:05] ├─ Parameter: temperature = 1.0
[2025-08-01 21:54:05] ├─ Parameter: max_iterations = 1100
[2025-08-01 21:54:05] ├─ Parameter: seed = 45
[2025-08-01 21:54:05] ├─ Object 'configuration' (dict): 2 keys
[2025-08-01 21:54:05] ├─ Parameter: final_energy = -101.09884865243998
[2025-08-01 21:54:05] ├─ Object 'best_solution' (dict): 3 keys
[2025-08-01 21:54:05] ├─ Object 'sample_analysis' (dict): 4 keys
[2025-08-01 21:54:05] ├─ Run #6 completed (0.1s)
[2025-08-01 21:54:05] ├─ Created run #7
[2025-08-01 21:54:05] ├─ Parameter: algorithm = QuantumAnnealing
[2025-08-01 21:54:05] ├─ Parameter: temperature = 2.0
[2025-08-01 21:54:05] ├─ Parameter: max_iterations = 1100
[2025-08-01 21:54:05] ├─ Parameter: seed = 46
[2025-08-01 21:54:05] ├─ Object 'configuration' (dict): 2 keys
[2025-08-01 21:54:05] ├─ Parameter: final_energy = -182.4251473791374
[2025-08-01 21:54:05] ├─ Object 'best_solution' (dict): 3 keys
[2025-08-01 21:54:05] ├─ Object 'sample_analysis' (dict): 4 keys
[2025-08-01 21:54:05] ├─ Run #7 completed (0.0s)
[2025-08-01 21:54:05] ├─ Created run #8
[2025-08-01 21:54:05] ├─ Parameter: algorithm = GeneticAlgorithm
[2025-08-01 21:54:05] ├─ Parameter: temperature = 0.1
[2025-08-01 21:54:05] ├─ Parameter: max_iterations = 1200
[2025-08-01 21:54:05] ├─ Parameter: seed = 44
[2025-08-01 21:54:05] ├─ Object 'configuration' (dict): 2 keys
[2025-08-01 21:54:05] ├─ Parameter: final_energy = -64.08368182338928
[2025-08-01 21:54:05] ├─ Object 'best_solution' (dict): 3 keys
[2025-08-01 21:54:05] ├─ Object 'sample_analysis' (dict): 4 keys
[2025-08-01 21:54:05] ├─ Run #8 completed (0.0s)
[2025-08-01 21:54:05] ├─ Created run #9
[2025-08-01 21:54:05] ├─ Parameter: algorithm = GeneticAlgorithm
[2025-08-01 21:54:05] ├─ Parameter: temperature = 0.5
[2025-08-01 21:54:05] ├─ Parameter: max_iterations = 1200
[2025-08-01 21:54:05] ├─ Parameter: seed = 45
[2025-08-01 21:54:05] ├─ Object 'configuration' (dict): 2 keys
[2025-08-01 21:54:05] ├─ Parameter: final_energy = -75.82413648932999
[2025-08-01 21:54:05] ├─ Object 'best_solution' (dict): 3 keys
[2025-08-01 21:54:05] ├─ Object 'sample_analysis' (dict): 4 keys
[2025-08-01 21:54:05] ├─ Run #9 completed (0.0s)
[2025-08-01 21:54:05] ├─ Created run #10
[2025-08-01 21:54:05] ├─ Parameter: algorithm = GeneticAlgorithm
[2025-08-01 21:54:05] ├─ Parameter: temperature = 1.0
[2025-08-01 21:54:05] ├─ Parameter: max_iterations = 1200
[2025-08-01 21:54:05] ├─ Parameter: seed = 46
[2025-08-01 21:54:05] ├─ Object 'configuration' (dict): 2 keys
[2025-08-01 21:54:05] ├─ Parameter: final_energy = -121.61676491942494
[2025-08-01 21:54:05] ├─ Object 'best_solution' (dict): 3 keys
[2025-08-01 21:54:05] ├─ Object 'sample_analysis' (dict): 4 keys
[2025-08-01 21:54:05] ├─ Run #10 completed (0.1s)
[2025-08-01 21:54:05] ├─ Created run #11
[2025-08-01 21:54:05] ├─ Parameter: algorithm = GeneticAlgorithm
[2025-08-01 21:54:05] ├─ Parameter: temperature = 2.0
[2025-08-01 21:54:05] ├─ Parameter: max_iterations = 1200
[2025-08-01 21:54:05] ├─ Parameter: seed = 47
[2025-08-01 21:54:05] ├─ Object 'configuration' (dict): 2 keys
[2025-08-01 21:54:05] ├─ Parameter: final_energy = -282.97672921595256
[2025-08-01 21:54:05] ├─ Object 'best_solution' (dict): 3 keys
[2025-08-01 21:54:05] ├─ Object 'sample_analysis' (dict): 4 keys
[2025-08-01 21:54:05] ├─ Run #11 completed (0.0s)
Created experiment with 12 runs
2. The run_table_data View#
First, let's see what the standard run_table_data view shows us:
# Get the standard table view
table_view = exp.get_run_table()
print("Standard run_table_data view:")
print(f"Shape: {table_view.shape}")
print(f"Columns: {list(table_view.columns)}")
print("\nFirst few rows:")
display(table_view.head())
print("\nβ οΈ Notice: Complex objects (configuration, best_solution, all_samples) are not visible in this view!")
Standard run_table_data view:
Shape: (12, 7)
Columns: [('parameter', 'algorithm'), ('parameter', 'temperature'), ('parameter', 'max_iterations'), ('parameter', 'seed'), ('parameter', 'final_energy'), ('metadata', 'run_id'), ('metadata', 'elapsed_time')]
First few rows:
The first five columns fall under the "parameter" group and the last two under "metadata":

| run_id | algorithm | temperature | max_iterations | seed | final_energy | run_id | elapsed_time |
|---|---|---|---|---|---|---|---|
| 0 | SimulatedAnnealing | 0.1 | 1000 | 42 | -89.400293 | 0 | 0.006421 |
| 1 | SimulatedAnnealing | 0.5 | 1000 | 43 | -141.370908 | 1 | 0.058560 |
| 2 | SimulatedAnnealing | 1.0 | 1000 | 44 | -116.515785 | 2 | 0.006171 |
| 3 | SimulatedAnnealing | 2.0 | 1000 | 45 | -151.648273 | 3 | 0.005079 |
| 4 | QuantumAnnealing | 0.1 | 1100 | 43 | -103.671999 | 4 | 0.005783 |
⚠️ Notice: Complex objects (configuration, best_solution, sample_analysis) are not visible in this view!
3. Accessing Individual Run Data#
To access the complete data for each run, including complex objects, you need to work with individual run objects:
# Get all runs
runs = exp.runs
print(f"Total runs available: {len(runs)}")
# Access data from the first run
first_run = runs[0]
print("\nRun Index: 0") # DataStore doesn't have run_id, use index instead
# Access parameters (same as in table view)
print("\n=== Parameters ===")
for key, value in first_run.parameters.items():
    print(f"{key}: {value}")

# Access objects (NOT visible in table view)
print("\n=== Objects (not in table view) ===")
for key, value in first_run.objects.items():
    print(f"{key}: {type(value)} - {str(value)[:100]}...")

print("\n⚠️ Note: In the actual Minto API, exp.runs returns DataStore objects")
print("   DataStore contains parameters, objects, solutions, and samplesets")
print("   But no direct run_id - use list index instead")
Total runs available: 12
Run Index: 0
=== Parameters ===
algorithm: SimulatedAnnealing
temperature: 0.1
max_iterations: 1000
seed: 42
final_energy: -89.40029346339507
=== Objects (not in table view) ===
configuration: <class 'dict'> - {'solver_config': {'tolerance': 1e-06, 'max_time': 300, 'parallel': True}, 'problem_metadata': {'var...
best_solution: <class 'dict'> - {'solution_vector': [0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1, 0, 1, 0], 'energy': -89.4002...
sample_analysis: <class 'dict'> - {'num_samples': 100, 'energy_range': [-99.40029346339507, -84.40029346339507], 'best_energy': -89.40...
⚠️ Note: In the actual Minto API, exp.runs returns DataStore objects
DataStore contains parameters, objects, solutions, and samplesets
But no direct run_id - use list index instead
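Before building custom views, it can help to survey which objects were actually logged across all runs. The following is a small sketch, assuming only the dict-like .objects attribute demonstrated above:
# Count how many runs logged each object key (exploration helper sketch).
from collections import Counter

object_key_counts = Counter()
for run_datastore in exp.runs:
    object_key_counts.update(run_datastore.objects.keys())

for key, count in sorted(object_key_counts.items()):
    print(f"{key}: logged in {count} of {len(exp.runs)} runs")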
4. Building Custom Data Views#
You can create custom DataFrame views that include the data you need:
def create_custom_dataframe(experiment):
    """Create a custom DataFrame with additional data not in run_table_data."""
    data = []

    for i, run_datastore in enumerate(experiment.runs):
        row = {
            "run_index": i,  # Use index since DataStore doesn't have run_id

            # Basic parameters
            "algorithm": run_datastore.parameters.get("algorithm"),
            "temperature": run_datastore.parameters.get("temperature"),
            "final_energy": run_datastore.parameters.get("final_energy"),

            # Extract nested configuration data
            "solver_tolerance": run_datastore.objects.get("configuration", {}).get("solver_config", {}).get("tolerance"),
            "problem_variables": run_datastore.objects.get("configuration", {}).get("problem_metadata", {}).get("variables"),
            "problem_density": run_datastore.objects.get("configuration", {}).get("problem_metadata", {}).get("density"),

            # Solution analysis (from objects, not solutions)
            "solution_length": len(run_datastore.objects.get("best_solution", {}).get("solution_vector", [])),
            "solution_ones_count": sum(run_datastore.objects.get("best_solution", {}).get("solution_vector", [])),

            # Sample analysis (from objects)
            "num_samples": run_datastore.objects.get("sample_analysis", {}).get("num_samples", 0),
            "best_energy": run_datastore.objects.get("sample_analysis", {}).get("best_energy", 0),
        }
        data.append(row)

    return pd.DataFrame(data)
# Create custom view
custom_df = create_custom_dataframe(exp)
print("Custom DataFrame with extracted nested data:")
print(f"Shape: {custom_df.shape}")
print(f"Columns: {list(custom_df.columns)}")
display(custom_df.head())
Custom DataFrame with extracted nested data:
Shape: (12, 11)
Columns: ['run_index', 'algorithm', 'temperature', 'final_energy', 'solver_tolerance', 'problem_variables', 'problem_density', 'solution_length', 'solution_ones_count', 'num_samples', 'best_energy']
| | run_index | algorithm | temperature | final_energy | solver_tolerance | problem_variables | problem_density | solution_length | solution_ones_count | num_samples | best_energy |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | SimulatedAnnealing | 0.1 | -89.400293 | 0.000001 | 100 | 0.10 | 20 | 7 | 100 | -89.400293 |
| 1 | 1 | SimulatedAnnealing | 0.5 | -141.370908 | 0.000001 | 100 | 0.10 | 20 | 13 | 100 | -141.370908 |
| 2 | 2 | SimulatedAnnealing | 1.0 | -116.515785 | 0.000001 | 100 | 0.10 | 20 | 14 | 100 | -116.515785 |
| 3 | 3 | SimulatedAnnealing | 2.0 | -151.648273 | 0.000001 | 100 | 0.10 | 20 | 11 | 100 | -151.648273 |
| 4 | 4 | QuantumAnnealing | 0.1 | -103.671999 | 0.000001 | 110 | 0.15 | 20 | 13 | 100 | -103.671999 |
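If you prefer not to chain .get() calls by hand, pandas can flatten the nested dictionaries for you. The following is an alternative sketch using pd.json_normalize on the configuration objects logged above; the resulting column names (such as configuration_solver_config_tolerance) simply follow from sep="_" and are not part of the Minto API:
# Alternative sketch: flatten nested configuration dicts with pandas.json_normalize.
records = []
for i, run_datastore in enumerate(exp.runs):
    records.append({
        "run_index": i,
        **run_datastore.parameters,
        "configuration": run_datastore.objects.get("configuration", {}),
    })

flat_df = pd.json_normalize(records, sep="_")
print(flat_df.filter(like="configuration_").columns.tolist())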
5. Filtering and Querying Runs#
You can filter runs based on complex criteria that aren't possible with the simple table view:
def filter_runs_by_criteria(experiment, **criteria):
    """Filter runs based on complex criteria."""
    filtered_runs = []

    for i, run_datastore in enumerate(experiment.runs):
        include_run = True

        # Check parameter criteria
        if "algorithm" in criteria:
            if run_datastore.parameters.get("algorithm") != criteria["algorithm"]:
                include_run = False

        if "min_energy" in criteria:
            if run_datastore.parameters.get("final_energy", float("inf")) > criteria["min_energy"]:
                include_run = False

        # Check object criteria (nested data)
        if "min_variables" in criteria:
            variables = run_datastore.objects.get("configuration", {}).get("problem_metadata", {}).get("variables", 0)
            if variables < criteria["min_variables"]:
                include_run = False

        # Check solution criteria (from objects)
        if "min_solution_density" in criteria:
            solution_vector = run_datastore.objects.get("best_solution", {}).get("solution_vector", [])
            if solution_vector:
                density = sum(solution_vector) / len(solution_vector)
                if density < criteria["min_solution_density"]:
                    include_run = False

        if include_run:
            filtered_runs.append((i, run_datastore))  # Return both index and datastore

    return filtered_runs

# Example: Find SimulatedAnnealing runs with good energy and high variable count
filtered = filter_runs_by_criteria(
    exp,
    algorithm="SimulatedAnnealing",
    min_energy=-120,
    min_variables=105,
    min_solution_density=0.3
)

print(f"Found {len(filtered)} runs matching complex criteria:")
for i, run_datastore in filtered:
    print(f"  Run {i}: {run_datastore.parameters.get('algorithm')} "
          f"(energy: {run_datastore.parameters.get('final_energy'):.2f}, "
          f"vars: {run_datastore.objects.get('configuration', {}).get('problem_metadata', {}).get('variables')})")
Found 0 runs matching complex criteria:
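Zero matches is expected here: every SimulatedAnnealing run in this experiment was generated with 100 problem variables (100 + i * 10 with i = 0), so the min_variables=105 criterion excludes them all. Relaxing that bound, as in the sketch below, lets the low-energy SimulatedAnnealing runs through:
# Same query with the variable-count bound relaxed to match the generated data.
relaxed = filter_runs_by_criteria(
    exp,
    algorithm="SimulatedAnnealing",
    min_energy=-120,
    min_variables=100,
    min_solution_density=0.3,
)
print(f"Relaxed criteria matched {len(relaxed)} runs")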
6. Analyzing Complex Data Structures#
For detailed analysis of solutions, samplesets, and other complex data:
import itertools

def analyze_solutions(experiment):
    """Perform detailed analysis on solution data."""
    solution_analysis = []

    for i, run_datastore in enumerate(experiment.runs):
        solution_data = run_datastore.objects.get("best_solution", {})
        solution_vector = solution_data.get("solution_vector", [])

        if solution_vector:
            analysis = {
                "run_index": i,
                "algorithm": run_datastore.parameters.get("algorithm"),
                "solution_length": len(solution_vector),
                "ones_count": sum(solution_vector),
                "zeros_count": len(solution_vector) - sum(solution_vector),
                "density": sum(solution_vector) / len(solution_vector),
                "energy": run_datastore.parameters.get("final_energy"),
                # Pattern analysis
                "alternating_pattern": sum(1 for j in range(len(solution_vector) - 1)
                                           if solution_vector[j] != solution_vector[j + 1]),
                "longest_run_of_ones": (max(len(list(g)) for k, g in itertools.groupby(solution_vector) if k == 1)
                                        if 1 in solution_vector else 0),
            }
            solution_analysis.append(analysis)

    return pd.DataFrame(solution_analysis)

def analyze_sample_data(experiment):
    """Perform statistical analysis on sample data."""
    sample_stats = []

    for i, run_datastore in enumerate(experiment.runs):
        sample_analysis = run_datastore.objects.get("sample_analysis", {})

        if sample_analysis:
            stats = {
                "run_index": i,
                "algorithm": run_datastore.parameters.get("algorithm"),
                "num_samples": sample_analysis.get("num_samples", 0),
                "best_energy": sample_analysis.get("best_energy", 0),
                "avg_energy": sample_analysis.get("avg_energy", 0),
                "energy_range": sample_analysis.get("energy_range", [0, 0]),
                "energy_span": (sample_analysis.get("energy_range", [0, 0])[1] -
                                sample_analysis.get("energy_range", [0, 0])[0]),
            }
            sample_stats.append(stats)

    return pd.DataFrame(sample_stats)

# Perform analyses
solution_df = analyze_solutions(exp)
sample_df = analyze_sample_data(exp)

print("Solution Analysis:")
display(solution_df.head())

print("\nSample Data Analysis:")
display(sample_df.head())

# Correlation analysis
if not solution_df.empty:
    print("\nCorrelation between solution density and energy:")
    correlation = solution_df[["density", "energy"]].corr()
    print(correlation)
Solution Analysis:
| | run_index | algorithm | solution_length | ones_count | zeros_count | density | energy | alternating_pattern | longest_run_of_ones |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | SimulatedAnnealing | 20 | 7 | 13 | 0.35 | -89.400293 | 10 | 3 |
| 1 | 1 | SimulatedAnnealing | 20 | 13 | 7 | 0.65 | -141.370908 | 8 | 4 |
| 2 | 2 | SimulatedAnnealing | 20 | 14 | 6 | 0.70 | -116.515785 | 8 | 5 |
| 3 | 3 | SimulatedAnnealing | 20 | 11 | 9 | 0.55 | -151.648273 | 11 | 4 |
| 4 | 4 | QuantumAnnealing | 20 | 13 | 7 | 0.65 | -103.671999 | 8 | 4 |
Sample Data Analysis:
| | run_index | algorithm | num_samples | best_energy | avg_energy | energy_range | energy_span |
|---|---|---|---|---|---|---|---|
| 0 | 0 | SimulatedAnnealing | 100 | -89.400293 | -87.400293 | [-99.40029346339507, -84.40029346339507] | 15.0 |
| 1 | 1 | SimulatedAnnealing | 100 | -141.370908 | -139.370908 | [-151.37090752076656, -136.37090752076656] | 15.0 |
| 2 | 2 | SimulatedAnnealing | 100 | -116.515785 | -114.515785 | [-126.51578513343506, -111.51578513343506] | 15.0 |
| 3 | 3 | SimulatedAnnealing | 100 | -151.648273 | -149.648273 | [-161.64827297865997, -146.64827297865997] | 15.0 |
| 4 | 4 | QuantumAnnealing | 100 | -103.671999 | -101.671999 | [-113.67199884856215, -98.67199884856215] | 15.0 |
Correlation between solution density and energy:
density energy
density 1.000000 -0.005797
energy -0.005797 1.000000
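Beyond the pairwise correlation, a per-algorithm summary of the same analysis is one groupby away. The following short sketch reuses the solution_df built above:
# Mean and spread of solution density and energy for each algorithm.
summary = solution_df.groupby("algorithm")[["density", "energy"]].agg(["mean", "std"])
print(summary)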
7. Modifying and Updating Run Data#
You can also modify existing run data or add new computed metrics:
print("β οΈ Note: This example shows conceptual data modification.")
print("In practice, DataStore objects from exp.runs are read-only views.")
print("To add computed metrics, you'd need to work with active Run objects during execution.")
def analyze_existing_data(experiment):
"""Analyze existing data and compute derived metrics."""
analysis_results = []
for i, run_datastore in enumerate(experiment.runs):
# Compute quality score based on energy and solution properties
energy = run_datastore.parameters.get("final_energy", 0)
solution_data = run_datastore.objects.get("best_solution", {})
solution_vector = solution_data.get("solution_vector", [])
if solution_vector:
density = sum(solution_vector) / len(solution_vector)
# Custom quality metric (example)
quality_score = abs(energy) * (1 - abs(density - 0.5)) # Prefer balanced solutions with low energy
# Collect analysis
analysis = {
"run_index": i,
"algorithm": run_datastore.parameters.get("algorithm"),
"temperature": run_datastore.parameters.get("temperature"),
"energy": energy,
"solution_density": density,
"quality_score": quality_score,
"balance_score": 1 - abs(density - 0.5) * 2, # 1 = perfectly balanced
"entropy": -sum(p * np.log2(p) for p in [density, 1-density] if p > 0),
}
analysis_results.append(analysis)
# Add runtime analysis
algorithm = run_datastore.parameters.get("algorithm")
temperature = run_datastore.parameters.get("temperature")
if algorithm and temperature:
# Simulated performance prediction (example)
predicted_runtime = {
"SimulatedAnnealing": 1.0 / temperature,
"QuantumAnnealing": 0.5 / temperature,
"GeneticAlgorithm": 2.0 / temperature
}.get(algorithm, 1.0)
analysis_results[-1]["predicted_runtime"] = predicted_runtime
return pd.DataFrame(analysis_results)
# Analyze existing data
analysis_df = analyze_existing_data(exp)
print("Analysis with computed metrics:")
display(analysis_df[["algorithm", "temperature", "energy", "quality_score", "predicted_runtime"]].head())
⚠️ Note: This example shows conceptual data modification.
In practice, DataStore objects from exp.runs are read-only views.
To add computed metrics, you'd need to work with active Run objects during execution.
Analysis with computed metrics:
| | algorithm | temperature | energy | quality_score | predicted_runtime |
|---|---|---|---|---|---|
| 0 | SimulatedAnnealing | 0.1 | -89.400293 | 75.990249 | 10.0 |
| 1 | SimulatedAnnealing | 0.5 | -141.370908 | 120.165271 | 2.0 |
| 2 | SimulatedAnnealing | 1.0 | -116.515785 | 93.212628 | 1.0 |
| 3 | SimulatedAnnealing | 2.0 | -151.648273 | 144.065859 | 0.5 |
| 4 | QuantumAnnealing | 0.1 | -103.671999 | 88.121199 | 5.0 |
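The note above says derived metrics have to be logged through active Run objects. One way to do that, sketched below, is to record the computed metrics in a separate follow-up experiment using the same run()/log_parameter API from section 1; the experiment name "data_access_tutorial_analysis" is purely illustrative:
# Sketch: persist derived metrics by logging them in a follow-up experiment.
analysis_exp = Experiment(name="data_access_tutorial_analysis", auto_saving=True)

for _, row in analysis_df.iterrows():
    analysis_run = analysis_exp.run()
    with analysis_run:
        analysis_run.log_parameter("source_run_index", int(row["run_index"]))
        analysis_run.log_parameter("quality_score", float(row["quality_score"]))
        analysis_run.log_parameter("balance_score", float(row["balance_score"]))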
8. Export and Save Processed Data#
Save your processed data for further analysis or sharing:
# Create comprehensive analysis report
def create_analysis_report(experiment):
    """Create a comprehensive analysis report."""
    report = {
        "experiment_summary": {
            "name": experiment.name,
            "total_runs": len(experiment.runs),
            "algorithms_tested": list(set(run_datastore.parameters.get("algorithm") for run_datastore in experiment.runs)),
            "temperature_range": [min(run_datastore.parameters.get("temperature", 0) for run_datastore in experiment.runs),
                                  max(run_datastore.parameters.get("temperature", 0) for run_datastore in experiment.runs)]
        },
        "performance_by_algorithm": {},
        "detailed_run_data": []
    }

    # Algorithm performance summary
    algorithms = set(run_datastore.parameters.get("algorithm") for run_datastore in experiment.runs)
    for algorithm in algorithms:
        algo_runs = [(i, run_datastore) for i, run_datastore in enumerate(experiment.runs)
                     if run_datastore.parameters.get("algorithm") == algorithm]
        energies = [run_datastore.parameters.get("final_energy", 0) for _, run_datastore in algo_runs]

        report["performance_by_algorithm"][algorithm] = {
            "run_count": len(algo_runs),
            "avg_energy": np.mean(energies),
            "best_energy": min(energies),
            "energy_std": np.std(energies)
        }

    # Detailed run data
    for i, run_datastore in enumerate(experiment.runs):
        run_data = {
            "run_index": i,
            "parameters": dict(run_datastore.parameters),
            "config_summary": {
                "variables": run_datastore.objects.get("configuration", {}).get("problem_metadata", {}).get("variables"),
                "tolerance": run_datastore.objects.get("configuration", {}).get("solver_config", {}).get("tolerance")
            },
            "solution_summary": {
                "energy": run_datastore.objects.get("best_solution", {}).get("energy"),
                "is_feasible": run_datastore.objects.get("best_solution", {}).get("is_feasible")
            }
        }
        report["detailed_run_data"].append(run_data)

    return report

# Generate report
analysis_report = create_analysis_report(exp)

print("Analysis Report Summary:")
print(f"Experiment: {analysis_report['experiment_summary']['name']}")
print(f"Total runs: {analysis_report['experiment_summary']['total_runs']}")
print(f"Algorithms tested: {analysis_report['experiment_summary']['algorithms_tested']}")

print("\nPerformance by Algorithm:")
for algo, stats in analysis_report["performance_by_algorithm"].items():
    print(f"  {algo}:")
    print(f"    Runs: {stats['run_count']}")
    print(f"    Avg Energy: {stats['avg_energy']:.2f}")
    print(f"    Best Energy: {stats['best_energy']:.2f}")
    print(f"    Energy Std: {stats['energy_std']:.2f}")

# Save processed DataFrames
print("\nYou can save processed data:")
print("custom_df.to_csv('custom_analysis.csv')")
print("solution_df.to_parquet('solution_analysis.parquet')")
print("analysis_df.to_csv('computed_metrics.csv')")
print("import json; json.dump(analysis_report, open('analysis_report.json', 'w'))")
Analysis Report Summary:
Experiment: data_access_tutorial
Total runs: 12
Algorithms tested: ['GeneticAlgorithm', 'QuantumAnnealing', 'SimulatedAnnealing']
Performance by Algorithm:
GeneticAlgorithm:
Runs: 4
Avg Energy: -136.13
Best Energy: -282.98
Energy Std: 87.47
QuantumAnnealing:
Runs: 4
Avg Energy: -118.65
Best Energy: -182.43
Energy Std: 37.34
SimulatedAnnealing:
Runs: 4
Avg Energy: -124.73
Best Energy: -151.65
Energy Std: 24.07
You can save processed data:
custom_df.to_csv('custom_analysis.csv')
solution_df.to_parquet('solution_analysis.parquet')
analysis_df.to_csv('computed_metrics.csv')
import json; json.dump(analysis_report, open('analysis_report.json', 'w'))
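The printed commands above are suggestions rather than executed code. If you run them, note that to_parquet requires an optional dependency (pyarrow or fastparquet), and that analysis_report may contain NumPy scalar values from np.mean and np.std; passing default=float to json.dump is a simple safeguard. A short sketch covering the CSV and JSON cases:
# Persist the processed views built earlier in this tutorial.
import json

custom_df.to_csv("custom_analysis.csv", index=False)
analysis_df.to_csv("computed_metrics.csv", index=False)

# default=float converts any NumPy scalar values json cannot handle directly.
with open("analysis_report.json", "w") as f:
    json.dump(analysis_report, f, indent=2, default=float)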
9. Working with Persistent Data Storage#
Understanding how to work with saved experiments and reload data:
import minto
# Save the current experiment
exp.save()
print(f"Experiment saved to: {exp.savedir}")
# Load experiment from file (simulating a new session)
# Load the saved experiment
loaded_exp = minto.Experiment.load_from_dir(exp.savedir)
print(f"\nLoaded experiment: {loaded_exp.name}")
print(f"Number of runs: {len(loaded_exp.runs)}")
# Verify all data is preserved
first_loaded_run = loaded_exp.runs[0]
print("\nFirst run data preserved:")
print(f" Parameters: {len(first_loaded_run.parameters)} items")
print(f" Objects: {len(first_loaded_run.objects)} items")
# Show that complex objects are fully preserved
config = first_loaded_run.objects.get("configuration", {})
print("\nComplex configuration object preserved:")
print(f" Solver config keys: {list(config.get('solver_config', {}).keys())}")
print(f" Problem metadata keys: {list(config.get('problem_metadata', {}).keys())}")
[2025-08-01 21:54:28] Experiment 'data_access_tutorial' completed: 12 runs, total time: 23.0s
Experiment saved to: .minto_experiments/data_access_tutorial_20250801215405
[2025-08-01 21:54:28] Starting experiment 'data_access_tutorial'
[2025-08-01 21:54:28] ├─ Environment: OS: Linux 6.6.93+, CPU: Intel(R) Xeon(R) CPU @ 2.80GHz (4 cores), Memory: 15.6 GB, Python: 3.11.10
[2025-08-01 21:54:28] ├─ Environment Information
[2025-08-01 21:54:28] ├─ OS: Linux 6.6.93+
[2025-08-01 21:54:28] ├─ Platform: Linux-6.6.93+-x86_64-with-glibc2.35
[2025-08-01 21:54:28] ├─ CPU: Intel(R) Xeon(R) CPU @ 2.80GHz (4 cores)
[2025-08-01 21:54:28] ├─ Memory: 15.6 GB
[2025-08-01 21:54:28] ├─ Architecture: x86_64
[2025-08-01 21:54:28] ├─ Python: 3.11.10
[2025-08-01 21:54:28] ├─ Key Package Versions:
Loaded experiment: data_access_tutorial
Number of runs: 12
First run data preserved:
Parameters: 5 items
Objects: 3 items
Complex configuration object preserved:
Solver config keys: ['tolerance', 'max_time', 'parallel']
Problem metadata keys: ['variables', 'constraints', 'density']
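Because loading preserves the full DataStore structure, the helpers defined earlier also work on reloaded experiments. Below is a minimal sketch of an analysis pipeline driven entirely by saved data, reusing load_from_dir and the create_custom_dataframe helper from section 4:
# Sketch: rebuild the custom analysis view purely from the saved experiment directory.
reloaded = minto.Experiment.load_from_dir(exp.savedir)
reloaded_df = create_custom_dataframe(reloaded)
print(reloaded_df[["run_index", "algorithm", "final_energy"]].head())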
Key Takeaways#
run_table_data is a View, Not the Data#
- exp.get_run_table() provides a convenient DataFrame view
- It only shows parameters (plus run metadata) as columns
- Objects, solutions, and samplesets are not included
- Complex nested data structures are not visible
⚠️ Important API Note#
- exp.runs returns a list of DataStore objects (not Run objects)
- DataStore objects contain: parameters, objects, solutions, samplesets
- DataStore objects do not have run_id - use the list index for identification
- This is different from active Run objects used during experiment execution
Accessing Complete Data#
- Use exp.runs to access individual DataStore objects (a list of DataStore)
- Each DataStore has: .parameters, .objects, .solutions, .samplesets
- All logged data is preserved, including complex structures
- Use the list index instead of run_id for DataStore identification (see the sketch below)
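In code, the access pattern summarized above looks roughly like this (a sketch assuming the DataStore attributes listed in this section):
# Sketch of the DataStore access pattern: index-based iteration over exp.runs.
for index, run_datastore in enumerate(exp.runs):
    params = run_datastore.parameters      # dict-like: logged parameters
    objects = run_datastore.objects        # dict-like: logged objects
    solutions = run_datastore.solutions    # logged solutions, if any were recorded
    samplesets = run_datastore.samplesets  # logged samplesets, if any were recorded
    print(index, params.get("algorithm"), sorted(objects))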
Custom Analysis#
- Create custom DataFrames for your specific analysis needs
- Filter runs based on complex criteria using DataStore objects
- Extract and analyze nested data structures from objects
- Compute derived metrics in separate analysis DataFrames
Data Persistence#
- All data (parameters, objects, solutions, samplesets) is saved
- Reloaded experiments preserve the complete DataStore structure
- You can build analysis pipelines that work with saved data
Best Practices#
- Use run_table_data for quick overviews and simple parameter analysis
- Access individual DataStore objects via exp.runs for detailed analysis
- Build custom DataFrames that include the specific data you need
- Save processed analysis results alongside your experiments
- Remember that exp.runs returns DataStore objects, not Run objects
- Use list indices to identify specific runs since DataStore doesn't have run_id
This approach gives you the flexibility to perform both quick exploratory analysis and deep, detailed investigations of your optimization experiments using the actual Minto API.