Data Access and Manipulation Tutorial#
This tutorial explains how to access and manipulate experimental data beyond the basic run_table_data view. While run_table_data provides a convenient DataFrame view of your experiments, it's important to understand that this is just a view of the data, not the actual data store itself.
For detailed data exploration, custom analysis, and data manipulation, you need to use Minto's underlying data access interfaces.
1. Understanding Data Views vs. Data Store#
Let's start by creating some experimental data and understanding the difference between views and the actual data store.
from minto import Experiment
import numpy as np
import pandas as pd
# Create an experiment with multiple runs
exp = Experiment(
    name="data_access_tutorial",
    auto_saving=True,
)

# Generate sample data across multiple runs
algorithms = ["SimulatedAnnealing", "QuantumAnnealing", "GeneticAlgorithm"]
temperatures = [0.1, 0.5, 1.0, 2.0]

for i, algorithm in enumerate(algorithms):
    for j, temp in enumerate(temperatures):
        run = exp.run()
        with run:
            # Log parameters
            run.log_parameter("algorithm", algorithm)
            run.log_parameter("temperature", temp)
            run.log_parameter("max_iterations", 1000 + i * 100)
            run.log_parameter("seed", 42 + i + j)

            # Log complex objects (these won't appear in run_table_data)
            config_data = {
                "solver_config": {
                    "tolerance": 1e-6,
                    "max_time": 300,
                    "parallel": True
                },
                "problem_metadata": {
                    "variables": 100 + i * 10,
                    "constraints": 50 + i * 5,
                    "density": 0.1 + i * 0.05
                }
            }
            run.log_object("configuration", config_data)

            # Simulate optimization results
            np.random.seed(42 + i + j)
            energy = np.random.uniform(-100, -50) * (1 + temp)
            run.log_parameter("final_energy", energy)

            # Log solution metadata (simplified for compatibility)
            solution_metadata = {
                "solution_vector": np.random.choice([0, 1], size=20).tolist(),
                "energy": energy,
                "is_feasible": True
            }
            run.log_object("best_solution", solution_metadata)

            # Log sample analysis (simplified for compatibility)
            sample_analysis = {
                "num_samples": 100,
                "energy_range": [energy - 10, energy + 5],
                "best_energy": energy,
                "avg_energy": energy + 2.0
            }
            run.log_object("sample_analysis", sample_analysis)

print(f"Created experiment with {len(exp.runs)} runs")
[2025-08-01 21:54:05] Starting experiment 'data_access_tutorial'
[2025-08-01 21:54:05] ├─ Environment: OS: Linux 6.6.93+, CPU: Intel(R) Xeon(R) CPU @ 2.80GHz (4 cores), Memory: 15.6 GB, Python: 3.11.10
[2025-08-01 21:54:05] ├─ Environment Information
[2025-08-01 21:54:05] ├─ OS: Linux 6.6.93+
[2025-08-01 21:54:05] ├─ Platform: Linux-6.6.93+-x86_64-with-glibc2.35
[2025-08-01 21:54:05] ├─ CPU: Intel(R) Xeon(R) CPU @ 2.80GHz (4 cores)
[2025-08-01 21:54:05] ├─ Memory: 15.6 GB
[2025-08-01 21:54:05] ├─ Architecture: x86_64
[2025-08-01 21:54:05] ├─ Python: 3.11.10
[2025-08-01 21:54:05] ├─ Key Package Versions:
[2025-08-01 21:54:05] ├─ Created run #0
[2025-08-01 21:54:05] ├─ Parameter: algorithm = SimulatedAnnealing
[2025-08-01 21:54:05] ├─ Parameter: temperature = 0.1
[2025-08-01 21:54:05] ├─ Parameter: max_iterations = 1000
[2025-08-01 21:54:05] ├─ Parameter: seed = 42
[2025-08-01 21:54:05] ├─ Object 'configuration' (dict): 2 keys
[2025-08-01 21:54:05] ├─ Parameter: final_energy = -89.40029346339507
[2025-08-01 21:54:05] ├─ Object 'best_solution' (dict): 3 keys
[2025-08-01 21:54:05] ├─ Object 'sample_analysis' (dict): 4 keys
[2025-08-01 21:54:05] ├─ Run #0 completed (0.0s)
[2025-08-01 21:54:05] ├─ Created run #1
[2025-08-01 21:54:05] ├─ Parameter: algorithm = SimulatedAnnealing
[2025-08-01 21:54:05] ├─ Parameter: temperature = 0.5
[2025-08-01 21:54:05] ├─ Parameter: max_iterations = 1000
[2025-08-01 21:54:05] ├─ Parameter: seed = 43
[2025-08-01 21:54:05] ├─ Object 'configuration' (dict): 2 keys
[2025-08-01 21:54:05] ├─ Parameter: final_energy = -141.37090752076656
[2025-08-01 21:54:05] ├─ Object 'best_solution' (dict): 3 keys
[2025-08-01 21:54:05] ├─ Object 'sample_analysis' (dict): 4 keys
[2025-08-01 21:54:05] ├─ Run #1 completed (0.1s)
[2025-08-01 21:54:05] ├─ Created run #2
[2025-08-01 21:54:05] ├─ Parameter: algorithm = SimulatedAnnealing
[2025-08-01 21:54:05] ├─ Parameter: temperature = 1.0
[2025-08-01 21:54:05] ├─ Parameter: max_iterations = 1000
[2025-08-01 21:54:05] ├─ Parameter: seed = 44
[2025-08-01 21:54:05] ├─ Object 'configuration' (dict): 2 keys
[2025-08-01 21:54:05] ├─ Parameter: final_energy = -116.51578513343506
[2025-08-01 21:54:05] ├─ Object 'best_solution' (dict): 3 keys
[2025-08-01 21:54:05] ├─ Object 'sample_analysis' (dict): 4 keys
[2025-08-01 21:54:05] ├─ Run #2 completed (0.0s)
[2025-08-01 21:54:05] ├─ Created run #3
[2025-08-01 21:54:05] ├─ Parameter: algorithm = SimulatedAnnealing
[2025-08-01 21:54:05] ├─ Parameter: temperature = 2.0
[2025-08-01 21:54:05] ├─ Parameter: max_iterations = 1000
[2025-08-01 21:54:05] ├─ Parameter: seed = 45
[2025-08-01 21:54:05] ├─ Object 'configuration' (dict): 2 keys
[2025-08-01 21:54:05] ├─ Parameter: final_energy = -151.64827297865997
[2025-08-01 21:54:05] ├─ Object 'best_solution' (dict): 3 keys
[2025-08-01 21:54:05] ├─ Object 'sample_analysis' (dict): 4 keys
[2025-08-01 21:54:05] ├─ Run #3 completed (0.0s)
[2025-08-01 21:54:05] ├─ Created run #4
[2025-08-01 21:54:05] ├─ Parameter: algorithm = QuantumAnnealing
[2025-08-01 21:54:05] ├─ Parameter: temperature = 0.1
[2025-08-01 21:54:05] ├─ Parameter: max_iterations = 1100
[2025-08-01 21:54:05] ├─ Parameter: seed = 43
[2025-08-01 21:54:05] ├─ Object 'configuration' (dict): 2 keys
[2025-08-01 21:54:05] ├─ Parameter: final_energy = -103.67199884856215
[2025-08-01 21:54:05] ├─ Object 'best_solution' (dict): 3 keys
[2025-08-01 21:54:05] ├─ Object 'sample_analysis' (dict): 4 keys
[2025-08-01 21:54:05] ├─ Run #4 completed (0.0s)
[2025-08-01 21:54:05] ├─ Created run #5
[2025-08-01 21:54:05] ├─ Parameter: algorithm = QuantumAnnealing
[2025-08-01 21:54:05] ├─ Parameter: temperature = 0.5
[2025-08-01 21:54:05] ├─ Parameter: max_iterations = 1100
[2025-08-01 21:54:05] ├─ Parameter: seed = 44
[2025-08-01 21:54:05] ├─ Object 'configuration' (dict): 2 keys
[2025-08-01 21:54:05] ├─ Parameter: final_energy = -87.38683885007629
[2025-08-01 21:54:05] ├─ Object 'best_solution' (dict): 3 keys
[2025-08-01 21:54:05] ├─ Object 'sample_analysis' (dict): 4 keys
[2025-08-01 21:54:05] ├─ Run #5 completed (0.0s)
[2025-08-01 21:54:05] ├─ Created run #6
[2025-08-01 21:54:05] ├─ Parameter: algorithm = QuantumAnnealing
[2025-08-01 21:54:05] ├─ Parameter: temperature = 1.0
[2025-08-01 21:54:05] ├─ Parameter: max_iterations = 1100
[2025-08-01 21:54:05] ├─ Parameter: seed = 45
[2025-08-01 21:54:05] ├─ Object 'configuration' (dict): 2 keys
[2025-08-01 21:54:05] ├─ Parameter: final_energy = -101.09884865243998
[2025-08-01 21:54:05] ├─ Object 'best_solution' (dict): 3 keys
[2025-08-01 21:54:05] ├─ Object 'sample_analysis' (dict): 4 keys
[2025-08-01 21:54:05] ├─ Run #6 completed (0.1s)
[2025-08-01 21:54:05] ├─ Created run #7
[2025-08-01 21:54:05] ├─ Parameter: algorithm = QuantumAnnealing
[2025-08-01 21:54:05] ├─ Parameter: temperature = 2.0
[2025-08-01 21:54:05] ├─ Parameter: max_iterations = 1100
[2025-08-01 21:54:05] ├─ Parameter: seed = 46
[2025-08-01 21:54:05] ├─ Object 'configuration' (dict): 2 keys
[2025-08-01 21:54:05] ├─ Parameter: final_energy = -182.4251473791374
[2025-08-01 21:54:05] ├─ Object 'best_solution' (dict): 3 keys
[2025-08-01 21:54:05] ├─ Object 'sample_analysis' (dict): 4 keys
[2025-08-01 21:54:05] ├─ Run #7 completed (0.0s)
[2025-08-01 21:54:05] ├─ Created run #8
[2025-08-01 21:54:05] ├─ Parameter: algorithm = GeneticAlgorithm
[2025-08-01 21:54:05] ├─ Parameter: temperature = 0.1
[2025-08-01 21:54:05] ├─ Parameter: max_iterations = 1200
[2025-08-01 21:54:05] ├─ Parameter: seed = 44
[2025-08-01 21:54:05] ├─ Object 'configuration' (dict): 2 keys
[2025-08-01 21:54:05] ├─ Parameter: final_energy = -64.08368182338928
[2025-08-01 21:54:05] ├─ Object 'best_solution' (dict): 3 keys
[2025-08-01 21:54:05] ├─ Object 'sample_analysis' (dict): 4 keys
[2025-08-01 21:54:05] ├─ Run #8 completed (0.0s)
[2025-08-01 21:54:05] ├─ Created run #9
[2025-08-01 21:54:05] ├─ Parameter: algorithm = GeneticAlgorithm
[2025-08-01 21:54:05] ├─ Parameter: temperature = 0.5
[2025-08-01 21:54:05] ├─ Parameter: max_iterations = 1200
[2025-08-01 21:54:05] ├─ Parameter: seed = 45
[2025-08-01 21:54:05] ├─ Object 'configuration' (dict): 2 keys
[2025-08-01 21:54:05] ├─ Parameter: final_energy = -75.82413648932999
[2025-08-01 21:54:05] ├─ Object 'best_solution' (dict): 3 keys
[2025-08-01 21:54:05] ├─ Object 'sample_analysis' (dict): 4 keys
[2025-08-01 21:54:05] ├─ Run #9 completed (0.0s)
[2025-08-01 21:54:05] ├─ Created run #10
[2025-08-01 21:54:05] ├─ Parameter: algorithm = GeneticAlgorithm
[2025-08-01 21:54:05] ├─ Parameter: temperature = 1.0
[2025-08-01 21:54:05] ├─ Parameter: max_iterations = 1200
[2025-08-01 21:54:05] ├─ Parameter: seed = 46
[2025-08-01 21:54:05] ├─ Object 'configuration' (dict): 2 keys
[2025-08-01 21:54:05] ├─ Parameter: final_energy = -121.61676491942494
[2025-08-01 21:54:05] ├─ Object 'best_solution' (dict): 3 keys
[2025-08-01 21:54:05] ├─ Object 'sample_analysis' (dict): 4 keys
[2025-08-01 21:54:05] ├─ Run #10 completed (0.1s)
[2025-08-01 21:54:05] ├─ Created run #11
[2025-08-01 21:54:05] ├─ Parameter: algorithm = GeneticAlgorithm
[2025-08-01 21:54:05] ├─ Parameter: temperature = 2.0
[2025-08-01 21:54:05] ├─ Parameter: max_iterations = 1200
[2025-08-01 21:54:05] ├─ Parameter: seed = 47
[2025-08-01 21:54:05] ├─ Object 'configuration' (dict): 2 keys
[2025-08-01 21:54:05] ├─ Parameter: final_energy = -282.97672921595256
[2025-08-01 21:54:05] ├─ Object 'best_solution' (dict): 3 keys
[2025-08-01 21:54:05] ├─ Object 'sample_analysis' (dict): 4 keys
[2025-08-01 21:54:05] ├─ Run #11 completed (0.0s)
Created experiment with 12 runs
2. The run_table_data View#
First, let's see what the standard run_table_data view shows us:
# Get the standard table view
table_view = exp.get_run_table()
print("Standard run_table_data view:")
print(f"Shape: {table_view.shape}")
print(f"Columns: {list(table_view.columns)}")
print("\nFirst few rows:")
display(table_view.head())
print("\nβ οΈ Notice: Complex objects (configuration, best_solution, all_samples) are not visible in this view!")
Standard run_table_data view:
Shape: (12, 7)
Columns: [('parameter', 'algorithm'), ('parameter', 'temperature'), ('parameter', 'max_iterations'), ('parameter', 'seed'), ('parameter', 'final_energy'), ('metadata', 'run_id'), ('metadata', 'elapsed_time')]
First few rows:
The first five columns fall under the "parameter" group and the last two under "metadata":

| run_id | algorithm | temperature | max_iterations | seed | final_energy | run_id | elapsed_time |
|---|---|---|---|---|---|---|---|
| 0 | SimulatedAnnealing | 0.1 | 1000 | 42 | -89.400293 | 0 | 0.006421 |
| 1 | SimulatedAnnealing | 0.5 | 1000 | 43 | -141.370908 | 1 | 0.058560 |
| 2 | SimulatedAnnealing | 1.0 | 1000 | 44 | -116.515785 | 2 | 0.006171 |
| 3 | SimulatedAnnealing | 2.0 | 1000 | 45 | -151.648273 | 3 | 0.005079 |
| 4 | QuantumAnnealing | 0.1 | 1100 | 43 | -103.671999 | 4 | 0.005783 |
⚠️ Notice: Complex objects (configuration, best_solution, sample_analysis) are not visible in this view!
3. Accessing Individual Run Data#
To access the complete data for each run, including complex objects, you need to work with individual run objects:
# Get all runs
runs = exp.runs
print(f"Total runs available: {len(runs)}")
# Access data from the first run
first_run = runs[0]
print("\nRun Index: 0") # DataStore doesn't have run_id, use index instead
# Access parameters (same as in table view)
print("\n=== Parameters ===")
for key, value in first_run.parameters.items():
    print(f"{key}: {value}")

# Access objects (NOT visible in table view)
print("\n=== Objects (not in table view) ===")
for key, value in first_run.objects.items():
    print(f"{key}: {type(value)} - {str(value)[:100]}...")

print("\n⚠️ Note: In the actual Minto API, exp.runs returns DataStore objects")
print("   DataStore contains parameters, objects, solutions, and samplesets")
print("   But no direct run_id - use list index instead")
Total runs available: 12
Run Index: 0
=== Parameters ===
algorithm: SimulatedAnnealing
temperature: 0.1
max_iterations: 1000
seed: 42
final_energy: -89.40029346339507
=== Objects (not in table view) ===
configuration: <class 'dict'> - {'solver_config': {'tolerance': 1e-06, 'max_time': 300, 'parallel': True}, 'problem_metadata': {'var...
best_solution: <class 'dict'> - {'solution_vector': [0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1, 0, 1, 0], 'energy': -89.4002...
sample_analysis: <class 'dict'> - {'num_samples': 100, 'energy_range': [-99.40029346339507, -84.40029346339507], 'best_energy': -89.40...
⚠️ Note: In the actual Minto API, exp.runs returns DataStore objects
DataStore contains parameters, objects, solutions, and samplesets
But no direct run_id - use list index instead
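Before building custom views, it can help to survey which objects were actually logged across all runs. The following is a small sketch, assuming only the dict-like .objects attribute demonstrated above:
# Count how many runs logged each object key (exploration helper sketch).
from collections import Counter

object_key_counts = Counter()
for run_datastore in exp.runs:
    object_key_counts.update(run_datastore.objects.keys())

for key, count in sorted(object_key_counts.items()):
    print(f"{key}: logged in {count} of {len(exp.runs)} runs")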
4. Building Custom Data Views#
You can create custom DataFrame views that include the data you need:
def create_custom_dataframe(experiment):
    """Create a custom DataFrame with additional data not in run_table_data."""
    data = []

    for i, run_datastore in enumerate(experiment.runs):
        row = {
            "run_index": i,  # Use index since DataStore doesn't have run_id

            # Basic parameters
            "algorithm": run_datastore.parameters.get("algorithm"),
            "temperature": run_datastore.parameters.get("temperature"),
            "final_energy": run_datastore.parameters.get("final_energy"),

            # Extract nested configuration data
            "solver_tolerance": run_datastore.objects.get("configuration", {}).get("solver_config", {}).get("tolerance"),
            "problem_variables": run_datastore.objects.get("configuration", {}).get("problem_metadata", {}).get("variables"),
            "problem_density": run_datastore.objects.get("configuration", {}).get("problem_metadata", {}).get("density"),

            # Solution analysis (from objects, not solutions)
            "solution_length": len(run_datastore.objects.get("best_solution", {}).get("solution_vector", [])),
            "solution_ones_count": sum(run_datastore.objects.get("best_solution", {}).get("solution_vector", [])),

            # Sample analysis (from objects)
            "num_samples": run_datastore.objects.get("sample_analysis", {}).get("num_samples", 0),
            "best_energy": run_datastore.objects.get("sample_analysis", {}).get("best_energy", 0),
        }
        data.append(row)

    return pd.DataFrame(data)
# Create custom view
custom_df = create_custom_dataframe(exp)
print("Custom DataFrame with extracted nested data:")
print(f"Shape: {custom_df.shape}")
print(f"Columns: {list(custom_df.columns)}")
display(custom_df.head())
Custom DataFrame with extracted nested data:
Shape: (12, 11)
Columns: ['run_index', 'algorithm', 'temperature', 'final_energy', 'solver_tolerance', 'problem_variables', 'problem_density', 'solution_length', 'solution_ones_count', 'num_samples', 'best_energy']
| | run_index | algorithm | temperature | final_energy | solver_tolerance | problem_variables | problem_density | solution_length | solution_ones_count | num_samples | best_energy |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | SimulatedAnnealing | 0.1 | -89.400293 | 0.000001 | 100 | 0.10 | 20 | 7 | 100 | -89.400293 |
| 1 | 1 | SimulatedAnnealing | 0.5 | -141.370908 | 0.000001 | 100 | 0.10 | 20 | 13 | 100 | -141.370908 |
| 2 | 2 | SimulatedAnnealing | 1.0 | -116.515785 | 0.000001 | 100 | 0.10 | 20 | 14 | 100 | -116.515785 |
| 3 | 3 | SimulatedAnnealing | 2.0 | -151.648273 | 0.000001 | 100 | 0.10 | 20 | 11 | 100 | -151.648273 |
| 4 | 4 | QuantumAnnealing | 0.1 | -103.671999 | 0.000001 | 110 | 0.15 | 20 | 13 | 100 | -103.671999 |
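If you prefer not to chain .get() calls by hand, pandas can flatten the nested dictionaries for you. The following is an alternative sketch using pd.json_normalize on the configuration objects logged above; the resulting column names (such as configuration_solver_config_tolerance) simply follow from sep="_" and are not part of the Minto API:
# Alternative sketch: flatten nested configuration dicts with pandas.json_normalize.
records = []
for i, run_datastore in enumerate(exp.runs):
    records.append({
        "run_index": i,
        **run_datastore.parameters,
        "configuration": run_datastore.objects.get("configuration", {}),
    })

flat_df = pd.json_normalize(records, sep="_")
print(flat_df.filter(like="configuration_").columns.tolist())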
5. Filtering and Querying Runs#
You can filter runs based on complex criteria that aren't possible with the simple table view:
def filter_runs_by_criteria(experiment, **criteria):
    """Filter runs based on complex criteria."""
    filtered_runs = []

    for i, run_datastore in enumerate(experiment.runs):
        include_run = True

        # Check parameter criteria
        if "algorithm" in criteria:
            if run_datastore.parameters.get("algorithm") != criteria["algorithm"]:
                include_run = False

        if "min_energy" in criteria:
            if run_datastore.parameters.get("final_energy", float("inf")) > criteria["min_energy"]:
                include_run = False

        # Check object criteria (nested data)
        if "min_variables" in criteria:
            variables = run_datastore.objects.get("configuration", {}).get("problem_metadata", {}).get("variables", 0)
            if variables < criteria["min_variables"]:
                include_run = False

        # Check solution criteria (from objects)
        if "min_solution_density" in criteria:
            solution_vector = run_datastore.objects.get("best_solution", {}).get("solution_vector", [])
            if solution_vector:
                density = sum(solution_vector) / len(solution_vector)
                if density < criteria["min_solution_density"]:
                    include_run = False

        if include_run:
            filtered_runs.append((i, run_datastore))  # Return both index and datastore

    return filtered_runs

# Example: Find SimulatedAnnealing runs with good energy and high variable count
filtered = filter_runs_by_criteria(
    exp,
    algorithm="SimulatedAnnealing",
    min_energy=-120,
    min_variables=105,
    min_solution_density=0.3
)

print(f"Found {len(filtered)} runs matching complex criteria:")
for i, run_datastore in filtered:
    print(f"  Run {i}: {run_datastore.parameters.get('algorithm')} "
          f"(energy: {run_datastore.parameters.get('final_energy'):.2f}, "
          f"vars: {run_datastore.objects.get('configuration', {}).get('problem_metadata', {}).get('variables')})")
Found 0 runs matching complex criteria:
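Zero matches is expected here: every SimulatedAnnealing run in this experiment was generated with 100 problem variables (100 + i * 10 with i = 0), so the min_variables=105 criterion excludes them all. Relaxing that bound, as in the sketch below, lets the low-energy SimulatedAnnealing runs through:
# Same query with the variable-count bound relaxed to match the generated data.
relaxed = filter_runs_by_criteria(
    exp,
    algorithm="SimulatedAnnealing",
    min_energy=-120,
    min_variables=100,
    min_solution_density=0.3,
)
print(f"Relaxed criteria matched {len(relaxed)} runs")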
6. Analyzing Complex Data Structures#
For detailed analysis of solutions, samplesets, and other complex data:
import itertools

def analyze_solutions(experiment):
    """Perform detailed analysis on solution data."""
    solution_analysis = []

    for i, run_datastore in enumerate(experiment.runs):
        solution_data = run_datastore.objects.get("best_solution", {})
        solution_vector = solution_data.get("solution_vector", [])

        if solution_vector:
            analysis = {
                "run_index": i,
                "algorithm": run_datastore.parameters.get("algorithm"),
                "solution_length": len(solution_vector),
                "ones_count": sum(solution_vector),
                "zeros_count": len(solution_vector) - sum(solution_vector),
                "density": sum(solution_vector) / len(solution_vector),
                "energy": run_datastore.parameters.get("final_energy"),
                # Pattern analysis
                "alternating_pattern": sum(1 for j in range(len(solution_vector) - 1)
                                           if solution_vector[j] != solution_vector[j + 1]),
                "longest_run_of_ones": (max(len(list(g)) for k, g in itertools.groupby(solution_vector) if k == 1)
                                        if 1 in solution_vector else 0),
            }
            solution_analysis.append(analysis)

    return pd.DataFrame(solution_analysis)

def analyze_sample_data(experiment):
    """Perform statistical analysis on sample data."""
    sample_stats = []

    for i, run_datastore in enumerate(experiment.runs):
        sample_analysis = run_datastore.objects.get("sample_analysis", {})

        if sample_analysis:
            stats = {
                "run_index": i,
                "algorithm": run_datastore.parameters.get("algorithm"),
                "num_samples": sample_analysis.get("num_samples", 0),
                "best_energy": sample_analysis.get("best_energy", 0),
                "avg_energy": sample_analysis.get("avg_energy", 0),
                "energy_range": sample_analysis.get("energy_range", [0, 0]),
                "energy_span": (sample_analysis.get("energy_range", [0, 0])[1] -
                                sample_analysis.get("energy_range", [0, 0])[0]),
            }
            sample_stats.append(stats)

    return pd.DataFrame(sample_stats)

# Perform analyses
solution_df = analyze_solutions(exp)
sample_df = analyze_sample_data(exp)

print("Solution Analysis:")
display(solution_df.head())

print("\nSample Data Analysis:")
display(sample_df.head())

# Correlation analysis
if not solution_df.empty:
    print("\nCorrelation between solution density and energy:")
    correlation = solution_df[["density", "energy"]].corr()
    print(correlation)
Solution Analysis:
| | run_index | algorithm | solution_length | ones_count | zeros_count | density | energy | alternating_pattern | longest_run_of_ones |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | SimulatedAnnealing | 20 | 7 | 13 | 0.35 | -89.400293 | 10 | 3 |
| 1 | 1 | SimulatedAnnealing | 20 | 13 | 7 | 0.65 | -141.370908 | 8 | 4 |
| 2 | 2 | SimulatedAnnealing | 20 | 14 | 6 | 0.70 | -116.515785 | 8 | 5 |
| 3 | 3 | SimulatedAnnealing | 20 | 11 | 9 | 0.55 | -151.648273 | 11 | 4 |
| 4 | 4 | QuantumAnnealing | 20 | 13 | 7 | 0.65 | -103.671999 | 8 | 4 |
Sample Data Analysis:
| | run_index | algorithm | num_samples | best_energy | avg_energy | energy_range | energy_span |
|---|---|---|---|---|---|---|---|
| 0 | 0 | SimulatedAnnealing | 100 | -89.400293 | -87.400293 | [-99.40029346339507, -84.40029346339507] | 15.0 |
| 1 | 1 | SimulatedAnnealing | 100 | -141.370908 | -139.370908 | [-151.37090752076656, -136.37090752076656] | 15.0 |
| 2 | 2 | SimulatedAnnealing | 100 | -116.515785 | -114.515785 | [-126.51578513343506, -111.51578513343506] | 15.0 |
| 3 | 3 | SimulatedAnnealing | 100 | -151.648273 | -149.648273 | [-161.64827297865997, -146.64827297865997] | 15.0 |
| 4 | 4 | QuantumAnnealing | 100 | -103.671999 | -101.671999 | [-113.67199884856215, -98.67199884856215] | 15.0 |
Correlation between solution density and energy:
density energy
density 1.000000 -0.005797
energy -0.005797 1.000000
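Beyond the pairwise correlation, a per-algorithm summary of the same analysis is one groupby away. The following short sketch reuses the solution_df built above:
# Mean and spread of solution density and energy for each algorithm.
summary = solution_df.groupby("algorithm")[["density", "energy"]].agg(["mean", "std"])
print(summary)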
7. Modifying and Updating Run Data#
You can also modify existing run data or add new computed metrics:
print("β οΈ Note: This example shows conceptual data modification.")
print("In practice, DataStore objects from exp.runs are read-only views.")
print("To add computed metrics, you'd need to work with active Run objects during execution.")
def analyze_existing_data(experiment):
"""Analyze existing data and compute derived metrics."""
analysis_results = []
for i, run_datastore in enumerate(experiment.runs):
# Compute quality score based on energy and solution properties
energy = run_datastore.parameters.get("final_energy", 0)
solution_data = run_datastore.objects.get("best_solution", {})
solution_vector = solution_data.get("solution_vector", [])
if solution_vector:
density = sum(solution_vector) / len(solution_vector)
# Custom quality metric (example)
quality_score = abs(energy) * (1 - abs(density - 0.5)) # Prefer balanced solutions with low energy
# Collect analysis
analysis = {
"run_index": i,
"algorithm": run_datastore.parameters.get("algorithm"),
"temperature": run_datastore.parameters.get("temperature"),
"energy": energy,
"solution_density": density,
"quality_score": quality_score,
"balance_score": 1 - abs(density - 0.5) * 2, # 1 = perfectly balanced
"entropy": -sum(p * np.log2(p) for p in [density, 1-density] if p > 0),
}
analysis_results.append(analysis)
# Add runtime analysis
algorithm = run_datastore.parameters.get("algorithm")
temperature = run_datastore.parameters.get("temperature")
if algorithm and temperature:
# Simulated performance prediction (example)
predicted_runtime = {
"SimulatedAnnealing": 1.0 / temperature,
"QuantumAnnealing": 0.5 / temperature,
"GeneticAlgorithm": 2.0 / temperature
}.get(algorithm, 1.0)
analysis_results[-1]["predicted_runtime"] = predicted_runtime
return pd.DataFrame(analysis_results)
# Analyze existing data
analysis_df = analyze_existing_data(exp)
print("Analysis with computed metrics:")
display(analysis_df[["algorithm", "temperature", "energy", "quality_score", "predicted_runtime"]].head())
⚠️ Note: This example shows conceptual data modification.
In practice, DataStore objects from exp.runs are read-only views.
To add computed metrics, you'd need to work with active Run objects during execution.
Analysis with computed metrics:
| | algorithm | temperature | energy | quality_score | predicted_runtime |
|---|---|---|---|---|---|
| 0 | SimulatedAnnealing | 0.1 | -89.400293 | 75.990249 | 10.0 |
| 1 | SimulatedAnnealing | 0.5 | -141.370908 | 120.165271 | 2.0 |
| 2 | SimulatedAnnealing | 1.0 | -116.515785 | 93.212628 | 1.0 |
| 3 | SimulatedAnnealing | 2.0 | -151.648273 | 144.065859 | 0.5 |
| 4 | QuantumAnnealing | 0.1 | -103.671999 | 88.121199 | 5.0 |
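The note above says derived metrics have to be logged through active Run objects. One way to do that, sketched below, is to record the computed metrics in a separate follow-up experiment using the same run()/log_parameter API from section 1; the experiment name "data_access_tutorial_analysis" is purely illustrative:
# Sketch: persist derived metrics by logging them in a follow-up experiment.
analysis_exp = Experiment(name="data_access_tutorial_analysis", auto_saving=True)

for _, row in analysis_df.iterrows():
    analysis_run = analysis_exp.run()
    with analysis_run:
        analysis_run.log_parameter("source_run_index", int(row["run_index"]))
        analysis_run.log_parameter("quality_score", float(row["quality_score"]))
        analysis_run.log_parameter("balance_score", float(row["balance_score"]))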
8. Export and Save Processed Data#
Save your processed data for further analysis or sharing:
# Create comprehensive analysis report
def create_analysis_report(experiment):
    """Create a comprehensive analysis report."""
    report = {
        "experiment_summary": {
            "name": experiment.name,
            "total_runs": len(experiment.runs),
            "algorithms_tested": list(set(run_datastore.parameters.get("algorithm") for run_datastore in experiment.runs)),
            "temperature_range": [min(run_datastore.parameters.get("temperature", 0) for run_datastore in experiment.runs),
                                  max(run_datastore.parameters.get("temperature", 0) for run_datastore in experiment.runs)]
        },
        "performance_by_algorithm": {},
        "detailed_run_data": []
    }

    # Algorithm performance summary
    algorithms = set(run_datastore.parameters.get("algorithm") for run_datastore in experiment.runs)
    for algorithm in algorithms:
        algo_runs = [(i, run_datastore) for i, run_datastore in enumerate(experiment.runs)
                     if run_datastore.parameters.get("algorithm") == algorithm]
        energies = [run_datastore.parameters.get("final_energy", 0) for _, run_datastore in algo_runs]

        report["performance_by_algorithm"][algorithm] = {
            "run_count": len(algo_runs),
            "avg_energy": np.mean(energies),
            "best_energy": min(energies),
            "energy_std": np.std(energies)
        }

    # Detailed run data
    for i, run_datastore in enumerate(experiment.runs):
        run_data = {
            "run_index": i,
            "parameters": dict(run_datastore.parameters),
            "config_summary": {
                "variables": run_datastore.objects.get("configuration", {}).get("problem_metadata", {}).get("variables"),
                "tolerance": run_datastore.objects.get("configuration", {}).get("solver_config", {}).get("tolerance")
            },
            "solution_summary": {
                "energy": run_datastore.objects.get("best_solution", {}).get("energy"),
                "is_feasible": run_datastore.objects.get("best_solution", {}).get("is_feasible")
            }
        }
        report["detailed_run_data"].append(run_data)

    return report

# Generate report
analysis_report = create_analysis_report(exp)

print("Analysis Report Summary:")
print(f"Experiment: {analysis_report['experiment_summary']['name']}")
print(f"Total runs: {analysis_report['experiment_summary']['total_runs']}")
print(f"Algorithms tested: {analysis_report['experiment_summary']['algorithms_tested']}")

print("\nPerformance by Algorithm:")
for algo, stats in analysis_report["performance_by_algorithm"].items():
    print(f"  {algo}:")
    print(f"    Runs: {stats['run_count']}")
    print(f"    Avg Energy: {stats['avg_energy']:.2f}")
    print(f"    Best Energy: {stats['best_energy']:.2f}")
    print(f"    Energy Std: {stats['energy_std']:.2f}")

# Save processed DataFrames
print("\nYou can save processed data:")
print("custom_df.to_csv('custom_analysis.csv')")
print("solution_df.to_parquet('solution_analysis.parquet')")
print("analysis_df.to_csv('computed_metrics.csv')")
print("import json; json.dump(analysis_report, open('analysis_report.json', 'w'))")
Analysis Report Summary:
Experiment: data_access_tutorial
Total runs: 12
Algorithms tested: ['GeneticAlgorithm', 'QuantumAnnealing', 'SimulatedAnnealing']
Performance by Algorithm:
GeneticAlgorithm:
Runs: 4
Avg Energy: -136.13
Best Energy: -282.98
Energy Std: 87.47
QuantumAnnealing:
Runs: 4
Avg Energy: -118.65
Best Energy: -182.43
Energy Std: 37.34
SimulatedAnnealing:
Runs: 4
Avg Energy: -124.73
Best Energy: -151.65
Energy Std: 24.07
You can save processed data:
custom_df.to_csv('custom_analysis.csv')
solution_df.to_parquet('solution_analysis.parquet')
analysis_df.to_csv('computed_metrics.csv')
import json; json.dump(analysis_report, open('analysis_report.json', 'w'))
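The printed commands above are suggestions rather than executed code. If you run them, note that to_parquet requires an optional dependency (pyarrow or fastparquet), and that analysis_report may contain NumPy scalar values from np.mean and np.std; passing default=float to json.dump is a simple safeguard. A short sketch covering the CSV and JSON cases:
# Persist the processed views built earlier in this tutorial.
import json

custom_df.to_csv("custom_analysis.csv", index=False)
analysis_df.to_csv("computed_metrics.csv", index=False)

# default=float converts any NumPy scalar values json cannot handle directly.
with open("analysis_report.json", "w") as f:
    json.dump(analysis_report, f, indent=2, default=float)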
9. Working with Persistent Data Storage#
Understanding how to work with saved experiments and reload data:
import minto
# Save the current experiment
exp.save()
print(f"Experiment saved to: {exp.savedir}")
# Load experiment from file (simulating a new session)
# Load the saved experiment
loaded_exp = minto.Experiment.load_from_dir(exp.savedir)
print(f"\nLoaded experiment: {loaded_exp.name}")
print(f"Number of runs: {len(loaded_exp.runs)}")
# Verify all data is preserved
first_loaded_run = loaded_exp.runs[0]
print("\nFirst run data preserved:")
print(f" Parameters: {len(first_loaded_run.parameters)} items")
print(f" Objects: {len(first_loaded_run.objects)} items")
# Show that complex objects are fully preserved
config = first_loaded_run.objects.get("configuration", {})
print("\nComplex configuration object preserved:")
print(f" Solver config keys: {list(config.get('solver_config', {}).keys())}")
print(f" Problem metadata keys: {list(config.get('problem_metadata', {}).keys())}")
[2025-08-01 21:54:28] Experiment 'data_access_tutorial' completed: 12 runs, total time: 23.0s
Experiment saved to: .minto_experiments/data_access_tutorial_20250801215405
[2025-08-01 21:54:28] Starting experiment 'data_access_tutorial'
[2025-08-01 21:54:28] ├─ Environment: OS: Linux 6.6.93+, CPU: Intel(R) Xeon(R) CPU @ 2.80GHz (4 cores), Memory: 15.6 GB, Python: 3.11.10
[2025-08-01 21:54:28] ├─ Environment Information
[2025-08-01 21:54:28] ├─ OS: Linux 6.6.93+
[2025-08-01 21:54:28] ├─ Platform: Linux-6.6.93+-x86_64-with-glibc2.35
[2025-08-01 21:54:28] ├─ CPU: Intel(R) Xeon(R) CPU @ 2.80GHz (4 cores)
[2025-08-01 21:54:28] ├─ Memory: 15.6 GB
[2025-08-01 21:54:28] ├─ Architecture: x86_64
[2025-08-01 21:54:28] ├─ Python: 3.11.10
[2025-08-01 21:54:28] ├─ Key Package Versions:
Loaded experiment: data_access_tutorial
Number of runs: 12
First run data preserved:
Parameters: 5 items
Objects: 3 items
Complex configuration object preserved:
Solver config keys: ['tolerance', 'max_time', 'parallel']
Problem metadata keys: ['variables', 'constraints', 'density']
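Because loading preserves the full DataStore structure, the helpers defined earlier also work on reloaded experiments. Below is a minimal sketch of an analysis pipeline driven entirely by saved data, reusing load_from_dir and the create_custom_dataframe helper from section 4:
# Sketch: rebuild the custom analysis view purely from the saved experiment directory.
reloaded = minto.Experiment.load_from_dir(exp.savedir)
reloaded_df = create_custom_dataframe(reloaded)
print(reloaded_df[["run_index", "algorithm", "final_energy"]].head())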
Key Takeaways#
run_table_data is a View, Not the Data#
- exp.get_run_table() provides a convenient DataFrame view
- It only shows parameters (plus run metadata) as columns
- Objects, solutions, and samplesets are not included
- Complex nested data structures are not visible
⚠️ Important API Note#
- exp.runs returns a list of DataStore objects (not Run objects)
- DataStore objects contain: parameters, objects, solutions, samplesets
- DataStore objects do not have run_id - use the list index for identification
- This is different from active Run objects used during experiment execution
Accessing Complete Data#
- Use exp.runs to access individual DataStore objects (a list of DataStore)
- Each DataStore has: .parameters, .objects, .solutions, .samplesets
- All logged data is preserved, including complex structures
- Use the list index instead of run_id for DataStore identification (see the sketch below)
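In code, the access pattern summarized above looks roughly like this (a sketch assuming the DataStore attributes listed in this section):
# Sketch of the DataStore access pattern: index-based iteration over exp.runs.
for index, run_datastore in enumerate(exp.runs):
    params = run_datastore.parameters      # dict-like: logged parameters
    objects = run_datastore.objects        # dict-like: logged objects
    solutions = run_datastore.solutions    # logged solutions, if any were recorded
    samplesets = run_datastore.samplesets  # logged samplesets, if any were recorded
    print(index, params.get("algorithm"), sorted(objects))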
Custom Analysis#
- Create custom DataFrames for your specific analysis needs
- Filter runs based on complex criteria using DataStore objects
- Extract and analyze nested data structures from objects
- Compute derived metrics in separate analysis DataFrames
Data Persistence#
- All data (parameters, objects, solutions, samplesets) is saved
- Reloaded experiments preserve the complete DataStore structure
- You can build analysis pipelines that work with saved data
Best Practices#
- Use run_table_data for quick overviews and simple parameter analysis
- Access individual DataStore objects via exp.runs for detailed analysis
- Build custom DataFrames that include the specific data you need
- Save processed analysis results alongside your experiments
- Remember that exp.runs returns DataStore objects, not Run objects
- Use list indices to identify specific runs since DataStore doesn't have run_id
This approach gives you the flexibility to perform both quick exploratory analysis and deep, detailed investigations of your optimization experiments using the actual Minto API.