Sharing Various Types of Data in an OMMX Artifact
In mathematical optimization workflows, it is important to generate and manage a variety of data. Properly handling these data ensures reproducible computational results and allows teams to share information efficiently.
OMMX provides a straightforward and efficient way to manage different data types. Specifically, it defines a data format called an OMMX Artifact, which lets you store, organize, and share various optimization data through the OMMX SDK.
Creating an OMMX Artifact as a File
OMMX Artifacts can be managed as files or by assigning them container-like names. Here, we’ll show how to save the data as a file. Using the OMMX SDK, we’ll store the data in a new file called my_instance.ommx. First, we need an ArtifactBuilder.
import os
from ommx.artifact import ArtifactBuilder
# Specify the name of the OMMX Artifact file
filename = "my_instance.ommx"
# If the file already exists, remove it
if os.path.exists(filename):
    os.remove(filename)
# 1. Create a builder to create the OMMX Artifact file
builder = ArtifactBuilder.new_archive_unnamed(filename)
ArtifactBuilder has several constructors, allowing you to choose whether to manage the artifact by name like a container or as an archive file. Pushing to and pulling from a container registry requires a name, whereas an archive file does not need one. Here, we use ArtifactBuilder.new_archive_unnamed to manage it as an archive file.
Constructor | Description |
---|---|
 | Manage by name like a container |
 | Manage as both an archive file and a container |
ArtifactBuilder.new_archive_unnamed | Manage as an archive file |
 | Determine the container name according to the GitHub Container Registry |
Regardless of the initialization method, you can save ommx.v1.Instance and other data in the same way. Let’s add the data prepared above.
# Add ommx.v1.Instance object
desc_instance = builder.add_instance(instance)
# Add ommx.v1.Solution object
desc_solution = builder.add_solution(solution)
# Add pandas.DataFrame object
desc_df = builder.add_dataframe(df, title="Optimal Solution of Knapsack Problem")
# Add an object that can be converted to JSON
desc_json = builder.add_json(data, title="Data of Knapsack Problem")
In OMMX Artifacts, data is stored in layers, each with a dedicated media type. Functions like add_instance automatically set these media types and add layers. These functions return a Descriptor object with information about each created layer.
desc_json.to_dict()
{'mediaType': 'application/json',
'digest': 'sha256:6cbfaaa7f97e84d8b46da95b81cf4d5158df3a9bd439f8c60be26adaa16ab3cf',
'size': 78,
'annotations': {'org.ommx.user.title': 'Data of Knapsack Problem'}}
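The fields in this dictionary follow the OCI descriptor layout: a media type, a content digest, the payload size in bytes, and annotations. As a rough sketch of how such a record relates to its payload, the snippet below builds one by hand using only the standard library. The helper name `describe_json_layer` is hypothetical, and the use of default `json.dumps` formatting is an assumption; the SDK may serialize differently, so the digest computed here is not guaranteed to match the one shown above.

```python
import hashlib
import json


def describe_json_layer(obj, annotations=None):
    """Hypothetical helper: build a descriptor-like dict for a JSON payload."""
    # Assumption: default json.dumps formatting, encoded as UTF-8.
    blob = json.dumps(obj).encode("utf-8")
    return {
        "mediaType": "application/json",
        # The digest is the SHA-256 hash of the exact payload bytes.
        "digest": "sha256:" + hashlib.sha256(blob).hexdigest(),
        # The size is the payload length in bytes.
        "size": len(blob),
        "annotations": dict(annotations or {}),
    }


data = {"v": [10, 13, 18, 31, 7, 15], "w": [11, 15, 20, 35, 10, 33], "W": 47, "N": 6}
desc = describe_json_layer(
    data, {"org.ommx.user.title": "Data of Knapsack Problem"}
)
print(desc["size"], desc["digest"])
```

With this serialization the payload is 78 bytes long, the same size as reported above. Because the digest depends on every byte of the payload, any change to the data or its formatting produces a different digest.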
The title="..." argument passed to add_json is saved as an annotation of the layer. OMMX Artifact is a data format meant for humans, so annotations are primarily information for people to read. The ArtifactBuilder.add_* functions all accept optional keyword arguments and automatically store them as annotations under the org.ommx.user. namespace.
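To make the keyword-to-annotation mapping concrete, here is a minimal sketch of the idea in plain Python. It does not call the OMMX SDK: `to_user_annotations` is a hypothetical helper, the `author` keyword is an invented example, and converting values to strings is an assumption based on annotations being string-to-string maps.

```python
USER_NAMESPACE = "org.ommx.user."


def to_user_annotations(**kwargs):
    """Hypothetical helper: prefix each keyword with the user namespace."""
    # Assumption: annotation values are stored as strings.
    return {USER_NAMESPACE + key: str(value) for key, value in kwargs.items()}


annotations = to_user_annotations(
    title="Data of Knapsack Problem",
    author="alice",  # invented keyword, for illustration only
)
for key, value in annotations.items():
    print(key, "=", value)
```

The `title` keyword ends up under the key `org.ommx.user.title`, which is the key that appears in the descriptor output above.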
Finally, call build to save it to a file.
# 3. Create the OMMX Artifact file
artifact = builder.build()
This artifact represents the file you just saved. It is the same kind of Artifact object that the next section obtains by loading the file. Let’s check that the file has been created:
! ls $filename
my_instance.ommx
Now you can share this my_instance.ommx with others using the usual file sharing methods.
Reading an OMMX Artifact File
Next, let’s read the OMMX Artifact we saved. When loading an OMMX Artifact in archive format, use Artifact.load_archive.
from ommx.artifact import Artifact
# Load the OMMX Artifact file locally
artifact = Artifact.load_archive(filename)
OMMX Artifacts store data in layers, with a manifest (catalog) that details their contents. You can check the Descriptor of each layer, including its Media Type and annotations, without reading the entire archive.
import pandas as pd
# Convert to pandas.DataFrame for better readability
pd.DataFrame(
    {
        "Media Type": desc.media_type,
        "Size (Bytes)": desc.size,
    } | desc.annotations
    for desc in artifact.layers
)
 | Media Type | Size (Bytes) | org.ommx.user.title |
---|---|---|---|
0 | application/org.ommx.v1.instance | 325 | NaN |
1 | application/org.ommx.v1.solution | 266 | NaN |
2 | application/vnd.apache.parquet | 2595 | Optimal Solution of Knapsack Problem |
3 | application/json | 78 | Data of Knapsack Problem |
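Because the manifest lists every layer's media type, you can pick out the layers you need before reading any payload. The sketch below shows the idea on plain dictionaries that mirror the table above. It does not call the OMMX SDK, and `find_layers` is a hypothetical helper.

```python
# Descriptor-like records mirroring the table above (plain dicts, not SDK objects).
layers = [
    {"media_type": "application/org.ommx.v1.instance", "size": 325, "annotations": {}},
    {"media_type": "application/org.ommx.v1.solution", "size": 266, "annotations": {}},
    {
        "media_type": "application/vnd.apache.parquet",
        "size": 2595,
        "annotations": {"org.ommx.user.title": "Optimal Solution of Knapsack Problem"},
    },
    {
        "media_type": "application/json",
        "size": 78,
        "annotations": {"org.ommx.user.title": "Data of Knapsack Problem"},
    },
]


def find_layers(layers, media_type):
    """Hypothetical helper: return (index, layer) pairs with a given media type."""
    return [
        (index, layer)
        for index, layer in enumerate(layers)
        if layer["media_type"] == media_type
    ]


for index, layer in find_layers(layers, "application/json"):
    print(index, layer["size"], layer["annotations"].get("org.ommx.user.title"))
```

This prints `3 78 Data of Knapsack Problem`. With the SDK objects, the same filter would compare desc.media_type for each desc in artifact.layers.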
For instance, to retrieve the JSON in layer 3, use Artifact.get_json. This function confirms that the Media Type is application/json and deserializes the bytes into a Python object.
artifact.get_json(artifact.layers[3])
{'v': [10, 13, 18, 31, 7, 15], 'w': [11, 15, 20, 35, 10, 33], 'W': 47, 'N': 6}
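The check-then-decode behavior described above can be sketched without the SDK. The function `load_json_layer` below is hypothetical and works on a raw byte string. It only illustrates the pattern and makes no claim about how Artifact.get_json is implemented.

```python
import json


def load_json_layer(media_type, blob):
    """Hypothetical helper: decode a layer payload only if it is JSON."""
    if media_type != "application/json":
        raise ValueError(f"expected application/json, got {media_type}")
    return json.loads(blob.decode("utf-8"))


# Payload bytes for illustration, matching the data shown above.
blob = b'{"v": [10, 13, 18, 31, 7, 15], "w": [11, 15, 20, 35, 10, 33], "W": 47, "N": 6}'
data = load_json_layer("application/json", blob)
print(data["W"], data["N"])
```

Checking the media type first means that a layer of another type, such as the Parquet layer, is rejected with a clear error instead of failing inside the JSON parser.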