The KFP SDK v2 provides an importer component as a pre-baked component for a specific use case: importing a machine learning artifact from remote storage to ML Metadata (MLMD).
Typically, the input artifact to a task is an output from an upstream task. In this case, the artifact can be easily accessed from the upstream task using my_task.outputs['artifact_name']. The artifact is also registered in MLMD when it is created by the upstream task.
If you wish to use an existing artifact that was not generated by a task in the current pipeline, or an external file that was not generated by a pipeline at all, you can use the importer component to load the artifact from its URI.
You do not need to write an importer component; it can be imported from the dsl module and used directly:

from kfp.v2 import dsl

task = get_date_string()
importer_task = dsl.importer(
    artifact_uri='gs://sample-bucket/sample.txt',  # placeholder URI
    artifact_class=dsl.Dataset,
    reimport=False,
    metadata={'date': task.output})
In addition to the artifact_uri, you must provide an artifact_class, indicating the type of the artifact.
The importer component also permits setting artifact metadata via the metadata argument. Metadata can be constructed with outputs from upstream tasks, as is done for the 'date' value in the example pipeline.
You may also specify a boolean reimport argument. If False, KFP will use an existing MLMD artifact if it already exists from an earlier importer execution. If True, KFP will reimport the artifact as a new artifact, irrespective of whether it was previously imported.