Importer component
The KFP SDK v2 provides an importer component as a pre-baked component for a specific use case: importing a machine learning artifact from remote storage into machine learning metadata (MLMD).
Typically, the input artifact to a task is an output from an upstream task. In this case, the artifact can be easily accessed from the upstream task using my_task.outputs['artifact_name']. The artifact is also registered in MLMD when it is created by the upstream task.
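For instance, a minimal sketch of this usual pattern, with two hypothetical lightweight components passing a Dataset from producer to consumer (the component names and file contents are illustrative):

from kfp.v2 import dsl
from kfp.v2.dsl import Dataset, Input, Output

@dsl.component
def create_dataset(output_dataset: Output[Dataset]):
    # Write some data to the output artifact's local path.
    with open(output_dataset.path, 'w') as f:
        f.write('hello, world')

@dsl.component
def consume_dataset(dataset: Input[Dataset]):
    # Read the artifact produced by the upstream task.
    with open(dataset.path) as f:
        print(f.read())

@dsl.pipeline(name='upstream-output-example')
def my_pipeline():
    producer = create_dataset()
    # The artifact is referenced by the name of the producing component's output.
    consume_dataset(dataset=producer.outputs['output_dataset'])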
If you wish to use an existing artifact that was not generated by a task in the current pipeline, or to use as an artifact an external file that was not generated by any pipeline, you can use an importer component to load the artifact from its URI.
You do not need to write an importer component; it can be imported from the dsl module and used directly:
from kfp.v2 import dsl

@dsl.pipeline(name='pipeline-with-importer')
def my_pipeline():
    task = get_date_string()
    importer_task = dsl.importer(
        artifact_uri='gs://ml-pipeline-playground/shakespeare1.txt',
        artifact_class=dsl.Dataset,
        reimport=True,
        metadata={'date': task.output})
    other_component(dataset=importer_task.output)
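The get_date_string and other_component tasks above are ordinary components defined elsewhere in the same file; a minimal sketch of what they might look like as lightweight Python components (their implementations here are illustrative):

from kfp.v2 import dsl
from kfp.v2.dsl import Dataset, Input

@dsl.component
def get_date_string() -> str:
    # Lightweight components must import their dependencies inside the function body.
    from datetime import date
    return date.today().isoformat()

@dsl.component
def other_component(dataset: Input[Dataset]):
    # Read the imported text file from its local path.
    with open(dataset.path) as f:
        print(f.read()[:200])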
In addition to the artifact_uri argument, you must provide an artifact_class argument, indicating the type of the artifact.
The importer component permits setting artifact metadata via the metadata argument. Metadata can be constructed with outputs from upstream tasks, as is done for the 'date' value in the example pipeline.
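Any KFP artifact type can be supplied as the artifact_class, and metadata values may also be plain constants rather than task outputs. For example, a sketch that imports a model file with static metadata, used inside a pipeline function just like the importer above (the URI and metadata keys here are hypothetical):

model_importer = dsl.importer(
    artifact_uri='gs://my-bucket/models/model.joblib',  # hypothetical location
    artifact_class=dsl.Model,
    metadata={'framework': 'sklearn', 'model_version': '1'})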
You may also specify a boolean reimport argument. If reimport is False, KFP will use an existing MLMD artifact if it already exists from an earlier importer execution. If reimport is True, KFP will reimport the artifact as a new artifact, irrespective of whether it was previously imported.