We are setting up a process to update the resources for our datasets using the Automation API. I am looking for best practice advice. Here is what we currently do, starting with a dataset_id and a file that we want to upload:
1. Retrieve the `dataset_uid` using the endpoint `/api/explore/v2.1/catalog/datasets/{dataset_id}`.
2. Retrieve the currently used `resource_uid` of the dataset using the endpoint `/api/automation/v1.0/datasets/{dataset_uid}/resources`, with the `dataset_uid` from step 1.
3. Upload the file to the dataset using the endpoint `/api/automation/v1.0/datasets/{dataset_uid}/resources/files`, again using the `dataset_uid` from step 1. From the response I get the `file_uid` of the uploaded file.
4. Update the dataset resource using the endpoint `/api/automation/v1.0/datasets/{dataset_uid}/resources/{resource_uid}`, with the `dataset_uid` from step 1, the `resource_uid` from step 2, and the `file_uid` from step 3.
5. Republish the dataset using the endpoint `/api/automation/v1.0/datasets/{dataset_uid}/publish`, with the `dataset_uid` from step 1.
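For reference, here is roughly how we script these five steps in Python with the `requests` library. The domain, API key, and the exact request/response payload shapes (the response keys, the `datasource` body in step 4, etc.) are placeholders/assumptions on my side, not something I'm claiming the API guarantees:

```python
# Sketch of our five-step resource update flow.
# DOMAIN and the API key are hypothetical placeholders.
import requests

DOMAIN = "https://example.opendatasoft.com"          # hypothetical portal domain
HEADERS = {"Authorization": "Apikey YOUR_API_KEY"}   # hypothetical API key


def endpoint(path: str) -> str:
    """Build a full API URL from an endpoint path."""
    return f"{DOMAIN}{path}"


def update_dataset_resource(dataset_id: str, file_path: str) -> None:
    # Step 1: resolve dataset_id -> dataset_uid via the Explore API.
    # The response key holding the uid is an assumption here.
    r = requests.get(
        endpoint(f"/api/explore/v2.1/catalog/datasets/{dataset_id}"),
        headers=HEADERS,
    )
    dataset_uid = r.json()["dataset_uid"]

    # Step 2: fetch the resource currently attached to the dataset.
    # Assumes a single resource and an assumed response shape.
    r = requests.get(
        endpoint(f"/api/automation/v1.0/datasets/{dataset_uid}/resources"),
        headers=HEADERS,
    )
    resource_uid = r.json()["results"][0]["uid"]

    # Step 3: upload the new file; the response carries its file_uid.
    with open(file_path, "rb") as f:
        r = requests.post(
            endpoint(f"/api/automation/v1.0/datasets/{dataset_uid}/resources/files"),
            headers=HEADERS,
            files={"file": f},
        )
    file_uid = r.json()["file_uid"]

    # Step 4: point the existing resource at the freshly uploaded file.
    # The body below is an assumed payload shape.
    requests.put(
        endpoint(f"/api/automation/v1.0/datasets/{dataset_uid}/resources/{resource_uid}"),
        headers=HEADERS,
        json={"datasource": {"type": "uploaded_file", "file": {"uid": file_uid}}},
    )

    # Step 5: republish so the updated resource goes live.
    requests.post(
        endpoint(f"/api/automation/v1.0/datasets/{dataset_uid}/publish"),
        headers=HEADERS,
    )
```

This works for us, but note that step 2 simply takes the first resource in the list; datasets with more than one resource would need a smarter lookup.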
This process works so far. However, I'm not quite sure if there's a more efficient way. I'm especially interested in the following two points:
- Can I somehow combine steps 3 and 4 by directly uploading the file to `/api/automation/v1.0/datasets/{dataset_uid}/resources/{resource_uid}`? I see in the documentation that, instead of a `dataset_uid`, a `DatasetFile` is also mentioned. However, I'm not quite sure how I would go about doing that.
- Once I upload a new resource file in step 3, I can't find the new file anywhere in the Opendatasoft GUI. Is there a way to manage all the resource files I've uploaded so that I don't clutter the space? Or should I just use the endpoint to clean the cache of the resource, assuming that all uploaded files not referenced in a resource are removed?
Of course, if there are other best practices to streamline the process of updating dataset resources, I would be happy to hear about them. 😊
Many thanks,
Johannes