The motivation behind your proposal is (I think) a
desire to have a unified configuration interface for data collection jobs. This makes
total sense and it's worth pursuing. I just don't think we should stuff everything
into the schema. The schema is just that: a schema. It's a data model.
I very much agree with Ori here. We would be bloating the schema
with properties that have nothing to do with data definition.
I agree with both of you: these are data collection settings that do not necessarily belong
in the schema itself if its job is to represent the data model.
As you know, we don’t have a solution for representing schema metadata (other than the
dirty hack of schema talk pages) or data collection options. As a customer, I would value
the ability to specify schema ownership (who should be contacted if something goes wrong),
sampling rates (should the data be collected sampled or unsampled), retention and privacy
options (should the data be retained indefinitely? should the whole log be pruned after
the retention window? are there fields that include PII that should be stripped?) as well
as monitoring where a specific <schema, rev_id> is deployed.
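
To make the list above a bit more concrete, here is a rough sketch of what a collection config kept outside the schema could look like. It is only an illustration, not an existing API: the class name `CollectionConfig`, all field names, the helper `strip_pii`, and the example values are hypothetical. The only things taken from the discussion are the options themselves (ownership, sampling, retention, PII stripping, deployment tracking) and the idea of keying them on <schema, rev_id>.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CollectionConfig:
    """Hypothetical data collection settings for one <schema, rev_id> pair.

    Lives alongside the schema rather than inside it, so the schema
    itself stays a pure data model.
    """
    schema: str
    rev_id: int
    owner: str                             # who to contact if something goes wrong
    sampling_rate: float = 1.0             # 1.0 means unsampled
    retention_days: Optional[int] = None   # None means retain indefinitely
    prune_whole_log: bool = False          # drop the whole log after the retention window
    pii_fields: frozenset = frozenset()    # fields to strip before long-term storage
    deployed_on: frozenset = frozenset()   # where this revision is deployed

    def __post_init__(self):
        # Reject sampling rates outside (0, 1].
        if not 0.0 < self.sampling_rate <= 1.0:
            raise ValueError("sampling_rate must be in (0, 1]")

    @property
    def key(self):
        """The <schema, rev_id> pair this config applies to."""
        return (self.schema, self.rev_id)


def strip_pii(event: dict, config: CollectionConfig) -> dict:
    """Return a copy of the event without the fields marked as PII."""
    return {k: v for k, v in event.items() if k not in config.pii_fields}


# Example values are made up for illustration.
config = CollectionConfig(
    schema="ExampleSchema",
    rev_id=1,
    owner="owner@example.org",
    sampling_rate=0.1,
    retention_days=90,
    pii_fields=frozenset({"ip"}),
    deployed_on=frozenset({"example-wiki"}),
)
cleaned = strip_pii({"ip": "192.0.2.1", "action": "click"}, config)
```

The point of the sketch is only that none of these settings needs to touch the schema definition: they can be looked up by `config.key` wherever the data is collected, stored or pruned.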
Dario