On 6 February 2016 at 03:52, Physikerwelt <wiki(a)physikerwelt.de> wrote:
> Hello services-list,
> since my very first contact with JSON, I have been asking myself: Why
> can't people simply use XML?
> In the meantime I have become aware of its advantages, especially for
> fast prototyping and high-performance applications.
> However, when applications get larger, more complex and more mature,
> the absence of schema information becomes problematic.
That's indeed both the strength and weakness of JSON's free-style format:
it allows you to move fast, but also to shoot yourself in the foot on the
way to your destination.
> For example, I found writing the parser for the Wikidata dump [1]
> quite exhausting. Alternatives like Json-lib work well for testing,
> but I was quite worried about stability after hitting a log tail bug
> [2]. Moreover, in the PHP Math extension it's often uncomfortable to
> figure out which JSON properties are set under certain circumstances
> [3]. Yesterday, I discovered another problem related to a missing JSON
> schema [4], which finally motivated me to start this discussion of
> JSON schema options.
> For the communication between services, we use spec files. This is a
> great thing. But would it not be even better to use a JSON schema
> within services as well? Then one could throw exceptions right at the
> place where the problem occurs. I'm aware that there are approaches
> for a JSON schema like [5], but I'm not sure whether they are
> convenient to use in practice.
For defining a service's public interface we use the Swagger
specification [1], which is itself quite a close relative of JSON-Schema.
It even uses JSON-Schema directly for field declarations and other things,
but it is more tailored towards defining API interfaces than JSON fields.
Recently, we have started working on a new sub-system that delivers and
propagates MW events reliably, called EventBus [2]. There, each
communication channel accepts only a certain type of event message. These
message types are defined using JSON-Schema schemas [3], which allows us to
cleanly define the contract between the system itself and event producers
and consumers.
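To illustrate the idea of binding one schema to each channel, here is a rough sketch in Python. It is not the EventBus code: the channel name, the event fields and the `validate` and `publish` helpers are all made up, and only a tiny subset of JSON-Schema (`type`, `required`, `properties`) is checked by hand. A real service would use a full validator library instead.

```python
# Hypothetical sketch: a tiny subset of JSON-Schema checked by hand.

JSON_TYPES = {
    "object": dict,
    "string": str,
    "integer": int,
    "boolean": bool,
}


def validate(instance, schema, path="$"):
    """Raise ValueError at the first place where instance violates schema."""
    expected = JSON_TYPES[schema["type"]]
    # bool is a subclass of int in Python, so reject it explicitly for integers
    if not isinstance(instance, expected) or (
        schema["type"] == "integer" and isinstance(instance, bool)
    ):
        raise ValueError(f"{path}: expected {schema['type']}")
    if schema["type"] == "object":
        for name in schema.get("required", []):
            if name not in instance:
                raise ValueError(f"{path}: missing required property '{name}'")
        for name, sub in schema.get("properties", {}).items():
            if name in instance:
                validate(instance[name], sub, f"{path}.{name}")


# Made-up channel name and event schema, for illustration only.
CHANNEL_SCHEMAS = {
    "example.page_edit": {
        "type": "object",
        "required": ["title", "rev_id"],
        "properties": {
            "title": {"type": "string"},
            "rev_id": {"type": "integer"},
        },
    },
}


def publish(channel, event):
    """Accept the event only if it matches the schema bound to the channel."""
    validate(event, CHANNEL_SCHEMAS[channel])
    return True
```

The error message carries the path of the offending property, so a producer sending a malformed event learns exactly which field broke the contract.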
But, I think you are right -- ideally schemas should be defined even for
intra-service communications and protocols, as they can serve not only as a
reference point, but for documentation and communication purposes as well.
The downside, though, is that adhering to a schema internally means
checking it at run time, and every check costs execution time. So, there
should be a balance and we should choose wisely what to "schematise" and
what not to.
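One way to strike that balance is to make the check opt-in per function, so that only the paths we care about pay for it. The following Python sketch shows the idea under stated assumptions: the `checked` decorator, the `require_status` check and the two handlers are hypothetical names, not part of any of our services.

```python
import functools


def checked(check, enabled=True):
    """Wrap a function so its argument is checked only when enabled."""
    def decorator(func):
        if not enabled:
            return func  # no wrapper at all, so no per-call overhead

        @functools.wraps(func)
        def wrapper(payload):
            check(payload)  # raises before func ever sees bad input
            return func(payload)
        return wrapper
    return decorator


def require_status(payload):
    """Stand-in for a real schema check (hypothetical)."""
    if not isinstance(payload.get("status"), int):
        raise ValueError("payload.status must be an integer")


@checked(require_status, enabled=True)
def handle_external(payload):
    # Data from outside the service: always validated.
    return payload["status"]


@checked(require_status, enabled=False)
def handle_internal(payload):
    # Hot internal path: validation deliberately skipped.
    return payload.get("status")
```

With `enabled=False` the original function is returned unchanged, so the unchecked path costs nothing extra. The flag could come from configuration, for instance to turn checks on in testing and off in production.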
> To keep the discussion focused, we could use "how HTTP errors are
> supposed to look" [6] as a running example to discuss how JSON schema
> definition and validation could work.
This is the perfect example of something that should have a defined schema
available, all the more so because the code explicitly validates the
HTTPError object's properties [4]. Error log entries and error responses
are definitely something that needs to be standardised across our services.
I'll write a JSON-Schema document for it.
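Just to give a rough idea of the shape such a document could take, here is a sketch written as a Python dict that serialises to JSON. The property names (`status`, `type`, `title`, `detail`) and which of them are required are assumptions for the sake of illustration, not the agreed format and not taken from the code in [4].

```python
import json

# Hypothetical schema: property names and constraints are illustrative
# assumptions, not the agreed error format for our services.
HTTP_ERROR_SCHEMA = {
    "$schema": "http://json-schema.org/draft-04/schema#",
    "title": "HTTPError",
    "type": "object",
    "required": ["status", "type"],
    "properties": {
        "status": {"type": "integer", "minimum": 400, "maximum": 599},
        "type": {"type": "string"},
        "title": {"type": "string"},
        "detail": {"type": "string"},
    },
}


def missing_properties(error_body):
    """Return the required properties absent from a decoded error body."""
    return [p for p in HTTP_ERROR_SCHEMA["required"] if p not in error_body]


if __name__ == "__main__":
    # Emit the schema as a JSON document that services could share.
    print(json.dumps(HTTP_ERROR_SCHEMA, indent=2))
```

The `missing_properties` helper only looks at the `required` list; full validation of types and ranges would be left to a proper JSON-Schema validator.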
Thank you for bringing this up!
Cheers,
Marko
[1] http://swagger.io/specification/
[2] https://wikitech.wikimedia.org/wiki/EventBus
[3] https://github.com/wikimedia/mediawiki-event-schemas
[4] https://github.com/wikimedia/service-template-node/blob/01eb28f90f3cccdf248…
--
Marko Obrovac, PhD
Senior Services Engineer
Wikimedia Foundation