I have a vendor that sucks donkey balls. Their systems break often. An endpoint we rely on will start returning [] and they’ll take months to fix it. They’ll change a data label in their backend and not notice that it flows into all of their filters and stuff.

I have some alerts when my consumers break, but I think I’d like something more direct. What’s the best way to monitor an external API?

I’m imagining some very basic ML that can pop up and tell me that something has changed, like there are more hosts or categories or whatever than usual, that a structure has gone blank or is missing, that some field has gone to 0 or null across the structure. Heck, that a field name has changed.

Is the best way to basically write tests for everything I can think of, and add more as things break, or is there a better tool? I see API monitoring tools but they are for calculating availability for your own APIs, not for enforcing someone else’s!

  • whotookkarl@lemmy.dbzer0.com
    1 day ago

    A couple of approaches: set up a batch process that calls the API on a frequent interval and runs tests against the responses, or have the service consumer publish events to a message bus and monitor those events. Which one fits depends on whether I own both the service and the client or just the client, whether I can change the client or only add monitoring externally, and whether I can run test requests without creating/updating/destroying data (as with a read-only service) or need real requests to observe.
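A minimal sketch of what "run tests against the responses" could look like for the kinds of breakage described in the question, assuming the endpoint returns a JSON list of flat objects. The names `profile` and `compare`, the field names in the usage example, and the 50% count tolerance are all hypothetical, not anything the vendor or a library provides. It stores a summary of a known-good response and diffs later responses against it instead of hand-writing one test per field.

```python
from collections import defaultdict
from typing import Any


def is_blank(value: Any) -> bool:
    """True for None, 0, "" and empty containers; booleans never count as blank."""
    if isinstance(value, bool):
        return False
    return value is None or value == 0 or value == "" or value == [] or value == {}


def profile(records: list[dict[str, Any]]) -> dict[str, Any]:
    """Summarise a response: record count plus the share of non-blank values per field."""
    filled: dict[str, int] = defaultdict(int)
    for rec in records:
        for key, value in rec.items():
            filled[key] += 0 if is_blank(value) else 1
    n = len(records)
    return {"count": n, "fill": {k: v / n for k, v in filled.items()}}


def compare(baseline: dict[str, Any], current: dict[str, Any],
            count_tolerance: float = 0.5) -> list[str]:
    """List the differences between a known-good profile and the current one."""
    if current["count"] == 0:
        return ["response is empty"]
    findings = []
    base_n = baseline["count"]
    # Flag a record count that moved by more than the (assumed) tolerance.
    if base_n and abs(current["count"] - base_n) / base_n > count_tolerance:
        findings.append(f"record count changed: {base_n} -> {current['count']}")
    base_fields, cur_fields = set(baseline["fill"]), set(current["fill"])
    # A renamed field shows up as one "missing" plus one "new" finding.
    for field in sorted(base_fields - cur_fields):
        findings.append(f"field missing: {field}")
    for field in sorted(cur_fields - base_fields):
        findings.append(f"new field: {field}")
    # A field that used to carry values but is now 0/null everywhere.
    for field in sorted(cur_fields & base_fields):
        if current["fill"][field] == 0 and baseline["fill"][field] > 0:
            findings.append(f"field went blank: {field}")
    return findings


if __name__ == "__main__":
    good = profile([{"host": "a", "category": "x", "load": 3},
                    {"host": "b", "category": "y", "load": 0}])
    today = profile([{"host": "a", "label": "x", "load": 0}])
    for finding in compare(good, today):
        print(finding)
```

The baseline would need refreshing whenever the vendor makes a legitimate change, and nested structures would need the profile to walk field paths rather than top-level keys.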

    • Clay_pidgin@sh.itjust.worksOP
      1 day ago

      The main one I have issues with is a read-only API. I guess I make it harder on myself from this perspective by maintaining lots of separate single-purpose tools instead of one big client.

      • whotookkarl@lemmy.dbzer0.com
        1 day ago

        Yeah, then I would set up a call or set of calls on an interval and test the responses. If a critical test fails, send an alert; treat less critical failures as warnings and send a report periodically. In either case I’d log and archive all of it so if they are bullshitting or violating contract SLAs I’ll have some data to reference.
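The alert/warning split and the archive could be sketched roughly as below. This is an illustration under assumptions, not a finished monitor: `Check`, `run_checks`, the check names, and the JSON Lines archive file are hypothetical, and fetching the response and sending the alert are left to whatever scheduler and notification channel are already in place.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone
from pathlib import Path
from typing import Any, Callable


@dataclass
class Check:
    name: str
    severity: str                   # "critical" or "warning"
    passed: Callable[[Any], bool]   # returns True when the payload looks right


def run_checks(payload: Any, checks: list[Check],
               archive: Path) -> tuple[list[str], list[str]]:
    """Run every check, archive the payload and results, and split the failures.

    Returns (alerts, warnings): alerts are failed critical checks to send right
    away, warnings are failed non-critical checks to batch into a periodic report.
    """
    results: dict[str, bool] = {}
    alerts: list[str] = []
    warnings: list[str] = []
    for check in checks:
        try:
            ok = bool(check.passed(payload))
        except Exception:
            # A check that crashes on an unexpected shape counts as a failure.
            ok = False
        results[check.name] = ok
        if not ok:
            (alerts if check.severity == "critical" else warnings).append(check.name)
    # Append one timestamped record per run so there is evidence to point at later.
    record = {
        "at": datetime.now(timezone.utc).isoformat(),
        "payload": payload,
        "results": results,
    }
    with archive.open("a", encoding="utf-8") as log:
        log.write(json.dumps(record) + "\n")
    return alerts, warnings


if __name__ == "__main__":
    checks = [
        Check("non_empty", "critical", lambda p: len(p) > 0),
        Check("has_category", "warning", lambda p: all("category" in r for r in p)),
    ]
    alerts, warnings = run_checks([{"host": "a"}], checks, Path("vendor_api_archive.jsonl"))
    print("alerts:", alerts, "warnings:", warnings)
```

Archiving the raw payload alongside the results means a disputed incident can be replayed against the exact response the vendor returned at that time.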

        • Clay_pidgin@sh.itjust.worksOP
          1 day ago

          They do have an API Accuracy SLA but it’s not defined anywhere so we do our best. They’ve only avoided penalties a few months out of the last several years!

          • whotookkarl@lemmy.dbzer0.com
            1 day ago

            Oof, that is a rough one. If they are just absorbing the penalties, it sounds like the penalties need to be increased until actually doing the work is the cheaper option, but in the meantime I’d just collect and report on as much data as I could.