It's easy to lose sight of the "North Star" in the healthcare analytics/data science industry.

With the aid of a telescope, you'd find that "star" is actually a twin-sister solar system, consisting of two existential prime directives:

  1. Identify/monitor behaviors and patterns (traditional business intelligence)
  2. Promote/discourage/predict behaviors and patterns (machine learning and AI/decision support)

If your hospital or medical system is "bogged down" by regulatory/quality/research reporting, you're probably stuck in a loop on directive #1, and you've got work to do.

"But regulatory reporting takes up a lot of resources! And it's sorta both directives... and real dollars are on the line..."

You're right, it does take too much time. With discipline it shouldn't, though, and these agencies must have the data. It's one of the two things analytics is good for (identifying/monitoring behaviors), so get it done and make it scalable.

What does scalable actually mean?

Hyper-converged, horizontal infrastructure for servicing big queries and traditional business/operational queries at the same time? No. Well yes, eventually, soon-ish ideally, but not yet, no.

Scalable quality reporting for U.S. acute care hospitals in 2017 means a standardized, version-controlled analytics data layer. That's it.

At Anne Arundel Medical System, we use:

  • Git - Version-control and code reviews
  • Jenkins - Automated data model builds/tests

We use these tools to manage a standardized set of SQL views/stored procedures/functions -- re-used and co-developed by all members of a data science & analytics team -- for both directives: identifying/monitoring behaviors and promoting interventions.
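
As a rough illustration of what an automated data model build/test can look like, here is a minimal sketch in Python. It is not our actual setup: SQLite stands in for the real data warehouse, and the `encounters` table, the `v_readmissions_30d` view, and the function names are all hypothetical. The idea is only that a view definition kept in version control gets rebuilt against a small, known data set and checked on every commit, which is the kind of job a Jenkins build could run.

```python
import sqlite3

# Hypothetical view definition. In practice this would live in its own
# .sql file in the Git repository and go through code review.
VIEW_SQL = """
CREATE VIEW v_readmissions_30d AS
SELECT a.patient_id,
       a.encounter_id AS index_encounter_id,
       b.encounter_id AS readmit_encounter_id
FROM encounters a
JOIN encounters b
  ON b.patient_id = a.patient_id
 AND b.admit_date > a.discharge_date
 AND julianday(b.admit_date) - julianday(a.discharge_date) <= 30
"""


def build_test_db():
    """Build an in-memory database with a tiny, known data set and the view."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE encounters ("
        "encounter_id INTEGER PRIMARY KEY, patient_id INTEGER, "
        "admit_date TEXT, discharge_date TEXT)"
    )
    conn.executemany(
        "INSERT INTO encounters VALUES (?, ?, ?, ?)",
        [
            (1, 100, "2017-01-01", "2017-01-05"),
            (2, 100, "2017-01-20", "2017-01-22"),  # 15 days after discharge
            (3, 200, "2017-01-01", "2017-01-03"),
            (4, 200, "2017-03-15", "2017-03-16"),  # 71 days after discharge
        ],
    )
    conn.execute(VIEW_SQL)
    return conn


def readmission_pairs(conn):
    """Return (index encounter, readmission encounter) pairs from the view."""
    return conn.execute(
        "SELECT index_encounter_id, readmit_encounter_id "
        "FROM v_readmissions_30d ORDER BY 1, 2"
    ).fetchall()


if __name__ == "__main__":
    pairs = readmission_pairs(build_test_db())
    # Only patient 100 was readmitted within 30 days of discharge.
    assert pairs == [(1, 2)], pairs
    print("view build and test passed")
```

If someone changes the view's logic and the assertion fails, the build fails and the change gets caught in review instead of in a regulatory submission.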

We also use Jira and Confluence for agile project management and documentation, but I'll leave that for another post.

You've got to walk before you run, and you've got to be able to scale directive #1 if you ever want to get to #2.


Are you doing anything useful -- really, actually, truthfully -- with predictive modeling and machine learning? (e.g. predicted risk scores with real downstream implications, embedded in clinicians' workflows?)

Is your analytics infrastructure scalable based on the definition in this blog post?

Are the answers to these two questions related?