Zach Lipp
he/him
Senior Software Engineer, Lumere
19 February 2020
We want to help our team of expert medical researchers classify hospital purchases
| Field | Data Type | Example |
|---|---|---|
| Cost | Float | 0.01 |
| Description | String | SUT SILK 3-0 SA74H |
| Contract | String | SUTURE PRODUCTS |
| Department | String | SURGERY |
You don’t need to be an expert for some of these
| Field | Data Type | Example |
|---|---|---|
| Contract | String | SUTURE PRODUCTS |
We can use the text descriptions as inputs to classification models. This is called short text classification.
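As a rough illustration of short text classification, here is a minimal sketch using scikit-learn. Apart from the first description, which comes from the table above, the descriptions and the `suture`/`glove` labels are made up for the example. The choice of character n-grams and logistic regression is an assumption, not necessarily the model the team used.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Illustrative data only: the labels are hypothetical categories.
descriptions = [
    "SUT SILK 3-0 SA74H",
    "SUT NYLON 4-0 BLK",
    "SUTURE VICRYL 2-0",
    "GLOVE EXAM NITRILE LG",
    "GLOVE SURG LATEX 7.5",
    "GLOVE EXAM VINYL MED",
]
labels = ["suture", "suture", "suture", "glove", "glove", "glove"]

# Character n-grams can help with abbreviations such as "SUT",
# which whole-word features would treat as unrelated to "SUTURE".
model = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
    LogisticRegression(),
)
model.fit(descriptions, labels)

# Classify two unseen (also made-up) descriptions.
predictions = model.predict(["SUT SILK 4-0", "GLOVE EXAM NITRILE SM"])
print(predictions)
```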
| Modeling | Delivery | Pros | Cons |
|---|---|---|---|
| Jupyter | Excel | | |
| ECS | Django | | |
| Kubernetes | Django | | |
- Deployments (Dask workers, scheduler)
- CronJobs (training, predicting, refreshing training data)
- Pipeline
- FunctionTransformer
- ColumnTransformer
- DataFrame.groupby
- DataFrame.to_sql()
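The scikit-learn pieces named above can be combined roughly as follows. This is a sketch under assumptions: the column names come from the purchase table earlier in the talk, while the rows, the category labels, the log transform on cost, and the choice of classifier are all invented for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer

# Made-up rows; the category labels are hypothetical.
purchases = pd.DataFrame(
    {
        "Description": [
            "SUT SILK 3-0 SA74H",
            "SUT NYLON 4-0 BLK",
            "SUTURE VICRYL 2-0",
            "GLOVE EXAM NITRILE LG",
            "GLOVE SURG LATEX 7.5",
            "GLOVE EXAM VINYL MED",
        ],
        "Cost": [0.01, 1.25, 2.10, 0.08, 0.95, 0.05],
    }
)
categories = ["suture", "suture", "suture", "glove", "glove", "glove"]

features = ColumnTransformer(
    [
        # A string column name passes a 1-D series of text to the vectorizer.
        ("text", TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)), "Description"),
        # A list of column names passes a 2-D frame; log1p compresses the cost scale.
        ("cost", FunctionTransformer(np.log1p), ["Cost"]),
    ]
)

# One Pipeline object holds both the feature steps and the model,
# so the same transformations run at training and prediction time.
classifier = Pipeline([("features", features), ("model", LogisticRegression())])
classifier.fit(purchases, categories)
predicted = classifier.predict(purchases)
print(predicted)
```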
By adapting old code to our new data model and using pandas instead of SQL, we avoided some costly joins and aggregations, leading to a speedup of 5-6 orders of magnitude.
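The general pattern of aggregating in pandas and writing the result back with `to_sql()` might look like the sketch below. The data is made up, the `purchase_summary` table name is hypothetical, and an in-memory SQLite database stands in for the real one. It shows the shape of the approach only and does not reproduce the speedup.

```python
import sqlite3

import pandas as pd

# Made-up purchases; column names follow the table earlier in the talk.
purchases = pd.DataFrame(
    {
        "Department": ["SURGERY", "SURGERY", "CARDIOLOGY"],
        "Contract": ["SUTURE PRODUCTS", "SUTURE PRODUCTS", "SUTURE PRODUCTS"],
        "Cost": [0.01, 0.03, 0.02],
    }
)

# Aggregate in memory instead of running a GROUP BY query in the database.
summary = purchases.groupby(["Department", "Contract"], as_index=False).agg(
    total_cost=("Cost", "sum"),
    n_purchases=("Cost", "size"),
)

# Write the aggregated rows back in a single call, then read them to check.
conn = sqlite3.connect(":memory:")
try:
    summary.to_sql("purchase_summary", conn, if_exists="replace", index=False)
    stored = pd.read_sql(
        "SELECT * FROM purchase_summary ORDER BY Department", conn
    )
finally:
    conn.close()

print(stored)
```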
One success:
We focused on migrating models as-is while independently researching better models
One ~~failure~~ opportunity for improvement:
Dask fails silently and fails often