Using MS Excel in Creative Ways: An example in working with unstructured data
September 7, 2018 | Blog post by Brent Maranzan
MS Excel is still the ubiquitous tool of choice for anyone working in Decision Support. It is like a hammer in the hands of a carpenter, or a scalpel to a surgeon. Healthcare is famous for having large databases to sift through with poor reporting tools and capabilities. Often data analysis boils down to being able to pull data from multiple sources and then find ways to combine disparate data sets into one, comprehensive data file to work with. Over the years, I’ve become (out of both necessity and curiosity) a mostly self-taught MS Excel guru.
A big part of working with MS Excel is the ability to learn the basic concepts and then creatively apply them. Case in point, the other day I was working with a data set from a client that includes a lot of unstructured data. Specifically, the data were from the client’s ERM in the appointment scheduling utility. The client requested a report showing a number of metrics, including the different appointment types by provider, with counts of visits, no-shows, etc… Pretty basic stuff, but unfortunately their system is fairly old and no good “canned’ reports exist to meet their needs.
Fortunately, the system has a very accessible back-end, data extract utility. As is often the case, the plan was to build a reporting template in MS Excel where I would download data on a timely basis (monthly) and then set up an automated template in MS Excel using VBA. The plan ran into a snag, however, when I realized that there were issues with how the data were structured. Many of the fields in the “appointment type” column were blank. The information I needed was in the patient name field instead, as the client’s practice is to use a “dummy” patient name to record things like charting or vacation time in the schedule.
At this point, my options were to
a) give up and say it’s “not possible” to provide the report,
b) go through the 65,000 records manually and re-code the data into something I could use (which I’d have to do every month), or
c) use MS Excel creatively to efficiently accomplish the task.
The approach I took in this case was to first isolate the number of options. I copied and pasted the contents of the patient name data column to a new sheet and used the built-in “Remove Duplicates” function to show me how many unique names I was dealing with. That narrowed things down to 976 options – much better, but still WAY too many to go through manually for my liking. Here’s where the creativity comes in. I used VBA to access the spell check utility in MS Excel to look for words buried in the unstructured names data. This is the code I came up with:
That little bit of code went through the data in seconds and highlighted any words that were spelled correctly (ie, matched the internal wordlist in MS Excel, so likely not a person’s name) and put a “Y” character next to them so that they could easily be filtered. As a result, I only had about 30 options to scan through and review manually, rather than the original 950+! I was then able to easily go through the names and build a utility to pull out the appointment info I needed automatically in the future.
Thinking creatively in MS Excel and using VBA can greatly improve your analysis skills and efficency. If you have any questions about how you might be able to “creatively” approach an analysis challenge you face, we’d love to hear from you. We are also available to do training courses/seminars on MS Excel, VBA or Health Data Analysis in general.