D2 - Big Data Lake Development Project
Project Team
Project PI – George Collier
David Reinkensmeyer, John Dzivak, Mike Jones, Raeda Anderson, Veronica Swanson, Microsoft
Purpose/Aims.
This project has created a sophisticated Big Data system based on Microsoft Azure technologies to manage and analyze data from the Pt Pal and Flint Rehab SEAM platforms. We have been able to load, store, present, and analyze large data sets from both rehab systems.
When the project started, we proposed three development efforts: 1) the creation of a large “data lake” repository for data generated by the Pt Pal and Flint Rehab platforms, 2) the creation of an advanced analytic toolset for use by our RERC team and other qualified investigators, and 3) production and deployment of valuable analytic techniques for patient management, including patient profiles to determine best strategies for maximizing adherence, engagement, and outcomes of home- and community-based therapeutic interventions, and clinical path algorithms to facilitate patients’ progress through the most effective rehabilitation approach.
We are well on the way to meeting these goals. We can automatically extract data from the Pt Pal and FitMi systems and load it into our cloud data stores, where it is processed. We can present this data in a digestible form to the staff at the Shepherd Center. We have used the Big Data system to visualize and explore the effects of technical and organizational changes on patients’ adherence to exercise prescriptions. Accurate and robust machine learning-based and theoretical models have been built for the FitMi data. We are leveraging our mathematical, statistical, and machine learning-based insights for new and innovative applications in the Smart Coach project, which is described elsewhere in this document.
Status – On Target
Data Lake and Big Data Development
We have moved from a simple Data Lake architecture to a comprehensive Big Data system. This integrated Big Data analytics system allows us to ingest data in various formats, organize and process it, and provide analytic products of several types, including Data Science, Machine Learning, Relational Querying, and Business Intelligence.
Pt Pal-Focused Data Extraction and Management
We have built an automated extract, load, and transform (ELT) process to load daily activity data from Pt Pal into the Research Data system. The data is then modeled, cleaned, and exposed as Spark Tables in an Azure Synapse Big Data system.
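The ordering of an ELT process (load the raw data first, clean it afterward) can be illustrated with a minimal Python sketch. This is not the project's pipeline: the field names (patient_id, activity_date, completed), the sample rows, and the in-memory lists standing in for the raw and cleaned stores are all hypothetical, and the real system runs on Azure Synapse rather than plain Python.

```python
from datetime import datetime

# Hypothetical raw export rows; the field names are illustrative only.
RAW_ROWS = [
    {"patient_id": "p1", "activity_date": "2023-01-02", "completed": "1"},
    {"patient_id": "p1", "activity_date": "2023-01-02", "completed": "1"},  # duplicate
    {"patient_id": "", "activity_date": "2023-01-02", "completed": "0"},    # missing id
    {"patient_id": "p2", "activity_date": "not-a-date", "completed": "1"},  # bad date
    {"patient_id": "p2", "activity_date": "2023-01-03", "completed": "0"},
]


def extract():
    """Stand-in for pulling one day's export from the source system."""
    return list(RAW_ROWS)


def load(raw_store, rows):
    """Load step: append rows untouched so the raw zone keeps the original data."""
    raw_store.extend(rows)
    return raw_store


def transform(raw_store):
    """Transform step: validate, type, and de-duplicate rows from the raw zone."""
    seen = set()
    clean = []
    for row in raw_store:
        if not row["patient_id"]:
            continue  # drop rows with no patient identifier
        try:
            day = datetime.strptime(row["activity_date"], "%Y-%m-%d").date()
        except ValueError:
            continue  # drop rows whose date cannot be parsed
        key = (row["patient_id"], day, row["completed"])
        if key in seen:
            continue  # drop exact duplicates
        seen.add(key)
        clean.append(
            {
                "patient_id": row["patient_id"],
                "activity_date": day,
                "completed": row["completed"] == "1",
            }
        )
    return clean


raw_zone = load([], extract())      # raw data lands first, unmodified
clean_table = transform(raw_zone)   # cleaning happens afterward, inside the store
```

Because the raw zone is never modified, the transform step can be revised and rerun over historical data without re-extracting it from the source system.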
Preliminary steps have been taken to integrate other Pt Pal data (e.g., clinician activity time in Pt Pal) into the Data Lakehouse.
The table below displays the scale of the data we have collected for the Pt Pal system.
The data is processed daily in a Big Data Pipeline.
The Power BI system is integrated into our tool suite and supplies reporting and dashboarding capabilities.
A series of intensive user-centered design workshops was organized, resulting in sophisticated interactive graphical reports for clinicians and clinician managers.
Another area of investigation has been the impact of system and organizational changes on the adherence of Pt Pal users to exercise prescriptions.
To better explore and understand the impact of these changes on core measures, we developed a system for creating a comprehensive set of time-series visualizations labeled with events of interest.
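The idea of labeling a time series with events of interest can be sketched in a few lines of Python. The adherence values, the event name, and the helper functions below are synthetic and hypothetical; they only illustrate how each point in a series can carry an event label and how a measure can be compared before and after an event. They do not reproduce the project's visualization system.

```python
from datetime import date, timedelta

# Synthetic daily adherence values (fraction of prescribed exercises completed).
start = date(2023, 1, 1)
daily_adherence = [0.50, 0.52, 0.48, 0.50, 0.60, 0.62, 0.58, 0.60]
series = [(start + timedelta(days=i), v) for i, v in enumerate(daily_adherence)]

# Hypothetical events of interest, keyed by the date they took effect.
events = {date(2023, 1, 5): "app update released"}


def label_series(series, events):
    """Attach the event label (or None) to each point in the series."""
    return [(day, value, events.get(day)) for day, value in series]


def mean_before_after(series, event_day):
    """Return mean adherence before the event and from the event day onward.

    Assumes there is at least one observation on each side of the event.
    """
    before = [v for d, v in series if d < event_day]
    after = [v for d, v in series if d >= event_day]
    return sum(before) / len(before), sum(after) / len(after)


labeled = label_series(series, events)
before, after = mean_before_after(series, date(2023, 1, 5))
```

A plotting layer would draw the labeled points as a line and place a marker or vertical rule at each point whose label is not None. A before/after difference in such a comparison is descriptive only and does not by itself show that the event caused the change.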
Integration of the Flint FitMi system into the Data Lakehouse
The Flint System provides "serious games" (see the Wikipedia entry "Serious game") primarily aimed at people working to recover after a stroke. Preliminary analyses have been completed on data from 2,583 patients who completed over 602,000 exercise sessions and 14 million “reps” over three years (Ramos-Muñoz et al., 2021).
We have loaded the data from the FitMi system into the Big Data system, where we applied Machine Learning techniques. Using several different models, we have demonstrated that we can robustly predict the behavior of FitMi users.
We have also developed theoretical models to predict engagement with the FitMi system. There are multiple candidate models, but one based on habit formation currently stands out.
Some of these Machine Learning results have been leveraged as a starting point for the Smart Coach system, described elsewhere in this document.
Key Accomplishments
Presentations and Publications
Anderson, Raeda K., Swanson, V., Rabinowitz, A., Collier, G., DeRuyter, F., Chan, V., and Reinkensmeyer, D. From app to adherence: Promoting home rehabilitation exercises for people with disabilities. Technology, Mind, and Society. Online. (November 2021).
Anderson, Raeda K., George H. Collier, Naveen Khan, and John Dzivak. Exercise adherence and exercise prescription by physical therapists, occupational therapists, as well as speech and language pathologists. Annual Conference of the American Congress of Rehabilitation Medicine. Online. (September 2021).
Anderson, Raeda K., George H. Collier, Naveen Khan, and John Dzivak. Big data and therapy sessions: examination of 4 million therapy sessions. Annual Conference of the American Congress of Rehabilitation Medicine. Online. (September 2021). Poster.
Anderson, Raeda K., George Collier, Naveen Khan, and John Dzivak. Getting from start to finish: Examination of patient status, therapy prescription, therapy participation, with pain and difficulty on patient likelihood of finishing at home therapy exercises. Rehabilitation Research 2020: Envisioning a Functional Future. National Institutes of Health. Online. (October 2021). Poster.
Backus, Deborah and Collier, George. Enabling New Models of Care thru Big Data & Technologies, Presented at ACRM, Nov 11, 2022
Backus, Deborah and Collier, George. Integrating Data Sources to Support Therapeutic Interventions, Presented at ACRM, Nov 11, 2022
Backus, Deborah and Collier, George. Applying a variety of models to rehabilitation data collected in the wild. Presentation at DARE. Mar 4, 2023
Collier, George. Advancing Analytics Practice for Rehab. Presented at Rehab Tech Summit, Remote Presentation, Mar 5, 2022
Collier, George. Big Data Concepts as Applied to the SEAM System. Presented at KARM 2022, Incheon, Korea, Oct 28, 2022
Collier, George. Enabling the Future of Responsive and Precision Rehab. Presentation at Rehab Week, Amsterdam NL, Jul 26, 2023
Collier, George. Data in the Wild ACRM Symposium. Accepted for ACRM. Atlanta, GA, Oct 30, 2023
Jones, M., Morris, J., DeRuyter, F. Promoting accessible, mobile rehabilitation for people with disabilities – the mRehab Rehabilitation Engineering Research Center. Presented at CSUN 2020, Anaheim, CA, March 11, 2020.
Jones M. mRehab: New Models of Care Using Information & Communication Technology to Support Rehabilitation in the Home & Community. Presented at the American Congress of Rehabilitation Medicine 2020 Annual Conference, October 2020.
Jones, M. Technology innovation to support MS rehabilitation. Invited presentation at the 4th Annual MS and CND Neuroimmunology Symposium, Oregon Health Sciences University, Portland, OR (Virtual), September 2021.
Jones, M., Collier, G., Reinkensmeyer, D., DeRuyter, F., Dzivak, J., Zondervan, D., Morris, J. Big Data analytics and sensor-enhanced activity management to improve effectiveness and efficiency of outpatient medical rehabilitation. Int. J. Environ. Res. Public Health, 2019; 17 (3), 748; doi.org/10.3390/ijerph17030748.
Swanson, V., Chan, V., Cruz-Coble, B., Alcantara, C.M., Scott, D., Jones, M., Zondervan, D.K., Khan, N., Ichimura, J., Reinkensmeyer, D.J. (2021) A Pilot Study of a Sensor Enhanced Activity Management System for Promoting Home Rehabilitation Exercise During the COVID-19 Pandemic: User Experience, Reimbursement, and Recommendations for Implementation. Accepted for publication in Int. J. Environ. Res. Public Health.
Ramos-Muñoz, E., Swanson, V., Johnson, C., Anderson, R., Rabinowitz, A., Zondervan, D., Collier, G., Reinkensmeyer, D. Using large-scale sensor data to identify factors related to perseverance with home exercise. Accepted for Ann. APA Technology, Mind, and Society Conf. November 3-5, 2021. Online.
Teaching and Education
We established a weekly Python and Data Science study group and worked through multiple online Data Science classes. This activity has led to substantial hands-on work and programming lab work. Participants have come up to speed on modern Data Science tools such as Python, Pandas, Anaconda, Scikit-Learn, Matplotlib, Jupyter Notebooks, Spark Clusters, SQL, Relational Databases, and more.
ACRM organizers have asked us to give an educational session at the next ACRM conference on Big Data and how it can be applied to Rehab.
Challenges and Course Corrections
The project has met and overcome several challenges:
A required shift from a Data Lake + Data Science architecture to a full Big Data system. This shift was greatly aided by help from Microsoft, including funding and "hands-on-the-keyboard" technical collaboration. We have received multiple rounds of funding from Microsoft for our projects.
The unfamiliarity of some with the underlying concepts of the admittedly highly complex and highly technical infrastructure in use.
The discovery of usable patterns in these complex data sets is challenging, but we feel that we are progressing in this direction.