Twenty years ago, the NYC Health Department built a Master Child Index (MCI), combining its immunization and blood lead registries. The Citywide Immunization Registry is seeded with birth records from the NYC Health Department’s Bureau of Vital Statistics and receives immunization records from healthcare providers in NYC. The blood lead registry receives laboratory and provider reports for blood lead levels in children. These data systems were not always easily linked, so the technology team built an artificial intelligence (AI) model to conduct probabilistic identity matching, a novel approach in 2004. Angel Aponte, Director of Information Strategy for the NYC Health Department’s Division of Disease Control, says, “The rationale for building the Master Child Index was to … integrate the data, to provide the healthcare providers with a single view of that child’s health in the agency’s online registry web application.”
The agency built the system “from scratch, using available technologies at the time,” says Aponte. In contrast, today’s technological advances in free, open-source software provide easier access to better AI and machine learning tools, so Aponte’s team no longer needs to build all the foundational technologies. Because of these advances and lessons learned from developing and refining the MCI, the NYC Health Department is using these new technologies to implement more sustainable, next-generation matching software.
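For readers unfamiliar with probabilistic identity matching, the general idea can be sketched in a few lines of Python. This is a minimal illustration of the textbook Fellegi-Sunter scoring approach, not the NYC Health Department's actual model, which the article does not describe. The field names, the m and u probabilities, and the `match_score` function are all hypothetical values chosen for illustration.

```python
import math

# Hypothetical per-field probabilities (illustrative values, not real estimates):
#   m = P(field agrees | the two records are the same child)
#   u = P(field agrees | the two records are different children)
FIELD_PROBS = {
    "last_name": (0.95, 0.01),
    "first_name": (0.90, 0.02),
    "birth_date": (0.98, 0.003),
}


def match_score(rec_a, rec_b):
    """Sum log2 likelihood ratios over fields (Fellegi-Sunter style).

    A higher score means stronger evidence that both records
    describe the same person.
    """
    score = 0.0
    for field, (m, u) in FIELD_PROBS.items():
        a, b = rec_a.get(field), rec_b.get(field)
        if a is None or b is None:
            continue  # a missing value contributes no evidence either way
        if a.strip().lower() == b.strip().lower():
            score += math.log2(m / u)  # agreement adds weight
        else:
            score += math.log2((1 - m) / (1 - u))  # disagreement subtracts weight
    return score


# Synthetic example records (not real data)
immunization = {"last_name": "Rivera", "first_name": "Ana", "birth_date": "2003-05-14"}
lead_same = {"last_name": "rivera ", "first_name": "Ana", "birth_date": "2003-05-14"}
lead_other = {"last_name": "Chen", "first_name": "Leo", "birth_date": "2002-11-02"}

print(match_score(immunization, lead_same))   # positive: likely the same child
print(match_score(immunization, lead_other))  # negative: likely different children
```

In practice, a linkage system would compare the score against upper and lower thresholds to decide whether a pair is a match, a non-match, or a candidate for manual review, and would typically use fuzzy comparisons rather than the exact string equality assumed here.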
Aponte says, “We want to grow the index to be more than just children. We would like to integrate data for all of NYC’s health systems, to help inform us how to best support the health of our population.”
Aponte says the agency should also implement a “data lake house” architecture for pooling data across sources. It would be a flexible technology for integrating data from multiple sources to support analysis of how best to improve the health of NYC’s population. The NYC Health Department’s Division of Information Technology completed a proof of concept for a lake house using Microsoft Azure, and Aponte says the team is looking to expand on it.
Aponte notes, “Our job is to protect the population from health risks, innovate, and implement new and better infrastructure. You can move quickly and make mistakes in a development environment with synthetic data, but if you move too quickly with real patient records, you pay a hefty reputational price for mistakes. We have to get results and not put the information that we protect at risk.”
Despite being ahead of the curve for data systems, the NYC Health Department still experiences the same data challenges other jurisdictions face. Charles (Chip) Ko, Senior Director of Integrated Surveillance and Data in the Bureau of Epidemiology Services, gives the following example: During the COVID-19 pandemic, the NYC Health Department used complex processes to match COVID-19 data to data from vital statistics, the immunization registry and other syndromic surveillance programs. According to Ko, this work was vital during the height of the COVID-19 pandemic and “really allowed us to better track and describe the outcomes associated with COVID infection.”
Ko adds, “The problem, though, is that this required so much staff time, conducting all these one-off matches, monitoring those processes and addressing quality assurance issues. The hope with data modernization is that we would be able to get access to that data much more quickly … [so we] can spend more time on in-depth analyses where we can gain insight much more quickly.”
The Data Modernization Initiative (DMI) Unit was created in 2022 to address these data issues and “strive toward connected, adaptable response-ready systems that use public health data to drive action and equity,” says Ko.
The DMI Unit is part of the NYC Health Department’s newly launched Center for Population Health Data Science, a center that aims to catalyze the agency’s role as the “city’s health strategist,” with the goal of linking public health, healthcare and social service data. The DMI Unit currently includes over 20 employees, with subunits in data governance, data acquisition, and data science and engineering. However, Ko stresses that not all the work happens within the DMI Unit. “It is really a cross-agency collaboration,” he says.
One of the DMI Unit’s first initiatives was to create a data modernization advisory group, with representation across the NYC Health Department. The advisory group meets monthly and includes peer-to-peer conversations to expand understanding of different data workflows or methods across divisions. Different team members share their work, discuss challenges and successes and answer questions.
The advisory group holds informational sessions led by DMI members on topics such as data governance, data trust and data maturity assessments. Ko says the advisory group acts as a conduit for delivering DMI updates to the divisions across the NYC Health Department.
To ensure the DMI Unit was communicating even more broadly across the agency, the team also set up monthly roundtables for sharing key updates with executive leadership. “That includes our Commissioner of Health, Chief of Staff, Chief Operating Officer and leadership from other divisions who aren’t normally in our regularly scheduled calls. We have a space to raise any concerns we had or where we might need support from the executive leadership team,” says Ko.
The DMI Unit hosts data science and data engineering office hours, which are attended by around 40 people every other week. The format alternates between technical presentations and technical support with the purpose of advancing data science capacity across the NYC Health Department. Ko says, “People drop in and come with questions or pain points they’re trying to address … everyone just comes together and thinks through different solutions or asks ‘Have you tried this? Have you tried that?’ It’s a good opportunity for sharing ideas and methods.”
Ko also says it has been important to find “collaborative champions” across the NYC Health Department, such as Aponte. Ko says Aponte’s experience at the agency and his work on the next-generation matching software project have been valuable for the DMI Unit. “He has much experience at the agency and is working on [projects] that we really want to gain insight on. We’re working with him, learning from his team on how we can implement something similar agency-wide,” says Ko.
In turn, Aponte says partnerships across the NYC Health Department have been helpful for his team’s success.
Aponte says his goal is to “build a pipeline of innovative projects and useful infrastructure projects within one division and then hand them off for greater agency use.” He adds, “If I do a good job of building infrastructure for disease control, they should be able to scale to things that are larger than just disease control in New York City, especially since our population is so large.”
To keep staff and leadership updated, the DMI Unit sends a monthly newsletter to the executive leadership team, the advisory group and any staff across the NYC Health Department interested in learning about DMI. The newsletter highlights data modernization projects and relevant policy updates while increasing agency awareness.
Ana Gutierrez, a Strategic Initiatives Manager in the DMI Unit, creates the newsletter. The newsletter is formatted so that activity updates align with the Centers for Disease Control and Prevention’s (CDC) DMI priorities. “By setting that as the [standards] and then adding updates within it, we’re making connections for recipients to know why we’re doing certain activities. For example, we’re conducting a data maturity assessment, and it aligns with priority one. It helps communicate data modernization in a way that’s not intimidating, so there isn’t much of a roadblock to entering the conversation,” says Gutierrez.
In 2022, the DMI Unit’s advisory group identified five main priorities using the CDC’s self-assessment. These priorities are increasing capacity for data linkages and data standardization, coordinating governance for all data assets at the agency, improving data sharing internally and externally, increasing workforce capacity and improving technology infrastructure.
Currently, the DMI Unit’s focus is planning a blueprint for a central data infrastructure. The design of this infrastructure will be informed by two assessments the team is leading: a data maturity assessment and a data governance assessment. The governance assessment would be used to gauge the NYC Health Department’s readiness for an enterprise-wide data catalog and to gather insights into current governance practices at the agency, with the goal of moving toward a centralized data governance framework.
The catalog would include “all data assets that we have at the agency.” Ko says the system would also ideally help them manage data use agreements: “Having them all in one place instead of having them siloed within the different program areas.”
Building a new central data infrastructure is an immense undertaking that includes developing internal capacity, investing in a shared cloud infrastructure and building out core data integration pipelines.
On the technology side, Aponte reflects, “Everybody wants to see results, and they want to get to the end goal, and you’ve got this beautiful kind of presentation at the end of it. But there is a significant amount of work and planning that needs to happen to get to that end result.”
Aponte acknowledges that advancements in technology that support public health are not always visible to the public, and that this is historically a challenge for public health.
“Public health does its best work when our population is well protected—we’re often not noticed when things go well, and we tend to get noticed when there’s a problem.”
To make progress in data modernization, Ko says communication is critical: “We found it’s really important to use shared language because data modernization is so broad, and it entails so much. It means something different to everyone.” The NYC Health Department has its own data modernization definition that Ko says is helpful for project focus and describing DMI work to staff across the agency. Ko adds, “Having shared language is also really important in this work because we’re speaking with technical and nontechnical audiences. Trying to find that balance [between technical and nontechnical language] is also important.”
Another factor for success is collaboration. Ko says it is important to create spaces where collaboration can occur and to elevate spaces such as communities of practice. He also says it is beneficial to identify collaborative champions at the NYC Health Department and learn from their work: “Angel Aponte is a huge collaborative champion here and we’ve learned so much from his team.”
Ko adds that being agile and able to adapt to the needs of the organization is important because data modernization will not look the same at every health department.
Aponte says free software is the underlying foundation for everything they have built. He suggests exploring free software such as Python, R and the Linux operating system. He adds: “There are free database technologies as well, free web application technologies from Apache, Tomcat and PostgreSQL. We build public health infrastructure on the foundation of free software.”
Aponte says public health work is interdisciplinary and problems do not necessarily have an elegant solution: “You often have to make the best decision you can with incomplete information, [and] you may not have all the answers right away. But what you need to do is move the needle and improve incrementally. If you keep making incremental progress, you will get to the goal eventually. Every change that you make doesn’t have to be revolutionary.”
Gutierrez agrees, saying the NYC Health Department “has big initiatives that are trying to redesign a lot of the ways we do things, but then we also have smaller conversations that provide small improvements in day-to-day work.”
Ko says starting with an assessment is a great step toward data modernization. He refers to the NYC Health Department’s data maturity assessment: “We really feel that this assessment is going to help us prioritize the projects and the next steps for our DMI roadmap. Maybe that’s a tool that other health departments can use to understand what to tackle first.” He references the Public Health Informatics Institute’s Informatics-Savvy Health Department self-assessment tool as a place to begin.
Ko says, “What we want from data modernization is to gain insights about our city’s population and their health needs—to use those insights to plan and help the health of all New Yorkers.”
“Data modernization is a long-term investment in the public health technology infrastructure. Public health practitioners collect, integrate and distill high-quality, actionable information from various sources to inform program staff and leaders at our health department, in our city, [in] our state and in our country so that they can make the best decisions possible to support our population. We use data to inform public health interventions that improve the health of our population. I think that modernizing our infrastructure will help executives make the best decisions, so our agency can better protect our population,” says Aponte.
To carry on the work of data modernization, Aponte says a sustained level of resources is needed. “We need to commit to providing long-term sustained resources for public health and for our workforce. I’ve seen a cycle in my time: You’ve got an emergency and a large influx of resources, and then the emergency ends, we get things under control, and the resources fall off a cliff and dwindle down until you get back to the baseline that you started and you often end up even lower than the baseline. We need a sustained commitment. It’s not pandemic-level resources. It’s not pre-pandemic-level resources. It’s somewhere in the middle, where we can ensure that we’re as modern and well-prepared as we can be.”