Abstract
The Michigan Education Data Center (MEDC) is a secure data clearinghouse for the State of Michigan’s education data, including personally identifiable information (PII) on Michigan’s K-12 population. These data are indexed by a key that uniquely identifies each student, allowing researchers to perform analyses across multiple linked datasets. Reliance on this identifier, however, limits analysis to these internal datasets. What can we do if a researcher brings in an external dataset? How can we help them incorporate the new data if it does not share the same key? MEDC has developed a probabilistic matching process that links external data with MEDC’s internal datasets when the datasets contain overlapping PII. We discuss how we developed this process, the difficulties we encountered, and avenues for future progress.