Yuhong Nan, Semantics-Driven, Learning-Based Privacy Discovery in Mobile Apps




CERIAS Weekly Security Seminar - Purdue University show

Summary: A long-standing challenge in analyzing information leaks within mobile apps is automatically identifying the code that operates on sensitive data. Because existing solutions all rely on system APIs (e.g., for the IMEI or GPS location) or on features of the user interface (UI), content that comes from app servers, such as a user's Facebook profile or payment history, falls through the cracks. In this talk, I will introduce ClueFinder, a novel semantics-driven solution for automatically discovering sensitive user data, including data from the server side. ClueFinder uses natural language processing (NLP) to automatically locate the program elements of interest (variables, methods, etc.), and then performs a learning-based program structure analysis to accurately identify those that indeed carry sensitive content. Using this new technique, we analyzed over 400k popular apps, an unprecedented scale for this type of research. Our findings bring to light the pervasiveness of information leaks and the channels through which they happen, including unintentional over-sharing across libraries and aggressive data acquisition behaviors.

About the speaker: Dr. Yuhong Nan is a Post-Doctoral Research Associate at Purdue University. He earned his Ph.D. from the School of Computer Science at Fudan University, China, and received the 2018 ACM SIGSAC China Doctoral Dissertation Award. His research interests span privacy leakage detection on mobile and IoT platforms, security enhancement for IoT systems, and cyber-attack investigation with audit logs.