Dissimilarity Plots: A Visual Exploration Tool for Partitional Clustering
Date: Monday, February 13, 2017
Time: 1:15 p.m. – 2:15 p.m.
Location: Nedderman Hall Room 106
Abstract: Cluster analysis tries to uncover structure in data by assigning each object in the data set to a group (called a cluster) so that objects from the same cluster are more similar to each other than to objects from other clusters. Exploring the cluster structure and assessing the quality of the cluster solution have been research topics since the invention of cluster analysis. This is especially important since all popular cluster algorithms produce a clustering even for data without a “cluster” structure. Many visualization techniques for judging the quality of a clustering and for exploring the cluster structure have been developed, but they all suffer from certain restrictions. For example, dendrograms cannot be used for non-hierarchical partitions, silhouette plots provide only a diagnostic tool without the ability to explore structure, data dimensionality may render projection-based methods less useful, and graph-based representations hide the internal structure of clusters. In this talk we introduce a new visualization technique called dissimilarity plots, which is based on solving the combinatorial optimization problem of seriation for (near) optimal cluster and object placement in matrix shading. Dissimilarity plots are not affected by data dimensionality, allow the user to directly judge cluster quality by visually analyzing the micro-structure within clusters, and make a misspecified number of clusters instantly apparent. Dissimilarity plots are implemented in the R extension package seriation.
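To make the idea of matrix shading with cluster and object placement more concrete, here is a minimal sketch in plain Python. It is not the algorithm used by the seriation package: the ordering below is a simple greedy heuristic chosen for illustration, and all function names (`dissimilarity_matrix`, `mean_between`, `block_order`, `shade`) and the toy data are assumptions made for this example.

```python
import math


def dissimilarity_matrix(points):
    """Pairwise Euclidean distances between all objects."""
    return [[math.dist(p, q) for q in points] for p in points]


def mean_between(d, members_a, members_b):
    """Average dissimilarity between two groups of object indices."""
    total = sum(d[i][j] for i in members_a for j in members_b)
    return total / (len(members_a) * len(members_b))


def block_order(d, labels):
    """Heuristic stand-in for seriation: similar clusters end up adjacent,
    and objects inside each cluster are sorted from most to least central."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)

    # Greedy chain over clusters: always append the closest unplaced cluster.
    remaining = sorted(clusters)
    chain = [remaining.pop(0)]
    while remaining:
        last = clusters[chain[-1]]
        nxt = min(remaining, key=lambda c: mean_between(d, last, clusters[c]))
        remaining.remove(nxt)
        chain.append(nxt)

    # Within each cluster, sort by mean dissimilarity to the cluster's members.
    order = []
    for lab in chain:
        members = clusters[lab]
        order += sorted(members, key=lambda i: mean_between(d, [i], members))
    return order


def shade(d, order, ramp="#*+. "):
    """Render the reordered matrix as text; dark characters mean low dissimilarity."""
    top = max(max(row) for row in d) or 1.0
    lines = []
    for i in order:
        cells = [
            ramp[min(int(d[i][j] / top * len(ramp)), len(ramp) - 1)]
            for j in order
        ]
        lines.append("".join(cells))
    return "\n".join(lines)


# Toy data: three groups in 2-D, deliberately interleaved in the input order.
points = [(0, 0), (10, 10), (5, 0), (0, 1), (10, 11), (5, 1), (1, 0), (11, 10), (6, 0)]
labels = [0, 2, 1, 0, 2, 1, 0, 2, 1]

d = dissimilarity_matrix(points)
order = block_order(d, labels)
print(shade(d, order))
```

With a well-chosen partition, the reordered matrix shows dark square blocks along the diagonal, one per cluster. A block that is light or patchy inside would hint at a poorly formed cluster or a misspecified number of clusters.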
Biographical Sketch: Dr. Michael Hahsler is an assistant professor of Engineering Management, Information, and Systems (EMIS), Lyle School of Engineering, Southern Methodist University (SMU). He also holds a courtesy appointment with the Department of Computer Science and Engineering, and an adjunct appointment with the Department of Clinical Sciences at UT Southwestern Medical Center. He received his Ph.D. in business informatics from the Vienna University of Economics and Business, Austria, where he worked as an assistant professor and core researcher at the Research Institute for Computational Methods. Dr. Hahsler’s research focuses on methods used in the interdisciplinary field of data science, including data mining, data visualization, data streams, and combinatorial optimization, with applications in bioinformatics, healthcare analytics, quantitative marketing, earth sciences, and other engineering disciplines. He has published more than 60 papers in peer-reviewed international journals and conference proceedings and has organized several workshops. He currently serves as editor of the Journal of Statistical Software and as secretary of the INFORMS Data Mining Section, and he is the principal developer of several popular data-mining-related extension packages for R, a free software environment for statistical computing and graphics.
All students and faculty are encouraged to attend. Attendance is expected for GTAs and on-campus GRAs.
Signature sheets for GTAs will be available in the room. Please sign in to record your attendance.