Prior research exploited the repetitiveness of code changes to enable several tasks such as code completion, bug-fix recommendation, library adaption, etc. These and other novel applications require accurate automated detection of repetitive changes, but the current state-of-the-art is limited to custom-tailored algorithms that detect specific kinds of changes at the syntactic level. Existing algorithms relying on syntactic similarity have lower accuracy, and cannot effectively detect semantic change patterns. In this work, we introduce a novel graph-based mining approach, CPatMiner, to detect previously unknown repetitive changes in the wild, by mining fine-grained semantic code change patterns from a large number of open-source repositories. To overcome unique challenges such as detecting meaningful change patterns and scaling to large repositories, we rely on fine-grained change graphs that capture program dependencies.
We evaluate CPatMiner by mining change patterns in a diverse corpus of 5,000+ open-source projects from GitHub across a population of 170,000+ developers. We use three complementary methods. First, we sent the mined patterns to 108 open-source developers. We found that 70% of respondents recognized those patterns as their meaningful frequent changes. Moreover, 79% of respondents even named the patterns, and 44% wanted future IDEs to automate such repetitive changes. We found that the mined change patterns belong to various development activities: adaptive (9%), perfective (20%), corrective (35%) and preventive (36%, including all refactorings). Second, we compared CPatMiner with the state-of-the-art, AST-based technique, and reported that CPatMiner detects 37 % more meangingful patterns. Third, we use CPatMiner to search for patterns in a corpus of 88 GitHub projects with longer histories consisting of 164M SLOCs. It constructed 322K fine-grained change graphs containing 3M nodes, and detected 17K instances of change patterns from which we provide unique insights on the practice of change patterns among individuals and teams. We found that a large percentage (75%) of the change patterns from individual developers are commonly shared with others, and this holds true for teams. Moreover, we found that the patterns are not intermittent but spread widely over time. Thus, we call for a community-based change pattern database to provide important resources in novel applications.