MIT developed a self-correction system for software

CodePhage is a system developed at the Massachusetts Institute of Technology (MIT) that can automatically fix errors in software by borrowing functionality from other programs, even ones written in different programming languages, without access to their source code. It then translates the borrowed logic into the language of the receiving software and inserts it to fix the error.

A system that fixes software errors by "learning" from other software

This is good news for programmers: it could save them thousands of hours of programming and bug fixing, and help them build more stable software.

Errors are an unavoidable part of programmers' lives. How hard an error is to fix depends on its extent, and some need only a few lines of code, but pinpointing exactly which line causes the error is an extremely time-consuming and tedious process, especially in big projects. The software system developed by MIT can now handle this, and it can do more.

CodePhage fixes errors by repairing missing checks on input variables, and the approach can be extended to fix more types of errors. More notably, according to MIT researcher Stelios Sidiroglou-Douskos, the software can transfer code between programs (a process called "horizontal code transfer", by analogy with horizontal gene transfer in genetics) without access to the source code and across different programming languages. Instead, it "learns" how to fix errors by analyzing the program files directly.

How does the system work?

Suppose you are writing a simple program that asks the user to enter two numbers and outputs the quotient of the first divided by the second. Suppose also that you forgot to check that the second number is not zero (division by zero is undefined). CodePhage starts with the buggy application and two inputs: one that does not trigger the error (the safe input) and one that does (the unsafe input). It then searches a very large repository of programs for one that reads both inputs and processes them correctly, that is, one that performs the division safely.
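The starting point described above can be pictured with a minimal Python sketch. The function name `divide_unchecked`, the helper `crashes`, and the sample inputs are illustrative assumptions, not part of CodePhage itself.

```python
def divide_unchecked(a: float, b: float) -> float:
    """Buggy program: forgets to check that the divisor is non-zero."""
    return a / b


# Hypothetical inputs: one the program handles, one that triggers the bug.
safe_input = (10.0, 2.0)
unsafe_input = (10.0, 0.0)


def crashes(func, args) -> bool:
    """Return True if calling func with args raises ZeroDivisionError."""
    try:
        func(*args)
        return False
    except ZeroDivisionError:
        return True
```

Here `crashes(divide_unchecked, safe_input)` is false and `crashes(divide_unchecked, unsafe_input)` is true, which is the pair of behaviours the system needs before it can look for a fix elsewhere.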

Sidiroglou-Douskos said: "We have tons of source code in huge open-source repositories, millions of projects, and many of those projects perform similar functions. Even when that is not the core functionality of the program, they often have subcomponents that share functionality across a large number of projects."

The system distinguishes between the donor program (the software from which the fix is borrowed) and the recipient program (the buggy software that the system wants to fix). The first step is to feed the safe input to the donor and automatically record the constraints the donor applies to the input variables. The system then does the same with the second, unsafe input and compares the two sets of constraints. Divergence points are conditions that the safe input satisfies but the unsafe input does not; from them, the system infers that the recipient may be missing the corresponding safety check.

Going back to the example above, the safe input is a division with a non-zero divisor, and the unsafe input is a division by zero. The MIT system automatically detects the condition "in a division, the denominator must be different from 0". The safe input satisfies this condition and the unsafe input does not, so the recipient needs an explicit check for it; without that check, the error can occur.
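The comparison of the two constraint sets can be sketched as a set difference. This is a toy model under stated assumptions: the donor function, its `trace` dictionary, and the check name `"divisor != 0"` are all hypothetical, and the real system derives constraints by analyzing the donor (the program a fix is borrowed from) rather than by asking it to log its own checks.

```python
def donor_divide(a, b, trace):
    """Hypothetical donor: checks the divisor and records the outcome."""
    ok = b != 0
    trace["divisor != 0"] = ok
    if not ok:
        return None  # the donor rejects the input instead of crashing
    return a / b


def satisfied_checks(func, args):
    """Run func on args and return the names of the checks that passed."""
    trace = {}
    func(*args, trace)
    return {name for name, passed in trace.items() if passed}


def divergent_checks(func, safe_args, unsafe_args):
    """Checks that the safe input satisfies but the unsafe input does not."""
    return satisfied_checks(func, safe_args) - satisfied_checks(func, unsafe_args)


candidates = divergent_checks(donor_divide, (10, 2), (10, 0))
```

In this sketch `candidates` contains the single check `"divisor != 0"`, which is the condition that would need to be carried over to the buggy program.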

Having identified this, CodePhage takes the checks in the donor that distinguish the safe input from the unsafe one and translates them into the programming language of the recipient. The system then tries inserting the newly translated code at different points in the recipient's code until the unsafe input is handled correctly (and the software still behaves as expected when re-run against its test suite).
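The try-and-validate loop can be illustrated with a small Python sketch. It is only an analogy: each candidate insertion is modelled as a wrapper function rather than as code inserted into the program, and every name here (`recipient`, `guard_dividend`, `guard_divisor`, `first_valid_patch`, the sample test suite) is an assumption made for illustration.

```python
def recipient(a, b):
    """Buggy program: no check on the divisor."""
    return a / b


def guard_dividend(func):
    """Candidate 1: the check lands on the wrong variable."""
    def patched(a, b):
        if a == 0:
            return None
        return func(a, b)
    return patched


def guard_divisor(func):
    """Candidate 2: the check lands on the divisor."""
    def patched(a, b):
        if b == 0:
            return None
        return func(a, b)
    return patched


def handles(func, args) -> bool:
    """True if func runs on args without a division-by-zero crash."""
    try:
        func(*args)
        return True
    except ZeroDivisionError:
        return False


def passes_regression(func, suite) -> bool:
    """True if func still returns the expected result for every test case."""
    return all(handles(func, args) and func(*args) == expected
               for args, expected in suite)


def first_valid_patch(func, candidates, unsafe_args, suite):
    """Try candidates in order; keep the first that fixes the unsafe
    input without breaking the regression suite."""
    for make_patch in candidates:
        patched = make_patch(func)
        if handles(patched, unsafe_args) and passes_regression(patched, suite):
            return patched
    return None


suite = [((10.0, 2.0), 5.0), ((0.0, 4.0), 0.0)]
fixed = first_valid_patch(recipient, [guard_dividend, guard_divisor],
                          (10.0, 0.0), suite)
```

The first candidate is rejected because the unsafe input still crashes; the second is accepted because the unsafe input is now rejected cleanly and both regression cases still produce their expected results.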

Professor Martin Rinard, a member of the development team, said: "The long-term vision of the project is that you never need to write a piece of code that somebody else has written before. The system will find the code you need and automatically put it together with the other pieces you need to make your program work as desired." He said that the system could reduce the time and effort programmers spend on checking input data. Instead, they could use it to automatically transfer more robust input checks from a huge pool of software, including proprietary applications with closed source code.

According to the research team, security-check code can account for 80% of the code in modern commercial software, so the time and effort the system could save is significant. The current system is limited to analyzing checks on variables, but the team says the same technique could also be used to track, extract, and insert any other functional code, as long as the system can determine the exact values assigned to the variables in the donor software.

In addition, the MIT system can transfer checks between different versions of the same application, which helps when releasing patches and updates as errors arise. When testing on 7 open source programs, the team found that CodePhage could patch the holes in every case and took only about 10 minutes to fix an error. In future versions, the group hopes to reduce this time further. If you are interested in this system, you can read its full report at the following free link (link).