Data-driven technologies and machine learning are among the latest and most promising approaches in corrosion science for guiding the discovery and design of more effective and environmentally benign corrosion inhibitors and protective coating systems3,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25. However, one of the main challenges in applying machine learning to understand and design protective systems is building the datasets required to train the predictive models26,27. The collection of experimental data, together with data management and curation, is among the most time-consuming tasks in the machine learning workflow. Therefore, a web application like the one presented herein fulfills two main purposes: (1) scientists and engineers working in academia and industry can use it to quickly compare the performance of different corrosion inhibitors and select the most appropriate condition-specific corrosion inhibitor for each application; and (2) it provides a framework for organizing curated datasets for different substrates, which will stimulate further machine learning and data-driven developments in the design of corrosion inhibitors.
A general view of the CORDATA application is shown in Fig. 1, and the application can be accessed free of charge at the following URL: https://datacor.shinyapps.io/cordata/. The web application was designed to work on personal computers, tablets and mobile phones, and includes several functionalities (Fig. 2): (1) searching by application conditions, such as the type of metal or alloy, the possible synergistic combination of inhibitors, the minimum efficiency, a range of temperature and pH, and a minimum aggressive salt concentration; (2) quickly checking the inhibitor structure and the reference from which its corrosion inhibition efficiency was obtained; (3) searching for specific corrosion inhibitors through an internal search engine; (4) selecting and comparing other properties and aspects of the data, such as the molecular weight, SMILES notation, measurement time, corrosion inhibitor concentration, synergistic inhibitor concentration, experimental methodology, literature reference, and the name and institution of the contributor who added each data entry; and (5) a user interface with detailed instructions through which users can submit additional data, request the whole dataset or provide feedback. A spreadsheet template file can be downloaded for users to enter their own data, and the whole updated dataset will be made available to contributors for use in their own machine learning and data-driven research.
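The condition-based search in item (1) amounts to a conjunctive filter over the data entries. The following minimal Python sketch illustrates the idea. It is not the application's own code: the record fields, the function name `select_inhibitors`, the units and the example values are all hypothetical placeholders chosen for illustration, and they are not taken from the CORDATA database.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Record:
    """One hypothetical data entry; field names and units are assumptions."""
    inhibitor: str
    metal: str
    efficiency_pct: float   # corrosion inhibition efficiency, %
    temperature_c: float    # test temperature, degrees Celsius
    ph: float
    salt_conc_m: float      # aggressive salt concentration, mol/L


def select_inhibitors(
    records: Iterable[Record],
    metal: str,
    min_efficiency: float,
    temp_range: Tuple[float, float],
    ph_range: Tuple[float, float],
    min_salt_conc: float,
) -> List[Record]:
    """Keep entries that satisfy every condition, best efficiency first."""
    t_lo, t_hi = temp_range
    ph_lo, ph_hi = ph_range
    hits = [
        r for r in records
        if r.metal == metal
        and r.efficiency_pct >= min_efficiency
        and t_lo <= r.temperature_c <= t_hi
        and ph_lo <= r.ph <= ph_hi
        and r.salt_conc_m >= min_salt_conc
    ]
    return sorted(hits, key=lambda r: r.efficiency_pct, reverse=True)


# Placeholder entries with invented values, used only to exercise the filter.
PLACEHOLDER_RECORDS = [
    Record("compound_A", "Al", 92.0, 25.0, 7.0, 0.05),
    Record("compound_B", "Al", 60.0, 25.0, 7.0, 0.05),
    Record("compound_C", "Cu", 95.0, 25.0, 7.0, 0.05),
    Record("compound_D", "Al", 97.0, 60.0, 3.0, 0.50),
    Record("compound_E", "Al", 85.0, 20.0, 6.5, 0.10),
]

selected = select_inhibitors(
    PLACEHOLDER_RECORDS,
    metal="Al",
    min_efficiency=80.0,
    temp_range=(15.0, 30.0),
    ph_range=(6.0, 8.0),
    min_salt_conc=0.05,
)
for r in selected:
    print(r.inhibitor, r.efficiency_pct)
```

Sorting the matches by efficiency mirrors the stated goal of quickly comparing inhibitor performance under given conditions. A dataset requested from the application could be loaded into such records, provided the hypothetical field names are mapped onto the columns of the actual spreadsheet template.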
At the time of this publication, nearly five thousand corrosion inhibition efficiencies and almost four hundred compounds have already been added to the database. The data originate from more than one hundred and twenty publications, mainly on aluminum, copper, magnesium, iron and their main alloys. More specific information about the data included in the database can be found in Table 1.
The number of efficiency values and compounds is already sufficient to find efficient corrosion inhibitors for a broad range of application cases and conditions; the application is therefore expected to be immediately helpful to corrosion scientists and engineers working on the design of more efficient corrosion protection systems. Nevertheless, the data currently included in the application are still only a small part of the information available in the literature. The dataset will grow over the years as more data are added by the authors and by other research groups that see value in contributing to the database, and as the web application gains traction in the corrosion science community.