Adobe has released an open-source tool designed to identify randomly generated strings in plain text.
Dubbed Stringlifier, the tool was written in Python and leverages machine learning to differentiate random character sequences from normal text sequences.
The open-source project should prove helpful when analyzing security and application logs, or when attempting to discover credentials that might have been accidentally exposed.
Whether the target is hashes, API keys, randomly generated passwords, or other types of random strings in source code, logs, or configuration files, Stringlifier is designed to make them easy to find.
The source code for Stringlifier has been published in Adobe’s public GitHub repository, and the software giant has also made the tool available as a “pip” (Python package installer) package that includes a pre-trained model.
Adobe’s team had tried various approaches to pre-process long strings and convert them into numerical form, but these approaches hit a roadblock when they encountered random strings, which disrupted the clustering algorithm.
By replacing all random character sequences with <RANDOM_STRING>, the team was able to group similar types of command lines more easily, even if they employed random hashes in their parameters.
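To illustrate why this kind of masking helps with grouping, here is a minimal Python sketch. It does not use Stringlifier or its machine learning model; it assumes a much cruder stand-in heuristic (token length plus Shannon entropy), and the function names, thresholds, and sample command lines are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

PLACEHOLDER = "<RANDOM_STRING>"

# Candidate tokens: runs of letters, digits, and hyphens (an assumption made
# for this sketch; it keeps UUID-style values together as one token).
TOKEN_RE = re.compile(r"[A-Za-z0-9\-]+")


def shannon_entropy(token: str) -> float:
    """Return the Shannon entropy of the token's characters, in bits."""
    counts = Counter(token)
    total = len(token)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def looks_random(token: str, min_length: int = 16, min_entropy: float = 3.0) -> bool:
    """Crude heuristic, NOT Stringlifier's model: flag long tokens that mix
    letters and digits and have high character entropy."""
    if len(token) < min_length:
        return False
    has_digit = any(c.isdigit() for c in token)
    has_alpha = any(c.isalpha() for c in token)
    return has_digit and has_alpha and shannon_entropy(token) >= min_entropy


def mask_random_strings(text: str) -> str:
    """Replace every token that looks random with the placeholder."""
    return TOKEN_RE.sub(
        lambda m: PLACEHOLDER if looks_random(m.group(0)) else m.group(0),
        text,
    )


def group_by_masked_form(lines):
    """Group lines that become identical once random tokens are masked."""
    groups = defaultdict(list)
    for line in lines:
        groups[mask_random_strings(line)].append(line)
    return dict(groups)


# Hypothetical command lines that differ only in a random identifier.
commands = [
    "backup --job-id 45172425-08d1-41ec-9d13-437481803412 --verbose",
    "backup --job-id 134f3436-3485-4ad1-8c6a-4e9ea5be5ecd --verbose",
    "cleanup --dry-run",
]
groups = group_by_masked_form(commands)
for masked, members in groups.items():
    print(len(members), masked)
```

In this sketch, the two backup commands collapse into a single group once their identifiers are masked, while the cleanup command is left untouched. A trained model such as the one Adobe describes would be expected to handle far more varied inputs than this fixed threshold can.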
“We hope you find stringlifier useful. The entire source-code is available in Adobe’s GitHub repository. You can also find all of our other open source projects from across Adobe’s security teams in that repository. We look forward to getting feedback and contributions are always welcome,” Adobe notes.
The company also provides information on how to get started with Stringlifier, as well as on how users can train their own models to detect different types of strings.