In this confidential project, we faced an unusual challenge involving one of our business partners. Although we were providing services to this partner, they retained control over the data on their servers. We needed access to that data for system monitoring and diagnostics, but because of the high-security nature of their operations they denied our request and offered only limited access through a VPS for data analysis.
Over time, however, the restrictions on the VPS tightened, contrary to the initial agreement, until all we could do was take screenshots of the computer's display to gather basic statistics.
The Turning Point:
The partner had previously provided us with a software tool for business management purposes. Upon analysis, we discovered that the software had access to the data we needed. By reverse engineering the application, we identified the web APIs it used to retrieve that data. The only obstacle was the authentication process, which included a Captcha mechanism, something rarely seen in Windows Forms applications.
Our Approach to Solving the Captcha:
- Dataset Creation: We started by collecting and creating a dataset of Captcha images similar to those displayed by the software. This formed the foundation for training our model.
- Model Development: Using TensorFlow and Keras, we developed and trained a machine learning model to solve the Captcha. The dataset allowed the model to learn the patterns and recognize the characters within the Captcha images.
- Model Testing: After training, we tested the model, achieving a 60% accuracy rate, which was sufficient for our purposes. This allowed us to bypass the Captcha during login attempts with an acceptable success rate.
- Automation and Data Extraction: We then built a bot that used the Captcha-solving model to authenticate and log in to the system, and that extracted the necessary data through the web APIs we had uncovered.
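To illustrate why a 60% per-image accuracy can be workable, the short sketch below computes how quickly the chance of at least one correct solve grows with repeated attempts. It rests on two assumptions that the write-up does not state: that each attempt is independent of the others, and that a failed attempt can simply be repeated. The function names `success_within` and `expected_attempts` are hypothetical and exist only for this illustration.

```python
def success_within(attempts: int, accuracy: float = 0.60) -> float:
    """Probability of at least one correct solve in `attempts` tries.

    Assumes every try is independent and succeeds with probability
    `accuracy` (0.60 is the figure reported in the model testing step).
    """
    if attempts < 0:
        raise ValueError("attempts must be non-negative")
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    # All tries fail with probability (1 - accuracy) ** attempts.
    return 1.0 - (1.0 - accuracy) ** attempts


def expected_attempts(accuracy: float = 0.60) -> float:
    """Mean number of tries until the first success (geometric distribution)."""
    if not 0.0 < accuracy <= 1.0:
        raise ValueError("accuracy must be in (0, 1]")
    return 1.0 / accuracy


if __name__ == "__main__":
    for n in range(1, 6):
        print(f"{n} attempt(s): {success_within(n):.1%}")
    print(f"expected attempts: {expected_attempts():.2f}")
```

Under these assumptions, three attempts already give a success probability of about 93.6%, and on average fewer than two attempts are needed. If attempts are not independent, or if retries are limited, these figures would not hold.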
Outcome:
Despite the security limitations imposed by our partner, we successfully gained access to the data we required through reverse engineering and Captcha cracking. By developing a custom machine learning model, we bypassed the restrictive Captcha authentication and automated the data extraction process, ensuring our ability to monitor and diagnose our systems effectively.