
PROJECT
/2016/ARCHIVED/Leiurus
AI-powered vulnerability scanner that fingerprints pages and prioritizes attack vectors.
Leiurus is a #vulnerability-assessment tool that goes after the main waste in traditional #DAST (Dynamic Application Security Testing): running every test on every page. Instead, it uses #machine-learning to fingerprint a page and predict which vulnerabilities it is most likely to have. It is the working implementation of our paper Towards Prioritizing Vulnerability Testing, and it tests that paper's idea: pages with a similar structure tend to share the same kinds of vulnerabilities.
It is written in #python as a modular CLI. It covers the whole assessment, from scraping and vectorization to model training and reporting.
System Architecture
The system is a linear pipeline that turns raw URLs into confirmed findings. Input flows from the user through a feature extraction engine, into a #machine-learning model, and finally into an attack engine that checks the predictions.
Logical Workflow
The core logic sits in the ApplicationIdentifier. It turns a raw URL into a feature vector, predicts the vulnerability class, and hands the result to the Attack Engine to verify.
Feature Extraction Engine
The most complex part is the Feature Extractor. I used a Factory Pattern here so it is easy to extend. The FeatureManager does not need to know how a feature is extracted; it reads a config file and instantiates the right class at runtime.
Configuration Layer
Features are defined in simple INI files, not hardcoded. This allows users to add new detection signatures without touching the code.
ini
# Example Feature Configuration (features.ini)
[identifier_header_pingback_xmlrpc]
feature_name = header_contains
description = X-Pingback contains xmlrpc.php
key = X-Pingback
value = xmlrpc.php
path = /
active = 1
order = 1
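To make the config-driven factory concrete, here is a minimal sketch of how such a file could be loaded and dispatched. It is not the project's actual code: the `REGISTRY` dictionary, the `load_features` helper, the constructor signature, and the sample header value are all assumptions made for illustration, and `HeaderContains` here is only a stand-in class.

```python
import configparser

FEATURES_INI = """
[identifier_header_pingback_xmlrpc]
feature_name = header_contains
description = X-Pingback contains xmlrpc.php
key = X-Pingback
value = xmlrpc.php
path = /
active = 1
order = 1
"""


class FeatureExtractor:
    """Hypothetical base class: stores one signature and the response headers."""

    def __init__(self, key, value, headers):
        self.key = key
        self.value = value
        self.headers = headers

    def perform(self):
        raise NotImplementedError


class HeaderContains(FeatureExtractor):
    def perform(self):
        # 1 if the configured header is present and contains the configured value.
        return int(self.value in self.headers.get(self.key, ""))


# Assumed lookup mechanism: maps feature_name values to extractor classes.
REGISTRY = {"header_contains": HeaderContains}


def load_features(ini_text, headers):
    """Instantiate every active feature, sorted by its 'order' field."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    sections = [parser[name] for name in parser.sections()]
    active = [s for s in sections if s.getboolean("active")]
    active.sort(key=lambda s: s.getint("order"))
    return [REGISTRY[s["feature_name"]](s["key"], s["value"], headers) for s in active]


# Hypothetical response headers from a target.
features = load_features(FEATURES_INI, {"X-Pingback": "https://example.com/xmlrpc.php"})
vector = [f.perform() for f in features]
```

With this layout, adding a signature means adding an INI section; only a new kind of check requires a new class and registry entry.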
Execution Layer
The factory reads feature_name (header_contains) and instantiates the corresponding class (HeaderContains), passing the key and value as arguments.
python
class HeaderContains(FeatureExtractor):
    def perform(self):
        # Substring match on the header value to fingerprint the backend
        if self.key in self.headers:
            if self.value in self.headers[self.key]:
                return 1
        return 0
AI Core
The AI module isn't a "black box" but a tunable component. The ModelManager currently defaults to a Multinomial Naive Bayes algorithm (MultinomialNB). It operates in two modes: Training Mode, where it learns from labeled datasets, and Prediction Mode, where it classifies new targets to prioritize attack vectors using #machine-learning.
Attack Engine
Once the AI classifies the target, the Attack Engine takes over. Unlike legacy scanners that rely on blind #fuzzing, Leiurus selects specific "Warheads" (payload lists). It doesn't just predict the type of vulnerability (e.g., SQLi), but also the most likely location (e.g., GET parameter id vs POST body vs User-Agent header).
The Attack Phase
The goal of Leiurus is not just classification but confirmation. The Attack Phase turns a prediction into actual #penetration-testing.
Vulnerability Verification Logic
Once the ModelManager returns a probability score, the system constructs a targeted Attack Strategy. This strategy dictates not just what to send, but where to send it.
- Payload Selection: Filters payloads based on the predicted backend (e.g., only PostgreSQL payloads if the fingerprint suggests Postgres).
- Target Selection: The AI identifies the most probable injection vectors. It might skip the URL parameters entirely and focus payload injection attempts solely on the X-Forwarded-For header if the model predicts an internal logging vulnerability.
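The two selection steps above can be sketched roughly as follows. Everything in this block is hypothetical: the `Payload` record, the `WARHEADS` list, the `build_strategy` function, the location labels, and the scores are illustrative names and values, not Leiurus's real data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Payload:
    text: str
    vuln: str     # vulnerability class, e.g. "sqli"
    backend: str  # e.g. "postgresql", or "any" if backend-independent


# Hypothetical payload library.
WARHEADS = [
    Payload("' OR 1=1--", "sqli", "any"),
    Payload("'; SELECT pg_sleep(5)--", "sqli", "postgresql"),
    Payload("' AND SLEEP(5)-- ", "sqli", "mysql"),
    Payload("<script>alert(1)</script>", "xss", "any"),
]


def build_strategy(vuln, backend, location_scores, max_locations=1):
    """Pick payloads for the predicted class/backend and the likeliest locations."""
    # Payload selection: keep only payloads for this class and backend.
    payloads = [
        p.text for p in WARHEADS
        if p.vuln == vuln and p.backend in (backend, "any")
    ]
    # Target selection: rank injection points by predicted likelihood.
    ranked = sorted(location_scores, key=location_scores.get, reverse=True)
    return {"payloads": payloads, "locations": ranked[:max_locations]}


# Assumed model output: SQLi on a Postgres backend, most likely via a header.
strategy = build_strategy(
    "sqli",
    "postgresql",
    {"get:id": 0.15, "post:body": 0.05, "header:X-Forwarded-For": 0.80},
)
```

In this example the MySQL and XSS payloads are filtered out, and only the X-Forwarded-For header is targeted because it has the highest score.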
Key Capabilities
The tool leans on automation instead of manual guess-and-check.
Fingerprinting
The system converts a webpage into a numerical fingerprint (vector) based on structural features (URL patterns), response features (HTTP status codes), and content features (keywords in the HTML body). This creates a unique signature for every page type in an application.
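As a rough illustration of the vectorization step, the sketch below builds a small binary fingerprint from the three feature families named above. The specific features, their order, and the `fingerprint` function name are assumptions chosen for the example; the real feature set comes from the INI configuration.

```python
import re
from urllib.parse import urlparse


def fingerprint(url, status_code, body):
    """Turn one page into a fixed-length 0/1 vector (illustrative features only)."""
    parsed = urlparse(url)
    text = body.lower()
    return [
        int(bool(parsed.query)),                       # structural: has a query string
        int(bool(re.search(r"\.php$", parsed.path))),  # structural: path ends in .php
        int(status_code == 200),                       # response: OK
        int(300 <= status_code < 400),                 # response: redirect
        int("<form" in text),                          # content: contains a form
        int("search" in text),                         # content: mentions search
    ]


page_a = fingerprint("https://example.com/search.php?q=test", 200,
                     "<form><input name='q'>Search</form>")
page_b = fingerprint("https://example.com/find.php?term=x", 200,
                     "<FORM>search our site</FORM>")
```

Both example pages produce the same vector, which is the property the classifier relies on: structurally similar pages collapse to the same fingerprint.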
Attack Prioritization
This is the main engineering goal. Leiurus prioritizes testing based on context. If the fingerprint resembles a "Search Page," the system prioritizes Reflected XSS and SQL Injection on the query parameters. If it resembles a "Contact Form," it shifts focus to Stored XSS and SMTP Injection in the POST body.
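A minimal sketch of this reordering, using the two page types from the paragraph above. The `TEST_PLAN` table, the label strings, and the choice to still run the remaining tests afterwards (instead of skipping them) are assumptions for illustration.

```python
# Hypothetical mapping from predicted page type to (vulnerability, location) tests.
TEST_PLAN = {
    "search_page": [("reflected_xss", "query"), ("sql_injection", "query")],
    "contact_form": [("stored_xss", "post_body"), ("smtp_injection", "post_body")],
}

# Every known test, in the default (unprioritized) order.
ALL_TESTS = [test for tests in TEST_PLAN.values() for test in tests]


def prioritize(page_type):
    """Return all tests, with those tied to the predicted page type first."""
    first = TEST_PLAN.get(page_type, [])
    rest = [t for t in ALL_TESTS if t not in first]
    return first + rest


queue = prioritize("contact_form")
```

An unknown page type falls back to the default order, so a wrong or missing prediction changes only the order of the tests, not which tests run.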
Human-in-the-Loop Training
The system gets smarter over time. It supports an interactive training mode where users can correct the AI's predictions or feed it new datasets via the CLI. By using the train command, operators can associate specific feature vectors with confirmed vulnerabilities, effectively teaching the system to recognize new or custom attack surfaces.
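The sketch below shows why this kind of correction is cheap with a Naive Bayes model: the model is just a set of counts, so one confirmed finding can be added without retraining from scratch. This is a hand-rolled, dependency-free stand-in for a multinomial Naive Bayes classifier, not the project's ModelManager and not the scikit-learn class; the class name, feature layout, and labels are assumptions.

```python
import math
from collections import defaultdict


class TinyMultinomialNB:
    """Minimal multinomial Naive Bayes with Laplace smoothing (illustrative)."""

    def __init__(self, n_features, alpha=1.0):
        self.n_features = n_features
        self.alpha = alpha
        self.class_counts = defaultdict(int)  # examples seen per label
        self.feature_counts = defaultdict(lambda: [0] * n_features)  # per-label totals

    def train(self, vector, label):
        """Add one labeled example; counts are additive, so training is incremental."""
        self.class_counts[label] += 1
        totals = self.feature_counts[label]
        for i, count in enumerate(vector):
            totals[i] += count

    def predict(self, vector):
        """Return labels ranked from most to least likely."""
        total_examples = sum(self.class_counts.values())
        scores = {}
        for label, seen in self.class_counts.items():
            totals = self.feature_counts[label]
            denom = sum(totals) + self.alpha * self.n_features
            score = math.log(seen / total_examples)  # log prior
            for i, count in enumerate(vector):
                score += count * math.log((totals[i] + self.alpha) / denom)
            scores[label] = score
        return sorted(scores, key=scores.get, reverse=True)


# Assumed feature layout: [has_query, has_form, has_search_keyword, has_email_field]
model = TinyMultinomialNB(n_features=4)
model.train([1, 0, 1, 0], "sqli")
model.train([0, 1, 0, 1], "smtp_injection")

custom_page = [1, 1, 1, 0]
before = model.predict(custom_page)

# The operator confirms a different finding on this page and feeds it back.
model.train(custom_page, "stored_xss")
after = model.predict(custom_page)
```

Before the correction the model ranks `sqli` first for the custom page; after one confirmed example it ranks `stored_xss` first.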