Architecture Patterns and SOLID Principles ========================================== Dexray Insight has undergone significant architectural refactoring to implement SOLID principles and modern design patterns. This document describes the architectural improvements, design patterns used, and the benefits they provide. SOLID Principles Implementation ------------------------------- The framework now strictly adheres to SOLID principles throughout its architecture: Single Responsibility Principle (SRP) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ **Before**: Massive methods with multiple responsibilities * ``analyze_apk()`` method: 544 lines handling everything from setup to result aggregation * ``_assess_crypto_keys_exposure()`` method: 942 lines handling string collection, pattern detection, and result formatting * ``_create_full_results()`` method: 211 lines handling all result mapping and object creation **After**: Focused methods with single responsibilities .. code-block:: python # AnalysisEngine refactored into focused methods def analyze_apk(self, apk_path: str, ...) -> FullAnalysisResults: """Orchestrate analysis workflow (82 lines)""" context = self._setup_analysis_context(apk_path, androguard_obj, timestamp) tool_results = self._execute_external_tools(context) module_results = self._execute_analysis_modules(context, requested_modules) security_results = self._perform_security_assessment(context, module_results) return self._create_full_results(module_results, tool_results, security_results, context) **Benefits**: * Each method has a clear, single purpose * Easier to test individual responsibilities * Improved maintainability and debugging * Better code readability and understanding Open/Closed Principle (OCP) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ **Implementation**: Strategy Pattern for extensible secret detection .. code-block:: python # New strategies can be added without modifying existing code class CustomDetectionStrategy: def detect_secrets(self, strings_with_location): # Custom detection logic pass # Usage in SensitiveDataAssessment def _assess_crypto_keys_exposure(self, analysis_results): pattern_detector = PatternDetectionStrategy(self.detection_patterns, self.logger) # Could be replaced with CustomDetectionStrategy without changing this method detected_secrets = pattern_detector.detect_secrets(enhanced_strings) **Benefits**: * New detection strategies can be added without modifying existing detection logic * Different strategies can be swapped based on configuration or requirements * Extensible architecture supports future enhancements Liskov Substitution Principle (LSP) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ **Implementation**: All strategy classes implement consistent interfaces .. code-block:: python # All detection strategies can be substituted for each other class BaseDetectionStrategy(ABC): @abstractmethod def detect_secrets(self, strings_with_location) -> List[Dict[str, Any]]: pass class PatternDetectionStrategy(BaseDetectionStrategy): def detect_secrets(self, strings_with_location) -> List[Dict[str, Any]]: # Pattern-based detection implementation class MLDetectionStrategy(BaseDetectionStrategy): # Future extension def detect_secrets(self, strings_with_location) -> List[Dict[str, Any]]: # Machine learning-based detection implementation Interface Segregation Principle (ISP) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ **Implementation**: Focused interfaces for specific responsibilities .. code-block:: python # Separate interfaces for different aspects class StringCollector(ABC): @abstractmethod def collect_strings(self, analysis_results) -> List[Dict[str, Any]]: pass class SecretDetector(ABC): @abstractmethod def detect_secrets(self, strings) -> List[Dict[str, Any]]: pass class ResultClassifier(ABC): @abstractmethod def classify_by_severity(self, secrets) -> Dict[str, Any]: pass Dependency Inversion Principle (DIP) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ **Implementation**: Dependencies on abstractions, not concrete implementations .. code-block:: python class SensitiveDataAssessment: def __init__(self, config: Dict[str, Any]): # Depends on abstractions (strategies), not concrete implementations self.string_collector = StringCollectionStrategy(self.logger) self.deep_analyzer = DeepAnalysisStrategy(self.logger) self.pattern_detector = PatternDetectionStrategy(self.detection_patterns, self.logger) # These could be injected as dependencies for better testability Strategy Pattern Implementation ------------------------------- The secret detection system has been refactored using the Strategy Pattern to separate concerns and improve maintainability. Strategy Pattern Overview ~~~~~~~~~~~~~~~~~~~~~~~~~~ The Strategy Pattern allows selecting algorithms at runtime and makes the code more flexible and testable. .. code-block:: python # Strategy Pattern workflow in secret detection def _assess_crypto_keys_exposure(self, analysis_results: Dict[str, Any]) -> List[SecurityFinding]: # Strategy 1: String Collection string_collector = StringCollectionStrategy(self.logger) all_strings = string_collector.collect_strings(analysis_results) # Strategy 2: Deep Analysis Enhancement deep_analyzer = DeepAnalysisStrategy(self.logger) enhanced_strings = deep_analyzer.extract_deep_strings(analysis_results, all_strings) # Strategy 3: Pattern Detection pattern_detector = PatternDetectionStrategy(self.detection_patterns, self.logger) detected_secrets = pattern_detector.detect_secrets(enhanced_strings) # Strategy 4: Result Classification result_classifier = ResultClassificationStrategy() classified_results = result_classifier.classify_by_severity(detected_secrets) # Strategy 5: Finding Generation finding_generator = FindingGenerationStrategy(self.owasp_category) return finding_generator.generate_security_findings(classified_results) StringCollectionStrategy ~~~~~~~~~~~~~~~~~~~~~~~~ **Responsibility**: Collect strings from various analysis sources with location metadata .. code-block:: python class StringCollectionStrategy: def collect_strings(self, analysis_results: Dict[str, Any]) -> List[Dict[str, Any]]: """ Systematically extract strings from multiple sources: - String analysis module results - Android properties and system configuration - Raw strings from DEX analysis Returns list of dictionaries with 'value', 'location', 'file_path', 'line_number' """ **Key Features**: * Handles multiple string sources (analysis results, Android properties, raw strings) * Adds location metadata for traceability * Graceful handling of missing or malformed data * Supports both object-based and dictionary-based string analysis results DeepAnalysisStrategy ~~~~~~~~~~~~~~~~~~~~ **Responsibility**: Extract additional strings from deep analysis artifacts (XML, Smali, DEX) .. code-block:: python class DeepAnalysisStrategy: def extract_deep_strings(self, analysis_results: Dict[str, Any], existing_strings: List[Dict[str, Any]]) -> List[Dict[str, Any]]: """ Enhance string collection with deep analysis sources: - DEX object string extraction using Androguard - XML resource file string extraction - Smali code string extraction Only operates in 'deep' analysis mode for performance """ **Analysis Modes**: * **DEEP mode**: Full string extraction from DEX, XML, and Smali sources * **FAST mode**: Returns existing strings unchanged (performance optimization) **Benefits**: * Significantly increased string coverage for secret detection * Performance-aware operation based on analysis mode * Comprehensive error handling and logging PatternDetectionStrategy ~~~~~~~~~~~~~~~~~~~~~~~~ **Responsibility**: Apply 54 different secret detection patterns to collected strings .. code-block:: python class PatternDetectionStrategy: def detect_secrets(self, strings_with_location: List[Dict[str, Any]]) -> List[Dict[str, Any]]: """ Apply comprehensive pattern matching for secret detection: - 11 CRITICAL patterns (private keys, AWS credentials, etc.) - 22 HIGH patterns (API keys, JWT tokens, service credentials) - 13 MEDIUM patterns (database URIs, SSH keys, etc.) - 8 LOW patterns (S3 URLs, high-entropy strings, etc.) """ **Detection Categories**: * **CRITICAL**: Private keys, AWS credentials, GitHub tokens * **HIGH**: API keys, JWT tokens, service-specific credentials * **MEDIUM**: Database connection strings, SSH public keys * **LOW**: Service URLs, base64 strings, high-entropy data ResultClassificationStrategy ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ **Responsibility**: Organize detected secrets by severity and prepare output formats .. code-block:: python class ResultClassificationStrategy: def classify_by_severity(self, detected_secrets: List[Dict[str, Any]]) -> Dict[str, Any]: """ Create two output formats: - Terminal display format with emojis and location info - Structured evidence entries for JSON export and detailed analysis """ **Output Structure**: * **findings**: Terminal-friendly display strings with emojis * **secrets**: Structured evidence entries with full metadata FindingGenerationStrategy ~~~~~~~~~~~~~~~~~~~~~~~~~~ **Responsibility**: Generate final SecurityFinding objects with remediation guidance .. code-block:: python class FindingGenerationStrategy: def generate_security_findings(self, classified_results: Dict[str, Any]) -> List[SecurityFinding]: """ Create SecurityFinding objects with: - Secret-finder style messaging with emojis - Comprehensive remediation steps - Evidence limited to prevent overwhelming output - Severity-appropriate recommendations """ **Finding Features**: * **Secret-finder style titles**: "🔴 CRITICAL: 2 Hard-coded Secrets Found" * **Detailed remediation steps**: 3-5 actionable steps per finding * **Evidence limitation**: 10-20 items max to prevent information overload * **OWASP categorization**: Proper mapping to A02:2021-Cryptographic Failures Refactored AnalysisEngine Architecture -------------------------------------- The AnalysisEngine has been refactored from monolithic methods to a clean, focused architecture. Result Building Architecture ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ **Before**: Single massive method handling all result creation **After**: Focused builder methods with clear responsibilities .. code-block:: python def _create_full_results(self, module_results, tool_results, security_results, context): """Orchestrate result creation using focused builder methods (32 lines)""" apk_overview = self._build_apk_overview(module_results) in_depth_analysis = self._build_in_depth_analysis(module_results, context) apkid_results, kavanoz_results = self._build_tool_results(tool_results) # Assemble final results object full_results = FullAnalysisResults() # ... populate results return full_results **Builder Methods**: ``_build_apk_overview(module_results)`` ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ * **Responsibility**: Create APK overview object from module results * **Size**: 26 lines (was part of 211-line method) * **Features**: Fallback to manifest analysis if APK overview failed ``_build_in_depth_analysis(module_results, context)`` ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ * **Responsibility**: Create in-depth analysis object using mapping methods * **Size**: 15 lines * **Delegates to**: 7 specialized mapping methods ``_build_tool_results(tool_results)`` ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ * **Responsibility**: Create external tool result objects * **Size**: 22 lines * **Handles**: APKID and Kavanoz results with success/failure handling Mapping Architecture ~~~~~~~~~~~~~~~~~~~~ Specialized mapping methods handle specific result types: .. code-block:: python # Each mapping method has a single responsibility def _map_manifest_results(self, in_depth_analysis, module_results): """Map manifest analysis results to in-depth analysis structure""" def _map_permission_results(self, in_depth_analysis, module_results): """Map permission analysis results to in-depth analysis structure""" def _map_string_results(self, in_depth_analysis, module_results, context): """Map string analysis results with fallback support""" def _map_library_results(self, in_depth_analysis, module_results): """Map library detection results to in-depth analysis structure""" **String Analysis with Fallback**: .. code-block:: python def _map_string_results(self, in_depth_analysis, module_results, context): """Handle string results with built-in fallback logic""" string_result = module_results.get('string_analysis') if string_result and string_result.status.value == 'success': self._apply_successful_string_results(in_depth_analysis, string_result) else: # Resilient fallback using legacy string extraction self._apply_string_analysis_fallback(in_depth_analysis, context) Benefits of New Architecture ---------------------------- Maintainability Improvements ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ **Before**: - Methods with 200+ lines were difficult to understand and modify - Mixed responsibilities made changes risky - Testing required complex setup for entire workflows **After**: - Focused methods (5-25 lines) are easy to understand and modify - Single responsibilities make changes safer and more predictable - Individual methods can be tested in isolation .. code-block:: python # Easy to test individual responsibilities def test_string_collection_strategy(): strategy = StringCollectionStrategy(mock_logger) result = strategy.collect_strings(mock_analysis_results) assert len(result) > 0 assert all('value' in item for item in result) Performance Improvements ~~~~~~~~~~~~~~~~~~~~~~~~ **Parallel Execution**: Smaller methods enable better parallelization .. code-block:: python # Methods can be executed in parallel when dependencies allow with ThreadPoolExecutor() as executor: apk_future = executor.submit(self._build_apk_overview, module_results) tool_future = executor.submit(self._build_tool_results, tool_results) apk_overview = apk_future.result() apkid_results, kavanoz_results = tool_future.result() **Strategy Pattern Benefits**: Different strategies can be optimized independently .. code-block:: python # Fast strategy for basic analysis if analysis_mode == 'fast': pattern_detector = FastPatternDetectionStrategy(basic_patterns, logger) # Comprehensive strategy for deep analysis else: pattern_detector = PatternDetectionStrategy(all_patterns, logger) Extensibility Improvements ~~~~~~~~~~~~~~~~~~~~~~~~~~ **New Strategies**: Easy to add new detection strategies .. code-block:: python # Add machine learning-based detection without changing existing code class MLSecretDetectionStrategy: def detect_secrets(self, strings_with_location): return self.ml_model.predict_secrets(strings_with_location) **New Result Builders**: Easy to add new result types .. code-block:: python # Add new result builder for custom analysis types def _build_custom_results(self, module_results): """Build custom analysis results""" custom_result = module_results.get('custom_analysis') if custom_result and custom_result.status.value == 'success': return CustomResults(data=custom_result.findings) return CustomResults() Testing Improvements ~~~~~~~~~~~~~~~~~~~~ **Unit Testing**: Individual methods can be tested in isolation .. code-block:: python class TestStringCollectionStrategy: def test_collect_strings_from_string_analysis(self): # Test specific responsibility without complex setup strategy = StringCollectionStrategy(mock_logger) result = strategy.collect_strings(mock_analysis_results) # Focused assertions on single responsibility **Integration Testing**: Strategy coordination can be tested separately .. code-block:: python class TestSecretDetectionWorkflow: def test_complete_strategy_workflow_integration(self): # Test strategy coordination without implementation details assessment = SensitiveDataAssessment(config) findings = assessment._assess_crypto_keys_exposure(mock_results) assert isinstance(findings, list) Error Handling Improvements ~~~~~~~~~~~~~~~~~~~~~~~~~~~ **Isolated Failures**: Problems in one strategy don't affect others .. code-block:: python def _assess_crypto_keys_exposure(self, analysis_results): try: all_strings = string_collector.collect_strings(analysis_results) except Exception as e: self.logger.error(f"String collection failed: {e}") all_strings = [] # Continue with empty strings try: enhanced_strings = deep_analyzer.extract_deep_strings(analysis_results, all_strings) except Exception as e: self.logger.error(f"Deep analysis failed: {e}") enhanced_strings = all_strings # Fall back to basic strings **Graceful Degradation**: System continues to work even if some components fail Migration Guide --------------- For developers working with the refactored code: Accessing Refactored Methods ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ **Old approach** (calling massive methods directly): - Direct access to monolithic methods was discouraged **New approach** (using focused public interfaces): .. code-block:: python # AnalysisEngine public interface remains the same engine = AnalysisEngine(config) results = engine.analyze_apk(apk_path) # Same as before # Internal methods are now focused and testable # (but still internal - use public interface) Working with Strategy Pattern ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ **For security assessment customization**: .. code-block:: python # Custom strategy implementation class CustomDetectionStrategy(PatternDetectionStrategy): def detect_secrets(self, strings_with_location): # Custom detection logic custom_secrets = self._apply_custom_patterns(strings_with_location) base_secrets = super().detect_secrets(strings_with_location) return custom_secrets + base_secrets # Use in configuration assessment = SensitiveDataAssessment(config) # Could be extended to accept strategy injection Testing Patterns ~~~~~~~~~~~~~~~~~ **New testing patterns** for focused methods: .. code-block:: python # Test individual strategies def test_pattern_detection_strategy(): patterns = load_test_patterns() strategy = PatternDetectionStrategy(patterns, mock_logger) test_strings = [ {'value': 'sk_test_12345', 'location': 'test.java', 'file_path': None, 'line_number': None} ] results = strategy.detect_secrets(test_strings) assert len(results) == 1 assert results[0]['severity'] == 'HIGH' **Integration testing** for strategy coordination: .. code-block:: python # Test complete workflow def test_security_assessment_integration(): config = load_test_config() assessment = SensitiveDataAssessment(config) mock_results = create_mock_analysis_results() findings = assessment._assess_crypto_keys_exposure(mock_results) assert isinstance(findings, list) # Test workflow coordination without testing implementation details This architectural refactoring provides a solid foundation for future enhancements while maintaining backward compatibility and improving code quality across all SOLID principles.