Cybersecurity threat analysis and prediction of High-speed Railway Signal System based on knowledge graph


Introduction


1. In recent years, the signal system of the high-speed railway has faced unprecedented security threats. Because it lacks effective prediction and warning mechanisms for Advanced Persistent Threats (APTs), this project studies cybersecurity threat analysis and defense technology for the railway signal system

p2-16


2. This project mainly supported my master’s thesis, titled "Analysis and Detection of Cyber Threat Behavior in Train Control System Based on Knowledge Graph"

  • The specific structure of my work is shown below:
p2-0


Threat Analysis


Introduction

1. How can we verify that security issues exist in rail transit? We could conduct penetration tests in a simulation environment, but how do we determine the route and goal of cyberattacks within such a complex system? The answer is to first conduct a theoretical threat analysis

Design

1. I’ve proposed a novel methodology for the coalescent analysis of safety and security in cyber-physical systems, namely Process-Oriented and Coalescent Analysis (POCA). Unlike traditional object-oriented methods, which start the analysis directly from system components or communication links, our method focuses on the specific working process of the object, i.e., it is a process-oriented analysis

  • The overall framework of this methodology is shown below:
p2-1


2. POCA consists of 2 parts, which are oriented toward functional safety and cyber security respectively. POCA achieves a coalescence of these 2 attributes by drawing the 2 parts together

  • The first part, system service process analysis, abstracts service processes into analyzable objects by referring to the STPA method, which lays the foundation for being “process-oriented”
  • The second part, system cyber threat analysis, identifies potential cyber threats from the outputs of the first part using common security analysis methods

3. I’ve written an academic paper about the work of this part for publication

  • A comparison of POCA with some previous methods is shown below:
Method | Safety | Security | System Service | Component Constraint | Threat Scenario | TTPs Analysis | Remediation
Attack Trees---
OCTAVE----
STPA---
TARA---
Extended TVRA---
Threat Profile----
STPA-Sec---
STPA-SafeSec-
POCA-

Experiment

1. I’ve applied POCA to the Temporary Speed Restriction (TSR) scenario of the train control system, and indeed identified several threat scenarios against the TSR service

  • Since this paper has not been published yet, I can only provide part of the analysis output here
p2-111


Threat Simulation


Introduction

1. This part not only verifies the usability of the POCA outputs, but also provides a dataset for subsequent research. Due to the closed nature of the railway system, only internal attacks are likely to be carried out. Common datasets that include external attacks, such as Web penetration, therefore do not meet our needs. Meanwhile, datasets that include attacks against railway systems are difficult to obtain. Thus, we have to generate our own dataset

Design

Simulation range

1. I’ve built a simulation range in my laboratory, modeled on the Signal Safety Data Network (SSDN) of the train control system (networking processes):

p2-3
Range components:
  • Network Equipment
    • The range uses switches to carry communication between the ground equipment, and a router to connect the ground equipment with the Centralized Traffic Control (CTC)
  • Data Generation Area (Environment data)
    • This area is composed of 5 servers, each running several virtual machines that simulate the ground equipment of the train control system. Since the network cards of the virtual machines are configured in "bridge mode", these servers also act as "switches"
    • The LAN of switch 1 is configured as a "domain"
  • Attacker
    • The attacker operates a Kali Linux host that is already connected to the ISDN server within the LAN of switch 3
  • Data Collection and Analysis Terminal
    • This terminal is a host with the Elasticsearch engine installed, which is responsible for collecting and processing data from the data generation area
    • The knowledge graph constructed later is also deployed on it
Range configuration:
  • Each virtual machine has software installed to simulate the services of the train control system
  • Sysmon and NXlog are installed to handle syslog generation and forwarding respectively
Device | Operating System | IP | Security Tool | Main Software
RBC active | Windows 10 | 192.168.4.203 | 360 Security, Windows Defender | RBC Simulation Software, Sysmon+NXlog
RBC standby | Windows 10 | 192.168.3.105 | Same as active | Same as active
ISDN Server active | Ubuntu 18.04 | 192.168.4.206 | ClamAV | ISDN Simulation Software, SysmonForLinux+NXlog
ISDN Server standby | Ubuntu 18.04 | 192.168.3.106 | Same as active | Same as active
TSRS active | Windows 10 | 192.168.4.200 | 360 Security, Windows Defender | TSRS Simulation Software, Sysmon+NXlog, EasyFileSharing
TSRS standby | Windows 10 | 192.168.3.103 | Same as active | Same as active
TSR Interface Server | Windows Server 2008 | 172.110.2.11 | 360 Security, Windows Defender | Sysmon+NXlog
CTC active | Windows 10 | 172.110.2.12 | 360 Security, Windows Defender | CTC Simulation Software, Sysmon+NXlog
CTC standby | Windows 10 | 172.110.2.13 | Same as active | CTC Simulation Software, Sysmon+NXlog, EasyFileSharing
Kali Linux | Kali Linux 2020 | 192.168.4.211 | - | Metasploit, MITRE CALDERA
ELK mainframe | Windows 10 | 10.10.10.230 | 360 Security, Windows Defender | Elasticsearch

Simulation attack

1. I designed a complete attack strategy against the system based on the aforementioned POCA output “threat scenario2”. It covers all 12 tactics and includes 18 techniques of MITRE ATT&CK

p2-5


2. I carried out the pre-penetration and post-penetration stages separately from the Kali host in the range, and saved all syslogs (3 days in total) as the raw dataset

Dataset example:
p2-24

Development

Log labelling

1. The highly textual information contained in logs greatly increases the workload and difficulty of analysis, so it’s necessary to preprocess the dataset: we try to add a “label” to each log that generalizes its behavior

2. A Sysmon configuration file has already helped us take the first step. It maps logs to ATT&CK techniques in the RuleName field, which can properly represent their security behavior

p2-7


3. However, this labelling is rather coarse: about 93% of logs end up labelled. In other words, its strong generalization results in poor discrimination of real attack behaviors

  • For example, all operations achieved by Powershell will be labeled as “T1059.001 Powershell”, while they can actually be divided more specifically

4. Therefore, we’ve written another set of detection rules (repository) specialized for command-line (PowerShell/cmd/terminal) inputs. It integrates 770 attack abilities from the MITRE CALDERA platform, covering 11 tactics and about 240 techniques (60%) of the ATT&CK matrix

  • Running test_in_my_case.py in the repository overwrites the RuleName of some logs with more precise “ATT&CK technique_ids”, and then exports the processed dataset from ELK as syslog.csv in one step
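The relabelling idea can be sketched as follows. This is a minimal illustration and not the code in the repository: the three regular expressions, the `relabel` helper and the sample logs are hypothetical, and only the `RuleName` and `CommandLine` field names come from Sysmon process-creation events.

```python
import re

# Hypothetical rules: (regex over the command line, ATT&CK technique ID).
# The patterns are illustrative and are not taken from the repository.
RULES = [
    (re.compile(r"\bCopy-Item\b", re.IGNORECASE), "T1074.001"),             # Local Data Staging
    (re.compile(r"\bGet-ChildItem\b.*-Recurse", re.IGNORECASE), "T1083"),   # File and Directory Discovery
    (re.compile(r"\bwhoami\b", re.IGNORECASE), "T1033"),                    # System Owner/User Discovery
]


def relabel(log):
    """Return a copy of the log whose RuleName is refined when a rule matches."""
    command_line = log.get("CommandLine", "")
    for pattern, technique_id in RULES:
        if pattern.search(command_line):
            return {**log, "RuleName": technique_id}
    return dict(log)  # no rule matched: keep the coarse Sysmon label


# Two sample logs that Sysmon alone would both label as PowerShell usage.
logs = [
    {"RuleName": "T1059.001", "CommandLine": "powershell.exe Copy-Item a.conf staged"},
    {"RuleName": "T1059.001", "CommandLine": "powershell.exe Get-Date"},
]
relabelled = [relabel(log) for log in logs]
```

Only the first sample log is overwritten; the second keeps its generic label, which mirrors the statement that the RuleName of "some logs" is refined.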


Graph construction


Introduction

1. The expanding scale of cyberspace leads to a sharp increase in the amount of security-related data, which are diverse, fragmented, and heterogeneous. The main challenge in current security analysis is not data shortage, but how to effectively combine information from multiple sources

2. Previous chapters have generated various data, such as threat modeling results, system architecture, and system logs. To utilize these outputs effectively, we need to address the multi-source heterogeneity issue. The knowledge graph (KG), with its excellent data integration, correlation, and visualization capabilities, is therefore our preferred technology

3. A knowledge graph is a large-scale semantic model composed of vertices and edges. It can intuitively model various security scenarios. This section merges all the aforementioned outputs with existing achievements to construct a cybersecurity KG of the train control system

Design

Ontology structure

1. A review article “Recent Progress of Using Knowledge Graph for Cybersecurity” provides us with a general architecture of CSKG, which consists of 4 dimensions:

p2-8


2. On the basis of this architecture, we’ve designed the following ontology structure of our CSKG of train control system. The next 4 sections will separately discuss each dimension

p2-12
Meaning of edges between dimensions:
Dimension | Relationship | Description
CTI - Knowledge data | CTI-TTP | General security CTI is described by "technique" and "tactic" in knowledge data
Behavior data - Knowledge data | Syslog-TTP | "Technique" in knowledge data is used as the label to generalize the behavior of "syslog" in behavior data
Environment data - Behavior data | Asset-Process | "Process" in behavior data can be correlated with "asset" in environment data based on IP address
Environment data - CTI | Asset-CA | "Control action" in specific railway CTI is carried out by "asset" in environment data
Environment data - CTI | Asset-Weakness | "Weakness" in specific railway CTI exists in "asset" in environment data
Meaning of edges within dimensions:
Dimension | Relationship | Description
Knowledge data | Tactic-Technique | Tactic includes multiple techniques
Knowledge data | Technique-Capec | ATT&CK techniques and CAPEC attack patterns have overlaps
Knowledge data | Technique-Technique | Parent-child relationship exists among techniques
Knowledge data | Capec-Capec | Parent-child relationship exists among attack patterns
Knowledge data | Technique-Technique_mitigation | Mitigations can reduce the impact of techniques
Knowledge data | Technique-Technique_detection | Detection methods can identify traces of techniques
Knowledge data | Capec-Cwe | Attack patterns exploit weaknesses in components
Knowledge data | Cwe-Cve | Weaknesses in components include multiple vulnerabilities
CTI | CTI-CTI | Sequential relationship exists among attack behaviors
CTI | Accident-Hazard | System hazards can cause accidents
CTI | Hazard-Service | Abnormalities in service can cause system hazards
CTI | Service-CA | Services include multiple control actions
CTI | CA-Weakness | Control actions may lead to unsafe control actions
CTI | Weakness-Weakness | Parent-child relationship exists among unsafe control actions
CTI | TS-Weakness | Threat scenarios include multiple unsafe control actions
CTI | TS-TS | Parent-child relationship exists among threat scenarios
Environment Data | Asset-Asset | Connection relationship exists among assets
Behavior Data | Syslog-Syslog | Chronological relationship exists among syslogs
Behavior Data | Process-Syslog | Processes include multiple logs
Behavior Data | Process-Process | Access relationship exists among processes
Behavior Data | Parentp-Childp | Parent-child relationship exists among processes
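A typed edge list like the one above can also be written down as a small schema and used to sanity-check edges before they are imported. The sketch below is only an illustration: the edge collection names are the ones that appear in the AQL queries on this page, while the vertex collection names "process" and "technique", the edge directions, and the `is_valid_edge` helper are assumptions.

```python
# Edge collection -> (source vertex collection, target vertex collection).
# Directions and the "process"/"technique" collection names are assumptions.
EDGE_DEFINITIONS = {
    "SyslogTTP": ("syslog", "technique"),
    "ProcessSyslog": ("process", "syslog"),
    "ParentpChildp": ("process", "process"),
    "AssetProcess": ("asset", "process"),
}


def collection_of(document_id):
    """Return the collection part of an ID such as 'asset/8'."""
    return document_id.split("/", 1)[0]


def is_valid_edge(edge_collection, from_id, to_id):
    """Check that an edge connects the vertex collections its type allows."""
    definition = EDGE_DEFINITIONS.get(edge_collection)
    if definition is None:
        return False
    return (collection_of(from_id), collection_of(to_id)) == definition


ok = is_valid_edge("AssetProcess", "asset/8", "process/42")
bad = is_valid_edge("AssetProcess", "syslog/1", "process/42")
```

A check of this kind catches edges that would silently connect the wrong dimensions, for example a syslog attached directly to an asset.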


① Knowledge data

1. Knowledge data from common cybersecurity knowledge bases, such as ATT&CK, CAPEC, CWE, CVE, MITRE Engage and D3FEND, have already been linked together by researchers from MIT into an open-source graph, “BRON”, so I imported BRON directly as the knowledge data

  • The ontology structure of BRON’s main part is shown below:
p2-9

② General security CTI

1. This dimension is represented as the “cyber threat intelligence(CTI)” ontology. At present, most common CTI exist in unstructured or semi-structured forms. In order to construct the KG of this dimension, we need to extract those CTI into structured data

2. MITRE developed an open source platform “TRAM”, which can associate the input attack procedure (left) with ATT&CK techniques and tactics (right) to help generate CTI in a structured form as “TTPs”

p2-10


3. Through the TRAM platform and manual verification, we’ve generated the general security CTI of some common attacks, which can be easily imported into the graph

  • The figure below shows an example of the extracted “file stealing” attack
p2-23

② Specific railway CTI

1. This CTI refers to the threat modeling results of target system. In this project, it is generated by POCA. The ontology structure of POCA outputs is shown below:

  • Results such as “control action”, “hazard” and “threat scenario” can be presented as ontologies
  • Results like “risk score” and “description” can be used as attributes of ontologies
arango8

2. The following is a conceptual display of entities and relationships in the specific railway CTI:

  • A small part of the content in the figure differs from the actual data
p2-2

③ Environment data

1. This dimension is represented as the “asset” ontology. It is generally based on the topology of target system and includes attributes such as OS and IP of the equipment. It not only models the physical composition of target system, but also acts as a bridge between behavior data and CTI

  • The environment data of this project has been provided here

④ Behavior data

1. The behavior data of this project is the syslog generated by Sysmon, and “behavior” is represented by the ATT&CK technique label

2. Details of this part are summarized in this repository. In short, logs with EventID = 1 (ProcessCreate) or 10 (ProcessAccess) contain information that represents 2 kinds of process relations, “parentp-childp” and “process-process” respectively. Together with the inherent “time” ordering of logs, these give the 3 major relations that form the syslog ontology structure:

p2-11
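As a rough sketch of how these 3 relations could be derived from Sysmon events: the GUID and `UtcTime` field names below are the standard Sysmon ones for events 1 and 10, while the `id` key, the helper functions and the sample events are hypothetical and are not taken from the repository.

```python
def extract_process_relations(events):
    """Split events into parentp-childp and process-process edge lists."""
    parent_child, process_access = [], []
    for event in events:
        if event["EventID"] == 1:        # ProcessCreate: parent spawns child
            parent_child.append((event["ParentProcessGuid"], event["ProcessGuid"]))
        elif event["EventID"] == 10:     # ProcessAccess: source opens target
            process_access.append((event["SourceProcessGUID"], event["TargetProcessGUID"]))
    return parent_child, process_access


def chronological_edges(events):
    """Link each log to the next one in time (syslog-syslog relation)."""
    ordered = sorted(events, key=lambda event: event["UtcTime"])
    return [(a["id"], b["id"]) for a, b in zip(ordered, ordered[1:])]


# Hypothetical sample events; "id" is an illustrative key, not a Sysmon field.
events = [
    {"id": "s1", "EventID": 1, "UtcTime": "2023-05-01 10:00:00.000",
     "ProcessGuid": "{B}", "ParentProcessGuid": "{A}"},
    {"id": "s2", "EventID": 10, "UtcTime": "2023-05-01 10:00:02.000",
     "SourceProcessGUID": "{B}", "TargetProcessGUID": "{C}"},
    {"id": "s3", "EventID": 11, "UtcTime": "2023-05-01 10:00:01.000"},
]
parent_child, process_access = extract_process_relations(events)
time_edges = chronological_edges(events)
```

Sorting on the `UtcTime` string works here because Sysmon writes timestamps in a fixed, zero-padded format.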

Development

1. As for the development, a knowledge graph can be constructed with a graph database. This project uses ArangoDB, and the basic construction process is recorded here

2. Here, I take 2 dimensions, the specific railway CTI and the environment data, as examples to display the graph that was actually constructed:

p2-13


3. Basic indicators of the constructed KG:

Indicator | Definition | Value | Explanation
Nodes | Number of nodes | $500966$ | -
Edges | Number of edges | $1685976$ | The graph is a directed graph
Isolated Nodes | Number of nodes with no edges | $22368$ | As some nodes in BRON are isolated, this graph is a disconnected graph
Network Density | Ratio of actual edges to possible edges | $6.72\times10^{-6}$ | The density is close to 0, indicating that the graph is sparse
Average Degree | The sum of degrees of all nodes divided by the number of nodes | $6.73$ | On average, each node is connected to 6.73 edges
Maximum Depth | The longest path from the root node to a leaf node | $10$ | The graph has the maximum depth when its root is the "accident" layer and its leaf is the "CVE" layer
Network Diameter | The longest shortest path between any two nodes | $14$ | Such a path is found between an "accident" node and a "CVE" node
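The density and average degree can be re-derived from the node and edge counts. The sketch below assumes the usual directed-graph density, edges / (nodes × (nodes − 1)), and a degree that counts both ends of every edge; the page does not state which formulas were used, but these reproduce the reported figures.

```python
nodes = 500_966
edges = 1_685_976

# Directed graph: every ordered pair of distinct nodes is a possible edge.
density = edges / (nodes * (nodes - 1))

# Each edge contributes one out-degree and one in-degree.
average_degree = 2 * edges / nodes

print(f"density = {density:.2e}")                 # density = 6.72e-06
print(f"average degree = {average_degree:.2f}")   # average degree = 6.73
```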


Anomaly Detection


Introduction

1. Modern cyberattacks are often concealed and highly variable, lacking obvious features or patterns, so traditional methods struggle to identify them effectively. Behavior-based anomaly detection has therefore become an important approach: it identifies potential threats by modeling system behaviors and detecting abnormal ones on this basis

Design

Detection framework

1. From a macro perspective, the project studies 2 types of behavior: abstract threat behavior derived through theoretical analysis (CTI in KG), and specific system behavior collected through practical experiments (behavior data in KG). The overall threat detection idea is “based on CTI dimension, supplemented by other dimensions, detect anomalies in behavior data dimension”

p2-14


2. Based on this idea, I’ve designed a behavior-based anomaly detection framework shown above, which defines 3 kinds of behaviors according to the threat level from low to high:

  • System device behavior
    • It is the complete set of behavior data, including 2 subsets of mid & high-level behavior
    • Due to the high proportion of labelled syslogs, it is hard to identify the abnormal data hidden in massive normal data at this level
  • Security threat behavior
    • It is detected when some system device behaviors satisfy a specific attack pattern recorded in general security CTI
    • The idea is: search for a combination of syslogs whose “ATT&CK technique labels” match the “techniques” used by the attack pattern
  • Service abnormal behavior
    • It is detected when some security threat behaviors further conform to a specific threat scenario described in specific railway CTI
    • The idea is: among the detected attack patterns, search for those that involve operations on a “service command file” exploited by the threat scenario
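The matching idea behind "security threat behavior" can be illustrated outside the graph as an ordered subsequence match over technique labels. This is a simplified stand-in for the graph traversal used in this project: the `match_attack_pattern` helper, the sample logs and the technique sequence chosen for the pattern are all hypothetical.

```python
def match_attack_pattern(logs, pattern):
    """Return IDs of logs whose labels match `pattern` in time order, else None."""
    matched = []
    step = 0
    for log in sorted(logs, key=lambda entry: entry["time"]):
        if step < len(pattern) and log["technique"] == pattern[step]:
            matched.append(log["id"])
            step += 1
    return matched if step == len(pattern) else None


# Hypothetical technique sequence for a "file stealing" pattern:
# discover files, stage them locally, then delete traces.
FILE_STEALING = ["T1083", "T1074.001", "T1070.004"]

logs = [
    {"id": "syslog/1", "time": 1, "technique": "T1059.001"},
    {"id": "syslog/2", "time": 2, "technique": "T1083"},
    {"id": "syslog/3", "time": 3, "technique": "T1059.001"},
    {"id": "syslog/4", "time": 4, "technique": "T1074.001"},
    {"id": "syslog/5", "time": 5, "technique": "T1070.004"},
]
hit = match_attack_pattern(logs, FILE_STEALING)
miss = match_attack_pattern(logs, ["T1021.002"])
```

Unrelated logs between the steps are skipped, so the pattern is still found when the attack is interleaved with normal activity.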

Detection modes

1. As for the application, this framework can perform 2 detection modes:

  • Bottom-up means the detection is from the low-level all the way up to the high-level, and directly achieves the anomaly detection
  • However, considering the complexity of cyber attacks and the incompleteness of CTI in the KG, high-level abnormal behavior usually cannot be directly mapped through the bottom-up detection. Therefore, the more flexible bi-directional detection should be widely applied
p2-15

2. The program flowchart of the following experiment, based on the detection framework, is:

p2-16


Experiment

1. Graph traversal is the technical carrier of this detection experiment. Since our KG has a relatively large depth, the Breadth-First Search (BFS) is more applicable and efficient

  • ArangoDB’s query language AQL has integrated multiple basic algorithms including BFS, so we could develop detection functions based on it
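For readers unfamiliar with depth-limited BFS traversal, the following in-memory sketch shows what such a traversal enumerates. It is not how ArangoDB implements it: the adjacency dictionary and vertex IDs are made up, and paths are kept vertex-unique as a simplification.

```python
from collections import deque


def bfs_paths(graph, start, max_depth):
    """Yield every simple path of 1..max_depth edges from start, shallowest first."""
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        if len(path) > 1:
            yield path
        if len(path) - 1 < max_depth:
            for neighbour in graph.get(path[-1], []):
                if neighbour not in path:       # avoid revisiting a vertex
                    queue.append(path + [neighbour])


# Hypothetical fragment of the graph: CTI step -> technique -> syslogs -> process.
graph = {
    "CTI/steal1": ["technique/T1083"],
    "technique/T1083": ["syslog/1", "syslog/2"],
    "syslog/1": ["process/7"],
}
paths = list(bfs_paths(graph, "CTI/steal1", 2))
```

With a depth limit of 2 the traversal stops at the syslog layer; raising the limit to 3 would also return the path that reaches "process/7".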

Security threat behavior detection (low → middle)

1. The general idea for detection at this level is:

  • Traverse within “CTI” to obtain all attack patterns’ entities (CTI)
  • Traverse to “asset” through “syslog” to find related logs for each pattern (behavior data)
  • Output the traversal path as the detection result
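// Start from the first step of the "file stealing" pattern and walk towards the related assets
// No explicit min..max depth is given: AQL then defaults to depth 1; add a range such as 1..8 to reach deeper layers
FOR vertices, edges, paths IN ANY 'CTI/steal1'
                CTICTI,
                CTITTP,
                OUTBOUND TechniqueTechnique_mitigation,
                INBOUND SyslogTTP,
                INBOUND ProcessSyslog,
                INBOUND ParentpChildp,
                INBOUND AssetProcess
    OPTIONS {bfs: true}
RETURN paths

2. After executing queries similar to the one above, 2 kinds of attack patterns were successfully detected: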

  • Lateral movement
p2-17


  • File stealing
p2-18


Service abnormal behavior detection (middle → high)

1. The basis for mapping from mid-level to high-level is the service command files. The general idea for detection at this level is:

  • Based on the traversal result of security threat behavior detection, set filter conditions for the specific command file to continue traversing upwards

2. The detected “lateral movement” does not involve any command file. For “file stealing”, the command-line input of “syslog/23647” (corresponding to the 3rd step of this attack pattern) shows that Copy-Item was used to copy (steal) the “TSR_Cancel.CONF” command file to a folder called “staged”:

p2-19


3. Based on this clue, the “file stealing” attack may be further mapped to the high-level service abnormal behavior

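// Continue upwards from the 3rd step of "file stealing" to the threat scenario layer
FOR vertices, edges, paths IN 1..8 ANY 'CTI/steal3'
                     CTITTP,
                     INBOUND SyslogTTP,
                     INBOUND ProcessSyslog,
                     INBOUND ParentpChildp,
                     INBOUND AssetProcess,
                     INBOUND AssetCA,
                     INBOUND CAWeakness,
                     OUTBOUND TSWeakness
     OPTIONS {bfs: true}
     // Keep only paths through syslog/23647 that involve leakage of the "TSR_Cancel" command file
     FILTER paths.vertices[*]._key ANY == "23647"
        AND paths.vertices[*].command ANY == "TSR_Cancel"
        AND paths.vertices[*].security_threat ANY == "Leakage"
RETURN paths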

4. After executing the above code, only the first step of threat scenario2 (node “TS2”) was matched, suggesting that bottom-up detection is insufficient for our dataset. Therefore, bi-directional detection is required to further trace subsequent steps of threat scenario2

p2-20


Service abnormal behavior detection (high → low)

1. The general idea for detection at this level is:

  • Traverse within “threat_scenario” to obtain remaining threat scenarios’ entities (CTI)
  • Traverse downward to “syslog” to find related logs for each scenario (behavior data)
  • Output the traversal path as the detection result

2. First, node “TS2.1” was read, which involves tampering with the TSR cancel command file. However, since the attacker tampered with this file locally, no relevant logs can be detected

3. Then, continue traversing to the “TS2.1.1” node, which involves leakage of the TSR execution reminder command file. The corresponding AQL code is:

// Walk downwards from threat scenario TS2 to the syslogs on CTC active/standby
FOR vertices, edges, paths IN 1..7 ANY 'threat_scenario/TS2'
                     TSTS,
                     INBOUND TSWeakness,
                     OUTBOUND CAWeakness,
                     OUTBOUND AssetCA,
                     OUTBOUND AssetProcess,
                     OUTBOUND ParentpChildp,
                     OUTBOUND ProcessSyslog
     OPTIONS {bfs: true}
     // Restrict to the two CTC assets
     FILTER paths.vertices[*]._id ANY == "asset/8"
        OR paths.vertices[*]._id ANY == "asset/9"
     // Restrict to leakage of the "TSR_ExecutionReminder" command file
     FILTER paths.vertices[*].command ANY == "TSR_ExecutionReminder"
        AND paths.vertices[*].security_threat ANY == "Leakage"
     // Keep only logs whose target file is that command file
     FILTER paths.vertices[*].TargetFilename
        AND paths.vertices[*].TargetFilename LIKE "%TSR_ExecutionReminder%"
RETURN paths

4. The above query detected abnormal operations on this command file:

  • First, the RuleName field of “syslog/24049” indicates the involvement of a script and payload, suggesting that it is very likely a trace of the attacker monitoring the TSR execution reminder
  • Furthermore, the TargetFilename field records the monitored file and its location as “C:\Users\Administrator\AppData\Roaming\Microsoft\Windows\Recent”, a folder typically used to store shortcuts to recently used files
  • Therefore, it can be inferred that the attacker’s script does not monitor the original command file directly, but a shortcut in another directory
p2-21


5. Continue traversing to the “TS2.1.1.1” node, which involves counterfeiting the TSR execution command file. The corresponding AQL code is:

// Same downward traversal, one level deeper, now looking for the counterfeit "TSR_Execution" file
FOR vertices, edges, paths IN 1..8 ANY 'threat_scenario/TS2'
                     TSTS,
                     INBOUND TSWeakness,
                     OUTBOUND CAWeakness,
                     OUTBOUND AssetCA,
                     OUTBOUND AssetProcess,
                     OUTBOUND ParentpChildp,
                     OUTBOUND ProcessSyslog
     OPTIONS {bfs: true}
     FILTER paths.vertices[*]._id ANY == "asset/8"
        OR paths.vertices[*]._id ANY == "asset/9"
     FILTER paths.vertices[*].command ANY == "TSR_Execution"
        AND paths.vertices[*].security_threat ANY == "Counterfeit"
     FILTER paths.vertices[*].TargetFilename
        AND paths.vertices[*].TargetFilename LIKE "%TSR_Execution%"
RETURN paths

6. The above query detected abnormal operations on this command file:

  • First, “syslog/4634” corresponds to event 11, which is generated when a new file is created or an existing file is overwritten. This is consistent with the fact that the attacker replaced the “TSR execute” file with the stolen “TSR cancel” command file
  • Second, the process path recorded in the Image field includes cmd.exe, indicating that the attacker replaced the file through a remote command line
  • Third, the same abnormal behavior was detected on both CTC active (asset8) and CTC standby (asset9), indicating that the attacker had replaced the files on both devices
  • Finally, the TargetFilename field clearly reveals that the attacker’s target is TSR_execution.CONF
p2-22


Assessment

Detection result

1. We collected the detection results of our KG model and, as a reference, ran detection on the same dataset using a log analysis platform based on the ELK engine

2. As shown below, the log analysis platform detected 40% of all the attack processes, while the KG identified 60% and was more effective at detecting post-penetration attacks

Stage | Process | Description | Detected by KG? | Detected by ELK?
Pre-penetration1Exploit vulnerabilities to establish connection with CTC from specific port×
2Elevate permissions on CTC through MSF commands such as process migration××
3Establish the proxy link between Caldera and CTC through the Meterpreter shell
4Use the Mimikatz to steal the name and password of CTC domain administrator××
5Use stolen credentials to realize the lateral movement between the CTC active and standby
Post-penetration1Search in all agent hosts for reserved historic "TSR cancel" files×
2Copy searched files to the Kali and delete the attack trace
3Tamper the content of the "TSR cancel" configuration file locally for forgery××
4Remotely upload a PS script to CTC standby to monitor the change of its "execution reminder" file×
5Once the targeted file changed, replace the "TSR execution" file with the counterfeit one×

3. To further demonstrate the advantage of behavior-based detection, we take the detection of “Post-penetration process 5” as an example to analyze the difference between the 2 approaches:

  • The log analysis platform does contain detection rules for “malicious file replacement”, but they rely primarily on the “source IP”, “file name”, and “replaced content”. In our designed attack, however, the attacker’s IP was pre-set as legitimate, and the target file was only replaced by another service command file, without any malicious code. This case indicates that the platform’s feature-based detection can be easily bypassed
  • In contrast, the detection of our KG model is behavior-based. Regardless of changes in features such as “IP”, “file name”, or “file content”, as long as the adversary still exhibits the behavior of “replacing a command file”, it will be recognized as an anomaly in the system service

Detection efficiency

1. To preliminarily evaluate the detection efficiency, we repeatedly ran detection of the “lateral movement” and “file stealing” attacks using the KG model on 3 groups of datasets (primarily varying in size), and performed similar operations using the aforementioned log analysis platform

2. As shown below, considering that the KG establishes associations (shortcuts) among logs, even with a larger amount of data (i.e., knowledge dimensions apart from logs), it still exhibits higher efficiency compared to the log analysis platform’s sequential query approach

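Dataset (Number of logs) | Detection Target | Platform (Data Format) | Minimum Time (ms) | Maximum Time (ms) | Average Time (ms)
Dataset 1 (36970) | Lateral Movement | ELK (JSON) | 36.87 | 68.81 | 50.44
Dataset 1 (36970) | Lateral Movement | KG (Graph) | 1.17 | 1.48 | 1.35
Dataset 1 (36970) | File Stealing | ELK (JSON) | 72.02 | 83.43 | 78.29
Dataset 1 (36970) | File Stealing | KG (Graph) | 3.98 | 5.53 | 4.68
Dataset 2 (50440) | Lateral Movement | ELK (JSON) | 33.81 | 57.20 | 45.01
Dataset 2 (50440) | Lateral Movement | KG (Graph) | 1.95 | 2.18 | 2.06
Dataset 2 (50440) | File Stealing | ELK (JSON) | 75.56 | 91.64 | 83.40
Dataset 2 (50440) | File Stealing | KG (Graph) | 10.29 | 11.93 | 11.05
Dataset 3 (525776) | Lateral Movement | ELK (JSON) | 76.31 | 99.83 | 87.28
Dataset 3 (525776) | Lateral Movement | KG (Graph) | 47.80 | 67.21 | 57.17
Dataset 3 (525776) | File Stealing | ELK (JSON) | 105.87 | 143.68 | 127.51
Dataset 3 (525776) | File Stealing | KG (Graph) | 96.47 | 124.71 | 110.29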

3. However, the test data also show that the query efficiency of the graph database is more strongly affected by the data size. This highlights the importance of deploying a distributed graph database to maintain performance when handling large data volumes
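The size effect is easier to see as a speed-up ratio (ELK average time divided by KG average time). The snippet below only re-uses the average times from the table above; the `speedup` dictionary is an illustrative name.

```python
# (dataset, detection target) -> (ELK average ms, KG average ms), copied from the table.
average_ms = {
    ("Dataset 1", "Lateral Movement"): (50.44, 1.35),
    ("Dataset 1", "File Stealing"): (78.29, 4.68),
    ("Dataset 2", "Lateral Movement"): (45.01, 2.06),
    ("Dataset 2", "File Stealing"): (83.40, 11.05),
    ("Dataset 3", "Lateral Movement"): (87.28, 57.17),
    ("Dataset 3", "File Stealing"): (127.51, 110.29),
}

# How many times faster the KG query is than the ELK query.
speedup = {key: elk / kg for key, (elk, kg) in average_ms.items()}

for (dataset, target), ratio in speedup.items():
    print(f"{dataset:<10} {target:<17} {ratio:5.1f}x")
```

The advantage of the KG shrinks from roughly 37 times on dataset 1 (lateral movement) to about 1.2 times on dataset 3 (file stealing), which is the trend the paragraph above describes.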

Conclusion


1. This thesis focuses on threat modeling and anomaly detection for the train control system. The main contributions are:

  • A novel threat modeling approach is proposed, which integrates security analysis with the process of system service to achieve the coalescence of functional safety and cyber security of cyber-physical systems
  • A cybersecurity knowledge graph of railway train control system is constructed, which provides researchers with a global analysis perspective by using multidimensional data to model the behavior of railway systems
  • An abnormal behavior detection framework is proposed based on the constructed knowledge graph, which can effectively detect major attack behaviors hidden in system logs and provide intelligible visual outputs

2. Although certain results have been achieved, there are still limitations and researchable issues:

  • The POCA provides a relatively simple description of the attack patterns involved in threat scenarios, which directly leads to the inability to effectively associate 2 types of CTI when constructing the knowledge graph
  • Manual analysis is used to assist the graph model in the bi-directional detection. With the development of AI technology, the attack and defense scenarios will gradually become intelligent. Our graph model should also integrate a variety of model-based intelligent technologies to achieve fully automated analysis and detection