
Supervised learning on heterogeneous, attributed entities interacting over time
Amine Laghaout, CSIS Security Group A/S
July 2020
ABSTRACT:
Most physical or social phenomena can be represented by ontologies where the constituent entities interact in various ways with each other and with their environment. Furthermore, those entities are likely heterogeneous and attributed with features that evolve dynamically in time in response to their successive interactions. In order to apply machine learning to such entities, e.g., for classification purposes, one therefore needs to integrate the interactions into the feature engineering in a systematic way. This proposal shows how, to this end, the current state of graph machine learning remains inadequate and needs to be augmented with a comprehensive feature engineering paradigm in space and time.

An architecture for processing a dynamic heterogeneous information network of security intelligence
Marios Anagnostopoulos, Egon Kidmose, Amine Laghaout, Rasmus L. Olsen, Sajad Homayoun, Christian D. Jensen, Jens M. Pedersen
15th International Conference on Network and System Security
Tianjin, China | 23 October 2021
ABSTRACT:
Security intelligence is widely used to solve cyber security issues in computer and network systems, such as incident prevention, detection, and response, by applying machine learning (ML) and other data-driven methods. To this end, a large body of prior research aims to solve security issues in specific scenarios, using specific types of data or applying specific algorithms. However, this specificity makes it cumbersome to adjust existing solutions to new use cases, data, or problems. Furthermore, prior research that strives to be more generic can either operate on complex relations (graph-based) or work with time-varying intelligence (time series), but rarely both. In this paper, we present the reference architecture of the SecDNS framework for representing the collected intelligence data with a model based on a graph structure, which simultaneously encompasses the time variance of these data and provides a modular architecture for both the data model and the algorithms. In addition, we leverage the concept of belief propagation to infer the maliciousness of an entity based on its relations with other malicious or benign entities or events. This way, we offer a generic platform for processing dynamic and heterogeneous security intelligence with an evolving collection of sources and algorithms. Finally, to demonstrate the modus operandi of our proposal, we implement a proof of concept of the platform and deploy it in the use case of a phishing email attack scenario.

Cyber security – avoid harmful websites
Artificial intelligence from research to business
DTU Compute
October 2021
ABSTRACT:
At DTU Compute, we conduct research into using machine learning algorithms to strengthen cyber security in the project called SecDNS. In collaboration with the private company CSIS Security Group, researchers from DTU and Aalborg University are working to find new solutions to prevent accidental disclosure of information to criminals or visits to malicious, virus-infected websites.

Detecting Ambiguous Phishing Certificates Using Machine Learning
Sajad Homayoun, Kaspar Hageman, Sam Afzal-Houshmand, Christian D. Jensen, Jens M. Pedersen
The 36th International Conference on Information Networking (ICOIN)
IEEE
Jeju Island, Korea | 12-15 January 2022
ABSTRACT:
Recent phishing attacks have started to migrate to HTTP over TLS (HTTPS), making a phishing web page appear safe to the user’s browser despite its malicious purpose. This paper proposes new data features as well as machine learning-based solutions to classify digital certificates involved in HTTPS as phishing or benign. In contrast to previous works that consider this a binary classification problem, we take into account that a certificate can be partially benign and phishy simultaneously. We propose a multi-class classifier and a regressor to classify these ambiguous certificates, in addition to benign and phishing certificates, where the ‘phishyness’ of a certificate is expressed as a value between 0 and 1 for the regressor. We apply our method to a set of certificates obtained from certificate transparency logs and show that we can classify them with high performance. We extend our validation by evaluating the performance of the model over time, showing that our model generalizes over time on our training data set.