Cyber deception

Table of contents

Introduction
Overview
Psychological background for deception
Deception in Cyber Security
Objectives
Risks
Taxonomy & definitions
Interesting concepts
References
Relevant talks
Open Source Implementations
  Large projects and framework-like implementations
  Single implementations
Commercial Products
Related tools

Introduction

This page has the purpose of organizing the information collected about the state of the art of deception technologies. The research has been carried out to achieve the following goals: (1) survey the state of the art and collect interesting design principles and techniques on the subject; (2) identify proper definitions and a taxonomy for the involved products / components / implementations; (3) find and outline available implementations, both open source and commercial products. As for the last point, simple implementations and significant projects have been looked at in different depths, with the multiple sub-goals of identifying reusable implementations and/or design principles on one hand and, on the other, possibly identifying projects to be selected as a base to build upon. This page is organized as follows:

Section Overview introduces the concept of deception and its background from psychology and the social sciences. The elements in this section will turn out useful in the design stage for deception strategies.
Section Deception in Cyber Security introduces its application and purpose in cyber security, with examples of the different directions observed in the reviewed literature.
Section Taxonomy & definitions provides a quick summary of the taxonomy on the subject, common to most of the reviewed literature.
Section Interesting concepts outlines a short summary of the planning, design and deployment principles and features of interest observed in the reviewed implementations, projects and products.
Section References holds a list of the reviewed literature.
Section Open Source Implementations outlines the most significant open source projects and details, in a table, most of the available OSS implementations reviewed.
Section Commercial Products describes the commercial products in the space of the subject matter identified in the research.
Section Related tools outlines a list of tools, related to implementations within the subject, that can be useful in the design and/or management stages.

Overview

All warfare is based on deception –Sun Tzu, The Art of War

Deception has historically been a common practice across many fields of human history, employed in situations of conflict to support either tactical or strategic objectives of one of the contending parties (the deceiver) over the other. Common examples of deception have long been observed in fields such as:

Military strategy – e.g., diverting adversary intelligence to influence troop deployment in order to gain an advantage on the battlefield
Politics – e.g., false propaganda to gather consensus and rise to positions of power
Economic trade negotiations – e.g., persuading the counterpart of a different best alternative to a negotiated agreement
Financial fraud – e.g., impersonating someone else's identity to retrieve or transfer funds
Cons – e.g., persuading individuals into providing money or goods with false promises, or in exchange for items of negligible value in comparison
Human relationships – e.g., cheating on one's partner while pretending to be engaged in a healthy relationship
Human-computer interaction – e.g., dark patterns, such as making inconvenient mandatory features extremely difficult to reach in UIs, or having users form habits that are then leveraged to make them perform unwanted actions
Entertainment – e.g., "magic" shows

A good introduction to deception in the different fields is provided in [18], authored by one of the first pioneers of the topic of deception applied to information security. Conceptually, deception is the attempt to influence the decision-making process of the opponent, to make her take actions which are usually against her interest and to the specific benefit of the deceiver [2, 5]. In military terms, deception is explicitly meant to give the deceiver an edge in the Observe, Orient, Decide, Act (OODA) loop [1].
Such practice, inherently covert by nature, is in most domains subject to a certain degree of uncertainty, i.e., whether or not the adversary is actually deceived. From a complementary perspective, should the deception not hold, the operation is also liable to be uncovered and the deceiver subjected to counter-deception, giving the practice an inherently recursive nature. For this reason, when deception is employed, whether or not the conflict is known to the parties involved, secrecy is most often considered fundamental, and the practice of ensuring it, also known as operational security or "OPSEC", is a requirement [18]. Different models have been proposed to formalize deception mechanisms, but what they all share is the core concept of the deceiver generating and/or maintaining a competitive advantage over the counterpart in terms of knowledge of the truth. In other words, the general goal of deception is the prevention of a true belief and/or the formulation of a false belief as a misperception intentionally induced by another entity [41]. As stated in [5], deception relies on the asymmetry between the deceiver and her target: the deceiver provides indirect information which is difficult to verify for decision-making, but for which she (or her mechanisms) pretends to be a trustworthy provider.
Generally speaking, a given instance of deception is characterized by two components:

The simulation of something that is not real, to induce a false belief
The concealment or dissimulation of something real, to either keep it hidden or make its disclosure slow enough that sub-objectives can be achieved before it happens

The expected outcome can likewise be categorized as one of two types:

A-Type, also known as confusion, as in increased uncertainty (e.g., by introducing noise, or by leading the adversary to inconclusive pieces of information that could be crucial for her decision-making process but whose truthfulness she cannot assess)
M-Type, also known as deceit, as in increased confidence in a wrong perception, which will eventually lead to an action against the target's own interest [18]

Psychological background for deception

Despite the generality of the topic and its specific applications in different fields, all forms of deception attempt to exploit the human cognition process by leveraging its known characteristics [18]. Admittedly, this research has not deeply investigated, nor accidentally found, any extensive scientifically proven knowledge about how human cognitive processes operate, but some elements seem recurrent in related work, have been widely experimented on for confirmation, and are broadly accepted by cognitive psychologists. For instance, a good summary of pre-2000 results on the psychological aspects of self-deception is provided in [36]. It is believed, for instance, that individual cognition operates through two reasoning systems [39]:

System 1, also known as associative, which is automatic and heuristic-based
System 2, which is rule-based, slower in comparison, and driven by logic and rationality

Deception targets System 1, which is the one most affected by emotions and whose heuristics can be more easily diverted, e.g., by leveraging one or more of the individual's biases.

Def. Bias – An inclination to judge or interpret based on a personal and oftentimes unreasonable point of view.

Biases are typically considered in the following categories, in order of specificity: personal, organizational, cultural and cognitive. By "specificity" is meant the fit of the bias to its carrier individual, rather than to the class such individual belongs to (such as a given organization, culture, or species). The more specific the bias, the easier it is usually considered to exploit, although when facing previously unknown adversaries, the more specific the bias the less likely it is to match the actual adversary that will be confronted. A personal bias is related to a specific individual and leverages the principles and concepts that the specific individual has formed through her experience; it is extremely subjective.
An organizational bias is related to the influence that habits or routines from a specific organization or group the adversary belongs to have over her perception and evaluation. A cultural bias is related to the principles of the culture the adversary belongs to and by which she is therefore supposedly influenced. Given the relatively small number of cultures (in comparison with organizations and individuals), cultural biases have been analyzed and are usually measured through the following metrics:

Power-distance index (PDI) – A measure of the expectation and acceptance that power is distributed unequally.
Individualism versus collectivism (IVC) – A measure of the belief that the good of the group is more desirable than the good of the individual.
Masculinity versus femininity (MVF) – A measure of the tendency to assign lower value to individuals of the female sex than to their male counterparts.
Uncertainty avoidance index (UAI) – A measure of how much a structured response can be expected. This also usually represents the tendency to avoid risks.
Long-term versus short-term orientation (LTO vs STO) – A measure of the attribution of greater value to building long-term advantages rather than short-term satisfaction.
Indulgence versus restraint (IVR) – A measure of the tendency to choose activities for leisure and happiness rather than to refrain, e.g., to save resources or to meet social acceptance.

A cognitive bias is related to the tendency to make repeated or basic mistakes in reasoning, e.g., making wrong assumptions based on inconclusive information. Cognitive biases can vastly influence an individual's decision-making, sometimes as a baseline habit of the individual but more often when the individual is presented with uncertain conditions.
However, the low specificity of cognitive biases makes them quite common and therefore recognizable: adversaries aware of deception techniques might easily catch on to the attempt to leverage a cognitive bias and thus uncover the deception, possibly initiating counter activities. A number of cognitive biases are well known in the literature, such as the following:

Anchoring bias – Using the first relevant information collected about the solution to a problem to establish a baseline for evaluating solutions to further instances of problems of the same class.
Apophenia – The tendency to perceive relevant patterns in random data.
Attribution bias – Actually a class of biases; the tendency to come up with pseudo-reasonable explanations, determined by personal and subjective reasons, which do not necessarily reflect the actual causes of events.
Confirmation bias – The tendency to search for, interpret and recall information in a way that confirms a previously established hypothesis, while giving disproportionately less attention to information that might contradict it.
Framing bias – The tendency to consider a situation as happening in a known context characterized by certain features, which are just part of the subjective filters used to perceive the world in the specific context where the situation is happening (i.e., a "frame"), rather than assessing the actual context and considering its objective features.
Halo and horn effects – The tendency to judge specific properties of an object, a person or an organization with the same positive (halo) or negative (horn) characterization that pertains to the overall impression of the subject.
Self-serving bias – The tendency to judge in favor of outcomes that preserve or increase the perceived value of the subject performing the judgement.
Status quo bias – The tendency to assign greater value to options that maintain the current conditions over those that introduce changes.
Representativeness bias – The tendency to accept unproven information if it has features similar to those of its parent population in the individual's hierarchy of concepts.

Behind any deception is therefore assumed some form of intelligence to profile the counterpart, such that reasonable assumptions can be made about her biases and the likelihood of decision-making outcomes. Human cognition is chaotic and difficult to predict; there are however some principles which are commonly leveraged to increase the success rate of the deception, either by reducing the likelihood of unexpected influences on the process that the deception is trying to direct, or by reinforcing some observed mechanisms that are likely to recur. What follows is a collection of examples from the reviewed literature.

Deception must fulfill the expectations of the target – It has been observed that it is significantly easier to lead a target astray by reinforcing her existing beliefs rather than by creating new ones. This is actually a plain application of the confirmation and framing biases, used at a higher level of abstraction with respect to the belief that the deception is fabricating or suppressing for the target.

Limited resources lead to controlled, focused attention [18] – Exhausting the resources of the target will force her to look closer at each single problem without considering the potential noise and apparently irrelevant conditions. However, the closer one looks at a problem, the less likely she is to see the bigger picture, in which the overall meaning of that problem might change completely. This closely relates to the following principle: the wider, more decomposed and more detailed in its components a flow of information is, the more likely the individual evaluating it will lose sight of its overall story, by exhaustion of resources and focus on validating the components, which might be convincing on their own regardless of whether their composition makes sense.

Humans do not attach probability judgments to events but rather to the descriptions of the events – Also, the more detailed and unpacked the description is (i.e., deconstructed into several disjoint components that can be easily accepted), the more likely people are to attach to it, whether by heuristics or by exhaustion of resources.

Sequencing – If elements are sequentially discovered in a story coherent to the target, the chance of the belief being incepted is increased. Simulated cause-effect relationships can be seen as an instance of this principle, in which the expectation of effects resembling the causes of a given event is fulfilled and exploited to reinforce the belief being conveyed.

Cry-wolf – Something appearing in multiple instances with apparent continuity is usually perceived as common or normal, and therefore innocuous. This builds on the individual bias that, if the observed event were a threat, its effects would have manifested already and possibly somebody would have done something about it, as opposed to the more logical consideration that effects can manifest long after their cause.

Multiple confirmations of the same clue lead to significant reinforcement of the belief that the clue is conveying – Commonly, due to the confirmation bias, once a clue is confirmed (something is seen twice) it is erroneously perceived as true by most, without any other supporting evidence or further investigation. The same instance of a clue experienced through multiple channels tends to have the same effect as multiple instances of the same clue. Therefore, increasing the number of channels through which the deception is implemented improves its chances, although it makes the deception both harder to implement and harder to keep secret (i.e., to avoid having it discovered), due to the delivery channels usually being shared by their nature with the adversary.

When the goal of deception is to prevent a given action from being taken, the provisioning of inconclusive information and the lack of a definitive answer can generate confusion (A-Type), leaving the target in a state of doubt, which will likely prevent e.g., targets from high-UAI cultures from proceeding, due to their typically high risk aversion.

Deception in Cyber Security

Never attribute to malice that which is adequately explained by stupidity –the naïve black-hat who fell for it

Deception in cyber security is embodied by a set of well-known practices for both defense and offense purposes, of which a comprehensive overview is provided in [2]. Core concepts for defense, such as honeypots for early detection and intelligence gathering, have been leveraged for decades and can be found in early literature such as An Evening with Berferd [37] or the popular novel The Cuckoo's Egg by Clifford Stoll [50, 51]. While traditional security controls are reactive and focus on attackers' actions to enforce some sort of denial, deception focuses on adversaries' beliefs and attempts to manipulate them into taking actions (or not taking actions) to the advantage of the defense of the targeted systems [1]. Computer systems have historically been designed to provide feedback to the user, including when something goes wrong within their operation. This design principle means that such information about the status and implementation of the system is also provided to potentially malicious actors. Deception in Computer and Network Defense (CND) leverages this channel, almost inevitably used by attackers, to deliver specifically crafted information that manipulates their beliefs and consequent actions. Counterpart examples in offense include the wide set of techniques belonging to social engineering, as novelized in The Art of Deception by Kevin Mitnick [52]; eventually, the usage of deception techniques in offense grew over time as a response to the evolution of defense mechanisms such as anti-viruses, intrusion detection systems or network traffic analysis solutions.
To evade these, offense tactics employ all sorts of deception techniques from a technical perspective, such as payload encryption, concealment within apparently legitimate software, usage of covert channels for communications or steganography for data exfiltration, and the general living-off-the-land approach. This research will however focus on the defense perspective, hence offense techniques will not be covered except when either relevant as a possible form of counter-deception or useful to highlight characteristics of adversaries' profiles, e.g., for deception strategies targeting advanced adversaries such as those modeled in the MITRE ATT&CK framework. From the defense perspective, deception represents one of the few domains in which an advantage is inherently possessed against the adversaries. Defenders usually have more extensive knowledge of the networks and systems they protect, and can therefore leverage what is sometimes referred to as the fog of war [T3, T5], i.e., the partial visibility that the adversaries will have over their target. Such advantage is also clearly represented in the different attempts to formalize measurable models to evaluate and guide deception strategies from a game-theory perspective, e.g., as a one-sided partially observable stochastic game in which one player has full observability and the other only partial observability. Several game-theory-based formalizations (i.e., as a non-cooperative two-player dynamic game, a partially observable stochastic game, a dynamic Bayesian game) are provided in [3, 4, 5]. [40] presents a survey of such formalizations and the related taxonomy, while [7] builds on a formalization that considers the stochastic occurrence of cyber-attacks to propose a statistical framework for their analysis, along with a case study.
The model proposed in [11] accounts instead for the probability of successful cyber-attacks, computed by enumerating all paths across a multi-layer graph structure with nodes representing humans, local assets and global assets, as a leading metric to implement multi-layer deception mechanisms for defense. With a different perspective, [21] presents a more speculative model of errors in perception and the conditions under which these occur. [40] reviewed a set of publications about cyber security deception from a game-theory perspective and identified six types of defensive deception strategies, as follows:

Perturbation – Degradation of the observed information value by data pollution, i.e., corrupting the actual data for which precision represents a low value to retain in relation to the risk of the precise information being compromised by the adversary.
Moving target defense (MTD) – Continuously changing the shape of the attack surface to the point that an adversary cannot plan and execute an offensive initiative before the attack surface changes, thus forcing her to reinitiate the planning stage.
Obfuscation – Hiding relevant information by introducing external noise revealing useless information, i.e., increasing the entropy of leaked information by inserting fake information into collections of valuable information to make them look less valuable [1], as also described in [48]. The assumption here is that the adversary lacks the capability or resources to analyze the data, distinguish and remove the useless information, and use the remainder to achieve her objectives.
Mixing – Having relevant flows of information share the same channels to the point that source and destination are not distinguishable by an observer positioned on paths reachable by the adversaries. The assumption in this case is that the source and destination of the information are the actual values to be protected, and that the adversary's presence can be contained to the abovementioned paths. Onion routing (as in the Tor network) employs this principle to provide anonymity to its users.
Honey-X – The usage of the approach employed by honeypot-like technologies, widely discussed in the rest of this research.
Attacker engagement – Providing false or inconclusive feedback, to the point of exhausting the adversary resources necessary to proceed with their initiatives, or of collecting enough intelligence about them to either execute a denial-based response through traditional security controls or adjust their tuning for future detection.

From a technological perspective, a number of mechanisms have been developed over the past decades to act as decoys and simulate services or systems, either general purpose or for very specific environments such as Industrial Control Systems / Supervisory Control And Data Acquisition (ICS/SCADA) [53] or Internet of Things (IoT) devices [10, 26, 27, T1]. For the sake of simplicity, such mechanisms are described in the following section alongside their definitions. Their available OSS implementations and commercial products are summarized in subsequent sections. Deception mechanisms have also been proven effective by several experiments, such as the one conducted in 2013 by MITRE to test the Blackjack platform. In that experiment, despite the failure of prevention mechanisms (denial) in keeping the red team out of the systems, the blue team succeeded in feeding them false information and degrading the content to the point of making the outcome negligible from the attacker's perspective [48]. In the mentioned case, however, the technological component played a limited part in the game, since the success of the deception was largely decided by the blue team's military experience in counter-intelligence. Other experiments have focused on exposing some of these mechanisms to traffic originating from the Internet [16, 19, 29].
For instance, [16] presented an experiment highlighting, besides the findings related to apparently malicious attempts to compromise the simulated systems, the complexity of managing a large-scale deployment of such mechanisms. This challenge appears to be a common struggle in the reviewed literature and is further discussed in one of the following sections. As another example, interesting results from [29] show an empty intersection between the set of machines performing brute-force attacks against simulated SSH services and the set of machines using the discovered credentials to initiate a connection. This suggests that dedicated machines (compromised boxes, possibly running malware as part of botnets) are used to perform the brute forcing, while other machines (most likely reached through hard-to-trace proxies) are used as jump endpoints for the actual connections. They also noted that none of the IP addresses detected as attackers was included among those presented in [16], suggesting either high volatility of compromised machines or, more likely, different ownership of the botnets used at different times. As an example of the threat intelligence generated by the experiment presented in [29], an analysis of the dictionaries used for brute-force attempts was also provided as part of the publication.
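The credential-collection side of such an experiment can be sketched in a few lines. The following is an illustrative, hypothetical low-interaction collector (not taken from any of the referenced implementations): every authentication attempt against the simulated service is logged and denied, and the log is aggregated into the attackers' apparent brute-force dictionary and source set, as in the analyses described above.

```python
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class CredentialHoneypot:
    """Illustrative low-interaction credential collector: every login
    attempt is recorded and rejected, never authenticated."""
    attempts: list = field(default_factory=list)

    def handle_login(self, src_ip: str, username: str, password: str) -> bool:
        # Record the attempt for later threat-intelligence analysis.
        self.attempts.append((src_ip, username, password))
        # A LIHP never grants access; it only observes.
        return False

    def dictionary(self) -> Counter:
        # Frequency of (username, password) pairs, i.e., the attackers'
        # brute-force dictionary as reconstructed from observations.
        return Counter((u, p) for _, u, p in self.attempts)

    def sources(self) -> set:
        # Distinct IPs seen brute-forcing; comparing this set with the set
        # of IPs later *using* leaked credentials reproduces the
        # empty-intersection analysis from [29].
        return {ip for ip, _, _ in self.attempts}

hp = CredentialHoneypot()
hp.handle_login("198.51.100.7", "root", "123456")
hp.handle_login("198.51.100.7", "root", "password")
hp.handle_login("203.0.113.9", "root", "123456")
print(hp.dictionary().most_common(1))  # most frequent credential pair
```

A real deployment would sit behind a network listener speaking enough of the SSH protocol to reach the authentication exchange; the point here is only the always-deny-and-record policy that turns brute-force traffic into intelligence.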

Objectives

Leveraging deception techniques in cyber defense can be motivated by different goals. Two categories of goals can be identified:

Intelligence gathering – Adversaries have a limited set of tactics, techniques and procedures (TTPs), whose development takes time and whose value eventually decreases as defense mechanisms evolve and catch up with them, providing detection capabilities that force the adversaries to research and implement variants or new ones. Therefore, collecting intelligence by monitoring attackers' operations after detection [33], collecting malware samples such as polymorphic worms that are hard to identify with common signature-based methods [30], identifying supposedly covert communication protocols and, generally speaking, keeping track of their TTPs gives the defenders an edge in arranging or enhancing further detection or prevention mechanisms. In this case, deception technologies are used to hide from the adversaries the fact that they have been detected, while keeping them engaged in the network and recording their operations as threat intelligence (def. threat intelligence: the set of data collected, assessed and applied regarding security threats, threat actors, exploits, malware, vulnerabilities and IoCs). This is then further elaborated either for research purposes or as input for existing security controls (e.g., defining SIEM correlation rules, expanding signature databases or, more trivially but most commonly, reporting vulnerabilities being exploited to the related vendor and applying proper patching). Technologies such as honeypots serve this purpose well, since in most cases they are disposable systems separate from production hosts, such that they can be subject to significant compromise without significant impact, and can even be frozen and removed for subsequent forensic analysis.

Active defense – Deception technologies are also employed as active defense mechanisms. In this case the baseline is still to manipulate the adversary's belief on whether or not she has been detected, but with a primary focus on disrupting her immediate operations and only a secondary focus on collecting intelligence on her TTPs for future defense enhancements. The time window between detection and its actionable outcomes might vary depending on the deception strategy and on the level of information deemed of interest for collection. Also, the activation of denial-based security controls has to be carefully planned to avoid the possibility of an adversary exploiting the automation should the deception be uncovered.

It is noted in the reviewed literature that the need for operational security of the deception does not imply that the value provided is security through obscurity, because deception technologies represent an enhancement to traditional reactive security controls, needed to cope with the uncertainty of the constantly evolving threat landscape and the potential information imbalance between the parties. Such practice has historically been employed at different levels (or stages) of the typical offensive attack process, which can be modeled in a simplified fashion by borrowing the concept of the "kill chain" from military jargon, resulting in the model of the cyber kill chain [54]. The concept of the cyber kill chain is that an attacker will go through each of the stages of the chain to achieve her objectives, and therefore disrupting any of these stages will result in a high chance of the overall attack process being disrupted. Note that, when looking at the cyber kill chain with concepts from the military world in mind, each stage involves a cycle of the OODA loop from the attacker's perspective.
Therefore, each stage can be influenced through simulation and concealment in the adversary’s observation phase to have her orient, act and decide in a different way rather than how she would normally do to achieve her objectives. As follows, a list of examples of deception goals, tactics and technologies across the stages of the cyber kill chain: Reconnaissance – The reconnaissance phase (also simply “recon”) is the phase in which the adversary gathers information on her target. This can either be passive (i.e., leveraging resources outside the control of the target, such as whois databases, DNS, social media, etc.) or active (e.g., network scanning). Both can be subject to deception in such a way that the adversary would not be able to, for instance, distinguish between real and false information. The goal at this stage is usually to confuse and mislead the adversary, make her waste time, and generally make her unprepared – either knowingly or unknowingly – for the next stage. Depending on the deception strategy, an increase in the risk of the attack from the adversaries’ perspective might also be beneficial, as it will increase the chance for the adversary to just give up and interrupt the process. Such assumption however will not likely hold for some categories of attackers, such as those either with enough resources to ensure their own operational security or with no incentive for stealthy operations to avoid detection and possibly attribution. From technological perspective, deception mechanisms involved at this stage are mostly focused on the possible addition of misleading information into the sources an attacker is likely to use for passive recon (e.g., see honeypeople) or the usage of LIHP. Weaponization & Delivery – Weaponization & Delivery are phases in which the adversary prepares for the attack by building her toolchain and the delivery mechanisms for the initial access to her target. 
Since such activity is mostly carried out offline, it is hard to interfere with it. However, since the delivery mechanism needs an interface to perform its function it is also likely that the active part of the recon will drift in this phase in which the adversary will actively probe the target attack surface, if not for testing at least to attempt the delivery. LIHP and sticky honeypots are typical technological means of deception in this context. Depending on the defense strategy, denial of further access to the network can be immediately enforced – which will not stop any even slightly skilled adversary – or additional detection mechanisms can be focused on the sources of the identified malicious activity. Exploitation & installation – The exploitation & installation phases are when the adversary attempts to gain the initial access to the target. This could be for instance by delivering a payload to a vulnerable network-attached service or by employing other initial access technique to gain a foothold into the target, then possibly setting some persistence mechanism in place to regain access later in time or to ensure the access is retained upon changes of the initial conditions. In these phases, adversaries expect some kind of interaction with their target to validate the success of their attempts. Depending on the level of sophistication of the adversaries, MIHPs or HIHPs would be the most likely suitable mechanisms to engage them and divert their actions in this stage. Command & Control (C2) – In this phase the adversaries operate under the assumption of a successful compromise and possibly rely on a software implant to act as an agent on the compromised system. Such agent is typically tasked with actions that might vary from operations to be performed on the compromised machine (e.g., tampering of locally stored data, attempt to escalate privileges, gathering additional credentials, etc.) 
to reinitiating the recon phase and acting as a jump point to additional targets previously not exposed as part of the attack surface. In this phase, the level of interaction expected by the adversary with the system is at its peak. Such expectations include the possibility of having code executed with the privileges gained during the exploitation phase. Code execution might result in the most diverse interactions and can hardly be simulated by anything except HIHPs, which are therefore the most likely technical deception tool employed against attacks at this stage. Lateral Movement – Lateral movement is the phase in which the adversary attempts to leverage the gained foothold to extend her presence across the target resources. This could take the form of using a compromised host as a jump point to reach additional systems or, more simply and most commonly, leveraging resources acquired with the initial compromise, such as credentials or other information, to gain access to additional systems. Any good CND practice should assume the constant presence of an adversary who has reached this stage and be structured accordingly to enforce defense in depth. In this context LIHPs make their appearance again, given the possibility of the compromised system being used to initiate another cycle starting at recon. However, additional mechanisms find their place in this phase, such as honeyaccounts and honeytokens. Staging & Exfiltration – This phase, also known as Actions on objectives, embodies the part of the attack in which the adversary supposedly reaches her goal. These goals might vary significantly, from information theft to data tampering to outright disruption of the target's operations. A number of mechanisms are known to provide means of deception at this stage, including honeyfiles, honeytokens, endless files and fake keys.

Risks

There are a number of caveats and risks associated with the use of deception in cyber defense. A comprehensive overview from the reviewed literature, specifically referring to honeynets, is presented in [24]. The most commonly considered risks are reported below: Detection – once discovered, a deception mechanism's value decreases dramatically. A discovered deception mechanism not only loses its usefulness, but can also easily become counter-productive, e.g., by being subject to counter-deception and fed with poisoned intelligence to mislead the CND staff. Also, a discovered deception mechanism is exposed to the risk of being exploited in its possibly automated integration with denial-based security controls (e.g., a LIHP whose activity sources get blocked by a firewall could be targeted with spoofed traffic, causing the firewall to block potentially legitimate traffic and therefore causing a denial of service). Harm – a deception mechanism such as an HIHP finds its purpose in being compromised and used by an adversary who is unaware of the difference between the machine she is operating and a real system. Due to the nature of the HIHP, however, there is always the risk that an attacker will use the system to attack other non-honeypot resources. Limitations on the abilities of the HIHP can prevent this condition to the extent they are designed for, but as soon as such limitations become noticeable to the adversary, the secrecy of the mechanism is defeated, incurring the previously discussed detection risk. Disabling functionalities – Upon the expected compromise, an adversary could theoretically exploit the jailing mechanism, e.g., of an HIHP itself, and disable the control plane (also known as the monitoring channel), removing the competitive advantage of the defender and possibly deceiving them with false negative indicators.
Violation – a compromised deception mechanism purposely designed to allow adversary interaction, such as an HIHP, could be leveraged to perpetrate criminal activities against targets outside or beyond the organizational boundaries (e.g., attacking other systems over the internet, hosting illegal content, distributing malware, etc.). In this case, hypothetical attribution will initially point to the organization that set up the deception mechanism, with all sorts of related legal issues and the eventual need to interact with the relevant authorities. The same considerations stated for the Harm risk about possible limitations and their tradeoff with operational security hold here. Additionally, a short overview of the legal risks in employing honeypots as a means of deception is also given in [2], section VI, where challenges related to claims of entrapment, privacy and liability are discussed. Briefly: the claim of entrapment should not apply in most cases, since server honeypots do not actively lure adversarial activity. From a privacy perspective, legal risks are presented as seemingly negligible as long as the purpose of the data collection is the defense of owned assets and the collected data is treated accordingly. It is mentioned, however, that although processing of transactional data is common practice and not prosecuted in many countries, some of the collected data (e.g., IP addresses with associated timestamps) is in some countries (e.g., Germany) considered personal data. In the case of network defense, it is unclear how these cases should be treated and whether there is any internationally accepted practice on how to handle such collected information, or whether it is even allowed to collect it without consent at all. As far as liability goes, such risk is relevant when incurring the possibility of the aforementioned Harm or Violation.
This is inherent in the usage of HIHPs and can be mitigated only to some extent, depending on the level of fidelity the deceiver wants to expose. Specifically, the advice proposed in [2] is to frequently restore the honeypot to a non-compromised state, as well as to employ restrictions and limitations on the compromised HIHP's ability to reach elements not part of the organization. As mentioned, the application of such practices has to find the proper tradeoff with the fidelity the decoy needs to expose.
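The frequent-restore advice above can be automated when the HIHP is a virtual honeypot. A minimal sketch, assuming a libvirt guest managed via the `virsh` CLI; the domain and snapshot names are hypothetical:

```python
import subprocess
import time

def revert_command(domain: str, snapshot: str) -> list:
    # Build the virsh invocation that reverts a libvirt guest to a
    # known-clean snapshot (hypothetical domain/snapshot names).
    return ["virsh", "snapshot-revert", domain, "--snapshotname", snapshot]

def restore_loop(domain: str, snapshot: str, interval_s: int = 3600,
                 runner=subprocess.run):
    # Periodically revert the HIHP guest to its clean snapshot;
    # `runner` is injectable so the loop can be exercised without libvirt.
    while True:
        runner(revert_command(domain, snapshot), check=True)
        time.sleep(interval_s)
```

The restore interval is itself a tradeoff: too frequent and an engaged adversary notices the reset; too rare and the compromised decoy lingers.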

Taxonomy & definitions

In this section, the technical mechanisms used in cyber deception outlined in the reviewed literature are summarized. Many of these are mentioned in other parts of this research without further introduction, but they should be fairly clear from their definitions as reported here. Honeypot (HP) – Honeypots are computing resources deployed in networks in which they are meant to be probed, attacked and/or compromised by adversaries [45, 37, 38, 51]. According to [8], they are logically composed of: (1) a decoy (i.e., the simulation), which should present some degree of fidelity, i.e., the level of exactness with which it resembles the simulated system; (2) a security program (i.e., the control & monitoring means through which the simulation is operated), which should facilitate the decoy-related activities (e.g., activity detection, monitoring, etc.). Every activity on a honeypot is usually an indicator of compromise, since honeypots are not supposed to be used besides administration – and possibly honeyactivity – which can easily be excluded by the security program. Such a characteristic gives honeypots a negligible false-positive rate, which turns them into useful instruments for aiding CND. Server honeypot – A server honeypot is a type of honeypot whose decoy mimics a service or server. Server honeypots are passive, in that they do not take any initiative in engaging the attackers. When referring generally to a honeypot without specifying a client or server honeypot, a server honeypot is usually intended. Client honeypot – Also known as honeyclients [15, 8, 28], these are a type of honeypot used to investigate client-side intrusions and to engage potentially harmful systems. A client honeypot is software pretending to be a client application, either processing its functionalities or acting on behalf of a user.
Client honeypot applications vary from inspecting web-based resources by simulating the behavior of a web browser to the controlled execution of software that must process untrusted input (e.g., malware samples). [2] presents many client honeypot implementations that could be leveraged as a support service for 2nd-level analysis in security operations. Low-interaction honeypot (LIHP) – A type of honeypot that provides minimal interaction, usually just an open port and minimal supported-protocol responses. LIHPs are commonly used to detect port-scanning activity, either along a perimeter or within protected networks for early detection of lateral movement. Honeyd is one of the most common implementations used to deploy and manage LIHPs. Medium-interaction honeypot (MIHP) – A type of honeypot that emulates the decoy logic by programmatically serving responses comparable to those of the simulated system. MIHPs are difficult to implement because their success rate depends directly on their ability to simulate the real system, which might turn out to be time-consuming and difficult to achieve without introducing distinctive implementation traits that eventually enable trivial fingerprinting of the honeypot. A nice picture of an attempt to establish a MIHP for intelligence gathering is portrayed in [37]. Popular examples include Dionaea, Kippo, Cowrie and Glastopf. High-interaction honeypot (HIHP) – A type of honeypot that mimics – and usually is – a fully functional system, closely monitored to the point of being a fully instrumented forensic device. Instances of HIHPs can easily be detached from the network and replaced with other instances, to be submitted for complete forensic analysis. A popular example is the Cuckoo sandbox for malware analysis.
A recent trend in HIHPs is to leverage virtualization technologies to implement them as disposable VMs, which can be monitored through hypervisor mechanisms without this being highly evident from within the VM, i.e., a virtual honeypot (VHP). VHPs also have the advantage of being easily restored to a previously known, non-compromised state through typical virtualization snapshot features. A fairly long list of techniques, however, has been developed to find out whether code execution is taking place within a VM; these are usually employed by malware to avoid automated sandbox analysis, which is itself to some extent a kind of HIHP implementation [13, 29]. Containerization is an alternative in which Linux container mechanisms are leveraged for the same purpose, e.g., as presented in [31]. A summary of the results of exposing an HIHP to the internet with an SSH decoy is presented in [29], which also depicts the trouble of setting up hard-to-detect HIHPs; in that case a modification to the Linux kernel was needed in order to create a reasonably hidden monitoring channel. HIHPs usually require significant maintenance effort and are among the deception mechanisms that expose the deceiver to the highest risks. For instance, one of the very first references to honeypots [37] outlines the issues in establishing such a mechanism by using an early form of OS-level virtualization, the chroot jail. As a case in point of such an implementation being a significant risk, chroot is subject to escape techniques on Linux, the OS most commonly used in similar contexts and likely the one that would make most sense to use as part of a decoy. The equivalent tool on FreeBSD is, for the sake of comparison, apparently much more resistant in its isolation capabilities. See here for an at-a-glance comparison of the OS-level virtualization capabilities of different implementations.
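Returning to the low-interaction end of the spectrum, the LIHP principle defined above – an open port with a minimal response, where any contact is an indicator – can be sketched in a few lines. The port number and banner below are arbitrary examples, not taken from any cited implementation:

```python
import datetime
import socketserver

class DecoyHandler(socketserver.BaseRequestHandler):
    # Any connection to the decoy port is treated as an indicator:
    # the port has no legitimate users, so false positives are negligible.
    def handle(self):
        src_ip, src_port = self.client_address
        ts = datetime.datetime.now(datetime.timezone.utc).isoformat()
        print(f"{ts} ALERT probe from {src_ip}:{src_port} on decoy port")
        # Serve a minimal, generic banner and close, LIHP-style.
        self.request.sendall(b"220 service ready\r\n")

def serve(host: str = "0.0.0.0", port: int = 2223) -> None:
    # Port 2223 is an arbitrary example; bind decoys where scans are expected.
    with socketserver.TCPServer((host, port), DecoyHandler) as srv:
        srv.serve_forever()
```

In practice the `print` would be replaced by a syslog or SIEM forwarder, so the alert enters the normal incident pipeline.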
Sticky honeypot – Also known as tarpits, a type of honeypot that leverages unused IP address space to interact with attackers probing the network and to keep connections alive as long as possible, slowing down adversary operations, most often automated ones. Tarpits do this by sending plausible but extremely inefficient responses to probes from the network, e.g., via specific TCP configuration or application-level throughput. Sticky honeypots often provide good results in terms of active defense against malware spreading. Examples include LaBrea, which is also a LIHP. Note that the classification of the level of interaction provided by an HP and its other characteristics (e.g., its stickiness) can be orthogonal. Honeynet – A honeynet is a set of honeypots, which adds multiplicity and possibly a simulated network configuration to the simpler concept of a single honeypot. The Honeynet Project takes its name from the concept. HIHPs designed to capture extensive information on threats are usually part of a honeynet setup. The typical architecture is centered on a honeywall element, a layer-2 bridging device with no IP stack on the bridge interfaces, which joins the honeynet with one or more production networks [25, 26]. The picture on the side, courtesy of the Honeynet Project, shows a schematic version of such an architecture (although the obvious separation in the address space would probably alert the reptile brain of sophisticated adversaries). Fake honeypot – production systems equipped with artifacts (e.g., from a virtualization platform), additional services and decoys in such a way that they convey to the attacker the message that the system is a valueless honeypot (although monitored and therefore dangerous to play with), decreasing her incentive both to attack it and to keep her presence on it. Fake honeypots can be set up by installing fingerprintable honeypots on production systems.
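LaBrea operates at the TCP level; an application-level sketch of the same stickiness idea – trickling bytes so slowly that automated clients stall – could look as follows. The `max_bytes` cap and the example port are additions to make the sketch testable, not part of any cited tool:

```python
import socket
import time

def drip(conn: socket.socket, delay_s: float = 5.0,
         chunk: bytes = b"\x00", max_bytes=None) -> None:
    # Keep the peer's connection open by trickling data one chunk at a
    # time; automated tooling waiting for a complete banner stalls here.
    sent = 0
    try:
        while max_bytes is None or sent < max_bytes:
            conn.sendall(chunk)
            sent += len(chunk)
            time.sleep(delay_s)
    except OSError:
        pass  # peer gave up - which is the point

def tarpit(host: str = "0.0.0.0", port: int = 2224) -> None:
    # Arbitrary example port; a real TCP-level tarpit like LaBrea instead
    # manipulates the TCP window itself, which needs raw-socket access.
    with socket.socket() as srv:
        srv.bind((host, port))
        srv.listen()
        while True:
            conn, _peer = srv.accept()
            drip(conn)
```

As written, `tarpit` handles one victim at a time; a deployment-grade version would spawn a thread or task per connection.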
The level of fidelity of such decoys will depend on the profile of the deception target. Careful consideration – and related mechanisms – should be put into ensuring the decoys cannot be exploited to disrupt or attack nearby production systems or services. Honeypeople – Imaginary people with contact information, profiles on social networks and other forms of intelligence, purposely constructed and planted to mislead adversaries [11, 12]. The purpose of honeypeople is to thwart recon and instill false beliefs in the adversaries – for instance, suggesting an actually non-existing (but highly monitored) target with contact information available on institutional web sites. As another example, conveying misleading information about the technology stack used in an organization by establishing a large set of technicians (honeypeople) with related skills on job-market social networks. Side note: see here for an example of an ML face generator, which could be turned into a generator of profile pictures by using a different training dataset. Honeyfiles – Also known as fake files [11, 12, 38], these are files positioned in specific locations such as network shares, NAS devices and the file systems of servers or workstations, but also MIHPs and HIHPs, with the purpose of being accessed or exfiltrated by adversaries. Honeyfiles are monitored for access and can be set up to contain tracing mechanisms (e.g., MS Office or PDF documents embedding macros using, e.g., canarytokens). Honeytoken – Honeytokens are bogus pieces of information, such as database records, identifiers or credentials, purposely planted within production systems and whose usage or dissemination is monitored as a feedback channel [55, 38]. I.e., if someone uses a honeytoken (credentials), or a network traffic stream contains information pertaining to a honeytoken (database record), an alert is raised to the security monitoring systems because a compromise has occurred.
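The honeytoken feedback channel can be sketched as a trivial matcher over authentication events. The planted usernames below are hypothetical, and a real deployment would hook the SIEM pipeline rather than grep log text:

```python
from typing import Optional

# Hypothetical planted credentials; by construction, any authentication
# attempt with one of these usernames is adversarial activity.
HONEYTOKEN_USERS = {"svc_backup_old", "jdoe_admin", "oracle_test"}

def check_auth_event(line: str) -> Optional[str]:
    # Minimal matcher over syslog-style auth lines; returns an alert
    # string when a planted credential shows up, None otherwise.
    for user in HONEYTOKEN_USERS:
        if user in line:
            return f"ALERT honeytoken credential used: {user}"
    return None
```

The value of the mechanism lies in its near-zero false-positive rate: the usernames exist nowhere legitimate, so any hit is a compromise indicator.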
Canarytokens – Canarytokens are unique tokens which can be embedded in URLs, either planted as decoy data or embedded in documents and emails. Whether the adversary explicitly visits the URL or the URL gets opened automatically, e.g., by a document macro, the request for the web resource triggers an alert that provides visibility of the token being accessed (i.e., the "canary sings"; note that the keyword "canary" has long been used in security for mechanisms that provide a reaction when triggered, such as the homonymous buffer-overflow protection mechanism). More on canarytokens here and here. Canarytokens are an instance of the more general concept of honeytokens; in canarytokens, however, automation and the leverage of the web-bug (sometimes called web-beacon) mechanism enable easy deployment and monitoring. Honeyactivity – Best practices for CND mandate the assumption that the defended perimeter is already compromised. In a situation of compromise, an adversary will have partial observability into the network and will therefore be able to capture and inspect traffic. Based on the collected data, the adversary could then deduce which systems are under which conditions of usage and possibly detect systems that are almost never used, such as honeypots. Here is where honeyactivity [11, 12] comes into play. Honeyactivity is a simulation of activity on a decoy that is part of a deception strategy. It is meant to be noticed by the adversary and misinterpreted as legitimate activity, to plant or reinforce the belief that the system subject to the activity is valuable and worth the effort and risk of attacking (or at least, that it is not a honeypot).
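The canarytoken mechanism – a unique token whose URL, once fetched, identifies exactly which decoy was touched – can be sketched as follows. The base URL, storage and planting locations are hypothetical, not how the real Canarytokens service is implemented:

```python
import secrets
from typing import Optional

# token -> description of where the decoy was planted (in-memory example;
# a real service would persist this mapping)
PLANTED = {}

def mint_canary(location: str,
                base_url: str = "https://canary.example.org") -> str:
    # Each token is unique, so the URL being fetched identifies exactly
    # which decoy document, email or share was touched.
    token = secrets.token_urlsafe(16)
    PLANTED[token] = location
    return f"{base_url}/t/{token}"

def on_request(path: str) -> Optional[str]:
    # Called by the web endpoint whenever any /t/<token> URL is fetched:
    # a hit means the canary "sings".
    token = path.rsplit("/", 1)[-1]
    if token in PLANTED:
        return f"ALERT canary fired: token planted in {PLANTED[token]}"
    return None
```

Embedding the minted URL in a document macro or email turns any automated or manual open into the web-bug trigger described above.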
Honeyactivity should resemble legitimate activity (to embody the proper fidelity and appear plausible to the attacker) but be clearly distinguishable by the security program for monitoring purposes, e.g., via non-trivial differentiation by network/timing pattern from any other potential activity, as suggested in [12]. Honeytrap – The term is used in different ways in the reviewed literature. [56] uses it to indicate the overall umbrella of technologies used in cyber security deception. Several other publications use it to indicate specific subcomponents of production systems, such as data structures or dedicated tables in relational databases, that act according to the same principle as honeypots. I.e., honeytraps are monitored and raise alarms if accessed or used, but are not systems, either simulated or real, while still not being single, uniquely crafted pieces of information (such as honeytokens). Hybrid honeypot – An architectural approach in which LIHPs are deployed on a more extensive portion of the attack surface as seen from the adversaries' perspective (e.g., by sinkholing entire portions of the available address space), while HIHPs are deployed deeper along the assumed attack path to gather more detailed information on those adversaries who are detected through the LIHPs. The mechanism leading the attackers along the envisioned path is usually the main component of a hybrid honeypot. Honeyfarm – A scalability-oriented approach in which, rather than deploying a significant number of HPs, a mechanism is used to forward ports – either via network configuration or through some agent on multiple hosts – to a central location, i.e., the honeyfarm. In this location, one or more HPs are deployed and the adversary is kept engaged while believing she is interacting with more edge-positioned systems [8, 24]. This approach also has the nice property that, the HP deployment being centralized, it is supposedly easier to jail/contain the instances, e.g.
of HIHPs, thus reducing the risk of a compromised HIHP being leveraged as a jump point along an actual attack path leading to uncontrolled compromise. Fake endpoints, fake files, fake credentials, fake cached items, fake network shares – Alternative nomenclature for honeypots, honeyfiles, honeytokens and honeypots with a network share as decoy, e.g., in [38]. Interestingly, fake cached items are rarely mentioned as honeytokens, but caches are one of the well-known locations adversaries investigate for data, either for recon or lateral movement purposes, e.g., via cache snooping techniques or by inspecting the in-memory cache of authentication systems. Endless files – A hybrid concept between honeyfiles and tarpits (sticky honeypots). An endless file is a honeyfile served either in an extremely inefficient manner or as fast as possible, although in both cases as having never-ending content (or very large content, for that matter), which could easily be generated as random data (dd if=/dev/random) or with dedicated utilities that intercept attempts to use the copy functions of the decoy OS [57]. The purpose of an endless file is both to confuse the adversary and to waste her resources (time, bandwidth, disk space) by providing access to, as the name suggests, endless honeyfiles. Dynamic honeypots – (Static) honeypots have a fixed decoy and require manual configuration and maintenance. A dynamic honeypot is a honeypot system that learns the behavior of the hosts on the network and automatically deploys decoys based on the gathered information [45, 42]. A dynamic honeypot system is usually composed of three parts with distinct functions: fingerprinting, configuration building, and deployment & monitoring. [48] presents an example usage of artificial-intelligence-related techniques, specifically expert systems and case-based reasoning, for the deployment and fingerprinting engine parts respectively.
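The endless-file idea described above can be sketched as a generator of never-ending random content, an in-process analogue of the `dd if=/dev/random` trick. The `limit` parameter is an addition that exists only to make the sketch testable; a real decoy would stream unbounded:

```python
import os
from typing import Iterator, Optional

def endless_file(chunk_size: int = 4096,
                 limit: Optional[int] = None) -> Iterator[bytes]:
    # Yields random data forever (or up to `limit` bytes, for testing).
    # Streaming this as a decoy download wastes the adversary's time,
    # bandwidth and disk space.
    produced = 0
    while limit is None or produced < limit:
        chunk = os.urandom(chunk_size)
        produced += len(chunk)
        yield chunk
```

Whether the stream is served as fast as possible or trickled tarpit-style is a deployment choice, matching the two serving modes mentioned above.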

Interesting concepts

Although some claim deception to be more art than science [38], many attempts have been carried out – and are presented in the reviewed literature – to formalize and establish procedures to successfully plan, design and deploy deception initiatives. This section contains a set of interesting concepts extracted from such attempts. They are divided into the different stages of planning (i.e., deciding what strategy to use, what the assumptions and expectations are, which adversary profiles are considered, etc.), design/features (i.e., concepts that could turn into requirements to implement the desired outcomes), deployment (i.e., concepts related to the challenge of setting in place, operating and managing the technological mechanisms) and ideas (more like a list of notes on possibilities to investigate; be advised, they could as well be way too long shots).

Planning

For planning successful deception initiatives, different principles are outlined and discussed in [1, 2, 6, 11, 12]. [11] and [12] suggest the use of a multi-layered approach, i.e., planning strategies in multiple stages based on typical adversarial behavior. The three layers outlined as an example in [11] go from a human asset (e.g., compromise of an employee's access to gain initial access), to a local asset (e.g., a machine the employee has access to), to a global asset (e.g., a system reachable via lateral movement that either is or holds the goal of the adversarial activity). In this case, the adversary is assumed to be an APT in the context of a targeted initiative. A multi-layer approach would also have the side benefit of thwarting detection of the deception by the adversary, should this be based on the isolation of the decoy. According to [1], in which an attempt to define a framework to incorporate deception into common CND practice is discussed, the planning phase should consider the following seven steps: (1) define the strategic goal; (2) define the wanted reaction of the attacker to the deception; (3) understand the attacker's biases; (4) create the deception story (simulation and dissimulation); (5) define feedback channels for monitoring and reaction; (6) implement and integrate the deception mechanisms; (7) monitor the feedback channels. Orthogonally, [6] outlines six principles to be considered for a deception initiative to be successful. The following should be considered for validation purposes across the relevant steps in the planning phase.
According to these, the deception should: (1) reinforce the adversaries' expectations; (2) have realistic timing and duration; (3) be integrated with operations; (4) be coordinated with concealment of true intentions; (5) be tailored to the needs of the setting; (6) be imaginative and creative. As far as the monitoring of the feedback channel goes, [2] proposes the following list of characteristics to be identified for each attack in order to profile the adversary: Motivation – the reason for the adversary to carry out the offensive initiative; Breadth – also referred to as depth, intended as a measure of the extent of the compromise reached by the adversary, such as the number of compromised machines; Sophistication – a measure of the adversary's skills and resources; Concealment – a measure of the effort put in by the adversary to hide her presence; Attack source – the channel or activity that led to the compromise (elsewhere known as the attack vector); Vulnerability – the condition exploited by the adversary to carry out the compromise; Tools – the technical means used (or attempted) to carry on the adversarial activity. Some of the mentioned characteristics are, however, arguably identifiable or derivable from the technical information that can be collected by the mechanisms used in a deception initiative. These include, for instance, IP addresses and their allocation country according to geolocation databases, autonomous system numbers (ASN), domains, user identifiers and technical characteristics of the systems from which activity has been detected (e.g., user agents, operating systems, etc.). It is worth noting that the adversary can either forge most of such information (e.g., via spoofing techniques) to divert attribution and evade detection, or the information could belong to disposable intermediate resources used by the adversary to hide her location.
For such reasons, in many cases this information can hardly be trusted for attribution purposes, but on the other hand it can be leveraged as input for other denial-based CND mechanisms for active defense purposes. Other information usually collected about the target (i.e., the deception decoy) includes IP addresses, port numbers, services (either their banner, implemented protocols or exposed behavior), OS (and/or other simulated components) and time-until-first-attack. [2] presents, in section V, many more metrics to be collected and surveys the related data analysis methodologies. Table VI in [2] provides a quick outline mapping the mentioned metrics and analysis methodologies to the specific publications in which they are discussed in depth.
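The profiling characteristics proposed in [2] can be captured as a simple record on the feedback channel. The field types below (e.g., ordinal scores for sophistication and concealment) are illustrative assumptions, not part of the cited taxonomy:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class AttackProfile:
    # Fields mirror the characteristics listed above from [2]; the
    # ordinal 1..5 scales are an assumed convention, not a standard.
    motivation: str           # reason for the offensive initiative
    breadth: int              # e.g., number of compromised machines
    sophistication: int       # ordinal 1 (low) .. 5 (high)
    concealment: int          # ordinal 1 .. 5
    attack_source: str        # vector that led to the compromise
    vulnerability: str        # condition exploited by the adversary
    tools: List[str] = field(default_factory=list)
```

Keeping such records per attack makes the later aggregation and analysis steps surveyed in section V of [2] straightforward to feed.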

Design/Features

The following is a set – in no specific order – of design principles, features and concepts from the reviewed literature deemed of interest to inspire the upcoming experiments and potentially the design of new implementations. Use high-fidelity decoys: make the set of services on a host make sense for that specific machine (e.g., given the related OS/platform) [T3]. Expose the characteristics of the simulated OS/platform, for instance applying OS-obfuscation techniques such as proper settings of TCP flags, packet sending timing, etc., as described in [47], thus fooling initial scanning and information gathering tools such as nmap. Supply the full extent of the service supposedly reachable from the surface the adversary is supposed to reach [T3]. Make the services vulnerable to (a subset of) known exploits (and to some extent to unknown exploits, for collection purposes, with some sort of validation for things that could actually make sense to an adversary) [T3]. The extent, in terms of the number of vulnerabilities exposed, should be carefully evaluated so as not to make the decoy reveal itself as an obvious HP. Plausible situations should be configured considering the most common services and related (possibly most recent) vulnerabilities, such as MS-RPC, NetBIOS, HTTP/IIS, MSSQL, etc. A good place to start would be the nmap-services list, which ranks port-service entries based on how frequently they are found open. Depending on the level of interaction granted to the adversary, be aware of the related fingerprinting techniques and counter them in the decoy implementation (or at least consider the possibility of the adversary employing them and make sure the related scenario does not turn against the original deceiver). Many known HP implementations have historically been found very easy to identify (e.g., Kippo, Conpot, especially in their default configurations).
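Selecting plausible decoy ports from the nmap-services ranking mentioned above can be sketched as a small parser over the file's `name port/proto frequency` line format:

```python
from typing import List, Tuple

def top_tcp_ports(nmap_services_text: str, n: int = 5) -> List[Tuple[int, str]]:
    # Parses lines of the form "<name>\t<port>/<proto>\t<frequency> ...",
    # skipping comments, and returns the n most frequently open TCP ports:
    # a plausible basis for choosing which decoy services to expose.
    rows = []
    for line in nmap_services_text.splitlines():
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        name, portproto, freq = parts[0], parts[1], float(parts[2])
        port, proto = portproto.split("/")
        if proto == "tcp":
            rows.append((freq, int(port), name))
    rows.sort(reverse=True)
    return [(port, name) for _, port, name in rows[:n]]
```

A decoy configured with, say, the top five ports for its claimed platform looks far more plausible to a scan than a single oddly open service.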
If/when containing the decoy activity, consider sinkholing (rather than denial) for outbound connections, e.g., at least for protocols known to be used for C2 communications. This improves the chance that the contained activity does not detect its status and terminate or deviate from its intended behavior. Sinkhole responses might vary from no response, to sticky/endless responses, to simulated responses. Gathered data can both be considered as IOCs and be used for further investigation, e.g., via separate simulation through a client HP. Use HPs to gather intelligence to be leveraged by more conventional denial-based CND mechanisms. Use honeypots to collect data on exploitation attempts and create unique attack signatures from detected activity (either for NIDS or AV) – the concept is mentioned in a project named honeycomb by C. Kreibich at the University of Cambridge, not to be confused with the homonymous implementation from Cymmetria. Capture copies of malware attempting to spread to compromised HP machines – Dionaea is one implementation built around such concepts. Use sample analysis services such as VirusTotal or Hybrid-Analysis to enrich captured samples with related known intelligence or information about the samples' behavior. Such services could be integrated via their APIs. Alternatively, use a malware analysis sandbox such as Cuckoo to implement this as part of an experiment. Sandbox integration for malware analysis is suggested in [13], as opposed to or in conjunction with botnet-related malware detection. In comparison, other approaches such as [14] suggest using spatial-temporal correlation of network activity, whose detection and processing is in that case implemented within a snort plugin. Sample collection can be enhanced using client honeypots to actively check domains and URLs of doubtful reputation for malware distribution. Automate discovery and deployment, enhancing the concept of dynamic honeypots. Automatic protocol learning and LIHP generation.
The concept is inspired by ScriptGen [17], which aims to automatically generate honeyd-compatible protocol emulators based on traffic captures. Continuously and dynamically discover new possibilities for decoys to be used as baits, and deploy them. The concept is similar to dynamic honeypots but is meant to run continuously and in quick cycles, inspired by catering honeypots in BAIT-TRAP [49]. Evaluate if and how much this could contribute to improving the results desired for the deception initiative: would it decrease the fidelity, making it counter-productive? Would it confuse adversaries due to the MTD principle? Would both of the previous points be the case, with the mechanism still being good to have and then best used only when needed according to the strategy? Consider, in the planning and deployment stages, the optimization of metrics related to the costs of the initiative against the expected losses (and the relevant cut provided by the initiative itself). An attempt to formalize budget optimization (i.e., minimizing the total cost of ownership (TCO) and the total expected loss) is provided in [11].
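The sample-enrichment integration mentioned above can be sketched against VirusTotal's v3 file-report endpoint (lookup by hash, authenticated via an `x-apikey` header). The key below is a placeholder, and actually sending the request requires a valid API key and network access:

```python
import json
import urllib.request

VT_BASE = "https://www.virustotal.com/api/v3"

def build_lookup(sha256: str, api_key: str) -> urllib.request.Request:
    # Builds a VirusTotal v3 file-report request; separated from the
    # network call so the request shape can be inspected offline.
    return urllib.request.Request(
        f"{VT_BASE}/files/{sha256}",
        headers={"x-apikey": api_key},
    )

def enrich(sha256: str, api_key: str) -> dict:
    # Fetch the report for a captured sample's hash; the JSON includes
    # known intelligence about the sample, as described in the text.
    with urllib.request.urlopen(build_lookup(sha256, api_key)) as resp:
        return json.load(resp)
```

Submitting only hashes (rather than the samples themselves) also avoids leaking possibly targeted malware to a third-party service, a consideration worth weighing in a deception context.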

Deployment

Deployment of honeypot systems as part of deception initiatives is often perceived as challenging due to the implied overhead and the difference, in terms of technology, from most organizations' production stacks. Moreover, honeypots both require a significant amount of management due to their specific nature and introduce a non-trivial amount of complexity into the administration of networked environments. Usually, HPs are deployed, in CND terms, as cheap intrusion detection system (IDS) sensors. In engineered environments, a convenient configuration is to have their activity forwarded as logs to a Security Information and Event Management (SIEM) system, in such a way that they contribute to highlighting adversarial activities, which can then be handled as security incidents through the organization's incident management process. According to [46], most HPs are deployed behind firewalls, i.e., within corporate or production networks, as a measure of defense in depth. This also has the advantage of avoiding prolonged and continuous exposure, which can lead to the deception being uncovered, e.g., through OSINT intelligence gathering (see for instance honeyscore). However, HPs are also often placed exposed, mostly to gather data about attackers' TTPs and malware samples, both to be fed to detection systems to increase their efficacy. The deployment scheme of the HPs and their related exposure, given an organization's network segmentation, is a highly specific task which involves almost unique considerations in the planning stage for the decoy to be tailored to the deceiving organization. Different alternative approaches have been proposed, moving towards the concept of dynamic honeypots capable of adapting to the environment where they are deployed, including the methodology presented in [43] to generate the deployment configuration using machine learning (ML).
Basically, such a methodology involves fingerprinting the hosts in the networked environment (e.g., through an nmap scan), processing the gathered data to output a seamless configuration for the honeypots to be put in place, and actually deploying that configuration via honeyd. A survey of typical strategies used by dynamic honeypots to identify the number and topology of decoys to be deployed is presented in [46]. As another example, one of the methodologies mentioned combines active (nmap) and passive (p0f) fingerprinting with HPs of multiple interaction levels (e.g., from LIHPs to HIHPs equipped with Sebek). Besides mimicking the target environment's characteristics to increase the fidelity of the overall decoy, [8] identified deployment strategies for HPs including the following:
Sacrificial lamb – a normal system left loosely maintained, reasonably distant from production systems and separated through network segmentation, so as to be potentially accessible by an attacker but unreasonable to use as a jump point.
Deception ports – simulated services (LIHP or MIHP) directly on production servers. The main purpose may shift from detection to deterrence (e.g., see fake honeypots).
Proximity decoy – HPs placed on production networks along with real production systems. This strategy is for instance employed by honeyd; it may be coupled with traffic redirection and employ HPs of different interaction levels.
Minefield – a large but sparse set of HPs placed along the perimeter (e.g., a DMZ). The purpose is focused on early detection and response (most typically via integration with IDSs).
Redirection shield – this strategy assumes integration with detection capabilities (e.g., IDS or anomaly-based detection): when detection happens, traffic is redirected or forwarded to the HPs for intelligence gathering.
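The fingerprint-then-configure cycle described above can be sketched as follows. The host profiles are invented (stand-ins for what an nmap scan would yield), and the emitted directives only approximate honeyd's configuration syntax:

```python
# Sketch of the dynamic-deployment cycle: take host fingerprints and emit a
# honeyd-style configuration that mimics them. The profiles, addresses and
# the decoy placement policy are all illustrative assumptions.
fingerprints = [
    {"ip": "10.0.0.21", "os": "Linux 2.6.x", "tcp_ports": [22, 80]},
    {"ip": "10.0.0.22", "os": "Microsoft Windows XP SP2", "tcp_ports": [135, 445]},
]

def honeyd_config(hosts, decoy_base=100):
    """Render one honeyd-style template per observed host profile."""
    lines = []
    for i, h in enumerate(hosts):
        name = f"decoy{i}"
        lines.append(f"create {name}")
        # personality drives the simulated network stack (nmap fingerprint)
        lines.append(f'set {name} personality "{h["os"]}"')
        lines.append(f"set {name} default tcp action reset")
        for port in h["tcp_ports"]:
            lines.append(f"add {name} tcp port {port} open")
        # place the decoy on an unused address near the mimicked host
        lines.append(f"bind 10.0.0.{decoy_base + i} {name}")
    return "\n".join(lines)

print(honeyd_config(fingerprints))
```

A real implementation would feed the rendered configuration to honeyd and re-run the cycle as the network changes, which is where the ML-driven selection of [43] would slot in.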
An interesting approach is also discussed in [13, 32], in which the adversary (either via malware or via remote presence) is allowed to further spread or progress through the network by means of an HIHP which has visibility of some other (decoy) system that could plausibly be compromised, either as an extension of the infection or as a lateral movement. Such a system is also an HP in this approach. The purpose is to simulate the possibility of progression, having the adversary waste even more resources while yielding more intelligence to collect. In the case of malware, such a mechanism is also beneficial in defeating honeypot-aware adversaries, who nowadays commonly include in their malware routines that check for signs of simulation in order to evade analysis and reverse engineering. It has been observed [13] that in some cases the checks performed are limited to verifying the possibility to spread further, or at least that the machine is not a stub system (i.e., a system which does not lead to further network access and cannot be used to reach other systems). This sort of dynamism in following the adversarial activity or presence implies a high degree of flexibility in the deployment and management of the HPs, which is considerably easier with virtual honeypots, for which the common mechanisms used to manage virtual machines (VMs) can be leveraged. However, virtual environments can in many cases be easily detected via different factors, such as those outlined in [44], including:
Virtualization technologies' artifacts (files, binaries, entries in procfs in the case of unix-like systems, etc.)
Fingerprinting of the apparent hardware configuration and interfaces
Errors in the implementation of the instruction set that are not replicated in virtualized environments
Operations that behave differently in virtualized environments, e.g., calls to strace on Linux, which cannot be performed twice (by the control channel and by the adversary's malware)
Due to this detectability of virtualized environments, other forms of virtualization have also been considered in recent approaches, such as OS-level virtualization in the form of Linux containers. Linux containers (LXC) are a mechanism to run multiple isolated virtual systems on a single host sharing a single Linux kernel. LXC leverages the concepts of namespaces and cgroups to provide isolation and containment for a single process, effectively abstracting applications from the OS. [31] experimented with LXC for HP deployment and found the mechanism significantly compelling in terms of performance gains over other virtualization mechanisms, due to the low footprint of OS-level virtualization. [44] also explored the possibility of using LXC to contain HPs, which seems promising, since the usage of kernel namespaces should in theory give low-footprint monitoring an edge against detection of the HP from the attacker's perspective. In practice, however, LXC still presents some distinctive traits, such as the usage of namespaces itself and peculiar permissions, which could in theory be used by an attacker to unmask the deception. For instance, according to [44], the namespaces implementation restricts the visibility, from within the container, of the output of the common utility ps, but does not do the same for sysinfo. The difference in the outputs is likely to reveal that such utilities are being run within a container. This technique is for instance employed for anti-detection purposes by the vlany LD_PRELOAD rootkit.
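A minimal sketch of that ps-versus-sysinfo discrepancy check follows. It assumes a glibc-based x86-64 Linux host (the struct layout below matches that case only), and the "possible container" threshold is an invented heuristic:

```python
# Inside a PID namespace, /proc (what ps reads) only shows the container's
# processes, while the sysinfo(2) syscall still reports the host-wide
# process count. A large gap between the two hints at containment.
import ctypes
import os

class Sysinfo(ctypes.Structure):
    # struct sysinfo as laid out on x86-64 Linux (see sysinfo(2))
    _fields_ = [
        ("uptime", ctypes.c_long),
        ("loads", ctypes.c_ulong * 3),
        ("totalram", ctypes.c_ulong), ("freeram", ctypes.c_ulong),
        ("sharedram", ctypes.c_ulong), ("bufferram", ctypes.c_ulong),
        ("totalswap", ctypes.c_ulong), ("freeswap", ctypes.c_ulong),
        ("procs", ctypes.c_ushort), ("pad", ctypes.c_ushort),
        ("totalhigh", ctypes.c_ulong), ("freehigh", ctypes.c_ulong),
        ("mem_unit", ctypes.c_uint),
        ("padding", ctypes.c_char * 8),  # over-allocated tail padding
    ]

def process_counts():
    """Return (processes visible in /proc, processes reported by sysinfo)."""
    info = Sysinfo()
    ctypes.CDLL("libc.so.6", use_errno=True).sysinfo(ctypes.byref(info))
    visible = sum(1 for d in os.listdir("/proc") if d.isdigit())
    return visible, info.procs

visible, reported = process_counts()
print(visible, reported,
      "possible container" if reported > visible * 2 else "consistent")
```

An HP operator would run the same check from the decoy's perspective to verify that the containment does not leak, i.e., that the two numbers stay consistent for an observer inside the container.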
Other caveats can reveal the contained execution state, such as differences in the results expected from a privileged operation. For security reasons, permissions in containers are mapped in a predefined way to permissions on the host system. This means, for instance, that performing an operation as root within an unprivileged container will most likely expose the condition of being contained, because the operation will either be denied or not provide the expected results. For example, a call as root to dmidecode from within an unprivileged container will most likely fail because the user on the host system mapped from root within the container lacks access to the device representing the memory from which the DMI table (also known as SMBIOS) is parsed. Besides the possibility of uncovering the decoy (detection risk), the use of LXC (and similar OS-level virtualization techniques) presents two additional challenges. (1) Since LXC containers share the same in-memory kernel both among themselves and with the host system, the impact of an HP being fully controlled by an adversary is potentially higher. This makes LXC containers unlikely to be suitable for running HIHPs. (2) Still due to the sharing of the kernel with the host system, an HP contained via LXC should only simulate decoys running on the very same platform to avoid easy detection, except when additional decoys are specifically implemented to hide the platform the HP is actually running on.
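The dmidecode-style check above can be approximated by probing the DMI table that the kernel exposes under sysfs. This is an illustrative probe, not how dmidecode itself is implemented, and the sysfs path is the standard location on mainstream kernels:

```python
# Root on a real host can read the DMI/SMBIOS table exported by the kernel,
# while (fake) root inside an unprivileged container is denied access.
import os

def dmi_readable(path="/sys/firmware/dmi/tables/DMI"):
    """Return True if the DMI/SMBIOS table is readable (host-root-like)."""
    try:
        with open(path, "rb") as fh:
            fh.read(4)
        return True
    except OSError:  # EACCES/EPERM in an unprivileged container, or no DMI
        return False

print("looks like host root" if os.geteuid() == 0 and dmi_readable()
      else "unprivileged or contained")
```

From the deceiver's side, the lesson is the inverse: any privileged probe an adversary may try should be made to succeed (or fail) exactly as it would on the system the decoy claims to be.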

Ideas

Model adversaries' behavior (in order to exploit their biases and control their beliefs) leveraging the ATT&CK framework. Can PRE-ATT&CK techniques be easily matched to LIHPs? Orchestrate technical means for deception, such as HPs, with realistic dissemination of counter-intelligence information (whois databases, DNS, search engines, social networks); see for instance the Google Hack Honeypot (GHH) project. Possibly impeding factors: legal issues, ToS violations, repercussions of disseminating false information. Continuously deceive the attacker while she progresses through the network by quickly spinning up decoys in a virtualized environment (both in terms of resources and network topology, e.g., by leveraging SDN technologies), which extends as the surroundings of the compromised assets. In the process, leverage the adversary's experience in the environment she went through so that she forms beliefs that are consistent with past observations (but actually false). At the initial stages, this would employ the principle of MTD to confuse the adversary and lead her into a possible entrance to a simulated path; it would then leverage reinforcement of observed conditions, coupled with other false input provided to the adversary (e.g., planted intelligence from honeypeople, honeytokens, etc.), to keep her engaged and push her down the rabbit hole. A similar idea is discussed in [20], where, in comparison to this "moving simulation", the focus is more on monitoring and leading the adversary along a preferred path on the graph of potential attacks, much like techniques used in the physical world to drive prey into kill zones during hunting. The similarity in strategy is significant, but their proposed view of the attack paths seems quite limited and does not account for the typical APT attack process. Also, their experiments seem to have been conducted against red teams, not real attackers.
The approach is also different in practice because red teams do not have exaggeratedly high stakes in not being caught, so they can afford to attempt "arbitrary exploits" or "arbitrary selection of the targets", which might instead be unlikely for an APT, depending on its objectives. An approach involving a dynamic set of potential simulations would offer significantly more flexibility to absorb such cases. Another idea: couple mechanisms typical of behavioral authentication systems (e.g., behavioral biometrics such as keystroke-timing patterns) with MIHPs or HIHPs to tell human and automated adversaries apart and to fingerprint human adversaries (as in "individuals").
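As a toy illustration of the keystroke-timing idea, the sketch below labels an interaction from inter-keystroke delays. The thresholds are invented and far simpler than real behavioral-biometric models:

```python
# Scripted interaction tends to show implausibly small and uniform
# inter-keystroke delays, while human typing has both a higher mean delay
# and natural jitter. Thresholds below are illustrative assumptions.
from statistics import mean, stdev

def classify(timestamps, min_mean=0.05, min_jitter=0.01):
    """Label a sequence of key-press timestamps (in seconds) as human or automated."""
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    if len(gaps) < 2:
        return "unknown"
    if mean(gaps) < min_mean or stdev(gaps) < min_jitter:
        return "automated"
    return "human"

print(classify([0.00, 0.01, 0.02, 0.03, 0.04]))        # paste-like burst
print(classify([0.00, 0.21, 0.35, 0.62, 0.80, 1.15]))  # jittery typing
```

In an MIHP/HIHP, the timestamps would come from the emulated shell or terminal channel, and the per-session timing profile could additionally serve as a (weak) fingerprint of individual human operators.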

References

[1] Cyber Security Deception - Mohammed H. Almeshekah, Eugene H. Spafford - 2016
[2] A Survey on Honeypot Software and Data Analysis - Marcin Nawrocki, Matthias Wählisch, Thomas C. Schmidt, Christian Keil, Jochen Schönfelder - 2016
[3] The Partially Observable Games We Play for Cyber Deception - Mohamadreza Ahmadi, Murat Cubuktepe, Nils Jansen, Sebastian Junges, Joost-Pieter Katoen, Ufuk Topcu - 2018
[4] Dynamic Bayesian Games for Adversarial and Defensive Cyber Deception - Linan Huang, Quanyan Zhu - 2018
[5] A Game-Theoretic Foundation of Deception: Knowledge Acquisition and Fundamental Limits - Tao Zhang, Quanyan Zhu - 2018
[6] Demystifying Deception Technology: A Survey - Daniel Fraunholz, Simon Duque Anton, Christoph Lipps, Daniel Reti, Daniel Krohmer, Frederic Pohl, Matthias Tammen, Hans Dieter Schotten - 2018
[7] Characterizing Honeypot-Captured Cyber Attacks: Statistical Framework and Case Study - Zhenxin Zhan, Maochao Xu, Shouhuai Xu - 2016
[8] Enabling an Anatomic View to Investigate Honeypot Systems: A Survey - Wenjun Fan, Zhihui Du, David Fernández, Víctor A. Villagrá - 2017
[9] Cyber Threat Intelligence: Challenges and Opportunities - Mauro Conti, Ali Dehghantanha, Tooska Dargahi - 2018
[10] ThingPot: an interactive Internet-of-Things honeypot - Meng Wang, Javier Santillan, Fernando Kuipers - 2018
[11] Detecting Targeted Attacks by Multilayer Deception - Wei Wang, Jeffrey Bickford, Ilona Murynets, Ramesh Subbaraman, Andrea G. Forte, Gokul Singaraju - 2013
[12] Catching the Wily Hacker: A multilayer deception system - Wei Wang, Jeffrey Bickford, Ilona Murynets, Ramesh Subbaraman, Andrea G. Forte, Gokul Singaraju - 2012
[13] Hardening Honeynets against Honeypot-Aware Botnet Attacks - Charles Costarella, Sam Chung, Barbara Endicott-Popovsky, David Dittrich - 2013
[14] BotSniffer: Detecting Botnet Command and Control Channels in Network Traffic - Guofei Gu, Junjie Zhang, Wenke Lee - 2008
[15] Automated State Machines Applied in Client Honeypots - Yaser Alosefer, Omer Rana - 2010
[16] The Leurre.com Project: Collecting Internet Threats Information Using a Worldwide Distributed Honeynet - C. Leita, V.H. Pham, O. Thonnard, E. Ramirez-Silva, F. Pouget, E. Kirda, M. Dacier - 2008
[17] ScriptGen: an automated script generation tool for Honeyd - C. Leita, K. Mermoud, M. Dacier - 2005
[18] A Framework for Deception - Fred Cohen, Dave Lambert, Charles Preston, Nina Berry, Corbin Stewart, Eric Thomas - 2001
[19] The use of deception techniques: Honeypots and decoys - Fred Cohen - 2002
[20] Leading Attackers Through Attack Graphs with Deceptions - Fred Cohen, Deanna Koike - 2002
[21] Errors in the Perception of Computer-Related Information - Fred Cohen, Deanna Koike - 2003
[22] Red Teaming Experiments with Deception Technologies - Fred Cohen, Irwin Marin, Jeanne Sappington, Corbin Stewart, Eric Thomas - 2001
[23] Detecting and Characterizing Malicious Websites - Li Xu - 2014
[24] Know Your Enemy: Honeynets - The Honeynet Project - 2006
[25] Know Your Enemy: 2nd Generation Honeynets - The Honeynet Project - 2006
[26] A ZigBee honeypot to assess IoT cyberattack behaviour - Seamus Dowling, Michael Schukat, Hugh Melvin - 2017
[27] IoT honeypot: A multi-component solution for handling manual and Mirai-based attacks - Haris Šemić, Sasa Mrdovic - 2018
[28] How to Design Practical Client Honeypots Based on Virtual Environment - Jin-Hak Park, Jang-Won Choi, Jung-Suk Song - 2016
[29] Set-up and deployment of a high-interaction honeypot: experiment and lessons learned - Vincent Nicomette, Mohamed Kaâniche, Eric Alata, Matthieu Herrb - 2012
[30] Polymorphic Worms Collection in Cloud Computing - Ashraf A. Shahin - 2014
[31] Towards virtual honeynet based on LXC virtualization - Nogal Memari, Shaiful Jahari B. Hashim, Khairulmizam B. Samsudin - 2014
[32] Extend Honeypot Framework to detect old/new cyber attacks - Hemraj Saini, Bimal Kumar Mishra, H. N. Pratihari, T. C. Panda - 2011
[33] Addressing the Cyber Kill Chain: Full Gartner Research Report and LookingGlass Perspectives - 2016
[34] Honeypots: Are They Illegal? - Symantec - 2003
[35] Building a Business Case for Deception - Gartner blog - 2016
[36] Psychology of intelligence analysis - Richards J. Heuer Jr. - 1999
[37] An Evening with Berferd in which a cracker is Lured, Endured, and Studied - Bill Cheswick - 1992
[38] A look at deception - Symantec - 2017
[39] The Empirical Case for Two Systems of Reasoning - Sloman S. - 1996
[40] A Game-Theoretic Taxonomy and Survey of Defensive Deception for Cybersecurity and Privacy - Jeffrey Pawlick, Edward Colbert, Quanyan Zhu - 2017
[41] Toward a general theory of deception - Barton Whaley - 2008
[42] A review on artificial intelligence techniques for developing intelligent honeypot - Wira Zanoramy Ansiry Zakaria, Miss Laiha Mat Kiah - 2012
[43] An adaptive honeypot configuration, deployment and maintenance strategy - Daniel Fraunholz, Marc Zimmermann, Hans D. Schotten - 2017
[44] A First Look: Using Linux Containers for Deceptive Honeypots - Alexander Kedrowitsch, Danfeng (Daphne) Yao, Gang Wang, Kirk Cameron - 2017
[45] A review of dynamic and intelligent honeypots - Wira Zanoramy Ansiry Zakaria, Miss Laiha Mat Kiah - 2013
[46] A Survey on Dynamic Honeypots - Hamid Mohammadzadeh, Roza Honarbakhsh, Omar Zakaria - 2012
[47] An Application of Deception in Cyberspace: Operating System Obfuscation - Sherry B. Murphy, J. Todd McDonald, Robert F. Mills - 2010
[48] Active cyber defense with denial and deception: A cyber-wargame experiment - Kristin E. Heckman, Michael J. Walsh, Frank J. Stech, Todd A. O’Boyle, Stephen R. Di Cato, Audra F. Herber - 2013
[49] BAIT-TRAP: a Catering Honeypot Framework - Xuxian Jiang, Dongyan Xu - 2018
[50] Stalking the Wily Hacker - Clifford Stoll - 1988
[51] The Cuckoo’s Egg: Tracking a Spy Through the Maze of Computer Espionage - Clifford Stoll - 1989
[52] The Art of Deception - Kevin Mitnick - 2002
[53] SCADA honeypots: An in-depth analysis of Conpot - Arthur Jicha, Mark Patton, Hsinchun Chen - 2016
[54] Intelligence-Driven Computer Network Defense Informed by Analysis of Adversary Campaigns and Intrusion Kill Chains - Eric M. Hutchins, Michael J. Cloppert, Rohan M. Amin - 2011
[55] Honeytokens: The Other Honeypot - Lance Spitzner, Symantec - 2013
[56] Honeytraps, A Network Forensic Tool - Alec Yasinsac, Yanet Manzano - 2018
[57] More than passive defense - E. H. Spafford - 2011
[58] A Virtual Honeypot Framework - Niels Provos - 2004
[59] Joint Publication 3-13.4: Military Deception - US Public Intelligence - January 2012

Relevant talks

HITB19 - Hey Attacker! I can see you - 2019
Black Hat US - Towards an Intelligent-Interaction Honeypot for IoT Devices - 2017
DeepSec - Building a Better Honeypot Network - 2016
Black Hat US - Breaking Honeypots For Fun And Profit - 2015
Black Hat US - Bring Back the Honeypots - 2015
32c3 - Breaking Honeypots For Fun And Profit - 2015
Black Hat EU - Honeypot That Can Bite: Reverse Penetration - 2013
CERIAS - Active Cyber Network Defense with Denial and Deception - 2013
Black Hat US - Alternatives to Honeypots - DTK - 2001
Modern Honey Network - 2014

Open Source Implementations

This section outlines the landscape of open source projects and implementations on the topic. For projects whose code repository is available, last-commit dates are reported as at the time of writing, i.e., Nov 2018. Check the git repositories for more up-to-date information.

Large projects and framework-like implementations

The following are either large projects or implementations (some also mentioned in the following table) providing some sort of framework and/or mechanism for the orchestration of multiple HPs.

The Honeynet Project The Honeynet Project is a leading international 501c3 non-profit security research organization, dedicated to investigating the latest attacks and developing open source security tools to improve Internet security. Projects and tools realised within the honeynet project are available here. Notable examples include Sebek, Qebek (QEMU-based HIHP monitoring tool), Cuckoo, Dionaea and Glastopf.

DTK - The Deception Toolkit The Deception Toolkit (DTK) is one of the first tools released after the research work of Fred Cohen, written partially in C and partially in Perl. Installation, configuration and inner mechanisms are well covered in this paper from the SANS reading room. DTK is a piece of history in the deception field, but nowadays it looks like a dead project.

Honeyd Honeyd is a daemon written by Niels Provos (Google) to create virtual hosts on a network that can act as LIHPs. Honeyd can simulate multiple virtual hosts and is capable of simulating the networking stacks of other operating systems by replicating their characteristics according to the fingerprinting done by nmap. HPs defined by honeyd present an interaction that is configurable via specific scripts. Such interaction can either define a specific behavior or forward the connection to specific hosts or processes. Honeyd is meant to be used possibly in conjunction with other tools such as arpd (to claim unallocated IP addresses), honeydsum (to generate summaries from honeyd logs), honeycomb (to automatically generate signatures for NIDS) or honeyview (a graphical honeyd log analyzer). Other tools in support of honeyd activity include honeyd2mysql (to store honeyd logs into a MySQL database) and honeyd-viz (for visualization of statistics from honeyd logs). Most of all, to contain potential exploitation of honeyd itself by adversaries, it is advisable to run honeyd sandboxed. systrace was originally suggested, but given its history of vulnerabilities other mechanisms should be investigated. honeyd-python is a different project implementing LIHPs in Python, based on the same concepts as honeyd and integrated with the MHN. References: Project page - source code - Related publication - Honeyd OpenBSD manual page
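For illustration, a minimal honeyd template in the style of the project's documented examples could look like the fragment below. The personality string, script path, proxy target and bound address are placeholders:

```
create windows
set windows personality "Microsoft Windows XP Professional SP1"
set windows default tcp action reset
add windows tcp port 80 "sh scripts/iis-emul.sh"
add windows tcp port 22 proxy 10.0.0.5:22
bind 192.168.1.201 windows
```

Port 80 is handled by an interaction script, while port 22 is transparently proxied to another host, showing the two interaction modes mentioned above.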

Honeycomb Honeycomb is an extensible honeypot framework created by Cymmetria, designed with a plugin architecture to handle different HP implementations. A repository of the available plugins can be found here. Honeycomb seems a well structured Python project, supports containerization and has documented APIs, also for plugins. Not to be confused with the homonymous tool that generates NIDS signatures from honeyd-detected activities. References: Code repository - Plugins Repository - Documentation

Honeytrap Honeytrap is an extensible open source system for running, monitoring and managing honeypots, written in Go by DutchSec (not the LIHP framework written in C here, which is a different implementation). Honeytrap's architecture includes a central server and agents which expose open ports and forward connections. Decoys can be configured centrally on the server, where other HP implementations like Cowrie or Glutton can be used. Logging is handled centrally as well, in elasticsearch, kafka, splunk, etc. References: Documentation - Main code repository - Front end repository - Example configuration scenarios - Dockerized version

MHN - The Modern Honey Network MHN is a software to manage HPs in terms of data collection, deployment and monitoring. Managed HPs include Suricata, Dionaea, Conpot, Kippo, Amun, Glastopf, Wordpot, ShockPot and Elastichoney, and the monitoring part can display alerts raised by Snort. References: Web page - Main code repository - Google group - Introductory talk

The Intelligent HoneyNet The Intelligent Honeynet is a sort-of-framework whose architecture likewise includes a centralized server and any number of HPs that communicate with it. It includes a set of interesting scripts (in Python) to extract data from the HPs supported by the solution, which include e.g. Gaspot, Conpot, Cowrie, Dionaea and Glastopf. Log collection is done in elasticsearch. Web based UI with dashboard, map, etc., based on Kibana. References: Code repository - Introductory talk

ADHD - Active Defense Harbinger Distribution A distribution of several active defense tools, including multiple honeypots and scripts for quick deployment in given scenarios, including Artillery. References: Project page - Documentation - Github Account - Introductory talk

Artillery Artillery, originally developed by TrustedSec, is a combination of a honeypot, monitoring tool and alerting system, with the goal of eventually evolving into a hardening-monitoring platform as well, to detect insecure configurations on *nix systems. Written in Python and maintained by Binary Defense Systems. References: Product page at TrustedSec - Product page at BDS - Code repository - Slides from Dave Kennedy (TrustedSec)

NOVA The Network Obfuscation and Virtualized Anti-Reconnaissance (Nova) is a project by the DataSoft corporation with the goal of thwarting second-stage reconnaissance (i.e., on internal networks, for lateral movement) by spawning and managing a large number of LIHPs. Mostly implemented in C/C++, NOVA exists both as an open source version and as a commercially supported enterprise version. References: Project page - Enterprise version brochure - Open source code repository - Introduction deck

Beeswarm Beeswarm is a (supposedly dead) project that aimed to provide easy configuration, deployment and management of both client and server honeypots. In Beeswarm, the core concept seemed to be to identify IoCs by observing changes in honeypot activity or in the usage of honeytokens by clients. References: Code repository - Website (retired, from the way-back-machine)

Single implementations

Server Implementations

Name Decoy Description, Notes, Vulnerability, etc. Stack Last Commit Source code
Delilah Elasticsearch Groovy CVE-2015-1427, MSF Exploit PYTHON 2.7 Jun 2015 https://github.com/Novetta/delilah
ESPot Elasticsearch < 1.2 CVE-2014-3120, Exploit JS/NODE Aug 2014 https://github.com/mycert/ESPot
ElasticHoney Elasticsearch CVE-2015-1427 GO Jul 2015 https://github.com/jordan-wright/elastichoney
mysql-honeypotd MySQL LIH C Nov 2017 https://github.com/sjinks/mysql-honeypotd
NoSQLpot Redis Was supposed to be a framework… PYTHON 2.7 Jan 2017 https://github.com/torque59/nosqlpot
MongoDB-HoneyProxy MongoDB No pwd on Admin account JS/NODE Sep 2017 https://github.com/Plazmaz/MongoDB-HoneyProxy
pghoney PostgreSQL LIH GO Sep 2017 https://github.com/betheroot/pghoney
Sticky Elephant PostgreSQL MIH RUBY Apr 2016 https://github.com/betheroot/sticky_elephant
Glastopf Web Application Abandoned - See its successors (SNARE + TANNER) PYTHON 3 Nov 2018 https://github.com/mushorg/glastopf
SNARE + TANNER Web Application (PHP + Redis) https://snare.readthedocs.io/en/latest/ - https://tanner.readthedocs.io/en/latest/ PYTHON 3 Oct 2018 https://github.com/mushorg/snare - https://github.com/mushorg/tanner/
Shadow Daemon Web Application (PHP/Perl/Python) https://shadowd.zecure.org/overview/introduction/ C++ Mar 2016 https://github.com/zecure
StrutsHoneypot Web Application (Java/Struts) CVE-2017-5638 - Written as an apache2 module by Cymmetria Research C Mar 2017 https://github.com/Cymmetria/StrutsHoneypot
MTPot Mirai-infectable device By Cymmetria Research. See also https://github.com/jgamblin/Mirai-Source-Code PYTHON 2.7 Mar 2017 https://github.com/Cymmetria/MTPot/
WebTrap Web Application (Static) Cloner + Dedicated server of deceptive content PYTHON 2.7 Mar 2018 https://github.com/IllusiveNetworks-Labs/WebTrap
basic-auth-pot (bap) Web Application (http-auth) Trivial to detect bruteforce attempts PYTHON 2.7 Jan 2015 https://github.com/bjeborn/basic-auth-pot
django-admin-honeypot Django admin panel Doc at https://django-admin-honeypot.readthedocs.io/en/latest/ PYTHON 2.7 May 2018 https://github.com/dmpayton/django-admin-honeypot
Honeyhttpd Web Server (unspecified) Simple requests recorder apparently PYTHON May 2018 https://github.com/bocajspear1/honeyhttpd
phpMyAdmin Honeypot PHPMyAdmin Meh.. PHP Jul 2015 https://github.com/gfoss/phpmyadmin_honeypot
Shockpot Web Application CVE-2014-6271 (ShellShock). Doc here. Also deployable with MHN. PYTHON Dec 2015 https://github.com/threatstream/shockpot
HoneyPress Wordpress Apparently abandoned PYTHON 2.7 Sep 2016 https://github.com/dustyfresh/HoneyPress
WordPot Wordpress Detects probes for version. See also http://brindi.si/g/projects/wordpot.html PYTHON 2.7 Oct 2018 https://github.com/gbrindisi/wordpot
HonnyPotter Wordpress Available as a wordpress plugin, see https://wordpress.org/plugins/honnypotter/ PHP Dec 2015 https://github.com/MartinIngesen/HonnyPotter
AMTHoneyPot Intel AMT Firmware CVE-2017-5689. Exploit PoC. Details. GO Jan 2014 https://github.com/packetflare/amthoneypot
Ensnare HTTP Server / Web Application Implemented as a Ruby gem. Details here. Also, check this out. RUBY Apr 2017 https://github.com/ahoernecke/ensnare
HoneyPy Multiple LIH. Decoy plugins include Elasticsearch, SIP, NTP, ECHO, Telnet and TFTP. Doc here. PYTHON 2.7 Nov 2018 https://github.com/foospidy/HoneyPy
HoneyGrove Multiple by Universität Hamburg. Twisted based. Decoys include SSH, FTP and HTTP. PYTHON 2.7 May 2018 https://github.com/UHH-ISS/honeygrove
Honeyport Just open port Implementation in Python + Bash. See here and here. Meh PYTHON 2.7 Feb 2017 https://github.com/securitygeneration/Honeyport
HoneyPrint LPR LIH for printing service (LPR) decoy PYTHON 2.7 Jan 2016 https://github.com/glaslos/honeyprint
MICROS Oracle Hospitality Simphony CVE-2018-2636. Implementation by Cymmetria. PYTHON 2.7 Feb 2018 https://github.com/Cymmetria/micros_honeypot
RDPY Microsoft RDP Twisted based implementation PYTHON 2.7 Aug 2018 https://github.com/citronneur/rdpy
HoneySMB SMB HIH, does not seem very popular PYTHON 2.7 Apr 2018 https://github.com/r0hi7/HoneySMB
Toms Honeypot MSSQL, SIP, VNC, radmin Simple python implementation of a LIH PYTHON 2.7 Apr 2015 https://github.com/inguardians/toms_honeypot
WebLogic Honeypot Oracle WebLogic Server LIH. CVE-2017-10271. Exploit. Implementation by Cymmetria. PYTHON 2.7 Feb 2018 https://github.com/Cymmetria/weblogic_honeypot
Whiteface HTTP, RDP, VNC by csirtgadgets. Apparently leverages Toms Honeypot and kippo. Twisted based. PYTHON 2.7 Apr 2015 https://github.com/csirtgadgets/csirtg-honeypot
HoneyNTP NTP Minor extension to https://github.com/limifly/ntpserver PYTHON 2.7 Mar 2014 https://github.com/fygrave/honeyntp
honeypot-ftp FTP Twisted based PYTHON 2.7 Aug 2014 https://github.com/alexbredo/honeypot-ftp
Honeytrap VNC, telnet, ssh, redis, smtp, snmp, ldap, ipp, ftp, elasticsearch and more “Extensible and OSS for running, monitoring and managing honeypots”. Doc here. Quite good design and interesting features. High interaction in LXC. GO Nov 2018 https://github.com/honeytrap/honeytrap
DemonHunter http, telnet, vnc Master-agent design. Doc here. PYTHON 3 Apr 2018 https://github.com/RevengeComing/DemonHunter
Conpot modbus, s7, snmp, bacnet, enip, tftp, http, ipmi, ftp Thought as HP for ICS/SCADA context. Dockerized. PYTHON Nov 2018 https://github.com/mushorg/conpot
GasPot Veeder-Root Guardian AST The decoy is a tank controller typical of the oil & gas industry. Interestingly, instances are created with some level of randomization so as not to look (well… almost) alike. PYTHON Aug 2016 https://github.com/sjhilt/GasPot
GridPot IEEE 13 Based on conpot and Gridlab-d, hp for power-grid-like controlling systems PYTHON Mar 2015 https://github.com/sk4ld/gridpot
Damn Simple HoneyPot Raw socket / open port Designed for simplicity, multiple alert handler PYTHON Mar 2016 https://github.com/naorlivne/dshp
OpenCanary SMB, SNMP, RDP (Multiplatform) Seems a well structured project. Docs here. PYTHON 2.7 Oct 2018 https://github.com/thinkst/opencanary
HP for Router BD TCP32764 Backdoor confirmed in a significant list of routers around 2013 Decoy is on TCP port 32764 JS/NODE Feb 2014 https://github.com/knalli/honeypot-for-tcp-32764
WApot Belkin router N300 (web interface) Default setup of this popular wireless router. GO Nov 2018 https://github.com/lcashdol/WAPot
Ghost USB HP USB device Emulates a USB device to detect whether an infection is attempted when it is connected C Mar 2015 https://github.com/honeynet/ghost-usb-honeypot
Honeyperl telnet, squid, smtp, wingates Perl implementation of LIHs for different decoys PERL Jun 2003 https://sourceforge.net/projects/honeyperl/
Heralding ftp, telnet, ssh, http, https, pop3, pop3s, imap, imaps, smtp, vnc, postgresql and socks5 Credentials-harvesting honeypot PYTHON Nov 2018 https://github.com/johnnykv/heralding
HoneyWRT RDP, VNC, MSSQL, telnet, FSS, Tomcat admin Twisted based. Deceased project. PYTHON Apr 2015 https://github.com/CanadianJeff/honeywrt
HonTel Telnet HIHP, quite broken / breakable, chroot based (guess hard to maintain) PYTHON 2.7 Nov 2017 https://github.com/stamparm/hontel
LaBrea None LIHP. Tarpit for unused IP spawning VMs. Detailed info here and here. C Oct 2003 https://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/labrea/labrea/
UDPot DNS (udp 53) Twisted based DNS MIHP PYTHON 2.7 Mar 2014 https://github.com/jekil/UDPot
YAFH Telnet (CISCO), SSH HIHP. Dockerized / Dockerizable. GO Dec 2017 https://github.com/fnzv/YAFH
arctic-swallow Telnet, NetBios, MSRPC, SMB LIHP. PYTHON 2.7 Oct 2017 https://github.com/ajackal/arctic-swallow
glutton ftp, http, jabber, mqtt, rdp, smb, smtp, telnet, ssh MIHP. Can also act as a proxy for ssh and (in progress) pure TCP for HIHP. GO Nov 2018 https://github.com/mushorg/glutton
go-HoneyPot None (plain tcp/udp ports) LIHP, simple project. GO Dec 2017 https://github.com/Mojachieee/go-HoneyPot
honeymail SMTP MIHP. Nice idea but broken (and likely abandoned) project. GO Aug 2016 https://github.com/sec51/honeymail
imap-honey IMAP, IMAPS MIHP. GO Apr 2018 https://github.com/yvesago/imap-honey
portlurker Configurable banners LIHP with easily configurable banners RUST Jan 2018 https://github.com/bartnv/portlurker
Telnet IOT Honeypot Telnet IOT-Stats collection as a main goal. Include a nice interface. PYTHON 2.7 Jan 2018 https://github.com/Phype/telnet-iot-honeypot
vnclowpot VNC Logs all attempts and responses to a static VNC challenge, go stdlib only. GO Sep 2017 https://github.com/magisterquis/vnclowpot
Blacknet SSH Paramiko-based SSH MIHP. Integrable with Cowrie. Dependencies include a MySQL db. PYTHON Aug 2017 https://github.com/morian/blacknet
Cowrie SSH, Telnet MIHP. Interesting features like a fake filesystem, file collection, and forwarding to other HPs for unhandled protocols. Dockerized and under active development. PYTHON Nov 2018 https://github.com/cowrie/cowrie
twisted-honeypots SSH, FTP, Telnet A collection of MIHP based on twisted and python3 logging to mysql. PYTHON 3 Apr 2018 https://github.com/lanjelot/twisted-honeypots
Kippo SSH MIHP which inspired many others. It’s itself inspired by kojoney. PYTHON 2.7 Sep 2016 https://github.com/desaster/kippo
hnypots-agent SSH, FTP, Elasticsearch, HTTP Credentials collector and more. Interface included. Data from a live instance here. GO Oct 2017 https://github.com/joshrendek/hnypots-agent
Hornet SSH Virtualhost-based topology MIHP in which structures of multiple VHs can be defined. On each host, the attacker can interact with the others on the topology via SSH, ping, and wget. Sandboxed filesystems on each VH. GO Apr 2018 https://github.com/czardoz/hornet
sshd-honeypot SSH Proxy for Cowrie based on OpenSSH 7.3p1 C Aug 2018 https://github.com/amv42/sshd-honeypot
sshsyrup SSH Yet another SSH honeypot. Support for sftp, scp, fake shell and virtual fs GO Oct 2018 https://github.com/mkishere/sshsyrup
Mailoney SMTP Open SMTP relay harvesting credentials PYTHON May 2018 https://github.com/awhitehatter/mailoney
Artemisa SIP HP for VOIP. Documentation and details here. Windows only. C Feb 2011 https://sourceforge.net/projects/artemisa/files/
BearTrap None (open ports) Uses triggers for raw TCP/UDP interactions (just open ports) RUBY Sep 2015 https://github.com/tbennett6421/Beartrap
HoneyThing CWMP (TR-69) Mimics the CPE WAN Management Protocol (CWMP) protocol according to TR-069 PYTHON 2.7 Mar 2016 https://github.com/omererdem/honeything
KAKO Telnet, HTTP, HTTPS MIHP. Telnet gives access to a busybox-like environment. HTTP/HTTPS simulates an uhttpd service with no routes. The project is basically meant as a framework to implement specific simulations, e.g., for different devices and protocols. See the simulations repository for examples (e.g., uPnP over Huawei or Realtek, TPLink, Linksys or Netgear equipment, etc.). PYTHON Jan 2018 https://github.com/darkarnium/kako https://github.com/darkarnium/kako-simulations/
Honeeepi Multiple Honeypot sensor on Raspberry Pi (Raspbian) via Conpot, Cowrie, Dionaea, Glastopf, Honeyd or Amun MIXED Oct 2016 https://sourceforge.net/projects/honeeepi/files/
Honeypot Camera IP Camera Simulates a monitoring camera PYTHON Jun 2015 https://github.com/alexbredo/honeypot-camera
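Many of the low-interaction servers in the table above (e.g., go-HoneyPot, portlurker, BearTrap) boil down to the same pattern: open a port, optionally present a protocol banner, and log whatever the client sends before closing. A minimal sketch of that pattern follows; the ports and banner strings are invented for illustration and are not taken from any listed project.

```python
import socket
import threading
from datetime import datetime, timezone

# Hypothetical banner map: listening port -> greeting bytes presented to the client.
BANNERS = {2121: b"220 FTP server ready\r\n", 2323: b"login: "}

def handle(conn, addr, port):
    """Send the banner, log the first bytes the client sends, then close."""
    try:
        conn.sendall(BANNERS.get(port, b""))
        conn.settimeout(5)
        data = conn.recv(1024)
        print(f"{datetime.now(timezone.utc).isoformat()} "
              f"{addr[0]}:{addr[1]} -> :{port} sent {data!r}")
    except OSError:
        pass
    finally:
        conn.close()

def listen(port):
    """Accept connections on one port, one handler thread per client."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("0.0.0.0", port))
    srv.listen()
    while True:
        conn, addr = srv.accept()
        threading.Thread(target=handle, args=(conn, addr, port), daemon=True).start()

# To run the trap, start one listener thread per configured port, e.g.:
#   for p in BANNERS:
#       threading.Thread(target=listen, args=(p,), daemon=True).start()
```

Real implementations differ mainly in how much protocol state they emulate after the banner (the LIHP/MIHP/HIHP distinction) and where the log lines go.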

Client Implementations

As honeyclients are not the main focus of this research, only some implementations encountered incidentally are reported here.

Name Decoy Description, Notes, Vulnerability, etc. Stack Last Commit Source code
Capture-HPC HICHP Capable of observing the file system, registry and processes of a system at kernel level. C++ Nov 2009 https://projects.honeynet.org/capture-hpc - https://projects.honeynet.org/svn/capture-hpc/
imhoneypot Google Talk, MSN, ICQ, QQ Instant messenger honeypot PYTHON 2.7 Mar 2016 https://github.com/mushorg/imhoneypot
Thug Web Browser Mimics user-driven network client applications to detect and emulate malicious content. Detection via yara. PYTHON Nov 2018 https://github.com/buffer/thug
Thug’s Rumāl None GUI for Thug and more. Docs here. PYTHON Jan 2017 https://github.com/thugs-rumal/rumal
ThugD None Thug Distributed Task Queuing Docs here. PYTHON Jan 2016 https://github.com/Aki92/Thug-Distributed
YALIH Web Browser or Network Application Yet Another Low Interaction Honeyclient - LICHP designed to detect malicious websites through signature, anomaly and pattern matching techniques. Integrates Bing API, geolocation and signatures from ClamAV and COMODO AV db. Pattern matching detection via yara (auto-signature generation). PYTHON Sep 2018 https://github.com/Masood-M/yalih

Honeytokens

Name Type Description Stack Last Commit Source code
Canarytokens URL, mail, document, embedded content (e.g., image) Tokens including URLs, DNS, email addresses, images, microsoft word documents, PDF and windows folder. Live demo instance available at http://canarytokens.org/. Code include reporting dashboard. PYTHON Oct 2018 https://github.com/thinkst/canarytokens
CEF Syslog Canary document Set of documents and executables embedding VBA scripts, autoIT and powershell code to generate CEF (Common Event Format) events and send them to a syslog server or SIEM-like system. VBA / POWERSHELL Feb 2010 https://github.com/nterl0k/CEF-Syslog-Canary
DCEPT AD Credentials Domain Controller Enticing Password Tripwire (DCEPT) creates credentials in AD discoverable only via memory extraction. Multiple components: server is written in python, agent in C#. PYTHON / C# Sep 2016 https://github.com/secureworks/dcept
honeyλ None Serverless monitoring and alerting based on the Serverless framework and designed for AWS Lambda. PYTHON 3 Oct 2018 https://github.com/0x4D31/honeyLambda
honeyku None Monitoring and alerting platform based on Flask for honeytokens in a Heroku-based deployment scheme. PYTHON 3 Oct 2018 https://github.com/0x4D31/honeyku
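The core mechanic behind token services like Canarytokens can be sketched in a few lines: mint a unique token bound to a memo recording where it was planted, embed it in a URL (or DNS name, document, etc.), and raise an alert whenever an incoming request carries a known token. The domain and URL layout below are invented for illustration; the real service persists tokens server-side and supports many more trigger types.

```python
import secrets

# In-memory token registry: token -> memo describing where it was planted.
# (Illustrative only; a real deployment would persist this.)
TOKENS = {}

def mint_token(memo):
    """Create a unique token and return the URL to embed in a lure artifact."""
    token = secrets.token_hex(8)
    TOKENS[token] = memo
    # canary.example.com is a placeholder domain, not a real endpoint.
    return f"http://canary.example.com/t/{token}/img.png"

def check_request(path):
    """Return the memo if a requested path carries a known token, else None."""
    parts = path.strip("/").split("/")
    if len(parts) >= 2 and parts[0] == "t" and parts[1] in TOKENS:
        return TOKENS[parts[1]]
    return None
```

Because each token is unique and never referenced by legitimate activity, any hit on `check_request` is a high-confidence signal that the artifact carrying it was opened or exfiltrated.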

For the sake of reference, here are the specs for the CEF format:
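In short, a CEF message is a pipe-delimited header (`CEF:Version|Device Vendor|Device Product|Device Version|Device Event Class ID|Name|Severity|`) followed by space-separated key=value extension pairs, with backslash/pipe escaping in the header and backslash/equals escaping in extension values. A small sketch of a formatter for such events (the vendor and product names are placeholders):

```python
def cef_escape_header(value):
    # In CEF header fields, backslash and pipe must be escaped.
    return value.replace("\\", "\\\\").replace("|", "\\|")

def cef_escape_ext(value):
    # In extension values, backslash and equals sign must be escaped.
    return value.replace("\\", "\\\\").replace("=", "\\=")

def cef_event(vendor, product, version, signature_id, name, severity, **ext):
    """Build a CEF:0 event string from header fields and extension key/values."""
    header = "|".join(cef_escape_header(str(v))
                      for v in (vendor, product, version, signature_id, name, severity))
    extension = " ".join(f"{k}={cef_escape_ext(str(v))}" for k, v in ext.items())
    return f"CEF:0|{header}|{extension}"

# Example: the event a canary document might emit when opened.
# cef_event("ACME", "CanaryDoc", "1.0", "100", "Decoy document opened", 8,
#           src="10.0.0.5", suser="jdoe")
# -> "CEF:0|ACME|CanaryDoc|1.0|100|Decoy document opened|8|src=10.0.0.5 suser=jdoe"
```

A string built this way can be shipped to a collector as the payload of a plain syslog message, which is exactly what the CEF Syslog Canary documents above do via VBA/powershell.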

Commercial Products

Cymmetria - Mazerunner (+ActiveSOC)

Note: in Sep 2019, Cymmetria was acquired by the US private equity growth firm Stage Fund. Cymmetria Inc. is a US cyber security company founded in 2014; according to Crunchbase, it received 10+M USD in funding. Cymmetria provides CND solutions and managed services for incident response and threat hunting, often based on deception mechanisms. Cymmetria has been quite active in the dissemination of deception technologies, both by proposing relevant talks [T3, T5] and by publishing open source implementations of decoys [see related table]. They advertise two main products: ActiveSOC, a deception-based tool that reduces the workload of SOC analysts by automating detection and classification of security incidents, and MazeRunner, an orchestrator of deception mechanisms that allows creating deceptive stories by combining the deployment of decoys and breadcrumbs (e.g., honeytokens). The two products are meant to work together in different phases of CND (i.e., MazeRunner for setting the stories in motion, ActiveSOC for automated detection and response), as well as to plug into existing industry-standard CND infrastructure (e.g., by integrating with a SIEM or SIRP). MazeRunner exposes the following features:

- Physical or virtual appliance, both on-prem and as a service.
- User access via a centralized console (web-based UI).
- Wizard to create stories.
- API and SDK for integration with further automation.
- Support for syslog and STIX/TAXII. Also integrates with Chef, Puppet and Tanium for deployment if needed.
- Support for deployment of decoys in AWS.
- Management of decoy deployment (e.g., in nested VMs attached to VLAN trunks) and breadcrumb placing (e.g., via integration with AD).

Interestingly, two kinds of breadcrumbs (i.e., honeytokens) are used: one type leads to decoy systems (HPs), while the other is detected by integrated systems (SIEM) if used on real systems (e.g., cookies, credentials, etc.).
Deception stories are also called "campaigns", and the product has a feature to refresh them, i.e., update the so-called breadcrumbs in such a way that they look recent. A list of supported decoys and breadcrumbs is provided in the MazeRunner whitepaper. As far as terminology goes, breadcrumbs are mostly honeytokens, except those called network breadcrumbs, which are basically honeyactivity. Decoys include git, Intel AMT, IoT devices vulnerable to Mirai, MySQL, OpenVPN, RDP, SMB, SMTP (LIHP), SSH, FTP and HTTP (full decoys with preloaded applications like MediaWiki, phpMyAdmin, SugarCRM, etc.). ActiveSOC, on the other hand, focuses on implementing automation between what is detected, possibly via correlation rules at SIEM level, and breadcrumb placement or decoy deployment. A short but more detailed overview is presented in the related whitepaper. Both products' whitepapers are attached below.
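The breadcrumb "refresh" idea can be illustrated with a trivial sketch: periodically rewrite the timestamps of planted artifacts so they fall within a plausible recent window, making the lures look freshly used. This is only an illustration of the concept, not MazeRunner's actual mechanism.

```python
import os
import random
import time

def refresh_breadcrumbs(paths, max_age_days=7):
    """Touch planted breadcrumb files so their access/modification times
    fall at a random point within the last max_age_days days."""
    now = time.time()
    for path in paths:
        # Pick a plausible random timestamp in the recent past.
        ts = now - random.uniform(0, max_age_days * 86400)
        os.utime(path, (ts, ts))
```

A full implementation would also refresh content-level freshness cues (dates inside documents, log lines, credential validity), since timestamps alone are easy for a careful adversary to cross-check.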

Thinkst Canary

Canary is the honeypot solution from Thinkst (the same company providing the canarytokens service). It is available as a set of decoys in the form of either hardware devices or virtual cloud-based instances, combined with a set of honeytokens. The solution is advertised as extremely simple to put in place and configure, as well as giving high confidence in identifying adversaries on the network. The HPs come with a set of pre-defined configurations to mimic, as decoys, the most common Windows or Linux servers or small routers. The attached Birding Guide presents some clear configuration scenarios that leverage these characteristics. The product also has an open source version, OpenCanary [docs], already listed in the table above.

SmokeScreen - IllusionBlack

SmokeScreen is a small Indian company founded in 2015, selected by Gartner as a Cool Vendor in AI in 2017. Their solution, IllusionBlack, claims to combine multiple deception mechanisms deployed in high numbers, including decoys for phones and email accounts (i.e., honeypeople), as well as the ability to deflect (i.e., redirect) adversaries to decoys deployed in cloud environments. IllusionBlack also has a feature to dynamically generate (and re-generate) content within the decoys to keep the simulation plausibly active. In their dissemination material, SmokeScreen claims extensive expertise with deception strategies and actual deception campaigns in cyber security. IllusionBlack's peculiar features include:

- NLP-based processing of data collected, e.g., through honeypeople decoys (i.e., phishing emails).
- Integration with AD and an AD decoy.
- Realistic dynamic content generation for decoys.

Reference material from the SmokeScreen website is attached below. The Top 20 Lateral Movement Tactics whitepaper provides an idea of the decoys the solution implements.

Keyfocus - KFSensor

KeyFocus Ltd., originally born as a software house, was founded in 1994 and is based in London. Their solution, KFSensor, focuses exclusively on the Windows platform and basically implements a set of HPs coupled with an alerting system. As of the end of 2018, they claim 12 years of production use as market leader. From the information available on their website, the solution seems mostly host-based, although the enterprise edition presents centralized management features. A list of the supported decoys is available on their features page.

DataSoft - Nova

DataSoft Corporation is a US-based company founded in 1995. Its division Nova Network Security provides a solution, simply called Nova, which includes a number of configurable decoys and a related alerting system. The solution is the commercial version of the homonymous open source tool (source code here), and is offered coupled with a hardware appliance available in three sizes.

Symantec - Endpoint Protection

Although not strictly a deception-based solution, Symantec Endpoint Protection, from release 14.1, employs deception mechanisms to enhance its endpoint detection and response (EDR) capabilities. The product seems to do so by using basically honeytokens, in the form of credentials, files, etc. that can be easily identified when accessed, transferred or used, to raise alerts for the CND staff or as part of the product's integration with other CND solutions. Symantec is also likely to boost this feature of its EDR solution specifically in relation to AD reconnaissance, after the acquisition of Javelin Networks.

Attivo Networks

Attivo Networks is a US-based company founded in 2011. According to Crunchbase, it received more than 40M USD in funding. The company claims to be a leader in deception technologies for in-network threat detection and response. Attivo Networks offers a number of products as part of what they call the ThreatDefend platform, which is basically a deception-based detection and response platform. These products are namely:

- BOTsink - An implementation of multiple network decoys triggering alerts. Decoys simulate services targeted by ransomware, vulnerable applications, and systems with fake data, documents and credentials. BOTsink seems to be offered in a number of configurations, i.e., a cloud-based instance, virtual or physical appliance, and claims to be capable of simulating specialized devices including IoT devices, POS, medical IoT devices, routers and ICS.
- ThreatStrike - An implementation of "endpoint deception", i.e., decoys simulating employee machines, capable of mimicking multiple platforms (Windows, Linux, macOS) and of including fake credentials (i.e., honeytokens).
- ThreatDirect - A completely cloud-based deceptive simulation that can be connected to a distributed environment, such as those of companies with different branch offices.
- ThreatPath - A tool to enumerate and evaluate potential attack paths within an infrastructure to assess the related risks.
- ThreatOps - A playbook-based incident response automation solution.
- ThreatInject - An attack simulation tool to validate their deception deployments.

Interestingly, Attivo Networks attempted a comparison between the capabilities of its ThreatDefend platform and the MITRE ATT&CK matrix (included in the following attachments).

GuardiCore - Centra

GuardiCore focuses on cloud-based and hybrid cloud environments and provides a solution called Centra, which is built on the concept of micro-segmentation (i.e., isolating network paths for specific application flows so that only the logical components that are supposed to interact with each other can do so on the network) through policy enforcement. Highly segmented network environments are ideal for networked deception decoys, because the choices available to adversaries for lateral movement are reduced to what the network segmentation allows. Therefore, the solution provided by GuardiCore employs deception decoys (specifically, they claim to use HIHPs) to engage attackers and gather intelligence for investigation. They also claim to have "patented dynamic deception with additional methods designed for the unique requirements of the cloud which provides coverage against attack vectors that other products miss". The solution comes with quite significant infrastructure requirements, as described at the end of the GuardiCore Centra data sheet (the second attachment below).

Illusive networks

Illusive Networks is an Israel-based company founded in 2014 and backed with 30M USD according to Crunchbase. Their flagship product is called the Illusive Platform and is designed to counter APT initiatives by providing high visibility over attempts at lateral movement. The solution uses an approach they refer to as Deception Everywhere, meaning it includes lightweight decoys to be placed, in theory, on every endpoint of the organization using it. They also provide a couple of specific products, namely TransferGuard (basically designed for the financial services industry and employing honeytokens) and MainFrameGuard (which seems to be a repackaging of their platform focused on environments where mainframes are present). They claim their products seamlessly integrate with leading IT solutions such as Cisco ISE & pxGrid, Splunk, ArcSight, CyberArk and Tanium.

More dissemination material from Illusive Networks:

CNN video on Deception featuring Illusive Networks. Contact: Brad Cohen - Business Development Representative - +44 20 3984 9824 - bcohen@illusivenetworks.com

TopSpin - DECOYnet, now Fidelis Deception

TopSpin Security was an Israel-based company providing a solution called the DECOYnet platform, available as a virtual or physical appliance. The solution works as a dynamic honeypot system, i.e., it monitors traffic on a segmented network and generates decoys and traps based on its configuration and the hosts attached, with the ability to dynamically adapt to network configuration changes. Decoys used include HPs in specific VLAN configurations, both for general purpose IT systems and for specific environments such as IoT devices, as well as honeytokens. TopSpin has been acquired by Fidelis Cybersecurity, a US-based company serving large customers such as IBM and the US DoC. Fidelis is now distributing DECOYnet as a solution simply called Fidelis Deception.

TrapX

TrapX is a US-based company founded in 2012, claiming to be the "World Leader in Cyber Deception Technology". They offer a solution called DeceptionGrid, which claims to employ multiple decoys including HPs (of low, medium and high interaction), honeyactivity and honeytokens in a minefield-like deployment scheme. A number of PR resources and whitepapers are available on their website for several case studies, including its integration with several industry-common security solutions such as Carbon Black, the MS Azure cloud platform, AWS or KVM virtualization technology.

- Deception Solution Providers Must Prepare for Market Consolidation - Gartner, 2018
- Competitive Landscape: Distributed Deception Platforms - Gartner, 2016
- Emerging Technology Analysis: Deception Techniques and Technologies Create Security Technology Business Opportunities - Gartner, 2015 - Free-to-access version available here, courtesy of TrapX.

Related tools

The following are tools that could turn out useful in designing experiments and solutions for deception initiatives. They were not intentionally researched, but rather encountered during this research, and are reported here so as not to waste the information. Last commit dates were collected in Nov 2018.

Name Description Stack Last Commit Source code
Honeymap Client-Server map to visualize honeypot data. Backend in go, frontend in coffeescript. MIXED Sep 2013 https://github.com/fw42/honeymap
Honeyscore A service by Shodan that assigns a 0-10 score to IP addresses based on collected intel, indicating how likely each is to be a honeypot. UNKNOWN Closed / N/A https://honeyscore.shodan.io/
Honeybits Not actually a honeypot, but more a tool to lead attackers to the HPs by creating easily discoverable fake intel (fake bash history entries, fake ARP table entries, fake configuration files, fake LSASS entries, fake registry keys, etc.). Just a POC, but the concept is significantly interesting. GO Oct 2018 https://github.com/0x4D31/honeybits
Systrace Trace and define policies for system calls. Can be used to sandbox honeypot implementations (suggested for honeyd) C 2008 http://www.citi.umich.edu/u/provos/systrace/ - http://cvsweb.netbsd.org/bsdweb.cgi/src/bin/systrace/
OFPOT An openflow script to redirect traffic directed towards unused IPs to an honeypot PYTHON Jan 2013 https://github.com/upa/ofpot
Hale Botnet C&C monitor. Nice modular design. PYTHON Oct 2017 https://github.com/pjlantz/Hale
DNSMole Analyzes DNS traffic to potentially detect botnet C&C servers and infected hosts C Mar 2017 https://code.google.com/archive/p/dns-mole/ - https://github.com/redbad2/dns-mole
mitmproxy Interactive, SSL-capable HTTP intercepting proxy with a console interface PYTHON Nov 2018 https://mitmproxy.org/
VMCloak Automated Virtual Machine Generation and Cloaking for Cuckoo Sandbox. PYTHON Oct 2018 https://github.com/hatching/vmcloak
Antivmdetection Script-generated virtualbox template to make VM detection harder PYTHON Aug 2016 https://github.com/nsmfoo/antivmdetection
LibVMI C library with Python bindings that makes it easy to monitor the low-level details of a running virtual machine C Nov 2018 http://libvmi.com/ - https://github.com/libvmi/libvmi - PYTHON bindings
peepdf Analyze PDF files to catch malicious ones (including shellcodes / js macros / etc) PYTHON Nov 2016 https://github.com/jesparza/peepdf
cc2asn Handy service to query for ASN/CC/address-spaces (e.g., to populate an interface with details upon detection) PYTHON Jun 2017 https://github.com/toringe/cc2asn, live at http://www.cc2asn.com/ (entire DB here)
MockSSH Just what the name suggests.. might come in handy to implement restricted fake ssh services PYTHON Jan 2017 https://github.com/ncouture/MockSSH
cowrie2neo Parses Cowrie logs and loads them into a neo4j graph db PYTHON Oct 2017 https://github.com/xlfe/cowrie2neo
HoneyDrive Live CD linux distribution equipped with many HP-related implementations (e.g., kippo, amun, dionaea, glastopf, conpot, labrea, thug, phoneyc) and tools. MIXED Jul 2014 https://bruteforcelab.com/honeydrive
Sebek Data capture tool to record attacker activity on the HP without detection. See here. C Feb 2013 https://github.com/honeynet/sebek/
Qebek QEMU-based Sebek. Same as Sebek but for QEMU-based VM (HIHPs). See here. C Jun 2011 https://github.com/honeynet/qebek
Xebek Xen-based Sebek. Same as Sebek but for Xen-based VM (HIHPs). See here. C Jul 2011 https://projects.honeynet.org/sebek/browser
Tripwire-open-source Open source remake of the well known integrity monitoring tool. C++ Apr 2018 https://github.com/Tripwire/tripwire-open-source/
OSfuscate TCP/IP flags & timing configuration to obfuscate the OS type and version (for windows) C++ Mar 2008 https://www.irongeek.com/security/osfuscate-change-your-windows-os-tcp-ip-fingerprint-to-confuse-p0f-networkminer-ettercap-nmap-and-other-os-detection-tools.htm
HoneypotBuster Tool to uncover honeytokens, honey breadcrumbs, and honeypots in some configurations. See here. POWERSHELL Dec 2017 https://github.com/JavelinNetworks/HoneypotBuster
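Several of the tools above, notably Honeybits, revolve around planting easily discoverable lures that lead an intruder toward the HPs. A minimal sketch of one such lure, appending fabricated shell history entries that point at a decoy host; the address, account name and commands below are invented for illustration.

```python
# Hypothetical lure commands pointing at a decoy host; the address and
# credentials are fabricated and should match a deployed honeypot.
FAKE_HISTORY = [
    "ssh backup@10.9.8.7",
    "scp /etc/hosts backup@10.9.8.7:/tmp/",
    "mysql -h 10.9.8.7 -u backup -p inventory",
]

def plant_history(history_file):
    """Append fake entries to a shell history file (e.g., ~/.bash_history)
    so an intruder reading it is led toward the decoy host."""
    with open(history_file, "a") as fh:
        for cmd in FAKE_HISTORY:
            fh.write(cmd + "\n")
```

The same append-a-plausible-artifact idea extends to the other lure types Honeybits covers (ARP entries, config files, registry keys); the common thread is that only an intruder snooping around has any reason to follow them.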


me

My name is Adam Lichonvsky and I'm a proud father and researcher.