How a Graph-Based Approach Can Elevate Your Cloud Security

Roy Maor
Thursday, Mar 11th, 2021

Ever wondered how large organizations map complex cloud architecture, complete with dynamic assets, fast-paced changes, and tightly woven interdependencies? Here’s your introduction to using graph theory for cloud risk management: a science-driven approach to connected cloud data that reduces cyber security risk.

The Building Blocks of Cloud Environments 

First, let’s answer one key question. What are cloud environments made of? Let’s use an example of AWS and Kubernetes infrastructures and list the main building blocks of any cloud environment with examples of relevant asset types for each block. 

Identity – Users, Groups, Roles, Policies, Accounts, Access Keys. 

Compute – EC2 Instances, Lambda Functions, Kubernetes Pods. 

Network – Subnets, Load Balancers, Security Groups. 

Data – DynamoDB, RDS databases, Kinesis Streams.

Service – ECS, KMS, SQS. 

Storage – S3 Buckets, EBS. 

As you can see, each of these building blocks can contain dozens of asset types, each type with a unique purpose and distinct properties and connections that distinguish it from others.  
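To make the list above concrete, here is a minimal sketch of the same taxonomy as a lookup table. The type labels (such as `DynamoDBTable` or `EBSVolume`) and the `block_of` helper are hypothetical names chosen for illustration, not identifiers from any cloud provider's API.

```python
# Hypothetical taxonomy: building block -> example asset types from the list above.
BUILDING_BLOCKS = {
    "Identity": ["User", "Group", "Role", "Policy", "Account", "AccessKey"],
    "Compute": ["EC2Instance", "LambdaFunction", "KubernetesPod"],
    "Network": ["Subnet", "LoadBalancer", "SecurityGroup"],
    "Data": ["DynamoDBTable", "RDSDatabase", "KinesisStream"],
    "Service": ["ECS", "KMS", "SQS"],
    "Storage": ["S3Bucket", "EBSVolume"],
}

# Reverse index: asset type -> building block it belongs to.
BLOCK_OF = {
    asset_type: block
    for block, asset_types in BUILDING_BLOCKS.items()
    for asset_type in asset_types
}


def block_of(asset_type: str) -> str:
    """Return the building block for an asset type, or "Unknown" if unmapped."""
    return BLOCK_OF.get(asset_type, "Unknown")
```

A real environment would have far more asset types per block, but the shape stays the same: every asset type belongs to exactly one block.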

Now, imagine the scale of S&P 500 enterprises’ cloud environments, with their DevOps and IT teams using, creating, configuring, and removing cloud assets every single day. We’ve suddenly got ourselves a dynamic mega-puzzle with vast numbers of pieces (cloud entities) and edges (logical connections between them). How can we solve this kind of puzzle? Graphs.

The Process of Mapping a Cloud Environment 

To get an accurate view of your cloud infrastructure, you need a map. Cloud mapping works by assessing the relationships between cloud assets, so to create your graph, you’ll need to build an explicit and well-defined relationship table stating all the possible links between the assets and how each link can be deduced from the data collected. The topological result of this stage is a graph containing all cloud entities and the links between them.

Each link is created with two elements in mind. First, direction. This means we would ask “Is asset A connected to asset B or vice versa?” Next, type. The question here is more complex and could be “Is asset A contained in asset B, attached to it, exposing it?” and more.  
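One way to picture a relationship table is as a list of rules that turn collected asset records into directed, typed links. The sketch below is only an illustration: the record fields (`subnet_id`, `security_group_id`, `target_instance_id`), the rule format, and the link type names are all assumptions, not the schema of any real collector or cloud API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    source: str    # ID of the asset the link starts from
    target: str    # ID of the asset the link points to
    rel_type: str  # link type, e.g. "CONTAINS", "ATTACHED_TO", "EXPOSES"


# Hypothetical relationship table. Each rule reads:
# (record type, field holding the other asset's ID, link type, reverse)
# reverse=True means the link points from the referenced asset to the record.
RULES = [
    ("EC2Instance", "subnet_id", "CONTAINS", True),              # Subnet -> Instance
    ("EC2Instance", "security_group_id", "ATTACHED_TO", True),   # SecurityGroup -> Instance
    ("LoadBalancer", "target_instance_id", "EXPOSES", False),    # LoadBalancer -> Instance
]


def derive_edges(records):
    """Apply every rule to every collected record and return the resulting edges."""
    edges = set()
    for record in records:
        for record_type, field, rel_type, reverse in RULES:
            if record["type"] != record_type or field not in record:
                continue
            own_id, other_id = record["id"], record[field]
            if reverse:
                edges.add(Edge(other_id, own_id, rel_type))
            else:
                edges.add(Edge(own_id, other_id, rel_type))
    return edges


# Made-up sample data standing in for what a collector might return.
records = [
    {"type": "EC2Instance", "id": "i-1", "subnet_id": "subnet-1", "security_group_id": "sg-1"},
    {"type": "LoadBalancer", "id": "lb-1", "target_instance_id": "i-1"},
]
edges = derive_edges(records)
```

Keeping direction and type explicit on every edge is what later lets a query distinguish "the subnet contains the instance" from "the load balancer exposes the instance".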

The graph should be a cross-platform graph, meaning it will contain assets from a multi-cloud environment, collected from various cloud providers (AWS, Azure, GCP), multiple orchestration platforms (Kubernetes, Rancher, Mesosphere), containerization platforms like Docker, third-party IP addresses, and more.

Assets can be linked in the graph regardless of their platform of origin (for example, a Kubernetes service exposing an AWS load balancer), making the graph topology holistic and a faithful representation of the client’s cloud architecture.
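A simple way to allow links across platforms is to make the platform part of each node's identity. The following sketch assumes a hypothetical `Node` tuple and made-up asset IDs; it models the Kubernetes-service-to-AWS-load-balancer example and picks out the edges that cross a platform boundary.

```python
from collections import namedtuple

# Hypothetical node identity: platform + asset type + platform-local ID.
Node = namedtuple("Node", ["platform", "asset_type", "asset_id"])

k8s_service = Node("kubernetes", "Service", "default/web")
aws_lb = Node("aws", "LoadBalancer", "lb-1")
aws_instance = Node("aws", "EC2Instance", "i-1")

# Edges are (source, link type, target) triples; the values are illustrative.
edges = [
    (k8s_service, "EXPOSES", aws_lb),
    (aws_lb, "EXPOSES", aws_instance),
]


def cross_platform_edges(edges):
    """Return only the edges whose endpoints live on different platforms."""
    return [
        (source, rel_type, target)
        for source, rel_type, target in edges
        if source.platform != target.platform
    ]
```

Because the platform is just another attribute of the node, nothing in the graph itself needs to know which provider an asset came from.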


How Do We Make Graph Theory Happen? 

So, on the technical side, what makes all this cloud graph frenzy possible? You guessed it: graph databases. A graph database is a database designed to store and query graphs efficiently, in contrast to traditional relational databases (RDBMS).

Instead of good old SQL, graph databases use query languages built upon concepts of graph theory. That way we can assign and refer to variables with types like nodes, edges, and even paths in the graph, which makes these languages a great fit for threat modelling. Billions of nodes and edges can be held in a single database, making it fertile ground for research into social networks, consumption tracking, customer interest maps, and, as you’ll see in our next articles, cloud architectures.
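To get a feel for what treating a path as a value means, here is a small plain-Python sketch. It is not a graph query language and not how a graph database works internally; it only mimics the idea of asking for "every path from A to B" and getting whole paths back. The graph contents and the `simple_paths` helper are made up for illustration.

```python
def simple_paths(graph, start, goal, path=None):
    """Yield every cycle-free path from start to goal as a list of node names."""
    path = (path or []) + [start]
    if start == goal:
        yield path
        return
    for neighbor in graph.get(start, []):
        if neighbor not in path:  # never revisit a node already on this path
            yield from simple_paths(graph, neighbor, goal, path)


# Toy adjacency list: node -> nodes it links to (illustrative names only).
graph = {
    "user": ["role"],
    "role": ["policy", "instance"],
    "policy": ["bucket"],
    "instance": ["bucket"],
}

paths = list(simple_paths(graph, "user", "bucket"))
```

In a graph query language the same question is typically a single pattern, and the database takes care of doing the traversal efficiently at scale.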

One of the most popular graph databases out there is Neo4j. Neo4j is a graph platform that allows for the building and querying of graphs in production environments, and excels in its scale, performance, and easy-to-use graph query language, Cypher. 

Being an industry leader, Neo4j has some very useful extra capabilities. One of them is the Neo4j Graph Data Science (GDS) library, a collection of dozens of known graph algorithms ranging from spanning tree algorithms to machine learning models for node classification. Tools like this put graph database users on another level: they can not only query their raw graph data but also execute complex algorithms against it and draw meaningful conclusions from the results, conclusions that are sometimes very hard to reach in other ways.
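As a taste of what running an algorithm against the graph looks like, the sketch below implements one classic graph algorithm, weakly connected components, in plain Python. This is not the GDS API and makes no claim about how GDS implements it; the edge list is made-up sample data. The result groups assets into islands that have no links between them, which can hint at isolated or forgotten parts of an environment.

```python
from collections import defaultdict


def weakly_connected_components(edges):
    """Group nodes into components, ignoring edge direction."""
    neighbors = defaultdict(set)
    for source, target in edges:
        neighbors[source].add(target)
        neighbors[target].add(source)

    seen, components = set(), []
    for node in neighbors:
        if node in seen:
            continue
        # Depth-first walk collecting everything reachable from this node.
        component, stack = set(), [node]
        while stack:
            current = stack.pop()
            if current in component:
                continue
            component.add(current)
            stack.extend(neighbors[current] - component)
        seen |= component
        components.append(component)
    return components


# Illustrative (source, target) links between asset IDs.
edges = [("lb-1", "i-1"), ("i-1", "bucket-1"), ("i-2", "db-1")]
components = weakly_connected_components(edges)
```

On a production-sized graph you would lean on a library implementation rather than hand-rolled code, but the question being asked is the same.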

Only Continuous Mapping of your Cloud Architecture Can Uncover Attack Paths 

It is evident that using a graph theory-based approach to reduce cyber security risk on the cloud is a need rather than a want. Having continuous mapping of a cloud environment yields value in two different aspects. First, visibility – gaining a deep understanding of the environment’s cloud architecture, and second, cloud risk management – identifying critical attack paths in the environment and mitigating the risk they present.  
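Identifying an attack path can be pictured as a reachability question: starting from something exposed, can a sensitive asset be reached by following links? The sketch below answers it with a breadth-first search over a toy graph. The node names, the choice of entry points and targets, and the `shortest_attack_path` helper are all assumptions for illustration; a real model would also weigh link types, permissions, and vulnerabilities.

```python
from collections import deque


def shortest_attack_path(graph, entry_points, targets):
    """Breadth-first search from all entry points.

    Returns the shortest path (list of nodes) to any target, or None.
    """
    queue = deque([entry] for entry in entry_points)
    visited = set(entry_points)
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node in targets:
            return path
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None


# Toy adjacency list: node -> nodes reachable from it (illustrative only).
graph = {
    "internet": ["lb-1"],
    "lb-1": ["i-1"],
    "i-1": ["role-1"],
    "role-1": ["bucket-1"],
    "i-2": ["bucket-1"],
}

path = shortest_attack_path(graph, ["internet"], {"bucket-1"})
```

Since cloud assets change daily, a check like this is only useful if it is rerun every time the graph is refreshed, which is where continuous mapping comes in.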

Think of the Neo4j GDS library we mentioned earlier, and the way that this new level of insight can revolutionize your cyber risk landscape on the cloud. By finding the right connections between graph algorithms and cloud security misconfigurations and vulnerabilities, you can get brand-new visibility into possible risks arising from faulty cloud architecture.

Want to learn more about how graph algorithms can be leveraged for advanced threat modeling and reducing cyber security risk on the cloud? We’ll be writing further articles to dive deep into this subject.  
