Protein Structure Networks

What is a Protein Structre Network

Proteins are biomolecules crucial to life as we know it, as their ability to interact with other molecules maintain otherwise slow chemical reactions which all forms of life depend on. Their three-dmensional configuration is crucial for their capabilities, which we call tertiary structure. The linear amino-acid residue chain folds over itself, forming a well defined globular-like shape, building crucial conformations such as protein domains and active sites. To better understand where such important conformations lie, we wish do develop methods to help to zero in onto those residues.

The method we employ is to not to anlyse the protein structure directly but to create a simplified representation using networks. Graphs are objects in discrete mathematics used to represent such concepts as networks. They are composed of vertices which are connected by edges. In our study, we represent the amino-acid residues with vertices, and their bonds with edges. This simplified represention takes not in account that a covalent bond is more permanent than a mere bipolar intermolecular bond, however this simplified representation is still powerful enough to predict with good accuracy which residues are crucial to protein funcion. But how can we rank a vertex importance within a network?

What is a Network Centrality?

Network centrality is a measure of the relevance of a vertex within a graph or network. One might guess that if the protein strcuture network is a useful information upon the structure, a centrality measure might be useful to guess the importance of a amino-acid residue within protein strucutre/function. Not only this guess is adequate, but it has been studied for some time by structural biologists.

However, how can we measure something subjective such as importance? There are three interesting definitions we employed on our case study:

Closeness Centrality - It is based on the idea that a vertex that is importance might be close to the rest of the network, in other words, an important vertex is well connected to other vertices, getting to most vertices in just a few steps;
Betweenness Centrality - It is based on the idea that an important vertex will help stabilish connection between parts of the network. We can measure such thing by removing the vertex and measuring how worse the network has become in terms of connectivity;
Eigenvector Centrality - Imagine if a network is actually a representation of cities connected by roads, and there’s a wanderer randomly travelling on this network. Each time it will randomly take a road and get to the other city. One might guess that not all cities are equally visited by this method. The eigenvector centrality relates the frequency of the visits by this wanderer to the importance. We could actually estimate such frequency by actually simulating some wanderers and counting their visits (Monte Carlo Simulation). But we can actually calculate the eigenvector of the matrix which represents the graph itself, which is much quicker than the Monte Carlo Simulation.

What did I do?

The main intent of this undergraduate research was to develop a Python Graph class with centrality measuring capabilities and program a script able to calculate the centrality of each amino-acid residue within a protein structure given a .pdb file. This script also should be able to take multiple structures in a single file and measure the centrality to all instances.

The code

class Graph:
    def __init__(self, V): #inicialization
        self._V = V
        self._E = 0
        self._nextTo = [[]]
        for i in range(0, V):
            self._nextTo.append([])

    def isNextTo(self, v, w): #checks if two edges are next to each other
        if not(v < self._V and w < self._V):
            error.err("Graph does not contain this edge.")
        for i in self._nextTo[v]:
            if i == w:
                return True
        return False


            
    def addEdge(self, i, j): #add a edge to the graph
        if not(self.isNextTo(i, j)):
            self._nextTo[i].append(j)
            self._nextTo[j].append(i)
            self._E += 1
    
    def showEdges(self): #prints the adjacency matrix
        print self._E
        for i in range(0, self._V):
            print self._nextTo[i]
    
    def adjList(self, v):
        return self._nextTo[v]

    def size(self):
        return self._V
    
    def adj(self, v):
        return self._nextTo[v]
		
    def contactList(self):
		contactMatrix = []
		tmpLine = []
		for i in range(self._V):
			for j in range(self._V):
				if i == j:
					tmpLine.append(-1)
				else:
				    if self.isNextTo(i, j):
					   tmpLine.append(1)
				    else:
					   tmpLine.append(0)
			contactMatrix.append(tmpLine)
			tmpLine = []
		return contactMatrix

You can see that the class itself is very simple. All it does is give other centrality calculating classes all they need to measure centrality. As it turns out, it all boils down to checking what vertices are next to (isNextTo) and getting the adjacency matrix (adjList). The rest is mostly to build the graph itself and get additional info.

class Betweenness:
    def __init__(self, G):
        bet = [0]*G.size()
        for s in range(0, G.size()):
            S = deque()
            P = []
            for i in range(0, G.size()):
                P.append(deque())
            sig = [0]*G.size()
            d = [-1]*G.size()
            sig[s] = 1
            d[s] = 0
            Q = deque()
            Q.append(s)
            while len(Q) > 0:
                v = Q.popleft()
                S.append(v)
                for w in G.adj(v):
                    #fist time seen w
                    if d[w] < 0:
                        Q.append(w)
                        d[w] = d[v] + 1
                    if d[w] == d[v] + 1:
                        sig[w] = sig[w] + sig[v]
                        P[w].append(v)
            
            delt = [0]*G.size()
            #print sig
            while len(S) > 0:
                w = S.pop()
                for v in P[w]:
                    delt[v] = delt[v] + (((1.0*sig[v])/sig[w]) * (1 + delt[w]))
                if w != s:
                    bet[w] = bet[w]+delt[w]
        
        N = ((G.size() - 1) * (G.size() - 2))/2.0
        for i in range(0, len(bet)):
            bet[i] = (bet[i]/(2.0*N))
        self._bet = bet
    def getBet(self):
        return self._bet

Possibly it wasn’t a good idea to add all centrality measuring algorithms in the same Graph.py file, but it is exaclty what I did. In the same file I added Betweenness class. This is a implementation of Brandes’s algorithm, which is a faster algorithm for betweenness computation. I’ve seen other researchers using Bellman-Ford algorithm, however Brades claims on his paper that his algorithm is faster on sparser graphs. So that’s what I did.

class Closeness:
    def __init__(self, G):
        clo = [0]*G.size()
        for s in range(0, G.size()):
            bfs = BFS(G, s)
            dist = bfs.distanceList()
            clo[s] = 1.0/sum(dist)
        self._clo = clo

    def getClo(self):
        return self._clo

To compute closeness, all one have to do is to run breadth first search on all vertex pair on the graph and calculate the sum of all shortest paths distances with and without the desired vertex. The BFS class is also implemented on the same Graph.py file, it’s rather unremarkable so I shall not further comment on it.

class Eigenvector:
    def __init__(self, G):
        adjMatrix = G.contactList()
        eigVector = np.random.rand(G.size())
        for i in range(100):
            tmp = np.dot(adjMatrix, eigVector)
            tmpnom = np.linalg.norm(tmp)
            eigVector = tmp/tmpnom
        self._eig = eigVector

    def getEig(self):
        return self._eig

The Eigenvector one is the simples of the three, it merely takes the contact matrix and attempts to find its eigenvector by multiplying a random vector multiple times with the adjacency matrix. I don’t expect the algorithm to fail as it might happen sometimes, as the adjacency matrix is probably not a corner case in which it might happen.

Including Graph.py, I coded Protein(Centrality).py, in which (Centrality) is the name of one of the three centralities. The scipt takes a .pdb file and spits out a list of centralities.

You can check out the source code here.

Protein Centrality Networks

How could computers understand proteins?

Protein Structure Networks

What is a Protein Structre Network

What is a Network Centrality?

What did I do?

The code