Page 8 - Krész, Miklós, and Andrej Brodnik (eds.). MATCOS-13. Proceedings of the 2013 Mini-Conference on Applied Theoretical Computer Science. Koper: University of Primorska Press, 2016.
P. 8
ed to further reduce the running time of the algorithm. Ullmann alg. VF2
Moreover, an additional improvement is based on the obser-
vation that the order of the atoms in the query can be fixed small queries first all first all
in the first step, thereby avoiding to find the atom with the 0.004 s 0.005 s 0.001 s 0.002 s
minimum index at each step. and targets 0.067 s 0.136 s 0.009 s 0.013 s
small queries
6. TEST CASES, SPEED AND MEMORY 0.27 s 1.084 s 0.01 s 0.027 s
RESULTS and large tar-

We have compared the implemented algorithms combined gets
with all the improvements mentioned above in different as- large queries
pects. We have studied the time requirement for finding
either the first or all substructure mappings for molecules and large tar-
having different size. Our test suite consists of three in-
stance sets: (1) small queries and small targets4; (2) small gets
queries and large targets, and (3) large queries and large
targets. The test molecules were selected from a public Table 3: The search time for 1000 query-target pairs
molecule database of National Cancer Institute [3]. The at finding the first/all mapping
different test cases are summarized in Table 2. The first
and second row of each cell corresponds to the queries and 7. CONCLUSION
targets, respectively. Some of the 20 small, frequent test
molecules are shown in Figure 5. This paper presents our results for solving the subgraph iso-
morphism problem. We have implemented the Ullmann al-
The heuristics mentioned above have decreased the search gorithm, the VF2 algorithm, and the atom-reordering step of
time of the Ullmann algorithm and VF2 by 35-40%. The QuickSI along with various heuristics that we devised to im-
running time can be further reduced by 10% if the algo- prove upon them. We have compared the memory and time
rithms are implemented in iterative way instead of recursive requirements of the implementations on real-world molecu-
way. Finding all mappings required between 1.5 and 3 times lar graphs having different size. The search time has been
as much running time as finding only the first mapping. reduced by 35-40% by applying the developed heuristics.

Our results show that the VF2 algorithm is typically much 8. ACKNOWLEDGMENTS
faster than the Ullmann algorithm. We found that the re-
finement of the initial boolean matrix used by the Ullmann I would like to express my gratitude to Krisztia´n Tichler
algorithm takes about 70% of the total search time. and P´eter Kova´cs, my supervisors for their patient guidance,
enthusiastic encouragement and useful critiques of this re-
The reduced search time for 1000 query-target pairs are search work. I am grateful to Istva´n Fekete for introducing
shown in Table 3. me to cheminformatics and for his continuous support. I am
obliged to staff members of ChemAxon Ltd. for the valuable
Query molecules information provided by them in their respective fields.

small large The research project presented in this paper is supported
by the European Union and co-financed by the European
Target molecules small 20 small molecules – Social Fund (grant agreement no. TA´ MOP 4.2.2./B-10/1-
2010-0030).
molecules having 11-15 atoms
Marvin was used for drawing, displaying and characterizing
large 20 small molecules large, similar chemical structures, substructures, and reactions, Marvin
molecules at least 76 atoms molecules 6.0.5, 2013, ChemAxon (http://www.chemaxon.com).

Table 2: Test cases 9. REFERENCES

Figure 5: Some small query molecules [1] N. Brown. Chemoinformatics – an introduction for
computer scientists. ACM Computing Surveys, pages
4In cheminformatics, molecules having at most 15 non- 8:1–8:38, 2009.
hydrogen atoms are considered as small molecules.
[2] L. P. Cordella, P. Foggia, C. Sansone, and M. Vento.
An Improved Algorithm for Matching Large Graphs.
In: 3rd IAPR-TC15 Workshop on Graph-based
Representations in Pattern Recognition, Cuen, pages
149–159, 2001.

[3] National Cancer Institute. NCI Open Database
Compounds, 2000.
http://cactus.nci.nih.gov/download/nci/.

[4] H. Shang, Y. Zhang, X. Lin, and J. X. Yu. Taming
Verification Hardness: An Efficient Algorithm for
Testing Subgraph Isomorphism. Proceedings of the
VLDB Endowment, pages 364–375, 2008.

[5] J. R. Ullmann. An algorithm for subgraph
isomorphism. Journal of the ACM, 23:31–42, 1976.

m a t c o s -1 3 Proceedings of the 2013 Mini-Conference on Applied Theoretical Computer Science 8
Koper, Slovenia, 10-11 October
   3   4   5   6   7   8   9   10   11   12   13