6 Conclusion

In this chapter we have described the advanced column-store techniques and architectural ideas implemented in the Virtuoso RDF store and its cluster edition, which reduce the “RDF tax” by well over an order of magnitude (i.e., from a factor of 150 to a factor of 2.5). Extensive experiments with the BSBM benchmark, on both short-running index-lookup queries (the Explore use case) and complex analytical queries (the BI use case), demonstrate that the new cluster architecture makes it possible to manage RDF data at an unprecedented scale (i.e., 150 billion triples).
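
To make the column-store idea concrete, the toy sketch below applies two standard column-store compression techniques, run-length encoding and delta encoding, to a triple table sorted in predicate-subject-object order. Everything here is an assumption for illustration: the integer IDs (standing in for dictionary-encoded URIs and literals), the P-S-O sort order, and the function names `columnar_pso`, `run_length_encode`, and `delta_encode_subjects` are hypothetical, and this is not Virtuoso's actual storage format.

```python
from itertools import groupby


def columnar_pso(triples):
    """Sort (predicate, subject, object) ID triples and split them into three columns."""
    ordered = sorted(triples)
    p_col = [t[0] for t in ordered]
    s_col = [t[1] for t in ordered]
    o_col = [t[2] for t in ordered]
    return p_col, s_col, o_col


def run_length_encode(column):
    """Collapse runs of equal values into (value, run_length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(column)]


def delta_encode_subjects(p_col, s_col):
    """Delta-encode subject IDs separately inside each run of equal predicates."""
    encoded = []
    previous_p, previous_s = None, 0
    for p, s in zip(p_col, s_col):
        if p != previous_p:
            encoded.append(s)  # first subject of a predicate run: stored as-is
            previous_p = p
        else:
            # sorting guarantees a non-negative gap; it is small when subject IDs are dense
            encoded.append(s - previous_s)
        previous_s = s
    return encoded


# Hypothetical dictionary-encoded triples: (predicate ID, subject ID, object ID).
triples = [(7, 1000, 5), (7, 1001, 9), (3, 1000, 42), (7, 1002, 5), (3, 1001, 43)]
p, s, o = columnar_pso(triples)
print(run_length_encode(p))         # [(3, 2), (7, 3)]
print(delta_encode_subjects(p, s))  # [1000, 1, 1000, 1, 1]
```

A real column store would typically go on to bit-pack such small integers; the sketch only shows why sorting plus a column-wise layout turns repetitive triple data into highly compressible runs.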

Exploiting column-store techniques is a promising way to reduce the “RDF tax” and make the performance of SPARQL and SQL systems converge. In addition, an RDF store needs to be aware of the actual structure of RDF data; this allows it to reduce the inherently large number of self-joins and makes query optimization more reliable. To that end, we have presented practical techniques for discovering an emergent relational schema in RDF datasets. These techniques recover a compact and precise relational schema with high coverage, together with useful labels that serve as aliases for all machine-readable URIs (which are preserved). Emergent schemas not only open up many opportunities to improve physical data indexing for RDF, but also, because they are detected automatically, respect the schema-last nature of the Semantic Web. These techniques will soon be implemented in Virtuoso and will hopefully close the performance gap between SPARQL and SQL systems.
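
The core intuition behind schema discovery can be illustrated with characteristic sets: group subjects by the set of predicates they use, and treat frequent sets as candidate tables. The sketch below is a deliberately simplified version under stated assumptions: the function names, the `min_support` threshold, the example prefixes, and the assumption of single-valued properties are all hypothetical, and it is not the authors' method. It reports coverage as the fraction of triples that fall into a candidate table, and shows how a table row makes "name and age of each subject" a single lookup instead of a subject self-join over the triple table.

```python
from collections import Counter, defaultdict


def characteristic_sets(triples):
    """Map each subject to the frozenset of predicates it uses."""
    props = defaultdict(set)
    for s, p, _ in triples:
        props[s].add(p)
    return {s: frozenset(ps) for s, ps in props.items()}


def emergent_tables(triples, min_support=2):
    """Keep predicate sets shared by at least `min_support` subjects as candidate tables.

    Returns the candidate tables (predicate set -> number of subjects) and the
    fraction of triples whose subject falls into one of them.
    """
    cs = characteristic_sets(triples)
    counts = Counter(cs.values())
    tables = {ps: n for ps, n in counts.items() if n >= min_support}
    covered = sum(1 for s, _, _ in triples if cs[s] in tables)
    coverage = covered / len(triples) if triples else 0.0
    return tables, coverage


def materialize(triples, columns):
    """Build one row per subject whose predicate set equals `columns`.

    Assumes single-valued properties: a repeated predicate keeps its last object.
    """
    rows = defaultdict(dict)
    for s, p, o in triples:
        rows[s][p] = o
    return {s: row for s, row in rows.items() if frozenset(row) == columns}


# Hypothetical example data.
triples = [
    ("ex:alice", "foaf:name", '"Alice"'),
    ("ex:alice", "foaf:age", "30"),
    ("ex:bob", "foaf:name", '"Bob"'),
    ("ex:bob", "foaf:age", "25"),
    ("ex:carol", "foaf:name", '"Carol"'),
    ("ex:doc1", "dc:title", '"Report"'),
]
tables, coverage = emergent_tables(triples)
print(len(tables), round(coverage, 2))  # 1 0.67
person = materialize(triples, frozenset({"foaf:name", "foaf:age"}))
print(person["ex:bob"])  # {'foaf:name': '"Bob"', 'foaf:age': '25'}
```

The techniques summarized above go further than this sketch, for example by producing a compact schema with high coverage and human-readable labels; merging similar predicate sets (such as folding `ex:carol` into the name/age table with a missing age) and handling multi-valued properties are not modeled here.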

Open Access. This chapter is distributed under the terms of the Creative Commons Attribution Noncommercial License, which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.


1. Abadi, J.: Query execution in column-oriented database systems, MIT Ph.D. thesis (2008)

2. Bizer, C., Schultz, A.: The Berlin SPARQL benchmark. Int. J. Semant. Web Inf.Syst. (IJSWIS) 5(2), 1–24 (2009)

3. Boncz, P., Neumann, T., Erling, O.: TPC-H analyzed: hidden messages and lessons learned from an influential benchmark. In: Nambiar, R., Poess, M. (eds.) TPCTC 2013. LNCS, vol. 8391, pp. 61–76. Springer, Heidelberg (2014)

4. IBM DB2.

5. Erling, O.: Virtuoso, a hybrid RDBMS/graph column store. IEEE Data Eng. Bull.35(1), 3–8 (2012)

6. Harth, A., Hose, K., Schenkel, R.: Linked Data Management. CRC Press, Boca Raton (2014)

7. Lamb, A., et al.: The vertica analytic database: C-store 7 years later. Proc. VLDB Endowment 5, 1790–1801 (2012)

8. Minh-Duc, P., et al.: Deriving an emergent relational schema from RDF data. In: ISWC (submitted) (2014)

9. MonetDB column store.

10. Neumann, T., et al.: Characteristic sets: accurate cardinality estimation for RDF queries with multiple joins. In: ICDE (2011)

11. Neumayer, R., Balog, K., Nørv˚ag, K.: When simple is (more than) good enough: effective semantic search with (almost) no semantics. In: Baeza-Yates, R., de Vries, A.P., Zaragoza, H., Cambazoglu, B.B., Murdock, V., Lempel, R., Silvestri, F. (eds.) ECIR 2012. LNCS, vol. 7224, pp. 540–543. Springer, Heidelberg (2012)

12. O'Neil, P., et al.: The star schema benchmark (SSB). PAT (2007)

13. Openlink Software Blog.

14. Page, L., et al.: The pagerank citation ranking: bringing order to the web. Technical report, Stanford InfoLab (1999)

15. Pham, M.-D.: Self-organizing structured RDF in MonetDB. In: ICDE Workshops (2013)

16. Salton, G., McGill, M.J.: Introduction to Modern Information Retrieval. McGrawHill, New York (1983)

17. Tsialiamanis, P., et al.: Heuristics-based query optimisation for SPARQL. In: EDBT (2012)

18. Zukowski, M., Boncz, P.A.: Vectorwise: beyond column stores. IEEE Data Eng. Bull 35(1), 21–27 (2012)
