
4.3 Notes on the BI Workload

The test driver can perform either a single-user run or a multi-user run, simulating one user or multiple users concurrently executing query mixes against the system under test.

All BSBM BI runs were performed with minimal disk IO. No specific warm-up was used, and the single-user run was executed immediately following a cold start of the multi-user run. The working set of BSBM BI is approximately 3 bytes per quad in the database. The space consumption without literals and URI strings is 8 bytes per quad with Virtuoso column store default settings. For a single-user run, typical CPU utilization was around 190 of 256 core threads busy. For a multi-user run, all core threads were typically busy. Because a single user already keeps most of the threads busy, there is little spare capacity for additional clients; hence the 4-user run takes roughly 3 times the real time of the single-user run.
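The sizing and utilization figures above can be checked with a little arithmetic. The following is a minimal sketch, assuming one quad per triple (so 50 billion and 150 billion quads) and decimal gigabytes; the function and constant names are illustrative and are not part of BSBM or Virtuoso.

```python
# Back-of-the-envelope check of the figures quoted in the text.
# Assumptions (not from the source): one quad per triple, decimal gigabytes.

WORKING_SET_BYTES_PER_QUAD = 3   # working set of BSBM BI, from the text
TOTAL_THREADS = 256              # core threads on the test system
SINGLE_USER_BUSY_THREADS = 190   # typical single-user utilization


def working_set_gb(quads: int) -> float:
    """Approximate working set in decimal gigabytes for a given quad count."""
    return quads * WORKING_SET_BYTES_PER_QUAD / 1e9


# CPU headroom left once a single user already keeps 190 threads busy.
cpu_headroom = TOTAL_THREADS / SINGLE_USER_BUSY_THREADS

# 4 users doing 4x the work in roughly 3x the time is a 4/3 throughput gain.
observed_gain = 4 / 3

if __name__ == "__main__":
    for quads in (50_000_000_000, 150_000_000_000):
        print(f"{quads:,} quads -> ~{working_set_gb(quads):.0f} GB working set")
    print(f"CPU headroom: {cpu_headroom:.2f}x, observed gain: {observed_gain:.2f}x")
```

The two ratios come out close to each other (about 1.35 versus 1.33), which is consistent with the observation that the multi-user run gains only what the remaining idle threads allow.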

4.4 Benchmark Results

The following terms are used in the tables presenting the results.

• Elapsed runtime (seconds): the total runtime of all the queries excluding the time for warm-up runs.

• Throughput: the number of executed queries per hour, scaled by dataset size. Throughput = (Total # of executed queries) * (3600 / ElapsedTime) * scaleFactor. Here, the scale factor for the 50 billion triples dataset and the 150 billion triples dataset is 500 and 1500, respectively.

• AQET: Average Query Execution Time (seconds): the average execution time of each query, computed from the total runtime of that query and its number of executions: AQET(q) = (Total runtime of q) / (number of executions of q).
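The two formulas above translate directly into code. The sketch below is illustrative only: the function names and the sample inputs are hypothetical and are not taken from the BSBM test driver or from the result tables.

```python
def throughput(total_queries: int, elapsed_seconds: float, scale_factor: int) -> float:
    """Throughput = (# executed queries) * (3600 / ElapsedTime) * scaleFactor."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return total_queries * (3600.0 / elapsed_seconds) * scale_factor


def aqet(total_runtime_seconds: float, executions: int) -> float:
    """AQET(q) = (total runtime of q) / (number of executions of q)."""
    if executions <= 0:
        raise ValueError("executions must be positive")
    return total_runtime_seconds / executions


if __name__ == "__main__":
    # Hypothetical inputs, chosen only to make the arithmetic easy to follow.
    print(throughput(total_queries=100, elapsed_seconds=3600.0, scale_factor=500))
    print(aqet(total_runtime_seconds=30.0, executions=4))
```

With these made-up inputs, 100 queries in one hour at scale factor 500 give a throughput of 50000, and a query that ran 4 times for 30 seconds in total has an AQET of 7.5 seconds.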

BI Use Case. Table 3 shows the results for the BI workload. Some results seem noisy: for instance, Q2@50B, Q4@50B and Q4@150B are significantly cheaper in the multi-client setup. Given that the benchmark was run in drill-down mode, this is unexpected. The noise could be reduced by performing more runs, but this would lead to very long run times, as the BI workload has many long-running queries.

In the following, we discuss the performance results above for query Q2.

Further discussion on other queries can be found in [6].

SELECT ?otherProduct ?sameFeatures {
  ?otherProduct a bsbm:Product .
  FILTER(?otherProduct != %Product%)
  { SELECT ?otherProduct (COUNT(?otherFeature) AS ?sameFeatures) {
      %Product% bsbm:productFeature ?feature .
      ?otherProduct bsbm:productFeature ?otherFeature .
      FILTER(?feature=?otherFeature)
    } GROUP BY ?otherProduct }
}
ORDER BY DESC(?sameFeatures) ?otherProduct LIMIT 10

Table 3. BI Use Case: detailed results (Jan. 2013)

Table 4. BI Use Case: updated results (Mar. 2013)

BSBM BI Q2 is a lookup for the products with the most features in common with a given product. The parameter choices (i.e., %Product%) produce a large variation in run times. Hence the query's share of the total run time varies with the number of times it is executed. In the 4-client case, this query is executed 4 times, which may explain the difference in its timeshare between the single-client and 4-client runs.

The benchmark results in Table 3 are taken from our experiments run in January 2013. After further tuning of the Virtuoso software, we re-ran the benchmark with the dataset of 50B triples. The updated benchmark results in Table 4 show that the current version of the Virtuoso software, namely Virtuoso7March2013, runs BSBM BI about a factor of 2 faster than the old version (i.e., the Virtuoso software of January). A similar improvement is expected when the benchmark is re-run with the dataset of 150B triples.

Explore Use Case. We now discuss the performance results for the Explore workload. The 4-client results seem noisier than the single-client results, so in future benchmarking it may be advisable to also use multiple runs for multi-client tests. What is striking in the Explore results is that Q5 (see the query below) dominates execution time (Tables 5 and 6).

SELECT DISTINCT ?product ?productLabel WHERE {
  ?product rdfs:label ?productLabel .
  FILTER (%ProductXYZ% != ?product)
  %ProductXYZ% bsbm:productFeature ?prodFeature .
  ?product bsbm:productFeature ?prodFeature .
  %ProductXYZ% bsbm:productPropertyNumeric1 ?origProp1 .
  ?product bsbm:productPropertyNumeric1 ?simProp1 .
  FILTER (?simProp1<(?origProp1+120) && ?simProp1>(?origProp1-120))
  %ProductXYZ% bsbm:productPropertyNumeric2 ?origProp2 .
  ?product bsbm:productPropertyNumeric2 ?simProp2 .
  FILTER (?simProp2<(?origProp2+170) && ?simProp2>(?origProp2-170))
} ORDER BY ?productLabel LIMIT 5

Table 5. Explore Use Case: detailed results

Table 6. Explore Use Case results: query mixes per hour

Q5 asks for the 5 products most similar to a given product, based on two numeric product properties (using range selections). Such range selections might not be computable with the help of indexes, and the bounds of 120 and 170 below and above the original values may lead to many products being considered 'similar'. Given the type of query, it is not surprising that Q5 is significantly more expensive than all other queries in the Explore use case. The other queries are lookups that are index-computable, which also means that their execution time is low regardless of the scale factor. Since most queries in the Explore use case have a constant running time regardless of the scale factor, computing the throughput by multiplying the qph (queries per hour) by the scale factor may show a significant increase between the 50 billion and 150 billion triples cases. In this case, it is better to use another metric instead of throughput, namely qmph (the number of query mixes per hour).
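The effect of the scale factor on the two metrics can be illustrated with a short sketch. It assumes that qmph is computed as (number of completed query mixes) * 3600 / ElapsedTime, by analogy with the throughput formula; the function names and all numbers below are hypothetical and do not come from Tables 5 and 6.

```python
def scaled_throughput(qph: float, scale_factor: int) -> float:
    """Throughput as defined for the benchmark: queries per hour times the scale factor."""
    return qph * scale_factor


def qmph(query_mixes: int, elapsed_seconds: float) -> float:
    """Query mixes per hour (assumed definition: mixes * 3600 / elapsed time)."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return query_mixes * 3600.0 / elapsed_seconds


if __name__ == "__main__":
    # Hypothetical run: identical elapsed time at both scales, as would happen
    # if every query were an index lookup with constant running time.
    mixes, elapsed, qph = 100, 1800.0, 2000.0
    for label, scale_factor in (("50B", 500), ("150B", 1500)):
        print(label,
              "throughput:", scaled_throughput(qph, scale_factor),
              "qmph:", qmph(mixes, elapsed))
```

In this made-up example the scaled throughput triples between the two datasets although the system did exactly the same amount of work per hour, whereas qmph stays the same. This is the distortion that motivates reporting qmph for the Explore use case.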
