Thursday, March 14, 2013
Finding duplicate values with XQuery
As I'm always interested in benchmarks and optimized solutions, I compared three strategies for finding duplicate values in a large sequence. The test uses 5000 randomly generated numbers and compares the running time of each query.
The first thing I needed to do was create a sequence of 5000 randomly generated numbers. Scala comes to the rescue.
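A minimal sketch of such a generator could look like the code below. The output file name `values.xq`, the upper bound of 10000, and the helper name `letClause` are assumptions made for illustration, not details from the original snippet.

```scala
import java.io.PrintWriter
import scala.util.Random

object GenerateValues {
  // Builds an XQuery `let` clause binding $values to `count` random
  // integers in the range [0, upperBound).
  def letClause(count: Int, upperBound: Int, rng: Random): String = {
    val numbers = Seq.fill(count)(rng.nextInt(upperBound))
    numbers.mkString("let $values := (", ",", ")")
  }

  def main(args: Array[String]): Unit = {
    // "values.xq" is a hypothetical file name; 10000 is an assumed bound.
    val out = new PrintWriter("values.xq")
    try out.println(letClause(5000, 10000, new Random()))
    finally out.close()
  }
}
```

With 5000 draws from an assumed range of 10000 values, the sequence will contain plenty of duplicates for the queries to find.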
This generates a file with the following content. Note that the real sequence contains 5000 numbers; only a handful are shown here.
let $values := (1012,5345,2891,3833,2854, 2236)
Next I ran the following XQueries on Zorba to get a feeling for how fast they are.
XQuery 1: (takes about 5 seconds for 5000 numbers)
let $values := (1012,5345,2891,3833,2854, 2236)
let $distinctvalues := distinct-values($values)
let $nonunique :=
  for $value in $distinctvalues
  return if (count(index-of($values, $value)) > 1) then $value else ()
return $nonunique
XQuery 2: (takes about 5 seconds for 5000 numbers)
let $values := (1012,5345,2891,3833,2854, 2236)
return $values[index-of($values, .)[2]]
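The predicate in XQuery 2 is the least obvious of the three. For each item, `index-of($values, .)[2]` yields the position of the second occurrence of that item's value, and a numeric predicate keeps an item only when its own position equals that number. Each duplicated value is therefore returned exactly once, at its second occurrence. The same logic can be sketched in Scala; the object and method names are hypothetical, and positions are zero-based here whereas XQuery counts from 1.

```scala
object SecondOccurrence {
  // Zero-based position of the second occurrence of `value`, if there is one.
  // Mirrors index-of($values, .)[2] in the XQuery version.
  private def secondIndexOf(values: Seq[Int], value: Int): Option[Int] =
    values.zipWithIndex.collect { case (v, j) if v == value => j }.drop(1).headOption

  // Keeps the item at position i only when i is the second position
  // at which its value occurs.
  def duplicates(values: Seq[Int]): Seq[Int] =
    values.zipWithIndex.collect {
      case (value, i) if secondIndexOf(values, value).contains(i) => value
    }
}
```

Like the XQuery, this sketch scans the whole sequence once per item, so the work grows quadratically with the length of the sequence.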
XQuery 3: (takes about 1 second for 5000 numbers)
let $values := (1012,5345,2891,3833,2854, 2236)
return distinct-values(
  for $value in $values
  return if (count($values[. eq $value]) > 1) then $value else ()
)
Of course I was curious how Sedna would perform. I only tried XQuery 1 (3 to 4 seconds) and XQuery 3 (around 13 seconds) on my local machine. The query that was fastest on Zorba was the slowest on Sedna, which shows that an optimization for one engine does not necessarily carry over to another.