Share your knowledge: March 2013

As I'm always interested in benchmarks and optimized solutions, I compared three strategies for finding duplicate values in a large sequence. The test uses 5000 randomly generated numbers and compares how long each query takes.
The first thing I needed was a sequence of 5000 randomly generated numbers. Scala comes to the rescue.

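The generator script itself is not reproduced on this page. As a stand-in, a minimal sketch of such a generator could look like the one below. The object name `GenerateValues`, the output file name `values.xq` and the upper bound of 10000 are assumptions made for illustration; only the count of 5000 and the `let $values := (...)` output format come from this post.

```scala
import java.io.PrintWriter
import scala.util.Random

object GenerateValues {
  // Draw `count` random integers in the range 0 until upperBound.
  // The upper bound is an assumption; the post does not state the range.
  def randomValues(count: Int, upperBound: Int, rng: Random): Seq[Int] =
    Seq.fill(count)(rng.nextInt(upperBound))

  // Render the numbers as the XQuery let clause used by the queries in this post.
  def asXQueryLet(values: Seq[Int]): String =
    values.mkString("let $values := (", ",", ")")

  def main(args: Array[String]): Unit = {
    val line = asXQueryLet(randomValues(5000, 10000, new Random()))
    // "values.xq" is a hypothetical file name.
    val out = new PrintWriter("values.xq")
    try out.println(line) finally out.close()
  }
}
```

With 5000 draws from 10000 possible values, the sequence is practically guaranteed to contain duplicates, which is what the queries need to have something to find.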

The Scala script generates a file with the following content. Note that the real sequence contains 5000 numbers; it is shortened here.

let $values := (1012,5345,2891,3833,2854, 2236)

I then ran the following XQueries on Zorba to get a feeling for how fast they are.

XQuery 1: (takes about 5 seconds for 5000 numbers)

let $values := (1012,5345,2891,3833,2854, 2236)
let $distinctvalues := distinct-values($values)
let $nonunique :=
  for $value in $distinctvalues
  return if (count(index-of($values, $value)) > 1)
         then $value
         else ()
return $nonunique

XQuery 2: (takes about 5 seconds for 5000 numbers)

let $values := (1012,5345,2891,3833,2854, 2236)
return $values[index-of($values, .)[2]]

XQuery 3: (takes about 1 second for 5000 numbers)

let $values := (1012,5345,2891,3833,2854, 2236)

return distinct-values(for $value in $values
  return if (count($values[. eq $value]) > 1)
         then $value
         else ())

Of course I was curious how Sedna would perform. I only tried XQuery 1 (3 to 4 seconds) and XQuery 3 (around 13 seconds) on my local machine. The ranking is the reverse of what I saw on Zorba, which shows that an approach that is fastest on one engine is not necessarily fastest on another.


Thursday, March 28, 2013

Using the loan pattern in Java

Friday, March 22, 2013

XQuery3.0: Mimicking a FLWOR expression with higher-order functions

Tuesday, March 19, 2013

XQuery3.0: Higher-Order functions (Functional composition)

XQuery3.0: Higher-Order functions (Filtering)

Monday, March 18, 2013

XQuery3.0: Partial function example

XQuery3.0: Using simple map operator

XQuery3.0 Group by example

Normalizing Unicode

Thursday, March 14, 2013

Finding duplicate values with XQuery