2012-12-10

PL/v8 is now the richest procedural language

postgresql plv8

It's been months since the last PL/v8 release. Today I'm happy to announce two new releases, PL/v8 1.2.1 and 1.3.0. 1.2.1 contains a couple of bug fixes on top of 1.2.0, while 1.3.0 adds new features and some backward incompatibilities. In particular, the json type conversion between the database and v8 is a major backward incompatibility, so I suggest staying on 1.2.1 for a while if you have functions with json arguments, but I strongly recommend migrating to 1.3.0 soon.

Let's look through major items in 1.3.0.

Make two dialects (plcoffee, plls) official sub-extensions.

I have already introduced plcoffee in my posts, but later on Audrey Tang, who is also the author of Pugs, came up with a patch to add LiveScript support along with CoffeeScript, and I made them official sub-extensions. If you build PL/v8 and run `make install`, you'll be able to do

CREATE EXTENSION plcoffee;
CREATE EXTENSION plls;

And they're ready.

Implement builtin json type conversion.

This has been one of the top demands for PL/v8. Although we cannot store v8 objects in an efficient way, we can serialize them via JSON.parse() and JSON.stringify(). In previous versions, users needed to call those functions themselves to convert json values in and out, but now PL/v8 does it automatically when running on PostgreSQL 9.2 or above. This is especially painless for pure JS functions obtained via find_function, as the conversion is transparent to the callee function.
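
To make the difference concrete, here is a minimal sketch in plain JavaScript, run outside the database. The function names and the `callWithJsonConversion` helper are hypothetical; the helper only stands in for the conversion PL/v8 1.3.0 performs at the call boundary and is not the actual implementation. The last part shows one plausible way an old-style function body breaks once it receives an object instead of text.

```javascript
// Hypothetical body written for PL/v8 1.2.x: the json argument arrives as
// text, so the function parses and re-serializes it by hand.
function addTotalOld(orderText) {
  var order = JSON.parse(orderText);
  order.total = order.price * order.qty;
  return JSON.stringify(order);
}

// The same logic written for 1.3.0: plain objects in, plain objects out.
function addTotalNew(order) {
  order.total = order.price * order.qty;
  return order;
}

// Stand-in (an assumption, not PL/v8 code) for the automatic conversion:
// parse the incoming json text, call the body, serialize the result.
function callWithJsonConversion(fn, jsonText) {
  return JSON.stringify(fn(JSON.parse(jsonText)));
}

var input = '{"price":3,"qty":4}';
console.log(addTotalOld(input));
console.log(callWithJsonConversion(addTotalNew, input));

// An old-style body behind the new conversion receives an object, so its
// own JSON.parse call throws. This is why json functions need a rewrite.
var oldBodyBreaks = false;
try {
  callWithJsonConversion(addTotalOld, input);
} catch (e) {
  oldBodyBreaks = true;
}
console.log(oldBodyBreaks);
```

Both successful calls produce the same json text; only the amount of boilerplate inside the function body changes.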

Static build and automatic v8 build via 'static' target.

If you type `make static` in the plv8 source directory, it automatically downloads the v8 source, builds it, and statically links plv8.so against v8. This may mitigate the v8 version confusion and build annoyances.

Implement v8's remote debugger if enabled.

d8, v8's command line tool, supports a remote debugger. If you build PL/v8 with ENABLE_DEBUGGER_SUPPORT, the module opens a remote debugger port to which d8 can connect in order to control PL/v8. Once it's ready, try d8 like this:

$ d8 --remote-debugger --debugger-port=35432

The `debugger` statement can also be used to set a breakpoint from JS code.

Implement bytea type conversion and typed array.

Until now, the bytea type was not automatically converted to a JS value. Actually, v8 has a good way to handle this: typed arrays. They exist in v8 to handle pixel memory with good performance, and Node.js also uses them. The idea of a typed array is that it maps a byte array to a JS array directly, without creating each element as a JS value. This means that when you pass a bytea value to a PL/v8 function, it appears inside the function as an array, and you can fetch an unsigned byte from each element efficiently. It also allows primitive PostgreSQL arrays of types such as int2 and int4 to be mapped to JS arrays without creating a bunch of JS elements. To use this, you need to declare the argument as the plv8_int4array domain type, which is defined by the plv8 extension. Here is an example.

create table foo(i, ary) as
  select i, ary from(select array_agg(a) ary
  from generate_series(1, 100000)a)a, generate_series(1, 100)i;

create or replace function int4sum(ary plv8_int4array) returns int8 as $$
  var sum = 0;
  for (var i = 0; i < ary.length; i++) {
    sum += ary[i];
  }
  return sum;
$$ language plv8 immutable strict;

select int4sum(ary) from foo;
Time: 48.882 ms

-- when using int4[] instead of plv8_int4array
Time: 820.785 ms

The caveat is that the array must be a one-dimensional, non-NULL array, but if you store numeric values in an array column, this can save a lot of time.
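
The "no copy, no per-element JS value" idea can be seen in plain JavaScript outside the database. The sketch below uses the standard typed array classes; it assumes nothing about how PL/v8 builds its arrays internally and only shows that two views can share one block of memory.

```javascript
// Sum every element; works for typed arrays and ordinary arrays alike.
function int4sum(ary) {
  var sum = 0;
  for (var i = 0; i < ary.length; i++) {
    sum += ary[i];
  }
  return sum;
}

// Four 32-bit integers stored in one contiguous 16-byte buffer.
var ints = new Int32Array([10, 20, 30, 40]);

// A byte-level view over the same memory; nothing is copied.
var bytes = new Uint8Array(ints.buffer);

console.log(int4sum(ints)); // 100
console.log(bytes.length);  // 16

// Writes through one view are visible through the other: setting the first
// four bytes to 0xFF makes the first integer read as -1 on any byte order.
bytes[0] = bytes[1] = bytes[2] = bytes[3] = 0xFF;
console.log(ints[0]);       // -1
console.log(int4sum(ints)); // 89
```

The byte view corresponds to what a bytea argument looks like inside a function, and the integer view to a plv8_int4array argument.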

Allow polymorphic and internal types in argument and return types.

Someone has complained about this before, but we now support the anyelement and anyarray types as argument and return types. You can also declare the internal type as an argument, which is very useful when you declare a pure JS function for find_function.

Support user defined window functions.

PL/v8 is the first procedural language that gives you full access to the window function API (although PL/R has also partially supported user-defined window functions). With window functions, you can create a function that operates on multiple rows that fall into a specific set of rows. To use it, retrieve a WindowObject via plv8.get_window_object(), which provides wrapper functions for the internal C-level window function APIs.

var winobj = plv8.get_window_object();
return winobj.get_current_position() + 1;

I implemented all the built-in window functions in PL/v8, which you can see in the regression test file.

- - -

PL/v8 was already a trusted, fully functional procedural language in previous versions. As you can see from this post, it is now the richest procedural language among all the existing ones, including PL/pgSQL and PL/python. If you want a trusted language, the only other option is PL/pgSQL, which is slow and cannot be used to write window functions. If you use mathematical libraries, PL/python is a good choice, but it is not a trusted language, and array access in PL/v8 is much faster than in PL/python. So of course it is your choice, but PL/v8 is absolutely the first language you should consider when you have to write something.

Enjoy!

2012-05-11

PL/coffee Trial

postgresql plv8

We are getting to the final stage before the first stable release of PL/v8. This should be a good release with a lot of improvements, such as

Subtransaction support
Better name space of built-in functions
OO style cursor, prepared plan
find_function
Start-up procedure with GUC
Separate context in user switch
A series of bug fixes.

These days I'm also getting an increasing amount of interest and feedback via email, SNS, and the plv8 project page. The upcoming PGCon will have at least two talks that may cover the usage of PL/v8.

In the meantime, there has been a long-standing feature request for PL/coffee, a procedural language in CoffeeScript, a dialect of JavaScript. CoffeeScript is only a source-to-source transformation, and the compiler is provided as a small JavaScript file, so I tried letting the compiler transform the function source when the language is "plcoffee".

=# CREATE OR REPLACE FUNCTION public.fibonacci(n integer)
 RETURNS integer LANGUAGE plcoffee IMMUTABLE STRICT
AS $function$
fibonacci = (x)->
  return 0 if x == 0
  return 1 if x == 1
  return fibonacci(x-1) + fibonacci(x-2)
return fibonacci n
$function$;
CREATE FUNCTION

=# select fibonacci(10);
 fibonacci 
-----------
        55
(1 row)
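
Because CoffeeScript is a source-to-source transformation, the function above corresponds roughly to the following JavaScript. This is a hand-written approximation, not actual compiler output, and the wrapper name `plcoffeeBody` is hypothetical; it only stands in for the function PL/v8 builds around the body.

```javascript
// Approximate JavaScript equivalent of the plcoffee function body above.
function plcoffeeBody(n) {
  var fibonacci;
  fibonacci = function (x) {
    // CoffeeScript's == compiles to JavaScript's strict ===
    if (x === 0) {
      return 0;
    }
    if (x === 1) {
      return 1;
    }
    return fibonacci(x - 1) + fibonacci(x - 2);
  };
  return fibonacci(n);
}

console.log(plcoffeeBody(10)); // 55
```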

As I noted above, this is only a source transformation, so the engine is still v8, and the runtime environment and everything else are shared with plv8. Of course it can reference plv8 functions via find_function(). The plcoffee EXTENSION is separate and is off by default. If you are interested in it, clone the latest source and run

make ENABLE_COFFEE=1 install

It works, and it attracts more interest from people in certain circles. However, using CoffeeScript seems to be a little controversial (not in the database context, of course, but in the web browser and server context): the source transformation makes debugging quite difficult, since run-time errors emitted by the JS engine point far from the actual script source, which is annoying. And since I embed the compiler's JS source file as an object file symbol, the plv8 shared object grows from around 200 KB to around 600 KB. Thus, for now I'm leaving it as an "experimental" feature and want to see whether it gets popular. Please let us know if you like or dislike it.

2012-04-10

PL/v8 Road Map

postgresql plv8

For the last couple of months, I've been back on PL/v8 development. One reason is that I'm now more interested in building and running a mid-term software project than in hacking on PostgreSQL core. I found myself enjoying writing PL/v8 code, as V8's code base is as neat as PostgreSQL's. Unfortunately, V8 as a project is a bit closed, so details of its design, intentions, and behavior are sometimes vague, and it takes me a while to understand them.

The goal of PL/v8 is clear. The fact that JavaScript is a pure language with no external modules of its own actually helps a lot in the database context. In contrast to PL/perl or PL/python, PL/v8 must be a purely trusted language. This means that I don't want to add untrusted features like loading modules from the file system (even pure js files). The language's capabilities should stay completely inside the database box and never go outside it. Some people ask for something like node.js, but node.js is now a platform with its own npm repository, so a PL/node should come after PL/v8 (I'd consider PL/coffeescript since it's only a source-to-source transformation, but that's another story :)).

Today I added plv8.find_function() to the master branch. This utility function returns a JavaScript function that is registered as a plv8 function. An example explains it better than I can.

CREATE FUNCTION callee(a int) RETURNS int AS $$
  return a * a;
$$ LANGUAGE plv8;

CREATE FUNCTION caller(a int) RETURNS int AS $$
  var func = plv8.find_function("callee");
  return func(a);
$$ LANGUAGE plv8;

SELECT caller(2);
 caller 
--------
      4
(1 row)

This is how I'd say "no" to users who ask to load external js files from the file system. I agree that code reuse is key to good code management practice, i.e. DRY. I'd say PL/v8 was a toy in that it had no capability for code reuse. Now it does.

Of course this is not complete. First, I don't like the name. If anyone has a better idea, please tell me. Second, the performance is not good. Well, the actual code compilation time is not so bad thanks to the fastest JavaScript engine, but find_function is a pure JS function and it is called on every row. One way to avoid this is for PL/v8 to cache the found functions internally and return them transparently. The other is to make a function-local object the "this" of the function and ask users to cache the function object on "this". Neither is trivial, but I prefer the latter architecturally, since the success stories around JavaScript wrapper applications use a "thin" wrapper in C/C++ plus another JS layer that handles this kind of annoyance. Even the V8 engine implements most of its built-in functions in JavaScript.
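
The two caching ideas can be sketched in plain JavaScript outside the database. Everything here is hypothetical: `findFunction`, `registry`, and the `lookups` counter only simulate the per-row lookup cost, and neither variant is how PL/v8 is implemented.

```javascript
// Simulated lookup: stands in for plv8.find_function and counts its calls.
var lookups = 0;
var registry = {
  callee: function (a) { return a * a; }
};
function findFunction(name) {
  lookups++;
  if (!Object.prototype.hasOwnProperty.call(registry, name)) {
    throw new Error("function not found: " + name);
  }
  return registry[name];
}

// Idea 1: cache found functions internally and return them transparently.
var cache = {};
function cachedFindFunction(name) {
  if (!Object.prototype.hasOwnProperty.call(cache, name)) {
    cache[name] = findFunction(name);
  }
  return cache[name];
}
function callerWithInternalCache(a) {
  return cachedFindFunction("callee")(a);
}

// Idea 2: a function-local object is bound as "this", and the user code
// stores the found function on it the first time through.
function callerWithThisCache(a) {
  if (!this.callee) {
    this.callee = findFunction("callee");
  }
  return this.callee(a);
}

// Call each variant once per "row" and record how many lookups happened.
var rows = [1, 2, 3];
var resultsA = rows.map(function (a) { return callerWithInternalCache(a); });
var lookupsA = lookups;

lookups = 0;
var functionLocal = {};
var resultsB = rows.map(function (a) {
  return callerWithThisCache.call(functionLocal, a);
});
var lookupsB = lookups;

console.log(resultsA.join(","), lookupsA);
console.log(resultsB.join(","), lookupsB);
```

In both variants the lookup runs once rather than once per row; the difference is whether the cache lives inside the wrapper or in user code.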

Other things I have in mind before PL/v8 is out of the door as a production-level release include fixing known issues around SPI and triggers, registering on PGXN, providing a Windows binary, and ... tell me what you want.