Tags: #octave 

Support for the JSON data format after GSoC 2020

The Google Summer of Code (GSoC) 2020 ends this August. I had the pleasure of mentoring Abdallah Elshamy, who worked on the implementation of the jsondecode() and jsonencode() functions for Octave. These functions convert JSON data strings to Octave objects and vice versa.
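As a quick illustration of what the two functions do, here is a minimal round-trip sketch. It assumes an Octave version that ships jsondecode and jsonencode (7 or later); the JSON string and variable names are made up for this example.

```octave
% Decode a JSON string into an Octave struct (sample data, not from the benchmark).
json_str = '{"name": "Octave", "version": 7, "tags": ["json", "gsoc"]}';
data = jsondecode (json_str);

disp (data.name);           % text values become character arrays
disp (class (data.version)) % JSON numbers become doubles
disp (class (data.tags));   % an array of strings becomes a cell array

% Encode the Octave value back into a JSON string.
json_out = jsonencode (data);
disp (json_out);
```

Decoding json_out again should give back a struct equal to data, which is a handy sanity check when comparing different JSON libraries.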

See the new benchmark of this Jupyter Notebook (November 23, 2020) with matlab.lang.makeValidName as C++ code.

Last week we pushed most of Abdallah’s work to the main Octave repository. He is still working on the functions and will hopefully continue after GSoC 2020 is over. Since the JSON functions are now very convenient to use, I tried them on larger JSON data. I collected some of the test cases from the excellent nativejson-benchmark, but with a focus on Octave. Abdallah carried out another test in June to check compatibility with Matlab.

This benchmark considers only the running times for reading and writing JSON data.

The test environment is a laptop with

octave_version = version ()
octave_hg_id   = version ('-hgid')
octave_version = 7.0.0
octave_hg_id = 173807014259

The following JSON extensions for Octave are under test.

name              description
Octave (builtin)  Based on RapidJSON, reading DOM API.
octave-rapidjson  Based on RapidJSON, reading SAX API.
octave-jsonstuff  Based on RapidJSON, reading DOM API, writing m-file.
JSONio            Based on JSMN, writing m-file.
jsonlab           m-file only

The JSON test files are described in the following table.

name               size (bytes)  description
citm_catalog.json     1,727,204  Structured data with mixed text and numeric.
canada.json           2,251,060  Numeric data set in GeoJSON format.
large-file.json      26,141,343  Structured data with mixed text and numeric.

Benchmark setup

Create a directory to keep track of the mess.

mkdir ('benchmark');
cd ('benchmark');

Download the benchmark JSON files.

if (exist ('citm_catalog.json', 'file') ~= 2)
  urlwrite ( ...
    'https://github.com/RichardHightower/json-parsers-benchmark/raw/master/data/citm_catalog.json', ...
    'citm_catalog.json');
end

if (exist ('canada.json', 'file') ~= 2)
  urlwrite ( ...
    'https://github.com/mloskot/json_benchmark/raw/master/data/canada.json', ...
    'canada.json');
end

if (exist ('large-file.json', 'file') ~= 2)
  urlwrite ( ...
    'https://github.com/json-iterator/test-data/raw/master/large-file.json', ...
    'large-file.json');
end

Set up octave-rapidjson.

if (exist ('octave-rapidjson', 'dir') == 0)
  % Fetch a fixed revision of octave-rapidjson.
  urlwrite ( ...
    'https://github.com/Andy1978/octave-rapidjson/archive/2d88511712032b14dea4c2272d82249e7547772a.zip', ...
    'octave-rapidjson.zip');
  unzip  ('octave-rapidjson.zip');
  rename ('octave-rapidjson-2d88511712032b14dea4c2272d82249e7547772a', ...
    'octave-rapidjson');
  cd ('octave-rapidjson')
  % Fetch a fixed revision of RapidJSON and compile the oct-files against it.
  urlwrite ( ...
    'https://github.com/Tencent/rapidjson/archive/35e480fc4ddf4ec4f7ad34d96353eef0aabf002d.zip', ...
    'rapidjson.zip');
  unzip  ('rapidjson.zip');
  rename ('rapidjson-35e480fc4ddf4ec4f7ad34d96353eef0aabf002d', 'rapidjson');
  mkoctfile -Wall -Wextra -I./rapidjson/include load_json.cc
  mkoctfile -Wall -Wextra -I./rapidjson/include save_json.cc
  cd ('..')
end

Set up octave-jsonstuff.

if (isempty (pkg ('list', 'jsonstuff')))
  pkg install https://github.com/apjanke/octave-jsonstuff/releases/download/v0.3.3/jsonstuff-0.3.3.tar.gz
end

Set up JSONio.

if (exist ('JSONio', 'dir') == 0)
  urlwrite ( ...
    'https://github.com/gllmflndn/JSONio/archive/6c699a315ac2c578864d8b740a061bff47b718bf.zip', ...
    'JSONio.zip');
  unzip  ('JSONio.zip');
  rename ('JSONio-6c699a315ac2c578864d8b740a061bff47b718bf', 'JSONio');
  cd ('JSONio')
  mkoctfile --mex jsonread.c jsmn.c -DJSMN_PARENT_LINKS
  cd ('..')
end

Set up jsonlab.

if (exist ('jsonlab', 'dir') == 0)
  urlwrite ( ...
    'https://github.com/fangq/jsonlab/archive/d0fb684bd43165d312063345bdb795b628b2c679.zip', ...
    'jsonlab.zip');
  unzip  ('jsonlab.zip');
  rename ('jsonlab-d0fb684bd43165d312063345bdb795b628b2c679', 'jsonlab');
end

Benchmark run

The benchmark function reads the respective JSON file into a string and calls the library's reading and writing functions.

function t = benchmark (json_read_fcn, json_write_fcn)
  test_files = {'citm_catalog.json', 'canada.json', 'large-file.json'};
  N = length (test_files);
  % One row per test file: column 1 is the read time, column 2 the write time.
  t = nan (N, 2);
  for i = 1:N
    json_str = fileread (test_files{i});
    tic ();
    octave_obj = json_read_fcn (json_str);
    t(i,1) = toc ();
    tic ();
    json_str2 = json_write_fcn (octave_obj);
    t(i,2) = toc ();
  end
end
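
To try the same read-then-write timing pattern without downloading the test files, here is a minimal sketch that times one round trip on a small generated JSON string. The sample data and variable names are made up for illustration, and it assumes the builtin jsondecode and jsonencode are available. The timings will be tiny and are not comparable to the benchmark results.

```octave
% Build a small JSON string from sample data (hypothetical, not a benchmark file).
sample = struct ('values', rand (100, 3), 'label', 'sample');
json_str = jsonencode (sample);

% The functions under test are passed around as handles, as in the benchmark.
json_read_fcn  = @jsondecode;
json_write_fcn = @jsonencode;

t = nan (1, 2);
tic ();
octave_obj = json_read_fcn (json_str);
t(1) = toc ();   % read (decode) time in seconds
tic ();
json_str2 = json_write_fcn (octave_obj);
t(2) = toc ();   % write (encode) time in seconds

printf ('read: %.6f s, write: %.6f s\n', t(1), t(2));
```

Swapping in other handles, for example @load_json and @save_json after adding octave-rapidjson to the path, exercises a different library with the same code.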

The results for Matlab (R2020b, prerelease) were measured on the same system without JupyterLab.

t.matlab = [
  0.0768, 0.0853;
  0.1510, 0.5405;
  1.2222, 0.6521];

Octave (7.0.0, development version)

t.octave = benchmark (@jsondecode, @jsonencode);

octave-rapidjson

addpath ('octave-rapidjson')
t.rapid_json = benchmark (@load_json, @save_json);
rmpath ('octave-rapidjson')

octave-jsonstuff: No results due to an error.

%pkg load jsonstuff
%t.jsonstuff = benchmark (@jsondecode, @jsonencode);
%error: cat: field names mismatch in concatenating structs
%error: called from
%    jsondecode>condense_decoded_json_recursive at line 116 column 9
%    jsondecode>condense_decoded_json at line 67 column 7
%    jsondecode at line 63 column 7
%    benchmark at line 8 column 16
%pkg unload jsonstuff

JSONio: Because of the long running time, the results of the first run are saved here.

addpath ('JSONio')
%t.jsonio = benchmark (@jsonread, @jsonwrite);
t.jsonio = [ ...
  0.9583,  30.5410;
  6.1333,  17.4022;
  4.3382, 552.8929];
rmpath ('JSONio')

jsonlab: Because of the long running time, the results of the first run are saved here.

addpath ('jsonlab')
%t.jsonlab = benchmark (@loadjson, @savejson);
t.jsonlab = [ ...
   35.6242,  26.0625;
    6.1303,   0.7365;
  372.2456, 601.5318];
rmpath ('jsonlab')

Benchmark results

Update 2020-08-29: Abdallah found out that the speed problem (described below) was caused by the call to matlab.lang.makeValidName, not by the chosen DOM API.
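
To see where matlab.lang.makeValidName comes into play, here is a small sketch with JSON keys that are not valid Octave identifiers. It assumes a released Octave (7 or later) in which jsondecode accepts the "makeValidName" option; the JSON string is made up for this example.

```octave
% JSON keys that are not valid Octave variable names (sample data).
json_str = '{"1st place": "a", "my-key": 2}';

% By default the keys are passed through matlab.lang.makeValidName,
% so every resulting field name is a valid identifier.
data = jsondecode (json_str);
disp (fieldnames (data));

% Assumption: the "makeValidName" option turns the renaming off,
% which skips the call for every key in the input.
raw = jsondecode (json_str, 'makeValidName', false);
disp (fieldnames (raw));
```

Because the renaming runs once per key, data with many object keys (the mixed test files) is affected far more than mostly numeric data such as canada.json.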

graphics_toolkit ('qt')
titles = {'citm\_catalog.json (2 MB, mixed)', ...
          'canada.json (2 MB, numeric)', ...
          'large-file.json (26 MB, mixed)'};
for i = 1:3
  subplot (3, 1, i);
  bar ([t.matlab(i,:); t.octave(i,:); t.rapid_json(i,:)]');
  legend ({'Matlab (R2020b, pre)', 'Octave (7.0.0, dev)', ...
           'octave-rapidjson'}, 'Location', 'bestoutside');
  ylabel ('time in seconds');
  title (titles{i});
end


for i = 1:3
  subplot (3, 1, i);
  bar ([t.jsonio(i,:); t.jsonlab(i,:)]');
  legend ({'JSONio', 'jsonlab'}, 'Location', 'bestoutside');
  ylabel ('time in seconds');
  title (titles{i});
end


The results are not as overwhelming as I initially hoped for (they are, see comment above).

The first figure compares the running times of Matlab, Octave, and octave-rapidjson. Both Octave and octave-rapidjson are based on RapidJSON.

It must be the choice of the API (DOM vs. SAX) that slows down the current Octave implementation (DOM) by a factor of 10 to 100 (wrong, see comment above).

For mixed data, octave-rapidjson, which uses the SAX API, is not slower than Matlab. However, its implementation is less Matlab compatible than the current Octave implementation. DOM was chosen for the current Octave implementation to achieve the best compatibility with Matlab.

On the positive side, in the case of more numeric data (canada.json) the DOM API outperforms the SAX API. Nevertheless, my humble assumption is that mixed data is more common for JSON data files.

The results of JSONio and jsonlab are split into a second figure, as their running times are significantly larger than those in the first figure. For octave-jsonstuff we could not obtain any results due to an error. I’ll inform the maintainer and hope to repeat this benchmark in the future.

Given this benchmark, Octave should seriously consider switching to the SAX API while preserving the current Matlab compatibility (see comment above).
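
To put a number on "significantly larger", the following sketch divides the saved JSONio and jsonlab timings by the Matlab timings. The values are copied from the measurements above; the variable names are only for this example.

```octave
% Saved measurements in seconds.
% Rows: citm_catalog.json, canada.json, large-file.json.
% Columns: reading, writing.
t_matlab  = [0.0768, 0.0853; 0.1510, 0.5405; 1.2222, 0.6521];
t_jsonio  = [0.9583, 30.5410; 6.1333, 17.4022; 4.3382, 552.8929];
t_jsonlab = [35.6242, 26.0625; 6.1303, 0.7365; 372.2456, 601.5318];

% Element-wise slowdown factors relative to Matlab.
slowdown_jsonio  = t_jsonio  ./ t_matlab;
slowdown_jsonlab = t_jsonlab ./ t_matlab;

printf ('JSONio slowdown vs. Matlab (read, write):\n');
disp (round (slowdown_jsonio));
printf ('jsonlab slowdown vs. Matlab (read, write):\n');
disp (round (slowdown_jsonlab));
```

Every factor is above one, and the factors vary widely between files and between reading and writing, which is why these two libraries get their own figure.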

GSoC 2020 is over, and Abdallah has enriched Octave with a great new feature. Once he (or someone else) ports the Octave function matlab.lang.makeValidName to C++, JSON decoding and encoding should be both fast and compatible with Matlab.

August 29, 2020 (Version 2)

Download the Jupyter Notebook.

(C) 2017 — 2021 Kai Torben Ohlhus. This work is licensed under CC BY 4.0. Page design adapted from minima and researcher. Get the sources on GitHub.