GoDataDriven Open Source Contribution for Q3 2019

Barend Garvelink
21 October, 2019

In the third quarter of 2019, the GDD team contributed to no fewer than 15 different open source projects:

Various Projects

Rens contributed documentation to voila (#229). Vincent made a modest improvement to
the documentation of Sense2Vec (#72). Tim contributed Growatt support to the
Home Assistant Community Store (#507). Fokko improved the logging in Apache Flink (#9493)
and updated dependencies in Apache Spark (#25432, #25437, #25451) to patch some security
issues. He also contributed to Apache Iceberg (Incubating) (#488, #489), a project that started at Netflix and is now incubating at the Apache Software Foundation. Furthermore, he made small improvements to Airlift (#186), Presto SQL (#1603), and Apache Parquet (#674), and resurrected the build for MySQL Replicator (#43). Kris reduced the Docker footprint of Whirl by dropping an unneeded JDK dependency (#54) and improved the documentation of his docker-kafka image (#12).

Evol

Rogier contributed to Evol the pull requests #112, #128, #129,
#137, #138, #143, #144, #145 and #146. These are primarily
bugfixes and cleanups that made it into the 0.5.1 release.

Scruid

Bas Beelen and Barend worked together to add authentication support (#74) to the
Scruid project, and Bas added a cool new logo (#68). In the meantime, Fokko updated the testing harness
to use the latest version of Druid (#69). Barend improved the handling of unexpected HTTP status codes
(#67, #70). Fokko added a missing test case (#71).

Java IBAN

Barend published versions 1.6.0 and 1.6.1 of the Java IBAN project into Maven Central,
adding twelve new IBAN patterns, clarifying the use of reference data, scrubbing potentially sensitive information from
the exception messages and adding some minor features to the API.

Apache Airflow

We have contributed to Airflow and the Airflow ecosystem. Bas Harenslak and Fokko traditionally take
the lead here. The default XCom behaviour has changed: command output is now discarded by default, whereas it used to be
pushed as an XCom (#5779). Anyone using Airflow to coordinate Spark jobs should cheer.

Apache Avro

Fokko shepherded the release of Apache Avro 1.9.1. Version 1.9.1 was released shortly after 1.9.0 because a regression was discovered. If you're still on the Avro 1.8 branch, it is highly recommended to move to version 1.9.1. An overview of the changes can be found in a separate blog post. Not a lot of functionality has been added, but the release bumps many of Avro's dependencies that contained CVEs. Also, the dependency on Joda-Time has been removed (#631). Pull requests: #613, #623, #624, #626, #627, #629, #630, #631, #632, #633, #634, #635.

Apache Druid (Incubating)

Fokko was accepted as a committer to the Apache Druid project! Wasting no time, he took care of some
version updates (#8292, #8294, #8404, #8405, #8406, #8407) and
general improvements (#8234, #8235, #8340).

Scikit-Lego

Rens added repeating basis functions to scikit-lego (#171). Vincent contributed pull requests #162,
#164, #167, #168 and #170, which are mostly housekeeping, and reviewed #156, which
adds a cool FairClassifier. The 0.3.0 release contains these changes.

Join Us!

Are you a Data Engineer or Data Scientist who cares about open source? We're hiring!

