JHUG meetup October 4th 2016

After several months, JHUG returned on Tuesday, 4th of October (Meetup.com event). I asked my employer, Workable to host the event, and they happilly agreed, so JHUG members had the opportunity to visit our impressive new premises. In terms of people this was a very successful meetup, as around 70 non-Workablers showed up, several new faces in them. Building on our experience from the previous event, we (well, Kostis) recorded the event, so this post will be updated once the videos are ready.

The first presentation was by my two co-workers, Nikos Dimos (Linkedin) and Rui Miguel Forte (Linkedin), who lead the Sourcing and Data Science teams respectively. Nikos talked about how the Sourcing team at Workable moved to Microservices architecture. He started with some basics, like defining what a monolith architecture and a microservice architecture are, and when each is a good fit (no silver bullet here either). He then drilled down to specific problems that kept cropping up with our old, monolithic architecture, and how they were mitigated by moving to Microservices. He showed how the new architecture relies on Rabbit MQ for inter-service communication, and employs tools like Apache Kafka to keep track of what service invocations took place and how. Key takeaways were understanding the tradeoffs when selecting between the Monolith and Microservices, how testing becomes easier and the fact that when moving to a Microservice architecture you need to embrace working asynchronously and make the most out of it. You can find the slides here and the video here.

Miguel (who is also the leader of the Data Science Athens group) carried on with an introduction on what Data Science is and the kind of problems we use it to solve, and went on with tools that his team frequently uses. The first tool is Apache Tika, a toolkit that extracts text and metadata from multiple file types. He also demoed (code available to play with on Github - branch develop) some more advanced tasks, such as image and links extraction. The presentation went on with PMML (Predictive Model Markup Language). A common problem our Data Science team has is that predictive models are easier to develop and train in languages like Python or R, but then those models need to be ported to Java for production. PMML is a - surprisingly old - exchange format allows us to transfer (export and re-import) the models from from one platform to the other, so that the model does not need to be developed and trained from scratch (which may take days, depending on the model). You can find the slides here.

The second presentation was by Marios Kogias (Linkedin), who talked about Code Maintainability. The talk was based on the book Building Maintainable Software by Joost Visser and others. As Marios pointed out, the methodology promoted by the book is not written in blood, but is essentially a set of best practices which can make our life easier in the long run. Essentially, the talk highlighted three simple day-to-day habits, like keeping methods at most 15 LoC, duplicating code wisely, and keeping method signatures simple using encapsulation. Marios also described which refactoring operations (most IDEs have them out of the box) are appropriate for each case. Finally, the presentation mentioned code analysis tools like Better Code Hub, PMD and the fantastic SonarQube. The presentation is also appropriate for junior devs. You can find the slides here and the video here.

The third presentation was by Thomas Pliakas (Linkedin), who talked about Garbage Collection performance tuning. We all know that GC is something taking place in the background, and we don’t worry about it often. When we do though, it is useful that we know at least which are the basic Garbage Collectors, what algorithms they use and the phases of their execution. Thomas began with the presentation of the sub-regions of what we see from the outside as Java Heap (Eden, Tenured, Permgen), and why GC revolves around the age of objects. He then went on to explain what Minor, Major and Full GC are, as well as available Garbage Collectors. Thomas explained factors to take into account before tuning GC (latency, throughput, capacity) and continued with tunings for the G1, which is going to become the default. The presentation finished with references which are a very good starting point for people diving into the interesting world of GC. You can find the slides here.

Written on October 4, 2016