Jupyter notebook – Execute with parameters using CLI

Jupyter notebook – Execute from Command Line with parameters

The best way to execute a Jupyter notebook with parameters is to use Papermill.

  1. Install Papermill: pip install papermill
  2. Create a cell at the start of your Jupyter notebook and add the tag: parameters
  3. Add your variables, with their default values, to that cell
  4. Execute the command

papermill my-jupyter-notebook.ipynb my-output-notebook.ipynb -p project_key data-science-proj-key -p date '2021-08-01'

  • -p – the parameter name and the value to be injected (repeat it once per parameter)

More details here: papermill@github

Jupyter Notebook – Execute from Command Line

A typical command to execute a jupyter notebook from command line:

jupyter nbconvert --to notebook --execute mynotebook.ipynb --output output-name

  • --to notebook – the output format
  • --execute – runs the notebook before converting it (the notebook to execute is the positional argument, here mynotebook.ipynb)
  • --output – an optional name for the generated file

One question sometimes asked is if it is possible to pass parameters to a jupyter notebook.

  • Yes, it is possible, but you need help from papermill. See my next posts for more details.

Jupyter notebook – export to html

A typical command to export a jupyter notebook to html:

jupyter nbconvert my-notebook.ipynb --to html --no-input --no-prompt --output my-notebook-html

  • --to html – the output format
  • --no-input – without the python code
  • --no-prompt – remove the prompts from the output
  • --output – an optional name for the html generated file

How to extract an attribute from gradle.properties using bash

In some projects the version is written in the gradle.properties file using the attribute:
version=1.0.0-SNAPSHOT

I usually use this version as the version to be tagged in the Sonar server for code analysis. This way, the metrics are more accurate when checking the new code against the previous release.

How to get the version value using bash:
grep -E "^\s*version\s*=" gradle.properties | cut -d'=' -f2

And for the definition
version=1.0.0-SNAPSHOT

you should see
1.0.0-SNAPSHOT

How to get or display the project version from maven

When I am doing quality analysis on projects, one of the things I like to use is the real version of the project in Sonar.

How to extract it from maven projects?

mvn -q -Dexec.executable=echo -Dexec.args='${project.version}' --non-recursive exec:exec

If your pom.xml declares the version 1.0.0-SNAPSHOT, you should see:
1.0.0-SNAPSHOT

How many unit tests?

There are several advantages in developing unit tests:

  • Finds bugs early
  • You trust your code
  • Makes your code simpler, because you will probably refactor it to make the tests easier to develop
  • It documents your code. The name of each unit test almost describes the feature
  • You will be more confident to change the existing code
  • Running the unit tests is already a way of debugging your code
  • It makes it easier to troubleshoot production issues. You can reduce the scope of the analysis
  • It reduces costs. Code with fewer bugs reduces the costs of troubleshooting later
  • They execute faster than the other tests. You can use them as a rapid verification of the quality of your code.

Looking at all these advantages, we surely start to think we should develop as many unit tests as possible.
But the truth is, you should develop just enough unit tests.

How many unit tests are enough?

Let’s first analyse the consequences of developing unit tests:

  • Every unit test increases the coupling between your production code and the tests. Meaning, every time you change the production code, you need to update one or several unit tests.
  • Unit tests take time, not only to develop but to maintain too
  • It is very easy to start testing code that doesn’t need testing, such as the use of external frameworks. I see this a lot!
  • Unit tests test the algorithm, not the functionality. That time is often better spent on integration tests, which cover more code with only a few tests. Be aware that they require some preparation, though (a future post)
  • They can create a lot of coupling if you have to use a high quantity of mocks

Looking at all these advantages and disadvantages, I usually advise:

  • Target a coverage of around 70%. More than this starts to create high coupling between the code and the tests
  • Use a risk-based testing approach, meaning
    • choose to test the branch in your method or unit that covers the most important functionality
    • cover the code that must not fail because a failure brings costs to the company
  • Develop them at the same time as you develop the method, otherwise the code will become complicated to test. Use a TDD-ish approach.

Pass a parameter to all JVMs in a machine at the same time

JAVA_TOOL_OPTIONS

Imagine a scenario where you need to change a JVM parameter, but you can’t change the start script of your server(s), or changing it is not a good solution.

One of the challenges we had when working with containers was finding a way to change a JVM parameter without building the docker image again.
At start time the application should read a JVM parameter where a user profile was defined.
For specific reasons we sometimes need to change this profile, for instance, to use a more controlled user with which we can debug an issue. In these situations, we want to stop the container/pod, change the profile and start again, or even start a different container with a different profile.

As you know, the way to pass a JVM parameter to an application is
java -Dprofile=myValue com.some.Application appParameter1 appParameter2

What if you don’t have access to the above line, or rewriting it is not viable?
You can add the following to the environment variables of your operating system

JAVA_TOOL_OPTIONS='-Dvar1=value1 -Dvar2=value2'

Yes, you can add multiple values, since it is not a parameter but an environment variable whose values are injected into the JVM at start-up.

When the JVM starts, it searches for this environment variable and uses it. You can check this in the JVM output, which displays a message with the values found.
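
On the application side, a minimal sketch of reading the injected value could look like the code below. It assumes the profile arrives as a system property named profile, as in the java -Dprofile=myValue example above; the class name ProfileReader and the fallback value default are hypothetical and not part of the original setup.

```java
// Hypothetical helper: reads the -Dprofile=... value, whether it was given on
// the command line or injected through JAVA_TOOL_OPTIONS.
class ProfileReader {

    // Returns the "profile" system property, or "default" (an assumed
    // fallback for this sketch) when the property was not set.
    static String currentProfile() {
        return System.getProperty("profile", "default");
    }

    public static void main(String[] args) {
        System.out.println("Active profile: " + currentProfile());
    }
}
```

Started in an environment where JAVA_TOOL_OPTIONS='-Dprofile=debug' is set, this should report debug as the active profile, without any change to the start script or the image.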

JDK_JAVA_OPTIONS

This environment variable has a different purpose: the java launcher (JDK 9 and later) adds its content as a prefix to the other arguments of the java command.

Example

export JDK_JAVA_OPTIONS='-Dparam=value @someFile'

When we execute this
java @otherFile

The real execution is
java -Dparam=value @someFile @otherFile

Code complexity: Cyclomatic vs Cognitive

As the name suggests, code complexity measures how complex your code is. If you count the number of possible paths through your code, the measure is called cyclomatic. If you want another angle to measure the complexity, such as how difficult the code is for a human to read, we call it cognitive.

Why is this measure important

If we take the following code

a graph like this one can be produced

From this graph we can deduce the number of possible paths an application can follow during its execution: M = 6 - 5 + 2 x 1 = 3 (check the formula below).
If you develop unit tests, you quickly conclude that there is a strong correlation between the number of possible paths and the number of unit tests necessary to reach 100% code coverage!
Now, if your method is too complex and/or has a lot of possible execution paths, you will need to develop and maintain a lot of unit tests, and it will surely be difficult for another person to understand and continue your work.
You want a low value for this metric. Usually, Sonar proposes a maximum value of 15. For any value above this one, you should consider refactoring your code.
By the way, the recommended value is 4 or 5.

Cyclomatic complexity

This measures how difficult your code is to test.
If you consider a graph generated from the code, you can use the formula
M = E - N + 2P

M -> the cyclomatic complexity
E -> the number of edges in the graph
N -> the number of nodes in the graph
P -> the number of connected components (a method is considered a component, so in these cases P = 1)

For Java-like languages there is another way to calculate it

  • Assign one point for the start of the method
  • Add one point for each conditional statement, such as an if
  • Add one point for each iterative statement
  • Add one point for each case or default in a switch
  • Add one point for each Boolean operator && or ||
  • Sometimes one point is added for each throws, throw, catch, and finally, but it is advised not to count these since Java code is very verbose regarding exceptions
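
The counting rules above can be tried on a small method. The class ShippingCost and its pricing logic are hypothetical and exist only to show where each point comes from; the comments apply the rules from the list, not the output of a specific tool.

```java
// Hypothetical example used only to illustrate the counting rules.
class ShippingCost {

    static int cost(int weightKg, boolean express) {  // +1 for the start of the method
        int cost = 5;
        if (weightKg > 10) {                          // +1 for the if
            cost += 3;
        }
        if (express && weightKg < 30) {               // +1 for the if, +1 for the &&
            cost *= 2;
        }
        return cost;                                  // cyclomatic complexity: 4
    }
}
```

A result of 4 is within the recommended range, and it hints at how many unit tests you are likely to need to exercise every decision in the method.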

Cognitive complexity

This measures how difficult your code is to read and understand.
There isn’t a specific formula to apply, but what is usually done is to take the regular rules for the cyclomatic complexity and adjust them for how hard the code is to assess mentally

  • Adds points for catch and switch
  • Ignores the number of Boolean operators in an if, because the mental effort to read a && b is the same as to read a && b && c
  • Adds points for each recursion (the cyclomatic complexity ignores this)
  • Adds increments for nested flow-breaking structures
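
The effect of nesting can be seen in a small sketch. The scoring in the comments is an assumption: it follows a Sonar-style rule where each flow-breaking structure costs one point plus one point per level of nesting, and other tools may score differently. The class NestedSum is hypothetical.

```java
// Hypothetical example: the same three structures count more when nested.
class NestedSum {

    static int sumPositiveEven(int[] values) {
        int sum = 0;
        for (int v : values) {        // +1
            if (v > 0) {              // +2: one for the if, one for its nesting level
                if (v % 2 == 0) {     // +3: one for the if, two for its nesting levels
                    sum += v;
                }
            }
        }
        return sum;                   // assumed cognitive score: 6
    }
}
```

With the cyclomatic rules described earlier the same method scores only 4 (method start, one loop, two ifs), which shows how the two metrics diverge as soon as the code gets deeper.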

Tools to measure the complexity

These are the ones I use most

Google for complexity metrics and several plugins for several languages and IDEs will be displayed.

Observer pattern

The observer pattern is one of the most used patterns when developing current web technologies.
For instance, in the interface when you change a value, such as a date, and something changes immediately in another place of the UI, it is the Observer pattern working behind the scenes.
When you make a REST API call in an Angular application and register a method that will process the result after receiving the response, it is the Observer pattern.
In applications with buttons and dropdowns that affect other controls, it is again the Observer pattern.

What is the main idea of this pattern

You want to use this pattern when a component should be immediately notified when something changes its state or a specific behaviour happens.
By the way, this pattern is not the same as “publisher-subscriber”, but that’s a subject for another article.

An example

To avoid the common examples, imagine we have a class that sends records to a database, and every time a record is sent a set of other classes want to know about it, for instance a logger and an audit class.
Let’s write a simple draft to understand a possible implementation.

  • We have a CustomerDao, the class responsible for sending data to the database and the class that holds information useful to other classes
  • We have a Subscriber interface, so that any class implementing it can be notified
  • And we have 2 other classes acting as Subscribers, DatabaseLoggerObserver and DatabaseAuditObserver, which are notified by the CustomerDao every time a new Customer is inserted. Since they are decoupled from the CustomerDao, they can receive notifications from anywhere

Usually a possible flow would be
1. Create the instances of the classes
2. Add the subscribers to the CustomerDao
3. Execute!
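
The classes and the three steps above can be sketched as follows. The class names come from the description; the Customer fields, the method names (addSubscriber, removeSubscriber, insert, onCustomerInserted) and the in-memory lists standing in for a real log and audit trail are assumptions made for this sketch, and the actual database call is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical record being persisted.
class Customer {
    final String name;

    Customer(String name) {
        this.name = name;
    }
}

// Any class implementing this interface can be notified.
interface Subscriber {
    void onCustomerInserted(Customer customer);
}

// The subject: sends data to the database and notifies its subscribers.
class CustomerDao {
    private final List<Subscriber> subscribers = new ArrayList<>();

    void addSubscriber(Subscriber subscriber) {
        subscribers.add(subscriber);
    }

    void removeSubscriber(Subscriber subscriber) {
        subscribers.remove(subscriber);
    }

    void insert(Customer customer) {
        // The real database insert would happen here; it is omitted in this sketch.
        for (Subscriber subscriber : subscribers) {
            subscriber.onCustomerInserted(customer);
        }
    }
}

// Keeps log lines in memory so the sketch needs no logging framework.
class DatabaseLoggerObserver implements Subscriber {
    final List<String> lines = new ArrayList<>();

    @Override
    public void onCustomerInserted(Customer customer) {
        lines.add("LOG: inserted customer " + customer.name);
    }
}

// Keeps audit entries in memory for the same reason.
class DatabaseAuditObserver implements Subscriber {
    final List<String> entries = new ArrayList<>();

    @Override
    public void onCustomerInserted(Customer customer) {
        entries.add("AUDIT: customer " + customer.name + " created");
    }
}

class ObserverDemo {
    public static void main(String[] args) {
        // 1. Create the instances of the classes
        CustomerDao dao = new CustomerDao();
        DatabaseLoggerObserver logger = new DatabaseLoggerObserver();
        DatabaseAuditObserver audit = new DatabaseAuditObserver();

        // 2. Add the subscribers to the CustomerDao
        dao.addSubscriber(logger);
        dao.addSubscriber(audit);

        // 3. Execute!
        dao.insert(new Customer("Alice"));

        System.out.println(logger.lines);
        System.out.println(audit.entries);
    }
}
```

Note that CustomerDao only knows the Subscriber interface, so a third observer can be added without touching the insert method.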

This is just a snippet to understand the pattern; production code can and should do much better. For instance, you can extract the subscribers list and the operations to add and remove them into another class, and so on.

Advantages and disadvantages

Advantages

  • Allows sending data to other objects without knowing them
  • If it is well implemented the objects interact in a loosely coupled way
  • Observers can be added and removed at any time without changing the code

Disadvantages

  • Watch out for performance issues, since each notification waits for each observer
  • What happens if an observer throws an exception?
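
One possible answer to the last question, sketched under the assumption that a failing observer should not prevent the others from being notified, is to catch the exception per observer and carry on. The names Listener, SafeNotifier and publish are hypothetical, and whether swallowing the failure is acceptable depends on your application.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal observer contract for this sketch.
interface Listener {
    void onEvent(String event);
}

class SafeNotifier {
    private final List<Listener> listeners = new ArrayList<>();

    void add(Listener listener) {
        listeners.add(listener);
    }

    // Notifies every listener and returns how many of them failed.
    int publish(String event) {
        int failures = 0;
        for (Listener listener : listeners) {
            try {
                listener.onEvent(event);
            } catch (RuntimeException e) {
                // One broken listener must not stop the remaining ones.
                failures++;
                System.err.println("Listener failed: " + e.getMessage());
            }
        }
        return failures;
    }
}
```

This keeps the notification loop alive, but it does nothing for the performance concern: the calls are still made one after the other.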

Conclusions

This is a great pattern and is widely used at the code level, for things such as events in the interface, triggering behaviour in specific classes and methods, and so on.
This pattern is not a good option if you need this behaviour at the architecture level, since it has no mechanisms for performance, scalability, reliability, etc. For these cases consider using publisher-subscriber.

Wrap Class – Refactor Legacy Pattern

Wrap Class

When using a wrap method, we want to add a new behavior that can be added around the method, such as before or after the existing code.
However, there are times when we need to add behavior in more than one method, like logging before or after some, or all public methods.

Description and how to apply

There are at least 2 ways of doing it:

1. Using the Decorator Pattern

When using the **Decorator Pattern**, we want to make sure both classes implement the same interface or inherit from the same base class, because we want the Decorator to have the same “public” interface as the class it wraps. This way a Decorator can be used interchangeably with the original class and with other Decorators. For instance, switching between a logging decorator and a performance decorator should be as easy as passing one instance or the other to a class constructor, and its use should be transparent.

2. Aggregating the current class

If you want to be more specific about the class being refactored, then a Wrap Class that just wraps the existing one should be enough. In this case, you don’t need to *extract interface*, since the existing class is passed in the constructor parameters and used in the new method. In this approach you can do a “method rename”: for instance, if the original method was called *process*, the new exposed method could be *processWithLogging*. Another advantage is that you can pass other information useful for the new behavior in the constructor, something that in the Decorator Pattern can “violate” the pattern’s principle.

Which one?

From the examples below you can observe that the Aggregating solution is more specific than the Decorator. Both will work, but if you need help choosing: choose the Decorator if you think more changes will come in the future that can be added by creating more Decorators, and/or the updates only add new behavior, and you expect changes in multiple methods, like logging every public method. Use the Aggregating solution if the change is very specific, touches just a few places, and you don’t expect great changes to these methods in the future.

Examples

Imagine existing business code such as a CustomerPaymentProcessor class with a process method.

Using the Decorator Pattern

You just have to replace the calls to CustomerPaymentProcessor with the Decorator, since both implement the same interface.

Probably when refactoring you will need to *Extract Interface* or create an *Abstract Class* to implement this pattern.
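
A minimal sketch of the Decorator variant could look like this. Only CustomerPaymentProcessor is named in the text; the PaymentProcessor interface, the LoggingPaymentProcessor decorator, the process signature and the body of the business method are assumptions standing in for the real code.

```java
// Interface extracted from the existing class (the Extract Interface step).
interface PaymentProcessor {
    String process(String customerId, int amountCents);
}

// Stand-in for the existing business code; the real body is not shown here.
class CustomerPaymentProcessor implements PaymentProcessor {
    @Override
    public String process(String customerId, int amountCents) {
        return "PAID " + customerId + " " + amountCents;
    }
}

// Decorator: same interface, adds logging before and after delegating.
class LoggingPaymentProcessor implements PaymentProcessor {
    private final PaymentProcessor delegate;

    LoggingPaymentProcessor(PaymentProcessor delegate) {
        this.delegate = delegate;
    }

    @Override
    public String process(String customerId, int amountCents) {
        System.out.println("Before payment for " + customerId);
        String result = delegate.process(customerId, amountCents);
        System.out.println("After payment: " + result);
        return result;
    }
}

class DecoratorDemo {
    public static void main(String[] args) {
        // Callers depend on the interface, so the decorator is a drop-in replacement.
        PaymentProcessor processor = new LoggingPaymentProcessor(new CustomerPaymentProcessor());
        processor.process("c-42", 1000);
    }
}
```

Because the decorator accepts any PaymentProcessor, a second decorator (for instance one measuring performance) could wrap this one without either class knowing about the other.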

Using a Wrapper Class

Here, the solution is specific to the code being refactored; it cannot be reused.
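
A minimal sketch of the Wrap Class variant, under the same assumptions about the business code: CustomerPaymentProcessor stays untouched and no interface is extracted. The wrapper name, the extra source argument and the process signature are hypothetical; processWithLogging follows the renaming idea described earlier.

```java
// Stand-in for the existing business code; it is left unchanged.
class CustomerPaymentProcessor {
    String process(String customerId, int amountCents) {
        return "PAID " + customerId + " " + amountCents;
    }
}

// Wrap Class: aggregates the existing class and exposes a renamed method.
class CustomerPaymentProcessorWithLogging {
    private final CustomerPaymentProcessor processor;
    private final String source; // extra information needed only by the new behavior

    CustomerPaymentProcessorWithLogging(CustomerPaymentProcessor processor, String source) {
        this.processor = processor;
        this.source = source;
    }

    String processWithLogging(String customerId, int amountCents) {
        System.out.println("[" + source + "] before payment for " + customerId);
        String result = processor.process(customerId, amountCents);
        System.out.println("[" + source + "] after payment: " + result);
        return result;
    }
}
```

Callers that need the logging switch to processWithLogging explicitly, which makes the change visible at each call site but ties the wrapper to this one class.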