Tuesday, May 8, 2018

Analytics Hubris

The word hubris is ancient Greek and describes a man claiming godlike powers; in our case, the power to foresee, predict and control the future. Hubris is often followed by Nemesis, who punishes those who have stepped too far.

Economic and business growth happens on the verge of hubris. Risky behavior (hubris) materializes as wealth if the venture succeeds, or as bankruptcy and pain if it fails. For this reason, and to keep the human spirit alive and vibrant, some amount of hubris is necessary.

When modeling any real world process, as we strip complexity to make it computationally tractable, it is important to never lose sight of the abstractions we made.

We rely on the abstractions to get results BUT the abstractions are not the real thing.
As more and more of our work happens virtually and data analytics tools spread through the ranks of less experienced engineers, we will see more of this phenomenon, which I call Analytics Hubris.

Earlier statesmen and senior scientists were familiar with this behavior. Younger men in their craze to climb the ladder mistake aptitude with a tool for wisdom.

I found this excerpt from the book Signals: The Breakdown of the Social Contract and the Rise of Geopolitics, which warns about the limitations of analytics and modelling (econometrics in this case) and the deeper knowledge you need in order to make decisions that have systemic impact.

Saturday, May 5, 2018

Akin's Laws of Spacecraft Design* , (or anything engineering based)

  1. Engineering is done with numbers. Analysis without numbers is only an opinion.
  2. To design a spacecraft right takes an infinite amount of effort. This is why it's a good idea to design them to operate when some things are wrong.
  3. Design is an iterative process. The necessary number of iterations is one more than the number you have currently done. This is true at any point in time.
  4. Your best design efforts will inevitably wind up being useless in the final design. Learn to live with the disappointment.
  5. (Miller's Law) Three points determine a curve.
  6. (Mar's Law) Everything is linear if plotted log-log with a fat magic marker.
  7. At the start of any design effort, the person who most wants to be team leader is least likely to be capable of it.
  8. In nature, the optimum is almost always in the middle somewhere. Distrust assertions that the optimum is at an extreme point.
  9. Not having all the information you need is never a satisfactory excuse for not starting the analysis.
  10. When in doubt, estimate. In an emergency, guess. But be sure to go back and clean up the mess when the real numbers come along.
  11. Sometimes, the fastest way to get to the end is to throw everything out and start over.
  12. There is never a single right solution. There are always multiple wrong ones, though.
  13. Design is based on requirements. There's no justification for designing something one bit "better" than the requirements dictate.
  14. (Edison's Law) "Better" is the enemy of "good".
  15. (Shea's Law) The ability to improve a design occurs primarily at the interfaces. This is also the prime location for screwing it up.
  16. The previous people who did a similar analysis did not have a direct pipeline to the wisdom of the ages. There is therefore no reason to believe their analysis over yours. There is especially no reason to present their analysis as yours.
  17. The fact that an analysis appears in print has no relationship to the likelihood of its being correct.
  18. Past experience is excellent for providing a reality check. Too much reality can doom an otherwise worthwhile design, though.
  19. The odds are greatly against you being immensely smarter than everyone else in the field. If your analysis says your terminal velocity is twice the speed of light, you may have invented warp drive, but the chances are a lot better that you've screwed up.
  20. A bad design with a good presentation is doomed eventually. A good design with a bad presentation is doomed immediately.
  21. (Larrabee's Law) Half of everything you hear in a classroom is crap. Education is figuring out which half is which.
  22. When in doubt, document. (Documentation requirements will reach a maximum shortly after the termination of a program.)
  23. The schedule you develop will seem like a complete work of fiction up until the time your customer fires you for not meeting it.
  24. It's called a "Work Breakdown Structure" because the Work remaining will grow until you have a Breakdown, unless you enforce some Structure on it.
  25. (Bowden's Law) Following a testing failure, it's always possible to refine the analysis to show that you really had negative margins all along.
  26. (Montemerlo's Law) Don't do nuthin' dumb.
  27. (Varsi's Law) Schedules only move in one direction.
  28. (Ranger's Law) There ain't no such thing as a free launch.
  29. (von Tiesenhausen's Law of Program Management) To get an accurate estimate of final program requirements, multiply the initial time estimates by pi, and slide the decimal point on the cost estimates one place to the right.
  30. (von Tiesenhausen's Law of Engineering Design) If you want to have a maximum effect on the design of a new engineering system, learn to draw. Engineers always wind up designing the vehicle to look like the initial artist's concept.
  31. (Mo's Law of Evolutionary Development) You can't get to the moon by climbing successively taller trees.
  32. (Atkin's Law of Demonstrations) When the hardware is working perfectly, the really important visitors don't show up.
  33. (Patton's Law of Program Planning) A good plan violently executed now is better than a perfect plan next week.
  34. (Roosevelt's Law of Task Planning) Do what you can, where you are, with what you have.
  35. (de Saint-Exupery's Law of Design) A designer knows that he has achieved perfection not when there is nothing left to add, but when there is nothing left to take away.
  36. Any run-of-the-mill engineer can design something which is elegant. A good engineer designs systems to be efficient. A great engineer designs them to be effective.
  37. (Henshaw's Law) One key to success in a mission is establishing clear lines of blame.
  38. Capabilities drive requirements, regardless of what the systems engineering textbooks say.
  39. Any exploration program which "just happens" to include a new launch vehicle is, de facto, a launch vehicle program.
  40. (alternate formulation) The three keys to keeping a new human space program affordable and on schedule:
    1. No new launch vehicles.
    2. No new launch vehicles.
    3. Whatever you do, don't develop any new launch vehicles.
  41. (McBryan's Law) You can't make it better until you make it work.
  42. There's never enough time to do it right, but somehow, there's always enough time to do it over.
  43. Space is a completely unforgiving environment. If you screw up the engineering, somebody dies (and there's no partial credit because most of the analysis was right...)

Akin's Laws of Spacecraft Design

Saturday, April 21, 2018

The "No Free Lunch" Theorem

The "No Free Lunch" theorem was first published by David Wolpert and William Macready in their 1997 paper "No Free Lunch Theorems for Optimization".

In computational complexity and optimization the no free lunch theorem is a result that states that for certain types of mathematical problems, the computational cost of finding a solution, averaged over all problems in the class, is the same for any solution method. No solution therefore offers a 'short cut'. 

A model is a simplified version of the observations. The simplifications are meant to discard the superfluous details that are unlikely to generalize to new instances. However, to decide what data to keep and what to discard, you must make assumptions. For example, a linear model makes the assumption that the data is fundamentally linear and the distance between the instances and the straight line is just noise, which can safely be ignored.

David Wolpert demonstrated that if you make absolutely no assumption about the data, then there is no reason to prefer one model over any other. This is called the "No Free Lunch Theorem" (NFL).

NFL states that no model is a priori guaranteed to work better. The only way to know for sure which model is the best is to evaluate them all. Since this is not possible, in practice you make some reasonable assumptions about the data and you evaluate only a few reasonable models.
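
The claim can be checked by brute force on a toy problem. The sketch below is my own illustration, not something taken from the paper: it enumerates every possible labeling of 3-bit inputs (that is, it makes no assumption about the data), trains two hypothetical learners (`majority_learner` and `nearest_neighbour_learner`, names invented here) on five of the eight inputs, and averages their accuracy on the three inputs they never saw.

```python
from fractions import Fraction
from itertools import product


def average_off_training_accuracy(learner, n_bits=3, n_train=5):
    """Average accuracy on unseen inputs over ALL possible target functions."""
    inputs = list(product([0, 1], repeat=n_bits))
    train_x, test_x = inputs[:n_train], inputs[n_train:]
    total = Fraction(0)
    n_targets = 0
    # Every assignment of 0/1 labels to the inputs is one possible "world".
    for labels in product([0, 1], repeat=len(inputs)):
        target = dict(zip(inputs, labels))
        predict = learner([(x, target[x]) for x in train_x])
        correct = sum(predict(x) == target[x] for x in test_x)
        total += Fraction(correct, len(test_x))
        n_targets += 1
    return total / n_targets


def majority_learner(train):
    """Always predict the most common training label (ties go to 1)."""
    ones = sum(label for _, label in train)
    guess = 1 if 2 * ones >= len(train) else 0
    return lambda x: guess


def nearest_neighbour_learner(train):
    """Predict the label of the closest training input (Hamming distance)."""
    def predict(x):
        nearest = min(
            train,
            key=lambda item: sum(a != b for a, b in zip(item[0], x)),
        )
        return nearest[1]
    return predict


if __name__ == "__main__":
    for learner in (majority_learner, nearest_neighbour_learner):
        print(learner.__name__, average_off_training_accuracy(learner))
```

Both learners average exactly 1/2 on the unseen inputs, because for any fixed training set the unseen labels are equally likely to be 0 or 1. One learner only starts to beat the other once the set of possible target functions is restricted, which is precisely the "reasonable assumptions about the data" step.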

Sunday, April 15, 2018

Review : Focal Loss for Dense Object Detection

The paper Focal Loss for Dense Object Detection introduces a new self-balancing loss function that aims to address the huge imbalance between foreground and background examples found in one-stage object detection networks.

y : ground-truth binary class {+1, -1}
p : the model's estimated probability that the input belongs to class y = +1

Given Cross Entropy (CE) loss for binary classification:
CE(p, y) =
-log(p) ,  if y = 1
-log(1 - p), if y = -1

The paper introduces the Focal Loss (FL) term as follows
FL(p,y) =
-(1-p)^gamma * log(p), if y = +1
-(p)^gamma * log(1-p), if y = -1

Gamma is a tunable focusing parameter: 0 disables the focal term (plain CE), and the examples below use 2.
Intuitively, the modulating factor reduces the loss contribution from easy examples and extends the range in which an example receives low loss.
Easy examples are those the model already classifies confidently and correctly: p close to 1 when y = +1, or p close to 0 when y = -1.

Example 1 (logarithms in these examples are base 10)
gamma = 2.0
p = 0.9
y = +1
FL(0.9, +1) = - ( 1 - 0.9 ) ^ 2.0 * log(0.9) = 0.000458
CE(0.9, +1) = - log(0.9) = 0.0458

Example 2
gamma = 2.0
p = 0.99
y = +1
FL(0.99, +1) = - ( 1 - 0.99 ) ^ 2.0 * log(0.99) = 0.000000436
CE(0.99, +1) = - log(0.99) = 0.00436

That means a near certainty (a very easy example) will have a very small FL compared to its cross entropy loss, so ambiguous results (p close to 0.5) and misclassified examples have a much larger relative effect on the total loss.
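
The two formulas are short enough to check numerically. The following is a minimal sketch rather than the authors' implementation; it uses the natural logarithm, so the absolute values differ from a base-10 calculation, but the ratio FL/CE = (1 - p)^gamma for y = +1 does not depend on the base.

```python
import math


def cross_entropy(p, y):
    """Binary cross entropy; p is the predicted probability of class +1."""
    return -math.log(p) if y == 1 else -math.log(1.0 - p)


def focal_loss(p, y, gamma=2.0):
    """Cross entropy scaled by the modulating factor (1 - p_t) ** gamma."""
    p_t = p if y == 1 else 1.0 - p  # probability assigned to the true class
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


if __name__ == "__main__":
    for p in (0.5, 0.9, 0.99):
        ce = cross_entropy(p, +1)
        fl = focal_loss(p, +1)
        print("p=%.2f  CE=%.6f  FL=%.8f  FL/CE=%.4f" % (p, ce, fl, fl / ce))
```

With gamma = 2 the loss of the p = 0.9 example is scaled down by a factor of 100 and the p = 0.99 example by a factor of 10,000, while the p = 0.5 example only loses a factor of 4.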

In practice the authors use an a-balanced variant of FL (a stands for the paper's alpha):

FL(p,y) = 
-a(y) * ( 1 - p ) ^ gamma * log(p), if y = +1
-a(y) * ( p )  ^ gamma * log(1 - p), if y = -1

Where a(y) is a multiplier term fixing the class imbalance. This form yields slightly improved accuracy over the non-a-balanced form.
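
To see what the extra multiplier does on an imbalanced batch, here is a small sketch. It assumes, as I recall from the paper, that a(y) is alpha for y = +1 and 1 - alpha for y = -1 with alpha = 0.25; the helper `foreground_share` and the toy batch are made up for illustration.

```python
import math


def alpha_balanced_focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Alpha-balanced focal loss for one example.

    p is the predicted probability of class +1. The weight a(y) is assumed
    to be alpha for y = +1 and (1 - alpha) for y = -1.
    """
    if y == 1:
        return -alpha * (1.0 - p) ** gamma * math.log(p)
    return -(1.0 - alpha) * p ** gamma * math.log(1.0 - p)


def foreground_share(examples, **kwargs):
    """Fraction of the total loss contributed by foreground (y = +1) examples."""
    losses = [(y, alpha_balanced_focal_loss(p, y, **kwargs)) for p, y in examples]
    total = sum(loss for _, loss in losses)
    return sum(loss for y, loss in losses if y == 1) / total


if __name__ == "__main__":
    # Toy batch: 1000 easy background examples and one hard foreground example.
    batch = [(0.01, -1)] * 1000 + [(0.3, +1)]
    # gamma=0 and alpha=0.5 is plain cross entropy up to a constant factor.
    print("CE-like share:", foreground_share(batch, gamma=0.0, alpha=0.5))
    print("FL share:     ", foreground_share(batch, gamma=2.0, alpha=0.25))
```

On this toy batch the single hard foreground example goes from roughly a tenth of the total loss to nearly all of it, which is the imbalance fix the loss is designed for.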

The authors then build a network to show off the capabilities of their loss function. The network is called RetinaNet: a standard Feature Pyramid Network (FPN) backbone with two subnets (one for object classification, one for box regression) attached at each feature map. It's a very common implementation for a one-stage detector, similar to SSD (edit: exactly the same as SSD) and YOLO. The slight differences are a prior that is used when initializing the bias of the object classification subnet, and a sparse calculation when adding up the total cost.
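
The bias prior can be illustrated in a couple of lines. As I remember the paper, the final bias of the classification subnet is set to b = -log((1 - pi) / pi) with pi = 0.01, so that every anchor starts out predicting "foreground" with probability pi; treat both the formula and the value as assumptions to check against the paper. The function names are mine.

```python
import math


def prior_bias(pi=0.01):
    """Bias b such that sigmoid(b) equals the prior pi."""
    return -math.log((1.0 - pi) / pi)


def sigmoid(x):
    """Logistic function used by the classification output."""
    return 1.0 / (1.0 + math.exp(-x))


if __name__ == "__main__":
    b = prior_bias(0.01)
    print("bias:", b, "initial foreground probability:", sigmoid(b))
```

Starting from a low foreground probability keeps the huge number of background anchors from dominating the loss in the first training iterations.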

For a high level understanding of deep learning click here

Thursday, February 8, 2018

Critique on "Deep Learning: A Critical Appraisal"

Deep Learning: A Critical Appraisal 

Gary Marcus argues that deep learning is : 
1. Shallow : Meaning it has limited capacity for transfer 
2. Data Hungry: Requires millions of examples to generalize sufficiently
3. Not transparent enough: It is treated as a black box

I'm not an academic but I've been reading research papers and I've seen a huge effort on all 3 fronts (kudos to https://blog.acolyer.org/).

New architectures and layers require far less data and can be used for several unrelated tasks.
There are also many approaches to opening the black box, based on anything from MDL to information theory and statistics, for interpreting the weights, layers and results.

It's not all doom and gloom, but the huge milestone jumps like the ones we had in the last 5 years in most AI/ML tasks are probably in the past. What we will see is a culling of a lot of bad tech and hype and the quiet rise of Differentiable Neural Computing.

Monday, January 29, 2018

Peter Thiel's 7 questions on startups

From Peter Thiel's "Zero to One", notes on startups

All excellent questions before you start any venture :
  1. Engineering : Can you create breakthrough technology instead of incremental improvements ?
  2. Timing : Is now the right time to start your particular business ?
  3. Monopoly : Are you starting with a big share of a small market ?
  4. People : Do you have the right team ?
  5. Distribution : Do you have a way to not just create but deliver your product ?
  6. Durability : Will your market position be defensible 10 and 20 years into the future ? 
  7. Secret : Have you identified a unique opportunity that others don’t see ? 

Wednesday, January 10, 2018

Compiling Tensorflow under Debian Linux with GPU support and CPU extensions

Tensorflow is a wonderful tool for Differentiable Neural Computing (DNC) and has enjoyed great success and market share in the Deep Learning arena. We usually use it with Python in a prebuilt fashion, from the Anaconda or pip repositories. What we miss that way is the chance to enable optimizations that make better use of our processing capabilities, as well as to do some lower level computing using C/C++.

The purpose of this post is to be a guide for compiling Tensorflow r1.4 on Linux with CUDA GPU support and the high performance AVX and SSE CPU extensions.

This guide is largely based on the official Tensorflow Guide and this snippet with some bug fixes from my side.

1. Install Python dependencies:

sudo apt-get install python-numpy python-dev python-pip python-wheel python-setuptools

2. Install GPU prerequisites:
  • CUDA developer and drivers
  • CUDNN developer and runtime
Make sure the cuDNN libs are copied inside the cuda/lib64 directory, usually found under /usr/local/cuda.

sudo apt-get install libcupti-dev
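
A quick way to confirm the cuDNN copy step is to look for the libraries from Python. This is only a convenience sketch; the default `/usr/local/cuda` path comes from the text above, and the `libcudnn*` file name pattern is my assumption about how the libraries are named on your system.

```python
import glob
import os


def find_cudnn_libs(cuda_home="/usr/local/cuda"):
    """Return the cuDNN library files found under <cuda_home>/lib64."""
    pattern = os.path.join(cuda_home, "lib64", "libcudnn*")
    return sorted(glob.glob(pattern))


if __name__ == "__main__":
    libs = find_cudnn_libs()
    if libs:
        print("Found cuDNN libraries:")
        for path in libs:
            print("  " + path)
    else:
        print("No libcudnn* files found under /usr/local/cuda/lib64")
```

An empty result before running the Tensorflow configuration usually means the cuDNN archive was unpacked somewhere else and still has to be copied over.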

3. Install Bazel, Google's custom build tool:

sudo apt-get install openjdk-8-jdk

echo "deb [arch=amd64] http://storage.googleapis.com/bazel-apt stable jdk1.8" | sudo tee /etc/apt/sources.list.d/bazel.list

curl https://bazel.build/bazel-release.pub.gpg | sudo apt-key add -

sudo apt-get update && sudo apt-get install bazel

sudo apt-get upgrade bazel

4. Configure Tensorflow:

git clone https://github.com/tensorflow/tensorflow

cd tensorflow

git checkout r1.4

./configure

## don't use clang for nvcc backend [https://github.com/tensorflow/tensorflow/issues/11807]
## when asked for the path to the gcc compiler, make sure it points to a version <= 5
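
Since the gcc version matters here, it can help to check it before answering the configuration prompt. The sketch below parses the output of `gcc -dumpversion`; the helper names are hypothetical and the limit of 5 is simply the one quoted in the note above.

```python
import re
import subprocess


def gcc_major_version(version_text):
    """Extract the major version from `gcc -dumpversion` output, e.g. '5.4.0'."""
    match = re.match(r"\s*(\d+)", version_text)
    if match is None:
        raise ValueError("unrecognised gcc version: %r" % version_text)
    return int(match.group(1))


def is_supported_host_compiler(version_text, max_major=5):
    """True when the compiler is not newer than the major version allowed."""
    return gcc_major_version(version_text) <= max_major


def check_gcc(gcc="gcc", max_major=5):
    """Run the given gcc binary and report (version string, supported?)."""
    version = subprocess.check_output([gcc, "-dumpversion"]).decode().strip()
    return version, is_supported_host_compiler(version, max_major)
```

For example, `check_gcc("/usr/bin/gcc-5")` (a path that may or may not exist on your machine) returns the reported version and whether it passes the check, so you know which path to give when asked.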

5. Compile with the SSE and AVX flags and install using pip:

# set locale to en_US [https://github.com/tensorflow/tensorflow/issues/36]

export LC_ALL=en_US.UTF-8

export LANG=en_US.UTF-8

bazel build -c opt --copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-mfpmath=both --copt=-msse4.2 --incompatible_load_argument_is_label=false --config=opt --config=cuda //tensorflow/tools/pip_package:build_pip_package

./bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg

sudo pip install /tmp/tensorflow_pkg/tensorflow-1.4.1*
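
A binary built with an instruction set the CPU lacks will crash with an illegal instruction, so before running the build it may be worth checking which of the `--copt` flags your processor supports. The sketch below reads the `flags` line of `/proc/cpuinfo` (x86 Linux only); the mapping table and function names are my own, and `-mfpmath=both` is left out because it has no corresponding CPU flag.

```python
# Compiler option -> name of the matching flag in /proc/cpuinfo.
COPT_TO_CPU_FLAG = {
    "-mavx": "avx",
    "-mavx2": "avx2",
    "-mfma": "fma",
    "-msse4.2": "sse4_2",
}


def parse_cpu_flags(cpuinfo_text):
    """Return the set of flags from the first 'flags' line of /proc/cpuinfo."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            return set(line.split(":", 1)[1].split())
    return set()


def supported_copts(cpuinfo_text):
    """List the --copt options whose instruction set the CPU reports."""
    flags = parse_cpu_flags(cpuinfo_text)
    return [
        "--copt=" + copt
        for copt, flag in sorted(COPT_TO_CPU_FLAG.items())
        if flag in flags
    ]


def read_cpuinfo(path="/proc/cpuinfo"):
    """Read the kernel's CPU description (Linux only)."""
    with open(path) as handle:
        return handle.read()
```

Calling `supported_copts(read_cpuinfo())` gives the subset of options that is safe to pass to the bazel build on the current machine; drop any option that is missing from the list.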

If you get a nasm broken link error :
edit  tensorflow/tensorflow/workspace.bzl and add an extra link

urls = [

6. Test that everything works:

cd ~/

python

>>> import tensorflow as tf

>>> session = tf.InteractiveSession()
>>> init = tf.global_variables_initializer()
## At this point if you get a malloc.c assertion failure, it is due to a wrong CUDA configuration (i.e. not using the runtime version)

At this point there should not be any CPU warning and the GPU should be initialized.
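
To check the GPU part explicitly rather than by reading the log output, the devices Tensorflow sees can be listed. This is a sketch assuming the Tensorflow 1.x module `tensorflow.python.client.device_lib`; the helper `gpu_device_names` is a name I made up, and the import is kept inside the function so the helper can be used without Tensorflow installed.

```python
def gpu_device_names(devices):
    """Names of the GPU entries in a list of device descriptions."""
    return [d.name for d in devices if d.device_type == "GPU"]


def list_local_gpus():
    """Ask Tensorflow which local GPUs it can use."""
    # Imported lazily: this line needs a working Tensorflow installation.
    from tensorflow.python.client import device_lib
    return gpu_device_names(device_lib.list_local_devices())
```

An empty list from `list_local_gpus()` means the build is running on the CPU only, which usually points back to the CUDA and cuDNN setup in step 2.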