Clarifications on Assignment 11 (decision trees in Python)

system · 2. Juli 2020 um 18:24

Disclaimer: Dieser Thread wurde aus dem alten Forum importiert. Daher werden eventuell nicht alle Formatierungen richtig angezeigt. Der ursprüngliche Thread beginnt im zweiten Post dieses Threads.

minusOne · 2. Juli 2020 um 18:24

Clarifications on Assignment 11 (decision trees in Python)
Dear all,

Katja told me that there were some questions regarding assignment 11.

Log function

In the assignment it says that you should use information gain, which requires the use of logarithms, but that you shouldn’t import any libraries.
It is okay (and recommended) that you import the necessary logarithm functions from the standard math library!
We’ll add this exemption to the assignment.

On a side node: you could of course implement the logarithm yourself (using e.g. the series expansion of the logarithm):

log = lambda x : sum((-1)**(1+n) * (x-1)**n/n for n in range(1, 150))

Unfortunately, it doesn’t work well for numbers close to zero. So I don’t recommend this.

Default parameter

The [m]dtl[/m] function you are supposed to implement gets a parameter called [m]default[/m].
This directly corresponds to the [m]default[/m] parameter used in the DTL algorithm described in the slides.
The default value simply indicates which answer should be chosen if no examples are left in the subtree.

Feel free to ask more questions about the assignment in this thread

Frederik

Tobs40 · 2. Juli 2020 um 19:22

I’ve just managed to finish my version which changes bases and “goal” (the argument as I call it) until both are within a certain range and then uses bisection.
Seems to work for very small/very large numbers as well when I plot it.

Not that anyone would use it, but I had already started programming it :-/

Attachment:
log.py: https://fsi.cs.fau.de/unb-attachments/post_164590/log.py

Tobs40 · 2. Juli 2020 um 19:28

Is the problem with the series, that adding a very large float to a very small float, as opposed to multiplying, is the same as adding zero because there’s not enough space in the mantissa?

minusOne · 3. Juli 2020 um 09:54

Cool that you managed to implement it with bisection!

I think the main problem with the series is that it simply doesn’t converge very quickly.

Here are the Taylor approximations up to order 20 (dotted lines), which are still quite far off from the actual log function (blue non-dotted line):

It takes several hundred terms to get a reasonably close approximation for 0.01.