In this part we cover the paper "The case-crossover design via penalized regression" , published in BMC Medical Research Methodology. The authors conclude that "for the case-crossover design, we also encourage penalized regression for routine use."

]]>In this episode we talk abot the topic of his diploma thesis Probabilistic Forecasting of U.S. Treasury Bills and we cover time series analysis.

]]>Philipps reseach is concerned with the Theory of Distributed Computation. In distributed computation one assumes processors (or computation nodes) that are far away from each other. The input of some problem is distributed on these computing nodes and they have to collaborate to compute the solutions. The goal is to optimize the rescources, which in this case is communication, specifically the number of communication rounds. The goal of the research is twofold:

(1) Upper bounds: Designing algorithms that solve a given problem in a given distributed computational model with as little communication as possible.

(2) Lower bounds: Showing the intrinsic "hardness" of certain problems in a given computational model, by proving that they require a certain minimum amount of communication.

A typical problem in this regard is graph coloring, where the nodes of a graph must be colored with as few colors as possible, such that all neighboring nodes have different colors.

We discussed the seminal paper "LOCALITY IN DISTRIBUTED GRAPH ALGORITHMS" by NATHAN LINIAL which considers a simple distributed computational model called the LOCAL model where the communication network is a graph and nodes are processessing units that act synchronously in rounds. Nodes can send a messages to each neighbors in the graph in each round. The problem is to color this communication graph. The paper uses a mathematical structure called cover free sets by the mathematician Erdös. Given a graph which is already correctly colored with k colors, Linial uses this concept to push the number of colors down to roughly log(k) (with some simplifying lies) in a single round of communication. This step can be repeated several times (until the number of colors gets small and other factors play a role), and leads to a fast solution for coloring with few colors. This gives an "upper bound" for the communication required to solve the graph coloring problem. Linial complements this result by showing that this number of rounds is actually required to color a worst case graph with a number of colors that is close to the theoretical minimum, i.e. he gives a "lower bound".

We also talked about the Massively Parallel Computation model (MPC model): The MPC model is used in Big Data settings. The huge input is distributed over several machines. The machines have restricted memory and can send and receive at most (roughly) size of memory bits per round in an all-to-all communcation network. This captures the characteristics of distributed data centers and can be considered as theoretical counterpart to the rather famous MapReduce programming model of Google.

In the end we talk about the importance of a general computer science curriculum in school education.

Philipp recommended "The economist" as a good journalistic publication.