This paper revisits the idea of decision DAGs for classification. Unlike a decision tree, a decision DAG is able to merge nodes at each layer, preventing the tree from growing exponentially with depth. This represents an alternative to decision-trees utilizing pruning methods as a means of controlling model size and preventing overfitting. The paper casts learning with this model as an empirical risk minimization problem, where the idea is to learn both the DAG structure along with the split parameters of each node. Two algorithms are presented to learn the structure and parameters in a greedy layer-wise manner using an information-gain based objective. Compared to several baseline approaches using ensembles of fixed-size decision trees, ensembles of decision DAGs seem to provide improved generalization performance for a given model size (as measured by the total number of nodes in the ensemble).