In this talk, we will discuss the storage sub-system in Big Data systems such as Apache Spark and Apache Hadoop, and the roles Flash can play. Big Data systems have been used for many years to solve problems that require scale. The framework takes care of scalability and resiliency, allowing the user to focus on the relevant computation, expressed as map and reduce functions. In these Big Data systems, we currently see a shift from the traditional use of the Hard Disk Drive (HDD) towards in-memory computation. We have theoretically evaluated these two generations of Big Data systems, as well as two implementations, Apache Hadoop and Apache Spark, in combination with Flash technology. We have also evaluated the possible use of Flash technology in these Big Data systems by performing two experiments.

Our first experiment examined the performance of Apache Spark versus Apache Hadoop for a representative iterative algorithm, and the performance degradation of Apache Spark under memory constraints. We have found that, for the chosen algorithm, Apache Spark performs as well as or better than Apache Hadoop when data has to be loaded from the HDD, as is the case in the initialization phase of a program. For the iterative part of our program, we have seen an overall speedup of 30x, and a speedup of 100x for the map and reduce phases.

In our second experiment, we evaluated two ways of using Flash in Apache Spark, in particular using the IBM FlashSystem 840 connected to a POWER8 system. We first evaluated Flash technology with a mounted file system, and used this setup to replace the HDD as the default spill location. We found that this was not valuable, as the possible performance improvements were negligible compared to the overhead generated by data aggregation and system calls. We then shifted our focus to CAPI-attached Flash, and modified Apache Spark to spill intermediate data directly to the FlashSystem as key-value pairs. In this experiment, while limiting memory to a fixed amount to force spilling, we were able to remove 70% of the overhead caused by spilling, which was mainly overhead from Operating System (OS) involvement. In our future work, we will address the overhead caused by data aggregation, since writing key-value pairs directly allows us to write much smaller units of data. We believe that, once this overhead is removed, Big Data systems can benefit from Flash technology, and especially CAPI-attached Flash, as a system with a limited amount of expensive DRAM and a large Flash backend can solve larger problem sets while maintaining performance comparable to in-memory computation.
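To make the memory-constrained spill setup concrete, below is a minimal sketch in stock Apache Spark (Scala) that points the spill/scratch directory (spark.local.dir) at a Flash-backed mount and caps executor memory so that an iterative key-value workload is forced to spill. The mount point /mnt/flash/spark-scratch, the memory sizes, and the toy workload are illustrative assumptions, not the configuration used in our experiments; the CAPI-based direct spilling discussed above required modifications to Spark internals and is not shown here.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

// Illustrative sketch only: forces an iterative Spark job to spill to a
// Flash-backed directory by capping executor memory. Paths and sizes are
// hypothetical, not those used in the experiments described above.
object FlashSpillSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("flash-spill-sketch")
      // Directory used for shuffle files and RDD blocks that spill to disk;
      // assumed to be a file system mounted on the Flash device.
      .set("spark.local.dir", "/mnt/flash/spark-scratch")
      // Deliberately small executor heap to force spilling during iterations.
      .set("spark.executor.memory", "2g")
      // Fall back to local mode if no master is supplied via spark-submit.
      .setIfMissing("spark.master", "local[*]")

    val sc = new SparkContext(conf)

    // Toy iterative workload on key-value pairs; blocks that do not fit in
    // memory are written to spark.local.dir (MEMORY_AND_DISK).
    var data = sc.parallelize(0 until 10000000)
      .map(i => (i % 100000, i.toDouble))
      .persist(StorageLevel.MEMORY_AND_DISK)

    for (_ <- 1 to 10) {
      val next = data.mapValues(_ + 1.0).persist(StorageLevel.MEMORY_AND_DISK)
      next.count()      // force evaluation of this iteration
      data.unpersist()  // release the previous iteration's blocks
      data = next
    }

    sc.stop()
  }
}
```

This sketch only exercises the file-system-mounted variant; in the CAPI-attached configuration discussed above, the intermediate key-value pairs bypass the mounted file system and the associated OS overhead.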