Abstract: Elucidating the cellular architecture of the human neocortex is central to understanding our cognitive abilities and susceptibility to disease. Here we applied single-nucleus RNA-sequencing to perform a comprehensive analysis of cell types in the middle temporal gyrus of the human cerebral cortex. We identify a highly diverse set of excitatory and inhibitory neuronal types that are mostly sparse, with excitatory types being less layer-restricted than expected. Comparison to a similar mouse cortex single-cell RNA-sequencing dataset revealed a surprisingly well-conserved cellular architecture that enables matching of homologous types and predictions of human cell type properties. Despite this general conservation, we also find extensive differences between homologous human and mouse cell types, including dramatic alterations in proportions, laminar distributions, gene expression, and morphology. These species-specific features emphasize the importance of directly studying the human brain.
Abstract: In recent years the t-distributed Stochastic Neighbor Embedding (t-SNE) algorithm has become one of the most used and insightful techniques for exploratory data analysis of high-dimensional data. It reveals clusters of high-dimensional data points at different scales while only requiring minimal tuning of its parameters. However, the computational complexity of the algorithm limits its application to relatively small datasets. To address this problem, several evolutions of t-SNE have been developed in recent years, mainly focusing on the scalability of the similarity computations between data points. However, these contributions are insufficient to achieve interactive rates when visualizing the evolution of the t-SNE embedding for large datasets. In this work, we present a novel approach to the minimization of the t-SNE objective function that heavily relies on graphics hardware and has linear computational complexity. Our technique decreases the computational cost of running t-SNE on datasets by orders of magnitude and retains or improves on the accuracy of past approximated techniques. We propose to approximate the repulsive forces between data points by splatting kernel textures for each data point. This approximation allows us to reformulate the t-SNE minimization problem as a series of tensor operations that can be efficiently executed on the graphics card. An efficient implementation of our technique is integrated and available for use in the widely used Google TensorFlow.js, and an open-source C++ library.
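One way to read the splatting idea in the abstract above is that the t-SNE repulsive term can be rewritten as two fields defined over the embedding plane, which are then only sampled at the point positions. The following is a minimal sketch of that rewriting, not the authors' implementation: the names `S` and `V` and all helper functions are illustrative assumptions, and the fields are evaluated exactly on the CPU in quadratic time. The GPU technique would instead rasterise them by splatting one kernel texture per point, which is not reproduced here.

```python
def kernel(p, q):
    """Student-t kernel 1 / (1 + ||p - q||^2) used by t-SNE in the 2D embedding."""
    return 1.0 / (1.0 + (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2)


def repulsive_pairwise(Y):
    """Reference repulsive forces: F_i = sum_j q_ij^2 * Z * (y_i - y_j)."""
    n = len(Y)
    Z = sum(kernel(Y[k], Y[l]) for k in range(n) for l in range(n) if k != l)
    forces = []
    for i in range(n):
        fx = fy = 0.0
        for j in range(n):
            if i == j:
                continue
            q = kernel(Y[i], Y[j]) / Z
            fx += q * q * Z * (Y[i][0] - Y[j][0])
            fy += q * q * Z * (Y[i][1] - Y[j][1])
        forces.append((fx, fy))
    return forces


def fields(Y, p):
    """Scalar field S and vector field V at position p, summed over all points."""
    S = 0.0
    Vx = Vy = 0.0
    for y in Y:
        w = kernel(p, y)
        S += w
        Vx += w * w * (p[0] - y[0])
        Vy += w * w * (p[1] - y[1])
    return S, (Vx, Vy)


def repulsive_from_fields(Y):
    """Same forces, obtained by sampling the two fields at each point."""
    samples = [fields(Y, y) for y in Y]
    # Each point contributes kernel(y, y) = 1 to its own S sample; remove it.
    Z = sum(S - 1.0 for S, _ in samples)
    return [(V[0] / Z, V[1] / Z) for _, V in samples]


# Hypothetical 2D embedding positions.
Y = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (-1.5, 0.5)]
exact = repulsive_pairwise(Y)
via_fields = repulsive_from_fields(Y)
```

Both functions return the same forces up to rounding. The practical gain would come from replacing the exact `fields` evaluation with a texture lookup, so that the cost per point no longer depends on the number of other points.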
Abstract: Tissue functionality is determined by the characteristics of tissue-resident cells and their interactions within their microenvironment. Imaging Mass Cytometry offers the opportunity to distinguish cell types with high precision and link them to their spatial location in intact tissues at sub-cellular resolution. This technology produces large amounts of spatially-resolved high-dimensional data, which constitutes a serious challenge for the data analysis. We present an interactive visual analysis workflow for the end-to-end analysis of Imaging Mass Cytometry data that was developed in close collaboration with domain expert partners. We implemented the presented workflow in an interactive visual analysis tool, ImaCytE. Our workflow is designed to allow the user to discriminate cell types according to their protein expression profiles and analyze their cellular microenvironments, aiding in the formulation or verification of hypotheses on tissue architecture and function. Finally, we show the effectiveness of our workflow and ImaCytE through a case study performed by a collaborating specialist.
Objective A comprehensive understanding of anticancer immune responses is paramount for the optimal application and development of cancer immunotherapies. We unravelled local and systemic immune profiles in patients with colorectal cancer (CRC) by high-dimensional analysis to provide an unbiased characterisation of the immune contexture of CRC.
Design Thirty-six immune cell markers were simultaneously assessed at the single-cell level by mass cytometry in 35 CRC tissues, 26 tumour-associated lymph nodes, 17 colorectal healthy mucosa and 19 peripheral blood samples from 31 patients with CRC. Additionally, functional, transcriptional and spatial analyses of tumour-infiltrating lymphocytes were performed by flow cytometry, single-cell RNA-sequencing and multispectral immunofluorescence.
Results We discovered that a previously unappreciated innate lymphocyte population (Lin–CD7+CD127–CD56+CD45RO+) was enriched in CRC tissues and displayed cytotoxic activity. This subset demonstrated a tissue-resident (CD103+CD69+) phenotype and was most abundant in immunogenic mismatch repair (MMR)-deficient CRCs. Their presence in tumours was correlated with the infiltration of tumour-resident cytotoxic, helper and γδ T cells with highly similar activated (HLA-DR+CD38+PD-1+) phenotypes. Remarkably, activated γδ T cells were almost exclusively found in MMR-deficient cancers. Non-activated counterparts of tumour-resident cytotoxic and γδ T cells were present in CRC and healthy mucosa tissues, but not in lymph nodes, with the exception of tumour-positive lymph nodes.
Conclusion This work provides a blueprint for the understanding of the heterogeneous and intricate immune landscape of CRC, including the identification of previously unappreciated immune cell subsets. The concomitant presence of tumour-resident innate and adaptive immune cell populations suggests a multitargeted exploitation of their antitumour properties in a therapeutic setting.
Abstract: Hierarchical embeddings, such as HSNE, address critical visual and computational scalability issues of traditional techniques for dimensionality reduction. The improved scalability comes at the cost of increased user interaction during exploration. In this paper, we provide a solution for the interactive visual Focus+Context exploration of such embeddings. We explain how to integrate embedding parts from different levels of detail, corresponding to focus and context groups, in a joint visualization. We devise a corresponding interaction model that relates typical semantic operations on a Focus+Context visualization to the resulting changes in the level-of-detail hierarchy of the embedding, including a mode for comparative Focus+Context exploration, and we extend HSNE to incorporate this interaction model. To demonstrate the effectiveness of our approach, we present a use case based on the visual exploration of multi-dimensional images.
Abstract: Recent advances in single-cell acquisition technology have led to a shift towards single-cell analysis in many fields of biology. In immunology, detailed knowledge of the cellular composition is of interest, as it can be the cause of deregulated immune responses, which cause diseases. Similarly, vaccination is based on triggering proper immune responses; however, many vaccines are ineffective or only work properly in a subset of those who are vaccinated. Identifying differences in the cellular composition of the immune system in such cases can lead to more precise treatment. Cytosplore is an integrated, interactive visual analysis framework for the exploration of large single-cell datasets. We have developed Cytosplore in close collaboration with immunology researchers and several partners use the software in their daily workflow. Cytosplore enables efficient data analysis and has led to several discoveries alongside high-impact publications.
1st Prize, Dirk Bartz Prize for Visual Computing in Medicine 2019
Abstract: High-dimensional mass cytometry (CyTOF) allows the simultaneous measurement of multiple cellular markers at single cell level, providing a comprehensive view of cell compositions. However, the power of CyTOF to explore the full heterogeneity of a biological sample at the single cell level is currently limited by the number of markers measured simultaneously on a single panel. To extend the number of markers per cell, we propose an in silico method to integrate CyTOF datasets measured using multiple panels that share a set of markers. Additionally, we present an approach to select the most informative markers from an existing CyTOF dataset to be used as a shared marker set between panels. We demonstrate the feasibility of our methods by evaluating the quality of clustering and neighborhood preservation of the integrated dataset, on two public CyTOF datasets. We illustrate that by computationally extending the number of markers we can further untangle the heterogeneity of mass cytometry data, including rare cell population detection.
Abstract: Mass cytometry (CyTOF) is a valuable technology for high-dimensional analysis at the single cell level. Identification of different cell populations is an important task during the data analysis. Many clustering tools can perform this task; however, they are time-consuming, often involve a manual step, and lack reproducibility when new data is included in the analysis. Learning cell types from an annotated set of cells solves these problems. However, currently available mass cytometry classifiers are either complex, dependent on prior knowledge of the cell type markers during the learning process, or can only identify canonical cell types. We propose to use a Linear Discriminant Analysis (LDA) classifier to automatically identify cell populations in CyTOF data. LDA shows comparable results with two state-of-the-art algorithms on four benchmark datasets and also outperforms a non-linear classifier such as the k-nearest neighbour classifier. To illustrate its scalability to large datasets with deeply annotated cell subtypes, we apply LDA to a dataset of ~3.5 million cells representing 57 cell types. LDA has high performance on abundant cell types as well as the majority of rare cell types, and provides accurate estimates of cell type frequencies. Further incorporating a rejection option, based on the estimated posterior probabilities, allows LDA to identify cell types that were not encountered during training. Altogether, reproducible prediction of cell type compositions using LDA opens up possibilities to analyse large cohort studies based on mass cytometry data.
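The combination of an LDA classifier with a posterior-based rejection option, as described in the abstract above, can be illustrated with a small toy example. This is a minimal sketch under stated assumptions, not the published classifier: it is restricted to two features so the pooled covariance can be inverted by hand, the marker values and the labels "T" and "B" are made up, and the 0.7 threshold and the "unknown" label are hypothetical choices.

```python
import math


def fit_lda(X, y):
    """Fit a two-feature LDA: class means, priors, inverse pooled covariance."""
    classes = sorted(set(y))
    n = len(X)
    means, priors = {}, {}
    for c in classes:
        rows = [x for x, label in zip(X, y) if label == c]
        means[c] = [sum(r[d] for r in rows) / len(rows) for d in range(2)]
        priors[c] = len(rows) / n
    # Pooled within-class covariance, shared by all classes (the LDA assumption).
    s = [[0.0, 0.0], [0.0, 0.0]]
    for x, label in zip(X, y):
        d = [x[0] - means[label][0], x[1] - means[label][1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    dof = n - len(classes)
    s = [[v / dof for v in row] for row in s]
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    inv = [[s[1][1] / det, -s[0][1] / det], [-s[1][0] / det, s[0][0] / det]]
    return classes, means, priors, inv


def posteriors(model, x):
    """Posterior probability of each class for one cell x."""
    classes, means, priors, inv = model
    scores = []
    for c in classes:
        m = means[c]
        sm = [inv[0][0] * m[0] + inv[0][1] * m[1],
              inv[1][0] * m[0] + inv[1][1] * m[1]]
        # Linear discriminant: x^T S^-1 m - 0.5 m^T S^-1 m + log(prior)
        score = (x[0] * sm[0] + x[1] * sm[1]
                 - 0.5 * (m[0] * sm[0] + m[1] * sm[1])
                 + math.log(priors[c]))
        scores.append(score)
    top = max(scores)  # subtract the maximum for a numerically stable softmax
    exps = [math.exp(sc - top) for sc in scores]
    total = sum(exps)
    return {c: e / total for c, e in zip(classes, exps)}


def predict(model, x, threshold=0.7):
    """Return the most probable class, or 'unknown' if its posterior is too low."""
    post = posteriors(model, x)
    best = max(post, key=post.get)
    return best if post[best] >= threshold else "unknown"


# Hypothetical training cells: two marker intensities per cell.
X = [[5.0, 1.0], [5.2, 0.8], [4.8, 1.2], [5.1, 1.1],
     [1.0, 5.0], [0.8, 5.2], [1.2, 4.8], [1.1, 5.1]]
y = ["T", "T", "T", "T", "B", "B", "B", "B"]
model = fit_lda(X, y)

print(predict(model, [5.0, 1.0]))  # close to the "T" training cells
print(predict(model, [3.0, 3.0]))  # halfway between both classes, rejected
```

In this toy setting the rejection option only flags cells that are ambiguous between the known classes; how well a posterior threshold separates genuinely unseen cell types depends on the data and on how the threshold is chosen.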
Abstract: The fetus is thought to be protected from exposure to foreign antigens, yet CD45RO+ T cells reside in the fetal intestine. Here we combined functional assays with mass cytometry, single-cell RNA-sequencing and high-throughput T cell antigen receptor (TCR) sequencing to characterize the CD4+ T cell compartment in the human fetal intestine. We identified 22 CD4+ T cell clusters, including naive-like, regulatory-like and memory-like subpopulations, which were confirmed and further characterized at the transcriptional level. Memory-like CD4+ T cells had high expression of Ki-67, indicative of cell division, and CD5, a surrogate marker of TCR avidity, and produced the cytokines IFN-γ and IL-2. Pathway analysis revealed a differentiation trajectory associated with cellular activation and proinflammatory effector functions, and TCR repertoire analysis indicated clonal expansions, distinct repertoire characteristics and interconnections between subpopulations of memory-like CD4+ T cells. Imaging-mass cytometry indicated that memory-like CD4+ T cells colocalized with antigen-presenting cells. Collectively, these results provide evidence for the generation of memory-like CD4+ T cells in the human fetal intestine that is consistent with exposure to foreign antigens.
Abstract: Purpose: The tumor immune microenvironment determines clinical outcome. Whether the original tissue in which a primary tumor develops influences this microenvironment is not well understood. Experimental Design: We applied high-dimensional single-cell mass cytometry (CyTOF) analysis and functional studies to analyze immune cell populations in human papillomavirus (HPV)-induced primary tumors of the cervix (CxCa) and oropharynx (OPSCC). Results: Despite the same etiology of these tumors, the composition and functionality of their lymphocytic infiltrate substantially differed. CxCa displayed a 3-fold lower CD4:CD8 ratio, and contained more activated CD8+CD103+CD161+ effector T-cells and fewer CD4+CD161+ effector memory T-cells than OPSCC. CD161+ effector cells produced the highest cytokine levels among tumor-specific T-cells. Differences in CD4+ T-cell infiltration between CxCa and OPSCC were reflected in the detection rate of intratumoral HPV-specific CD4+ T-cells and in their impact on OPSCC and CxCa survival. The PBMC composition of these patients, however, was similar. Conclusions: The tissue of origin significantly impacts the overall shape of the immune infiltrate in primary tumors.
Abstract: Multi-parametric flow and mass cytometry allow exceptionally high-resolution exploration of the cellular composition of the immune system. A large panel of computational tools has been developed to analyze the high-dimensional landscape of the data generated. Analysis frameworks such as FlowSOM or Cytosplore incorporate clustering and dimensionality reduction techniques and include algorithms allowing visualization of multi-parametric cytometric analysis. To additionally provide means to quantify specific cell clusters and correlations between samples, we developed an R-package, called cytofast, for further downstream analysis. Specifically, cytofast enables the visualization and quantification of cell clusters for an efficient discovery of cell populations associated with diseases or physiology. We used cytofast on mass and flow cytometry datasets based on the modulation of the immune system upon immunotherapy. With cytofast, we rapidly generated visual representations of group-related immune cell clusters and showed correlations with the immune system composition. We discovered macrophage subsets that significantly decrease upon cancer immunotherapy and distinct prime-boost effects of prophylactic vaccines on the myeloid compartment. Cytofast is a time-efficient tool for comprehensive cytometric analysis to reveal immune signatures and correlations.
Abstract: Auto-reactive CD8 T-cells play an important role in the destruction of pancreatic β-cells resulting in type 1 diabetes (T1D). However, the phenotype of these auto-reactive cytolytic CD8 T-cells has not yet been extensively described. We used high-dimensional mass cytometry to phenotype autoantigen- (pre-proinsulin), neoantigen- (insulin-DRIP) and virus- (cytomegalovirus) reactive CD8 T-cells in peripheral blood mononuclear cells (PBMCs) of T1D patients. A panel of 33 monoclonal antibodies was designed to further characterise these cells at the single-cell level. HLA-A2 class I tetramers were used for the detection of antigen-specific CD8 T-cells. Using a novel Hierarchical Stochastic Neighbor Embedding (HSNE) tool (implemented in Cytosplore), we identified 42 clusters within the CD8 T-cell compartment of three T1D patients and revealed profound heterogeneity between individuals, as each patient displayed a distinct cluster distribution. Single-cell analysis of pre-proinsulin, insulin-DRIP and cytomegalovirus-specific CD8 T-cells showed that the detected specificities were heterogeneous between and within patients. These findings emphasize the challenge to define the obscure nature of auto-reactive CD8 T-cells.
Abstract: A bipartite graph is a powerful abstraction for modeling relationships between two collections. Visualizations of bipartite graphs allow users to understand the mutual relationships between the elements in the two collections, e.g., by identifying clusters of similarly connected elements. However, commonly-used visual representations do not scale for the analysis of large bipartite graphs containing tens of millions of vertices, often resorting to an a-priori clustering of the sets. To address this issue, we present the Who's-Active-On-What-Visualization (WAOW-Vis) that allows for multiscale exploration of a bipartite social network without imposing an a-priori clustering. To this end, we propose to treat a bipartite graph as a high-dimensional space and we create the WAOW-Vis by adapting the multiscale dimensionality-reduction technique HSNE. The application of HSNE to bipartite graphs requires several modifications that form the contributions of this work. Given the nature of the problem, a set-based similarity is proposed. For efficient and scalable computations, we use compressed bitmaps to represent sets and we present a novel space partitioning tree to efficiently compute similarities: the Sets Intersection Tree. Finally, we validate WAOW-Vis on several datasets connecting Twitter-users and -streams in different domains: news, computer science and politics. We show how WAOW-Vis is particularly effective in identifying hierarchies of communities among social-media users.
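To make the notion of a set-based similarity on a bipartite graph concrete, the sketch below compares users by the sets of streams they are active on. It rests on assumptions that go beyond the abstract: the Jaccard index is used as one common set similarity (the abstract does not name the measure), plain Python integers stand in for the compressed bitmaps, and the users and streams are invented. The Sets Intersection Tree is not reproduced.

```python
def to_bitmap(items, index):
    """Encode a set of stream names as an integer bitmap using a shared index."""
    bits = 0
    for item in items:
        bits |= 1 << index[item]
    return bits


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| of two integer bitmaps."""
    union = a | b
    if union == 0:
        return 0.0  # two empty sets: define the similarity as zero
    return bin(a & b).count("1") / bin(union).count("1")


# Hypothetical users (one side of the graph) and the streams they are active on.
activity = {
    "alice": {"news", "politics"},
    "bob": {"news", "politics", "sports"},
    "carol": {"compsci"},
}
streams = sorted(set().union(*activity.values()))
index = {name: i for i, name in enumerate(streams)}
bitmaps = {user: to_bitmap(s, index) for user, s in activity.items()}

print(jaccard(bitmaps["alice"], bitmaps["bob"]))    # 2 shared streams out of 3
print(jaccard(bitmaps["alice"], bitmaps["carol"]))  # no shared streams
```

Representing each set as a bitmap turns intersection and union into bitwise operations, which is the property that makes bitmap encodings attractive for large numbers of pairwise comparisons.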
Abstract: Innate lymphoid cells (ILCs) are abundant in mucosal tissues and involved in tissue homeostasis and barrier function. While several ILC subsets have been identified, it is unknown if additional heterogeneity exists and their differentiation pathways remain largely unclear. We applied mass cytometry to analyze ILCs in the human fetal intestine and distinguished 34 distinct clusters through a t-SNE-based analysis. A lineage (Lin)-CD7+CD127-CD45RO+CD56+ population clustered between the CD127+ ILC and natural killer (NK) cell subsets, and expressed diverse levels of Eomes, T-bet, GATA3 and RORγt. By visualizing the dynamics of the t-SNE computation, we identified smooth phenotypic transitions from cells within the Lin-CD7+CD127-CD45RO+CD56+ cluster to both the NK cells and CD127+ ILCs, revealing potential differentiation trajectories. In functional differentiation assays the Lin-CD7+CD127-CD45RO+CD56+ CD8a cells could develop into CD45RA+ NK cells and CD127+ RORγt+ ILC3-like cells. Thus, we identified a previously unknown intermediate innate subset that can differentiate into ILC3 and NK cells.
Abstract: Technological advances in mass spectrometry imaging (MSI) have contributed to growing interest in 3D MSI. However, the large size of 3D MSI data sets has made their efficient analysis and visualization and the identification of informative molecular patterns computationally challenging. Hierarchical stochastic neighbor embedding (HSNE), a nonlinear dimensionality reduction technique that aims at finding hierarchical and multiscale representations of large data sets, is a recent development that enables the analysis of millions of data points, with manageable time and memory complexities. We demonstrate that HSNE can be used to analyze large 3D MSI data sets at full mass spectral and spatial resolution. To benchmark the technique as well as demonstrate its broad applicability, we have analyzed a number of publicly available 3D MSI data sets, recorded from various biological systems and spanning different mass-spectrometry ionization techniques. We demonstrate that HSNE is able to rapidly identify regions of interest within these large high-dimensionality data sets as well as aid the identification of molecular ions that characterize these regions of interest; furthermore, through clearly separating measurement artifacts, the HSNE analysis exhibits a degree of robustness to measurement batch effects, spatially correlated noise, and mass spectral misalignment.
Abstract: The relationship between human cytomegalovirus (HCMV) infections and accelerated immune senescence is controversial. Whereas some studies reported a CMV-associated impaired capacity to control heterologous infections at old age, other studies could not confirm this. We hypothesized that these discrepancies might relate to the variability in the infectious dose of CMV occurring in real life. Here, we investigated the influence of persistent CMV infection on immune perturbations and specifically addressed the role of the infectious dose on the contribution of CMV to accelerated immune senescence. We show in experimental mouse models that the degree of mouse CMV (MCMV)-specific memory CD8+ T cell accumulation and the phenotypic T cell profile are directly influenced by the infectious dose, and data on HCMV-specific T cells indicate a similar connection. Detailed cluster analysis of the memory CD8+ T cell development showed that high dose infection causes a differentiation pathway that progresses faster throughout the life-span of the host, suggesting a virus-host balance that is influenced by aging and infectious dose. Importantly, short-term MCMV infection in adult mice is not disadvantageous for heterologous superinfection with lymphocytic choriomeningitis virus (LCMV). However, following long-term CMV infection the strength of the CD8+ T cell immunity to LCMV superinfection was affected by the initial CMV infectious dose, wherein a high infectious dose was found to be a prerequisite for impaired heterologous immunity. Altogether, our results underscore the importance of stratification based on the size and differentiation of the CMV-specific memory T cell pools for the impact on immune senescence, and indicate that reduction of the latent/lytic viral load can be beneficial to diminish CMV-associated immune senescence.
Abstract: Deep neural networks are now rivaling human accuracy in several pattern recognition problems. Compared to traditional classifiers, where features are handcrafted, neural networks learn increasingly complex features directly from the data. Instead of handcrafting the features, it is now the network architecture that is manually engineered. The network architecture parameters such as the number of layers or the number of filters per layer and their interconnections are essential for good performance. Even though basic design guidelines exist, designing a neural network is an iterative trial-and-error process that takes days or even weeks to perform due to the large datasets used for training. In this paper, we present DeepEyes, a Progressive Visual Analytics system that supports the design of neural networks during training. We present novel visualizations, supporting the identification of layers that learned a stable set of patterns and, therefore, are of interest for a detailed analysis. The system facilitates the identification of problems, such as superfluous filters or layers, and information that is not being captured by the network. We demonstrate the effectiveness of our system through multiple use cases, showing how a trained network can be compressed, reshaped and adapted to different problems.
Abstract: Single-cell analysis through mass cytometry has become an increasingly important tool for immunologists to study the immune system in health and disease. Mass cytometry creates a high-dimensional description vector for single cells by time-of-flight measurement. Recently, t-Distributed Stochastic Neighbor Embedding (t-SNE) has emerged as one of the state-of-the-art techniques for the visualization and exploration of single-cell data. Ever increasing amounts of data lead to the adoption of Hierarchical Stochastic Neighbor Embedding (HSNE), enabling the hierarchical representation of the data. Here, the hierarchy is explored selectively by the analyst, who can request more and more detail in areas of interest. Such hierarchies are usually explored by visualizing disconnected plots of selections in different levels of the hierarchy. This poses problems for navigation, by imposing a high cognitive load on the analyst. In this work, we present an interactive summary-visualization to tackle this problem. CyteGuide guides the analyst through the exploration of hierarchically represented single-cell data, and provides a complete overview of the current state of the analysis. We conducted a two-phase user study with domain experts that use HSNE for data exploration. We first studied their problems with their current workflow using HSNE and the requirements to ease this workflow in a field study. These requirements have been the basis for our visual design. In the second phase, we verified our proposed solution in a user evaluation.
Abstract: Mass cytometry allows high-resolution dissection of the cellular composition of the immune system. However, the high dimensionality, large size, and non-linear structure of the data pose considerable challenges for data analysis. In particular, dimensionality reduction-based techniques like t-SNE offer single-cell resolution but are limited in the number of cells that can be analysed. Here we introduce Hierarchical Stochastic Neighbor Embedding (HSNE) for the analysis of mass cytometry datasets. HSNE constructs a hierarchy of non-linear similarities that can be interactively explored with a stepwise increase in detail up to the single-cell level. We applied HSNE to a study on gastrointestinal disorders and three other available mass cytometry datasets. We found that HSNE efficiently replicates previous observations and identifies rare cell populations that were previously missed due to downsampling. Thus, HSNE removes the scalability limit of conventional t-SNE analysis, a feature that makes it highly suitable for the analysis of massive high-dimensional datasets.
Selected talk @ BioVis/ISMB 2018
Abstract: Diffusion Tensor Imaging (DTI) group studies often require the comparison of two groups of 3D diffusion tensor fields. The total number of datasets involved in the study and the multivariate nature of diffusion tensors together make this a challenging process. The traditional approach is to reduce the six-dimensional diffusion tensor to some scalar quantities, which can be analyzed with univariate statistical methods, and visualized with standard techniques such as slice views. However, this provides merely part of the whole story due to information reduction. For taking the full tensor information into account, only a few methods are available, and they focus on the analysis of a single group, rather than the comparison of two groups. Simultaneously comparing two groups of diffusion tensor fields by simple juxtaposition or superposition is rather impractical. In this work, we extend previous work to visually compare two groups of diffusion tensor fields. To deal with the wealth of information, the comparison is carried out at multiple levels of detail. In the 3D spatial domain, we propose a details-on-demand glyph representation to support the visual comparison of the tensor ensemble summary information in a progressive manner. The spatial view guides analysts to select voxels of interest. Then at the detail level, the respective original tensor ensembles are compared in terms of tensor intrinsic properties, with special care taken to reduce visual clutter. We demonstrate the usefulness of our visual analysis system by comparing a control group and an HIV positive patient group.
Best Paper Award
Abstract: Spatial and temporal brain transcriptomics has recently emerged as an invaluable data source for molecular neuroscience. The complexity of such data poses considerable challenges for analysis and visualization. We present BrainScope: a web portal for fast, interactive visual exploration of the Allen Atlases of the adult and developing human brain transcriptome. Through a novel methodology to explore high-dimensional data (dual t-SNE), BrainScope enables the linked, all-in-one visualization of genes and samples across the whole brain and genome, and across developmental stages. We show that densities in t-SNE scatter plots of the spatial samples coincide with anatomical regions, and that densities in t-SNE scatter plots of the genes represent gene co-expression modules that are significantly enriched for biological functions. We also show that the topography of the gene t-SNE maps reflects brain region-specific gene functions, enabling hypothesis- and data-driven research. We demonstrate the discovery potential of BrainScope through three examples: (i) analysis of cell type specific gene sets, (ii) analysis of a set of stable gene co-expression modules across the adult human donors and (iii) analysis of the evolution of co-expression of oligodendrocyte specific genes over developmental stages.
Selected for a highlight talk @ BioVis 2017
Abstract: Progressive Visual Analytics aims at improving the interactivity in existing analytics techniques by means of visualization as well as interaction with intermediate results. One key method for data analysis is dimensionality reduction, for example, to produce 2D embeddings that can be visualized and analyzed efficiently. t-Distributed Stochastic Neighbor Embedding (tSNE) is a well-suited technique for the visualization of several high-dimensional data. tSNE can create meaningful intermediate results but suffers from a slow initialization that constrains its application in Progressive Visual Analytics. We introduce a controllable tSNE approximation (A-tSNE), which trades off speed and accuracy, to enable interactive data exploration. We offer real-time visualization techniques, including a density-based solution and a Magic Lens to inspect the degree of approximation. With this feedback, the user can decide on local refinements and steer the approximation level during the analysis. We demonstrate our technique with several datasets, in a real-world research scenario and for the real-time analysis of high-dimensional streams to illustrate its effectiveness for interactive data analysis.
Abstract: A diffusion tensor imaging group study consists of a collection of volumetric diffusion tensor datasets (i.e., an ensemble) acquired from a group of subjects. The multivariate nature of the diffusion tensor imposes challenges on the analysis and the visualization. These challenges are commonly tackled by reducing the diffusion tensors to scalar-valued quantities that can be analyzed with common statistical tools. However, reducing tensors to scalars poses the risk of losing intrinsic information about the tensor. Visualization of tensor ensemble data without loss of information is still a largely unsolved problem. In this work, we propose an overview + detail visualization to facilitate the tensor ensemble exploration. We define an ensemble representative tensor and variations in terms of the three intrinsic tensor properties (i.e., scale, shape, and orientation) separately. The ensemble summary information is visually encoded into the newly designed aggregate tensor glyph which, in a spatial layout, functions as the overview. The aggregate tensor glyph guides the analyst to interesting areas that would need further detailed inspection. The detail views reveal the original information that is lost during aggregation. They help the analyst to further understand the sources of variation and formulate hypotheses. To illustrate the applicability of our prototype, we compare it with the most relevant previous work through a user study and we present a case study on the analysis of a brain diffusion tensor dataset ensemble from healthy volunteers.
Abstract: Inflammatory intestinal diseases are characterized by abnormal immune responses and affect distinct locations of the gastrointestinal tract. Although the role of several immune subsets in driving intestinal pathology has been studied, a system-wide approach that simultaneously interrogates all major lineages on a single-cell basis is lacking. We used high-dimensional mass cytometry to generate a system-wide view of the human mucosal immune system in health and disease. We distinguished 142 immune subsets and through computational applications found distinct immune subsets in peripheral blood mononuclear cells and intestinal biopsies that distinguished patients from controls. In addition, mucosal lymphoid malignancies were readily detected as well as precursors from which these likely derived. These findings indicate that an integrated high-dimensional analysis of the entire immune system can identify immune subsets associated with the pathogenesis of complex intestinal disorders. This might have implications for diagnostic procedures, immune-monitoring, and treatment of intestinal diseases and mucosal malignancies.
LUMC Best Article Prize 2016 (non clinical)
Abstract: In recent years, dimensionality-reduction techniques have been developed and are widely used for hypothesis generation in Exploratory Data Analysis. However, these techniques face a trade-off between computation time and the quality of the provided dimensionality reduction. In this work, we address this limitation by introducing Hierarchical Stochastic Neighbor Embedding (Hierarchical-SNE). Using a hierarchical representation of the data, we incorporate the well-known mantra of Overview-First, Details-On-Demand in non-linear dimensionality reduction. First, the analysis shows an embedding that reveals only the dominant structures in the data (Overview). Then, by selecting structures that are visible in the overview, the user can filter the data and drill down in the hierarchy. While the user descends into the hierarchy, detailed visualizations of the high-dimensional structures will lead to new insights. In this paper, we explain how Hierarchical-SNE scales to the analysis of big datasets. In addition, we show its application potential in the visualization of Deep-Learning architectures and the analysis of hyperspectral images.
Abstract: To understand how the immune system works, one needs to have a clear picture of its cellular composition and the cells’ corresponding properties and functionality. Mass cytometry is a novel technique to determine the properties of single cells with unprecedented detail. This amount of detail allows for much finer differentiation but also comes at the cost of more complex analysis. In this work, we present Cytosplore, implementing an interactive workflow to analyze mass cytometry data in an integrated system, providing multiple linked views, showing different levels of detail and enabling the rapid definition of known and unknown cell types. Cytosplore handles millions of cells, each represented as a high-dimensional data point, facilitates hypothesis generation and confirmation, and provides a significant speedup of the current workflow. We show the effectiveness of Cytosplore in a case study evaluation.
Abstract: Hydrocarbon reservoir simulation models produce large amounts of heterogeneous data, combining multiple variables of different dimensionality, such as two or three-dimensional geospatial estimates with abstract estimates simulated for the complete field or different wells. In addition these simulations are nowadays often run as so-called ensemble simulations, to capture uncertainty of the model, as well as boundary conditions as variation in the output. The (visual) analysis of such data is a challenging process, due to the size and complexity of the data. In this paper we present an integrated system for the visual analysis of ensemble reservoir simulation data. We provide tools to inspect forecasts for multiple variables of complete fields, as well as different wells. Finally, we present a case study highlighting the effectiveness of the presented system.
Abstract: We present a novel integrated visualization system that enables the interactive visual analysis of ensemble simulations and estimates of the sea surface height and other model variables that are used for storm surge prediction. Coastal inundation, caused by hurricanes and tropical storms, poses large risks for today's societies. High-fidelity numerical models of water levels driven by hurricane-force winds are required to predict these events, posing a challenging computational problem. Even though computational models continue to improve, uncertainties in storm surge forecasts are inevitable. Today this uncertainty is often exposed to the user by running the simulation many times with different parameters or inputs following a Monte-Carlo framework in which uncertainties are represented as stochastic quantities. This results in multidimensional, multivariate and multivalued data, so-called ensemble data. While the resulting datasets are very comprehensive, they are also huge in size and thus hard to visualize and interpret. In this paper we tackle this problem by means of an interactive and integrated visual analysis system. By harnessing the power of modern graphics processing units (GPUs) for visualization as well as computation, our system allows the user to browse through the simulation ensembles in real-time, view specific parameter settings or simulation models and move between different spatial or temporal regions without delay. In addition, our system provides advanced visualizations to highlight the uncertainty, or show the complete distribution of the simulations at user-defined positions over the complete time series of the prediction. We highlight the benefits of our system by presenting its application in a real-world scenario using a simulation of Hurricane Ike.
Abstract: Ocean forecasts nowadays are created by running ensemble simulations in combination with data assimilation techniques. Most of these techniques resample the ensemble members after each assimilation cycle. This means that in a time series, after resampling, every member can follow up on any of the members before resampling. Tracking behavior over time, such as all possible paths of a particle in an ensemble vector field, becomes very difficult, as the number of combinations rises exponentially with the number of assimilation cycles. In general, a single possible path is not of interest, only the probability that any point in space might be reached by a particle at some point in time. In this work we present an approach using probability-weighted piecewise particle trajectories to allow such a mapping interactively, instead of tracing quadrillions of individual particles. We achieve interactive rates by binning the domain and splitting up the tracing process into the individual assimilation cycles, so that particles that fall into the same bin after a cycle can be treated as a single particle with a larger probability as input for the next time step. As a result we lose the possibility to track individual particles, but can create probability maps for any desired seed at interactive rates.
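The binning idea described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a 1D domain, equally likely ensemble members after each resampling step, and a hypothetical representation in which each member is a list mapping a start bin to the bin a particle reaches during one assimilation cycle. The names `propagate` and `probability_map` are made up for this illustration.

```python
from collections import defaultdict


def propagate(probs, members):
    """Advance a per-bin probability map through one assimilation cycle.

    probs:   dict {bin index: probability of a particle being in that bin}
    members: list of transport maps, one per ensemble member, where
             member[b] is the bin reached from bin b (assumed representation).
    """
    weight = 1.0 / len(members)  # assumption: all members are equally likely
    nxt = defaultdict(float)
    for b, p in probs.items():
        for member in members:
            # Particles landing in the same bin are merged into a single
            # particle that carries the summed probability.
            nxt[member[b]] += p * weight
    return dict(nxt)


def probability_map(seed_bin, cycles):
    """Probability of reaching each bin from seed_bin after all cycles."""
    probs = {seed_bin: 1.0}
    for members in cycles:
        probs = propagate(probs, members)
    return probs


if __name__ == "__main__":
    drift_right = [1, 2, 3, 3]  # hypothetical member: one bin to the right
    drift_left = [0, 0, 1, 2]   # hypothetical member: one bin to the left
    cycles = [[drift_right, drift_left]] * 2
    print(probability_map(1, cycles))
```

With M members and C cycles, explicit tracing would follow M^C paths, whereas this sketch does at most (number of bins) × M updates per cycle, at the price of no longer being able to identify individual trajectories.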
Abstract: We present a novel integrated visualization system that enables interactive visual analysis of ensemble simulations of the sea surface height that is used in ocean forecasting. The position of eddies can be derived directly from the sea surface height and our visualization approach enables their interactive exploration and analysis. The behavior of eddies is important in different application settings of which we present two in this paper. First, we show an application for interactive planning of placement as well as operation of off-shore structures using real-world ensemble simulation data of the Gulf of Mexico. Off-shore structures, such as those used for oil exploration, are vulnerable to hazards caused by eddies, and the oil and gas industry relies on ocean forecasts for efficient operations. We enable analysis of the spatial domain, as well as the temporal evolution, for planning the placement and operation of structures. Eddies are also important for marine life. They transport water over large distances and with it also heat and other physical properties as well as biological organisms. In the second application, we show how marine scientists can use our tool to study simulation data of the largely unexplored Red Sea, for example to plan the paths of autonomous underwater vehicles, so-called gliders.
Abstract: We present a novel integrated visualization system that enables interactive visual analysis of ensemble simulations used in ocean forecasting, i.e., simulations of sea surface elevation. Our system enables the interactive planning of both the placement and operation of off-shore structures. We illustrate this using a real-world simulation of the Gulf of Mexico. Off-shore structures, such as those used for oil exploration, are vulnerable to hazards caused by strong loop currents. The oil and gas industry therefore relies on accurate ocean forecasting systems for planning their operations. Nowadays, these forecasts are based on multiple spatio-temporal simulations resulting in multidimensional, multivariate and multivalued data, so-called ensemble data. Changes in sea surface elevation are a good indicator for the movement of loop current eddies, and our visualization approach enables their interactive exploration and analysis. We enable analysis of the spatial domain, for planning the placement of structures, as well as detailed exploration of the temporal evolution at any chosen position, for the prediction of critical ocean states that require the shutdown of rig operations.
Honorable mention for best paper award
Abstract: Seismic interpretation is an important step in building subsurface models, which are needed to efficiently exploit fossil fuel reservoirs. However, seismic features are seldom unambiguous, resulting in a high degree of uncertainty in the extracted model. In this paper we present a novel system for the extraction, analysis and visualization of ensemble data of seismic horizons. By parameterizing the cost function of a global optimization technique for seismic horizon extraction, we can create ensembles of surfaces describing each horizon, instead of just a single surface. Our system also provides the tools for a complete statistical analysis of these data. Additionally, we allow an interactive exploration of the parameter space to help find optimal parameter settings for the current dataset.
Abstract: The most important resources to fulfill today's energy demands are fossil fuels, such as oil and natural gas. When exploiting hydrocarbon reservoirs, a detailed and credible model of the subsurface structures is crucial in order to minimize economic and ecological risks. Creating such a model is an inverse problem: reconstructing structures from measured reflection seismics. The major challenge here is twofold: First, the structures in highly ambiguous seismic data are interpreted in the time domain. Second, a velocity model has to be built from this interpretation to match the model to depth measurements from wells. If it is not possible to obtain a match at all positions, the interpretation has to be updated, going back to the first step. This results in a lengthy back and forth between the different steps, or in an unphysical velocity model in many cases. This paper presents a novel, integrated approach to interactively creating subsurface models from reflection seismics. It integrates the interpretation of the seismic data using an interactive horizon extraction technique based on piecewise global optimization with velocity modeling. Computing and visualizing, on the fly, the effects of changes to the interpretation and velocity model on the depth-converted model yields an integrated feedback loop that enables a completely new connection between the seismic data in the time domain and the well data in the depth domain. Using a novel joint time/depth visualization, depicting side-by-side views of the original and the resulting depth-converted data, domain experts can directly fit their interpretation in the time domain to spatial ground truth data. We have conducted a domain expert evaluation, which illustrates that the presented workflow enables the creation of exact subsurface models much more rapidly than previous approaches.
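The link between time-domain interpretation, velocity model, and well depths can be sketched with a simple worked example. The abstract does not specify the velocity model, so the sketch below assumes a "layer-cake" model with one constant interval velocity per layer and horizons given as two-way travel times at a single well position. The function names are hypothetical.

```python
def depth_convert(horizon_twt_s, interval_velocities_mps):
    """Convert horizon two-way travel times (s) to depths (m).

    Assumes a layer-cake model: the layer above horizon i has constant
    velocity interval_velocities_mps[i]. Two-way time is halved because
    the signal travels down and back up.
    """
    depths = []
    depth, prev_t = 0.0, 0.0
    for t, v in zip(horizon_twt_s, interval_velocities_mps):
        depth += v * (t - prev_t) / 2.0
        depths.append(depth)
        prev_t = t
    return depths


def fit_interval_velocities(horizon_twt_s, well_depths_m):
    """Interval velocities that make the horizons match the well depths."""
    velocities = []
    prev_t, prev_z = 0.0, 0.0
    for t, z in zip(horizon_twt_s, well_depths_m):
        velocities.append(2.0 * (z - prev_z) / (t - prev_t))
        prev_t, prev_z = t, z
    return velocities


if __name__ == "__main__":
    twt = [1.0, 2.0]                    # picked horizon times in seconds
    print(depth_convert(twt, [2000.0, 3000.0]))
    print(fit_interval_velocities(twt, [1000.0, 2500.0]))
```

If a fitted velocity comes out implausibly low or high for the rock type, that is one sign of the "unphysical velocity model" mentioned above, and the horizon pick in the time domain would need to be revised instead.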
Abstract: In this paper, a method for interactive direct volume rendering is proposed that computes ambient occlusion effects for visualizations that combine both volumetric and geometric primitives, specifically tube shaped geometric objects representing streamlines, magnetic field lines or DTI fiber tracts. The proposed algorithm extends the recently proposed Directional Occlusion Shading model to allow the rendering of those geometric shapes in combination with a context providing 3D volume, considering mutual occlusion between structures represented by a volume or geometry.
Invited for extended version in TVCG
Abstract: Increasing demands in world-wide energy consumption and oil depletion of large reservoirs have resulted in the need for exploring smaller and more complex oil reservoirs. Planning of the reservoir valorization usually starts with creating a model of the subsurface structures, including seismic faults and horizons. However, seismic interpretation and horizon tracing is a difficult and error-prone task, often resulting in hours of work needing to be manually repeated. In this paper, we propose a novel, interactive workflow for horizon interpretation based on well positions, which include additional geological and geophysical data captured by actual drillings. Instead of interpreting the volume slice-by-slice in 2D, we propose 3D seismic interpretation based on well positions. We introduce a combination of 2D and 3D minimal cost path and minimal cost surface tracing for extracting horizons with very little user input. By processing the volume based on well positions rather than slice-based, we are able to create a piecewise optimal horizon surface at interactive rates. We have integrated our system into a visual analysis platform which supports multiple linked views for fast verification, exploration and analysis of the extracted horizons. The system is currently being evaluated by our collaborating domain experts.
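As a rough illustration of minimal cost path tracing, the sketch below runs Dijkstra's algorithm on a small 2D cost grid. This is a generic technique, not the paper's method: the actual cost function, the connectivity, and the 3D minimal cost surface extension are not described in the abstract, so the 4-connected grid, the cost values, and the function name `minimal_cost_path` are assumptions made for this example.

```python
import heapq


def minimal_cost_path(cost, start, goal):
    """Cheapest 4-connected path through a 2D cost grid (Dijkstra).

    cost:  list of rows with non-negative per-cell costs (low cost is
           assumed to mean "likely on the horizon")
    start, goal: (row, col) tuples, e.g. horizon picks at two wells
    Returns (path as list of cells, total cost including both end cells).
    """
    rows, cols = len(cost), len(cost[0])
    dist = {start: cost[start[0]][start[1]]}
    prev = {}
    heap = [(dist[start], start)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == goal:
            break
        if d > dist[node]:
            continue  # stale queue entry
        r, c = node
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols:
                nd = d + cost[nr][nc]
                if nd < dist.get((nr, nc), float("inf")):
                    dist[(nr, nc)] = nd
                    prev[(nr, nc)] = node
                    heapq.heappush(heap, (nd, (nr, nc)))
    # Walk back from the goal to recover the path.
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    path.reverse()
    return path, dist[goal]


if __name__ == "__main__":
    grid = [[1, 9, 1],
            [1, 9, 1],
            [1, 1, 1]]
    print(minimal_cost_path(grid, (0, 0), (0, 2)))
```

In this toy grid the path avoids the expensive middle column and detours through the bottom row, which mirrors how a traced horizon follows low-cost seismic features between two user-provided points.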
Abstract: This paper presents a novel method for interactive exploration of industrial CT volumes such as cast metal parts, with the goal of interactively detecting, classifying, and quantifying features using a visualization-driven approach. The standard approach for defect detection builds on region growing, which requires manually tuning parameters such as target ranges for density and size, variance, as well as the specification of seed points. If the results are not satisfactory, region growing must be performed again with different parameters. In contrast, our method allows interactive exploration of the parameter space, completely separated from region growing in an unattended pre-processing stage. The pre-computed *feature volume* tracks a *feature size curve* for each voxel over *time*, which is identified with the main region growing parameter such as variance. A novel 3D transfer function domain over *(density, feature size, time)* allows for interactive exploration of feature classes. Features and feature size curves can also be explored individually, which helps with transfer function specification and allows coloring individual features and disabling features resulting from CT artifacts. Based on the classification obtained through exploration, the classified features can be quantified immediately.
Abstract: High-dimensional mass cytometry (CyTOF) permits the simultaneous measurement of many cellular markers, providing a system-wide view of immune phenotypes at the single-cell level. Yet, the maximum number of markers that can be measured simultaneously is limited to ~50 due to several technical challenges. We propose a new method to integrate CyTOF data from several marker panels that include an overlapping set of markers, allowing for a deeper interrogation of the cellular composition of the immune system.
Abstract: Mass cytometry allows high-resolution dissection of the cellular composition of the immune system. However, the high dimensionality, large size, and non-linear structure of the data pose considerable challenges for data analysis. We introduce Hierarchical Stochastic Neighbor Embedding (HSNE) for single-cell analysis, a computational approach that constructs a hierarchy of non-linear similarities, allowing the analysis of millions of cells via different levels of detail up to single-cell resolution within minutes. We integrated HSNE into the Cytosplore+HSNE framework to facilitate interactive exploration and analysis of the hierarchy by a set of corresponding two-dimensional plots with stepwise increase in detail up to the single-cell level. This divide and conquer approach minimizes computation time and, thereby, allows efficient and interactive visualization. We validated the discovery potential of Cytosplore+HSNE by re-analyzing a recent study on gastrointestinal disorders as well as two other publicly available mass cytometry datasets. We found that Cytosplore+HSNE efficiently identifies both abundant and rare cell populations, without resorting to downsampling of the data, including rare cell populations that were missed in a previous analysis due to downsampling. Taken together, Cytosplore+HSNE offers unprecedented possibilities for visual exploration and analysis of millions of cells measured in mass cytometry studies.
Abstract: Despite its importance to the world community for a variety of socio-economical reasons and the presence of extensive coral reef gardens along its shores, the Red Sea remains one of the most under-studied large marine physical and biological systems in the global ocean. We present our efforts to build advanced modeling, data assimilation, and uncertainty quantification capabilities for the Red Sea, which is part of the newly established Saudi ARAMCO Marine Environmental Research Center aiming at studying and forecasting the ...
Abstract: This work presents a new workflow for the interpretation of seismic volume data, as well as a novel approach to interactively tracing seismic horizons. Instead of interpreting the seismic cube slice by slice, in the proposed workflow interpretation is performed on the planes connecting wells that have been drilled. Thereby the additional data provided by the well logs can easily be used during the interpretation process. Instead of manually picking the seismic horizon, we propose an algorithm which uses numerical integration over a vector field computed with diffusion tensors for automatic tracing, based on a user-defined seed point.
Abstract: The most important resources to fulfill today's energy demands are fossil fuels, such as oil and natural gas. When exploiting hydrocarbon reservoirs, a detailed and credible model of the subsurface structures, used to plan the path of the borehole, is crucial in order to minimize economic and ecological risks. Before that, the placement as well as the operations of oil rigs need to be planned carefully, as off-shore oil exploration is vulnerable to hazards caused by strong currents. The oil and gas industry therefore relies on accurate ocean forecasting systems for planning their operations. This thesis presents visual workflows for creating subsurface models as well as planning the placement and operations of off-shore structures. Creating a credible subsurface model poses two major challenges: First, the structures in highly ambiguous seismic data are interpreted in the time domain. Second, a velocity model has to be built from this interpretation to match the model to depth measurements from wells. If it is not possible to obtain a match at all positions, the interpretation has to be updated, going back to the first step. This results in a lengthy back and forth between the different steps, or in an unphysical velocity model in many cases. We present a novel, integrated approach to interactively creating subsurface models from reflection seismics, by integrating the interpretation of the seismic data using an interactive horizon extraction technique based on piecewise global optimization with velocity modeling. Computing and visualizing, on the fly, the effects of changes to the interpretation and velocity model on the depth-converted model yields an integrated feedback loop that enables a completely new connection between the seismic data in the time domain and the well data in the depth domain.
For planning the operations of off-shore structures we present a novel integrated visualization system that enables interactive visual analysis of ensemble simulations used in ocean forecasting, i.e., simulations of sea surface elevation. Changes in sea surface elevation are a good indicator for the movement of loop current eddies. Our visualization approach enables their interactive exploration and analysis. We enable analysis of the spatial domain, for planning the placement of structures, as well as detailed exploration of the temporal evolution at any chosen position, for the prediction of critical ocean states that require the shutdown of rig operations. We illustrate this using a real-world simulation of the Gulf of Mexico.
Abstract: Using programmable graphics hardware (GPU) is the de facto standard for real-time volume rendering nowadays. In addition, GPUs are often used for non-graphical tasks to accelerate complex computations and also allow the direct rendering of (intermediate) results. However, the amount of graphics memory can become a problem when working with large volume datasets. Even though today's graphics hardware provides more memory than ever before, the amount of data is also increasing rapidly. In order to overcome this, many compression algorithms have been developed and some of them are even hardware-accelerated. As these implementations only support a small range of formats and often do not provide sufficient quality, custom algorithms have been implemented which often utilize shader programs for decoding and encoding. While this has proven useful for visualization, providing interactive framerates for direct volume rendering, the algorithms focus on displaying the data, not processing it. In this thesis, different compression techniques are compared with a focus on their suitability for processing in the compression domain. A wavelet-transform-based compression scheme is implemented which allows lossless as well as lossy compression of volume data. Image processing operations are classified based on their applicability in the wavelet compression domain. Based on this classification, different image operations are implemented as examples. Furthermore, multi-planar reconstruction directly from the compressed data is presented for visualization. The results of this thesis are compared to processing in the spatial domain, showing advantages and shortcomings. Finally, an outlook on possible future work is given.
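To make the notion of processing in the wavelet domain concrete, here is a minimal sketch using a one-level integer Haar transform (the S-transform) on a 1D signal. The thesis abstract does not name the wavelet or the operations it implements, so the choice of transform, the brightness-offset operation, and all function names are assumptions for illustration only. The input is assumed to have even length.

```python
def haar_forward(signal):
    """One level of the integer Haar (S-) transform; lossless on integers."""
    approx, detail = [], []
    for a, b in zip(signal[0::2], signal[1::2]):
        d = a - b
        approx.append(b + d // 2)  # equals floor((a + b) / 2)
        detail.append(d)
    return approx, detail


def haar_inverse(approx, detail):
    """Exact inverse of haar_forward."""
    out = []
    for s, d in zip(approx, detail):
        b = s - d // 2
        out.extend((d + b, b))
    return out


def add_offset_in_wavelet_domain(approx, detail, offset):
    """Brighten the signal without decoding it.

    Adding a constant to every sample leaves the differences unchanged,
    so only the approximation coefficients need to be updated.
    """
    return [s + offset for s in approx], list(detail)


if __name__ == "__main__":
    samples = [5, 3, 8, 8, 2, 7]
    approx, detail = haar_forward(samples)
    brighter = haar_inverse(*add_offset_in_wavelet_domain(approx, detail, 10))
    print(approx, detail, brighter)
```

A lossy variant would quantize or zero small detail coefficients before storage. Point-wise linear operations such as the offset above map cleanly onto the coefficients, whereas non-linear operations generally do not, which is the kind of distinction a classification of operations by wavelet-domain applicability has to capture.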