Vale, T., R. J. Dias, J. A. Silva, and J. M. Lourenço, "Execução concorrente e determinista de transações", Proceedings of INForum Simpósio de Informática, Covilhã, Portugal, 2015.

In this paper we present a concurrency control protocol that guarantees that the concurrent execution of transactions is equivalent to their sequential execution in a predefined order. This makes it possible to run programs that use transactions deterministically. The protocol (1) enables, for the first time, the deterministic execution of programs that use hardware transactional memory; and (2) guarantees the deterministic execution of programs that use software transactional memory with performance clearly superior to the state of the art.

Cunha, J. C., P. D. Medeiros, V. Duarte, J. Lourenço, and C. Gomes, "An Experience in Building a Parallel and Distributed Problem-Solving Environment", Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA'99): CSREA Press, pp. 1804–1809, 1999.

We describe our experimentation with the design and implementation of specific environments, consisting of heterogeneous computational, visualization, and control components. We illustrate the approach with the design of a problem-solving environment supporting the execution of genetic algorithms. We describe a prototype supporting parallel execution, visualization, and steering. A life cycle for the development of applications based on genetic algorithms is proposed.

Lourenço, J. M., and J. C. Cunha, "Fiddle: A Flexible Distributed Debugging Architecture", Proceedings of the International Conference on Computational Science-Part II, London, UK, Springer-Verlag, pp. 821–830, 2001.

In the recent past, multiple techniques and tools have been proposed to improve distributed debugging functionality in several distinct aspects, such as handling non-determinism, allowing cyclic interactive debugging of parallel programs, and providing more user-friendly interfaces. However, most of these tools are tied to a specific programming language and provide rigid graphical user interfaces, so they cannot easily adapt to support distinct abstraction levels or user interfaces. They also do not provide adequate support for cooperation with other tools in a software engineering environment. In this paper we discuss several dimensions which may contribute to the development of more flexible distributed debuggers. We describe Fiddle, a distributed debugging tool which aims at overcoming some of the above limitations.

Cunha, J. C., J. M. Lourenço, J. Vieira, B. Moscão, and D. Pereira, "A Framework to Support Parallel and Distributed Debugging", Proceedings of the International Conference and Exhibition on High-Performance Computing and Networking (HPCN'98), London, UK, Springer-Verlag, pp. 708–717, 1998.

We discuss debugging prototypes that can easily support new functionalities, depending on the requirements of high-level computational models, and that allow a coherent integration with other tools in a software engineering environment. Concerning the first aspect, we propose a framework that identifies two distinct levels of functionality that should be supported by a parallel and distributed debugger: a process and thread level, and a coordination level concerning sets of processes or threads. An incremental approach is used to effectively develop prototypes that support both levels. Concerning the second aspect, we discuss how the interfacing with other tools has influenced the design of a process-level debugging interface (PDBG) and a distributed monitoring and control layer (DAMS).

Orosa, L., and J. M. Lourenço, "Hardware Approach for Detecting, Exposing and Tolerating High Level Atomicity Violations", Proceedings of Joint Euro-TM/MEDIAN Workshop on Dependable Multicore and Transactional Memory Systems, Vienna, Austria, January 2014.

In this paper we address a solution for detecting and tolerating one of the most typical concurrency bugs: atomicity violations. More specifically, we address High-Level Atomicity Violations (HLAV). High-level atomicity violations result from the misspecification of the scope of an atomic block, by splitting it into two or more atomic blocks which may be interleaved with other atomic blocks. Figure 1 shows an example of this type of atomicity violation. The intuitive idea behind HLAV is that if two shared data items (e.g., memory locations) were both accessed inside an atomic block, they are interrelated and the programmer's intention is probably that there shall be no interleavings between these two accesses. Therefore, if (in the same program) these two addresses are accessed separately in different atomic blocks, an unfortunate interleaving may cause an atomicity violation.
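
The figure mentioned in the abstract is not reproduced in this listing. As a rough illustration of the idea only (not the paper's own example), the Python sketch below shows two fields that are written together in one atomic block but read in two separate ones. The names `Pair`, `set_both`, `get_x`, `get_y`, and `split_read` are hypothetical, and the interleaving is simulated deterministically with a callback instead of a real second thread.

```python
import threading


class Pair:
    """Two fields that are meant to change together (hypothetical example)."""

    def __init__(self):
        self._lock = threading.Lock()
        self.x = 0
        self.y = 0

    def set_both(self, value):
        # One atomic block accesses x and y together,
        # which suggests the two fields are interrelated.
        with self._lock:
            self.x = value
            self.y = value

    def get_x(self):
        with self._lock:  # atomic block 1
            return self.x

    def get_y(self):
        with self._lock:  # atomic block 2
            return self.y


def split_read(pair, between):
    # Possible HLAV: x and y are read in two separate atomic blocks.
    x = pair.get_x()
    between()  # stands in for another thread being scheduled at this point
    y = pair.get_y()
    return x, y


pair = Pair()
# Simulate the unfortunate interleaving: an update lands between the two reads.
observed = split_read(pair, lambda: pair.set_both(1))
print(observed)  # x is read before the update, y after it
```

Every individual access is protected by the lock, so there is no low-level data race, yet the reader observes a combination of `x` and `y` that no single `set_both` call ever produced.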

Duro, N., R. Santos, J. M. Lourenço, H. Paulino, and J. Martins, "Open Virtualization Framework for Testing Ground Systems", Proceedings of the 8th Workshop on Parallel and Distributed Systems (PADTAD'10), New York, NY, USA, ACM, pp. 67–73, 2010.

Recent developments in virtualization have completely changed the panorama of hardware/OS deployment. New bottlenecks arise in the deployment of application stacks, where the IT industry will spend most of its time ensuring automation. The VIRTU tool aims at managing, configuring, and testing distributed ground applications of space systems in a virtualized environment, based on open tools and cross-virtualization support. This tool is a spin-off of previous activities performed by the European Space Operations Center (ESOC), and thus it covers the original needs of the ground data systems infrastructure division of the European Space Agency (ESA). VIRTU is a testing-oriented solution. Its ability to group several virtual machines in an assembly provides the means to easily deploy a full testing infrastructure, including the client/server relationships. The possibility of requesting the testing infrastructure on demand will provide some infrastructure optimizations, especially bearing in mind that ESA maintains the Ground Control software of various missions, and each mission can potentially have a different set of system baselines and last up to 15 years. The matrix of supported system combinations is therefore enormous, and any improvement to the process provides substantial benefits to ESA by reducing the effort and schedule of each maintenance activity. The ESOC case study focuses on the development and validation activities of infrastructure or mission Ground Systems solutions. These solutions are typically composed of distributed systems that could take advantage of virtualized environments for testing purposes. Virtualization is used as a way to optimize maintenance tasks such as testing new releases and patches, testing different system configurations, and replicating tests. The main benefits identified relate to the deployment of test environments and the possibility of having on-demand infrastructure.

Lourenço, J. M., and J. C. Cunha, "The PDBG Process-Level Debugger for Parallel and Distributed Programs", Proceedings of the SIGMETRICS symposium on Parallel and Distributed Tools, New York, NY, USA, ACM, pp. 154, 1998.

In this paper we discuss several issues concerning the design and implementation of a debugger for parallel and distributed applications. This debugger uses a client-server approach to isolate the debugging user-interface from the debugging services, by way of a two-level structured approach: the component-level to observe and act upon individual processes; and the coordination-level to observe the interprocess relations and act upon them.

Pessanha, V., R. J. Dias, J. M. Lourenço, E. Farchi, and D. Sousa, "Practical verification of high-level dataraces in transactional memory programs", Proceedings of the 9th Workshop on Parallel and Distributed Systems: Testing, Analysis, and Debugging, New York, NY, USA, ACM, pp. 26–34, July 2011.

In this paper we present MoTh, a tool that uses static analysis to enable the automatic verification of concurrency anomalies in Transactional Memory Java programs. Currently MoTh detects high-level dataraces and stale-value errors, but it is extendable by plugging-in sensors, each sensor implementing an anomaly detecting algorithm. We validate and benchmark MoTh by applying it to a set of well known concurrent buggy programs and by close comparison of the results with other similar tools. The results achieved so far are very promising, yielding good accuracy while triggering only a very limited number of false warnings.

Dias, R. J., V. Pessanha, and J. M. Lourenço, "Precise Detection of Atomicity Violations", Haifa Verification Conference, Haifa, Israel, Springer Berlin / Heidelberg, November 2012.

Concurrent programs that are free of unsynchronized accesses to shared data may still exhibit unpredictable concurrency errors called atomicity violations, which include both high-level dataraces and stale-value errors. Atomicity violations occur when programmers make wrong assumptions about the atomicity scope of a code block, incorrectly splitting it into two or more atomic blocks and allowing them to be interleaved with other atomic blocks. In this paper we propose a novel static analysis algorithm that works on a dependency graph of program variables and detects both high-level dataraces and stale-value errors. The algorithm was implemented for a Java Bytecode analyzer and its effectiveness was evaluated with some well-known faulty programs. The results obtained show that our algorithm performs better than previous approaches, achieving higher precision for small and medium-sized programs, making it a good basis for a practical tool.

Sousa, D. G., C. Ferreira, and J. M. Lourenço, "Prevenção de Violações de Atomicidade usando Contractos", Proceedings of INForum Simpósio de Informática, Lisbon, Portugal, Faculdade de Ciências e Tecnologia da Universidade Nova de Lisboa, pp. 190–201, September 2013.

Concurrent programming requires the programmer to synchronize concurrent accesses to shared memory regions; however, this approach is not sufficient to avoid all the anomalies that can occur in a concurrent setting. Executing a sequence of atomic operations may cause atomicity violations if those operations are correlated, in which case the programmer must ensure that the whole sequence of operations is executed atomically. This problem is especially common when using operations from third-party packages or modules, since the programmer may incorrectly identify the scope of the code regions that need to be atomic to guarantee the correct behavior of the program. To avoid this problem, the module's developer can create a contract that specifies which sequences of the module's operations must always be executed atomically. This work presents a static analysis for verifying these contracts.

Lourenço, J. M., and J. C. Cunha, "Replaying Distributed Applications with RPVM", Proceedings of the 2nd Austrian-Hungarian Workshop on Distributed and Parallel Systems (DAPSYS'98): University of Vienna, 1998.

Parallel debugging is complex and difficult: complex because the programmer has to deal with multiple program flows and process interactions, and difficult because of the very limited choice of effective and easy-to-use debugging tools for parallel programming. Simple and necessary features for parallel debugging are absent even from commercial debuggers. One example is a record-replay feature, which allows a parallel application to be re-executed multiple times while ensuring that, during each re-execution, the internal race conditions are resolved in the same way as in the first execution. Some work has been done on record-replay techniques for parallel and distributed applications, but just a few techniques have been applied to specific systems (such as PVM or MPI), and even fewer have produced working prototypes. In this paper we describe a method designed to work with the PVM system and how it was implemented to provide a working prototype.

Martins, H. R. L., J. Soares, J. M. Lourenço, and N. Preguiça, "Replicação Multi-nível de Bases de Dados em Memória", Proceedings of INForum Simpósio de Informática, Lisbon, Portugal, Faculdade de Ciências e Tecnologia da Universidade Nova de Lisboa, pp. 190–201, September 2013.

Web services are frequently supported by systems with a layered architecture, using relational databases for data storage. Replicating the various components has been one of the approaches used to improve the scalability of these services. Additionally, using in-memory databases makes it possible to achieve higher performance. However, databases are known to scale poorly with the number of cores in multi-core machines. In this paper we propose a new approach to deal with this problem, called MacroDDB. Using a hierarchical replication solution, our proposal replicates the database across several nodes, with each node, in turn, running a set of replicas of the database. This approach thus addresses the lack of scalability of relational databases on multi-core machines, which in turn improves the overall scalability of the services.

Silva, J. A., T. M. Vale, J. M. Lourenço, and H. Paulino, "Replicação Parcial com Memória Transacional Distribuída", Proceedings of INForum Simpósio de Informática, Lisbon, Portugal, Faculdade de Ciências e Tecnologia da Universidade Nova de Lisboa, pp. 310–321, 2013.

Current distributed transactional memory systems rely essentially on distribution or on full replication to spread their data across the multiple nodes of the system. However, these data replication strategies have limitations. Distribution offers no fault tolerance, and full replication limits the storage capacity of the system. In this context, partial data replication emerges as an intermediate solution that combines the best of the two previous ones in order to mitigate their disadvantages. This strategy has been explored in the context of distributed databases, but it has received little attention in the context of transactional memory and, to the best of our knowledge, had never before been incorporated into a distributed transactional memory system for a general-purpose language. Thus, in this paper we propose and evaluate an infrastructure for partial data replication for Java bytecode programs, which was developed on top of an existing distributed transactional memory system. The modularity of the infrastructure we present allows multiple algorithms to be implemented and, consequently, makes it possible to evaluate in which usage contexts (workloads, number of nodes, etc.) partial replication is a viable alternative to other data replication strategies.

Dias, R. J., J. Seco, and J. M. Lourenço, "Snapshot Isolation Anomalies Detection in Software Transactional Memory", Proceedings of INForum Simpósio de Informática (InForum 2010), Braga, Portugal, Universidade do Minho, 2010.

Some performance issues of transactional memory are caused by unnecessary abort situations where non-serializable and yet non-conflicting transactions are scheduled to execute concurrently. Smartly relaxing the isolation properties of transactions may overcome these issues and attain considerable performance improvements. However, it is known that relaxing isolation restrictions may lead to runtime anomalies. In some situations, like database management systems, developers may choose that compromise, hence avoiding anomalies explicitly. Memory transactions protect the state of the program, therefore execution anomalies may have more severe consequences for the semantics of programs. So, the compromise between a relaxed isolation strategy and enforcing the necessary program correctness is harder to set up. The solution we devise is to statically analyse programs to detect the kind of anomalies that emerge under snapshot isolation. Our approach allows a compiler to either warn the developer about the possible snapshot isolation anomalies in a given program, or possibly inform automatic correctness strategies to ensure serializability.

Soares, J., J. M. Lourenço, and N. Preguiça, "Software Component Replication for Improved Fault-Tolerance: Can Multicore Processors Make It Work?", Dependable Computing, vol. 7869: Springer Berlin Heidelberg, pp. 173-180, 2013.


Teixeira, B., J. M. Lourenço, and D. Sousa, "A Static Approach for Detecting Concurrency Anomalies in Transactional Memory", Proceedings of INForum Simpósio de Informática (InForum 2010), Braga, Portugal, Universidade do Minho, 2010.

Programs containing concurrency anomalies will most probably exhibit harmful, erroneous, and unpredictable behaviors. To ensure program correctness, the sources of those anomalies must be located and corrected. Concurrency anomalies in Transactional Memory (TM) programs should also be diagnosed and fixed. In this paper we propose a framework to deal with two different categories of concurrency anomalies in TM. First, we address low-level TM anomalies, also called dataraces, which arise from executing programs in weak isolation. Second, we address high-level TM anomalies, also called high-level dataraces, bringing the programmer's attention to pairs of transactions that the programmer has misspecified and that should have been combined into a single transaction. Our framework was validated against a set of programs with well-known anomalies and demonstrated high accuracy and effectiveness, thus contributing to improving the correctness of TM programs.

Luís, J. E., J. M. Lourenço, and P. A. Lopes, "Suporte Transaccional para o Sistema de Ficheiros Btrfs", InForum 2011: Proceedings of InForum Simpósio de Informática, Coimbra, Universidade de Coimbra, 2011.

When a system fails abruptly, it is imperative to guarantee the consistency of the file system (FS). Several solutions currently exist that aim to guarantee that both the data and the metadata of the FS are in a consistent state, but they do not guarantee data consistency from the applications' point of view. For example, applications that need to change several configuration files must find mechanisms to guarantee that either all files are properly changed or none is, thus preventing the contents of the files from becoming inconsistent in the event of a failure. From the application's point of view, it may not be simple to implement this behavior on top of a typical FS; and it may also not be reasonable to use a Database Management System (DBMS), which offers ACID properties. In this paper we propose, test, and evaluate an integration of the ACID properties into an FS. Building on the snapshot support of the Btrfs file system, transactional semantics are offered to applications that operate on volumes (subtrees) of the FS, without compromising the POSIX semantics of the FS.

Silva, J. A., T. M. Vale, R. J. Dias, H. Paulino, and J. M. Lourenço, "Supporting Multiple Data Replication Models in Distributed Transactional Memory", Proceedings of the 2015 International Conference on Distributed Computing and Networking, Goa, India, ACM, pp. 11:1–11:10, 2015.

Distributed transactional memory (DTM) presents itself as a highly expressive and programmer friendly model for concurrency control in distributed programming. Current DTM systems make use of both data distribution and replication as a way of providing scalability and fault tolerance, but both techniques have advantages and drawbacks. As such, each one is suitable for different target applications, and deployment environments. In this paper we address the support of different data replication models in DTM. To that end we propose ReDstm, a modular and non-intrusive framework for DTM, that supports multiple data replication models in a general purpose programming language (Java). We show its application in the implementation of distributed software transactional memories with different replication models, and evaluate the framework via a set of well-known benchmarks, analysing the impact of the different replication models on memory usage and transaction throughput.

Silva, J. A., T. M. Vale, R. J. Dias, H. Paulino, and J. M. Lourenço, "Supporting Partial Data Replication in Distributed Transactional Memory", Proceedings of Joint Euro-TM/MEDIAN Workshop on Dependable Multicore and Transactional Memory Systems, Vienna, Austria, January 2014.


Lourenço, J. M., and G. Cunha, "Testing patterns for software transactional memory engines", Proceedings of the 5th Workshop on Parallel and Distributed Systems: Testing, Analysis, and Debugging (PADTAD'07), New York, NY, USA, ACM, pp. 36–42, 2007.

The emergence of multi-core processors is promoting the use of concurrency and multithreading. Raising the abstraction level of synchronization constructs is fundamental to easing the development of concurrent software, and Software Transactional Memory (STM) is a good approach towards such a goal. However, execution environment issues such as the processor instruction set, caching policy, and memory model may have a strong influence upon the reliability of STM engines. This paper addresses the testing of STM engines, aiming at improving their reliability and independence from the execution environment. From our experience with porting and extending a specific STM engine, we report on some of the bugs found and synthesize some testing patterns that proved to be useful in testing STM engines.

Lourenço, J. M., and J. C. Cunha, "A Thread-Level Distributed Debugger", Proceedings of the 3rd International Conference on Vector and Parallel Processing (VecPar'98), Porto, Portugal, Universidade do Porto, pp. 359–366, 1998.

In order to address the diversity of existing parallel programming models, it is important to provide development environments that can be incrementally extended with new services. Concerning the debugging of process based models, we have previously designed and implemented a basic interface that can be accessed by other tools as well as by debugging modules associated with high-level programming languages.

Cunha, J. C., J. M. Lourenço, and V. Duarte, "Tool Integration Issues for Parallel and Distributed Debugging", Proceedings of the 3rd SEIHPC Workshop, Braga, Portugal, University of Westminster, 1998.

This paper describes our experience with the design and implementation of a distributed debugger for C/PVM programs within the scope of the SEPP and HPCTI Copernicus projects. These projects aimed at the development of an integrated parallel software engineering environment based on a high-level graphical parallel programming model (GRAPNEL) and a set of associated tools supporting graphical edition, compilation, simulated and real parallel execution, testing, debugging, performance monitoring, mapping, and load balancing. We discuss how the development of the debugging tool was strongly influenced by the requirements posed by other tools in the environment, namely support for high-level graphical debugging of GRAPNEL programs, and support for the integration of static and dynamic analysis tools. We describe the functionalities of the DDBG debugger and its internal architecture, and discuss its integration with two separate tools in the SEPP/HPCTI environment: the GRED graphical editor for GRAPNEL programs, and the STEPS testing tool for C/PVM programs.

Dikaiakos, M., O. Rana, S. Ur, and J. M. Lourenço, "Topic 1: Support Tools and Environments", Euro-Par 2008 Parallel Processing, vol. 5168, Berlin, Heidelberg, Springer-Verlag, pp. 1–2, 2008.

The spread of systems that provide parallelism either «in-the-large» (grid infrastructures, clusters) or «in-the-small» (multi-core chips), creates new opportunities for exploiting parallelism in a wider spectrum of application domains. However, the increasing complexity of parallel and distributed platforms renders the programming, the use, and the management of these systems a costly endeavor that requires advanced expertise and skills. Therefore, there is an increasing need for powerful support tools and environments that will help end-users, application programmers, software engineers and system administrators to manage the increasing complexity of parallel and distributed platforms.

Monteiro, R., J. M. Lourenço, and H. Paulino, "Um Armazenamento Distribuído para uma Rede de Dispositivos Móveis", Proceedings of INForum Simpósio de Informática, Covilhã, Portugal, September 2015.

Mobile devices in geographic proximity represent a set of untapped resources, in terms of both processing capacity and storage, which opens the way for new applications with unique opportunities and challenges. Current data-sharing systems (e.g., photos, music, videos) for mobile devices require Internet connectivity in order to work. However, in environments where Internet connectivity is not constant or of good quality (e.g., sporting events and concerts), or in remote locations where no network infrastructure exists, it is difficult (or even impossible) to share data among several mobile devices. To solve this problem, mobile devices can form an ad hoc network to share their data and resources. In this paper we propose a distributed storage system for sharing data among everyday mobile devices, e.g., smartphones and tablets, using a best-effort mechanism to ensure data persistence and availability while supporting churn (unexpected arrival and departure of devices).

Silva, J. A., J. M. Lourenço, and H. Paulino, "Um Mecanismo de Caching para o Protocolo SCORe", Proceedings of INForum Simpósio de Informática, Porto, Portugal, FEUP Edições, pp. 260–275, September 2014.

Partial data replication protocols have great scalability potential. SCORe is a recently proposed partial replication protocol that uses multi-version concurrency control. In this paper we address one of the main problems affecting the performance of this type of protocol: data locality, i.e., the local node may not hold a copy of the data it wants to access, in which case one or more remote read operations are needed. Thus, unless techniques are employed to improve data-access locality, the number of remote read operations grows with the size of the system, eventually affecting its performance. To that end, we introduce a caching mechanism that replicates copies of remote data so that remote data can be served locally while preserving its consistency and the scalability offered by the protocol. We evaluate the caching mechanism with a well-known benchmark from the literature, and the experimental results are encouraging, showing some increase in system performance and a considerable reduction in the number of remote read operations.