Thursday, 17 April 2014

Resource record:

Link to the patent online

Publication reference no.:

20120159236

Publication date:

Thursday, 21 June 2012

Last updated:

Thursday, 28 June 2012

Added to the observatory:

Thursday, 28 June 2012

Language:

Spanish

Holistic task scheduling for distributed computing

Embodiments of the invention include a method for fault tolerance management of worker nodes during map/reduce computing in a computing cluster. The method includes subdividing a computational problem into a set of sub-problems, mapping a selection of the sub-problems in the set to respective nodes in the cluster, directing processing of the sub-problems in the respective nodes, and collecting results from completion of processing of the sub-problems. During a first, early temporal portion of processing the computational problem, failed nodes are detected and the sub-problems being processed by the failed nodes are re-processed. Conversely, during a second, later temporal portion of processing, sub-problems in nodes that have not yet completed processing are replicated into other nodes, processing of the replicated sub-problems is directed, and the results from completed sub-problems are collected. Finally, duplicate results are removed and the remaining results are reduced into a result set for the problem.
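
The two-phase scheme in the abstract can be illustrated with a small single-process simulation. This is only a sketch under stated assumptions, not the patented implementation: the function `run_job` and its parameters `fail_once`, `early_fraction`, and `replicas` are hypothetical names, node failure is simulated with a set of sub-problem ids that fail on their first attempt, and summation stands in for both the per-node work and the final reduce step.

```python
from typing import List, Set, Tuple


def run_job(
    data: List[int],
    chunk_size: int,
    fail_once: Set[int],
    early_fraction: float = 0.75,
    replicas: int = 2,
) -> Tuple[int, int]:
    """Simulate the two-phase scheme; return (reduced total, raw result count)."""
    # Subdivide the problem into sub-problems (chunks), identified by index.
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    cutoff = int(len(chunks) * early_fraction)  # assumed early/late boundary

    collected: List[Tuple[int, int]] = []  # (sub_problem_id, partial result)
    pending_failures = set(fail_once)       # copy so the caller's set is untouched

    # Early phase: detect a failed node and re-process its sub-problem.
    for task_id in range(cutoff):
        while True:
            if task_id in pending_failures:
                # Simulated failure: no result is produced; retry the task.
                pending_failures.discard(task_id)
                continue
            collected.append((task_id, sum(chunks[task_id])))
            break

    # Late phase: replicate each unfinished sub-problem onto several nodes
    # instead of waiting to detect failures; every replica reports a result.
    for task_id in range(cutoff, len(chunks)):
        for _ in range(replicas):
            collected.append((task_id, sum(chunks[task_id])))

    # Remove duplicate results (one per sub-problem id), then reduce.
    unique = dict(collected)
    return sum(unique.values()), len(collected)


if __name__ == "__main__":
    total, raw_count = run_job(list(range(1, 21)), chunk_size=5, fail_once={1})
    print(total, raw_count)
```

In this sketch the late phase trades extra work for lower latency: replicated sub-problems produce duplicate results, which is why the deduplication step precedes the reduce.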