[PLUG] question on obtaining the PID numbers of a batch command and finding out when a batch script has successfully terminated

Michael Ewan michaelewan15 at gmail.com
Tue Oct 8 15:23:10 UTC 2024


I can attest to the usefulness of Slurm; we use it to manage the
supercomputing clusters at PSU.  We had this very problem yesterday: a
grad student needed to process almost a hundred large climate data files
with a Python script and kept running out of memory.  With a Slurm job
array, each task can pick one file name from the list, so the Python
script processes the files individually, but as parallel jobs across the
cluster.  It might be a heavy lift for a small compute farm, though.
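
For anyone curious, here is a rough sketch of the Python side, not the
student's actual script.  It assumes the job is submitted as an array
(something like "sbatch --array=0-N job.sh"), so Slurm sets
SLURM_ARRAY_TASK_ID for each task.  The "*.nc" pattern, the directory
argument, and the function name are made up for illustration.

```python
import os
import sys
from pathlib import Path


def file_for_task(files, task_id):
    """Return the file assigned to this array task, or None if out of range."""
    if 0 <= task_id < len(files):
        return files[task_id]
    return None


def main():
    # Hypothetical layout: data directory passed as the first argument.
    data_dir = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
    # Sort so every array task sees the same ordering of files.
    files = sorted(data_dir.glob("*.nc"))  # "*.nc" is an assumed pattern
    # Slurm sets this variable for each task of a job array.
    task_id = int(os.environ.get("SLURM_ARRAY_TASK_ID", "0"))
    path = file_for_task(files, task_id)
    if path is None:
        print(f"task {task_id}: no file to process")
        return
    # The real per-file work would go here.
    print(f"task {task_id}: processing {path}")


if __name__ == "__main__":
    main()
```

Each task then holds only one file in memory at a time, and the
scheduler decides how many tasks run at once.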

On Mon, Oct 7, 2024 at 8:14 PM Tomas Kuchta <tomas.kuchta.lists at gmail.com>
wrote:

> I have found pueue a much better tool than parallel for my circumstances
>
> https://github.com/Nukesor/pueue
>
> For large, multi-machine clusters, I would consider Slurm.
>
> Tomas
>
> On Mon, Oct 7, 2024, 21:29 Russell Senior <russell at personaltelco.net>
> wrote:
>
> > On Mon, Oct 7, 2024 at 3:54 AM Robert Citek <robert.citek at gmail.com>
> > wrote:
> > >
> > > Sounds like you are wanting to manage parallel jobs.  Have you looked
> > > into using the parallel command?
> > >
> > > https://www.gnu.org/software/parallel/
> >
> > That's cool!  Thank you for mentioning it.
> >
> > --
> > Russell Senior
> > russell at personaltelco.net
> >
>

