Comments on: Walkthrough – Distributed training with CNTK (Cognitive Toolkit)

By: Tsuyoshi Matsuzaki (Wed, 06 May 2020 23:59:10 +0000)
In reply to AlexM.

Hi,
Both “Tutorials” and “Examples” in the GitHub repo cover a lot of practical CNTK usage scenarios:
https://github.com/Microsoft/CNTK/tree/master/Tutorials
https://github.com/microsoft/CNTK/tree/master/Examples

By: AlexM (Tue, 05 May 2020 10:02:56 +0000)

Tsuyoshi-san,

I second the question above: do you have any pointers to Reinforcement Learning or distributed learners with CNTK?

Thank you in advance

Alex M

By: Ph0123 (Sat, 25 Apr 2020 20:55:35 +0000)

Hi,
Thank you for sharing.
I am trying to implement PageRank with CNTK.
In PageRank, we need to send and receive values from other workers.
I do not see any function to do that.
Is it possible to implement PageRank with CNTK?

By: yj (Fri, 25 May 2018 04:17:15 +0000)
In reply to Tsuyoshi Matsuzaki.

Thanks for your kind answer, Tsuyoshi-san.
For the P.S. question: MPI_Barrier(MPI_COMM_WORLD) could be used to synchronize all the MPI tasks.
Is there any way to read data directly from a directory at runtime, like “flow_from_directory” in Keras?
Making a file (CTF format or otherwise) from a large data set seems inefficient and could run out of memory…
Does CNTK have a library or function to do that? Sorry for bothering you several times; I am just trying to use CNTK since it seems fast and accurate in the training phase. Thanks always.
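
For illustration, a minimal Python sketch of the MPI_Barrier idea mentioned above, assuming mpi4py is available; the file written by rank 0 is just a placeholder for whatever one-time preparation is needed:

    from mpi4py import MPI

    comm = MPI.COMM_WORLD

    # Only rank 0 performs the one-time preparation (e.g. writing a CTF file);
    # all other ranks wait at the barrier until rank 0 arrives there as well.
    if comm.Get_rank() == 0:
        with open("train.ctf", "w") as f:        # placeholder preparation step
            f.write("|features 0 0 |labels 1\n")
    comm.Barrier()                               # same effect as MPI_Barrier(MPI_COMM_WORLD)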

By: Tsuyoshi Matsuzaki (Thu, 24 May 2018 05:25:43 +0000)

Hi yj-san, sorry for my late response (I have been busy preparing to speak at some events, etc.).
The GitHub repo includes a script only for text-to-CTF conversion (see https://github.com/Microsoft/CNTK/blob/master/Scripts/txt2ctf.py), and almost all official CNTK samples write the conversion from scratch (see https://github.com/Microsoft/CNTK/blob/master/Examples/Image/DataSets/CIFAR-10/cifar_utils.py). Sorry, but CNTK does not seem to have built-in conversion functionality like TensorFlow does.

P.S. Could you share your solution to the “lock” problem?
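
For illustration, a minimal sketch of the from-scratch conversion approach described above; write_ctf and the stream names “features”/“labels” are hypothetical, and dense numpy data is assumed:

    import numpy as np

    def write_ctf(path, features, labels):
        # features: 2-D numpy array, one row per sample
        # labels:   2-D numpy array of one-hot rows, aligned with features
        with open(path, "w") as f:
            for x, y in zip(features, labels):
                feat = " ".join(str(v) for v in x.ravel())
                lab = " ".join(str(v) for v in y.ravel())
                f.write("|features {} |labels {}\n".format(feat, lab))

    # e.g. write_ctf("train.ctf", np.random.rand(100, 784),
    #                np.eye(10)[np.random.randint(0, 10, 100)])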

By: yj (Wed, 23 May 2018 11:43:59 +0000)
In reply to Tsuyoshi Matsuzaki.

Tsuyoshi-san, I solved my own problem. Thanks for the answers. I just wonder: is there a decent way to convert numpy arrays (image data) to a CTF-format text file? I want to use MinibatchSource to manipulate the data.

By: yj (Tue, 15 May 2018 02:55:21 +0000)
In reply to Tsuyoshi Matsuzaki.

Thanks for the info and the example. But I still need to know how to lock (mutex) processes (threads) when loading files from the HDD; otherwise all processes (nodes) try to read the same files individually. Since each process loads the training files separately, the order (sequence) of files is different on each node, so I can't split the training files across nodes properly. I hope you could help me with this. Thanks again.

By: Tsuyoshi Matsuzaki (Mon, 14 May 2018 11:22:35 +0000)
In reply to yj.

I understand your concerns.
With the usual “MinibatchSource”, you can specify “num_data_partitions” and “partition_index” in the “next_minibatch” method so that different nodes read different partitions. See the following example, which specifies “partition_index = cntk.Communicator.rank()” so that each worker reads a different partition:
https://github.com/Microsoft/CNTK/blob/master/bindings/python/cntk/learners/tests/distributed_multi_learner_test.py
I'm sorry I haven't tried this case myself, but how about using “number_of_workers” and “worker_rank” in “MinibatchSourceFromData”, in the same way as “num_data_partitions” and “partition_index” in “MinibatchSource”?
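
For illustration, a minimal sketch of this partitioned-reading pattern; the CTF file name, stream names, and dimensions are placeholders, while the num_data_partitions / partition_index arguments follow the linked example:

    import cntk
    from cntk.io import MinibatchSource, CTFDeserializer, StreamDef, StreamDefs

    # Placeholder deserializer: a CTF file with "features" and "labels" streams.
    source = MinibatchSource(CTFDeserializer("train.ctf", StreamDefs(
        features=StreamDef(field="features", shape=784),
        labels=StreamDef(field="labels", shape=10))))

    # Each MPI worker reads only its own partition of the data.
    mb = source.next_minibatch(
        64,
        num_data_partitions=cntk.Communicator.num_workers(),
        partition_index=cntk.Communicator.rank())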

By: yj (Mon, 14 May 2018 10:36:42 +0000)
In reply to Tsuyoshi Matsuzaki.

Hi Tsuyoshi-san, thanks for replying. I used MinibatchSourceFromData as you said. To run my code on multiple nodes with multiple GPUs, I wrote “mpiexec -n 64 -npernode 4 ${node.info.file} python3 {exec file}” in my script, but in this case the 64 processes just execute the same .py file individually (the same execution 64 times, which also causes some errors at runtime). Is there any way to use a mutex or lock the MPI processes at the code level (for mutual exclusion)? The normal Python mutex function definitely did not work for this. Sorry for bothering you; I hope you could help me with this. Thanks!

By: Tsuyoshi Matsuzaki (Mon, 14 May 2018 08:57:37 +0000)
In reply to yj.

Hi yj-san. Please use MinibatchSourceFromData instead.
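
For illustration, a minimal sketch combining MinibatchSourceFromData with the number_of_workers / worker_rank idea suggested a few comments above; the in-memory arrays are placeholders:

    import numpy as np
    import cntk
    from cntk.io import MinibatchSourceFromData

    # Placeholder in-memory data; in practice these would be the image arrays
    # already loaded on each node.
    X = np.random.rand(1000, 784).astype(np.float32)
    Y = np.eye(10, dtype=np.float32)[np.random.randint(0, 10, 1000)]

    source = MinibatchSourceFromData(dict(features=X, labels=Y))

    # Each MPI rank requests only its own slice of each minibatch.
    mb = source.next_minibatch(
        64,
        number_of_workers=cntk.Communicator.num_workers(),
        worker_rank=cntk.Communicator.rank())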
