Comments on: Walkthrough – Distributed training with CNTK (Cognitive Toolkit)

By: Tsuyoshi Matsuzaki (Wed, 06 May 2020 23:59:10 +0000)
In reply to AlexM.

Hi,
Both “Tutorials” and “Examples” in the GitHub repo cover a lot of practical CNTK usage scenarios:
https://github.com/Microsoft/CNTK/tree/master/Tutorials
https://github.com/microsoft/CNTK/tree/master/Examples

By: AlexM (Tue, 05 May 2020 10:02:56 +0000)

Tsuyoshi-san,

I second the question above: do you have any pointers to Reinforcement Learning or distributed learners with CNTK?

Thank you in advance

Alex M

By: Ph0123 (Sat, 25 Apr 2020 20:55:35 +0000)

Hi,
Thank you for sharing.
I am trying to implement PageRank with CNTK.
In PageRank, we need to send and receive values from other workers.
I do not see any function to do that.
Is it possible to implement PageRank with CNTK?

By: yj (Fri, 25 May 2018 04:17:15 +0000)
In reply to Tsuyoshi Matsuzaki.

Thanks for your kind answer, Tsuyoshi-san.
For the P.S. question: MPI_Barrier(MPI_COMM_WORLD) could be used to synchronize all the MPI tasks.
Is there any way to read data directly from a directory at runtime, like “flow_from_directory” in Keras?
Making a file (CTF format or otherwise) from a large data set seems inefficient and could run out of memory…
Does CNTK have a library or function to do that? Sorry for bothering you several times; I am just trying to use CNTK since it seems fast and accurate in the training phase. Thanks always.
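
For illustration, a minimal Python sketch of the MPI_Barrier idea mentioned above, assuming mpi4py is available; the file written by rank 0 is just a placeholder for whatever one-time preparation is needed:

    from mpi4py import MPI

    comm = MPI.COMM_WORLD

    # Only rank 0 performs the one-time preparation (e.g. writing a CTF file);
    # all other ranks wait at the barrier until rank 0 arrives there as well.
    if comm.Get_rank() == 0:
        with open("train.ctf", "w") as f:        # placeholder preparation step
            f.write("|features 0 0 |labels 1\n")
    comm.Barrier()                               # same effect as MPI_Barrier(MPI_COMM_WORLD)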

By: Tsuyoshi Matsuzaki (Thu, 24 May 2018 05:25:43 +0000)

Hi yj-san, sorry for my late response (I have been busy preparing to speak at some events, etc.).
The GitHub repo includes a script only for text-to-CTF conversion (see https://github.com/Microsoft/CNTK/blob/master/Scripts/txt2ctf.py), and almost all official CNTK samples write the conversion from scratch (see https://github.com/Microsoft/CNTK/blob/master/Examples/Image/DataSets/CIFAR-10/cifar_utils.py). Sorry, but CNTK does not seem to have built-in conversion functionality like TensorFlow does.

P.S. Could you share your solution to the “lock” problem?
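
For illustration, a minimal sketch of the from-scratch conversion approach described above; write_ctf and the stream names “features”/“labels” are hypothetical, and dense numpy data is assumed:

    import numpy as np

    def write_ctf(path, features, labels):
        # features: 2-D numpy array, one row per sample
        # labels:   2-D numpy array of one-hot rows, aligned with features
        with open(path, "w") as f:
            for x, y in zip(features, labels):
                feat = " ".join(str(v) for v in x.ravel())
                lab = " ".join(str(v) for v in y.ravel())
                f.write("|features {} |labels {}\n".format(feat, lab))

    # e.g. write_ctf("train.ctf", np.random.rand(100, 784),
    #                np.eye(10)[np.random.randint(0, 10, 100)])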

By: yj (Wed, 23 May 2018 11:43:59 +0000)
In reply to Tsuyoshi Matsuzaki.

Tsuyoshi-san, I solved my own problem. Thanks for the answers. I just wonder: is there a decent way to convert numpy arrays (image data) to a CTF-format text file? I want to use MinibatchSource to manipulate the data.

By: yj (Tue, 15 May 2018 02:55:21 +0000)
In reply to Tsuyoshi Matsuzaki.

Thanks for the info and the example. But I still need to know how to lock (mutex) processes (threads) when loading files from the HDD; otherwise all processes (nodes) try to read the same files individually. Since each process loads the training files separately, the order (sequence) of files is different on each node, so I can't split the training files across nodes properly. I hope you could help me with this. Thanks again.

By: Tsuyoshi Matsuzaki (Mon, 14 May 2018 11:22:35 +0000)
In reply to yj.

I understand your concerns.
With the usual “MinibatchSource”, you can specify “num_data_partitions” and “partition_index” in the “next_minibatch” method so that different nodes read different partitions. See the following example, which specifies “partition_index = cntk.Communicator.rank()” so that each worker reads a different partition:
https://github.com/Microsoft/CNTK/blob/master/bindings/python/cntk/learners/tests/distributed_multi_learner_test.py
I'm sorry I haven't tried this case myself, but how about using “number_of_workers” and “worker_rank” in “MinibatchSourceFromData”, in the same way as “num_data_partitions” and “partition_index” in “MinibatchSource”?
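
For illustration, a minimal sketch of this partitioned-reading pattern; the CTF file name, stream names, and dimensions are placeholders, while the num_data_partitions / partition_index arguments follow the linked example:

    import cntk
    from cntk.io import MinibatchSource, CTFDeserializer, StreamDef, StreamDefs

    # Placeholder deserializer: a CTF file with "features" and "labels" streams.
    source = MinibatchSource(CTFDeserializer("train.ctf", StreamDefs(
        features=StreamDef(field="features", shape=784),
        labels=StreamDef(field="labels", shape=10))))

    # Each MPI worker reads only its own partition of the data.
    mb = source.next_minibatch(
        64,
        num_data_partitions=cntk.Communicator.num_workers(),
        partition_index=cntk.Communicator.rank())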

By: yj (Mon, 14 May 2018 10:36:42 +0000)
In reply to Tsuyoshi Matsuzaki.

Hi Tsuyoshi-san, thanks for replying. I used MinibatchSourceFromData as you said. To run my code on multiple nodes with multiple GPUs, I wrote “mpiexec -n 64 -npernode 4 ${node.info.file} python3 {exec file}” in my script, but in this case the 64 processes just execute the same .py file individually (the same execution 64 times, which also causes some errors at runtime). Is there any way to use a mutex or lock the MPI processes at the code level (for mutual exclusion)? The normal Python mutex function definitely did not work for this. Sorry for bothering you; I hope you could help me with this. Thanks!

By: Tsuyoshi Matsuzaki (Mon, 14 May 2018 08:57:37 +0000)
In reply to yj.

Hi yj-san. Please use MinibatchSourceFromData instead.
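
For illustration, a minimal sketch combining MinibatchSourceFromData with the number_of_workers / worker_rank idea suggested a few comments above; the in-memory arrays are placeholders:

    import numpy as np
    import cntk
    from cntk.io import MinibatchSourceFromData

    # Placeholder in-memory data; in practice these would be the image arrays
    # already loaded on each node.
    X = np.random.rand(1000, 784).astype(np.float32)
    Y = np.eye(10, dtype=np.float32)[np.random.randint(0, 10, 1000)]

    source = MinibatchSourceFromData(dict(features=X, labels=Y))

    # Each MPI rank requests only its own slice of each minibatch.
    mb = source.next_minibatch(
        64,
        number_of_workers=cntk.Communicator.num_workers(),
        worker_rank=cntk.Communicator.rank())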
