How to override an async NDB method and write your own tasklet

I am trying to grasp the async operations introduced with NDB, and I would like to use @ndb.tasklet to make some of my work asynchronous.

A simple example would be string_id generation in an overridden get_or_insert_async.

Is this the correct way to do things? What can be improved here?

def get_or_insert_async(cls, *args):
    id = cls.make_string_id(*args) 
    model = yield super(MyModel, cls).get_or_insert_async(id)
    raise ndb.Return(model)

Another example would be doing stuff in a loop in fan-out kinda way. Is this correct?

def do_stuff(cls, some_collection):

    def internal_tasklet(data):
        id = make_stuff_needed_for_id(data)
        model = yield cls.get_or_insert_async(id)
        yield model.put_async()
        raise ndb.Return(None)

    for data in some_collection:
        # will it parallelise internal_tasklet execution? 
        yield internal_tasklet(data)

    raise ndb.Return(None)


As I understand the whole concept, yields are there to provide Future objects, which are then collected in parallel (where possible) and executed asynchronously. Am I correct?

After Nick's hint (is this what you meant?):

def do_stuff(cls, some_collection):

    def internal_tasklet(data):
        id = make_stuff_needed_for_id(data)
        model = yield cls.get_or_insert_async(id)
        raise ndb.Return(model)                # change here

    models = []
    for data in some_collection:
        # will it parallelise internal_tasklet execution? 
        m = yield internal_tasklet(data)       # change here
        models.append(m)                       # change here

    keys = yield ndb.put_multi_async(models)   # change here
    raise ndb.Return(keys)                     # change here


New revised version…

def do_stuff(cls, some_collection):

    def internal_tasklet(data):
        id = make_stuff_needed_for_id(data)
        model = yield cls.get_or_insert_async(id)
        raise ndb.Return(model)                

    futures = []
    for data in some_collection:
        # tasklets won't run in parallel, but while
        # one is waiting on a yield (and the RPC underneath)
        # the other will advance its execution
        # up to the next yield or return
        fut = internal_tasklet(data)           # change here
        futures.append(fut)                    # change here

    ndb.Future.wait_all(futures)               # change here

    models = [fut.get_result() for fut in futures]
    keys = yield ndb.put_multi_async(models)   # change here
    raise ndb.Return(keys)                     # change here
2012-04-04 19:22
by Janusz Skonieczny


You don't need to use tasklets if all you want to do is call something async with different arguments - just return the wrapped function's return value, like this:

def get_or_insert_async(cls, *args):
  id = cls.make_string_id(*args)
  return super(MyModel, cls).get_or_insert_async(id)

I'd be cautious about this for several reasons, though: you're changing the meaning of a built-in function, which is usually a bad idea; you're changing the signature (positional arguments but no keyword arguments); and you're not passing extra arguments through to the original function.
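To illustrate the last two concerns, here is a minimal sketch in plain Python. `Base` is a hypothetical stand-in for `ndb.Model` (it is not the real NDB API), and its `get_or_insert_async` simply echoes what it receives instead of returning a Future. The only point is that the override forwards keyword arguments rather than dropping them; it does not address the concern about changing the meaning of a built-in method.

```python
class Base(object):
    """Hypothetical stand-in for ndb.Model; not the real NDB API."""

    @classmethod
    def get_or_insert_async(cls, name, **kwargs):
        # The real method would return a Future; this one just echoes
        # its inputs so the forwarding can be observed.
        return (name, kwargs)


class MyModel(Base):
    @staticmethod
    def make_string_id(*parts):
        # Assumed id scheme for illustration: join the parts with dashes.
        return "-".join(str(p) for p in parts)

    @classmethod
    def get_or_insert_async(cls, *parts, **kwargs):
        # Build the id from the positional arguments, then forward every
        # keyword argument untouched to the original method.
        name = cls.make_string_id(*parts)
        return super(MyModel, cls).get_or_insert_async(name, **kwargs)


result = MyModel.get_or_insert_async("user", 42, parent=None)
```

Here `result` holds the generated id together with the keyword arguments that reached the base method.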

For your second example, yielding things one at a time will force NDB to wait on their completion - 'yield' is synonymous with 'wait'. Instead, execute the tasklet function for each element in the collection, then wait on them all (by calling yield on the list) at the same time.
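The difference between waiting inside the loop and waiting on everything at once can be sketched with the standard-library `asyncio` module. This is an analogy, not NDB code: `await` plays the role of `yield` in a tasklet, `asyncio.gather` plays the role of yielding a list of futures, and `fetch` is a made-up stand-in for an RPC-backed call such as `get_or_insert_async`.

```python
import asyncio

log = []


async def fetch(item):
    # Hypothetical stand-in for an RPC-backed async call.
    log.append("start %s" % item)
    await asyncio.sleep(0.01)
    log.append("done %s" % item)
    return item * 2


async def one_at_a_time(items):
    # Anti-pattern: waiting inside the loop, so each call must finish
    # before the next one even starts.
    results = []
    for item in items:
        results.append(await fetch(item))
    return results


async def all_at_once(items):
    # Start every call first, then wait on the whole batch together.
    pending = [fetch(item) for item in items]
    return await asyncio.gather(*pending)


sequential = asyncio.run(one_at_a_time([1, 2, 3]))
sequential_log = list(log)

del log[:]
concurrent = asyncio.run(all_at_once([1, 2, 3]))
concurrent_log = list(log)
```

Both versions return the same results, but in the sequential log every "start" is followed by its own "done", whereas in the concurrent log all three calls start before any of them finishes.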

2012-04-05 04:24
by Nick Johnson
Wouldn't that make the make_string_id call in get_or_insert_async synchronous, and only the underlying call to the original get_or_insert_async truly asynchronous? - Janusz Skonieczny 2012-04-05 08:03
Can you rewrite the second example in your answer? I'm not sure which yields to drop so that the loop wouldn't wait on each element and all (or most) internal_tasklet executions were parallelised - Janusz Skonieczny 2012-04-05 08:07
@WooYek Yes, but make_string_id is synchronous in your snippet anyway. And if they're not making RPCs, there's no point in doing them asynchronously - tasklets only help when time is spent doing RPCs. Regarding the second example - the problem is the anti-pattern of calling yield in a loop. Doing this waits for each individual task to finish before going on to the next one. Any time you have a loop with yield in it, you should call the function without yield, assemble a list of the results, and call yield on that list - Nick Johnson 2012-04-05 08:46
Wait, what? I thought that the whole point of the ndb.tasklet decorator for a generator function is to execute it, get a Future, do other stuff in parallel and then call get_result, and that the whole tasklet will be executed in parallel, stopping on each yield to suspend (allowing other tasklets to execute a bit). The make_string_id example is silly, but I was going for simplicity. Are you telling me that make_string_id will be executed before a Future is returned (?!), or that it won't be executed until I call get_result on the Future? - Janusz Skonieczny 2012-04-05 09:52
@WooYek Tasklets are designed to allow parallel execution using coroutines when there are async operations (eg, RPCs) involved. They work by running a routine until it has to wait on an RPC, then running the next pending coroutine, and so on. This works very well for eliminating wasted time waiting for responses, and lets you do a lot of stuff in parallel, but it doesn't execute Python code in parallel - you need threads for that - Nick Johnson 2012-04-05 10:18
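The "coroutines, not threads" point can be illustrated with a small sketch, again using standard-library `asyncio` as a stand-in for NDB tasklets (an assumption for illustration; NDB uses `yield` where this uses `await`). Two hypothetical workers interleave at their suspension points, yet every step runs on the same thread.

```python
import asyncio
import threading

thread_ids = set()
order = []


async def worker(name):
    thread_ids.add(threading.get_ident())
    order.append(name + ":before")
    # Suspend here, as a tasklet would while waiting on an RPC;
    # the event loop then runs the other pending coroutine.
    await asyncio.sleep(0)
    thread_ids.add(threading.get_ident())
    order.append(name + ":after")


async def main():
    await asyncio.gather(worker("a"), worker("b"))


asyncio.run(main())
```

After the run, `order` shows the two workers taking turns around the suspension point, while `thread_ids` contains a single entry: the Python code itself never executes in parallel.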
…I thought that there were threads involved underneath. OK, now I got "the point" :D, thx! Now, let's get back to "the loop" example. 1. Am I correct that in the revised version there will be multiple get_or_insert_async calls batched (this is what I'm trying to accomplish there)? 2. If yes, does the call to internal_tasklet need a yield, or can I just put the internal_tasklet code directly in the loop? PS. Sorry to keep you here so long, but I really want to get the hang of it - Janusz Skonieczny 2012-04-05 11:07
@WooYek No, you're still calling yield in a loop. As I explained in my previous answer, this is an anti-pattern. Yield does not return a future - yield waits on a future and returns its result - the async functions themselves return a future - Nick Johnson 2012-04-06 00:21
Is the new version OK? - Janusz Skonieczny 2012-04-06 11:37