  • Dilpreet Kaur Feb-12-2019 06:52:01 AM ( 1 month ago )

     

    I am new to API development and want to create a Web API endpoint that will receive a large amount of log data. I want to send that data to an Amazon S3 bucket through an Amazon Kinesis Data Firehose delivery stream. Below is a sample application that works fine, but I have no idea how to ingest a large volume of inbound data. What format should my API receive the data in, and what should the endpoint look like?

     
     // Namespaces used: System, System.IO, System.Text, System.Threading.Tasks,
     // Amazon.KinesisFirehose.Model. kinesisClient is assumed to be an
     // AmazonKinesisFirehoseClient field defined elsewhere in the controller.
     [HttpPost]
     public async Task Post() // How can this receive a large chunk of data?
     {
         // Returning Task (not async void) lets the framework await the work and see exceptions.
         await WriteToStream();
     }

     private async Task WriteToStream()
     {
         const string myStreamName = "test";
         Console.Error.WriteLine("Putting records in stream : " + myStreamName);
         // Write 10,000 UTF-8 encoded records to the stream, one request per record.
         for (int j = 0; j < 10000; ++j)
         {
             // The data is hardcoded here from the loop counter.
             byte[] dataAsBytes = Encoding.UTF8.GetBytes("testdata-" + j);
             using (MemoryStream memoryStream = new MemoryStream(dataAsBytes))
             {
                 PutRecordRequest putRecord = new PutRecordRequest();
                 putRecord.DeliveryStreamName = myStreamName;
                 Record record = new Record();
                 record.Data = memoryStream;
                 putRecord.Record = record;
                 await kinesisClient.PutRecordAsync(putRecord);
             }
         }
     }

    P.S. In the real-world app I will not have that for loop. I want my API to ingest large amounts of data, so what should its definition be? Do I need to use something like multipart/form-data? Please guide me.

  • Sarah Jones Feb-12-2019 06:53:21 AM ( 1 month ago )

    Here is my thought process. Since you are exposing an API for logging, your input should contain the following attributes:

    • Log Level (info, debug, warn, fatal)
    • Log message (string)
    • Application ID
    • Application Instance ID
    • Application IP
    • Host (machine on which the error was logged)
    • User ID (for whom the error occurred)
    • Timestamp in UTC (time at which the error occurred)
    • Additional Data (customisable as xml / json)
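    The attribute list above could be captured in a request model along the following lines. This is only a sketch: the class, enum, and property names are made up for illustration, `System.Text.Json` (.NET Core 3.0 or later) is an assumed serializer choice, and which fields are marked `[Required]` is a guess.

```csharp
using System;
using System.ComponentModel.DataAnnotations;
using System.Text;
using System.Text.Json;
using System.Text.Json.Serialization;

public enum LogLevel { Info, Debug, Warn, Fatal }

// Hypothetical model for one log entry; names are illustrative only.
public class LogEntry
{
    // Serialize the level as text ("Warn") rather than as a number.
    [JsonConverter(typeof(JsonStringEnumConverter))]
    public LogLevel Level { get; set; }

    [Required] public string Message { get; set; }
    [Required] public string ApplicationId { get; set; }
    public string ApplicationInstanceId { get; set; }
    public string ApplicationIp { get; set; }
    public string Host { get; set; }
    public string UserId { get; set; }
    public DateTime TimestampUtc { get; set; }

    // Free-form payload (XML or JSON) carried as a string.
    public string AdditionalData { get; set; }
}

public static class LogEntrySerializer
{
    // One JSON object per line, so records stay separable after they are concatenated.
    public static byte[] ToJsonLine(LogEntry entry) =>
        Encoding.UTF8.GetBytes(JsonSerializer.Serialize(entry) + "\n");
}
```

    Ending each record with a newline is a design choice here: it keeps entries separable if they end up concatenated into a single object downstream.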

    I suggest exposing the API as an AWS Lambda function behind Amazon API Gateway, as that will help it scale out as load increases.

    For a sample of how to build the API and use model binding, you may refer to https://docs.microsoft.com/en-us/aspnet/web-api/overview/formats-and-model-binding/model-validation-in-aspnet-web-api
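    As a rough illustration of model binding applied to the question, the endpoint could accept a JSON array of entries and forward them to the delivery stream in batches instead of one request per record. The sketch below assumes ASP.NET Core, the `AWSSDK.KinesisFirehose` package, and an `IAmazonKinesisFirehose` client supplied by dependency injection; the route, the controller name, and the trimmed `LogEntry` class are hypothetical. `PutRecordBatch` accepts at most 500 records per call and also has size limits, which this sketch does not check.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;
using Amazon.KinesisFirehose;
using Amazon.KinesisFirehose.Model;
using Microsoft.AspNetCore.Mvc;

// Trimmed, hypothetical model; a real one would carry the attributes listed above.
public class LogEntry
{
    public string Level { get; set; }
    public string Message { get; set; }
}

[ApiController]
[Route("api/logs")]
public class LogsController : ControllerBase
{
    private const string DeliveryStreamName = "test"; // stream name from the question
    private const int MaxRecordsPerCall = 500;        // PutRecordBatch record limit

    private readonly IAmazonKinesisFirehose _firehose;

    public LogsController(IAmazonKinesisFirehose firehose) => _firehose = firehose;

    [HttpPost]
    public async Task<IActionResult> Post([FromBody] List<LogEntry> entries)
    {
        if (entries == null || entries.Count == 0)
            return BadRequest("Expected a non-empty JSON array of log entries.");

        int failed = 0;
        for (int offset = 0; offset < entries.Count; offset += MaxRecordsPerCall)
        {
            var request = new PutRecordBatchRequest
            {
                DeliveryStreamName = DeliveryStreamName,
                Records = entries.Skip(offset).Take(MaxRecordsPerCall).Select(ToRecord).ToList()
            };
            PutRecordBatchResponse response = await _firehose.PutRecordBatchAsync(request);
            // Entries that carry an ErrorCode were rejected; a real service would retry them.
            failed += response.RequestResponses.Count(r => r.ErrorCode != null);
        }

        if (failed > 0)
            return StatusCode(502, failed + " record(s) were rejected by the delivery stream.");
        return Accepted();
    }

    // One JSON object per line, so records stay separable in the delivered S3 object.
    private static Record ToRecord(LogEntry entry) => new Record
    {
        Data = new MemoryStream(Encoding.UTF8.GetBytes(JsonSerializer.Serialize(entry) + "\n"))
    };
}
```

    A client would then POST a body such as `[{"level":"info","message":"started"}]` with `Content-Type: application/json`. For payloads too large to hold in memory comfortably, this approach is probably not the right fit.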

  • Yasmin Mirza Feb-12-2019 06:54:19 AM ( 1 month ago )

    I don't have much context, so I will answer based on how I see it.

    First, instead of sending the data to the Web API, I would send it directly to S3. In Azure there is the Shared Access Signature (SAS): you ask your API for a URL to upload the file to (there are many options; for example, you can limit the URL by time or by the IP addresses allowed to upload). So to upload a file: 1. Call the API to get the upload URL. 2. PUT the file to that URL. On Amazon the equivalent appears to be a presigned URL (or a signed POST policy).
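    On the AWS side, step 1 might look roughly like the sketch below, which uses `GetPreSignedURL` from the `AWSSDK.S3` package. The helper name, its parameters, and the 15-minute lifetime are assumptions made for illustration.

```csharp
using System;
using Amazon.S3;
using Amazon.S3.Model;

public static class UploadUrlFactory
{
    // Returns a URL that allows an HTTP PUT of the given object until the URL expires.
    public static string CreateUploadUrl(IAmazonS3 s3, string bucketName, string objectKey)
    {
        var request = new GetPreSignedUrlRequest
        {
            BucketName = bucketName,
            Key = objectKey,
            Verb = HttpVerb.PUT,
            Expires = DateTime.UtcNow.AddMinutes(15) // assumed lifetime
        };
        return s3.GetPreSignedURL(request);
    }
}
```

    The API would return this URL to the caller, who then sends the file body with a plain HTTP PUT (step 2), so the file never passes through the Web API process.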

    After that, write a Lambda function that is triggered by the S3 upload. This function sends an event (again, I don't know how this is done in AWS, but in Azure I would send a queue message) containing the URL of the file and a start position.

    Write a second Lambda that listens for these events and does the actual processing. In my apps I sometimes know that processing N items takes 10 seconds, so I usually choose N so that a batch takes no longer than 10-20 seconds, due to the nature of deployments. If the file is not finished after N rows have been processed, send the same event again with start position = previous start position + N. More info on how to read a range

    Designed this way, you can process large files. You can go further and send multiple events, each with a start line and an end line, so that the file is processed by multiple instances in parallel.
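    The resume-from-start-position idea can be sketched without any cloud services. In the sketch below, an in-memory array stands in for the uploaded file and a `Queue` stands in for the message queue; `ChunkEvent`, `ChunkProcessor`, and `Demo` are hypothetical names. Skipping lines from the top on every event is a simplification, since a real worker would more likely request a byte range from storage.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical event: which file to read and where to resume.
public sealed class ChunkEvent
{
    public string FileUrl { get; set; }
    public int StartLine { get; set; }
}

public static class ChunkProcessor
{
    // Processes at most batchSize lines starting at evt.StartLine.
    // Returns a follow-up event when more lines may remain, or null when the file is done.
    public static ChunkEvent ProcessChunk(
        ChunkEvent evt, IEnumerable<string> lines, int batchSize, Action<string> handleLine)
    {
        int handled = 0;
        foreach (string line in lines.Skip(evt.StartLine).Take(batchSize))
        {
            handleLine(line);
            handled++;
        }
        // A short (or empty) batch means the end of the file was reached.
        if (handled < batchSize) return null;
        return new ChunkEvent { FileUrl = evt.FileUrl, StartLine = evt.StartLine + batchSize };
    }
}

public static class Demo
{
    // Drives the loop that the queue-triggered function would perform; returns lines processed.
    public static int Run()
    {
        string[] file = Enumerable.Range(0, 25).Select(i => "line-" + i).ToArray();
        var queue = new Queue<ChunkEvent>();
        queue.Enqueue(new ChunkEvent { FileUrl = "s3://example-bucket/app.log", StartLine = 0 });

        int processed = 0;
        while (queue.Count > 0)
        {
            ChunkEvent next = ChunkProcessor.ProcessChunk(queue.Dequeue(), file, 10, _ => processed++);
            if (next != null) queue.Enqueue(next);
        }
        return processed;
    }
}
```

    With 25 lines and a batch size of 10, `Demo.Run()` handles three events (10, 10, and 5 lines) and returns 25. Adding an end line to the event, as suggested above, would let several workers take disjoint ranges at once.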

    PS. I would not recommend uploading the files to the Web API because they will be held in memory. If, say, 1 GB files arrive from multiple sources, you will kill your servers in minutes.

    PS2. The file format depends. It could be JSON, since that is the easiest to read, but keep in mind that reading a whole large file into memory is expensive. Here is an example of how to read them properly. Another option is a flat file, which is easy to read because you can read a range and process it.
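    One way to avoid loading a whole file is to store one JSON object per line and parse the stream lazily. The sketch below assumes that line-delimited layout and uses `System.Text.Json`; `LogFileReader` is a hypothetical helper, not taken from any linked example.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Text.Json;

public static class LogFileReader
{
    // Lazily yields one parsed record per non-blank line, so memory use is
    // bounded by the longest line rather than by the file size.
    public static IEnumerable<JsonElement> ReadJsonLines(Stream input)
    {
        using (var reader = new StreamReader(input, Encoding.UTF8))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                if (string.IsNullOrWhiteSpace(line)) continue;
                using (JsonDocument doc = JsonDocument.Parse(line))
                {
                    // Clone so the element stays valid after the document is disposed.
                    yield return doc.RootElement.Clone();
                }
            }
        }
    }
}
```

    The same method works over a `FileStream` or a response stream from storage, because it only needs a readable `Stream`.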

    PS3. In Azure I would use Azure Batch jobs.
