Have you ever wondered whether the order of fields within a Go struct affects memory consumption or application performance? I did, and it turns out that it might indeed have an impact. The reason? Memory alignment requirements in modern CPU architectures. I documented my understanding of the topic at https://lnkd.in/ddDsG8MU . Take a look if you're interested; feedback is appreciated!
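Alignment padding is easy to see even without a Go toolchain: Python's ctypes lays out structures the way a C compiler would, which mirrors Go's rules for these field types on typical 64-bit platforms. A minimal sketch (the struct names and field mix are my own example):

import ctypes

class BadOrder(ctypes.Structure):
    # int8 then int64: 7 padding bytes are inserted before b so it lands on
    # an 8-byte boundary, and 7 more after c so the size is a multiple of 8.
    _fields_ = [("a", ctypes.c_int8),
                ("b", ctypes.c_int64),
                ("c", ctypes.c_int8)]

class GoodOrder(ctypes.Structure):
    # Largest field first: only 6 bytes of tail padding are needed.
    _fields_ = [("b", ctypes.c_int64),
                ("a", ctypes.c_int8),
                ("c", ctypes.c_int8)]

print(ctypes.sizeof(BadOrder))   # 24 on a typical 64-bit machine
print(ctypes.sizeof(GoodOrder))  # 16 on the same machine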
-
One of the things I work on at Rockset is our CPU profiling infrastructure. We've developed a cool technique to precisely correlate perf samples with application data, such as query IDs. If you profile requests in a concurrent system, it might be useful for you too: https://lnkd.in/dP2DuFRY
Profiling Individual Queries in a Concurrent System
rockset.com
-
Definitely worth a read for anyone interested in understanding the performance of concurrent systems: the Rockset post shared above, "Profiling Individual Queries in a Concurrent System" (rockset.com).
-
Here is a sneak peek section from my new book: I updated the CRC/checksum speed comparison to include a new dual-sum checksum variant and some novel speedup techniques. You can get twice the data word length at HD=3, and also twice the speed compared to a Fletcher/Adler checksum, by using a DualX checksum. Or use the DualXP variant to get HD=4 at about the same data word length at which a Fletcher/Adler gets HD=3. (Explained in the blog, with a pointer to source code on the book's support site.) Still longer HD=3/HD=4 data word lengths are available using a Koopman checksum. And a CRC is still there as a further tradeoff point for length vs. speed. (Speeds depend on the CPU you're using; these are for 32-bit checksums on a 32-bit desktop CPU.) #embedded #crc #checksum https://lnkd.in/e77HZQWn
Comparative speeds for different Checksum & CRC implementations
checksumcrc.blogspot.com
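For orientation, DualX/DualXP are the book's own variants, but the classic dual-sum structure they improve on is easy to sketch. Here is a minimal, unoptimized Fletcher-32 in Python (two running sums over 16-bit words; this is not the book's algorithm and includes none of its speedups):

def fletcher32(data: bytes) -> int:
    # sum1 accumulates the data words; sum2 accumulates sum1, which makes
    # the checksum sensitive to word order, not just word values (this is
    # what gives Fletcher checksums HD=3 up to a length limit).
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    sum1 = sum2 = 0
    for i in range(0, len(data), 2):
        word = data[i] | (data[i + 1] << 8)  # little-endian word assembly
        sum1 = (sum1 + word) % 65535
        sum2 = (sum2 + sum1) % 65535
    return (sum2 << 16) | sum1

print(hex(fletcher32(b"abcdef")))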
-
In the world of computer science and software development, understanding CPU microarchitecture is like having a magic key to optimize your code. This complex topic delves into the inner workings of your computer's processor, and it's the cornerstone for crafting software that runs faster and smoother. https://lnkd.in/e28B_NJq is a great starting point for anyone interested in learning about pipelining, branch prediction, and data dependencies, with easy-to-follow visualizations and explanations.
Architecture All Access: Modern CPU Architecture 2 - Microarchitecture Deep Dive | Intel Technology
https://www.youtube.com/
-
The CPU flame graph 🔥 is a visualization tool that unravels the mystery of CPU time consumption within your code. How? By aggregating function call stack traces, so that the width of each stack frame tells you how much CPU time it's responsible for. This lets you spot performance bottlenecks and areas crying out 😭 for optimization. There are many tools to generate flame graphs, but the most popular ones are:
- https://lnkd.in/ee9hVrBc
- https://lnkd.in/e7Yw2Tk4
Their READMEs cover installation and usage, so instead let's go over how to read a flame graph:
* Colors are used to differentiate functions, but the specific color doesn't hold significance.
* The y-axis is your function call stack: in the classic layout the root is at the base and the leaves are at the top, and each box symbolizes a function call sitting directly on top of the frame that called it.
* A box's width measures the CPU time spent in that function and its children; a tall stack of boxes means those functions are riding along inside their parents' calls.
* To spot a bottleneck, look for wide boxes, and pay special attention to wide frames near the top of a stack: that's where the CPU time is actually being spent. (A minimal py-spy demo follows below.)
CPU flame graphs are like a guide through the maze of code performance, and it's worth learning how to generate and read them to make your code work better (blazing faaaaaast 🚀).
#PythonPerformance #CodeOptimization #FlameGraphs #PySpy #VisualizePerformance #TechTools #PerformanceInsights #CodeEfficiency #SoftwareDevelopment #softwareengineering #performanceoptimization #performanceengineering #OptimizeYourCode
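The post's shortened links aren't expanded here, but given the #PySpy hashtag, here is a hedged sketch of producing a flame graph with py-spy: a toy script with a deliberately hot function, plus the recording commands in comments (the file name toy.py is my assumption):

# Record a flame graph of this script with py-spy (pip install py-spy):
#   py-spy record -o profile.svg -- python toy.py
# or attach to an already-running process:
#   py-spy record -o profile.svg --pid <PID>
import math

def cold_setup():
    # Cheap: shows up as a narrow sliver in the flame graph.
    return [i * i for i in range(1_000)]

def hot_loop():
    # Deliberately expensive: dominates the graph as a wide box.
    return sum(math.sqrt(i) for i in range(10_000_000))

def main():
    cold_setup()
    hot_loop()

if __name__ == "__main__":
    main()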
-
Spark Optimizations (Resource-Level Optimization)
As discussed in the last post, there are two kinds of optimization; let's discuss resource-level optimization.
For a job to run efficiently, the right amount of resources should be allocated. Resources include:
- Memory (RAM)
- CPU cores (compute)
Let's think about an example: a cluster of 10 worker nodes, where each machine has 64 GB of RAM and 16 CPU cores. Let's determine how many executors a node should host to process data efficiently.
Strategies for creating containers:
1. Thin executors: more executors, each holding minimal resources (16 executors per node, each with 1 CPU core and 4 GB of RAM).
Disadvantages:
1. No multithreading within an executor (a single core cannot multitask).
2. Shared (broadcast) variables are replicated to every executor, so 16 executors mean 16 copies.
2. Fat executors: give maximum resources to each executor (one executor per node, with 16 CPU cores and 64 GB of RAM).
Disadvantages:
1. HDFS throughput suffers (the HDFS client degrades beyond roughly 5 concurrent cores per executor).
2. Such a large heap takes a lot of time to garbage-collect.
A well-balanced method for building containers:
We have 16 cores and 64 GB RAM per node. 1 core goes to background activity and 1 GB of RAM is allocated to the operating system, leaving 15 cores and 63 GB RAM. So 5 cores and 21 GB RAM per executor (3 executors per node) is considered ideal, and we get:
1. Multithreading within each executor.
2. HDFS throughput that doesn't suffer.
A config sketch follows below.
#pyspark #apachespark #bigdata
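To make this concrete, here is a minimal sketch (mine, not from the post) of the balanced sizing expressed as Spark configuration for the 10-node example. Reserving one executor slot for the driver/ApplicationMaster and carving the 21 GB into heap plus memoryOverhead are my assumptions:

from pyspark.sql import SparkSession

# Cluster: 10 nodes x 16 cores x 64 GB; per node, 15 usable cores and 63 GB
# -> 3 executors x 5 cores x 21 GB each. 10 nodes x 3 = 30 executors total,
# minus one slot kept free for the YARN ApplicationMaster / driver.
spark = (
    SparkSession.builder
    .appName("balanced-executors")                  # hypothetical app name
    .config("spark.executor.instances", "29")
    .config("spark.executor.cores", "5")
    .config("spark.executor.memory", "19g")         # heap portion of the 21 GB
    .config("spark.executor.memoryOverhead", "2g")  # off-heap headroom
    .getOrCreate()
)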
-
Want end-to-end scripts for FINDING WHAT'S CAUSING HIGH CPU? (Helpful for day-to-day issues and your upcoming interview.) Here you go:

1. Identify currently running queries consuming CPU:

SELECT r.session_id,
       r.status,
       r.wait_type,
       r.cpu_time,
       r.total_elapsed_time,
       r.start_time,
       s.text AS [Query Text]
FROM sys.dm_exec_requests r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) s
ORDER BY r.cpu_time DESC;

2. Identify top CPU consumers by query:

SELECT TOP 10
       total_worker_time / execution_count   AS [Avg CPU Time],
       execution_count,
       total_elapsed_time / execution_count  AS [Avg Elapsed Time],
       total_logical_reads / execution_count AS [Avg Logical Reads],
       total_physical_reads / execution_count AS [Avg Physical Reads],
       total_logical_writes / execution_count AS [Avg Logical Writes],
       (SELECT text FROM sys.dm_exec_sql_text(sql_handle)) AS [Query Text]
FROM sys.dm_exec_query_stats
WHERE total_worker_time > 0
ORDER BY [Avg CPU Time] DESC;

3. Check CPU-related wait types:

SELECT wait_type,
       waiting_tasks_count,
       wait_time_ms,
       max_wait_time_ms,
       signal_wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_type LIKE 'CXPACKET%'
   OR wait_type = 'SOS_SCHEDULER_YIELD'
   OR wait_type = 'THREADPOOL'
   OR wait_type = 'RESOURCE_SEMAPHORE'
ORDER BY wait_time_ms DESC;

Briefly, what these wait types mean (they represent scenarios where SQL Server tasks are waiting for resources, including CPU):
- CXPACKET: parallel query execution waits. High values may suggest that parallelism is causing CPU contention.
- SOS_SCHEDULER_YIELD: a task voluntarily yielded the scheduler after using up its quantum so that other tasks could execute. High values may indicate CPU pressure.
- THREADPOOL: SQL Server is waiting for a worker thread to become available. High values may suggest there are insufficient worker threads to handle the workload.
- RESOURCE_SEMAPHORE: SQL Server is waiting for memory grants. High values may indicate memory pressure that spills over into CPU contention.
-
Nano Tips (12):
--------------
Internal fragmentation (empty space on a page) can make queries take longer and consume more resources (cache, CPU, and IO), because the same rows are spread across more pages. A measurement sketch follows below.
#sql #performancetuning #databasedesign
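A hedged way to quantify that fragmentation from Python (the connection DSN is hypothetical; the DMV and its columns are standard SQL Server):

import pyodbc  # pip install pyodbc; needs an ODBC driver for SQL Server

# Low avg_page_space_used_in_percent means pages are mostly empty, so scans
# read more pages than necessary (more cache, CPU, and IO for the same rows).
QUERY = """
SELECT OBJECT_NAME(ips.object_id) AS table_name,
       i.name                     AS index_name,
       ips.avg_page_space_used_in_percent,
       ips.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'SAMPLED') ips
JOIN sys.indexes i
  ON i.object_id = ips.object_id AND i.index_id = ips.index_id
ORDER BY ips.avg_page_space_used_in_percent ASC;
"""

conn = pyodbc.connect("DSN=MyServer;Trusted_Connection=yes")  # hypothetical DSN
for row in conn.cursor().execute(QUERY):
    print(row.table_name, row.index_name,
          round(row.avg_page_space_used_in_percent, 1), row.page_count)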
-
A quick way to monitor processes for CPU and memory utilization, irrespective of the operating system: https://lnkd.in/d2aZifyW
Using the script
sbytestream.pythonanywhere.com
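The linked script isn't reproduced here, but here's a minimal cross-platform sketch of the same idea using psutil (the monitor() helper, the 1-second interval, and the example PID are my own choices):

import psutil  # pip install psutil; works on Linux, macOS, and Windows

def monitor(pid: int, interval: float = 1.0) -> None:
    # Poll one process and print its CPU and memory usage each interval.
    proc = psutil.Process(pid)
    while True:
        cpu = proc.cpu_percent(interval=interval)   # % of one core, sampled
        rss = proc.memory_info().rss / (1024 ** 2)  # resident memory in MB
        print(f"{proc.name()} (pid {pid}): cpu={cpu:.1f}% rss={rss:.1f} MB")

# monitor(12345)  # replace 12345 with a real process ID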
-
Let's try to understand Parallel Execution and Job Creation in a Cluster Environment with an example.
Cluster Configuration:
- Cluster composition: 3 worker nodes
- Executor specifications: each node hosts 3 executors, each configured with 3 CPU cores and 1 GB of memory.
File Execution Details:
- File size: 4 GB
- Partition size: default setting of 128 MB
Calculations (an arithmetic sketch follows below):
1. Number of partitions:
- Partition size = min(maxPartitionBytes, file size / default parallelism)
- File size / partition size = 4 GB / 128 MB = 32 partitions
2. Total number of tasks to be executed:
- Each partition corresponds to a task, so total tasks = number of partitions = 32 tasks
3. Number of tasks running in parallel:
- Given the configuration, the maximum parallelism is: number of nodes * number of executors per node * number of CPU cores per executor = 3 * 3 * 3 = 27 tasks running in parallel
4. Number of jobs:
- Each action applied constitutes a job, so the number of jobs depends on the number of actions applied.
5. Number of stages:
- Stages are bounded by shuffles: the number of wide (shuffle) transformations plus one gives the number of stages in the job execution.
Tagging Sumit Mittal for review.
#ApacheSpark #ClusterComputing #BigData #DataEngineering #SparkProgramming
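A minimal sketch (mine) reproducing the arithmetic above; the notion of scheduling "waves" is my addition:

# 4 GB file split into 128 MB partitions -> tasks
file_size_mb = 4 * 1024
partition_mb = 128
partitions = file_size_mb // partition_mb  # 32 partitions = 32 tasks

# 3 nodes x 3 executors x 3 cores -> task slots available at once
nodes, executors_per_node, cores_per_executor = 3, 3, 3
parallel_tasks = nodes * executors_per_node * cores_per_executor  # 27

# 32 tasks over 27 slots -> 2 scheduling waves (27 tasks, then the last 5)
waves = -(-partitions // parallel_tasks)  # ceiling division
print(partitions, parallel_tasks, waves)  # 32 27 2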