Commit message (Author, Age, Lines)
* [lldb] Don't construct the demangled strings while indexing the symbol table (Jonas Devlieghere, 2022-02-03, -18/+7)

  The symbol table needs to demangle all symbol names when building its index. However, this doesn't require the full demangled name: we only need the base name and the function declaration context. Currently, we always construct the demangled string during indexing and cache it in the string pool as a way to speed up future lookups.

  Constructing the demangled string is by far the most expensive step of the demangling process, because the output string can be exponentially larger than the input, and unless you're dumping the symbol table, many of those demangled names will not be needed again.

  This patch avoids constructing the full demangled string when we can partially demangle. This speeds up indexing and reduces memory usage.

  I gathered some numbers by attaching to Slack:

  Before
  ------

  Memory usage: 280MB

  Benchmark 1: ./bin/lldb -n Slack -o quit
    Time (mean ± σ):     4.829 s ±  0.518 s    [User: 4.012 s, System: 0.208 s]
    Range (min … max):   4.624 s …  6.294 s    10 runs

  After
  -----

  Memory usage: 189MB

  Benchmark 1: ./bin/lldb -n Slack -o quit
    Time (mean ± σ):     4.182 s ±  0.025 s    [User: 3.536 s, System: 0.192 s]
    Range (min … max):   4.152 s …  4.233 s    10 runs

  Differential revision: https://reviews.llvm.org/D118814
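For readers who want the relative improvement rather than the raw figures, the numbers quoted in this commit message can be turned into percentages. This is only a small arithmetic sketch; the helper name `relative_reduction` is hypothetical and not part of lldb.

```python
def relative_reduction(before: float, after: float) -> float:
    """Return the drop from `before` to `after` as a percentage of `before`."""
    return (before - after) / before * 100.0


# Figures copied from the commit message above.
memory_saving = relative_reduction(280.0, 189.0)  # memory usage in MB
time_saving = relative_reduction(4.829, 4.182)    # mean wall time in seconds

print(f"memory: {memory_saving:.1f}% lower, mean time: {time_saving:.1f}% lower")
```

The run-to-run spread also shrinks (σ goes from 0.518 s to 0.025 s), which the percentages alone do not show.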
* [Support][NFC] Don’t duplicate class or function name in comment (Amir Ayupov, 2022-02-03, -146/+101)

  Refactor comments in CommandLine.h to follow the Coding Style rule

  Reviewed By: MaskRay, serge-sans-paille
  Differential Revision: https://reviews.llvm.org/D118859
* [clang-format] Avoid merging macro definitions. (Marek Kurdej, 2022-02-03, -0/+19)

  Fixes https://github.com/llvm/llvm-project/issues/42087.

  Reviewed By: HazardyKnusperkeks, owenpan
  Differential Revision: https://reviews.llvm.org/D118879
* [clang-format] Avoid adding space after the name of a function-like macro when the name is a keyword. (Marek Kurdej, 2022-02-03, -0/+18)

  Fixes https://github.com/llvm/llvm-project/issues/31086.

  Before the code:
  ```
  #define if(x)
  ```

  was erroneously formatted to:
  ```
  #define if (x)
  ```

  Reviewed By: HazardyKnusperkeks, owenpan
  Differential Revision: https://reviews.llvm.org/D118844
* [RISCV] Add FMV_X_W and FMV_X_H to RISCVSExtWRemoval. (Craig Topper, 2022-02-03, -15/+69)

  Add -target-abi to sextw-removal.ll RUN lines to show benefit on new test case.
* [AMDGPU] HWRegs TMA and TBA also supported on gfx9 (Stanislav Mekhanoshin, 2022-02-03, -18/+26)

  Differential Revision: https://reviews.llvm.org/D118860
* [x86] add minimal test for sbb idiom and CPU capabilities; NFC (Sanjay Patel, 2022-02-03, -0/+19)

  D116804 proposes to alter codegen on this example based on CPU tuning, so check a variety of models to confirm it works as expected.

  We already have this test mixed in with several others in another test file, but it seems wasteful to add so many RUN lines to check this difference over and over again.
* [x86] remove CPU requirement for RUN line in test file; NFC (Sanjay Patel, 2022-02-03, -1/+1)

  A proposed change (D118843) that would affect this test will not require a specific CPU model to show a difference.
* [hwasan] add musttail IR test. (Florian Mayer, 2022-02-03, -0/+30)

  We currently only have a test at the clang level.

  Reviewed By: morehouse
  Differential Revision: https://reviews.llvm.org/D118856
* Revert "[nfc][mlgo] De-const a parameter" (Mircea Trofin, 2022-02-03, -11/+11)

  This reverts commit bc3b372161716a4c4845d47a877e4892df0d08da.

  The planned change that would have needed non-const MachineFunction refs isn't needed after all.
* [SLP] Fix a typo in comment (Philip Reames, 2022-02-03, -1/+1)
* [lldb] Fix windows&mac builds for c34698a811b13 (Pavel Labath, 2022-02-03, -1/+4)
* [AMDGPU] Introduce new ISel combine for trunc-slr patterns (Thomas Symalla, 2022-02-03, -26/+47)

  In some cases, when selecting a (trunc (slr)) pattern, the slr gets translated to a v_lshrrev_b32_e64 instruction whereas the truncation gets selected to a sequence of v_and_b32_e64 and v_cmp_eq_u32_e64. In the final ISA, this appears as selecting the nth-bit:

    v_lshrrev_b32_e32 v0, 2, v1
    v_and_b32_e32 v0, 1, v0
    v_cmp_eq_u32_e32 vcc_lo, 1, v0

  However, when the value used in the right shift is known at compilation time, the whole sequence can be reduced to two VALUs when the constant operand in the v_and is adjusted to (1 << lshrrev_operand):

    v_and_b32_e32 v0, (1 << 2), v1
    v_cmp_ne_u32_e32 vcc_lo, 0, v0

  In the example above, the following pseudo-code:

    v0 = (v1 >> 2)
    v0 = v0 & 1
    vcc_lo = (v0 == 1)

  would be translated to:

    v0 = v1 & 0b100
    vcc_lo = (v0 == 0b100)

  which should yield an equivalent result.

  This is a little bit hard to test as one needs to force the SelectionDAG to contain the nodes before instruction selection, but the test sequence was roughly derived from a production shader.

  Reviewed By: foad
  Differential Revision: https://reviews.llvm.org/D118461
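The equivalence claimed in this commit message can be checked with a small model of the two instruction sequences. The following is a minimal sketch, assuming 32-bit unsigned lanes and a compile-time-constant shift amount in the range 0..31; the function names are hypothetical and only mirror the pseudo-code above.

```python
MASK32 = 0xFFFFFFFF


def bit_test_shift_then_mask(value: int, shift: int) -> bool:
    """Three-instruction form: shift right, mask with 1, compare equal to 1."""
    shifted = (value & MASK32) >> shift  # models v_lshrrev_b32
    masked = shifted & 1                 # models v_and_b32 with constant 1
    return masked == 1                   # models v_cmp_eq_u32 against 1


def bit_test_mask_only(value: int, shift: int) -> bool:
    """Two-instruction form: mask with (1 << shift), compare not equal to 0."""
    masked = (value & MASK32) & (1 << shift)  # models v_and_b32 with adjusted constant
    return masked != 0                        # models v_cmp_ne_u32 against 0


def forms_agree(values) -> bool:
    """Check both forms on every shift amount 0..31 for the given values."""
    return all(
        bit_test_shift_then_mask(v, s) == bit_test_mask_only(v, s)
        for v in values
        for s in range(32)
    )


print(forms_agree([0, 1, 4, 5, 0x80000000, 0x12345678, MASK32]))
```

Because the masked value can only be 0 or (1 << shift), comparing it with "not equal to 0" and with "equal to (1 << shift)" gives the same answer, which is why the pseudo-code and the emitted v_cmp_ne_u32 agree.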
* [mlir-translate] Teach these tools about --allow-unregistered-dialect (Chris Lattner, 2022-02-03, -0/+19)

  Some translations do work with unregistered dialects; this allows one to write testcases against them. It works the same way as it does for mlir-opt.

  Differential Revision: https://reviews.llvm.org/D118872
* [AARCH64][NEON] Allow to sink operands for aarch64_neon_pmull (Sunho Kim, 2022-02-03, -0/+184)

  This teaches AArch64TargetLowering::shouldSinkOperands to sink the operands of the aarch64_neon_pmull intrinsic.

  Differential Revision: https://reviews.llvm.org/D117944
* [test] check strictest attributes possible for InferFunctionAttrs test (Augie Fackler, 2022-02-03, -1/+1)

  This appears to have all the same attributes as many other functions in this file, and I think the use of INACCESSIBLEMEMONLY_NOFREE_NOUNWIND instead of INACCESSIBLEMEMONLY_NOFREE_NOUNWIND_WILLRETURN was an oversight that meant aligned_alloc's attributes were just going unchecked.

  This patch corrects the test defect and now the attributes inferred on aligned_alloc are actually validated, and the test still passes.

  Differential Revision: https://reviews.llvm.org/D117922
* add IR compatibility test for (upcoming) allocsize attribute (Augie Fackler, 2022-02-03, -2/+11)
* [NFC] MemoryBuiltins: tease out a getFreeFunctionDataForFunction helper (Augie Fackler, 2022-02-03, -5/+12)
* [RISCV] Remove createVirtualRegister from RISCVInstrInfo::movImm. (Craig Topper, 2022-02-03, -14/+5)

  Based on the discussion in D61884, this was done to enable compressed instructions by giving freedom to pick a compressible register.

  Integer materializing can generate LUI, ADDI, ADDIW, SLLI and some Zb* instructions. C.LI, C.LUI, C.ADDI, C.ADDIW, and C.SLLI all have a 5-bit register encoding. The Zb* instructions aren't compressible. Based on that I don't think compressibility of the register is a concern.

  Reviewed By: asb
  Differential Revision: https://reviews.llvm.org/D118741
* [clang-tidy] Fix LLVM include order check policy (Kadir Cetinkaya, 2022-02-03, -17/+56)

  Clang-format LLVM style has a custom include category for gtest/ and gmock/ headers between regular includes and angled includes. Do the same here.

  Fixes https://github.com/llvm/llvm-project/issues/53525.

  Differential Revision: https://reviews.llvm.org/D118913
* [RISCV] Remove RISCVISD::SPLAT_VECTOR_I64 in favor of RISCVISD::VMV_V_X_VL. (Craig Topper, 2022-02-03, -45/+42)

  SPLAT_VECTOR_I64 has the same semantics as RISCVISD::VMV_V_X_VL; it just assumed VLMax instead of carrying a VL operand.

  Include order of RISCVInstrInfoVSDPatterns.td and RISCVInstrInfoVVLPatterns.td has been swapped to avoid moving riscv_vmv_v_x_vl into RISCVInstrInfoVSDPatterns.td and to allow moving other "_vl" SDNodes back to RISCVInstrInfoVVLPatterns.td

  Reviewed By: frasercrmck
  Differential Revision: https://reviews.llvm.org/D118841
* MemoryBuiltins: simplify isLibFreeFunction [NFC] (Augie Fackler, 2022-02-03, -36/+47)

  This is in anticipation of my next patch, where I need to store more information about free functions than just their argument count. It felt invasive enough on this function that it seemed worthwhile to just extract this as its own commit that makes no functional changes.

  Differential Revision: https://reviews.llvm.org/D117350
* [AMDGPU] Simplify AMDGPUAnnotateUniformValues::visitLoadInst (Jay Foad, 2022-02-03, -306/+163)

  Always set uniform metadata on the pointer if it is an instruction, but otherwise do not bother to create a trivial getelementptr instruction, because AMDGPUInstrInfo::isUniformMMO can already detect that various non-instruction pointers are uniform.

  Most of the test case churn is from tests that used undef as a pointer, which AMDGPUInstrInfo::isUniformMMO treats as uniform.

  Differential Revision: https://reviews.llvm.org/D118909
* [mlir][taco] Uses sparse_tensor.new to read tensor input data from files. (Bixia Zheng, 2022-02-03, -120/+215)

  Replace the Python implementation for reading tensor input data from files with create_sparse_tensor that uses sparse_tensor.new.

  The MLIR TNS format has two extra meta data lines. Add the extra meta data to a test data file.

  Implement TACO tensor methods evaluate and unpack.

  Add unit tests.

  Reviewed By: aartbik
  Differential Revision: https://reviews.llvm.org/D118803
* [MLIR][SCF] Remove loop invariant arguments of scf.while (Abhishek Varma, 2022-02-03, -2/+362)

  -- This commit adds a canonicalization pattern on scf.while to remove the loop invariant arguments.
  -- An argument is considered loop invariant if the iteration argument value is the same as the corresponding one being yielded (at the same position) in both the before/after block of scf.while.
  -- For the arguments removed, their use within scf.while and their corresponding scf.while's result are replaced with their corresponding initial value.

  Signed-off-by: Abhishek Varma <abhishek.varma@polymagelabs.com>

  Reviewed By: ftynse
  Differential Revision: https://reviews.llvm.org/D116923
* [AMDGPU] Tweak tests in noclobber-barrier.ll (Jay Foad, 2022-02-03, -40/+26)

  Tweak some of the tests to demonstrate AMDGPUAnnotateUniformValues::visitLoadInst inserting a trivial getelementptr instruction, just to have somewhere to put amdgpu.uniform metadata. NFC.
* [gn build] Port c34698a811b1 (LLVM GN Syncbot, 2022-02-03, -1/+1)
* MipsABIFlagsSection.h - replace unnecessary StringRef include with forward declaration (Simon Pilgrim, 2022-02-03, -1/+1)
* [X86] simplifyX86varShift - use KnownBits.getMaxValue().ult() to check for out of bounds shift amounts (Simon Pilgrim, 2022-02-03, -4/+3)

  This is easier to grok than MaskedValueIsZero for high bits.
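The idea behind the check can be illustrated with a toy model of known-bits analysis. This is a sketch under stated assumptions, not LLVM's implementation: `KnownBitsModel` and `shift_amount_always_in_range` are hypothetical names, and the only property borrowed from the commit is that the largest possible value (every bit not known to be zero is set) is compared, unsigned, against the bit width.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KnownBitsModel:
    """Toy stand-in for a known-bits result: masks of bits proven 0 and proven 1."""

    width: int
    zero: int  # bits known to be 0
    one: int   # bits known to be 1

    def max_value(self) -> int:
        # The largest value consistent with the analysis: every bit that is
        # not known to be zero is assumed to be one.
        return ~self.zero & ((1 << self.width) - 1)


def shift_amount_always_in_range(amount: KnownBitsModel, bit_width: int) -> bool:
    """True if no possible shift amount reaches or exceeds the element width."""
    return amount.max_value() < bit_width


# Upper 27 bits known zero: the amount is at most 31, so a 32-bit shift is safe.
safe = KnownBitsModel(width=32, zero=0xFFFFFFE0, one=0)
# Only the upper 26 bits known zero: the amount may be as large as 63.
unsafe = KnownBitsModel(width=32, zero=0xFFFFFFC0, one=0)

print(shift_amount_always_in_range(safe, 32), shift_amount_always_in_range(unsafe, 32))
```

Stating the condition as "maximum possible amount is below the bit width" reads directly as the property being checked, whereas a high-bits mask encodes the same fact indirectly.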
* [gn build] (manually) port 20e05b9f0ebe (ClangPseudoTests) (Nico Weber, 2022-02-03, -0/+28)
* [clang][driver][wasm] Remove unneeded default labels (Timm Bäder, 2022-02-03, -4/+0)

  Fix build fallout from b5787a0c6cc4da47b7d7b218e23f780076ad2f5f
* [LV] Use VScaleForTuning to allow wider epilogue VFs. (Sander de Smalen, 2022-02-03, -10/+218)

  When the main loop is e.g. VF=vscale x 1 and the epilogue VF cannot be any smaller, the vectorizer should try to estimate how many lanes are executed at runtime and allow a suitable fixed-width VF to be chosen. It can use VScaleForTuning to figure out what a suitable fixed-width VF could be. For the case where the main loop VF is VF=vscale x 1, and VScaleForTuning=8, it could still choose an epilogue VF up to VF=4.

  This was a bit tricky to test, so this patch also introduces a wrapper function to get 'VScaleForTuning' by also considering vscale_range. If min and max are equal, then that will be the vscale we compile for. It makes little sense to tune for a different width if the code will not be portable for other widths.

  Reviewed By: david-arm
  Differential Revision: https://reviews.llvm.org/D118709
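The selection rule described in this commit message can be sketched as follows. This is a simplified model, not the vectorizer's code: all function names are hypothetical, and it assumes that an epilogue VF is eligible only when it is strictly smaller than the estimated number of lanes of the main loop, which is consistent with the "VF=vscale x 1, VScaleForTuning=8, epilogue up to VF=4" example.

```python
from typing import List, Optional, Tuple


def vscale_for_tuning(target_hint: Optional[int],
                      vscale_range: Optional[Tuple[int, int]]) -> Optional[int]:
    """Prefer an exact vscale from vscale_range(min, max) when min == max."""
    if vscale_range is not None and vscale_range[0] == vscale_range[1]:
        return vscale_range[0]
    return target_hint


def estimated_lanes(known_min_lanes: int, scalable: bool,
                    vscale: Optional[int]) -> int:
    """Estimate runtime lanes: scalable VFs are multiplied by the tuning vscale."""
    if scalable and vscale is not None:
        return known_min_lanes * vscale
    return known_min_lanes


def fixed_epilogue_candidates(main_known_min: int, main_scalable: bool,
                              vscale: Optional[int],
                              fixed_vfs: List[int]) -> List[int]:
    """Keep fixed-width VFs strictly below the main loop's estimated lanes."""
    limit = estimated_lanes(main_known_min, main_scalable, vscale)
    return [vf for vf in fixed_vfs if vf < limit]


# Main loop VF = vscale x 1, tuning hint 8, no vscale_range attribute.
tuned = vscale_for_tuning(8, None)
print(fixed_epilogue_candidates(1, True, tuned, [2, 4, 8, 16]))
```

With a vscale_range whose minimum and maximum are both 2, the same call would estimate only two lanes for the main loop, so none of the listed fixed-width VFs would qualify.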
* [clang][driver][wasm] Support -stdlib=libstdc++ for WebAssembly (Timm Bäder, 2022-02-03, -19/+145)

  The WebAssembly toolchain currently supports only -stdlib=libc++ and implicitly assumes the c++ stdlib to be libc++. Change this to also support libstdc++.

  Differential Revision: https://reviews.llvm.org/D117888#3290628
* Revert "[flang] Debugging of ACCESS='STREAM' I/O" (Andrzej Warzynski, 2022-02-03, -180/+143)

  This reverts commit be9946b877add0db906090d22840b213c3f41dd2.

  This change has caused Flang's Windows buildbot to start failing:
  * https://lab.llvm.org/buildbot/#/builders/172/builds/7664
* [Lanai] Remove orphan LanaiInstPrinter::printAluOperand declaration. NFCI. (Simon Pilgrim, 2022-02-03, -1/+0)
* LanaiInstPrinter.h - replace unnecessary StringRef include with forward declaration (Simon Pilgrim, 2022-02-03, -1/+1)
* [SLP]Excluded external uses from the reordering estimation. (Alexey Bataev, 2022-02-03, -130/+179)

  The compiler adds the estimation for the external uses during operands reordering analysis, which makes it tend to prefer duplicates in the lanes rather than a diamond/shuffled match in the graph. It changes the sizes of the vector operands and may prevent some vectorization. We don't need this kind of estimation for the analysis phase, because we just need to choose the most compatible instruction and it does not matter if it has an external user or is used in a non-matching lane. Instead, we count the number of unique instructions in the lane and see if the reassociation changes the number of unique scalars to be a power of 2 or not. If we have a power-of-2 number of unique scalars in the lane, it is considered more profitable than having a non-power-of-2 number of unique scalars.

  Metric: SLP.NumVectorInstructions

  test-suite :: MultiSource/Benchmarks/FreeBench/distray/distray.test                70.00    86.00   22.9%
  test-suite :: External/SPEC/CFP2017rate/544.nab_r/544.nab_r.test                  346.00   353.00    2.0%
  test-suite :: External/SPEC/CFP2017speed/644.nab_s/644.nab_s.test                 346.00   353.00    2.0%
  test-suite :: MultiSource/Benchmarks/mediabench/gsm/toast/toast.test              235.00   239.00    1.7%
  test-suite :: MultiSource/Benchmarks/MiBench/telecomm-gsm/telecomm-gsm.test       235.00   239.00    1.7%
  test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test         8723.00  8834.00    1.3%
  test-suite :: MultiSource/Applications/JM/ldecod/ldecod.test                     1051.00  1064.00    1.2%
  test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test             1628.00  1646.00    1.1%
  test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test              1628.00  1646.00    1.1%
  test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test           9100.00  9184.00    0.9%
  test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test         3565.00  3577.00    0.3%
  test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test        3565.00  3577.00    0.3%
  test-suite :: External/SPEC/CFP2017rate/511.povray_r/511.povray_r.test           4235.00  4245.00    0.2%
  test-suite :: MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4.test                  1996.00  1998.00    0.1%
  test-suite :: MultiSource/Applications/JM/lencod/lencod.test                     1671.00  1672.00    0.1%
  test-suite :: MultiSource/Benchmarks/Prolangs-C/TimberWolfMC/timberwolfmc.test    783.00   782.00   -0.1%
  test-suite :: SingleSource/Benchmarks/Misc/oourafft.test                           69.00    68.00   -1.4%
  test-suite :: External/SPEC/CINT2017speed/641.leela_s/641.leela_s.test            207.00   192.00   -7.2%
  test-suite :: External/SPEC/CINT2017rate/541.leela_r/541.leela_r.test             207.00   192.00   -7.2%
  test-suite :: External/SPEC/CINT2017rate/531.deepsjeng_r/531.deepsjeng_r.test      89.00    80.00  -10.1%
  test-suite :: External/SPEC/CINT2017speed/631.deepsjeng_s/631.deepsjeng_s.test     89.00    80.00  -10.1%
  test-suite :: MultiSource/Benchmarks/mediabench/jpeg/jpeg-6a/cjpeg.test           260.00   215.00  -17.3%
  test-suite :: MultiSource/Benchmarks/MiBench/consumer-jpeg/consumer-jpeg.test     256.00   211.00  -17.6%

  MultiSource/Benchmarks/Prolangs-C/TimberWolfMC - pretty much the same.
  SingleSource/Benchmarks/Misc/oourafft.test - 2 <2 x > loads replaced by one <4 x> load.
  External/SPEC/CINT2017speed/641.leela_s - function gets vectorized and not inlined anymore.
  External/SPEC/CINT2017rate/541.leela_r - same.
  External/SPEC/CINT2017rate/531.deepsjeng_r - changed the order in multi-block tree, the result is pretty much the same.
  External/SPEC/CINT2017speed/631.deepsjeng_s - same.
  MultiSource/Benchmarks/mediabench/jpeg/jpeg-6a - the result is the same as before.
  MultiSource/Benchmarks/MiBench/consumer-jpeg - same.

  Differential Revision: https://reviews.llvm.org/D116688
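The "power of 2 unique scalars" criterion from this commit message can be illustrated with a toy check. This is a deliberately simplified sketch, not the SLP vectorizer's scoring code: operands are modelled as plain strings, and the helper names are hypothetical.

```python
from typing import Sequence


def is_power_of_two(n: int) -> bool:
    """True for 1, 2, 4, 8, ...; False for 0 and everything else."""
    return n > 0 and (n & (n - 1)) == 0


def unique_scalar_count(operands: Sequence[str]) -> int:
    """Number of distinct scalars appearing in one operand vector."""
    return len(set(operands))


def has_power_of_two_unique_scalars(operands: Sequence[str]) -> bool:
    """The property the message describes as more profitable."""
    return is_power_of_two(unique_scalar_count(operands))


# Two candidate operand orders for a 4-wide vector.
order_a = ["a", "b", "a", "b"]  # 2 unique scalars
order_b = ["a", "b", "c", "a"]  # 3 unique scalars

print(has_power_of_two_unique_scalars(order_a), has_power_of_two_unique_scalars(order_b))
```

Under this criterion `order_a` would be preferred over `order_b`; how the real analysis weighs this against other factors is not specified in the message and is not modelled here.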
* [NFC] Move FoldingSetNodeID::AddInteger and FoldingSetNodeID::AddPointer definitions to header (Dawid Jurczak, 2022-02-03, -42/+27)

  Lack of AddInteger/AddPointer inlining slows down NodeEquals/Profile/:operator== calls. Inlining makes FunctionProtoTypes/PointerTypes/ElaboratedTypes/ParenTypes Profile functions faster, but since NodeEquals is still called indirectly through a function pointer from FindNodeOrInsertPos, there is room for further inlining improvements.

  Extracted from: https://reviews.llvm.org/D118385

  Differential Revision: https://reviews.llvm.org/D118610
* XCoreTargetMachine.h - replace unnecessary StringRef include with forward declaration (Simon Pilgrim, 2022-02-03, -1/+1)
* XCoreInstPrinter.h - replace unnecessary StringRef include with forward declaration (Simon Pilgrim, 2022-02-03, -1/+1)
* [XCore] Remove orphan XCoreInstPrinter::printMemOperand declaration. NFCI. (Simon Pilgrim, 2022-02-03, -1/+0)
* [SLP]Alternate vectorization for cmp instructions. (Alexey Bataev, 2022-02-03, -111/+227)

  Added support for alternate ops vectorization of the cmp instructions. It allows vectorizing either cmp instructions with the same/swapped predicate but different (swapped) operand kinds, or cmp instructions with different predicates and compatible operand kinds.

  Differential Revision: https://reviews.llvm.org/D115955
* [clang][docs] Regenerate ASTMatchers documentation (Nathan James, 2022-02-03, -17/+68)
* [mlir][Rewrite] Add support for using an operation with no results as location (Markus Böck, 2022-02-03, -3/+20)

  Prior to this patch, using an operation without any results as the location would result in the generation of invalid C++ code. It'd try to format using the result values, which would end up being an empty string for an operation without any.

  This patch fixes that issue by instead using getValueAndRangeUse, which handles both ranges as well as the case for an op without any results.

  Differential Revision: https://reviews.llvm.org/D118885
* [fir] Add fir.array_amend operation definition (Valentin Clement, 2022-02-03, -0/+46)

  This patch adds the fir.array_amend operation. This op is used later in upstreaming patches for the F95 compliance.

  The `array_amend` operation marks an array value as having been changed via a reference obtained by an `array_access`. It acts as a logical transaction log that is used to merge the final result back with an `array_merge_store` operation.

  ```mlir
  // fetch the value of one of the array value's elements
  %1 = fir.array_access %v, %i, %j : (!fir.array<?x?xT>, index, index) -> !fir.ref<T>
  // modify the element by storing data using %1 as a reference
  %2 = ... %1 ...
  // mark the array value
  %new_v = fir.array_amend %v, %2 : (!fir.array<?x?xT>, !fir.ref<T>) -> !fir.array<?x?xT>
  ```

  This patch is part of the upstreaming effort from fir-dev branch.

  Reviewed By: kiranchandramohan, schweitz
  Differential Revision: https://reviews.llvm.org/D112448

  Co-authored-by: Eric Schweitz <eschweitz@nvidia.com>
* [flang] Remove unused variable in ScalarExprLowering (Valentin Clement, 2022-02-03, -6/+4)

  Fix buildbot failure https://lab.llvm.org/buildbot/#/builders/180/builds/3066
* [NFC] TypePromotion test for AArch64 (Sam Parker, 2022-02-03, -0/+435)
* [lldb] Rename Logging.h to LLDBLog.h and clean up includes (Pavel Labath, 2022-02-03, -85/+240)

  Most of our code was including Log.h even though that is not where the "lldb" log channel is defined (Log.h defines the generic logging infrastructure). This worked because Log.h included Logging.h, even though it shouldn't.

  After the recent refactor, it became impossible for the two files to include each other in this direction (the opposite inclusion is needed), so this patch removes the workaround that was put in place and cleans up all files to include the right thing. It also renames the file to LLDBLog to better reflect its purpose.
* [AArch64][SVE] Fold vselect into predicated fmul, fsub and fadd (Matt Devereau, 2022-02-03, -5/+112)

  Fold vselect with an unpredicated fmul/fsub/fadd operand into a predicated fmul/fsub/fadd:

    (vselect (p) (op (a) (b)) (a)) => (op -> (p) (a) (b))

  Differential Revision: https://reviews.llvm.org/D117689
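The fold above can be checked lane by lane with a small model. This is a sketch, assuming the predicated operation is of the merging kind, where inactive lanes keep the value of the first operand `a`; vectors are modelled as Python lists and both function names are hypothetical.

```python
import operator
from typing import Callable, List, Sequence


def op_then_select(pred: Sequence[bool], a: Sequence[float], b: Sequence[float],
                   op: Callable[[float, float], float]) -> List[float]:
    """(vselect p (op a b) a): compute every lane, then pick per lane."""
    full = [op(x, y) for x, y in zip(a, b)]
    return [r if p else x for p, r, x in zip(pred, full, a)]


def predicated_op(pred: Sequence[bool], a: Sequence[float], b: Sequence[float],
                  op: Callable[[float, float], float]) -> List[float]:
    """Merging predicated op: active lanes compute, inactive lanes keep a."""
    return [op(x, y) if p else x for p, x, y in zip(pred, a, b)]


pred = [True, False, True, False]
a = [1.0, 2.0, 3.0, 4.0]
b = [0.5, 0.5, 2.0, 2.0]

for op in (operator.mul, operator.sub, operator.add):
    assert op_then_select(pred, a, b, op) == predicated_op(pred, a, b, op)

print(predicated_op(pred, a, b, operator.mul))
```

The fold depends on the "else" value of the select being the first operand of the operation; a select that falls back to some other vector would not match this pattern.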
* [clangd] IncludeCleaner: Decrease API dependency on clangd (Kirill Bobyrev, 2022-02-03, -18/+44)

  Reviewed By: sammccall
  Differential Revision: https://reviews.llvm.org/D118882